Building acoustic models using Kaldi

In this document, we describe how to build acoustic models using the Kaldi toolkit and the provided scripts. These acoustic models can be used with the Kaldi decoders, especially with the Python wrapper of LatgenFasterDecoder, which is integrated with Alex.

We build a different acoustic model for each language and acoustic condition pair – LANG_RCOND. At this time, we provide two sets of scripts, for building English and Czech acoustic models from the VOIP data.

In general, the scripts for a language and acoustic condition pair LANG_RCOND can be described as follows:


  • The scripts require a Kaldi installation and a Linux environment. (Tested on Ubuntu 10.04, 12.04 and 12.10.) Note: we recommend the Pykaldi fork of Kaldi, because you will also need it for the Kaldi decoder integrated into Alex.
  • Recipes deployed with the Kaldi toolkit are located in $KALDI_ROOT/egs/name_of_recipe/s[1-5]/. This recipe requires the $KALDI_ROOT variable to be set so that it can use the Kaldi binaries and the scripts from $KALDI_ROOT/egs/wsj/s5/.


  • The recommended settings are stored in an environment file; see the example below.
  • We recommend adjusting the settings in that environment file. Do not commit this file to the git repository!
  • Our scripts prepare the data in the expected format in the $WORK directory.
  • Experiment files are stored in the $EXP directory.
  • The symbolic links to $KALDI_ROOT/egs/wsj/s5/utils and $KALDI_ROOT/egs/wsj/s5/steps are created automatically.
  • These symbolic links are necessary for the utils and steps scripts. Do not relocate them!
  • The language model (LM) is either built from the training data using SRILM, or an existing LM in the ARPA format is specified in the environment file.
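For the SRILM path, a bigram ARPA model can be built from the training transcriptions roughly as follows. This is a sketch: the file names train_text.txt and lm.arpa and the choice of Witten-Bell discounting are our illustration, not taken from the recipe.

```shell
# Build a bigram ARPA language model with SRILM
# (assumes the ngram-count binary is on $PATH).
# train_text.txt: one training sentence per line (illustrative name).
# -wbdiscount selects Witten-Bell smoothing (an illustrative choice).
ngram-count -order 2 -text train_text.txt -wbdiscount -lm lm.arpa
```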

Example of such an environment file:

# EVERY_N=1 uses every utterance for the recipe; EVERY_N=10 is nice for debugging
export EVERY_N=1
# path to built Kaldi library and scripts
export KALDI_ROOT=/net/projects/vystadial/lib/kronos/pykaldi/kaldi

export DATA_ROOT=/net/projects/vystadial/data/asr/cs/voip/
export LM_paths="build0 $DATA_ROOT/arpa_bigram"
export LM_names="build0 vystadialbigram"

export CUDA_VISIBLE_DEVICES=0  # only card 0 (Tesla on Kronos) will be used for DNN training
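The EVERY_N variable subsamples the training data: with EVERY_N=10 only every tenth utterance is used, which speeds up debugging runs. Conceptually it acts like the following filter over an utterance list (the awk one-liner is our illustration, not code from the recipe):

```shell
# Keep every N-th line of an utterance list -- this is what
# EVERY_N conceptually does to the training data.
EVERY_N=3
printf 'utt1\nutt2\nutt3\nutt4\nutt5\nutt6\n' |
  awk -v n="$EVERY_N" 'NR % n == 0'   # prints utt3 and utt6
```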

Running experiments

Before running the experiments, check that:

  • you have Kaldi compiled:

# build openfst
pushd kaldi/tools
make openfst_tgt
popd
# download ATLAS headers
pushd kaldi/tools
make atlas
popd
# generate the Kaldi makefile and compile Kaldi
pushd kaldi/src
make && make test
popd
  • you have SRILM compiled. (This is needed for building a language model, unless you supply your own LM in the ARPA format.)

pushd kaldi/tools
# download the srilm.tgz archive from the SRILM download page
  • the train_LANG_RCOND script will see the Kaldi scripts and binaries. Check, for example, that $KALDI_ROOT/egs/wsj/s5/utils/ is a valid path.
  • in the configuration, you have switched the training to run on an SGE[*] grid if required (disabled by default), and that njobs is less than the number of your CPU cores.
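The check that the Kaldi scripts are visible can be scripted. A minimal sketch, assuming the standard wsj/s5 layout; the function name is ours:

```shell
# Check that a given Kaldi root contains the wsj/s5 utils/
# scripts that the recipe relies on.
check_kaldi_root() {
    if [ -d "$1/egs/wsj/s5/utils" ]; then
        echo "ok: $1"
    else
        echo "missing: $1/egs/wsj/s5/utils"
    fi
}

# Use whatever you exported as $KALDI_ROOT in your environment file.
check_kaldi_root "${KALDI_ROOT:-/no/kaldi/root/set}"
```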

Start the recipe by running the train_LANG_RCOND script with bash.

[*]Sun Grid Engine

Extracting the results and trained models

The main train_LANG_RCOND script performs not only the training of the acoustic models, but also decoding. The acoustic models are evaluated while the scripts run, and evaluation reports are printed to the standard output.

A results-extraction script in local/ collects the results from the $EXP directory. It is invoked at the end of the main script.

If you want to use the trained acoustic model outside the prepared scripts, you need to build the HCLG decoding graph yourself. (See the Kaldi documentation for a general introduction to the FST framework in Kaldi.) The HCLG.fst decoding graph is created by a script in utils/; see the Kaldi documentation for details.
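In a standard wsj/s5-style setup, the graph is built with Kaldi's utils/mkgraph.sh. A hedged sketch; the data/lang_test and exp/tri2a paths are illustrative, not guaranteed by this recipe:

```shell
# Compose the HCLG.fst decoding graph from the language directory
# (lexicon + grammar) and a trained acoustic model directory.
# Paths are illustrative; adjust them to your $WORK/$EXP layout.
utils/mkgraph.sh data/lang_test exp/tri2a exp/tri2a/graph
```

The resulting exp/tri2a/graph/HCLG.fst is what the decoder loads.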

Credits and license

The scripts were based on the Voxforge Kaldi recipe. The original scripts, as well as these scripts, are licensed under the Apache 2.0 license.