Utils for building decoding graph HCLG
The build_hclg.sh script formats the language model (LM) and the acoustic model (AM)
into files (e.g. HCLG) formatted for Kaldi decoders.
The script extracts phone lists and phone sets from the lexicon, given the acoustic model (AM), the phonetic decision tree (tree) and the phonetic dictionary (lexicon).
The script silently assumes that the phone lists generated from the lexicon are the same as those used for training the AM. If they differ, the script crashes.
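Because a phone mismatch only shows up as a crash, it can help to compare the two phone sets before running the build. The following is a minimal sketch, not part of build_hclg.sh: the function names are hypothetical, and it assumes a lexicon with one entry per line in the form `WORD phone1 phone2 ...` and a training phone list that is already available as plain phone symbols.

```python
def phones_in_lexicon(lexicon_lines):
    """Collect every phone symbol used in the lexicon.

    Assumed format (hypothetical): "WORD phone1 phone2 ..." per line.
    """
    phones = set()
    for line in lexicon_lines:
        fields = line.split()
        # The first field is the word; the rest are its phones.
        # Blank lines yield no fields and are skipped implicitly.
        phones.update(fields[1:])
    return phones


def phone_set_mismatch(lexicon_lines, training_phones):
    """Return (phones only in the lexicon, phones only in the training set)."""
    lexicon_phones = phones_in_lexicon(lexicon_lines)
    training = set(training_phones)
    only_in_lexicon = sorted(lexicon_phones - training)
    only_in_training = sorted(training - lexicon_phones)
    return only_in_lexicon, only_in_training
```

If both returned lists are empty, the lexicon and the AM agree on the phone inventory under these assumptions. Any symbol in either list is a candidate cause for the crash described above.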
Use case: run the script with an AM trained on the full phonetic set of the given language, pass it the tree used for tying the phonetic set, and supply your LM and the corresponding lexicon. The lexicon and the LM should also cover the full phonetic set of the given language.
The decode_indomain.py script uses
HCLG.fst and the rest of the files produced by
build_hclg.sh and performs decoding on prerecorded wav files.
The reference speech transcription and path to the wav files are extracted from collected call logs.
The wav files should come from a single domain, and the LM used for the build
should come from the same domain.
decode_indomain.py also evaluates the decoded transcriptions.
The Word Error Rate (WER), Real Time Factor (RTF) and other minor statistics are collected.
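For readers unfamiliar with the two metrics, here is a minimal sketch of how WER and RTF are conventionally computed. It is not taken from decode_indomain.py: the function names are hypothetical, and the sketch assumes whitespace-tokenized transcriptions and timings measured in seconds.

```python
def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1], len(ref)


def wer(pairs):
    """WER over (reference, hypothesis) pairs: total errors / total reference words."""
    errors = words = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        words += n
    return errors / words if words else 0.0


def rtf(decode_seconds, audio_seconds):
    """Real Time Factor: processing time divided by audio duration."""
    return decode_seconds / audio_seconds
```

An RTF below 1.0 means decoding runs faster than real time. WER is accumulated over the whole test set rather than averaged per utterance, so long utterances weigh more than short ones.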
Dependencies of build_hclg.sh
The build_hclg.sh script requires the scripts listed below from
The utils scripts transitively use scripts from
The dependency is resolved by the
path.sh script, which creates the corresponding symlinks
and adds the Kaldi binaries to your system path.
You only need to set the
KALDI_ROOT variable and provide the correct arguments.
Try to run
Scripts needed from the
symlinked utils directory.
Scripts from the list use Kaldi binaries,
so you need Kaldi compiled on your system.
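A quick way to confirm that compiled binaries are reachable is to look them up on the search path before starting a build. The sketch below is an illustration only: the helper name is hypothetical, and the binary names in the usage note are examples of Kaldi graph-building tools, not a list taken from build_hclg.sh.

```python
import shutil


def missing_binaries(names, which=shutil.which):
    """Return the names that cannot be found on the executable search path.

    `which` is injectable so the check can be exercised without Kaldi installed.
    """
    return [name for name in names if which(name) is None]
```

For example, after sourcing path.sh, calling `missing_binaries(["fsttablecompose", "add-self-loops"])` should return an empty list if those tools were built and added to the path. A non-empty result names the binaries that still need to be compiled or exposed.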
path.sh adds Kaldi binaries to the
and also creates symlinks to
where the helper scripts are located.
You only need to set up