RepeatAfterMe (RAM) for Czech - speech data collection

This application is useful for bootstraping of speech data. It asks the caller to repeat sentences which are randomly sampled from a set of preselected sentences.

  • The Czech sentences (sentences_es.txt) are from Karel Capek novels Matka and RUR, and the Prague’s Dependency Treebank.
  • The Spanish sentences (sentences_es.txt) are taken from the Internet

If you want to run ram_hub.py on some specific phone number than specify the appropriate extension config:

$ ./ram_hub.py -c ram_hub_LANG.cfg  ../../resources/private/ext-PHONENUMBER.cfg

After collection desired number of calls, use copy_wavs_for_transcription.py to extract the wave files from the call_logs subdirectory for transcription. The files will be copied into into RAM-WAVs directory.

These calls must be transcribed by the Transcriber or some similar software.