Speech to Text Apps

Speech to Text

  • DeepSpeech: simpler to use, although less accurate than the toolkits listed below
  • Kaldi: supports hybrid NN-HMM and lattice-free MMI models. It is widely used in both research and production.
  • Lingvo is the open-source version of Google's speech recognition toolkit, with support mostly for end-to-end models.
  • ESPnet is well regarded and well known for end-to-end models as well.
  • RASR + RETURNN are also very good, for both end-to-end and hybrid NN-HMM models, but they are free for non-commercial applications only; commercial use requires a commercial licence (disclaimer: I work at the university chair which develops these frameworks).
  • http://gkarsay.github.io/parlatype/
  • https://github.com/juanerasmoe/pmTrans
  • https://pythonbasics.org/transcribe-audio/
  • Wav2Letter, the speech recognition tool by Facebook.
  • Silero Speech to Text (snakers4/silero-models at mlnews)
  • Coqui: Coqui STT and TTS
  • [voice2json Command-line tools for speech and intent recognition on Linux](https://voice2json.org/#supported-languages)
  • VOSK: offline speech recognition API
  • Datasets
    • English: TED-LIUM, LibriSpeech, etc.
    • https://github.com/gooofy/zamia-speech
    • https://commonvoice.mozilla.org/en/datasets
    • https://www.openslr.org/resources.php
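
To give a feel for how one of the offline tools listed above is driven from Python, here is a minimal sketch using VOSK. It assumes the `vosk` package is installed and that a model has already been downloaded and unpacked into a local directory. The function names `check_wav` and `transcribe`, and the paths in the usage note, are hypothetical and only for illustration. It also assumes the input is a mono 16-bit PCM WAV file, which is checked with the standard `wave` module before any recognition is attempted.

```python
import json
import wave


def check_wav(path):
    """Return the sample rate if the file is mono 16-bit PCM, else raise ValueError."""
    with wave.open(path, "rb") as wf:
        if (wf.getnchannels() != 1
                or wf.getsampwidth() != 2
                or wf.getcomptype() != "NONE"):
            raise ValueError("expected a mono 16-bit PCM WAV file")
        return wf.getframerate()


def transcribe(wav_path, model_dir):
    """Transcribe a WAV file offline with Vosk and return the recognised text."""
    # Imported lazily so that check_wav() can be used without Vosk installed.
    from vosk import Model, KaldiRecognizer

    sample_rate = check_wav(wav_path)
    recognizer = KaldiRecognizer(Model(model_dir), sample_rate)

    pieces = []
    with wave.open(wav_path, "rb") as wf:
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform() returns True once an utterance has been finalised.
            if recognizer.AcceptWaveform(data):
                pieces.append(json.loads(recognizer.Result())["text"])
    # Collect whatever is still buffered after the last chunk.
    pieces.append(json.loads(recognizer.FinalResult())["text"])
    return " ".join(p for p in pieces if p)
```

A call would look like `transcribe("speech.wav", "model")`, where both paths are placeholders. Recordings in other formats would first need converting to mono 16-bit PCM WAV.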

Speech to Text Indonesian Support