Language Model and Acoustic Model Development for Malayalam Speech Recognition using CMU Sphinx


#1

Discussion thread for language model (LM) and acoustic model (AM) development for Malayalam speech recognition using CMU Sphinx.

Mentor : Mrs. Deepa Gopinath

Link to the current work I have done: ml-lm-am

The workflow I followed for my initial trial of the idea during my Major project:

  1. Use the Datuk corpus as the source of Malayalam text (http://olam.in/open/datuk/)
  2. Build a phonetic dictionary from this corpus
  3. Develop unigram, bigram, and trigram statistical language models using the CMU-CSLM
    toolkit.
  4. Train the acoustic model using recordings from five different people.
  5. Test for accuracy. I could only achieve roughly 60% because of time limits on the
    project submission.
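
To make step 3 concrete, here is a minimal sketch of what counting unigrams, bigrams, and trigrams involves. It is plain Python and not the CMU toolkit itself; the function names `ngram_counts` and `bigram_probability` and the placeholder tokens are assumptions made up for illustration, and the probability is an unsmoothed maximum-likelihood estimate, which a real toolkit would replace with a smoothed one.

```python
from collections import Counter


def ngram_counts(sentences, n):
    """Count n-grams over whitespace-tokenised sentences.

    Each sentence is padded with n-1 start markers and one end marker.
    (Hypothetical helper, not part of any CMU tool.)
    """
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] * (n - 1) + sentence.split() + ["</s>"]
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def bigram_probability(w1, w2, sentences):
    """Unsmoothed maximum-likelihood estimate of P(w2 | w1)."""
    bigrams = ngram_counts(sentences, 2)
    # Total number of bigrams whose first word is w1.
    context_total = sum(c for (a, _), c in bigrams.items() if a == w1)
    if context_total == 0:
        return 0.0
    return bigrams[(w1, w2)] / context_total


# Placeholder tokens stand in for Malayalam words from the corpus.
corpus = ["a b c", "a b d"]
unigrams = ngram_counts(corpus, 1)
trigrams = ngram_counts(corpus, 3)
```

With this toy corpus, `bigram_probability("b", "c", corpus)` is 0.5, because "b" is followed by "c" in one of its two occurrences.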

Find the complete Proposal over here


GSoC 2016 - Projects
#2

Blog update : Here

:grin:


#3

I will create a repo on GitLab under smc. Please fork it and send PRs regularly.
What should the name of the repo be?


#4

That’s fine.

How about naming it “samsaaram”?


#5

Blog Updates :yum:

Follow me here for further updates. You can also read my blog updates from the past three weeks.


#6

> How about naming it “samsaaram”?

I think a name that non-Malayalees can recognize would be a better choice.


#7

Ohh… right!

Well then, a plain name like ‘ml-speech’ or something similar. I can’t think of anything simple enough. :smiley:

One more question: is it possible to edit the description?
There is a mistake in the hyperlink I provided for ‘ml-lm-am’. It’s supposed to be this: https://github.com/sreecodeslayer/ml-am-lm-cmusphinx


#8

[Blog Update](https://medium.com/@imSreenadh/slow-as-a-snail-58c09b46ec32)

I am currently pushing my work to the previously mentioned repository on GitHub :slight_smile: :relaxed:


#9

Yes, it is possible to edit Discourse posts. Just find the edit button among the tools at the bottom of the post.


#10

## Status update:
Completed an initial build of the language model and the acoustic model.
I still need to test the accuracy and see whether I can improve it a bit.
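
For the accuracy testing, one common measure is word error rate (WER): the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below is only an illustration under that assumption; the function name `word_error_rate` and the placeholder transcripts are hypothetical, and it is not necessarily how the 60% figure above was measured.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # match or substitution
            )
    return dp[len(ref)][len(hyp)] / len(ref)


# Placeholder transcripts: one substitution and one deletion in four words.
wer = word_error_rate("a b c d", "a x c")
```

Here `wer` is 0.5, since two of the four reference words are wrong or missing.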