Phoneme-Level Pronunciation Assessment for a Major Language Learning Platform

Client

Our client is a leading language learning platform based in North America that caters to a global audience. The platform focuses on enhancing language proficiency, with a special emphasis on improving pronunciation skills. The client envisions a user-friendly interface where individuals can see a word on the screen, record their pronunciation, and receive a detailed performance assessment.

Challenges

Out-of-the-box pronunciation scoring APIs from GCP, Azure, and AWS could not provide the level of detail we wanted to display in pronunciation scores. We achieved it by building our own model with an open-source framework.

Approach

  • We use Kaldi, an open-source Automatic Speech Recognition (ASR) toolkit that contains various recipes for training customized acoustic models.
  • Labeled audio data (recordings and their spoken words) and a pronunciation lexicon (words and their corresponding phoneme sequences) are collected from the LibriSpeech dataset of English-language audio files.
  • Acoustic features are extracted from the audio: MFCCs (Mel-frequency cepstral coefficients) normalized with CMVN (cepstral mean and variance normalization), along with i-vectors.
  • An acoustic model based on a TDNN (Time Delay Neural Network) is trained on an NVIDIA A10 GPU instance, chosen for its cost-effectiveness, parallel execution, and speed.
  • The Deep Neural Network (DNN) framework is built from Kaldi's GOP_Speechocean recipe together with the trained acoustic model and the language model.
  • The trained acoustic model performs phoneme detection on new audio, outputting the most likely sequence of phonemes.
  • The model is evaluated using Word Error Rate (WER).
  • Kaldi's ‘compute-gop’ tool calculates the Goodness of Pronunciation (GOP) score for each phoneme; phonemes scoring below a threshold are flagged as mispronounced, and the results are displayed after an antilog conversion for human readability.
Figure: Flowchart of phoneme detection and GOP calculation
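The pronunciation lexicon step pairs each word with its phoneme sequence, which gives the scorer the canonical phonemes to expect for the word shown on screen. The following is a minimal Python sketch, assuming a whitespace-separated "WORD PH1 PH2 ..." line format with uppercase words; ‘load_lexicon’ and ‘canonical_phones’ are hypothetical helper names, and the sample entries are illustrative ARPAbet-style pronunciations rather than lines copied from LibriSpeech.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def load_lexicon(lines: Iterable[str]) -> Dict[str, List[List[str]]]:
    """Parse 'WORD PH1 PH2 ...' lines into word -> list of pronunciations.

    Assumed format: whitespace-separated fields, word first. A word may
    appear on several lines when it has alternative pronunciations.
    """
    lexicon: Dict[str, List[List[str]]] = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        word, phones = fields[0].upper(), fields[1:]
        lexicon[word].append(phones)
    return dict(lexicon)


def canonical_phones(lexicon: Dict[str, List[List[str]]], word: str) -> List[str]:
    """Return the first listed pronunciation of the prompt word (hypothetical policy)."""
    prons = lexicon.get(word.upper())
    if not prons:
        raise KeyError(f"{word!r} is not in the lexicon")
    return prons[0]


if __name__ == "__main__":
    # Illustrative entries only.
    sample = [
        "HELLO HH AH0 L OW1",
        "HELLO HH EH0 L OW1",
        "WORLD W ER1 L D",
    ]
    lex = load_lexicon(sample)
    print(canonical_phones(lex, "hello"))  # ['HH', 'AH0', 'L', 'OW1']
    print(len(lex["HELLO"]))  # 2
```

In practice the lines would come from the lexicon file, for example by passing an open file handle to ‘load_lexicon’.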
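CMVN, listed in the feature-extraction step, normalizes each cepstral dimension so that speaker and channel offsets have less influence on the acoustic model. Below is a minimal per-utterance sketch in Python, assuming features arrive as a list of equal-length frame vectors (for example MFCCs); ‘apply_cmvn’ is a hypothetical name, and Kaldi's own tooling may accumulate statistics per speaker rather than per utterance.

```python
import math
from typing import List

Frame = List[float]


def apply_cmvn(frames: List[Frame], eps: float = 1e-8) -> List[Frame]:
    """Per-utterance cepstral mean and variance normalization.

    Each feature dimension is shifted to zero mean and scaled to unit
    variance, using statistics computed over all frames of the utterance.
    Dimensions with zero variance are mapped to 0 instead of dividing by 0.
    """
    if not frames:
        return []
    num_frames, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / num_frames)
        for d in range(dim)
    ]
    return [
        [(f[d] - means[d]) / max(stds[d], eps) for d in range(dim)]
        for f in frames
    ]
```

Normalizing per utterance keeps the sketch independent of speaker metadata; with speaker labels available, the same statistics could be pooled over all of a speaker's utterances instead.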
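The TDNN named in the training step sees a window of neighbouring frames: each layer splices its input at fixed time offsets and applies an affine transform followed by a nonlinearity. The Python sketch below shows a single such layer using plain lists; ‘tdnn_layer’, the ReLU activation, and clamping at utterance edges are assumptions for illustration, and production Kaldi recipes add further components on top of this basic idea.

```python
from typing import List, Sequence

Frame = List[float]


def tdnn_layer(
    frames: List[Frame],
    offsets: Sequence[int],
    weights: List[List[float]],
    bias: List[float],
) -> List[Frame]:
    """One TDNN layer: splice frames at the given time offsets, then affine + ReLU.

    weights has shape [out_dim][len(offsets) * in_dim]. Offsets that fall
    outside the utterance are clamped to the first/last frame (an assumption
    made for this sketch).
    """
    if not frames:
        return []
    num_frames, in_dim = len(frames), len(frames[0])
    spliced_dim = len(offsets) * in_dim
    if any(len(row) != spliced_dim for row in weights) or len(bias) != len(weights):
        raise ValueError("weights/bias do not match the spliced input size")

    outputs: List[Frame] = []
    for t in range(num_frames):
        # Concatenate the input frames at t + offset for every offset.
        spliced: Frame = []
        for o in offsets:
            idx = min(max(t + o, 0), num_frames - 1)
            spliced.extend(frames[idx])
        # Affine transform followed by ReLU.
        outputs.append([
            max(0.0, sum(w * x for w, x in zip(row, spliced)) + b)
            for row, b in zip(weights, bias)
        ])
    return outputs
```

Stacking several such layers with progressively wider offsets lets the upper layers cover a long temporal context while each individual layer stays small.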
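The WER used for evaluation counts the substitutions, deletions, and insertions needed to turn the recognized word sequence into the reference transcript, divided by the number of reference words. A minimal Python sketch using a word-level edit-distance table; ‘word_error_rate’ is a hypothetical name, not a Kaldi API.

```python
from typing import List


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference).

    Computed with a word-level Levenshtein (edit distance) table.
    """
    if not reference:
        raise ValueError("reference must contain at least one word")
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all reference words
    for j in range(cols):
        dist[0][j] = j  # insert all hypothesis words
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,  # deletion
                dist[i][j - 1] + 1,  # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[-1][-1] / len(reference)
```

For the reference "the cat sat on the mat" and the hypothesis "the cat sit on mat", one substitution and one deletion give a WER of 2/6 ≈ 0.33. Because insertions are counted, WER can exceed 1.0.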
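The GOP step can be illustrated with the common log-posterior-ratio formulation: for a segment aligned to the canonical phoneme p, the average log posterior of p over the segment's frames is compared with the best average log posterior among all phones, giving a score of at most 0. Taking the antilog (exponentiating) maps this into the range (0, 1]. The Python sketch below assumes frame-level phone posteriors and a forced alignment of the prompt word are already available; it is not Kaldi's ‘compute-gop’ implementation, and the names ‘gop’ and ‘score_word’ and the 0.5 threshold are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One {phone: posterior} dict per frame of an aligned phoneme segment.
Posteriors = List[Dict[str, float]]


def log_phone_posterior(frames: Posteriors, phone: str, floor: float = 1e-10) -> float:
    """Average over the segment's frames of log p(phone | frame)."""
    return sum(math.log(max(f.get(phone, 0.0), floor)) for f in frames) / len(frames)


def gop(frames: Posteriors, canonical: str) -> float:
    """Log posterior ratio: canonical phone's average log posterior minus the best one.

    The result is <= 0, and exactly 0 when the canonical phone scores best.
    """
    if not frames:
        raise ValueError("segment must contain at least one frame")
    phones = set().union(*frames) | {canonical}
    best = max(log_phone_posterior(frames, q) for q in phones)
    return log_phone_posterior(frames, canonical) - best


def score_word(
    segments: Sequence[Tuple[str, Posteriors]], threshold: float = 0.5
) -> List[Tuple[str, float, bool]]:
    """Return (phone, antilog GOP, flagged) for each aligned phoneme segment.

    exp(GOP) lies in (0, 1]; scores below the hypothetical threshold are
    flagged as likely mispronunciations.
    """
    results = []
    for phone, frames in segments:
        score = math.exp(gop(frames, phone))
        results.append((phone, score, score < threshold))
    return results


if __name__ == "__main__":
    # Illustrative posteriors for "cat" (K AE T); not real model output.
    cat = [
        ("K", [{"K": 0.9, "G": 0.1}, {"K": 0.8, "G": 0.2}]),
        ("AE", [{"AE": 0.2, "EH": 0.8}, {"AE": 0.3, "EH": 0.7}]),
        ("T", [{"T": 1.0}]),
    ]
    for phone, score, flagged in score_word(cat):
        print(f"{phone}: {score:.2f}{'  <- check' if flagged else ''}")
```

In this example the /AE/ frames lean towards /EH/, so its antilog score is about 0.33 and the phoneme is flagged, while /K/ and /T/, whose canonical phone already has the highest posterior, score exactly 1.00.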

Impact

  • Increased user engagement by 12% as the solution enriched the user experience.
  • Increased user retention by 8% as the platform became more engaging and effective for language learners.
  • 10% rise in user referrals and testimonials, driving positive word-of-mouth recommendations.