A Deep Dive into Phoneme-Level Pronunciation Assessment

In the rapidly evolving digital education domain, our team at Rudder Analytics embarked on a pioneering project. We aimed to enhance language learning through cutting-edge AI and machine learning technologies. Partnering with a premier language learning platform, we sought to address a significant challenge in the field: providing detailed and actionable feedback on pronunciation at the phoneme level, a critical aspect of mastering any language. This case study delves into the sophisticated technical landscape we navigated to develop an advanced phoneme-level pronunciation assessment tool, showcasing our data analytics, engineering, and machine learning expertise.

Navigating the Challenge: Beyond Conventional Solutions

The initial challenge was the limitations of out-of-the-box pronunciation scoring APIs provided by major cloud services like GCP, Azure, and AWS. These services, while robust, fall short of the granular detail required for effective pronunciation assessment. To overcome this, we decided to build a bespoke model that could meet the platform’s specific needs.

Our objective was clear: to architect a solution that transcends these limitations, enabling a more personalized and impactful learning experience.

Holistic Approach: Integrating Advanced Algorithms with Linguistic Insights

Our strategy was anchored in a holistic approach, merging advanced machine learning techniques with deep linguistic insights to achieve higher accuracy in pronunciation assessment.

If you are interested in the codebase, check out our GitHub repository

Goodness of Pronunciation (GOP)

A cornerstone of our approach was the implementation of the Goodness of Pronunciation (GOP) metric. GOP, a posterior probability variant, is a quantitative measure of pronunciation accuracy at the phoneme level. It’s an important tool for identifying mispronunciations, enabling targeted feedback for language learners. GOP is used to evaluate the system’s performance in recognizing and scoring the pronunciation of a given utterance.

The Strategic Employment of Kaldi ASR Toolkit

Kaldi, an open-source ASR framework, stands at the core of our solution. Renowned for its flexibility and efficiency in handling speech recognition tasks, Kaldi offers a range of recipes for tailoring acoustic models to specific needs. Our choice to utilize Kaldi was driven by its comprehensive feature set and its ability to be customized for phoneme level detection, a critical requirement for our project.

flowchart of acoustic model training

Data Collection and Preparation

The foundation of our solution was a robust data infrastructure, engineered to handle vast datasets. We utilized the LibriSpeech dataset, a comprehensive collection of English-language read speech derived from public-domain audiobooks. It contains over 1,000 hours of speech from roughly 2,400 speakers and covers a wide range of accents and dialects of spoken English.

This dataset contains labeled audio data. We also collected the pronunciation lexicon which included words and their corresponding sequences of phonemes, essential for training our model to recognize and evaluate the smallest sound units in speech.

Major Components

In Kaldi, when computing Goodness of Pronunciation (GOP), the acoustic model, pronunciation lexicon, and language model each play distinct roles in evaluating how well a speaker’s utterance matches the expected pronunciation of words in a given language.

The GOP Speechocean recipe from Kaldi has three main components.

Acoustic Model: The acoustic model is trained to recognize the various sounds (phonemes) that make up speech. It maps the raw audio features to phonetic units. In the context of GOP, the acoustic model evaluates how closely the sounds in the speaker’s utterance match the expected phonemes of the correct pronunciation. The model’s confidence in phoneme predictions plays a key role in calculating the GOP score.

Pronunciation Lexicon: The pronunciation lexicon provides the expected phonetic transcriptions of words. It is a reference for how words should be pronounced in terms of phonemes. When calculating GOP, the system uses the lexicon to determine the target pronunciation of words or phrases being evaluated. The comparison between this target pronunciation and the spoken pronunciation (as interpreted by the acoustic model) is fundamental to assessing pronunciation quality.

The prepare_lang.sh script is used to prepare the lexicon and language-specific data files. This includes creating a lexicon.txt file that contains word-to-phone mappings (e.g., Hello -> HH EH L OW).
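For illustration, here are a few CMUdict-style entries such a lexicon.txt might contain. The exact phone set and stress markers depend on the dictionary used, and a word listed twice encodes a pronunciation variant:

```
HELLO   HH EH L OW
HELLO   HH AH L OW
WORLD   W ER L D
SPEECH  S P IY CH
```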

Language Model: While the language model is primarily used to predict the likelihood of word sequences in speech recognition, its role in GOP can be indirect but important. It can help disambiguate phonetically similar words or provide context that makes certain pronunciations more likely than others, thus influencing the assessment of pronunciation quality. The language model can also ensure that the phoneme sequences being evaluated are within plausible linguistic constructs, which can affect the interpretation of pronunciation accuracy.

Training Process

Preparation of Resources

We gathered a phonetically rich, transcribed speech corpus, then set up a pronunciation dictionary (lexicon), language models, and the necessary configuration files.

Feature Extraction

We extracted acoustic features from the speech corpus. Commonly used features include MFCCs (Mel-Frequency Cepstral Coefficients) and FBANK (filterbank energies).

Training Acoustic Models

We then used the extracted features and transcriptions to train acoustic models. The models learn the relationship between the acoustic features and the phonetic units or words.

Training starts with building a simple model:

Monophone Models: These models recognize phonemes without considering context (neighboring phonemes). They are simpler and less accurate but provide a good starting point. Kaldi’s train_mono.sh script is used to perform the monophone training.

Triphone Models: These models consider the context of phonemes (typically the immediate previous and next phonemes). They are more complex and capture more details about speech patterns. Kaldi’s train_deltas.sh script is used to perform the triphone training.

Refinement: Once triphone models are trained, Kaldi’s train_sat.sh script is used to refine the model to handle different speakers. SAT stands for Speaker Adaptive Training.

Alignment

We performed forced alignment using the trained acoustic models to align the phonetic transcription with the acoustic features. This step is crucial for GOP, as it determines how well the predicted phonemes match the phonemes spoken in the audio.

Kaldi provides the align_si.sh script for exactly this purpose.

The script uses the transcriptions and a lexicon (which maps words to their phonetic representations) to compile training graphs. These graphs represent how words (and their phonetic components) can transition during speech according to the language model.

The script performs the alignment task using the training graphs, the existing acoustic model, and the normalized features. This involves determining the most likely sequence of states (which correspond to phonemes or groups of phonemes) that the acoustic model believes were spoken in each training utterance.
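As a simplified illustration of the idea (not Kaldi’s actual implementation, which operates on HMM states and decoding graphs), forced alignment can be viewed as a dynamic program that assigns each audio frame to one phoneme of the known transcription, in order, so as to maximize the total acoustic log-likelihood:

```python
def force_align(frame_logprobs, phones):
    """Toy forced alignment. frame_logprobs[t][p] is the acoustic model's
    log-likelihood of phone p at frame t; phones is the expected phone
    sequence. Returns the phone index assigned to each frame (monotonic,
    every phone used in order)."""
    T, N = len(frame_logprobs), len(phones)
    NEG = float('-inf')
    # best[t][i]: best score with frame t assigned to phone i
    best = [[NEG] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    best[0][0] = frame_logprobs[0][phones[0]]
    for t in range(1, T):
        for i in range(min(t + 1, N)):
            stay = best[t - 1][i]                        # remain in same phone
            move = best[t - 1][i - 1] if i > 0 else NEG  # advance to next phone
            best[t][i] = max(stay, move) + frame_logprobs[t][phones[i]]
            back[t][i] = i if stay >= move else i - 1
    # trace back from the final frame, which must land on the last phone
    path, i = [N - 1], N - 1
    for t in range(T - 1, 0, -1):
        i = back[t][i]
        path.append(i)
    path.reverse()
    return path
```

The per-phoneme frame spans produced this way are what the GOP stage is normalized over.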

GOP Calculation

The Goodness of Pronunciation score is calculated based on the likelihoods produced by the acoustic model during alignment. GOP is a log-likelihood ratio for each phoneme, normalized by the phoneme duration. It indicates how well the phoneme matches the expected model of that phoneme.

GOP is calculated using Kaldi’s compute-gop tool. The steps include:

Compute Posteriors: It first computes the posterior probabilities of different phoneme sequences given the acoustic model and the observed features.

Calculate Log-Likelihoods: The script computes the log-likelihoods for each phoneme occurring at each time frame.

Evaluate Pronunciation: GOP is calculated by comparing the log-likelihood of the most likely phoneme sequence (as per the hypothesis) to alternative phoneme sequences. To avoid bias toward longer phonemes, the score is normalized by the duration of the phoneme.
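The steps above can be sketched in miniature. This toy function is our illustration rather than Kaldi’s compute-gop: it assumes frame-level log-likelihoods for each candidate phone over the target phoneme’s aligned frames, and returns the duration-normalized log-likelihood ratio against the best competing phone:

```python
def gop_score(frame_loglikes, target_phone):
    """frame_loglikes: one dict per aligned frame, mapping phone -> acoustic
    log-likelihood. Returns the target phone's log-likelihood ratio against
    the most likely phone per frame, normalized by duration. A score near 0
    means the target phone was the best match throughout."""
    total = 0.0
    for frame in frame_loglikes:
        best_any = max(frame.values())           # most likely phone this frame
        total += frame[target_phone] - best_any  # always <= 0
    return total / len(frame_loglikes)           # normalize by phoneme duration
```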

Pronunciation Profiling

The GOP scores can be used to profile the speaker’s pronunciation. Low scores indicate areas where the speaker’s pronunciation deviates from the expected model.
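A minimal sketch of such profiling, with an illustrative (not empirically tuned) threshold:

```python
def profile_pronunciation(phone_scores, threshold=-1.0):
    """phone_scores: list of (phone, gop) pairs for one utterance.
    Returns the phones flagged as likely mispronounced, i.e. those whose
    GOP falls below the chosen threshold."""
    return [phone for phone, gop in phone_scores if gop < threshold]
```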

Model Refinement

Based on the GOP scores, we identified the need for additional training data in areas where the pronunciation model is weak. Additional training and refinement of models may occur iteratively.

Application of GOP

Once the system is well-calibrated, GOP scores can be applied in various ways, such as in language learning applications to provide feedback on pronunciation, in speech recognition systems to improve robustness, or in speaker assessment and training tools.

If you are interested in the codebase, check out our GitHub repository

Model Evaluation

A critical phase of our implementation process was rigorous system testing and evaluation. We assessed the model’s performance using Word Error Rate (WER), a common metric in speech recognition that helped us understand how often the model incorrectly predicted phonemes. WER is a critical metric in the evaluation of Automatic Speech Recognition (ASR) systems, serving as a quantifiable measure of transcription accuracy. It is calculated by comparing the ASR system’s output against a reference transcription, taking into account the number of substitutions, deletions, and insertions needed to match the system’s output to the reference.
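As a minimal sketch of the metric itself, WER can be computed as a word-level Levenshtein distance normalized by the reference length:

```python
def wer(reference, hypothesis):
    """Word Error Rate: (substitutions + deletions + insertions) divided by
    the number of reference words, via edit distance over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```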

flowchart for phoneme detection and GOP calculation

Measurable Impact: Enhancing User Experience and Engagement

The deployment of this phoneme-level pronunciation assessment tool has had a profound impact on the platform’s user engagement metrics. We observed a 12% increase in user engagement, a testament to the enriched learning experience provided by our solution. Furthermore, the platform saw an 8% rise in user retention, indicating that users found the tool engaging and effective in improving their skills. Perhaps most telling was the 10% increase in user referrals and testimonials, a clear indicator of the tool’s impact on users’ language learning journeys and its contribution to positive word-of-mouth for the platform.

Conclusion

Our comprehensive approach to enhancing phoneme detection in language learning platforms has set a new standard in pronunciation training. We have crafted a system that improves pronunciation accuracy and enriches the language learning experience by utilizing advanced technological solutions like the Kaldi ASR toolkit. This project exemplifies our commitment to harnessing advanced technology in addressing educational challenges, contributing significantly to the advancement of language learning methodologies.

Elevate your projects with our expertise in cutting-edge technology and innovation. Whether it’s advancing language learning tools or pioneering in new tech frontiers, our team is ready to collaborate and drive success. Join us in shaping the future—explore our services, and let’s create something remarkable together. Connect with us today and take the first step towards transforming your ideas into reality.


5 Reasons Why Data Analytics is Important for Your Business

01 Improve Customer Insights

Identify customers’ key problem areas to uncover profitable segments and improve customer service through your products and services.

Gatorade was able to achieve 15% growth in a single year through customer insights. (Source: Prophet- Gatorade Case Study)

02 Reduce Costs

Set up efficient processes, ensure optimum resource utilization and introduce flexibility to address market trends.

Carlsberg discovered that giving customers magnetic cards and allowing them to self-pour their beer resulted in a 30% boost in beer consumption and reduced costs, as customers moved from a fixed-price to a fixed-quantity model. (Source: SAP HANA Blog-Using Big Data to Brew Profits One Pint at a Time)

03 Save Valuable Time

Utilize the ability to collect, structure and analyze the data from every aspect of your business at a rapid pace. 

Wal-Mart was able to boost revenues through sales of strawberry Pop-Tarts and beer before Hurricane Frances with the help of predictive analytics. (Source: NYT- What Wal-Mart Knows About Customers’ Habits)

04 Propel Decision Making

Access high-quality insights in order to adjust to real-time scenarios and make simultaneous data driven decisions.

According to OKCupid founder, people of the United States hit it off but lose track of their romantic connections mostly in Wal-Mart stores! Now that’s some decision to make! (Source: Dataclysm: Who We Are When We Think No One’s Looking; and Datafloq- 4 Surprising Discoveries from Big Data Insights)

05 Collaboration on-the-go

Access business intelligence insights from the device of your choice whenever you want and wherever you are. Enable your team to gain competitive advantage through data.

TotallyMoney is taking calculated risks and delivering value to their customers before their competition by building a data-driven culture! (Source: TotallyMoney Blog- Building a Data-Driven Culture)


Rudder Analytics Receives a Clutch Leader Award for 2019!

Since 2015, Rudder Analytics has been working hard to steer businesses in the right direction through end-to-end data analytics services including Data ETL, Machine Learning, Business Intelligence and Data Visualization services.

According to a Forbes article, a CMO Survey concluded that companies are projected to “increase the marketing analytics portion of their marketing budgets by 60% in the next three years.” Marketing analytics insights are what keeps our gears turning day in and day out. Rudder Analytics will be right there alongside the boom, which makes us even more excited to announce that this month, we have received a Clutch Leader Award as one of the leading marketing analytics companies on their site!

Clutch is a B2B ratings and reviews site that works to connect businesses together worldwide. Their research centers around a company’s market presence, work quality, and client experience. Clutch’s team of analysts interview prior clients and publish the feedback on the company’s Clutch profile to help companies gain credibility and success in the eyes of others.

We take this opportunity to thank our clients for their continued trust in our ability to deliver and for their support throughout these years of excellence!

Furthermore, we have received recognition on Clutch’s sister websites, The Manifest and Visual Objects. The former is a business news and how-to website that compiles and analyzes practical business wisdom and the latter, a platform for companies to showcase their portfolios to potential clients. We are thrilled to be listed in their big data companies and developers directories.


Rudderite featured on Microsoft Power BI Data Stories Gallery

Congratulations to our data-storyteller Ketan Deshpande for getting featured on Microsoft Power BI Data Stories Gallery and earning much-deserved appreciation from the Microsoft Power BI community.

These entries were judged by the Microsoft Data Journalism Team, and the featured stories were selected for telling a compelling story, being original and creative, and effectively using Microsoft Power BI.


Rudder Analytics ranked highly in Top Indian IT Services

In this age of data, business decisions are increasingly backed by scientific analysis of data, and Rudder Analytics has been helping businesses in all aspects of analytics.

It has been one and a half years since we started operations, and we have already received rave reviews for our quality services. Rudder Analytics was recently ranked highly in the press release of the IT Services review by Clutch.co.

Clutch, an independent ratings and reviews site based in Washington, D.C. that covers top companies in IT services, earlier this month recognized Rudder Analytics as one of the top companies in its research, earning us a place in its India IT Services Leaders Matrix.

Our focus on niche categories of IT services, from our basket of visual analytics with Tableau, Qliksense, Power BI, etc. to data analytics and statistical modeling services, along with strong praise from past clients, earned us this distinction. Consider, for example, the 5-star rating from one of our clients, who reported: “Rudder Analytics was able to provide that extra kick.”

We are really pleased to receive any feedback from our clients but especially happy to see a very satisfied customer – it means we have done our job and done it well. Thank you to everyone who has participated in the review process. Please take a minute to see our Rudder Analytics coverage in full.


Rudder Analytics featured on CNBC as one of the leading IT services firms in India

Rudder Analytics has been featured on CNBC as one of the leading IT services firms in India and this is what the analysts had to say about us:

“These companies are the finest examples of outsourced IT services firms in India based on our current book of research,” said Clutch analyst Clayton Kenerson. “All of these companies should be proud of their clients’ reviews – the proof of their professionalism and technical prowess that lends them hard-earned distinction in an otherwise crowded market.”

http://www.cnbc.com/2016/09/08/pr-newswire-clutch-recognizes-leading-it-services-firms-in-india.html


Will QuickSight Kill Tableau?

Over the past few years we have seen an exponential increase in the amount of data being generated. On average, nearly 2.5 quintillion bytes of data were generated every day in 2012. Extracting actionable insights from such prodigious amounts of data can at times be a nearly impossible task.

Fig 1: Gartner: Magic Quadrant for Cloud Infrastructure as a Service, Worldwide report, Lydia Leong et al, published 18 May, 2015

Amazon Web Services has been one of the pioneers in catering to the ever-changing data needs of enterprises with its highly scalable and pay-as-you-use services.

AWS senior vice-president Andy Jassy recently unveiled QuickSight, a brand-new AWS service in the business intelligence domain, during the annual re:Invent conference for developers. Some of its salient features are:

  • Extremely fast, cloud-powered BI service at 1/10th the cost of traditional BI software.
  • Fast calculation with the built-in SPICE (Super-fast, Parallel, In-memory Calculation Engine).
  • Scales easily to thousands of customers and terabytes of data.
  • Provides a SQL-like interface for other BI tools to access data stored in SPICE.

Fig 2: Working module of Amazon QuickSight (aws.amazon.com/quicksight/)

Most organizations using traditional BI solutions (SAP Business Objects/Crystal Reports, IBM Cognos, Oracle BI, etc.) invest substantial resources to get their first visualization. Agile BI solutions (Tableau, Qliksense, Sisense, Domo, etc.) have their own constraints in terms of processing and customization capability, which limit their potential use.

Bearing in mind all these aspects, we need to ask, “Can Amazon QuickSight really kill the old guard BI services?” Let us take a look at some of the many crucial characteristics that we can consider to compare Amazon QuickSight with the market leader in BI services i.e., Tableau.

QUICKSIGHT v/s TABLEAU

Price: 

Amazon QuickSight has come up with an extremely competitive pricing structure (up to 90% less than other BI products). Its standard edition is priced at $9 per user per month, and the enterprise edition as low as $18 per user per month (on a one-year contract).

Fig 3: Pricing Structure for Amazon QuickSight (aws.amazon.com/quicksight/)

Tableau, on the other hand, has its cloud-based service (Tableau Online) starting at $500 per user per year. This gives QuickSight a huge edge over other competitors, especially when catering to budget-sensitive small and medium enterprises.

Data Processing:

One of the major features of Amazon QuickSight is SPICE, a Super-fast, Parallel, In-memory Calculation Engine. Based on columnar storage coupled with in-memory technologies, SPICE runs queries at lightning speed, producing results in a few milliseconds. Its salient features include:

  • 2 to 4x compression of columnar data
  • Compiled queries with machine code generation
  • Rich calculations and SQL-like syntax

Data must reside in SPICE before it can be analyzed with QuickSight. This enables organizations to scale to large data volumes without any additional overhead.

Tableau’s Data Engine is an in-memory, high-performance analytics database that runs on one’s PC. It uses memory-mapped I/O, so after import the data resides on disk and is mapped into memory on demand. This keeps RAM usage low while still delivering the desired performance.

Target Group:

Both Amazon and Tableau have a significant customer base, ranging from small-scale organizations to some of the top firms in the world. Nevertheless, AWS has been a promising player in customer satisfaction, and as a result many organizations use AWS to store massive datasets. It makes sense, then, that Amazon will target its AWS users as potential QuickSight customers. Many organizations are shifting from on-premise infrastructure to the AWS cloud for several reasons. On the contrary, some enterprises still prefer to keep their critical datasets on premise for legal and security reasons.

Ease of Use:

Amazon positions QuickSight as a self-service discovery tool that does not require in-depth knowledge of data visualization. Tableau, on the other hand, requires a certain level of expertise.

Features:

Finding the right visualization for data is extremely important. QuickSight’s Autograph feature automatically predicts the best visual for the data to be displayed. Likewise, Tableau provides a “Show Me” feature, which suggests visuals based on data types, cardinality, and so on.

Limitations:

QuickSight is most appropriate for data stored in AWS’s cloud and, more importantly, data that can be loaded into SPICE. As Ashley Jaschke (Director, Product Management, Tableau) put it, QuickSight will be suitable for lightweight visualizations. Many companies still keep their data outside AWS’s cloud, and since Tableau can ingest data from multiple sources, it is able to provide much deeper and more significant insights.

QuickSight is currently available only as a preview version, and it is too early to say whether it will dominate the BI services market, but it is surely poised to make a heavy impact.

 


Two Way Synchronization between Google Spreadsheet and AWS RDS using Google Apps Script

 

Google Spreadsheet can be a very nifty tool to satisfy dynamic data storage needs for any small to medium data analytics projects. Using Google Apps Script, Google Spreadsheet can fetch data from any RESTful API and act as an easily editable data source.

Google Spreadsheet can be directly used as a data source to most of the major analytical dashboard platforms like Tableau as well as operational dashboard platforms like Klipfolio. However, connectivity from Google Spreadsheet to Tableau is not perfectly stable yet and may run into issues on Tableau Server.

One way of getting a more robust method of connectivity from Tableau to Google Spreadsheet is to use an intermediate layer of AWS RDS database services. In Google Apps Script, the JDBC service supports Google Cloud SQL, MySQL, Microsoft SQL Server, and Oracle databases as standard.

In this blog, we show you a step-by-step process for synchronizing a Google Spreadsheet with Amazon’s RDS web service. Bear in mind that Google Apps Script does not yet provide connectivity to PostgreSQL.

Step 1: 

To demonstrate, we have some dummy data in the spreadsheet (Insights_Summary_Spreadsheet is the name of this Google Spreadsheet), extracted using RESTful APIs.

Step 2:  

This data needs to be moved to a database in the cloud (an AWS RDS MySQL instance). So the next step involves setting up an RDS MySQL instance on AWS. After setting up the basic instance, we need to whitelist a few IP address ranges to allow Google Apps Script to access your database. The ranges to whitelist are:

 

  64.18.0.0 - 64.18.15.255
  64.233.160.0 - 64.233.191.255
  66.102.0.0 - 66.102.15.255
  66.249.80.0 - 66.249.95.255
  72.14.192.0 - 72.14.255.255
  74.125.0.0 - 74.125.255.255
  173.194.0.0 - 173.194.255.255
  207.126.144.0 - 207.126.159.255
  209.85.128.0 - 209.85.255.255
  216.239.32.0 - 216.239.63.255

Note: the JDBC service does not connect to ports lower than 1025, so make sure you are not allocating a lower port.

To add the above IPs to the security group, select the RDS instance, edit the Inbound section, and add rules for the IP ranges as required. Once this is done, your RDS instance is ready to work with Google Apps Script.

Step 3:

In this step we push the data from the Google Spreadsheet into RDS. First we make a connection to the RDS instance using the JDBC service; in the Google Apps Script project we build the following code:

 

// Replace the variables in this block with real values.
var address = 'database_IP_address'; // Endpoint provided by the RDS instance
var rootPwd = 'root_password';       // Root password given while configuring the DB instance
var user = 'user_name';              // Username given while configuring the DB instance
var userPwd = 'user_password';       // User password given while configuring the DB instance
var db = 'database_name';            // Database name to which you want to connect

var dbUrl = 'jdbc:mysql://' + address + '/' + db; // Database URL to connect to
var conn = Jdbc.getConnection(dbUrl, user, userPwd);

 

Next, we build the code to select the spreadsheet from where we need to fetch the data. 

 

var ss = SpreadsheetApp.getActiveSpreadsheet(); /* The active spreadsheet */
var sheet = ss.getSheetByName('Insights_Summary_Spreadsheet'); /* Replace with your own sheet name */
var range = sheet.getDataRange();   /* This represents the whole data */
var values = range.getValues();

 

Then, using a prepared statement, we can insert the data into RDS.

 

/* Insert each spreadsheet row into the Insights_Summary_RDS table on RDS,
   binding the row's nine fields to the statement's placeholders. */
var stmt = conn.prepareStatement('INSERT INTO Insights_Summary_RDS (date, clicks, costs, ctr, cpc, impressions, cpm, avg_pos, budget) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)');
for (var row = 1; row < values.length; row++) { // row 0 holds the header
  for (var col = 0; col < 9; col++) {
    stmt.setObject(col + 1, values[row][col]);
  }
  stmt.execute();
}

Step 4:

In this process we fetch data from RDS to Google Spreadsheet if required by our project. After establishing the Google Spreadsheet to RDS connection as shown previously, we build the following code:

var stmt = conn.createStatement();
var store_results = stmt.executeQuery('SELECT * FROM Insights_Summary_RDS');

 

The data returned from the query is stored in a variable called store_results. Now we loop over store_results and write each record back to the spreadsheet.

 

var numCols = store_results.getMetaData().getColumnCount();
while (store_results.next()) {
  var rowData = [];
  for (var column = 0; column < numCols; column++) {
    rowData.push(store_results.getString(column + 1)); // JDBC columns are 1-indexed
  }
  sheet.appendRow(rowData); // write the record back to the spreadsheet
}
store_results.close();
stmt.close();

 

Ultimately, a simple Google Apps Script can help us build a great analytical dashboard and gain enhanced insights from the data.