Optimized Outbound Call Management with Advanced Voice Activity Detection

Client

Our client, a contact center solution provider, wanted to redefine outbound communication strategies with Voice Activity Detection (VAD). Hoping to overcome the limitations of traditional predictive dialers, they intended to employ VAD’s advanced machine-learning capabilities. By precisely classifying calls and optimizing agent productivity, our client aimed to drive efficiency, elevate customer experiences, and lead innovation in the contact center landscape.

Challenges

  • The client needed a Voice Activity Detection system that detects the presence of voice in the audio input.
  • The system should be adaptable to various environments with changing noise levels, music, background sounds, and speaker characteristics.
  • The VAD system should operate with minimal latency, allowing it to keep pace with dynamic audio streams.

Approach

  • Used SpeechBrain, an open-source speech processing toolkit.
  • Collected and prepared the LibriParty, CommonLanguage, MUSAN, and open-rir datasets.
  • Designed a deep learning model based on a DNN architecture, using SpeechBrain’s LibriParty recipe.
  • Computed the standard FBANK features to provide a compact representation of the spectral characteristics of the audio signal.
  • Trained the DNN model on an NVIDIA A10 GPU using the training set, optimizing hyperparameters based on validation performance.
  • Performed binary classification to produce a speech/non-speech prediction for each input frame.
  • Evaluated the final model on a separate test set to ensure generalization capabilities.
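
The frame-level speech/non-speech predictions described above are typically grouped into speech segments before they are used for a call-level decision. The following is a minimal sketch of that grouping step, not the project's actual implementation. The function name frames_to_segments, the 10 ms frame shift, the 0.5 threshold, and the 100 ms minimum segment length are all illustrative assumptions.

```python
from typing import List, Tuple


def frames_to_segments(
    speech_probs: List[float],
    frame_shift_ms: int = 10,   # assumed hop between frames
    threshold: float = 0.5,     # assumed speech/non-speech decision threshold
    min_speech_ms: int = 100,   # assumed minimum duration to keep a segment
) -> List[Tuple[float, float]]:
    """Group per-frame speech probabilities into (start, end) segments in seconds."""
    runs = []      # (first_frame, one_past_last_frame) for each run of speech frames
    start = None   # index of the first frame of the current speech run, if any
    for i, prob in enumerate(speech_probs):
        is_speech = prob >= threshold
        if is_speech and start is None:
            start = i                      # a speech run begins
        elif not is_speech and start is not None:
            runs.append((start, i))        # the speech run ended at frame i
            start = None
    if start is not None:
        runs.append((start, len(speech_probs)))  # input ended during speech

    segments = []
    for first, end in runs:
        duration_ms = (end - first) * frame_shift_ms
        if duration_ms >= min_speech_ms:   # drop very short bursts
            segments.append((first * frame_shift_ms / 1000,
                             end * frame_shift_ms / 1000))
    return segments


# Hypothetical model output: 200 ms of speech, then a 20 ms burst that is dropped.
example_probs = [0.1] * 5 + [0.9] * 20 + [0.2] * 3 + [0.8] * 2 + [0.1] * 5
example_segments = frames_to_segments(example_probs)
```

Working in integer milliseconds keeps the duration comparison exact. In practice the threshold and minimum duration would be tuned on validation data.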

Impact

  • Achieved 97% accuracy in distinguishing live human voice from non-human signals such as answering machines, voicemails, or silence.
  • Reduced the VAD system’s classification response time to 1.1 seconds after call connection.
  • Achieved a call connection rate of 85%, ensuring calls are consistently connected to a live person.
  • Boosted agent productivity by 33%, enabling them to handle more calls per hour and leading to a 21% improvement in cost-per-call.