“I said mountain biking trails, not mountain biker’s entrails!”
We’ve all experienced poor speech recognition when calling businesses. Probably more than once. Having deployed speech solutions at my last company, and now working as a contact center consultant, I believe Automatic Speech Recognition (ASR) gets a bad rap. Sure, many ASR challenges are addressable, but there are a few inherent challenges that are important to understand in order to set realistic performance expectations. The following are two that are often misunderstood.
Limitation #1: Recognition in a speech application is supported by a grammar file for each collection. While there are a number of other settings related to timing and confidence level, the grammar file is what the speech engine compares the caller’s utterance against in order to determine a match. Typically, problems arise not because the speech recognition engine fails to recognize the caller’s words, but because the caller says something that the grammar file does not contain or that the engine cannot match to an entry in the file. The system is doing its best to confirm what the caller has spoken based on what the system knows.
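To make the grammar idea concrete, here is a minimal sketch in Python. It is a toy, not how any particular speech engine works: it compares text rather than audio, the grammar entries, the `recognize` function, and the 0.8 threshold are all hypothetical, and text similarity from the standard library's `difflib` stands in for a real engine's confidence score.

```python
from difflib import SequenceMatcher

# Hypothetical grammar for a single collection (for example, "Which trails?").
GRAMMAR = ["mountain biking trails", "hiking trails", "ski trails"]

# Hypothetical confidence threshold; the value is illustrative only.
CONFIDENCE_THRESHOLD = 0.8


def recognize(utterance, grammar=GRAMMAR, threshold=CONFIDENCE_THRESHOLD):
    """Return (entry, score) for the closest grammar entry.

    If the best score is below the threshold, entry is None (a "no match").
    """
    text = utterance.lower().strip()
    best_entry, best_score = None, 0.0
    for entry in grammar:
        # Text similarity is a stand-in for an engine's confidence score.
        score = SequenceMatcher(None, text, entry).ratio()
        if score > best_score:
            best_entry, best_score = entry, score
    if best_score >= threshold:
        return best_entry, best_score
    return None, best_score


if __name__ == "__main__":
    print(recognize("Mountain biking trails"))  # phrase is in the grammar
    print(recognize("I want to go kayaking"))   # phrase is not in the grammar
```

The second call comes back as a no match even though the caller spoke clearly. Nothing was misheard; the phrase simply was not something the grammar knew about.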
Limitation #2: Unlike humans, ASR software always listens. Sounds that callers don’t intend the system to recognize, such as a chuckle or a clearing of the throat, are often included in what the system hears. Busy callers carrying on multiple conversations while interacting with a speech application often don’t realize that the system is listening to both conversations. Poor caller behavior is very difficult to overcome.
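The effect of an always-listening system can be sketched the same way. Again this is a toy under stated assumptions: the grammar, the `heard_by_system` and `match` helpers, and the exact-match rule are hypothetical, and production engines are more tolerant than plain string equality. The point is only that the system receives everything captured on the line, not just the part the caller meant for it.

```python
# Hypothetical grammar for one collection.
GRAMMAR = {"mountain biking trails", "hiking trails"}


def heard_by_system(segments):
    """Join every captured segment into one utterance.

    The toy system cannot tell which segments the caller intended for it.
    """
    return " ".join(s.lower().strip() for s in segments if s.strip())


def match(segments, grammar=GRAMMAR):
    """Return the grammar entry equal to everything heard, or None."""
    text = heard_by_system(segments)
    return text if text in grammar else None


if __name__ == "__main__":
    # Only the intended phrase was captured.
    print(match(["mountain biking trails"]))
    # A side conversation was captured along with the intended phrase.
    print(match(["not now, honey", "mountain biking trails"]))
```

In the second call the intended phrase is in the grammar, yet the match fails because the side conversation became part of what the system heard.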
Demystifying ASR limitations helps companies focus on the things they can change to improve performance, while avoiding wasted resources. I invite you to comment below and/or share the limitations you’ve experienced when deploying speech recognition.