The Company
A leading multinational technology company teamed up with us to help develop an automatic speech recognition (ASR) system designed from the ground up to specifically cater to children’s applications.
The Challenge
You might not be surprised to learn that most speech recognition systems are designed with adult speakers in mind. To date, the nuances and idiosyncrasies of children’s speech have rarely been built into speech-driven applications intended for children, leaving those applications unable to process interactions with a younger audience successfully. One leading multinational technology company faced precisely this situation. The business had discovered that its speech recognition system, originally trained on adult speech data, did not account for the differences in how children speak, making it ineffective in applications designed for children. Children typically speak at a higher pitch and with greater temporal and spectral variability, and their speech contains more irregularities, hesitations, and mispronunciations (for example, “uh,” “um,” and “fwoggy” instead of “froggy”).
The Solution
The company addressed the shortfall by building a new automatic speech recognition (ASR) system for North American English, designed from the ground up to specifically cater to children’s applications. The tech firm approached us for help with the product based on our global industry reputation for expertise in languages, transcription, and speech recognition systems. The client team asked first for guidance on the new project, and then for help with collecting and transcribing children’s speech data across a range of demographics. The ASR system was intended primarily for use in educational technology applications. We provided help and guidance through our team of highly skilled linguists, who developed scripts for the target education-related speech needs. These scripts included an appropriate range of numbers, key words, short phrases, and short educational sentences. In its entirety, the project scope covered:
- Recruiting and working with 400 child speakers
- Targeting a cross section of required demographics: 50% Caucasian, 40% African American, 10% Latino
- Data collection and transcription
- Engaging native speakers of US English with a range of regional dialects, including Northeast, Midwest, South, and West
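
As a rough illustration of the demographic targets in the scope above, the percentages can be turned into approximate speaker counts. This is only a sketch: the 400-speaker total and the 50/40/10 split come from the scope list, while the function name `speaker_quotas` and the variable names are hypothetical and do not describe the tooling actually used on the project. No regional dialect split is computed, because the scope does not state one.

```python
# Illustrative only: converts the stated demographic percentages into
# speaker counts. Names below are hypothetical, not the project's tooling.

TOTAL_SPEAKERS = 400  # from the project scope

# Percentages as stated in the project scope.
DEMOGRAPHIC_SHARES = {
    "Caucasian": 50,
    "African American": 40,
    "Latino": 10,
}


def speaker_quotas(total, shares_percent):
    """Return the number of speakers per group for the given percentages."""
    if sum(shares_percent.values()) != 100:
        raise ValueError("percentages must sum to 100")
    # Floor division is exact here; for other totals it could drop
    # remainders, which would then need to be assigned by hand.
    return {group: total * pct // 100 for group, pct in shares_percent.items()}


quotas = speaker_quotas(TOTAL_SPEAKERS, DEMOGRAPHIC_SHARES)
for group, count in quotas.items():
    print(f"{group}: {count} speakers")
```

Under these figures, the targets work out to 200, 160, and 40 child speakers respectively.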