Global Tech Firm Expands into New Markets with Enhanced Speech System

The Situation

You might not be surprised to learn that most speech recognition systems are designed with adult speakers in mind. To date, the nuances and idiosyncrasies of children’s speech have rarely been built into speech-driven applications for children, so these applications often fail to process interactions with younger users reliably.

One leading multinational technology company faced exactly this problem. It had discovered that its speech recognition system, originally trained on adult speech data, did not account for the many ways children speak differently, which made it ineffective in applications designed for children.

Children typically speak with a higher pitch (fundamental frequency) and higher formant frequencies than adults, and with greater temporal and spectral variability. Disfluencies and mispronunciations are also common, from hesitations such as “uh” and “um” to substitutions such as “fwoggy” for “froggy”.

The Solution

The company addressed the shortfall by building a new automatic speech recognition (ASR) system for North American English, designed from the ground up specifically for children’s applications.

The company approached Appen based on its global reputation for expertise in languages, transcription, and speech recognition systems. The client team first asked for guidance on the new project, and then for help collecting and transcribing children’s speech data across a range of demographics. Because the ASR was intended primarily for educational technology applications, Appen’s team of highly skilled linguists developed scripts targeting education-related speech, including an appropriate range of numbers, keywords, short phrases, and short educational sentences.

In its entirety, the project scope covered:

  • Recruiting and working with 400 child speakers
  • Targeting a cross section of required demographics: 50% Caucasian, 40% African American, 10% Latino
  • Data collection and transcription
  • Engaging native speakers of US English from a range of regional dialect areas, including the Northeast, Midwest, South, and West


Working with Appen allowed the multinational technology company to meet its objectives for an ASR built specifically for children’s speech, within its desired time frame and on budget.

Appen managed the collection and transcription of 105 hours of audio, comprising 60,000 utterances, which helped the client design, build, and deliver the ASR it needed to take to market.

The company has since been able to apply the acoustic models from its new ASR to a range of North American English edutainment platforms and apps designed specifically for children.

Key Success Factors

Appen is known globally for its deep expertise in speech-related projects, and a major factor in the success of this project was Appen’s recommendation on which age groups to focus on. The client had originally planned to focus data collection on 4-to-9-year-olds to best meet its needs in the edutainment space. However, Appen linguists recommended focusing on two age groups, 4-to-7 and 8-to-14-year-olds, along with other demographic requirements, to ensure optimal coverage, a recommendation that proved correct.

Another factor was Appen’s broad partner network, which provides access to a global crowd of more than 1 million people. This allowed Appen to recruit a large number of participants for the project at relatively short notice.

Additionally, Appen brought on board a “family and friends” network, including schools and church groups, to help recruit parents who were happy to consent to their child’s participation in the project. Parents were comfortable with Appen’s respectful and communicative process for recruiting minors for data collection, which helped the project run more smoothly and successfully.

Lastly, Appen drew on its experience of recording and transcribing children’s speech, which helped the project finish within the desired time frame. Recording children, especially 4-to-9-year-olds, can be tricky. By deploying supervisors experienced in working with children, using images alongside text, and keeping recording sessions short but productive, Appen ensured successful delivery for its global tech client.
