
The Hunt for Human Speech Data

Published on January 11, 2017

With voice-activated devices launching weekly, one might think we're reaching a tipping point in the use of speech recognition technologies. However, a recent Bloomberg article argues that while speech recognition has made great strides in recent years, the approach taken to speech data collection has prevented the technology from reaching a point where it would replace how most consumers currently interact with their devices. Consumers have embraced the concept of voice-activated devices with enthusiasm, but the actual experience has room for improvement. What is holding the technology back?

More data = better performance

According to the authors, improving devices' ability to understand and communicate with users requires terabytes of human speech data spanning multiple languages, accents, and dialects to deepen the devices' conversational understanding capabilities.

Recent advances in speech engines are the result of a form of artificial intelligence called neural networks, which learn and change over time without precise programming. Loosely modeled on the human brain, these software systems can train themselves to make sense of the human world, performing better with increased amounts of data. Andrew Ng, Baidu's chief scientist, says, "The more data we shove in our systems the better it performs. This is why speech is such a capital-intensive exercise; not a lot of organizations have this much data."

Tech giants including Amazon, Apple, Baidu, and Microsoft are now racing to collect natural language data across the globe to improve accuracy. As Adam Coates from Baidu's AI lab in Sunnyvale, CA states, "Our goal is to push the error rate down to 1 percent. That's where you can really trust the device to understand what you're saying, and that will be transformative."

How can these firms scale their data collection in a cost-effective way while ensuring the human speech data accurately captures the nuance of human language?
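The "error rate" Coates refers to is conventionally measured as word error rate (WER): the word-level edit distance between what the engine transcribed and what was actually said, divided by the number of words in the reference. A minimal sketch of the metric, for illustration only (this is the standard textbook formulation, not any particular vendor's implementation):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of four -> WER of 0.25; a 1 percent goal
# means roughly one word wrong per hundred spoken.
print(wer("turn on the lights", "turn on the light"))  # 0.25
```

A 1 percent WER target, by this measure, means the engine gets about 99 of every 100 words right, which is why so much diverse training data is needed to cover accents and dialects.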

It’s about quantity AND quality

While the quantity of data is important, quality is just as critical to optimizing machine learning algorithms. "Quality" in this context includes how well the data fits the use case. For example, if a speech recognition engine is being developed for use in a car, the data should be collected in a car for best results, taking into account all of the typical background noises the engine would "hear."

While it's tempting to use off-the-shelf data or to collect data using ad-hoc methods, it's more effective in the long run to collect data specifically for its end use.

This principle also applies when building global speech recognition products. Human language data is nuanced, accented, and full of cultural bias. Data collection must be undertaken in a multitude of languages, geographies, locations, and accents to reduce error rates and improve performance.

Partner with Appen

At Appen, we continue to play a key role in the evolution of natural language processing and conversational understanding, having spent the last 20 years working with the top global technology companies to build speech and natural language interfaces for the leading virtual assistants on the market today. We have years of experience in language data collection across a wide range of environments, from in-studio to outdoors, using a variety of modalities.

Contact us here to discuss your specific data collection needs and how we can help you meet your goals.
