
Common Data Challenges in Conversational Design
When companies are developing a new virtual assistant or refining the performance of an existing one, there are often gaps in the available data. Virtual assistants need large volumes of high-quality data to learn to handle every scenario they must cover, and gathering data with that breadth can be difficult. Here are some of the most common challenges we've seen when companies are building a virtual assistant:
- No data available to bootstrap a new system: You may not have enough call center recordings, website interactions, chatbot logs, or other data to account for all the situations your assistant must handle.
- Collecting use-case-specific data: Each use case requires its own data, which can make deployment across multiple domains difficult.
- Translating call center & website data into chatbot interactions: Call center and website interactions can serve as a starting point, but they only go so far, because they don't map directly onto chatbot conversations. It's crucial to train your solution on a larger volume of data that reflects how users actually interact with a chatbot.
- Lack of language-specific knowledge: Without local language expertise and a large team of native speakers, it can be difficult to capture all the needed data in new languages. If dialogue design is driven by only a single expert, you also run the risk of introducing bias or overlooking key scenarios your solution might need to support.
The Solution: An Intentional Data Collection Strategy
Appen designs data collection scenarios in close cooperation with our customers. It's important to first determine the specific data needs for the solution (e.g., text, voice, or both) and whether annotation is needed (e.g., transcription or tagging). We then elicit text commands or utterances from native speakers of the language, as the given scenario requires. It's important to refine the scenarios and responses over multiple iterations so they account for the full range of real-world commands and situations your solution might encounter. It's also crucial to capture high-quality data and to manage the quality of data annotation so the results are correct. The data collected this way should be:
- Use-case-specific
- Company-specific
- Product-specific
- Representative of the demographic spread of clients (age, gender, dialect, product experience, etc.)