When planning the development of a chatbot or virtual assistant, it’s important to begin the conversational design process with a clear data strategy in place. Conversational design requires more than just an understanding of the fundamentals of voice user interface design — you also need to understand how to solve some of the key challenges in data preparation.

Working with the companies behind some of the world’s leading virtual assistants, we’ve encountered some common data challenges. To overcome these challenges, it’s important to create an intentional strategy for collecting and structuring conversational data.
Common Data Challenges in Conversational Design
When companies are developing a new virtual assistant or refining the performance of an existing one, there are often gaps in the available data. Virtual assistants need large volumes of high-quality training data to cover every scenario they must handle, and assembling data with that breadth can be difficult. Here are some of the most common challenges we’ve seen when companies are building a virtual assistant:
- No data available to bootstrap a new system — You may not have sufficient call center recordings, website interactions, chatbot logs, or other data to account for all the situations needed.
- Collecting use case-specific data — Each use case will require specific data, which may make deployment difficult across multiple domains.
- Translating call center & website data into chatbot interactions — Mimicking call center or website interactions can be helpful, but only to a certain extent, because people phrase requests differently when talking to a chatbot. It’s crucial to also train your solution on a larger volume of chatbot-style data.
- Lack of language-specific knowledge — Without local language expertise and a large team of native speakers, it can be difficult to capture all the needed data in new languages. If dialogue design is driven by only a single expert, you also run the risk of introducing bias or overlooking key scenarios your solution might need to support.
The Solution: an Intentional Data Collection Strategy
Appen designs data collection scenarios in close cooperation with our customers. It’s important to first determine the specific data needs for the solution (text, voice, or both), and whether annotation is needed (for example, transcription or tagging). Then, we elicit text commands or spoken utterances from native speakers of the language, as needed for the given scenario. It’s important to refine the scenarios and responses over multiple iterations, to account for all the real-world commands and situations your solution might encounter. It’s also crucial to make sure you’re capturing high-quality data, and to manage the quality of data annotation so that the results are correct.
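To make the first step more concrete, here is a minimal sketch of how a team might write down one collection scenario before eliciting any data. The `CollectionScenario` class, its field names, and the example values are all hypothetical illustrations, not a format Appen prescribes.

```python
from dataclasses import dataclass, field
from enum import Enum


class Modality(Enum):
    TEXT = "text"
    VOICE = "voice"
    BOTH = "both"


@dataclass
class CollectionScenario:
    """Hypothetical description of a single data collection scenario."""

    name: str                      # the use case being covered
    language: str                  # language/locale tag, e.g. "en-US"
    modality: Modality             # text, voice, or both
    annotations: list[str] = field(default_factory=list)  # e.g. ["transcription"]
    prompt: str = ""               # instruction shown to native-speaker contributors

    def needs_annotation(self) -> bool:
        """True when at least one annotation task was requested."""
        return bool(self.annotations)


# Example values are invented for illustration only.
scenario = CollectionScenario(
    name="check_order_status",
    language="en-US",
    modality=Modality.VOICE,
    annotations=["transcription", "intent_tagging"],
    prompt="Ask the assistant where your order is, in your own words.",
)
```

Writing scenarios down in a structured form like this makes it easier to revise them between iterations and to see at a glance which ones still need annotation.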

To successfully design interactions, conversational data must be:
- Use-case specific
- Company-specific
- Product-specific
- Representative of the demographic spread of clients (age, gender, dialect, product experience, etc.)
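As an illustration of the last requirement, the sketch below compares the demographic mix of a collected dataset against target shares and reports under-represented groups. The function name `coverage_gaps`, the `age_band` field, the target shares, and the sample utterances are all assumptions made up for this example.

```python
from collections import Counter


def coverage_gaps(utterances, attribute, targets, tolerance=0.05):
    """Return groups whose share of the data falls more than
    `tolerance` below their target share, as {group: (actual, target)}."""
    counts = Counter(u[attribute] for u in utterances)
    total = sum(counts.values())
    gaps = {}
    for group, target_share in targets.items():
        actual = counts.get(group, 0) / total if total else 0.0
        if actual < target_share - tolerance:
            gaps[group] = (actual, target_share)
    return gaps


# Invented sample data: five utterances tagged with a contributor age band.
utterances = [
    {"text": "where is my order", "age_band": "18-34"},
    {"text": "track my package", "age_band": "18-34"},
    {"text": "has my parcel shipped yet", "age_band": "18-34"},
    {"text": "order status please", "age_band": "18-34"},
    {"text": "I would like an update on my delivery", "age_band": "55+"},
]

# Hypothetical target mix for the client base.
targets = {"18-34": 0.4, "35-54": 0.3, "55+": 0.3}

gaps = coverage_gaps(utterances, "age_band", targets)
for group, (actual, target) in gaps.items():
    print(f"{group}: {actual:.0%} collected vs. {target:.0%} target")
```

The same check can be repeated for other attributes, such as dialect or product experience, to decide where the next round of collection should focus.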
The Benefits of Working with an Experienced Data Collection Provider
The benefits of working with an established provider for data collection and annotation are clear: your team saves time on both tasks and can focus on your core business. Working with a provider like Appen also gives you access to expertise in over 180 languages and dialects, as well as the ability to rapidly localize data collection and annotation for new markets. These benefits carry through to your customers as well: they get a better-quality chatbot or virtual assistant interaction from day one, which results in higher conversion rates and better overall customer satisfaction.

Need high-quality conversational data to train your chatbot? Contact us to schedule a call and discuss your data needs.





