Appen’s range of AI projects and diverse global contractor network ensure unbiased AI data for fair and equitable AI projects

SAN FRANCISCO – April 29, 2021 – Appen Limited (ASX:APX), the leading provider of high-quality training data for organizations that build effective AI systems at scale, is enabling organizations to launch, update and operate unbiased AI models through a range of projects and partnerships. With support from its global crowd of more than one million data annotation specialists, Appen has developed diverse training data sets for AI models, particularly natural language processing (NLP) initiatives, to ensure end users receive the same experience, no matter their language variety, dialect, ethnolect, accent, race or gender.

AI projects based on biased or incomplete data don’t work for everyone. According to a report published in the Proceedings of the National Academy of Sciences (PNAS) in March 2020, popular automated speech recognition (ASR) systems, which are used for virtual assistants, closed captioning, hands-free computing and much more, exhibit significant racial disparities in performance. The report concludes that more diverse training datasets are needed to reduce these performance differences and ensure speech recognition technology is inclusive. Language interpretation and NLP systems suffer from the same challenge and require the same solution.

“The quality and diversity of training data directly impacts the performance and bias present in AI models,” said Appen CEO Mark Brayan. “As a data partner, we can supply complete training data for many use cases to ensure AI models work for everyone. It’s critical that we engage a diverse group of individuals to produce, label, and validate the data to ensure the model being trained is not only equitable, but also built responsibly.”

Range of Appen Language Projects

Appen demonstrates its commitment to creating AI for everyone through a variety of projects and partnerships focused on the diversity of languages and dialects.
- Translators without Borders (TWB) partnership – Appen, in partnership with TWB, Amazon, Carnegie Mellon University, Facebook, Google, Johns Hopkins University, Microsoft, and Translated, joined the Translation Initiative for COVID-19 (TICO-19), which supported the development of language technology to make COVID-19 information available in as many languages as possible, including languages spoken in developing countries, such as Congolese Swahili, Tigrinya, and Nigerian Fulfulde.
- The Inuktitut translation project – In collaboration with the Government of Nunavut, Microsoft added Inuktitut, an Indigenous language in North America spoken in the Canadian Arctic, to Microsoft Translator, using Appen services.
- The Canadian French translation project – Appen coordinated with native language consultants to help Microsoft add “Canadian French” as a language option in Microsoft Translator.
- African American Vernacular English (AAVE) off-the-shelf (OTS) datasets – Most existing training datasets used in ASR, search engines, voice assistants and sentiment analysis are not representative of AAVE. To make high-quality AAVE data available, Appen is working with AAVE speakers among its crowd of annotators to collect data for an OTS dataset based on conversations about a broad range of topics.