Data Quality


We have a robust range of data quality control measures to make sure you receive high-quality data.



At Appen, we understand the importance of high-quality AI training data. Whether you are designing jobs in our data annotation platform yourself or working with our managed services team, we deliver highly accurate training data for every use case, leveraging industry-leading quality controls.



Appen Data Annotation Platform



Our Appen Data Annotation Platform (ADAP) offers several tools to help you monitor the quality of your labeled data and ensure exceptional data quality at every step.

Job Design

Using our data annotation platform, users can build and test jobs that deliver high-quality training data using:


Smart Validators

Choose from numerous machine learning-powered validators that ensure contributors provide inputs as desired. Contributors are notified when an input falls outside expected thresholds, improving data quality by standardizing data types.
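Conceptually, a validator of this kind checks a contributor's input against an expected format and returns immediate feedback. The sketch below is purely illustrative (the phone-number rule and messages are assumptions, not the platform's actual validators):

```python
import re

def validate_us_phone(value: str) -> tuple[bool, str]:
    """Return (is_valid, feedback) for a contributor-entered phone number.

    Illustrative only: a real platform validator would be configured in
    the job design UI rather than written by hand.
    """
    pattern = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")
    if pattern.fullmatch(value.strip()):
        return True, "OK"
    return False, "Please enter a 10-digit US phone number, e.g. (555) 123-4567."
```

Because the check runs at input time, malformed answers are corrected by the contributor before they ever reach the dataset.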

Workflows

Break down large, complex projects into a series of simple jobs, and control which rows proceed by configuring routing rules such as routing by confidence, by specific answers, or by random samples.
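The three routing rules named above can be sketched as a single decision function. The job names, field names, and thresholds here are hypothetical, assumed only for illustration:

```python
import random

def route_row(row: dict) -> str:
    """Decide the next job for an annotated row, mimicking workflow routing.

    Assumed fields: 'confidence' (0-1 agreement score) and 'answer'.
    All job names and thresholds are illustrative.
    """
    if row["confidence"] < 0.7:       # route low-confidence rows for review
        return "expert_review_job"
    if row["answer"] == "unsure":     # route a specific answer for adjudication
        return "adjudication_job"
    if random.random() < 0.05:        # random 5% QA sample
        return "qa_sample_job"
    return "finalized"
```

Chaining simple jobs with rules like these keeps each individual task easy for contributors while the workflow as a whole handles the complex cases.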





Test Questions

Using ADAP, users can test contributors with test questions both before they enter a job and while the job is in progress, verifying their ability to correctly identify and label each task. Our framework uses pre-answered rows of your data to qualify high-performing contributors, remove under-performing ones, and continually train contributors to improve their understanding of the task.
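At its core, qualification on pre-answered rows reduces to comparing a contributor's answers against the known answers and applying an accuracy threshold. A minimal sketch, with an assumed 80% threshold (the platform's actual thresholds are configurable):

```python
def contributor_accuracy(answers: list[str], gold: list[str]) -> float:
    """Fraction of pre-answered (gold) rows the contributor got right."""
    correct = sum(a == g for a, g in zip(answers, gold))
    return correct / len(gold)

def qualifies(answers: list[str], gold: list[str], threshold: float = 0.8) -> bool:
    """Keep the contributor in the job only if accuracy meets the threshold."""
    return contributor_accuracy(answers, gold) >= threshold
```

Running the same check on test questions interspersed mid-job is what allows under-performers to be removed while work is still in progress, rather than after delivery.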





Contributor Levels & Targeting



With our crowd residing in the same ecosystem, we can apply consistent data quality controls across the entire annotation pipeline. Some measures include:



Contributor Targeting



Assess the distribution of data in your training dataset against key attributes such as demographics, gender, and location. You can identify areas of abnormal distribution and augment datasets accordingly to balance classes and reduce bias.

Custom Channels - Increase data quality by creating custom channels that target specific contributors who have proven, through previously submitted work, that they understand the job and can perform it successfully. Group contributors into a custom channel automatically based on their trust scores from previous jobs, or manually by providing the IDs of contributors you trust.



Contributor Levels



ADAP allows you to target contributors based on their performance and skill level. We keep an audit trail for every contributor and bucket them into three levels based on performance and experience on the platform. Level 1 can be used to optimize throughput, while Level 3 ensures only our most experienced and highest-performing contributors work on your task.
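The bucketing described above can be pictured as a simple function of performance and experience. The thresholds below are invented for illustration and are not Appen's actual level criteria:

```python
def contributor_level(accuracy: float, judgments: int) -> int:
    """Bucket a contributor into one of three levels.

    'accuracy' is historical test-question accuracy (0-1) and 'judgments'
    is total work completed on the platform. Thresholds are hypothetical.
    """
    if accuracy >= 0.9 and judgments >= 5000:
        return 3   # most experienced, highest performers
    if accuracy >= 0.8 and judgments >= 1000:
        return 2
    return 1       # broadest pool; optimizes throughput
```

The trade-off the levels encode is throughput versus assurance: the Level 1 pool is largest and fastest, while Level 3 restricts work to the proven top of the audit trail.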





Monitor, Review & Rework



Ensure high-quality results by keeping a watchful eye on your data annotation pipeline. Having easy access to monitor your data pipeline allows you to catch inconsistencies early so your project runs smoothly and you receive high-quality annotated data.


Monitoring Dashboard

Actively monitor running jobs for anomalies that can slow annotation down. Leverage the job monitoring tool to quickly surface anomalies in test questions, answer distributions, throughput, completion percentage, or job costs.
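One of the anomalies mentioned above, a skewed answer distribution, can be surfaced with a very simple heuristic. This sketch assumes a single-choice question and an arbitrary 80% cutoff, both illustrative rather than how the monitoring tool actually works:

```python
from collections import Counter

def flag_skewed_answers(answers: list[str], max_share: float = 0.8) -> list[str]:
    """Return answer options whose share of responses exceeds max_share.

    A heavily dominant option can indicate a job-design problem, such as
    an ambiguous question or contributors clicking through.
    """
    counts = Counter(answers)
    total = len(answers)
    return [option for option, c in counts.items() if c / total > max_share]
```

Catching a skew like this early, while the job is still running, is cheaper than discovering it in the delivered data.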

Review

Users can send a job's data from one set of contributors to another for additional review and correction, ensuring that data from open-ended tasks is both relevant and correct. This is especially useful for ensuring data quality in jobs that don't traditionally work with test questions.


Auditing

Understand how aggregate annotations were produced in order to achieve high-quality results.

In-Platform Audit

Visualize and review the results of a job within the platform to determine whether contributors understand the instructions sufficiently, identify problem areas, and improve instructions and job design to achieve high-quality results.






Managed Services Quality Controls



With our white-glove Managed Services, we manage the day-to-day data annotation and/or data collection process for you to deliver high-quality training data. Let our expert project managers handle your projects using our top-quality data annotation platform. We control the quality of your data through two main levers: Expertise and Crowd.



Expertise


We monitor quality at every stage of the data annotation and collection process. Our team has decades of experience across all kinds of data annotation and collection projects, delivering custom quality solutions to meet your specific data quality needs. If data quality is compromised, they quickly identify the issue and work to fix it straight away.

Here is a short sample of the ways we measure, monitor, and control quality before, during, and after production:



Pre-Production Support & In-Production Monitoring
  • Qualification: onboarding quizzes assess understanding of the guidelines
  • Golden Sets: curated sets of jobs that assess all raters equally and typically undergo double-review vetting to ensure accuracy
  • Rapid Evaluator Feedback (REF): pre-answered rows randomly interspersed with live data; on submission, the rater receives immediate feedback

Post-Production Analysis & Learning Methods 
  • Disagreement Sets: analysis of group and individual disagreement rates to identify outliers and trends
  • Disagreement Severity (Off-By Score): analysis of how severe a disagreement is, i.e., the interval between the original answer and the correct answer
  • Rating Distribution: analysis of rating distributions by both individuals and groups to identify patterns that fall outside the group distribution
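The Off-By Score described above is, in essence, the distance between a rater's answer and the correct answer on an ordinal scale. A minimal sketch (the aggregation into a mean is our assumption; the actual analysis may differ):

```python
def off_by_score(original: int, correct: int) -> int:
    """Disagreement severity on an ordinal rating scale: the interval
    between the rater's original answer and the correct answer."""
    return abs(original - correct)

def mean_off_by(pairs: list[tuple[int, int]]) -> float:
    """Average severity over (original, correct) answer pairs."""
    return sum(off_by_score(o, c) for o, c in pairs) / len(pairs)
```

Unlike a plain disagreement rate, this metric distinguishes a rater who is off by one point on a five-point scale from one who is off by four.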




Crowd: Measuring and Managing Contributor Quality


Our customers have access to a curated crowd of 1M+ contributors in over 170 countries working in more than 235 languages/dialects. 

With Appen Managed Services you have the advantage of strategically selecting contributors to manage bias and data quality. We use AI to match crowd workers with tasks suited to their skills; AI also assists with their annotations to increase quality and throughput and to improve their experience. Every contributor goes through a qualification process, including practice exams and reviewing instruction guidelines, that sets them up for success and maximizes their availability and skillsets. We onboard contributors at scale, helping you ramp up project productivity in no time.





Secure Data Access


We meet data security requirements for customers working with personally identifiable information (PII), protected health information (PHI), and other stringent compliance needs.

We have enterprise-level security options to suit your sensitive data needs.



Secure Crowd


We offer a suite of secure service offerings with flexible options, ensuring data security via secure facilities, secure remote workers, and onsite services to meet specific business needs.


Deployment Options


Private cloud deployment
Hosted in your own cloud environment.

On-premises deployment
Deployed within your network, either air-gapped or non-air-gapped.

We have enterprise level security options to suit your sensitive data needs,


Image
Image
Image
Image

SAML-based Single Sign-on


SSO gives members access to the data partner platform through an identity provider (IdP) of your choice.
