Platform Delivery Models

Managed Services

End-to-end service that delivers controlled, consistent, high-quality annotated data with speed and at scale.

Self Services

Tools to monitor labeled data and ensure quality at every step with Appen Data Annotation Platform (ADAP).

Open Crowd

On-demand global pool of contributors available 24/7 for the most specific use cases.

Managed Crowd

Curate a proprietary crowd of contributors overseen by expert project managers, from end to end.

Experience-Driven Tech

Technology backed by 25+ years of expertise.

1+ Million Crowd Contributors

Get targeted data from a diverse global crowd.

Platform Capabilities

Scale Quickly with High-Quality, Customized Data

Whether you are training a model to work for a very particular use case, or are unable to find sufficient open-source data to train your model, we can help provide the high-quality structured or unstructured data you need to kick off your AI project. Choose from our bespoke Data Collection services or browse our extensive catalog of pre-labeled datasets for a variety of common AI use cases.

What You Get

Curate unique datasets with access to our 24/7 on-demand contributors or managed crowd options for data collection for your specific use case.

What You Get

Accelerate AI projects with access to more than 250 pre-labeled datasets: off-the-shelf data specific to your needs.

What You Get

Products and expertise that artificially generate hard-to-find data and edge cases to enhance model coverage and performance.

Boost machine learning capabilities, pattern recognition, and computer vision solutions

Access a wide range of data types, collection methodologies, and delivery models, including images, videos, text, speech, and audio, to fulfill unique data collection scenarios.

What You Get

  • High-quality data that meets your specific requirements
  • Multi-platform data collection methods thanks to our mobile app for iOS and Android, as well as our platform
  • Fast and scalable crowd recruitment from a global pool of over 1M people
  • Peace of mind knowing all data is collected ethically and in line with regulatory requirements

What You Get

  • The best natural language processing, understanding, and automatic speech recognition solutions with human-annotated speech data in 235+ languages and dialects
  • Detailed linguistic and cultural research
  • Crowdsourcing of native speakers for moderated or unsupervised recordings
  • Scalable utterance collection through our iOS and Android app
  • Centralized on-site recordings in a wide range of acoustic environments
  • A variety of different speech collection types including telephone, embedded device, single/multi-speaker, prompt variation, and more
  • Quality assurance and project management

What You Get

  • Millions of high-quality text data samples to scale your solutions globally
  • Chatbots, sentiment analysis, and more
  • A partnership with our experts to collect text data specific to domain, language, and locale enabling you to build robust NLP systems and expand into new geographic markets
  • Our text utterance collection tool, with smart validators that check for language, duplicates, and coherence to make sure only high-quality utterances are captured
  • Access to our machine learning assisted technology with built-in quality controls which increase the speed and quality of collection

Annotation Tools Powered by Machine Learning

Annotate data faster and at scale with machine learning-assisted annotation tools that provide a comprehensive data-labeling solution. Machine learning assistance is built into our industry-leading annotation tools to save you time, effort, and money, delivering high-quality training data and accelerating the ROI on your AI initiatives.

What You Get

  • A linear interpolation and video object tracking model that predicts the position of objects and automatically tracks them, reducing contributor fatigue
  • Video object tracking with speed labeling eliminates the need to annotate every single frame and expedites the annotation process
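To illustrate the general idea behind linear interpolation between annotated frames, here is a minimal sketch. It is not the platform's implementation: the `Box` type and the `interpolate_box` function are hypothetical names, and the sketch assumes a contributor has drawn a box on two keyframes and that the object moves in a straight line between them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    x: float
    y: float
    width: float
    height: float


def interpolate_box(start: Box, end: Box, start_frame: int, end_frame: int, frame: int) -> Box:
    """Estimate the box at `frame` by linear interpolation between two keyframes."""
    if start_frame >= end_frame:
        raise ValueError("start_frame must come before end_frame")
    if not start_frame <= frame <= end_frame:
        raise ValueError("frame must lie between the two keyframes")

    # Fraction of the way from the first keyframe to the second (0.0 to 1.0).
    t = (frame - start_frame) / (end_frame - start_frame)

    def lerp(a: float, b: float) -> float:
        return a + (b - a) * t

    return Box(
        x=lerp(start.x, end.x),
        y=lerp(start.y, end.y),
        width=lerp(start.width, end.width),
        height=lerp(start.height, end.height),
    )


# A contributor annotates frames 0 and 10; frame 5 is filled in automatically.
first = Box(x=0, y=0, width=10, height=10)
last = Box(x=100, y=50, width=20, height=10)
middle = interpolate_box(first, last, start_frame=0, end_frame=10, frame=5)
```

With only two hand-drawn keyframes, the frames in between need review rather than annotation from scratch, which is where the time saving comes from.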

What You Get

  • Pretrained and trainable image classification models that can help you save time and money by automating data labeling, and only sending low-confidence rows for human labeling
  • Pixel masks that are automatically generated and applied to an image for contributor validation, saving time and effort
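Routing only low-confidence rows to human labeling can be pictured with a simple threshold rule. The sketch below is an assumption-laden illustration, not the platform's API: `route_rows`, the tuple layout, and the 0.9 threshold are all hypothetical.

```python
from typing import Iterable, List, Tuple

# (row_id, predicted_label, model_confidence): a hypothetical prediction record.
Prediction = Tuple[str, str, float]


def route_rows(
    predictions: Iterable[Prediction], threshold: float = 0.9
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split rows into auto-labeled rows and rows that need human review.

    Rows whose confidence is at or above `threshold` keep the model's label;
    the rest go to human labeling with the model's label as a suggestion.
    """
    auto_labeled: List[Tuple[str, str]] = []
    needs_human: List[Tuple[str, str]] = []
    for row_id, label, confidence in predictions:
        target = auto_labeled if confidence >= threshold else needs_human
        target.append((row_id, label))
    return auto_labeled, needs_human


auto_labeled, needs_human = route_rows(
    [("row-1", "cat", 0.98), ("row-2", "dog", 0.55), ("row-3", "cat", 0.90)]
)
```

The threshold is the main tuning knob in a setup like this: raising it sends more rows to people and lowers the risk of accepting a wrong automatic label.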

What You Get

  • The ability to bring your model’s predictions into the platform easily with your data, and get faster, higher-quality annotations and more precise metrics on your model’s performance for retraining
  • Purpose-built text annotation tools that make it easy to annotate text in detail, allowing your models to be trained to understand text and gain valuable insights

What You Get

  • Machine learning that ensures potential text utterances are validated and of high quality, so you can reduce the collection of unusable utterances by 35%, speeding up your chatbot testing and deployment
  • Smart validators that check for language, duplicates, and coherence to make sure only high-quality utterances are captured
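As a rough picture of what a duplicate check on collected utterances might involve, here is a minimal sketch. It is not the validator described above: `filter_utterances`, `normalize`, and the two-word minimum are hypothetical, and the language and coherence checks are left out because they would need a trained model.

```python
import re
from typing import Iterable, List


def normalize(utterance: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace for comparison."""
    without_punctuation = re.sub(r"[^\w\s]", "", utterance.lower())
    return re.sub(r"\s+", " ", without_punctuation).strip()


def filter_utterances(utterances: Iterable[str], min_words: int = 2) -> List[str]:
    """Keep the first occurrence of each utterance and drop very short ones."""
    seen = set()
    accepted: List[str] = []
    for utterance in utterances:
        key = normalize(utterance)
        if len(key.split()) < min_words or key in seen:
            continue  # too short, or a duplicate after normalization
        seen.add(key)
        accepted.append(utterance)
    return accepted


accepted = filter_utterances(
    ["Book a flight", "book a flight!", "hi", "Cancel my order"]
)
```

Comparing normalized text rather than raw strings means trivial variations in case or punctuation do not count as new utterances.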

What You Get

  • Audio that’s automatically segmented into different speakers and audio snippets for faster audio annotation
  • Quick, high-quality audio transcripts with acoustic tags in a variety of languages that leverage NLP to improve transcription quality and efficiency
  • All-in-one audio tooling that delivers clear and crisp audio annotations and transcriptions to train your models
  • An audio annotation tool that enables you to automatically segment audio files for easier annotation
  • The ability to timestamp and transcribe quickly and easily allowing for accurate annotations at scale

What You Get

  • Human and machine intelligence that annotates point cloud frames (3D point cloud and RGB images) with point cloud calibration, cuboid annotation, auto-adjust, and pixel-level annotation
  • 3D sensor tooling that has a robust suite of features and includes machine learning assistance so you can annotate specific data quickly and accurately, building training data for your unique use cases

Robust Testing and Optimization

Introducing dynamic elements to ensure performance more closely reflects real-world deployment environments.

What You Get

  • Global coverage with a crowd of 1M+ contributors
  • A quickly assembled team that covers hundreds of regions with high-quality evaluators to ensure your AI products work in your target markets
  • The go-to provider of human-in-the-loop services for product and technology teams

What You Get

  • Real-world environmental simulations based on unique use cases and niche conditions that ensure your AI systems are properly tested
  • Years of experience and expertise with global setups
  • Fast and efficient results

What You Get

  • Testing that identifies potential bias issues before you deploy
  • Assurance that your model can account for the different languages, cultural nuances, and diversity that come with servicing global markets

What You Get

Our new Voice Assistant Benchmark (VAB) initiative, a partnership with top global technology companies for ad hoc TTS voice benchmarking, mean opinion score (MOS), and MUSHRA ratings. It’s an opportunity to streamline, standardize, and iterate the voice evaluation process, creating a true benchmark and highlighting optimum Voice Assistant standards across devices and brands.
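For readers unfamiliar with MOS, the score is the arithmetic mean of listener ratings for a voice sample. The sketch below is illustrative only and is not the VAB methodology: it assumes ratings on the common 1 to 5 scale, and the `mean_opinion_score` function and its normal-approximation 95% interval are assumptions made for the example.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_opinion_score(ratings: Sequence[int]) -> Tuple[float, float]:
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Assumes each rating is an integer from 1 (bad) to 5 (excellent) and that
    at least two ratings are available.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be between 1 and 5")

    mos = sum(ratings) / len(ratings)
    # Normal approximation: 1.96 standard errors on either side of the mean.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


mos, half_width = mean_opinion_score([4, 5, 3, 4, 4])
```

Reporting the interval alongside the mean shows how much the score could shift with a different panel of listeners, which matters when comparing two voices with similar MOS values.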

Our product and experts will help turn your data into intelligence

Achieve optimal results when you work with our experts to design taxonomies and ontologies. Knowledge graphs offer more flexible and complex storage than their traditional counterparts, and end-users receive more expansive answers compared to the standard 1:1 coded answer for each question.

What You Get

  • A future-proofed graph that’s optimized for your specific recommendations
  • Appen contributors who categorize products and make connections
  • Federated ontologies that provide access to a whole new world of linked data
  • Insights that power business processes like smart recommendations

What You Get

  • A graphical user interface for easy click and drop ontology creation
  • Built-in connections to a wide range of popular graph databases
  • Visual UI to easily map out ontology structures
  • Powerful built-in annotation tooling to transform unstructured data into knowledge graph format (RDF, SPARQL, etc.)
  • A game-changing tool that allows those with domain expertise to create ontologies, without reliance on third-party technical resources

What You Get

  • Consultation expertise to create the underlying ontology for your knowledge graph
  • Appen Ontology Studio, our best-in-class ontology authoring tool which allows you to easily create your own ontology structures
  • Initial data analysis to ensure the best route for ontology authoring

What You Get

  • Experts who will help you to extract usable information from your raw data to populate a knowledge graph and train an information extraction model
  • Unstructured data transformed into knowledge graph formats (RDF, SPARQL, etc.) with our global crowd helping to scale these annotation projects into usable info
  • The ability to transform raw, unstructured data into usable knowledge graph formats in just one step using our simple GUI, without the arduous multi-step process that’s usually required

Data Types

Image

Collect, classify, annotate, and/or transcribe images to train the most accurate and inclusive computer vision models. Our image annotation tooling includes polygons, dots, lines, rotating bounding boxes and/or ellipses, and pixel-level semantic segmentation. Additional object information can be collected in shapes using ontologies for faster, more flexible, and more accurate image annotation. This suite of tooling gives you everything you need to annotate a wide variety of image types with speed and precision.

Video

Collect, classify, transcribe, or annotate videos to help your models see and interpret the world around them. Our annotation tooling includes special transcription and time stamping tools, object tracking (with additional Speed Labeling capabilities), object detection, and ontology attribute annotation. Our Machine Learning assistance and custom-built tools give you the flexibility to easily collect accurate video annotations at scale.

Audio

Collect, classify, transcribe, or annotate audio data for your NLP projects. Our Audio Annotation tool is twice as fast as traditional annotation tools. Segment audio into layers, speakers, and timestamps for your Automatic Speech Recognition and other audio models. Generate high-quality audio transcripts rapidly with acoustic tags in a variety of languages, leveraging NLP to improve transcription quality and efficiency. Our all-in-one audio tooling has been purpose-built to deliver clear and crisp audio annotations and transcriptions to train your models.

Text

Collect, classify and annotate text to enhance your NLP model’s understanding of nuanced human speech. Speed Labeling capabilities include built-in multi-language tokenizers to assist human annotation efforts. Target entity extraction and span labeling with options to bring your model outputs to accelerate contributor annotations. We can also help with text evaluation and post-editing data generated from your NLP models. Our specialized tools and linguistic experts will deliver the high-quality training data you need to build your NLP model for any chosen market.

Sensor

Annotate several types of point cloud data including LiDAR, Radar, and other types of scanners/sensors using our intuitive annotation interface. Point cloud frames (3D point cloud and RGB images) can easily be annotated with cuboids to support complex use cases such as autonomous vehicles. Built-in Machine Learning assistance aids and enhances annotation speed and quality. Our purpose-built 3D sensor tooling enables leaders in technology to annotate complex data types accurately at speed and scale.

Multi-Modal

Combining multiple datasets from vendors or annotation jobs can be challenging without the right tools. Our data annotation platform can annotate multiple types of data easily in one place, and our enterprise-level Workflows tooling makes combining and automating multi-step annotation jobs a breeze. With our most advanced machine learning-powered data annotation platform, we can deliver high accuracy for your highly complex multi-modal AI projects.

Hardware And Device Testing

Ensure products are working as desired before any big launch by enlisting the help of our contributors for hardware/device testing. Our global crowd of contributors is present in over 170 countries, so it will be ready to support your launch in any geography. Our specialized Project Managers can help devise a robust testing and evaluation plan to make sure all usage scenarios have been thoroughly tested and any improvement areas flagged, so your product launch will be a success.

Mobile Location

High-quality raw mobile location data from 700+ million devices in 200+ countries allows you to perform location analytics and derive actionable business intelligence. Tap into the global data feed or request data customized to a specific region. Our location data is fully compliant with GDPR and CCPA. Every event is stamped with a unique QuadID and passes intensive in-house quality control, so you can be confident that every event we share is authentic and increases the reliability of your mobility analysis.

Case Studies

Data Collection Improves Leading Social Media Platform

As a result of this project, the client released its product on time with the data it required to meet its users’ needs. The firm quickly and efficiently improved its machine learning model with access to a large amount of high-quality data. The geographic and demographic diversity of our rater pool proved immensely valuable to…

Global Tech Firm Expands into New Markets with Enhanced Speech System

We successfully managed the collection and transcription of 105 hours of audio, totalling 60,000 utterances, which helped the client design, build and deliver the ASR it needed to take to market. The company has since been able to take the acoustic models built into its new ASR and apply them to a range of North American English edutainment platforms and apps specifically designed for children. One of our key recommendations for this project was…

Top Automotive OEM Uses Speech Training Data to Power its Connected Car

As an experienced solution provider for the automotive industry, we offer a full-service approach for localization, data collection, in-car testing and validation, and linguistic consulting. Our experienced project managers, who have years of experience working in the automotive industry, work directly with the OEM’s engineering team to …

Wellio Turns Raw Data Sets Into AI Training Data for Nutrition and Cooking

To achieve their goal of helping people cook at home, Erik Andrejko, Wellio’s CTO, said that the company “is building a platform that embodies the intersection of culinary and health expertise.” Doing so required building machine learning algorithms that have the same capacity an expert does to make inferences and suggestions …

GuildLink Makes Consumer Medicines Information More Accessible

To create audio CMIs, GuildLink needed to find a partner to convert text to speech quickly and accurately. GuildLink chose to work with us thanks to our reputation for linguistic and technical expertise, specifically providing high-quality speech and language processing services. The text-to-speech conversion process takes place via…

Preserving Language Through Useable Data and Phonetic Annotation

To preserve the Larrakia language, linguist Dr. Mark Harvey has teamed up with the Larrakia Nation Aboriginal Corporation of People and Appen with the goal of improving the database of usable text and audio language samples of the Larrakia language. This database is a major step in preserving and reviving the Larrakia language, as the last fluent speaker died more than 20 years ago…

Leading Software Provider Optimized its Global eCommerce Transaction Funnel

Using our platform as a central point of operations and participant feedback, the provider was able to collect clean, accurate data quickly for both quantitative and qualitative analysis. After a successful initial market research study for its consumer SaaS product, the software provider returned for an additional study to focus on optimizing the eCommerce funnel for its commercial clientele. With a global reach of over one million skilled contractors…

Leading Social Media Platform Improves Content Relevance with Personalization

The client had strict quality thresholds that needed to be implemented and maintained throughout this project. The project also involved a subjective and complex task, and required minimal human QA. As a result, close collaboration was critical to the project’s success. Our team partnered with the client to develop task guidelines and quality management plans, and quickly…

Improved Search Quality From Microsoft’s Bing In Multiple Markets

During an initial trial project, we were a proactive and agile partner for Microsoft in the U.S. market. In addition to assembling and training a team of linguistic resources for the project within weeks and quickly surpassing the established quality bar, we also provided recommendations for improving the evaluation process. As a result of this successful first project, Microsoft expanded our involvement. We have since processed millions of…

CallMiner Delivers Fast and Accurate Customer Insights with Large-Scale Annotation Solution

CallMiner sought a partner who could help scale its annotation efforts. Not only did our annotation platform at Appen fit CallMiner’s needs, but our commitment to responsible AI practices also aligned with CallMiner’s values. Since the start of the partnership, CallMiner has been using our platform to annotate sentiment and emotion of call center data. Notably…

Dialpad Creates Data That Powers ML Models for Human Conversation at Scale

It took just a couple weeks for the change to bear fruit for Dialpad and to create the transcription and NLP training data they needed to make their models a success. “When we changed to Appen, within a few weeks, we saw that labeler accuracy go up to 88% and it stayed in the high 80s and 90s for us ever since, even across a…

London School of Economics Takes Agile Approach to Data Labeling with Appen

Researchers led by Kenneth Benoit at the Department of Methodology set out to study political science as it pertained to political texts—both in their content and in their sophistication. With the first project, their interest was in capturing the content of the messages that political actors send to others and further, using those discoveries to calculate political party positions. They found that relying on expert researchers to go through these messages was time-consuming, expensive, and nearly impossible to scale. The team was in need of a more agile, reproducible process for data labeling that would replace their current approach….

Improved Quality and Increased Output of Data Insights With Zefr

In searching for a solution that wasn’t overly engineered, but was cost-effective and flexible to their evolving needs, Zefr turned to Appen in 2018. With Appen’s crowdsourcing solution, Zefr suddenly had access to the large pool of people ready to…

Search Functionality Improved With Search Evaluation From Social Network

Our years of experience in web search evaluation translate well to other domains, including social network search. The expertise demonstrated in this project includes continuous improvement of task guidelines, and the identification and …

Adobe Stock: Improves Search Relevance of Massive Asset Profile

Adobe needed highly accurate training data to create a model that could surface these subtle attributes in both their library of over a hundred million images, as well as the hundreds of thousands of new images that are uploaded every day. They used our platform to facilitate the drawing of polygons over areas that could best be used for copy blocks (think large white spaces or tabletops). For example…

Leading Search Engine Expands Internationally with Vendor-Neutral Quality Analysts

Our team of seasoned judges and auditors provided the foundation for this program and is the reason for its continued success. The team provides objective feedback across multiple vendors and manages multiple communication streams to provide a single voice back to the client. Guidelines are also…

Improved Search Quality From Microsoft’s Bing In Multiple Markets

Microsoft’s Bing search engine requires large-scale data sets to continuously deliver relevant search results – in all the global markets they serve. After an initial trial, Appen became an agile partner for Microsoft in multiple markets. Appen is able to provide the Bing team with the following…

eCommerce Company Accelerates Feature Testing with Support for Ad-Hoc Evaluations

Through its partnership with us, the customer was able to scale its evaluation capabilities much more quickly than it could with its internal resources, while simultaneously maintaining its quality standards. The client is able to test the results of changes to website features and act on the evaluation results within just a few …

Speeds Up Identifying Which Images Need Location Metadata With Shotzr

After their first job in the Appen platform, Shotzr identified over 17,000 images that did not require additional labeling. They anticipate over 61 million assets that they can remove from consideration for location data, freeing up their time to focus on…

Tier 1 Automotive Software Provider Creates Smarter In-Car Infotainment Systems

We provided services to collect natural language data and text data, covering all the scenarios and variations that the system might encounter in the real world. Working with in-market, on-demand crowds of native speakers, we are able to rapidly expand ASR capabilities in new locations and languages, for any given scenario. And because the company has strict…

MediaInterface Expands to France With Off-the-Shelf Datasets

Our ability to support French vocabulary with our pre-labeled datasets helped MediaInterface to develop language-specific parts of their product and therefore to expand to an entirely new market and highlight the possibilities for future markets. Now…

Improved Search Engine Listings with Local Content Evaluation

The use of our global, in-market crowd greatly reduced the potential data noise generated from time-zone differences, language barriers, and cultural and geographical considerations. Our proprietary quality and performance measurement systems enabled the delivery of the highest quality data back to its client, quickly and clearly. We have consistently…

Top Software Provider Develops Global CLDR through Trusted Partnership

An important component to ensure the success of this project was the need for strong, experienced project management to lead the months-long project. The complexity of the project and number of resources involved required a strong project plan and thoughtful leadership to move from…

Mobile Device Manufacturer Achieves Greater Accuracy for Map Software with In-market Raters

The success of this project relied heavily on the development of a customized testing plan that was tailored to meet the client’s needs. Timelines were tight, and since client representatives traveled to different locations to meet the testers, the schedule needed to be managed …

Leading Search Engine Provider Enhances Ad Relevance

As the project ramped globally, the search engine provider expanded search advertising into several new markets with the confidence of knowing that ads had been evaluated, sorted, and ranked appropriately by in-market experts, increasing the opportunity for revenue. This confidence was achieved due to the high-quality data that…

FlamingoAI Deploys a Fully Automated Virtual Assistant From Day One

FlamingoAI has found that the benefit of working with us goes beyond simple money or time savings. “It’s an all-out capability thing,” says Elliott. “You can’t be a machine learning firm that develops core IP and also sources key data the way Appen does. Ultimately, you have to specialize in …

Top Gaming Company Strengthens Customer Support Capabilities with AI

The client is a large US-based gaming company that has delivered games and online content to hundreds of millions of users around the world. Gaming customers often seek out customer support when they face technical or other issues with gameplay. For quicker triage and greater support capacity, the company uses artificial intelligence (AI)-powered chatbots to manage support requests. As is often the case with language, player requests can be confusing or vague, creating interpretability difficulties for the AI model. In 2019, they received a recommendation from one of our partners to connect with Appen…

Infobip Creates Conversational AI Chatbots Using High Quality Datasets

Some of Infobip’s clients use its help to build the best possible chatbots, and to meet customer demands, Infobip needs a ton of data. The best data for training this type of machine learning model is crowdsourced data that has global coverage and a wide variety of intents….

Maps Faster Than Ever: HERE Technologies Creates Fine Tune Maps

Mapping at the level of accuracy HERE Technologies strives for requires multiple different approaches and machine learning models. One of those methods leverages road signs. One of the stated goals of HERE is to create an understanding of every road sign on earth. That includes both what those signs…

GumGum Finds A Better Way to Annotate and Classify Text and Images

The Appen platform also allows GumGum team members with no prior coding experience or engineering background to set up a new annotation job, especially when the annotation job does become more complicated. Furthermore, GumGum can now create foreign language data annotation tasks for NLP-related projects. We have annotators who are native or fluent in those languages and can…

Get in touch with Sales

Join our team!

Join our Crowd

Have a question? We’d love to help.
