End-to-end service that delivers controlled, consistent, high-quality annotated data at speed and scale.
Whether you are training a model to work for a very particular use case, or are unable to find sufficient open-source data to train your model, we can help provide the high-quality structured or unstructured data you need to kick off your AI project. Choose from our bespoke Data Collection services or browse our extensive catalog of pre-labeled datasets for a variety of common AI use cases.
Curate unique datasets for your specific use case with access to our 24/7 on-demand contributors or managed crowd options for data collection.
Accelerate AI projects with access to more than 250 pre-labeled datasets: off-the-shelf data specific to your needs.
Products and expertise that artificially generate hard-to-find data and edge cases to enhance model coverage and performance.
Access to a wide range of data types, collection methodologies, and delivery models, including images, videos, text, speech, and audio, to fulfill unique data collection scenarios.
High-quality data that meets your specific requirements.
Multi-platform data collection methods thanks to our mobile app for iOS and Android, as well as our platform.
Fast and scalable crowd recruitment from a global pool of over 1M people.
Peace of mind knowing all data is collected ethically and in line with regulatory requirements.
Annotate data faster and at scale with machine-learning-assisted annotation tools that provide a comprehensive data-labeling solution. Machine learning assistance is built into our industry-leading annotation tools to save customers time, effort, and money, delivering high-quality training data and accelerating the ROI on your AI initiatives.
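As a rough, hypothetical illustration of how machine learning assistance can reduce manual effort (this is not our platform's actual API), a pre-labeling step might route only low-confidence model predictions to human annotators:

```python
# Conceptual sketch only (not the platform's actual API): machine-learning-
# assisted annotation typically pre-labels each item with a model and routes
# only the low-confidence predictions to human annotators.
CONFIDENCE_THRESHOLD = 0.9  # assumed cut-off for auto-accepting model labels

def route_items(items, model):
    """Split items into auto-labeled and needs-human-review buckets."""
    auto_labeled, needs_review = [], []
    for item in items:
        label, confidence = model.predict(item)  # hypothetical model interface
        record = {"item": item, "label": label, "confidence": confidence}
        if confidence >= CONFIDENCE_THRESHOLD:
            auto_labeled.append(record)
        else:
            needs_review.append(record)
    return auto_labeled, needs_review
```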
Introducing dynamic elements to ensure performance more closely reflects real-world deployment environments.
Our new Voice Assistant Benchmark (VAB) initiative is a partnership with top global technology companies for ad hoc TTS voice benchmarking, mean opinion score (MOS), and MUSHRA ratings. It’s an opportunity to streamline, standardize, and iterate on the voice evaluation process, creating a true benchmark and highlighting optimum Voice Assistant standards across devices and brands.
Achieve optimal results when you work with our experts to design taxonomies and ontologies. Knowledge graphs offer more flexible and complex storage than their traditional counterparts, and end-users receive more expansive answers compared to the standard 1:1 coded answer for each question.
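As a minimal, purely illustrative sketch (the entities and relations below are invented), a knowledge graph can be thought of as a set of subject-predicate-object triples that an application traverses to compose an answer, rather than looking up a single 1:1 coded response:

```python
# Illustrative sketch only: facts stored as subject-predicate-object triples.
triples = [
    ("espresso", "is_a", "coffee_drink"),
    ("espresso", "contains", "caffeine"),
    ("caffeine", "is_a", "stimulant"),
]

def related(entity, depth=2):
    """Collect everything reachable from `entity` within `depth` hops."""
    frontier, seen = {entity}, set()
    for _ in range(depth):
        frontier = {o for s, _, o in triples if s in frontier} - seen
        seen |= frontier
    return seen

print(related("espresso"))  # {'coffee_drink', 'caffeine', 'stimulant'}
```

Because answers are assembled by walking relations, adding one new fact can enrich the responses to many different questions at once.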
Collect, classify, annotate, and/or transcribe images to train the most accurate and inclusive computer vision models. Our image annotation tooling includes polygons, dots, lines, rotated bounding boxes, ellipses, and pixel-level semantic segmentation. Additional object information can be collected on shapes using ontologies for faster, more flexible, and more accurate image annotation. This suite of tooling gives you everything you need to successfully annotate a wide variety of image types with speed and precision.
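For illustration only (this is not an actual export schema), an image annotation record typically pairs a shape in pixel coordinates with a class label and ontology attributes:

```python
# Illustrative sketch of an image annotation record; field names are assumptions.
image_annotation = {
    "image_id": "frame_0001.jpg",
    "objects": [
        {
            "shape": "bounding_box",
            "coordinates": {"x": 120, "y": 48, "width": 200, "height": 90},
            "class": "vehicle",
            "attributes": {"occluded": False, "color": "red"},  # from the ontology
        },
        {
            "shape": "polygon",
            "points": [(310, 40), (355, 42), (352, 120), (305, 118)],
            "class": "pedestrian",
            "attributes": {"occluded": True},
        },
    ],
}
```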
Collect, classify, transcribe, or annotate videos to help your models see and interpret the world around them. Our annotation tooling includes specialized transcription and timestamping tools, object tracking (with additional Speed Labeling capabilities), object detection and timestamping, as well as ontology attribute annotation. Our machine learning assistance and custom-built tools give you the flexibility to easily collect accurate video annotations at scale.
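As a simplified, hypothetical example of what object tracking output can look like, a track ties one object identity to a series of time-stamped boxes across frames:

```python
# Illustrative sketch only: one video "track" links the same object identity
# across frames, each observation carrying a frame index, timestamp, and box,
# so models can learn motion as well as appearance.
track = {
    "track_id": "car_17",
    "class": "vehicle",
    "observations": [
        {"frame": 0,  "time_s": 0.00, "box": (112, 60, 190, 88)},  # x, y, w, h
        {"frame": 15, "time_s": 0.50, "box": (128, 61, 190, 88)},
        {"frame": 30, "time_s": 1.00, "box": (145, 63, 191, 89)},
    ],
}
```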
Collect, classify, transcribe, or annotate audio data for your NLP projects. Our Audio Annotation tool is twice as fast as traditional annotation tools. Segment audio into layers, speakers, and timestamps for your Automatic Speech Recognition (ASR) and other audio models. Generate high-quality audio transcripts rapidly with acoustic tags in a variety of languages, leveraging NLP to improve transcription quality and efficiency. Our all-in-one audio tooling has been purpose-built to deliver clear and crisp audio annotations and transcriptions to train your models.
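A schematic example of segmented audio for ASR training: time-stamped spans tied to speaker labels, transcripts, and acoustic tags (the fields shown are illustrative, not a fixed format):

```python
# Illustrative sketch only: segmented audio as a list of time-stamped spans.
audio_segments = [
    {"start_s": 0.00, "end_s": 3.42, "speaker": "spk_1",
     "transcript": "hi, thanks for calling", "tags": []},
    {"start_s": 3.42, "end_s": 6.10, "speaker": "spk_2",
     "transcript": "hello, I have a question about my order",
     "tags": ["background_noise"]},
]
```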
Collect, classify, and annotate text to enhance your NLP model’s understanding of nuanced human speech. Speed Labeling capabilities include built-in multi-language tokenizers to assist human annotation efforts. Target entity extraction and span labeling, with the option to bring your own model outputs to accelerate contributor annotations. We can also help with text evaluation and post-editing of data generated by your NLP models. Our specialized tools and linguistic experts will deliver the high-quality training data you need to build your NLP model for any chosen market.
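As an illustrative sketch of span labeling (the example text and labels are invented), entities are usually recorded as character offsets into the source text so they survive any downstream tokenization:

```python
# Illustrative sketch only: entities as character offsets into the source text.
text = "Book a table at Luigi's in Sydney for Friday."
spans = [
    {"start": 16, "end": 23, "label": "RESTAURANT"},  # "Luigi's"
    {"start": 27, "end": 33, "label": "LOCATION"},    # "Sydney"
    {"start": 38, "end": 44, "label": "DATE"},        # "Friday"
]

for s in spans:
    print(s["label"], "->", text[s["start"]:s["end"]])
# RESTAURANT -> Luigi's
# LOCATION -> Sydney
# DATE -> Friday
```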
Annotate several types of point cloud data, including LiDAR, radar, and other scanner/sensor outputs, using our intuitive annotation interface. Point cloud frames (3D point clouds and RGB images) can easily be annotated with cuboids to support complex use cases such as autonomous vehicles. Built-in machine learning assistance enhances annotation speed and quality. Our purpose-built 3D sensor tooling enables leaders in technology to annotate complex data types accurately, at speed and scale.
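For illustration, a 3D cuboid label for a LiDAR frame is commonly described by a center point, box dimensions, and a heading angle in the sensor coordinate frame (the field names here are assumptions, not a specific schema):

```python
# Illustrative sketch only: a 3D cuboid annotation for a LiDAR point cloud.
cuboid = {
    "frame_id": "lidar_000042",
    "class": "car",
    "center_m": {"x": 12.4, "y": -3.1, "z": 0.8},    # meters, sensor frame
    "dimensions_m": {"length": 4.5, "width": 1.9, "height": 1.6},
    "yaw_rad": 1.57,                                 # rotation around the z axis
    "num_points": 684,                               # LiDAR returns inside the box
}
```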
Combining multiple datasets from vendors or annotation jobs can be challenging without the right tools. Our data annotation platform can annotate multiple types of data easily in one place, and our enterprise-level Workflows tooling makes combining and automating multi-step annotation jobs a breeze. With our most advanced machine-learning-powered data annotation platform, we can deliver high accuracy for your highly complex multi-modal AI projects.
Ensure products are working as desired before any big launch by enlisting the help of our contributors for hardware/device testing. Our global crowd of contributors is present in over 170 countries and ready to support your launch in any geography. With the help of our specialized Project Managers, we can devise a robust testing and evaluation plan to make sure all usage scenarios have been thoroughly tested and any improvement areas flagged, so your product launch will be a success.
High-quality raw mobile location data from 700+ million devices in 200+ countries allows you to perform location analytics and derive actionable business intelligence. Tap into the global data feed or request data customized to a specific region. Our location data is fully compliant with GDPR and CCPA. Every event we share is stamped with a unique QuadID and subject to intensive in-house quality control, so you can be confident it is authentic, increasing the reliability of your mobility analysis.
As a result of this project, the client released its product on time with the data it required to meet its users’ needs. The firm quickly and efficiently improved its machine learning model with access to a large amount of high-quality data. The geographic and demographic diversity of our rater pool proved immensely valuable to…
We successfully managed the collection and transcription of 105 hours of audio (totaling 60,000 utterances), which helped the client design, build, and deliver the ASR it needed to take to market. The company has since been able to take the acoustic models built into its new ASR and apply them to a range of North American English edutainment platforms and apps specifically designed for children. One of our key recommendations for this project was…
As an experienced solution provider for the automotive industry, we offer a full-service approach for localization, data collection, in-car testing and validation, and linguistic consulting. Our project managers, who have years of experience working in the automotive industry, work directly with the OEM’s engineering team to …
To achieve their goal of helping people cook at home, Erik Andrejko, Wellio’s CTO said that the company “is building a platform that embodies the intersection of culinary and health expertise.” Doing so required building machine learning algorithms that have the same capacity an expert does to make inferences and suggestions …
To create audio CMIs, GuildLink needed to find a partner to convert text to speech quickly and accurately. GuildLink chose to work with us thanks to our reputation for linguistic and technical expertise, and specifically our high-quality speech and language processing services. The text-to-speech conversion process takes place via…
To preserve the Larrakia language, linguist Dr. Mark Harvey has teamed up with the Larrakia Nation Aboriginal Corporation and Appen with the goal of improving the database of usable text and audio language samples of the Larrakia language. This database is a major step in preserving and reviving the Larrakia language, as the last fluent speaker died more than 20 years ago…
Using our platform as a central point of operations and participant feedback, the provider was able to collect clean, accurate data quickly for both quantitative and qualitative analysis. After a successful initial market research study for its consumer SaaS product, the software provider returned for an additional study to focus on optimizing the eCommerce funnel for its commercial clientele. With a global reach of over one million skilled contractors…
The client had strict quality thresholds that needed to be implemented and maintained throughout this project. The project also involved a subjective and complex task, and required minimal human QA. As a result, close collaboration was critical to the project’s success. Our team partnered with the client to develop task guidelines and quality management plans, and quickly…
During an initial trial project, we were a proactive and agile partner for Microsoft in the U.S. market. In addition to assembling and training a team of linguistic resources for the project within weeks and quickly surpassing the established quality bar, we also provided recommendations for improving the evaluation process. As a result of this successful first project, Microsoft expanded our involvement. We since have processed millions of…
CallMiner sought a partner who could help scale its annotation efforts. Not only did our annotation platform at Appen fit CallMiner’s needs, but our commitment to responsible AI practices also aligned with CallMiner’s values. Since the start of the partnership, CallMiner has been using our platform to annotate sentiment and emotion in call center data. Notably…
It took just a couple of weeks for the change to bear fruit for Dialpad and to create the transcription and NLP training data they needed to make their models a success. “When we changed to Appen, within a few weeks, we saw that labeler accuracy go up to 88% and it stayed in the high 80s and 90s for us ever since, even across a…
Researchers led by Kenneth Benoit at the Department of Methodology set out to study political science as it pertained to political texts—both in their content and in their sophistication. With the first project, their interest was in capturing the content of the messages that political actors send to others and further, using those discoveries to calculate political party positions. They found that relying on expert researchers to go through these messages was time-consuming, expensive, and nearly impossible to scale. The team was in need of a more agile, reproducible process for data labeling that would replace their current approach….
In searching for a solution that wasn’t over-engineered, but was cost-effective and flexible to their evolving needs, Zefr turned to Appen in 2018. With Appen’s crowdsourcing solution, Zefr suddenly had access to a large pool of people ready to…
Our years of experience in web search evaluation translate well to other domains, including social network search. The expertise demonstrated in this project includes continuous improvement of task guidelines, and the identification and …
Adobe needed highly accurate training data to create a model that could surface these subtle attributes in both their library of over a hundred million images and the hundreds of thousands of new images that are uploaded every day. They used our platform to facilitate the drawing of polygons over areas that could best be used for copy blocks (think large white spaces or tabletops). For example…
Our team of seasoned judges and auditors provided the foundation for this program and is the reason for its continued success. The team provides objective feedback across multiple vendors and manages multiple communication streams to provide a single voice back to the client. Guidelines are also…
Microsoft’s Bing search engine requires large-scale data sets to continuously deliver relevant search results – in all the global markets they serve. After an initial trial, Appen became an agile partner for Microsoft in multiple markets. Appen is able to provide the Bing team with the following…
Through its partnership with us, the customer was able to scale its evaluation capabilities much more quickly than it could with its internal resources, while simultaneously maintaining its quality standards. The client is able to test the results of changes to website features and act on the evaluation results within just a few …
After their first job on the Appen platform, Shotzr identified over 17,000 images that did not require additional labeling. They anticipate being able to remove over 61 million assets from consideration for location data, freeing up their time to focus on…
We provided services to collect natural language data and text data, covering all the scenarios and variations that the system might encounter in the real world. Working with in-market, on-demand crowds of native speakers, we are able to rapidly expand ASR capabilities in new locations and languages, for any given scenario. And because the company has strict…
Our ability to support French vocabulary with our pre-labeled datasets helped MediaInterface develop language-specific parts of their product, expand into an entirely new market, and highlight the possibilities for future markets. Now…
The use of our global, in-market crowd greatly reduced the potential data noise generated from time-zone differences, language barriers, and cultural and geographical considerations. Our proprietary quality and performance measurement systems enabled the delivery of the highest quality data back to its client, quickly and clearly. We have consistently…
An important component of this project’s success was strong, experienced project management to lead the months-long engagement. The complexity of the project and the number of resources involved required a strong project plan and thoughtful leadership to move from…
The success of this project relied heavily on the development of a customized testing plan that was tailored to meet the client’s needs. Timelines were tight, and since client representatives traveled to different locations to meet the testers, the schedule needed to be managed …
As the project ramped globally, the search engine provider expanded search advertising into several new markets with the confidence of knowing that ads had been evaluated, sorted, and ranked appropriately by in-market experts, increasing the opportunity for revenue. This confidence was achieved due to the high-quality data that…
FlamingoAI has found that the benefit of working with us goes beyond simple money or time savings. “It’s an all-out capability thing,” says Elliott. “You can’t be a machine learning firm that develops core IP and also sources key data the way Appen does. Ultimately, you have to specialize in …
A large, US-based gaming company that has delivered games and online content to hundreds of millions of users around the world. Gaming customers often seek out customer support when they face technical or other issues with game play. For quicker triage and greater support capacity, the company uses artificial intelligence (AI)-powered chatbots to manage support requests. As is often the case with language, player requests can be confusing or vague, creating interpretability difficulties for the AI model. In 2019, they received a recommendation from one of our partners to connect with Appen…
Some of Infobip’s clients rely on its help to build the best possible version of their chatbots, and to meet customer demands, Infobip needs a large amount of data. The best data for training this type of machine learning model is crowdsourced data with global coverage and a wide variety of intents….
Mapping at the level of accuracy HERE Technologies strives for requires multiple different approaches and machine learning models. One of those methods leverages road signs. One of the stated goals of HERE is to create an understanding of every road sign on earth. That includes both what those signs…
The Appen platform also allows GumGum team members with no prior coding experience or engineering background to set up a new annotation job, even when the annotation job becomes more complicated. Furthermore, GumGum can now create foreign-language data annotation tasks for NLP-related projects. We have annotators who are native or fluent in those languages and can…