Data Annotation

Building a solution that thinks and acts like a human requires a continuous supply of large volumes of training data. For a model to understand this information, the data must be properly categorized and annotated for a specific use case. With high-quality, human-annotated data, companies can build and improve AI applications. The result is an enhanced customer experience for related solutions such as product recommendations, relevant search engine results, computer vision, speech recognition, chatbots, and more.

As an industry leader, Appen has the expertise and resources to help you quickly meet your data annotation needs. Between our AI-assisted data annotation platform and team of experts, we have worked with a wide variety of data types, including text, audio, speech, image, and video, in over 180 languages and dialects.


Text Annotation

Sentiment Annotation

  • To assess attitudes, emotions, and opinions online, it’s important to have the right training data for sentiment analysis. Appen annotators can evaluate sentiment and moderate content on all web platforms, including social media and eCommerce sites, with the ability to tag and report on keywords that are profane, sensitive, neologistic, or misspelled.

Intent Annotation

  • As people converse more with human-machine interfaces, machines must be able to understand both natural language and user intent. Appen helps clients train applications and machine learning models via multi-intent data collection and categorization — differentiating intent into key categories including request, command, booking, recommendation, and confirmation.

Query Annotation

  • To help our clients build search engines and products with more relevant search results for customers, Appen annotates a range of queries to evaluate whether they are transactional, informational, or navigational. We also offer a subset service called Address Annotation — identifying which elements of an address query correspond to the street name, city, state, and country.

Co-Reference Annotation

  • Captions on the search engine results page (SERP) offer users key clues on which links are most relevant to their searches. Typically, the first interaction with a web result will be through the caption, followed by how well the result matches the caption on the SERP. In fact, almost 35% of search traffic does not result in a click, meaning that users often find what they are looking for directly from a caption. Our evaluation optimizes captions for both the query and the result, offering insight into areas of improvement for better search relevance.

Semantic Annotation

  • Semantic annotation both improves product listings and ensures customers can find the products they’re looking for. This helps turn browsers into buyers. By tagging the various components within product titles and search queries, our semantic annotation services help train your algorithm to recognize those individual parts and improve overall search relevance. As your inventory changes over time, we can help ensure the accuracy and relevance of your site search results.

Named Entity Annotation

  • NER systems require a large amount of manually annotated training data. We provide named entity annotation capabilities across a wide range of use cases, such as helping eCommerce clients identify and tag a range of key descriptors, or aiding social media companies in tagging entities such as people, places, companies, organizations, and titles to assist with better targeted advertising content.
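As an illustration of what manually annotated NER training data can look like, the sketch below converts token-level entity spans into BIO tags, a common encoding for sequence-labeling models. The `EntitySpan` type, the `spans_to_bio` helper, and the `PERSON` and `PLACE` labels are hypothetical names chosen for this example; they do not describe Appen's platform or export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySpan:
    """One human-annotated entity (hypothetical schema for illustration)."""
    start: int  # index of the first token in the entity
    end: int    # index one past the last token in the entity
    label: str  # illustrative label such as "PERSON" or "PLACE"


def spans_to_bio(tokens, spans):
    """Convert token-level entity spans into one BIO tag per token."""
    tags = ["O"] * len(tokens)  # "O" marks tokens outside any entity
    for span in spans:
        if not (0 <= span.start < span.end <= len(tokens)):
            raise ValueError(f"span out of range: {span}")
        tags[span.start] = f"B-{span.label}"  # first token of the entity
        for i in range(span.start + 1, span.end):
            tags[i] = f"I-{span.label}"       # remaining tokens of the entity
    return tags


tokens = ["Ada", "Lovelace", "visited", "London", "."]
spans = [EntitySpan(0, 2, "PERSON"), EntitySpan(3, 4, "PLACE")]
bio_tags = spans_to_bio(tokens, spans)
print(list(zip(tokens, bio_tags)))
```

Storing spans rather than tags keeps the annotator's original decision intact, and the tag sequence can be regenerated whenever the tokenization or label set changes.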

Other Linguistic Annotation Services

  • Appen offers a full range of other common linguistic annotation services including semantic roles and relations, part-of-speech markup, and syntactic and dependency tree-banking annotation.


Audio Annotation

Advance the success of your technology platforms—or machine learning programs—with leading audio annotation services

With over 20 years of experience collecting and processing speech data, audio annotation is one of Appen’s core service offerings. We provide audio annotation services, such as the transcription and time-stamping of speech data, including the transcription of specific pronunciation and intonation, along with the identification of language, dialect, and speaker demographics. Every use case is different, and some require a very specific approach: for example, the tagging of aggressive speech indicators and non-speech sounds like glass breaking for use in security and emergency hotline technology applications.


Image Annotation

Improve your machine learning solutions with high-quality, human-annotated image data for greater precision and accuracy

Image annotation is vital for a wide range of applications, including computer vision, robotic vision, facial recognition, and solutions that rely on machine learning to interpret images. To train these solutions, metadata must be assigned to the images in the form of identifiers, captions, or keywords.

From computer vision systems used by self-driving vehicles and machines that pick and sort produce, to healthcare applications that auto-identify medical conditions, there are many use cases that require high volumes of annotated images. Image annotation increases precision and accuracy by effectively training these systems.

Appen provides a range of image annotation services. Our AI-assisted image annotation platform lets our team easily categorize or classify objects within an image by assigning it one or more pieces of metadata, from simple categorization to complex image analysis. We can handle multi-phase annotation by building logic and dependency trees into multiple (unlimited) rounds of annotation, reviewing images in more detail, and building out specific metadata about each object. We can also review for offensive content in images, annotate images to improve search functionality within a client’s image recognition software, and categorize images by quality to improve search content over time.

Our data annotation platform allows for a variety of image tagging methods, including bounding box, line and multi-point line, point, polygon and free-form drawing, and categorization.
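To make two of these tagging methods concrete, here is a minimal sketch of how a bounding box and a polygon annotation might be represented as data, with an area calculation for each. The class names, field names, and pixel-coordinate convention are assumptions made for this example, not the platform's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in pixel coordinates (hypothetical schema)."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self):
        # Clamp at zero so a degenerate box never reports negative area.
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


@dataclass(frozen=True)
class Polygon:
    """Closed polygon given as an ordered tuple of (x, y) vertices."""
    label: str
    points: tuple

    def area(self):
        # Shoelace formula over consecutive vertex pairs.
        total = 0.0
        n = len(self.points)
        for i in range(n):
            x1, y1 = self.points[i]
            x2, y2 = self.points[(i + 1) % n]
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0


box = BoundingBox("apple", 10, 20, 50, 80)
triangle = Polygon("leaf", ((0, 0), (4, 0), (0, 3)))
print(box.area(), triangle.area())
```

A box is quick to draw and cheap to store, while a polygon follows an object's outline more closely at the cost of more annotator effort per object.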



Video Annotation

Appen’s AI-assisted video annotation platform allows for tagging gestures and facial expressions, frame-by-frame annotation, object tracking, content moderation, and more. For frame annotation, a data annotator uses image annotation features (bounding box, cuboids, points, lines, and multi-segment lines) to mark up video frames. Our object tracking service “tracks” previously defined objects between frames, aiding annotators by pre-populating known object markups in subsequent frames.
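One simple way to pre-populate markups between two annotated frames is linear interpolation of box coordinates. The sketch below shows that idea only; it is an assumption made for illustration, not a description of how Appen's object tracking service works, and the `interpolate_boxes` function and its box format `(x_min, y_min, x_max, y_max)` are hypothetical.

```python
def interpolate_boxes(start_frame, start_box, end_frame, end_box):
    """Propose boxes for the frames strictly between two annotated keyframes.

    Boxes are (x_min, y_min, x_max, y_max) tuples. Returns a dict that maps
    each intermediate frame number to a linearly interpolated box.
    """
    if end_frame <= start_frame:
        raise ValueError("end_frame must be greater than start_frame")
    span = end_frame - start_frame
    proposals = {}
    for frame in range(start_frame + 1, end_frame):
        t = (frame - start_frame) / span  # fraction of the way to end_frame
        proposals[frame] = tuple(
            a + t * (b - a) for a, b in zip(start_box, end_box)
        )
    return proposals


# An annotator labels frames 0 and 4; frames 1 to 3 are pre-populated.
proposals = interpolate_boxes(0, (0.0, 0.0, 10.0, 10.0), 4, (8.0, 4.0, 18.0, 14.0))
for frame, box in sorted(proposals.items()):
    print(frame, box)
```

The proposed boxes are starting points: an annotator would still review each frame and adjust any box that has drifted from the object.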

We help our clients evaluate video for offensive content and classify videos into different categories like politics, religion, news, or business. Additionally, we can annotate videos in qualitative terms to improve search content functionality and the user experience over time. Finally, we can classify topics, and help clients understand what type of advertising should accompany a particular video while identifying where ad breaks should best occur.


Speech Recognition Datasets

Whether you are working on a text-to-speech system, a voice recognition system or another solution that relies on natural language, high-quality licensed speech and language datasets allow you to go to market faster and reach more potential customers.

Talk to us today to find out more about these high-quality licensable datasets, which cover:

  • Fully transcribed speech recognition databases for broadcast, call center, in-car, and telephony applications
  • Pronunciation lexicons, both general and domain specific (e.g. names, places, and natural numbers)
  • POS-tagged lexicons and thesauri
  • Speech corpora annotated for POS, morphological information, and named entities


About Appen

Appen collects and labels images, text, speech, audio, video, and other data used to build and continuously improve the world’s most innovative artificial intelligence systems. Our expertise includes a global crowd of over 1 million skilled contractors who speak over 180 languages, and the industry’s most advanced AI-assisted data annotation platform. Our high-quality training data gives leaders in technology, automotive, financial services, retail, healthcare, and governments the confidence to deploy world-class AI products. Founded in 1996, Appen has customers and offices globally.

Contact us today