GumGum Finds A Better Way to Annotate and Classify Text and Images

GumGum selected Appen for its robust training data platform and its Machine Learning (ML)-Assisted Data Annotation

April 13, 2020

The Company

GumGum is an artificial intelligence (AI) company with a focus on computer vision (CV) and natural language processing (NLP). For the past 10 years, it has applied its patented capabilities to solving hard problems in a variety of industries, from professional sports to healthcare, but the company built its name with solutions for the digital advertising industry. It was for that industry that GumGum developed one of its most exciting proprietary offerings: webpage content analysis technology.

GumGum’s technology reviews webpages, identifying and classifying the content it finds in order to help advertisers place digital ads in relevant and brand-safe contexts. Rather than rely on behavioral targeting, which targets ads at users based on their personal online history, GumGum’s contextual targeting technology serves ads that are aligned with users’ interests without infringing on users’ data privacy. It also ensures that a brand’s ads do not appear adjacent to content that is offensive or harmful to brand reputation.

The Challenge

Erica Nishimura, data curator at GumGum, said,

“To provide accurate contextual intelligence for digital ad placements, our technology has to be able to look at images and text on webpages and identify what’s in them. For an image that means we first need to determine if it’s safe. We’ll look for things like hate symbols, violence, nudity, drugs, etc. If we see those things, we prevent ads from being placed. If we determine it’s safe, we’ll then identify whether it’s a person’s face, a specific celebrity’s face, a dog, or whatever may be relevant to the ad. There’s a more complex but similar process for analyzing text.”
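The two-stage image check Nishimura describes (a safety gate first, then content identification) can be sketched roughly as follows. This is a minimal illustration, not GumGum's implementation: the label names, the `review_image` function, the `ImageDecision` type, and the assumption that an upstream model has already produced a set of detected labels are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical unsafe categories, based on the examples given in the quote.
UNSAFE_LABELS = frozenset({"hate_symbol", "violence", "nudity", "drugs"})


@dataclass
class ImageDecision:
    allow_ads: bool
    context_labels: set = field(default_factory=set)  # labels usable for ad matching
    blocked_by: set = field(default_factory=set)      # unsafe labels that triggered a block


def review_image(detected_labels):
    """Stage 1: block ads if any unsafe label is present.
    Stage 2: otherwise keep the detected labels as ad context."""
    detected = set(detected_labels)
    unsafe = detected & UNSAFE_LABELS
    if unsafe:
        return ImageDecision(allow_ads=False, blocked_by=unsafe)
    return ImageDecision(allow_ads=True, context_labels=detected)
```

For example, `review_image(["dog", "violence"])` would block ad placement, while `review_image(["dog", "face"])` would allow it and pass both labels on as context.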

For GumGum’s algorithms to understand what they are seeing and reading, they must be fed large volumes of relevant annotated training data. Initially, GumGum worked with two full-time annotators who could, at best, annotate 15,000 rows of text data or 50,000 images per month.

GumGum’s CV and NLP scientists, who work on the company’s algorithms, needed a better way to perform text classification, image classification, and image annotation in order to efficiently create the high-quality structured data used to train the company’s advanced machine learning models.

The Solution

GumGum selected Appen for its robust training data platform. We offer GumGum data scientists solutions, such as Machine Learning (ML)-Assisted Data Annotation.

The Appen platform also allows GumGum team members with no prior coding experience or engineering background to set up a new annotation job, even when the annotation job becomes more complicated.

Furthermore, GumGum can now create foreign language data annotation tasks for NLP-related projects. Our annotators are native or fluent speakers of those languages and can work on the annotation. In the past, Appen has successfully completed annotation tasks in Spanish, French, German, and Japanese. Nishimura added that “GumGum is especially happy with the Japanese annotation quality and support, which Appen has improved tremendously over the past year.”

The Result

“Most data scientists find data labeling a time-consuming process and their wait for data to be labeled unenjoyable,” Nishimura said, so GumGum’s data scientists jumped at the chance to use the Appen platform and crowd. Depending on the task or language, GumGum is now able to annotate 10,000 rows of data in just a few days, and sometimes within a few hours. That is a fraction of the time previously required to annotate a similarly sized data set. This efficiency freed up the data scientists to work on research for GumGum’s NLP and CV technology instead of spending extra time and effort on in-house data annotation.

“Working with Appen has made our model development process 10 times faster, allowing us to get to the next step much quicker and think about audio and video at scale,” said Lane Schechter, Product Manager at GumGum.

“In addition to the importance of accurate data, a quick turnaround on large data sets is critical to improve and maintain quality of Machine Learning models,” Schechter noted, so the accuracy and throughput of Appen data were essential to the quality of GumGum’s Machine Learning models.

“The Appen platform is super neat and easy to navigate compared to most of its competitors. (…) Support has been super helpful. I get responses usually within minutes, if not, the following day” – Erica Nishimura, Data Curator, GumGum.

Not only can GumGum create high-quality datasets more efficiently, but it also has the flexibility to customize annotation jobs for specific use cases and leverage our expertise for guidance. GumGum has found a one-stop shop for high-quality ML training data creation, ensuring its employees can focus on growing the business and supporting its customers.

"What’s been super helpful is to tell my customer success manager what it is I want to achieve, and look to Appen to help me with the job design, creation, and coding."

– Erica Nishimura
Data Curator, GumGum