Allen Institute for AI Delivers Enhanced Research Experience to Scholars

How AI2 used the Appen Platform for Semantic Scholar, an AI-powered academic search engine, to improve the quality of its novel citation intent feature with crowdsourcing

We were able to quickly go through different iterations of annotation tasks with our crowd workers and understand what was working and what wasn’t. The fact that the quality control features are already built into the Appen platform makes task setup even easier.

– Madeleine van Zuylen, Data Science Analyst, Allen Institute for AI

The Company

The Allen Institute for AI (AI2) is a nonprofit research institute founded in 2014 by the late Paul Allen. AI2’s team of researchers and engineers pursues high-impact AI projects for the common good. The mission of Semantic Scholar, AI2’s AI-powered academic search engine, is to accelerate scientific breakthroughs by using AI to help scholars locate and understand the right research, make important connections, and overcome information overload.

The Challenge

Semantic Scholar launched a citation intent feature to help researchers discover related academic papers using classifications of cited work. On a source paper’s page, the feature shows, for each subsequent article that cites the paper, whether the citation refers to background information, methods, or results. These classifications give users an understanding of why one research paper cites another and allow them to quickly discern whether a cited paper is relevant to their interests.

To achieve the accurate labeling necessary to launch this feature, AI2 required access to annotators on a large scale.

The Solution

Initially, the Semantic Scholar team at AI2 worked with us to add content to the Semantic Scholar corpus before moving into data extraction. Two of the use cases were citation intent labeling and abstract labeling, which support two of the key features that make Semantic Scholar the leading AI-powered platform for discovering academic research.

The Appen Platform was used to build a dataset of labeled sentences from research papers, which was then used to train a machine learning model to label sentences accurately. With our help, AI2 was able to quickly set up a task for annotators, launch it, and efficiently understand how annotators were performing at the task, allowing for rapid adjustments as needed. We also made it possible to easily select different types of crowds based on language or other required factors for added customization. The citation intent feature now covers over 10 million papers and classifies over 100 million citations.

The Result

We were thrilled to achieve our desired level of quality at the scale that we needed on the Appen platform to train our new citation intent classification model.

– Sebastian Kohlmeier, Sr. Manager – Business Operations, Allen Institute for AI

Thanks to our platform’s ease of use, AI2 can now run through jobs quickly and receive real-time feedback through aggregated reports, a significant time-saver. Our platform also delivered excellent accuracy: accuracy on the citation intent task exceeded 80% and increased over several task iterations.

Ultimately, these efforts have positively impacted the Semantic Scholar user experience by providing greater accessibility to quality academic research. Today, eight million scholars globally use the site each month, interacting with citation intent and other AI-powered features.

The Semantic Scholar team is now working with us to expand into future use cases. We are also proud to be partners with AI2 in supporting fair pay and pay transparency with crowd workers across the globe.