By Ben Christensen, Director, Content Relevance

“When should I use crowdsourcing and when should I use a curated crowd?” This is the question anyone interested in staffing human annotation tasks should be asking, but many don’t because they don’t even know there are two different options. So let’s start there, by defining the options.

Assuming you need human annotation, for example for search relevance evaluation, there are two ways you can gather the necessary humans to do that work:

1. Crowdsourcing, where the task is made available to a large crowd without any training or management beyond a very limited set of task instructions and possibly a simple screening test; or
2. Curated crowds, where a smaller group is selected to complete the task accurately according to quality guidelines.

The power of crowdsourcing is in its numbers. You can accomplish a lot quickly because many hands make light work. A hundred thousand people can do quite a bit more than a hundred can. The cost is less because crowdsourcing typically pays only a few pennies per task. Most members of the crowd aren’t trying to make a living; they’re just trying to make a few extra bucks in their spare time. There’s usually little overhead involved in crowdsourcing because the crowd looks after itself. You put the task out there, and if it’s interesting enough and pays enough, the crowd will get it done. […]
Appen is proud to announce that Dorota Iskra will be speaking at LT-Accelerate 2015 in Brussels, Belgium on Monday, November 23rd. Dorota’s presentation is entitled “Crowdsourcing in language data collection and annotation”.

Appen has been active in the area of data collection and annotation for over 15 years. Starting with a traditional approach of working locally, we have gradually moved to using web-based tools. This presentation focuses on the advantages of web-based tools, such as greater reach, lower cost, and the ability to build an extensive network, or crowd. On the surface, crowdsourcing appears to be an attractive alternative for data collection and annotation, but it poses a lot of challenges. To address these challenges, we have built our own crowd in a controlled and tested environment.

Dorota has extensive experience in speech and language technology where, after a period of research, she moved towards applications and language resources. She has worked in the telecom and software development industries and is currently responsible for the European business at Appen, a company collecting speech and language data and providing a wide range of linguistic services.