Appen provides quality assessments and creates customized linguistic evaluation processes for machine translation throughout the entire technology development life cycle.


The Natural Language Processing group in the Microsoft Research division developed a machine translation system that automatically learns translation mappings from bilingual corpora. Changes to the learning algorithms can drastically and unpredictably affect translation quality, making it difficult to measure and improve system performance. Microsoft Research needed linguistic experts to systematically evaluate the quality and performance throughout the development process, and help deliver a superior end product that supports more than 40 languages.


For the last 13 years, Appen has partnered with the Natural Language Processing group to support the linguistic needs of the machine translation system throughout the entire technology life cycle, from the feasibility stage of a promising new technology to the production rollout. As the system evolved, its quality evaluation and assessment needs changed, and Appen continued to adapt and optimize its solutions.

In the early research stages, when the machine translation was based on syntactic representations and grammatical rules, Appen linguists worked on-site with Microsoft developers, providing ongoing targeted feedback and helping to detect and analyze systemic errors. They evaluated a set of regression test sentences on a daily basis and logged bugs for the systemic error patterns found.

As the technology moved toward a statistically trained system, Appen implemented a remote system (via SharePoint) that allows a large team of testers to evaluate bi-weekly regression tests and aggregate the results across languages. This process makes it possible to identify detrimental changes in the translation algorithm and escalate them early to the development team.

As the system matured, Appen established, and continues to perform, regular benchmarking of the machine translation to maintain the high quality standard for all languages. The benchmarking process combines unbiased human judgments from at least four independent machine translation evaluators per language with a rigorous statistical analysis, and generates a quality measure with a 95% confidence interval.
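The benchmarking step can be illustrated with a minimal sketch. The case study does not say which statistical analysis is used, so this assumes that each evaluator rates a sample of translated sentences on a numeric scale, treats each evaluator's average rating as one observation, and reports a Student's t 95% confidence interval around the mean of those averages. The function `benchmark_language`, the `QualityScore` type, and the rating scale are hypothetical.

```python
import math
import statistics
from dataclasses import dataclass

# Two-sided 95% Student's t critical values, keyed by degrees of freedom.
T_CRIT_95 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
    11: 2.201, 12: 2.179, 13: 2.160, 14: 2.145, 15: 2.131,
    16: 2.120, 17: 2.110, 18: 2.101, 19: 2.093, 20: 2.086,
    21: 2.080, 22: 2.074, 23: 2.069, 24: 2.064, 25: 2.060,
    26: 2.056, 27: 2.052, 28: 2.048, 29: 2.045, 30: 2.042,
}


@dataclass
class QualityScore:
    """Hypothetical per-language quality measure with a 95% confidence interval."""
    mean: float
    lower: float
    upper: float
    n_evaluators: int


def benchmark_language(
    ratings_by_evaluator: dict[str, list[float]], min_evaluators: int = 4
) -> QualityScore:
    """Combine independent evaluators' ratings into one score with a 95% CI.

    Assumption: each evaluator's mean rating is one observation, and the
    interval is a t-interval over those per-evaluator means.
    """
    if len(ratings_by_evaluator) < min_evaluators:
        raise ValueError(f"need at least {min_evaluators} evaluators")
    if any(not ratings for ratings in ratings_by_evaluator.values()):
        raise ValueError("every evaluator must submit at least one rating")

    # Average each evaluator's ratings so no single evaluator dominates.
    evaluator_means = [statistics.fmean(r) for r in ratings_by_evaluator.values()]
    n = len(evaluator_means)
    mean = statistics.fmean(evaluator_means)

    # Standard error of the mean across evaluators (sample standard deviation).
    sem = statistics.stdev(evaluator_means) / math.sqrt(n)

    # Above 30 degrees of freedom, fall back to the normal value 1.96,
    # which gives a slightly narrower interval than the exact t value.
    t_crit = T_CRIT_95.get(n - 1, 1.96)
    half_width = t_crit * sem
    return QualityScore(mean, mean - half_width, mean + half_width, n)


if __name__ == "__main__":
    # Four hypothetical evaluators rating sentences on a 1-4 scale.
    score = benchmark_language({
        "eval_a": [3, 3],
        "eval_b": [4, 3],
        "eval_c": [2, 3],
        "eval_d": [3, 3, 3],
    })
    print(f"{score.mean:.2f} [{score.lower:.2f}, {score.upper:.2f}]")  # 3.00 [2.35, 3.65]
```

Averaging per evaluator before computing the interval keeps the minimum of four independent judgments meaningful: the interval width reflects how much evaluators disagree, not only how many sentences were rated.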

The machine translation technology is now applied in a variety of translation scenarios, including real-time, in-place translation of text and web pages through Bing Translator, and instant messaging across different languages.

Haiti earthquake: Haitian Creole translation service launched in five days

Communication is crucial for disaster relief efforts. When the 2010 Haiti earthquake struck, the world immediately responded to appeals for humanitarian aid. But not everyone spoke Haitian Creole.

Appen partnered with Microsoft to fast-track an online Haitian Creole machine translation system in support of international relief efforts. Appen sourced and evaluated a data corpus of human translations, while the Microsoft team built out the rest of the translation system. A process that typically takes weeks was completed in five days.

Languages covered:

  • Arabic
  • Bengali
  • Bulgarian
  • Catalan
  • Chinese Simplified
  • Chinese Traditional
  • Czech
  • Danish
  • Dutch
  • English US
  • Estonian
  • Farsi
  • Finnish
  • French
  • German
  • Greek
  • Haitian Creole
  • Hebrew
  • Hindi
  • Hmong
  • Hungarian
  • Icelandic
  • Indonesian
  • Italian
  • Japanese
  • Korean
  • Latvian
  • Lithuanian
  • Malay
  • Maltese
  • Norwegian
  • Polish
  • Portuguese Brazil
  • Portuguese Europe
  • Romanian
  • Russian
  • Slovak
  • Slovenian
  • Spanish Europe
  • Spanish Latin America
  • Swedish
  • Tamil
  • Thai
  • Turkish
  • Ukrainian
  • Urdu
  • Vietnamese


Appen’s broad experience enabled Microsoft to work with only one linguistic partner throughout all phases of development. Appen engaged with the Microsoft team through the early feasibility stage of the technology, the adoption of new approaches and integration of new algorithms, and the extension of language coverage.

Appen offers flexible staffing by retaining a pool of more than 150 machine translation evaluators worldwide, accommodating both ongoing and immediate requests. As a result, turnaround time is short, and schedule changes can be accommodated quickly.