Handwriting Recognition

Transcriptions of 400,000 handwritten names for Optical Character Recognition (OCR).

Off-the-shelf machine learning dataset repository from Appen. Find 250+ datasets across 80 languages and dialects for a variety of common AI and ML use cases.

Overview

This dataset consists of more than four hundred thousand handwritten names collected through charity projects to support disadvantaged children around the world.

Optical Character Recognition (OCR) uses image processing technologies to convert characters on scanned documents into digital form. It typically performs well on machine-printed fonts. However, recognizing handwritten characters remains a difficult challenge for machines because of the huge variation in individual writing styles.

There are 206,799 first names and 207,024 surnames in total (413,823 images). The data was divided into a training set (331,059), a testing set (41,382), and a validation set (41,382).
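As a quick sanity check on the figures above, the following sketch confirms that the three splits add up to the total number of names and shows the share of each split. The variable names are illustrative and not part of the dataset.

```python
# Counts quoted in the dataset description.
first_names = 206_799
surnames = 207_024
splits = {"training": 331_059, "testing": 41_382, "validation": 41_382}

total = first_names + surnames

# The three splits should account for every image exactly once.
assert sum(splits.values()) == total

for name, count in splits.items():
    print(f"{name}: {count} images ({count / total:.1%})")
```

The shares come out to roughly 80% training, 10% testing, and 10% validation.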

Labels for all images, created via human-in-the-loop annotation on the Appen platform, are also provided, enabling you to extend the dataset with your own data.


The input data in this job consists of hundreds of thousands of images of handwritten names. In the “Data” tab above, you’ll find the transcribed images split into test, training, and validation sets.

Imageurl     
D2M150010079F00021firstname.jpg
D2M150010079F00021surname.jpg
D2M150010079F00032surname.jpg
D2M150010079F00043firstname.jpg
D2M150010079F00043surname.jpg
D2M150010079F00054firstname.jpg
D2M150010079F00065firstname.jpg
D2M150010079F00065surname.jpg
D2M150010079F00076firstname.jpg
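The sample file names above end in either "firstname" or "surname" before the extension. A minimal sketch, assuming that this suffix convention holds across the whole dataset (the listing shows only nine examples), could separate the two field types as follows. The helper name field_type is hypothetical.

```python
from collections import Counter


def field_type(filename: str) -> str:
    """Guess the field type from the file name suffix (assumed convention)."""
    stem = filename.rsplit(".", 1)[0].lower()
    for suffix in ("firstname", "surname"):
        if stem.endswith(suffix):
            return suffix
    return "unknown"


# File names copied from the sample listing.
sample = [
    "D2M150010079F00021firstname.jpg",
    "D2M150010079F00021surname.jpg",
    "D2M150010079F00032surname.jpg",
    "D2M150010079F00043firstname.jpg",
    "D2M150010079F00043surname.jpg",
    "D2M150010079F00054firstname.jpg",
    "D2M150010079F00065firstname.jpg",
    "D2M150010079F00065surname.jpg",
    "D2M150010079F00076firstname.jpg",
]

counts = Counter(field_type(name) for name in sample)
print(counts)
```

Splitting by field type this way may be useful if first names and surnames are to be evaluated separately.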