Speech and Language Datasets


Our off-the-shelf, licensable datasets for your natural language processing projects



Speech Recognition Datasets Catalog



We offer an extensive catalog of off-the-shelf, licensable datasets for natural language processing. 

Speech recognition in machine learning requires a robust, comprehensive sample of spoken language that accurately represents the dialects being transcribed or interpreted.

Based on your use case, you can purchase transcribed speech datasets, general and domain-specific pronunciation lexicons, POS-tagged lexicons and thesauri, or text corpora annotated for morphological information and named entities.



To receive a quote, select one or more datasets using the + Add to quote button next to each catalog entry, then click Request Quote.



Language | Product | Details
English (Australian) | Telephony | AUS_ASR001
English (Australian) | Telephony | AUS_ASR002
Bahasa Indonesia | Conversational Telephony | BAH_ASR001
French (Belgium) | Telephony | Belgian French SpeechDat(II) FDB-1000 (FIXED1BF)
Bengali | Conversational Telephony | BEN_ASR001
Bulgarian | Conversational Telephony | BUL_ASR001
Arabic (UAE/Saudi) | Microphone | CGA_ASR001
Croatian | Conversational Telephony | CRO_ASR001
Croatian | Microphone | CRO_ASR002
Czech | Microphone | CZE_ASR001
Czech | Scripted Telephony | Czech SpeechDat(E) Database
Dari | Conversational Telephony | DAR_ASR001
Dari | Broadcast Data | DAR_BRC001
German (Germany) | Microphone | DEU_ASR001
German (Germany) | Microphone | DEU_ASR003
Dutch (Netherlands & Belgium) | In-Car | Dutch and Flemish SpeechDat-Car
Arabic (Eastern Algerian) | Conversational Telephony | EAR_ASR001
English (Arabic - Levant/Egypt) | Conversational Telephony | ENA_ASR001
English (Canadian) | Telephony | ENC_ASR001
English (Filipino) | Conversational Telephony | ENF_ASR001
English (Indian) | Telephony | ENI_ASR001
English (Indian) | Conversational Telephony | ENI_ASR002
Spanish (Latin America) | Microphone | ESL_ASR001
Spanish (Latin America - Chilean and Colombian) | Conversational Telephony | ESL_ASR002
Spanish (Spain) | Microphone | ESP_ASR001
Spanish (Spain) | Microphone | ESP_TTS001
Farsi/Persian | Telephony | FAR_ASR001
Farsi/Persian | Conversational Telephony | FAR_ASR002
Dutch (Belgium) | Telephony | Flemish SpeechDat(II) FDB-1000 (FIXED1FL)
French (Canadian) | Telephony | FRC_ASR001
French (Canadian) | Microphone recordings | FRC_ASR002
French (Canadian) | Conversational Telephony | FRC_ASR003
French (France) | Telephony | French SpeechDat(II) FDB-1000
French (France) | Telephony | French SpeechDat(II) FDB-5000
French (France) | In-Car | French SpeechDat-Car
French (France) | Conversational Telephony | FRF_ASR001
French (France) | Microphone | FRF_ASR003
German (Germany) | Telephony | German SpeechDat (II) FDB-1000
German (Germany) | Telephony | German SpeechDat(II) FDB-4000
Hausa | Microphone | HAU_ASR001
Hausa | Conversational Telephony | HAU_ASR002
Hebrew | Conversational Telephony | HEB_ASR001
Hindi | Telephony | HIN_ASR001
Hindi | Conversational Telephony | HIN_ASR002
Hungarian | Scripted Telephony | Hungarian SpeechDat(E)
Italian | Microphone | ITA_ASR001
Italian | Microphone | ITA_ASR002
Italian | Conversational Telephony | ITA_ASR003
Italian | Microphone | ITA_TTS001
Italian | Telephony | Italian Fixed Network Speech SpeechDat(M) Corpus
Italian | Telephony | Italian SpeechDat(II) FDB-3000
Italian | Telephony | Italian SpeechDat(II) MDB-250
Japanese | Microphone | JPN_ASR001
Kannada | Conversational Telephony | KAN_ASR001
Korean | Microphone | KOR_ASR001
French (Luxembourg) | Telephony | Luxembourgish French SpeechDat(II) FDB-500 (FIXED1LF)
German (Luxembourg) | Telephony | Luxembourgish German SpeechDat(II) FDB-500 (FIXED1LG)
Mandarin | Telephony | MAC_ASR001
Mandarin | Microphone | MAC_ASR002
Marathi | Conversational Telephony | MAR_ASR001
Arabic (MSA) | Microphone | MSA_ASR001
Dutch (Netherlands) | Conversational Telephony | NLD_ASR001
English (Arabic - UAE) | Telephony | OrienTel English as spoken in the United Arab Emirates
German (Turkish) | Telephony | OrienTel German Spoken by Turkish
Turkish | Telephony | OrienTel Turkish Database
Arabic (United Arab Emirates) | Telephony | OrienTel United Arab Emirates MCA (Modern Colloquial Arabic)
Arabic (United Arab Emirates) | Telephony | OrienTel United Arab Emirates MSA (Modern Standard Arabic)
Pashto | Conversational Telephony | PAS_ASR001
Pashto | Conversational microphone data | PAS_ASR002
Pashto | Broadcast Data | PAS_BRC001
Polish | Microphone | POL_ASR001
Polish | Scripted Telephony | Polish SpeechDat(E) Database
Portuguese (Brazilian) | Microphone | PTB_ASR001
Portuguese (Brazilian) | Conversational Telephony | PTB_ASR002
Portuguese (Portugal) | Conversational Telephony | PTP_ASR001
Punjabi (Pakistan) | Conversational Telephony | PAP_ASR001
Romanian | Conversational Telephony | ROM_ASR001
Russian | Conversational Telephony | RUS_ASR001
Russian | Microphone | RUS_ASR002
Russian | Scripted Telephony | Russian SpeechDat(E) Database
Slovak | Scripted Telephony | Slovak SpeechDat(E) Database
Slovenian | Telephony | Slovenian SpeechDat(II) FDB-1000
Somali | Conversational Telephony | SOM_ASR001
Sorani (Kurdish) | Conversational Telephony | SOR_ASR001
Italian | Telephony | SpeechDat(M) Italian Mobile Network Speech Database
Danish | Microphone | Speecon Danish
Dutch (Belgium) | Microphone | Speecon Dutch from Belgium
Dutch (Netherlands) | Microphone | Speecon Dutch from the Netherlands
English (US) | Microphone | Speecon English (USA) database
German (Switzerland) | Microphone | Speecon German (Switzerland) database
Japanese | Microphone | Speecon Japanese
Russian | Microphone | Speecon Russian Database
Spanish (Spain) | Microphone | Speecon Spanish Database
Swedish | Microphone | SWE_ASR001
English (UK) | | TC-STAR female baseline voice Laura
English (UK) | | TC-STAR male baseline voice Ian
Thai | Microphone | THA_ASR001
Turkish | Conversational Telephony | TUR_ASR001
Turkish | Microphone | TUR_ASR002
English (UK) | Conversational Telephony | UKE_ASR001
Urdu | Conversational Telephony | URD_ASR001
English (US) | Studio/microphone recordings | USE_ASR001
English (US) | Conversational Telephony | USE_ASR003
Vietnamese | Microphone | VIE_ASR001
Kannada | Conversational Telephony | KAN_ASR001A
Marathi | Conversational Telephony | MAR_ASR001A
Bulgarian | Microphone | BUL_ASR002
Arabic (Moroccan) | Conversational Telephony | ARY_ASR001
Arabic (Moroccan) | Conversational Telephony | ARY_MT001
Amharic | Pronunciation Lexicon | 45,000 words
Arabic (Algerian) | Pronunciation Lexicon | 10,000 words
Arabic (Egyptian) | Pronunciation Lexicon | 40,000 words
Arabic (Gulf) | Pronunciation Lexicon | 75,000 words
Arabic (Iraqi) | Pronunciation Lexicon | 15,000 words
Arabic (Maghrebi) | Pronunciation Lexicon | 10,000 words
Arabic (MSA) | Pronunciation Lexicon | 15,000 words
Arabic (North Levantine) | Pronunciation Lexicon | 25,000 words
Arabic (Palestinian) | Pronunciation Lexicon | 25,000 words
Arabic (South Levantine) | Pronunciation Lexicon | 30,000 words
Arabic (Sudanese) | Pronunciation Lexicon | 10,000 words
Arabic (Syrian) | Pronunciation Lexicon | 25,000 words
Arabic (UAE/Saudi) | Pronunciation Lexicon | 75,000 words
Assamese | Pronunciation Lexicon | 40,000 words
Bahasa Indonesia | Pronunciation Lexicon | 95,000 words
Basque | Pronunciation Lexicon | 10,000 words
Bengali | Pronunciation Lexicon | 25,000 words
Bulgarian | Pronunciation Lexicon | 55,000 words
Catalan | Pronunciation Lexicon | 10,000 words
Cebuano | Pronunciation Lexicon | 20,000 words
Croatian | Pronunciation Lexicon | 15,000 words
Czech | Pronunciation Lexicon | 45,000 words
Danish | Pronunciation Lexicon | 105,000 words
Dari | Pronunciation Lexicon | 15,000 words
Dholuo | Pronunciation Lexicon | 20,000 words
Dutch | Pronunciation Lexicon | 45,000 words
English (Australian) | Pronunciation Lexicon | 155,000 words
English (Canadian) | Pronunciation Lexicon | 50,000 words
English (Filipino) | Pronunciation Lexicon | 5,000 words
English (Hong Kong) | Pronunciation Lexicon | 15,000 words
English (Indian) | Pronunciation Lexicon | 30,000 words
English (Ireland) | Pronunciation Lexicon | 10,000 words
English (New Zealand) | Pronunciation Lexicon | 50,000 words
English (UK) | Pronunciation Lexicon | 175,000 words
English (US) | Pronunciation Lexicon | 290,000 words
Farsi | Pronunciation Lexicon | 75,000 words
Finnish | Pronunciation Lexicon | 85,000 words
French (Belgian) | Pronunciation Lexicon | 5,000 words
French (Canadian) | Pronunciation Lexicon | 65,000 words
French (France) | Pronunciation Lexicon | 110,000 words
French (Luxembourg) | Pronunciation Lexicon | 5,000 words
French (Swiss) | Pronunciation Lexicon | 5,000 words
German (Austria) | Pronunciation Lexicon | 5,000 words
German (Germany) | Pronunciation Lexicon | 145,000 words
German (Swiss) | Pronunciation Lexicon | 5,000 words
Greek | Pronunciation Lexicon | 5,000 words
Guarani | Pronunciation Lexicon | 35,000 words
Haitian Creole | Pronunciation Lexicon | 15,000 words
Hausa | Pronunciation Lexicon | 10,000 words
Hebrew | Pronunciation Lexicon | 25,000 words
Hindi | Pronunciation Lexicon | 35,000 words
Hungarian | Pronunciation Lexicon | 500 words
Igbo | Pronunciation Lexicon | 30,000 words
Italian | Pronunciation Lexicon | 190,000 words
Japanese | Pronunciation Lexicon | 260,000 words
Javanese | Pronunciation Lexicon | 20,000 words
Kannada | Pronunciation Lexicon | 35,000 words
Kazakh | Pronunciation Lexicon | 30,000 words
Korean | Pronunciation Lexicon | 105,000 words
Kurmanji | Pronunciation Lexicon | 60,000 words
Lao | Pronunciation Lexicon | 15,000 words
Lithuanian | Pronunciation Lexicon | 60,000 words
Malayalam | Pronunciation Lexicon | 4,000 words
Malaysian | Pronunciation Lexicon | 30,000 words
Marathi | Pronunciation Lexicon | 30,000 words
Mongolian | Pronunciation Lexicon | 30,000 words
Norwegian | Pronunciation Lexicon | 115,000 words
Oriya | Pronunciation Lexicon | 15,000 words
Pashto | Pronunciation Lexicon | 65,000 words
Polish | Pronunciation Lexicon | 40,000 words
Portuguese (Brazilian) | Pronunciation Lexicon | 100,000 words
Portuguese (Portugal) | Pronunciation Lexicon | 110,000 words
Romanian | Pronunciation Lexicon | 15,000 words
Russian | Pronunciation Lexicon | 115,000 words
Serbian | Pronunciation Lexicon | 15,000 words
Somali | Pronunciation Lexicon | 20,000 words
Sorani (Kurdish) | Pronunciation Lexicon | 25,000 words
Spanish (Castilian) | Pronunciation Lexicon | 90,000 words
Spanish (Latin American) | Pronunciation Lexicon | 10,000 words
Spanish (US) | Pronunciation Lexicon | 90,000 words
Swahili (Kenya) | Pronunciation Lexicon | 65,000 words
Swedish | Pronunciation Lexicon | 105,000 words
Sylheti | Pronunciation Lexicon | 20,000 words
Tagalog | Pronunciation Lexicon | 30,000 words
Tamil | Pronunciation Lexicon | 105,000 words
Telugu | Pronunciation Lexicon | 50,000 words
Thai | Pronunciation Lexicon | 30,000 words
Tok Pisin | Pronunciation Lexicon | 5,000 words
Turkish | Pronunciation Lexicon | 255,000 words
Ukrainian | Pronunciation Lexicon | 5,000 words
Urdu | Pronunciation Lexicon | 20,000 words
Vietnamese | Pronunciation Lexicon | 8,000 words
Wu | Pronunciation Lexicon | 10,000 words
Xiang | Pronunciation Lexicon | 10,000 words
Zulu | Pronunciation Lexicon | 75,000 words
Bahasa Indonesia | Part of Speech Lexicon | 10,000 words
Cantonese (Yue) | Part of Speech Lexicon | 10,000 words
Danish | Part of Speech Lexicon | 100,000 words
English (Canadian) | Part of Speech Lexicon | 3,000 words
English (India) | Part of Speech Lexicon | 10,000 words
English (UK) | Part of Speech Lexicon | 155,000 words
English (US) | Part of Speech Lexicon | 260,000 words
Farsi | Part of Speech Lexicon | 1,400,000 words
Finnish | Part of Speech Lexicon | 10,000 words
Italian | Part of Speech Lexicon | 140,000 words
Japanese | Part of Speech Lexicon | 265,000 words
Korean | Part of Speech Lexicon | 100,000 words
Norwegian | Part of Speech Lexicon | 3,000 words
Polish | Part of Speech Lexicon | 4,000 words
Portuguese (Brazilian) | Part of Speech Lexicon | 95,000 words
Portuguese (Portugal) | Part of Speech Lexicon | 100,000 words
Russian | Part of Speech Lexicon | 100,000 words
Swedish | Part of Speech Lexicon | 105,000 words
Turkish | Part of Speech Lexicon | 255,000 words
English (US) | Part of Speech Lexicon | 260,000 words
Urdu | Part of Speech Lexicon | 25,000 words
Arabic (MSA) | NER Corpus |
Arabic (MSA) | Thesaurus |
Arabic (MSA) | Vowelised Text Corpus |
English (US) | NER Corpus |
Farsi | NER Corpus |
Farsi | Morphological Analyser |
Japanese | NER Corpus |
Korean | NER Corpus |
Mandarin | NER Corpus |
Russian | NER Corpus |
Urdu | NER Corpus |
Urdu | Morphological Analyser |
 




Use Cases


Whether you are working on a text-to-speech system, a voice recognition system or another solution that relies on natural language, high-quality licensed speech and language datasets allow you to go to market faster and reach more potential customers.