Speech and Language Datasets


Our off-the-shelf, licensable datasets for your natural language processing projects




Product Catalog



We offer an extensive catalog of off-the-shelf, licensable datasets for natural language processing. We cover low-resource languages, including dialects from West and North Asia, the Middle East and Africa.

Depending on your use case, you can purchase transcribed speech datasets, general and domain-specific pronunciation lexicons, POS-tagged lexicons and thesauri, or text corpora annotated for morphological information and named entities.



To receive a quote, select one or more languages by clicking "+ Add to quote", then click "Request Quote".



Language | Products | Details
English (Australian) | Telephony | AUS_ASR001
English (Australian) | Telephony | AUS_ASR002
Bahasa Indonesia | Conversational Telephony | BAH_ASR001
French (Belgium) | Telephony | Belgian French SpeechDat(II) FDB-1000 (FIXED1BF)
Bengali | Conversational Telephony | BEN_ASR001
Bulgarian | Conversational Telephony | BUL_ASR001
Arabic (UAE/Saudi) | Microphone | CGA_ASR001
Croatian | Conversational Telephony | CRO_ASR001
Croatian | Microphone | CRO_ASR002
Czech | Microphone | CZE_ASR001
Czech | Scripted Telephony | Czech SpeechDat(E) Database
Dari | Conversational Telephony | DAR_ASR001
Dari | Broadcast Data | DAR_BRC001
German (Germany) | Microphone | DEU_ASR001
German (Germany) | Microphone | DEU_ASR003
Dutch (Netherlands & Belgium) | In-Car | Dutch and Flemish SpeechDat-Car
Arabic (Eastern Algerian) | Conversational Telephony | EAR_ASR001
English (Arabic - Levant/Egypt) | Conversational Telephony | ENA_ASR001
English (Canadian) | Telephony | ENC_ASR001
English (Filipino) | Conversational Telephony | ENF_ASR001
English (Indian) | Telephony | ENI_ASR001
English (Indian) | Conversational Telephony | ENI_ASR002
Spanish (Latin America) | Microphone | ESL_ASR001
Spanish (Latin America - Chilean and Colombian) | Conversational Telephony | ESL_ASR002
Spanish (Spain) | Microphone | ESP_ASR001
Spanish (Spain) | Microphone | ESP_TTS001
Farsi/Persian | Telephony | FAR_ASR001
Farsi/Persian | Conversational Telephony | FAR_ASR002
Dutch (Belgium) | Telephony | Flemish SpeechDat(II) FDB-1000 (FIXED1FL)
French (Canadian) | Telephony | FRC_ASR001
French (Canadian) | Microphone recordings | FRC_ASR002
French (Canadian) | Conversational Telephony | FRC_ASR003
French (France) | Telephony | French SpeechDat(II) FDB-1000
French (France) | Telephony | French SpeechDat(II) FDB-5000
French (France) | In-Car | French SpeechDat-Car
French (France) | Conversational Telephony | FRF_ASR001
French (France) | Microphone | FRF_ASR003
German (Germany) | Telephony | German SpeechDat (II) FDB-1000
German (Germany) | Telephony | German SpeechDat(II) FDB-4000
Hausa | Microphone | HAU_ASR001
Hausa | Conversational telephony | HAU_ASR002
Hebrew | Conversational Telephony | HEB_ASR001
Hindi | Telephony | HIN_ASR001
Hindi | Conversational Telephony | HIN_ASR002
Hungarian | Scripted Telephony | Hungarian SpeechDat(E)
Italian | Microphone | ITA_ASR001
Italian | Microphone | ITA_ASR002
Italian | Conversational Telephony | ITA_ASR003
Italian | Microphone | ITA_TTS001
Italian | Telephony | Italian Fixed Network Speech SpeechDat(M) Corpus
Italian | Telephony | Italian SpeechDat(II) FDB-3000
Italian | Telephony | Italian SpeechDat(II) MDB-250
Japanese | Microphone | JPN_ASR001
Kannada | Conversational Telephony | KAN_ASR001
Korean | Microphone | KOR_ASR001
French (Luxembourg) | Telephony | Luxembourgish French SpeechDat(II) FDB-500 (FIXED1LF)
German (Luxembourg) | Telephony | Luxembourgish German SpeechDat(II) FDB-500 (FIXED1LG)
Mandarin | Telephony | MAC_ASR001
Mandarin | Microphone | MAC_ASR002
Marathi | Conversational Telephony | MAR_ASR001
Arabic (MSA) | Microphone | MSA_ASR001
Dutch (Netherlands) | Conversational Telephony | NLD_ASR001
English (Arabic - UAE) | Telephony | OrienTel English as spoken in the United Arab Emirates
German (Turkish) | Telephony | OrienTel German Spoken by Turkish
Turkish | Telephony | OrienTel Turkish Database
Arabic (United Arab Emirates) | Telephony | OrienTel United Arab Emirates MCA (Modern Colloquial Arabic)
Arabic (United Arab Emirates) | Telephony | OrienTel United Arab Emirates MSA (Modern Standard Arabic)
Pashto | Conversational Telephony | PAS_ASR001
Pashto | Conversational microphone data | PAS_ASR002
Pashto | Broadcast Data | PAS_BRC001
Polish | Microphone | POL_ASR001
Polish | Scripted Telephony | Polish SpeechDat(E) Database
Portuguese (Brazilian) | Microphone | PTB_ASR001
Portuguese (Brazilian) | Conversational Telephony | PTB_ASR002
Portuguese (Portugal) | Conversational Telephony | PTP_ASR001
Punjabi (Pakistan) | Conversational Telephony | PAP_ASR001
Romanian | Conversational Telephony | ROM_ASR001
Russian | Conversational Telephony | RUS_ASR001
Russian | Microphone | RUS_ASR002
Russian | Scripted Telephony | Russian SpeechDat(E) Database
Slovak | Scripted Telephony | Slovak SpeechDat(E) Database
Slovenian | Telephony | Slovenian SpeechDat(II) FDB-1000
Somali | Conversational Telephony | SOM_ASR001
Sorani (Kurdish) | Conversational Telephony | SOR_ASR001
Italian | Telephony | SpeechDat(M) Italian Mobile Network Speech Database
Danish | Microphone | Speecon Danish
Dutch (Belgium) | Microphone | Speecon Dutch from Belgium
Dutch (Netherlands) | Microphone | Speecon Dutch from the Netherlands
English (US) | Microphone | Speecon English (USA) database
German (Switzerland) | Microphone | Speecon German (Switzerland) database
Japanese | Microphone | Speecon Japanese
Russian | Microphone | Speecon Russian Database
Spanish (Spain) | Microphone | Speecon Spanish Database
Swedish | Microphone | SWE_ASR001
English (UK) |  | TC-STAR female baseline voice Laura
English (UK) |  | TC-STAR male baseline voice Ian
Thai | Microphone | THA_ASR001
Turkish | Conversational Telephony | TUR_ASR001
Turkish | Microphone | TUR_ASR002
English (UK) | Conversational Telephony | UKE_ASR001
Urdu | Conversational Telephony | URD_ASR001
English (US) | Studio/microphone recordings | USE_ASR001
English (US) | Conversational Telephony | USE_ASR003
Vietnamese | Microphone | VIE_ASR001
Kannada | Conversational Telephony | KAN_ASR001A
Marathi | Conversational Telephony | MAR_ASR001A
Bulgarian | Microphone | BUL_ASR002
Arabic (Moroccan) | Conversational Telephony | ARY_ASR001
Arabic (Moroccan) | Conversational Telephony | ARY_MT001
Amharic | Pronunciation Lexicon | 45,000 words
Arabic (Algerian) | Pronunciation Lexicon | 10,000 words
Arabic (Egyptian) | Pronunciation Lexicon | 40,000 words
Arabic (Gulf) | Pronunciation Lexicon | 75,000 words
Arabic (Iraqi) | Pronunciation Lexicon | 15,000 words
Arabic (Maghrebi) | Pronunciation Lexicon | 10,000 words
Arabic (MSA) | Pronunciation Lexicon | 15,000 words
Arabic (North Levantine) | Pronunciation Lexicon | 25,000 words
Arabic (Palestinian) | Pronunciation Lexicon | 25,000 words
Arabic (South Levantine) | Pronunciation Lexicon | 30,000 words
Arabic (Sudanese) | Pronunciation Lexicon | 10,000 words
Arabic (Syrian) | Pronunciation Lexicon | 25,000 words
Arabic (UAE/Saudi) | Pronunciation Lexicon | 75,000 words
Assamese | Pronunciation Lexicon | 40,000 words
Bahasa Indonesia | Pronunciation Lexicon | 95,000 words
Basque | Pronunciation Lexicon | 10,000 words
Bengali | Pronunciation Lexicon | 25,000 words
Bulgarian | Pronunciation Lexicon | 55,000 words
Catalan | Pronunciation Lexicon | 10,000 words
Cebuano | Pronunciation Lexicon | 20,000 words
Croatian | Pronunciation Lexicon | 15,000 words
Czech | Pronunciation Lexicon | 45,000 words
Danish | Pronunciation Lexicon | 105,000 words
Dari | Pronunciation Lexicon | 15,000 words
Dholuo | Pronunciation Lexicon | 20,000 words
Dutch | Pronunciation Lexicon | 45,000 words
English (Australian) | Pronunciation Lexicon | 155,000 words
English (Canadian) | Pronunciation Lexicon | 50,000 words
English (Filipino) | Pronunciation Lexicon | 5,000 words
English (Hong Kong) | Pronunciation Lexicon | 15,000 words
English (Indian) | Pronunciation Lexicon | 30,000 words
English (Ireland) | Pronunciation Lexicon | 10,000 words
English (New Zealand) | Pronunciation Lexicon | 50,000 words
English (UK) | Pronunciation Lexicon | 175,000 words
English (US) | Pronunciation Lexicon | 290,000 words
Farsi | Pronunciation Lexicon | 75,000 words
Finnish | Pronunciation Lexicon | 85,000 words
French (Belgian) | Pronunciation Lexicon | 5,000 words
French (Canadian) | Pronunciation Lexicon | 65,000 words
French (France) | Pronunciation Lexicon | 110,000 words
French (Luxembourg) | Pronunciation Lexicon | 5,000 words
French (Swiss) | Pronunciation Lexicon | 5,000 words
German (Austria) | Pronunciation Lexicon | 5,000 words
German (Germany) | Pronunciation Lexicon | 145,000 words
German (Swiss) | Pronunciation Lexicon | 5,000 words
Greek | Pronunciation Lexicon | 5,000 words
Guarani | Pronunciation Lexicon | 35,000 words
Haitian Creole | Pronunciation Lexicon | 15,000 words
Hausa | Pronunciation Lexicon | 10,000 words
Hebrew | Pronunciation Lexicon | 25,000 words
Hindi | Pronunciation Lexicon | 35,000 words
Hungarian | Pronunciation Lexicon | 500 words
Igbo | Pronunciation Lexicon | 30,000 words
Italian | Pronunciation Lexicon | 190,000 words
Japanese | Pronunciation Lexicon | 260,000 words
Javanese | Pronunciation Lexicon | 20,000 words
Kannada | Pronunciation Lexicon | 35,000 words
Kazakh | Pronunciation Lexicon | 30,000 words
Korean | Pronunciation Lexicon | 105,000 words
Kurmanji | Pronunciation Lexicon | 60,000 words
Lao | Pronunciation Lexicon | 15,000 words
Lithuanian | Pronunciation Lexicon | 60,000 words
Malayalam | Pronunciation Lexicon | 4,000 words
Malaysian | Pronunciation Lexicon | 30,000 words
Marathi | Pronunciation Lexicon | 30,000 words
Mongolian | Pronunciation Lexicon | 30,000 words
Norwegian | Pronunciation Lexicon | 115,000 words
Oriya | Pronunciation Lexicon | 15,000 words
Pashto | Pronunciation Lexicon | 65,000 words
Polish | Pronunciation Lexicon | 40,000 words
Portuguese (Brazilian) | Pronunciation Lexicon | 100,000 words
Portuguese (Portugal) | Pronunciation Lexicon | 110,000 words
Romanian | Pronunciation Lexicon | 15,000 words
Russian | Pronunciation Lexicon | 115,000 words
Serbian | Pronunciation Lexicon | 15,000 words
Somali | Pronunciation Lexicon | 20,000 words
Sorani (Kurdish) | Pronunciation Lexicon | 25,000 words
Spanish (Castilian) | Pronunciation Lexicon | 90,000 words
Spanish (Latin American) | Pronunciation Lexicon | 10,000 words
Spanish (US) | Pronunciation Lexicon | 90,000 words
Swahili (Kenya) | Pronunciation Lexicon | 65,000 words
Swedish | Pronunciation Lexicon | 105,000 words
Sylheti | Pronunciation Lexicon | 20,000 words
Tagalog | Pronunciation Lexicon | 30,000 words
Tamil | Pronunciation Lexicon | 105,000 words
Telugu | Pronunciation Lexicon | 50,000 words
Thai | Pronunciation Lexicon | 30,000 words
Tok Pisin | Pronunciation Lexicon | 5,000 words
Turkish | Pronunciation Lexicon | 255,000 words
Ukrainian | Pronunciation Lexicon | 5,000 words
Urdu | Pronunciation Lexicon | 20,000 words
Vietnamese | Pronunciation Lexicon | 8,000 words
Wu | Pronunciation Lexicon | 10,000 words
Xiang | Pronunciation Lexicon | 10,000 words
Zulu | Pronunciation Lexicon | 75,000 words
Bahasa Indonesia | Part of Speech Lexicon | 10,000 words
Cantonese (Yue) | Part of Speech Lexicon | 10,000 words
Danish | Part of Speech Lexicon | 100,000 words
English (Canadian) | Part of Speech Lexicon | 3,000 words
English (India) | Part of Speech Lexicon | 10,000 words
English (UK) | Part of Speech Lexicon | 155,000 words
English (US) | Part of Speech Lexicon | 260,000 words
Farsi | Part of Speech Lexicon | 1,400,000 words
Finnish | Part of Speech Lexicon | 10,000 words
Italian | Part of Speech Lexicon | 140,000 words
Japanese | Part of Speech Lexicon | 265,000 words
Korean | Part of Speech Lexicon | 100,000 words
Norwegian | Part of Speech Lexicon | 3,000 words
Polish | Part of Speech Lexicon | 4,000 words
Portuguese (Brazilian) | Part of Speech Lexicon | 95,000 words
Portuguese (Portugal) | Part of Speech Lexicon | 100,000 words
Russian | Part of Speech Lexicon | 100,000 words
Swedish | Part of Speech Lexicon | 105,000 words
Turkish | Part of Speech Lexicon | 255,000 words
English (US) | Part of Speech Lexicon | 260,000 words
Urdu | Part of Speech Lexicon | 25,000 words
Arabic (MSA) | NER Corpus |
Arabic (MSA) | Thesaurus |
Arabic (MSA) | Vowelised Text Corpus |
English (US) | NER Corpus |
Farsi | NER Corpus |
Farsi | Morphological Analyser |
Japanese | NER Corpus |
Korean | NER Corpus |
Mandarin | NER Corpus |
Russian | NER Corpus |
Urdu | NER Corpus |
Urdu | Morphological Analyser |
 




Use Cases


Whether you are working on a text-to-speech system, a voice recognition system or another solution that relies on natural language, high-quality licensed speech and language datasets allow you to go to market faster and reach more potential customers.