Off-the-Shelf Datasets


Our licensable datasets to jumpstart your AI projects




Product Catalog



Open data and public datasets are convenient, but we also offer an extensive catalog of 250+ licensable off-the-shelf datasets covering 80 languages and multiple dialects for a variety of common AI use cases. We are excited to announce 30+ new datasets that deliver immediate value to our customers. Our offerings include datasets for speech recognition and training datasets for machine learning algorithms, all created with the most advanced data science available.





Speed



Available immediately to support your AI/ML projects today



Cost Effective



Licensed datasets are more economical than custom data collection



Expertise



20+ years’ data collection experience



Support All Data Types



Image, video, speech, audio, and text



Scale



Provide the right amount of data to train your models effectively


Quality



Improve quality and minimize bias in your AI models






Dataset Name | Product Type | Common Use Cases | Recording Device | Unit
135
Down arrow Product Type ots-text Albanian (Albania) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A12,000 words Add Quotesqi_ALB_PHONAppen GlobalPronunciation DictionaryAlbanianAlbaniaN/AN/AN/AN/A12,000N/AtextAlbanian (Albania) Pronunciation Dictionary
136
Down arrow Product Type ots-text Amharic (Ethiopia) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A45,000 words Add Quoteamh_ETH_PHONAppen GlobalPronunciation DictionaryAmharicEthiopiaN/AN/AN/AN/A45,000N/AtextAmharic (Ethiopia) Pronunciation Dictionary
141
Down arrow Product Type ots-text Arabic (Algeria) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A11,000 words Add Quoteara_DZA_PHONAppen GlobalPronunciation DictionaryArabicAlgeriaN/AN/AN/AN/A11,000N/AtextArabic (Algeria) Pronunciation Dictionary
20
Down arrow Product Type ots-sound Arabic (Eastern Algeria) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline29 hours Add QuoteEAR_ASR001Appen GlobalConversational SpeechArabicAlgeriaLow background noise (home/office)4962Available on request11,3278alawDataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
For the majority of calls, both speakers (in-line/out-line) were collected and transcribed; however, for a smaller number of calls, only one half of the conversation was collected and transcribed
Arabic (Eastern Algeria) conversational telephony
137
Down arrow Product Type ots-text Arabic (Egypt) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A40,000 words Add Quoteara_EGY_PHONAppen GlobalPronunciation DictionaryArabicEgyptN/AN/AN/AN/A40,000N/AtextArabic (Egypt) Pronunciation Dictionary
114
Down arrow Product Type ots-sound Arabic (Egypt) scripted smartphone
Audio ASR, Virtual Assistant, ChatbotMobile phone352 hours Add QuoteARE_ASR001_CNAppen ChinaScripted SpeechArabicEgyptLow background noise (home/office)6271128,908207,57616wavDataset contains audio with corresponding text prompts
Text prompts are not vowelised
Arabic (Egypt) scripted smartphone
139
Down arrow Product Type ots-text Arabic (Iraq) Part of Speech Dictionary
Text ASR, TTS, Language ModellingN/A13,000 words Add Quoteara_IRQ_POSAppen GlobalPart of Speech DictionaryArabicIraqN/AN/AN/AN/A13,000N/AtextArabic (Iraq) Part of Speech Dictionary
138
Down arrow Product Type ots-text Arabic (Iraq) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A15,000 words Add Quoteara_IRQ_PHONAppen GlobalPronunciation DictionaryArabicIraqN/AN/AN/AN/A15,000N/AtextPerson namesArabic (Iraq) Pronunciation Dictionary
140
Down arrow Product Type ots-text Arabic (Libya) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A48,000 words Add Quoteara_LBY_PHONAppen GlobalPronunciation DictionaryArabicLibyaN/AN/AN/AN/A48,000N/AtextArabic (Libya) Pronunciation Dictionary
65
Down arrow Product Type ots-sound Arabic (Modern Standard Arabic) scripted microphone
Audio ASR, Virtual Assistant, ChatbotMicrophone12 hours Add QuoteMSA_ASR001Global PhoneScripted SpeechArabicTunisiaLow background noise (home/office)7814,908Available on request16wavDataset is fully transcribed and the transcription is available both in original script and in Romanized form
Each speaker reads a number of phonetically rich sentences selected from national newspaper articles available from the web to cover a wide domain with a large vocabulary
Developed in collaboration with the Karlsruhe Institute of Technology (KIT)
Arabic (Modern Standard Arabic) scripted microphone
112
Down arrow Product Type ots-sound Arabic (Morocco) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline33 hours Add QuoteARY_ASR001Appen GlobalConversational SpeechArabicMoroccoLow background noise180280,54423,8368alawEach speaker participated in 1 to 4 conversations. Speakers are identified by a unique 4-digit speaker ID which is recorded in the demographic file
Transcription is available in original script and fully reversible Romanised version with accompanying pronunciation lexicon
English translation of product transcription is available (ARY_MT001, ARY_ASRMT001)
Arabic (Morocco) conversational telephony
113
Down arrow Product Type ots-text Arabic (Morocco) conversational telephony translation
Text MT, Chatbot , Conversational AIN/A80,544 utterances Add QuoteARY_MT001Appen GlobalConversational TranslationArabicMoroccoN/A180N/A80,43023,844N/AtextCorresponding audio, transcription, fully reversible romanised transcription and pronunciation lexicon data are available (ARY_ASR001, ARY_ASRMT001)Arabic (Morocco) conversational telephony translation
143
Down arrow Product Type ots-text Arabic (Morocco) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A60,000 words Add Quoteara_MAR_PHONAppen GlobalPronunciation DictionaryArabicMoroccoN/AN/AN/AN/A60,000N/AtextArabic (Morocco) Pronunciation Dictionary
144
Down arrow Product Type ots-text Arabic (N/A) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A40,000 words Add Quotearb_N/A_PHONAppen GlobalPronunciation DictionaryArabicN/AN/AN/AN/AN/A40,000N/AtextArabic (N/A) Pronunciation Dictionary
115
Down arrow Product Type ots-sound Arabic (Saudi Arabia) scripted smartphone
Audio ASR, Virtual Assistant, ChatbotMobile phone322 hours Add QuoteARS_ASR001_CNAppen ChinaScripted SpeechArabicSaudi ArabiaLow background noise (home/office)2271104,574156,28216wavDataset contains audio with corresponding text prompts
Text prompts are not vowelised
300-1000 prompts per speaker covering general content including education, sports, entertainment, travel, culture and technology
Arabic (Saudi Arabia) scripted smartphone
146
Down arrow Product Type ots-text Arabic (Sudan) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A17,000 words Add Quoteara_SDN_PHONAppen GlobalPronunciation DictionaryArabicSudanN/AN/AN/AN/A17,000N/AtextArabic (Sudan) Pronunciation Dictionary
145
Down arrow Product Type ots-text Arabic (United Arab Emirates (UAE)) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A75,000 words Add Quoteara_ARE_PHONAppen GlobalPronunciation DictionaryArabicUnited Arab Emirates (UAE)N/AN/AN/AN/A75,000N/AtextArabic (United Arab Emirates (UAE)) Pronunciation Dictionary
120
Down arrow Product Type ots-sound Arabic (United Arab Emirates (UAE)) scripted smartphone
Audio ASR, Virtual Assistant, ChatbotMobile phone170 hours Add QuoteARU_ASR001_CNAppen ChinaScripted SpeechArabicUnited Arab Emirates (UAE)Low background noise (home/office)133142,35285,77516wavDataset contains audio with corresponding text prompts
Text prompts are not vowelised
Arabic (United Arab Emirates (UAE)) scripted smartphone
70
Down arrow Product Type ots-sound Arabic (United Arab Emirates (UAE)) scripted telephony
Audio ASR, Virtual AssistantMobile phone and landline48 hours Add QuoteOrienTel United Arab Emirates MCA (Modern Colloquial Arabic)NuanceScripted SpeechArabicUnited Arab Emirates (UAE)Low background noise880143,000Available on request8alawDataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
49 prompts per speaker including digits, natural numbers, letter strings, personal, place and business names, confirmation items (yes, no + fuzzy), generic command and control items, phonetically rich sentences and words and spontaneous items for control
Arabic (United Arab Emirates (UAE)) scripted telephony
71
Down arrow Product Type ots-sound Arabic (United Arab Emirates (UAE)) scripted telephony
Audio ASR, Virtual AssistantMobile phone and landline31 hours Add QuoteOrienTel United Arab Emirates MSA (Modern Standard Arabic)NuanceScripted SpeechArabicUnited Arab Emirates (UAE)Low background noise500124,500Available on request8alawDataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
49 prompts per speaker including digits, natural numbers, letter strings, personal, place and business names, confirmation items (yes, no + fuzzy), generic command and control items, phonetically rich sentences and words and spontaneous items for control
Arabic (United Arab Emirates (UAE)) scripted telephony
9
Down arrow Product Type ots-sound Arabic (United Arab Emirates (UAE)/ Saudi Arabia) scripted microphone
Audio ASR, Virtual Assistant, ChatbotMicrophone86 hours Add QuoteCGA_ASR001Appen GlobalScripted SpeechArabicUnited Arab Emirates (UAE) - Saudi ArabiaLow background noise (home/office)150442,00019,24516alawFully transcribed with acoustic event tagging derived from the SpeechDAT conventions
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
All transcriptions fully vowelized
280 prompts per speaker including 30 Person names (first name and family name) from a set of 15, 10 single isolated digits 0-10, 8-digit sequences (randomly generated), 200 phonetically balanced sentences, 30 x 10-word phonetically balanced word strings
Arabic (United Arab Emirates (UAE)/ Saudi Arabia) scripted microphone
127
Down arrow Product Type ots-text Arabic NER news text
Text NER, Content Classification, Search EnginesN/A20,774 sentences Add QuoteARB_NER001Appen GlobalNews NERStandard ArabicN/AN/AN/AN/A20,774Available on requestN/AtextArabic NER news text
147
Down arrow Product Type ots-text Assamese (India) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A40,000 words Add Quoteasm_IND_PHONAppen GlobalPronunciation DictionaryAssameseIndiaN/AN/AN/AN/A40,000N/AtextAssamese (India) Pronunciation Dictionary
121
Down arrow Product Type ots-sound Baby crying audio
Audio Baby Monitor, Security & Other Consumer ApplicationsMobile phone3 hours Add QuoteCRY_ASR001Appen ChinaHuman SoundN/AChinaLow background noise (home/office)1001N/AN/A16wavCrying sound of babies 0-3 years old, each lasting around 2 minutes.Baby crying audio
4
Down arrow Product Type ots-sound Bahasa Indonesia conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline31 hours Add QuoteBAH_ASR001Appen GlobalConversational SpeechIndonesianIndonesiaLow background noise1,002230,69511,4808wavDataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
For a large proportion of calls, only one half of the conversation was collected and transcribed
Bahasa Indonesia conversational telephony
150
Down arrow Product Type ots-text Basque (Spain) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A10,000 words Add Quoteeus_ESP_PHONAppen GlobalPronunciation DictionaryBasqueSpainN/AN/AN/AN/A10,000N/AtextBasque (Spain) Pronunciation Dictionary
6
Down arrow Product Type ots-sound Bengali (Bangladesh) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline47 hours Add QuoteBEN_ASR001Appen GlobalConversational SpeechBengaliBangladeshMixed (in-car, roadside, home/office)1,0002108,92317,9228alawDataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
Bengali (Bangladesh) conversational telephony
151
Down arrow Product Type ots-text Bengali (India) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A29,000 words Add Quoteben_IND_PHONAppen GlobalPronunciation DictionaryBengaliIndiaN/AN/AN/AN/A29,000N/AtextBengali (India) Pronunciation Dictionary
7
Down arrow Product Type ots-sound Bulgarian (Bulgaria) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline38 hours Add QuoteBUL_ASR001Appen GlobalConversational SpeechBulgarianBulgariaLow background noise (home/office)217286,45322,3428alawDataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
200 telephony conversations are recorded for this project - 100 speakers make 2 calls each (1 from landline, 1 from mobile) to a pool of 100 call receivers
Bulgarian (Bulgaria) conversational telephony
152
Down arrow Product Type ots-text Bulgarian (Bulgaria) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A55,000 words Add Quotebul_BGR_PHONAppen GlobalPronunciation DictionaryBulgarianBulgariaN/AN/AN/AN/A55,000N/AtextBulgarian (Bulgaria) Pronunciation Dictionary
111
Down arrow Product Type ots-sound Bulgarian (Bulgaria) scripted microphone
Audio ASR, Virtual Assistant, ChatbotMicrophone22 hours Add QuoteBUL_ASR002Global PhoneScripted SpeechBulgarianBulgariaLow background noise (home/office)7718,674Available on request16wavDataset is fully transcribed and the transcription is available both in original script and in Romanized form
Each speaker reads a number of phonetically rich sentences selected from national newspaper articles available from the web to cover a wide domain with a large vocabulary
Developed in collaboration with the Karlsruhe Institute of Technology (KIT)
Bulgarian (Bulgaria) scripted microphone
268
Down arrow Product Type ots-image Business-to-business printed text document OCR
Image Document Processing, Document SearchCamera, scan4,362 documents Add QuoteIMG_OCR_B2BAppen GlobalDocument OCRN/AN/AMixed lighting conditionsN/AN/AN/AN/AN/AjpgScans and photographs of business-to-business documents containing printed text. 48% Premium Quality images including Purchase Order, Payment Advice or Remittance Advice, Order Confirmation and Delivery note; 52% Standard Quality images in various challenging conditions in a wider range of categories including Complaints or Return, Delivery advice, Delivery note, Dunning, Goods receipt, Invoice, Offer, Order confirmation, Pay slip, Payment Advice or Remittance Advice, Purchase Order, Receipt, and Supplier loadBusiness-to-business printed text document OCR
269
Down arrow Product Type ots-image Business-to-consumer/other text document OCR
Image Document Processing, Document SearchCamera, scan26,020 documents Add QuoteIMG_OCR_B2C_OtherAppen GlobalDocument OCRN/AN/AMixed lighting conditionsN/AN/AN/AN/AN/AjpgScans and photographs of business-to-consumer and miscellaneous other category documents containing text: 37% invoices, 42% receipts, 1% documents with tables, 2% handwritten forms and documents, 2% menus, 11% product labels, 2% posters, 3% street signs. 6 Languages collected in 23+ locales: 11% Arabic, 43% English, 4% French, 4% German, 24% Spanish, 14% RussianBusiness-to-consumer/other text document OCR
155
Down arrow Product Type ots-text Cantonese (China) Part of Speech Dictionary
Text ASR, TTS, Language ModellingN/A10,000 words Add Quoteyue_HKG_POSAppen GlobalPart of Speech DictionaryCantoneseChinaN/AN/AN/AN/A10,000N/AtextTraditionalCantonese (China) Part of Speech Dictionary
153
Down arrow Product Type ots-text Cantonese (China) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A37,000 words Add Quoteyue_CHN_PHONAppen GlobalPronunciation DictionaryCantoneseChinaN/AN/AN/AN/A37,000N/AtextSimplifiedCantonese (China) Pronunciation Dictionary
154
Down arrow Product Type ots-text Cantonese (China) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A40,000 words Add Quoteyue_CHN_PHONAppen GlobalPronunciation DictionaryCantoneseChinaN/AN/AN/AN/A40,000N/AtextTraditionalCantonese (China) Pronunciation Dictionary
156
Down arrow Product Type ots-text Catalan (Spain) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A10,000 words Add Quotecat_ESP_PHONAppen GlobalPronunciation DictionaryCatalanSpainN/AN/AN/AN/A10,000N/AtextCatalan (Spain) Pronunciation Dictionary
157
Down arrow Product Type ots-text Cebuano (Philippines) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A20,000 words Add Quoteceb_PHL_PHONAppen GlobalPronunciation DictionaryCebuanoPhilippinesN/AN/AN/AN/A20,000N/AtextCebuano (Philippines) Pronunciation Dictionary
265
Down arrow Product Type ots-sound Chinese (foreigner) (Multinational) Scripted Speech
Audio ASR, Conversational AI, Speech AnalyticsMobile phone200 hours Add QuoteFOREIGNER_ASR001_CNAppen ChinaScripted SpeechChinese (foreigner)MultinationalLow background noise309116wavThis database contains 200 hours of foreigners speaking Chinese from the following countries: Argentina, Egypt, Australia, Russia, the Philippines, Kazakhstan, Korea, Kyrgyzstan, Canada, Kuala Lumpur, Kenya, Laos, Malaysia, Mauritius, the United States, Mongolia, South Africa, Japan, Tajikistan, Thailand, Turkey, Hong Kong, Singapore, India, Indonesia, Vietnam
There is no data from South Korea or Brazil, and no data recorded by minors.
Each session lasts about an hour; sentence duration ranges between 3-10 seconds
The content is in the form of an individual reading while being recorded on a mobile phone in a home/office environment.
Sensitive data and personal information have been scrubbed.
Chinese (foreigner) (Multinational) Scripted Speech
10
Down arrow Product Type ots-sound Croatian (Croatia) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline39 hours Add QuoteCRO_ASR001Appen GlobalConversational SpeechCroatianCroatiaLow background noise (home/office)2002Available on request23,9198alawDataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
200 telephony conversations are recorded for this project - 100 speakers make 2 calls each (1 from landline, 1 from mobile) to a pool of 100 call receivers
Croatian (Croatia) conversational telephony
158
Down arrow Product Type ots-text Croatian (Croatia) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A20,000 words Add Quotehrv_HRV_PHONAppen GlobalPronunciation DictionaryCroatianCroatiaN/AN/AN/AN/A20,000N/AtextCroatian (Croatia) Pronunciation Dictionary
11
Down arrow Product Type ots-sound Croatian (Croatia) scripted microphone
Audio ASR, Virtual Assistant, ChatbotMicrophone11 hours Add QuoteCRO_ASR002Global PhoneScripted SpeechCroatianCroatiaLow background noise (home/office)9414,499Available on request16wavDataset is fully transcribed and the transcription is available both in original script and in Romanized form
Each speaker reads a number of phonetically rich sentences selected from national newspaper articles available from the web to cover a wide domain with a large vocabulary
Developed in collaboration with the Karlsruhe Institute of Technology (KIT)
Croatian (Croatia) scripted microphone
116
Down arrow Product Type ots-sound Croatian (Croatia) scripted smartphone
Audio ASR, Virtual Assistant, ChatbotMobile phone263 hours Add QuoteCRO_ASR003_CNAppen ChinaScripted SpeechCroatianCroatiaLow background noise (home/office)243173,467136,14016wavDataset contains audio with corresponding text promptsCroatian (Croatia) scripted smartphone
159
Down arrow Product Type ots-text Czech (Czech Republic) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A50,000 words Add Quoteces_CZE_PHONAppen GlobalPronunciation DictionaryCzechCzech RepublicN/AN/AN/AN/A50,000N/AtextCzech (Czech Republic) Pronunciation Dictionary
12
Down arrow Product Type ots-sound Czech (Czech Republic) scripted microphone
Audio ASR, Virtual Assistant, ChatbotMicrophone31 hours Add QuoteCZE_ASR001Global PhoneScripted SpeechCzechCzech RepublicLow background noise (home/office)102112,425Available on request16wavDataset is fully transcribed and the transcription is available both in original script and in Romanized form
Each speaker reads a number of phonetically rich sentences selected from national newspaper articles available from the web to cover a wide domain with a large vocabulary
Developed in collaboration with the Karlsruhe Institute of Technology (KIT)
Czech (Czech Republic) scripted microphone
13
Down arrow Product Type ots-sound Czech (Czech Republic) scripted telephony
Audio ASR, Virtual AssistantLandline only93 hours Add QuoteCzech SpeechDat(E) DatasetNuanceScripted SpeechCzechCzech RepublicLow background noise1,000152,000Available on request8alawDataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
52 prompts per speaker including digits, natural numbers, letter strings, personal, place and business names, confirmation items (yes, no + fuzzy), generic command and control items, and phonetically rich words and sentences
Czech (Czech Republic) scripted telephony
161
Down arrow Product Type ots-text Danish (Denmark) Part of Speech Dictionary
Text ASR, TTS, Language ModellingN/A100,000 words Add Quotedan_DNK_POSAppen GlobalPart of Speech DictionaryDanishDenmarkN/AN/AN/AN/A100,000N/AtextDanish (Denmark) Part of Speech Dictionary
160
Down arrow Product Type ots-text Danish (Denmark) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A107,000 words Add Quotedan_DNK_PHONAppen GlobalPronunciation DictionaryDanishDenmarkN/AN/AN/AN/A107,000N/AtextDanish (Denmark) Pronunciation Dictionary
90
Down arrow Product Type ots-sound Danish (Denmark) scripted microphone
Audio ASR, Virtual Assistant, ChatbotMicrophone53 hours Add QuoteSpeecon DanishNuanceScripted SpeechDanishDenmarkMixed (office, entertainment, car, public place)600 (550 adult speakers and 50 child speakers)4170,000Available on request16alawDataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
290 prompts per adult speaker and 210 prompts per child speaker including digits, natural numbers, letter strings, personal, place and business names, application words for adult speakers, command (toy, phone and general) for child speakers, phonetically rich words and sentences and free and elicited spontaneous responses for adult speakers
Danish (Denmark) scripted microphone
15
Down arrow Product Type ots-sound Dari (Afghanistan) broadcast
Audio ASR, Automatic Captioning, Keyword SpottingMicrophone51 hours Add QuoteDAR_BRC001Appen GlobalBroadcast SpeechDariAfghanistanLow background noise (studio)N/A1Available on requestAvailable on requestN/AwavDataset is fully transcribed and timestamped
Pronunciation lexicon not currently available but can be developed upon request
Dataset is largely speech only and does not include music or advertisements
Data types include: talk shows, interviews, news broadcasts (excluding news reading by anchors)
Dari (Afghanistan) broadcast
14
Down arrow Product Type ots-sound Dari (Afghanistan) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline40 hours Add QuoteDAR_ASR001Appen GlobalConversational SpeechDariAfghanistanLow background noise5002Available on request11,1688alawDataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
Dataset is largely speech only and does not include music or advertisements
Dari (Afghanistan) conversational telephony
162
Down arrow Product Type ots-text Dari (Afghanistan) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A30,000 words Add Quoteprs_AFG_PHONAppen GlobalPronunciation DictionaryDariAfghanistanN/AN/AN/AN/A30,000N/AtextDari (Afghanistan) Pronunciation Dictionary
163
Down arrow Product Type ots-text Dholuo (Kenya) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A20,000 words Add Quoteluo_KEN_PHONAppen GlobalPronunciation DictionaryDholuoKenyaN/AN/AN/AN/A20,000N/AtextDholuo (Kenya) Pronunciation Dictionary
258
Down arrow Product Type ots-sound Dongbei dialect (China) Conversational Speech
Audio ASR, Conversational AI, Speech AnalyticsRecording pen/microphone84.6 hours Add QuoteDONGBEI_ASR001_CNAppen ChinaConversational SpeechDongbei dialectChinaLow background noise268116wavAudio only; transcription not included
Audio recordings cover 19 districts: Shenyang Heping District, Shenhe District, Huanggu District, Dadong District, Tiexi District, Lvyuan District, Chaoyang District, Kuancheng District, Erdao District, Nanguan District, Daoli District, Nangang District, Daowai District, Pingfang District, Songbei District, Xiangfang District, Hulan District, Acheng District and Shuangcheng District
Northeast suburb accents not included, and no minors were recorded.
Each recording session contains 20-30 minutes of free dialogue between 2-5 people.
Sensitive data and personal information have been scrubbed.
Dongbei dialect (China) Conversational Speech
259
Down arrow Product Type ots-sound Dongbei dialect (China) Conversational Speech
Audio ASR, Conversational AI, Speech AnalyticsMobile phone75.2 hours Add QuoteDONGBEI_ASR002_CNAppen ChinaConversational SpeechDongbei dialectChinaLow background noise18518wavAudio only; transcription not included
Audio recordings cover 19 districts: Shenyang Heping District, Shenhe District, Huanggu District, Dadong District, Tiexi District, Lvyuan District, Chaoyang District, Kuancheng District, Erdao District, Nanguan District, Daoli District, Nangang District, Daowai District, Pingfang District, Songbei District, Xiangfang District, Hulan District, Acheng District and Shuangcheng District
Northeast suburb accents not included, and no minors were recorded.
Each recording session contains 20-30 minutes of free dialogue between 2-5 people.
Sensitive data and personal information have been scrubbed.
Dongbei dialect (China) Conversational Speech
91
Down arrow Product Type ots-sound Dutch (Belgium) scripted microphone
Audio ASR, Virtual Assistant, ChatbotMicrophone47 hours Add QuoteSpeecon Dutch from BelgiumNuanceScripted SpeechDutchBelgiumMixed (office, entertainment, car, public place)600 (550 adult speakers and 50 child speakers)4170,000Available on request16alawDataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
290 prompts per adult speaker and 210 prompts per child speaker including digits, natural numbers, letter strings, personal, place and business names, application words for adult speakers, command (toy, phone and general) for child speakers, phonetically rich words and sentences and free and elicited spontaneous responses for adult speakers
Dutch (Belgium) scripted microphone
33
Down arrow Product Type ots-sound Dutch (Belgium) scripted telephony
Audio ASR, Virtual AssistantMicrophone80 hours Add QuoteFlemish SpeechDat(II) FDB-1000 (FIXED1FL)NuanceScripted SpeechDutchBelgiumLow background noise1,000152,000Available on request8alawDataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
52 prompts per speaker including digits, natural numbers, letter strings, personal, place and business names, confirmation items (yes, no + fuzzy), generic command and control items, phonetically rich sentences and words and spontaneous items for control
Dutch (Belgium) scripted telephony
19
Down arrow Product Type ots-sound Dutch (Netherlands & Belgium) scripted in-car
Audio ASR, Virtual Assistant, In Car HMI & EntertainmentMicrophone and mobile phone27 hours Add QuoteDutch and Flemish SpeechDat-CarNuanceScripted SpeechDutchNetherland - BelgiumMixed (in-car)302515,100Available on request16 and 8alawDataset is fully transcribed and is accompanied by a pronunciation lexicon and validation report
125 prompts per adult speaker including digits, natural numbers, letter strings, personal, place and business names (some spontaneous), generic command and control items, phonetically rich words and sentences and prompts for spontaneous speech
Dutch (Netherlands & Belgium) scripted in-car
66
Down arrow Product Type ots-sound Dutch (Netherlands) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline36 hours Add QuoteNLD_ASR001Appen GlobalConversational SpeechDutchNetherlandsLow background noise2002Available on request14,9648alawDataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
200 telephony conversations are recorded for this project - 100 speakers make 2 calls each (1 from landline, 1 from mobile) to a pool of 100 call receivers
Dutch (Netherlands) conversational telephony
164
Down arrow Product Type ots-text Dutch (Netherlands) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A45,000 words Add Quotenld_NLD_PHONAppen GlobalPronunciation DictionaryDutchNetherlandsN/AN/AN/AN/A45,000N/AtextDutch (Netherlands) Pronunciation Dictionary
92
Down arrow Product Type ots-sound Dutch (Netherlands) scripted microphone
Audio ASR, Virtual Assistant, ChatbotMicrophone68 hours Add QuoteSpeecon Dutch from the NetherlandsNuanceScripted SpeechDutchNetherlandsMixed (office, entertainment, car, public place)600 (550 adult speakers and 50 child speakers)4170,000Available on request16alawDataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
290 prompts per adult speaker and 210 prompts per child speaker including digits, natural numbers, letter strings, personal, place and business names, application words for adult speakers, command (toy, phone and general) for child speakers, phonetically rich words and sentences and free and elicited spontaneous responses for adult speakers
Dutch (Netherlands) scripted microphone
122
Down arrow Product Type ots-image East African facial images
Image Facial RecognitionCamera14948 images Add QuoteIMG_FACE_KEN_CNAppen ChinaHuman FaceN/AKenyaMixed background and lighting conditions99N/AN/AN/AN/AjpgImages contain all combinations of 9 different lighting conditions, 2 different distances between the participant's face and smartphone, 7 different camera angles
A random 32 images per person include occlusions such as sunglasses, masks, wigs or hats
A random 36 shots include different facial expressions including stare, open mouth, pout mouth, smile and frown
Lighting conditions: indoor normal light, outdoor normal light, indoor backlight, outdoor backlight, indoor ordinary dark light, full black screen fill light, point light source (white light, street light), neon light, side glare
Camera angles: front, left 45°, right 45°, left 15°, right 15°, top 30°, bottom 30°
East African facial images
21
Down arrow Product Type ots-sound English (Arabic - Levant/Egypt) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline28 hours Add QuoteENA_ASR001Appen GlobalConversational SpeechEnglishEgyptLow background noise2502Available on request5,6198alawDataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
Average length of calls: 10-15 mins
English (Arabic - Levant/Egypt) conversational telephony
166
Down arrow Product Type ots-text English (Australia) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A157,000 words Add Quoteeng_AUS_PHONAppen GlobalPronunciation DictionaryEnglishAustraliaN/AN/AN/AN/A157,000N/AtextEnglish (Australia) Pronunciation Dictionary
2
Down arrow Product Type ots-sound English (Australia) scripted telephony
Audio ASR, Virtual AssistantMobile phone and landline92 hours Add QuoteAUS_ASR001Appen GlobalScripted SpeechEnglishAustraliaLow background noise (home/office)500182,50035,1378alawFully transcribed to SpeechDAT type conventions
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
162 prompts (read speech) per speaker including digits, natural numbers, letter strings, personal, place, and business names, confirmation items (yes, no + fuzzy), generic command and control items (from a set of 215), phonetically rich sentences and words
English (Australia) scripted telephony
3
Down arrow Product Type ots-sound English (Australia) scripted telephony
Audio ASR, Virtual AssistantMobile phone and landline118 hours Add QuoteAUS_ASR002Appen GlobalScripted SpeechEnglishAustraliaMixed1,000175,00018,9528alawFully transcribed to SpeechDAT type conventions
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
75 prompts per speaker including digits, natural numbers, letter strings, personal, place, and business names, confirmation items (yes, no + fuzzy), generic command and control items, phonetically rich sentences and words
The prompts are a mixture of 'read' and 'elicited' items where 5 prompts per script are 'spontaneous free speech'
English (Australia) scripted telephony
168
Down arrow Product Type ots-text English (Canada) Part of Speech Dictionary
Text ASR, TTS, Language ModellingN/A3,000 words Add Quoteeng_CAN_POSAppen GlobalPart of Speech DictionaryEnglishCanadaN/AN/AN/AN/A3,000N/AtextEnglish (Canada) Part of Speech Dictionary
167
Down arrow Product Type ots-text English (Canada) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A50,000 words Add Quoteeng_CAN_PHONAppen GlobalPronunciation DictionaryEnglishCanadaN/AN/AN/AN/A50,000N/AtextEnglish (Canada) Pronunciation Dictionary
22
Down arrow Product Type ots-sound English (Canada) scripted telephony
Audio ASR, Virtual AssistantMobile phone and landline144 hours Add QuoteENC_ASR001Appen GlobalScripted SpeechEnglishCanadaMixed1,000199,00012,4838alaw or wavFully transcribed to SALA II/SpeechDAT type conventions
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
99 prompts per speaker including digits, natural numbers, letter strings, personal, place, and business names, confirmation items (yes, no + fuzzy), generic command and control items, phonetically rich sentences and words
English (Canada) scripted telephony
170
Down arrow Product Type ots-text English (Hong Kong) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A18,000 words Add Quoteeng_HKG_PHONAppen GlobalPronunciation DictionaryEnglishHong KongN/AN/AN/AN/A18,000N/AtextEnglish (Hong Kong) Pronunciation Dictionary
271
Down arrow Product Type ots-sound English (India) conversational smartphone
Audio ASR, Conversational AI, Speech AnalyticsMobile phone143 hours Add QuoteENI_ASR003Appen GlobalConversational SpeechEnglishIndiaMixed (home, car, public place, outdoor)2721Available on requestAvailable on request16wavTwo person conversations covering a broad range of generic topics including clothing, culture, education, finance, food, health, history, hospitality, insurance, media/entertainment, sports, travel/holiday, weather and work.
Each speaker participates in up to 12 conversations that are 5-15 minutes long.
Pronunciation lexicon not currently available but can be developed upon request
English (India) conversational smartphone
25
Down arrow Product Type ots-sound English (India) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline67 hours Add QuoteENI_ASR002Appen GlobalConversational SpeechEnglishIndiaLow background noise540277,56511,6468alawDataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
271 telephony conversations are recorded for this project
English (India) conversational telephony
172
Down arrow Product Type ots-text English (India) Part of Speech Dictionary
Text ASR, TTS, Language ModellingN/A13,000 words Add Quoteeng_IND_POSAppen GlobalPart of Speech DictionaryEnglishIndiaN/AN/AN/AN/A13,000N/AtextEnglish (India) Part of Speech Dictionary
171
Down arrow Product Type ots-text English (India) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A60,000 words Add Quoteeng_IND_PHONAppen GlobalPronunciation DictionaryEnglishIndiaN/AN/AN/AN/A60,000N/AtextEnglish (India) Pronunciation Dictionary
24
Down arrow Product Type ots-sound English (India) scripted telephony
Audio ASR, Virtual AssistantMobile phone and landline217 hours Add QuoteENI_ASR001Appen GlobalScripted SpeechEnglishIndiaMixed2,3581117,9009,1908alawFully transcribed to SpeechDAT type conventions.
Dataset is accompanied by a pronunciation lexicon [SAMPA] containing all transcribed words
49 prompts per speaker including digits, natural numbers, letter strings, personal, place, and business names, confirmation items (yes, no + fuzzy), generic command and control items, phonetically rich sentences and words
English (India) scripted telephony
173
Down arrow Product Type ots-text English (Ireland) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A12,000 words Add Quoteeng_IRL_PHONAppen GlobalPronunciation DictionaryEnglishIrelandN/AN/AN/AN/A12,000N/AtextEnglish (Ireland) Pronunciation Dictionary
174
Down arrow Product Type ots-text English (NZ) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A50,000 words Add Quoteeng_NZL_PHONAppen GlobalPronunciation DictionaryEnglishNZN/AN/AN/AN/A50,000N/AtextEnglish (NZ) Pronunciation Dictionary
23
Down arrow Product Type ots-sound English (Philippines) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline53 hours Add QuoteENF_ASR001Appen GlobalConversational SpeechEnglishPhilippinesLow background noise450241,6027,2728alaw or wavDataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
Average length of calls: 10-15 mins
English (Philippines) conversational telephony
169
Down arrow Product Type ots-text English (Philippines) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A5,000 words Add Quoteeng_PHL_PHONAppen GlobalPronunciation DictionaryEnglishPhilippinesN/AN/AN/AN/A5,000N/AtextEnglish (Philippines) Pronunciation Dictionary
165
Down arrow Product Type ots-text English (United Arab Emirates (UAE)) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A5,000 words Add Quoteeng_ARE_PHONAppen GlobalPronunciation DictionaryEnglishUnited Arab Emirates (UAE)N/AN/AN/AN/A5,000N/AtextEnglish (United Arab Emirates (UAE)) Pronunciation Dictionary
67
Down arrow Product Type ots-sound English (United Arab Emirates (UAE)) scripted telephony
Audio ASR, Virtual AssistantMobile phone and landline33 hours Add QuoteOrienTel English as spoken in the United Arab EmiratesNuanceScripted SpeechEnglishUnited Arab Emirates (UAE)Low background noise500125,500Available on request8alawDataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
51 prompts per speaker including digits, natural numbers, letter strings, personal, place and business names, confirmation items (yes, no + fuzzy), generic command and control items, phonetically rich sentences and words and spontaneous items for control
English (United Arab Emirates (UAE)) scripted telephony
104
Down arrow Product Type ots-sound English (United Kingdom) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline150 hours Add QuoteUKE_ASR001Appen GlobalConversational SpeechEnglishUnited KingdomLow background noise1,1502298,56224,1938wavDataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
English (United Kingdom) conversational telephony
255
Down arrow Product Type ots-sound English (United Kingdom) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline50 hours Add QuoteUKE_ASR001BAppen GlobalConversational SpeechEnglishUnited KingdomLow background noise1,1502Available on request13,1928wavDataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
English (United Kingdom) conversational telephony
176
Down arrow Product Type ots-text English (United Kingdom) Part of Speech Dictionary
Text ASR, TTS, Language ModellingN/A155,000 words Add Quoteeng_GBR_POSAppen GlobalPart of Speech DictionaryEnglishUnited KingdomN/AN/AN/AN/A155,000N/AtextEnglish (United Kingdom) Part of Speech Dictionary
175
Down arrow Product Type ots-text English (United Kingdom) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A195,000 words Add Quoteeng_GBR_PHONAppen GlobalPronunciation DictionaryEnglishUnited KingdomN/AN/AN/AN/A195,000N/AtextEnglish (United Kingdom) Pronunciation Dictionary
99
Down arrow Product Type ots-sound English (United Kingdom) scripted microphone - single female
Audio TTSHeadset microphone11 hours Add QuoteTC-STAR female baseline voice LauraNuanceScripted SpeechEnglishUnited KingdomLow background noise (studio)11Available on requestAvailable on request96Available on requestDataset includes manual orthographic transcription, automatic segmentation into phonemes, automatic generation of pitch marks (where a certain percentage of phonetic segments and pitch marks has been manually checked)
Dataset is accompanied by a pronunciation lexicon with POS, lemma and phonetic transcription
English (United Kingdom) scripted microphone - single female
100
Down arrow Product Type ots-sound English (United Kingdom) scripted microphone - single male
Audio TTSHeadset microphone7 hours Add QuoteTC-STAR male baseline voice IanNuanceScripted SpeechEnglishUnited KingdomLow background noise (studio)11Available on requestAvailable on request96Available on requestDataset includes manual orthographic transcription, automatic segmentation into phonemes, automatic generation of pitch marks (where a certain percentage of phonetic segments and pitch marks has been manually checked)
Dataset is accompanied by a pronunciation lexicon with POS, lemma and phonetic transcription
English (United Kingdom) scripted microphone - single male
272
Down arrow Product Type ots-sound English (United States - African American) conversational smartphone
Audio ASR, Conversational AI, Speech AnalyticsMobile phone50 hours Add QuoteUSE_ASR004Appen GlobalConversational SpeechEnglishUnited StatesMixed (home, car, public place, outdoor)Available on request1Available on requestAvailable on request16wavTwo person conversations covering a broad range of generic topics including clothing, culture, education, finance, food, health, history, hospitality, insurance, media/entertainment, sports, travel/holiday, weather and work.
Each speaker participates in up to 12 conversations that are 5-15 minutes long.
Pronunciation lexicon not currently available but can be developed upon request
English (United States - African American) conversational smartphone
266
Down arrow Product Type ots-text English (United States) Conversation SMS - Threaded
Text Virtual Assistant, ChatbotN/A952,677 messages Add QuoteENG_SMS001Appen GlobalSMS text messagesEnglishUnited StatesN/AAvailable on requestN/A952,677Available on requestN/AtextThis dataset contains threaded SMS conversations between 2 participants, using iMessage and Android SMS. All messages are in US English. Contains timestamps and text message exchanges, with metadata including gender, age range and relationship between participants. Consent is obtained from all participants and the dataset does not contain PII.English (United States) Conversation SMS - Threaded
267
Down arrow Product Type ots-text English (United States) Conversation SMS - Threaded
Text Virtual Assistant, ChatbotN/A106,649 messages Add QuoteENG_SMS001AAppen GlobalSMS text messagesEnglishUnited StatesN/A390N/A106,649Available on requestN/AtextThis is a subset of ENG_SMS001. This dataset contains threaded SMS conversations between 2 participants, using iMessage and Android SMS. All messages are in US English. Contains timestamps and text message exchanges, with metadata including gender, age range and relationship between participants. Consent is obtained from all participants and the dataset does not contain PII.English (United States) Conversation SMS - Threaded
270
Down arrow Product Type ots-text English (United States) Conversation WhatsApp - Threaded
Text Virtual Assistant, ChatbotN/A351,826 messages Add QuoteENG_SMS002Appen GlobalWhatsApp text messagesEnglishUnited StatesN/AAvailable on requestN/A351,826Available on requestN/AtextThis dataset contains threaded text message conversations between 2 participants, using WhatsApp. All messages are in US English. Contains timestamps and text message exchanges, with metadata including gender, age range and relationship between participants. Consent is obtained from all participants and the dataset does not contain PII.English (United States) Conversation WhatsApp - Threaded
107
Down arrow Product Type ots-sound English (United States) conversational smartphone
Audio ASR, Conversational AI, Speech AnalyticsMobile phone1000 hours Add QuoteUSE_ASR003Appen GlobalConversational SpeechEnglishUnited StatesLow background noise2,0001500,00052,58616wavDataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
Conversations cover a wide variety of topics including: study/major/work, hometown, living arrangements, weather and seasons, punctuality, TV programs/film
English (United States) conversational smartphone
178
Down arrow Product Type ots-text English (United States) Part of Speech Dictionary
Text ASR, TTS, Language ModellingN/A263,000 words Add Quoteeng_USA_POSAppen GlobalPart of Speech DictionaryEnglishUnited StatesN/AN/AN/AN/A263,000N/AtextEnglish (United States) Part of Speech Dictionary
177
Down arrow Product Type ots-text English (United States) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A330,000 words Add Quoteeng_USA_PHONAppen GlobalPronunciation DictionaryEnglishUnited StatesN/AN/AN/AN/A330,000N/AtextEnglish (United States) Pronunciation Dictionary
93
Down arrow Product Type ots-sound English (United States) scripted microphone
Audio ASR, Virtual Assistant, ChatbotMicrophone53 hours Add QuoteSpeecon English (USA) databaseNuanceScripted SpeechEnglishUnited StatesMixed (office, entertainment, car, public place)600 (550 adult speakers and 50 child speakers)4170,000Available on request16Available on requestDataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
290 prompts per adult speaker and 210 prompts per child speaker including digits, natural numbers, letter strings, personal, place and business names, application words for adult speakers, command (toy, phone and general) for child speakers, phonetically rich words and sentences and free and elicited spontaneous responses for adult speakers
English (United States) scripted microphone
106
Down arrow Product Type ots-sound English (United States) scripted microphone
Audio ASR, Virtual Assistant, ChatbotMicrophone62 hours Add QuoteUSE_ASR001Appen GlobalScripted SpeechEnglishUnited StatesLow background noise (studio)200280,00018,31848alaw or wavDataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
Each speaker read 400 prompts including digits, natural numbers, personal and city names, telephone numbers, generic command and control items, phonetically rich sentences and words
English (United States) scripted microphone
128
Down arrow Product Type ots-text English NER news text
Text NER, Content Classification, Search EnginesN/A22,768 sentences Add QuoteENG_NER001Appen GlobalNews NEREnglishN/AN/AN/AN/A22,768Available on requestN/AtextEnglish NER news text
132
Down arrow Product Type ots-text Farsi/Persian NER news text
Text NER, Content Classification, Search EnginesN/A19,584 sentences Add QuoteFAR_NER001Appen GlobalNews NERIranian PersianIranN/AN/AN/A19,584Available on requestN/AtextFarsi/Persian NER news text
182
Down arrow Product Type ots-text Finnish (Finland) Part of Speech Dictionary
Text ASR, TTS, Language ModellingN/A10,000 words Add Quotefin_FIN_POSAppen GlobalPart of Speech DictionaryFinnishFinlandN/AN/AN/AN/A10,000N/AtextFinnish (Finland) Part of Speech Dictionary
125
Down arrow Product Type ots-image Finnish (Finland) printed text OCR
Image Document Processing, Document SearchCamera7293 images Add QuoteIMG_OCR_FIN_CNAppen ChinaDocument OCRFinnishFinlandMixed lighting conditions4N/AN/AN/AN/AjpgImages containing text, such as billboards / outer packaging / signage / magazines / menus, etc.Finnish (Finland) printed text OCR
181
Down arrow Product Type ots-text Finnish (Finland) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A85,000 words Add Quotefin_FIN_PHONAppen GlobalPronunciation DictionaryFinnishFinlandN/AN/AN/AN/A85,000N/AtextFinnish (Finland) Pronunciation Dictionary
142
Down arrow Product Type ots-text French (Algeria) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A4,000 words Add Quotefra_DZA_PHONAppen GlobalPronunciation DictionaryFrenchAlgeriaN/AN/AN/AN/A4,000N/AtextArabic scriptFrench (Algeria) Pronunciation Dictionary
5
Down arrow Product Type ots-sound French (Belgium) scripted telephony
Audio ASR, Virtual AssistantLandline only76 hours Add QuoteBelgian French SpeechDat(II) FDB-1000 (FIXED1BF)NuanceScripted SpeechFrenchBelgiumLow background noise1,000153,000Available on request8alawDataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
53 prompts per speaker including digits, natural numbers, letter strings, personal, place and business names, confirmation items (yes, no + fuzzy), generic command and control items, phonetically rich sentences and words and spontaneous items for control
French (Belgium) scripted telephony
36
Down arrow Product Type ots-sound French (Canada) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline9 hours Add QuoteFRC_ASR003Appen GlobalConversational SpeechFrenchCanadaMixed682Available on request6,0228alawDataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
Average length of calls: 10-15 mins
For the majority of calls, only one half of the conversation was collected and transcribed; for a smaller number of calls, however, both speakers (in-line/out-line) were collected and transcribed
French (Canada) conversational telephony
183
Down arrow Product Type ots-text French (Canada) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A67,000 words Add Quotefra_CAN_PHONAppen GlobalPronunciation DictionaryFrenchCanadaN/AN/AN/AN/A67,000N/AtextFrench (Canada) Pronunciation Dictionary
35
Down arrow Product Type ots-sound French (Canada) scripted microphone
Audio ASR, Virtual Assistant, ChatbotMicrophone46 hours Add QuoteFRC_ASR002Appen GlobalScripted SpeechFrenchCanadaLow background noise (home/office)150122,50010,75516alawDataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
150 prompts per speaker including digits, digit strings (randomly generated), addresses and phonetically rich sentences and words
French (Canada) scripted microphone
34
Down arrow Product Type ots-sound French (Canada) scripted telephony
Audio ASR, Virtual AssistantMobile phone131 hours Add QuoteFRC_ASR001Appen GlobalScripted SpeechFrenchCanadaMixed1,0001100,00011,6978alawFully transcribed to SpeechDAT type conventions
Dataset is accompanied by a pronunciation lexicon [SAMPA] containing all transcribed words
100 prompts per speaker including digits, natural numbers, letter strings, personal, place, and business names, confirmation items (yes, no + fuzzy), generic command and control items, phonetically rich sentences and words
French (Canada) scripted telephony
275
Down arrow Product Type ots-sound French (France) conversational smartphone
Audio ASR, Conversational AI, Speech AnalyticsMobile phone159 hours Add QuoteFRF_ASR004Appen GlobalConversational SpeechFrenchFranceMixed (home, car, public place, outdoor)2981Available on requestAvailable on request16wavTwo person conversations covering a broad range of generic topics including clothing, culture, education, finance, food, health, history, hospitality, insurance, media/entertainment, sports, travel/holiday, weather and work.
Each speaker participates in up to 12 conversations that are 5-15 minutes long.
Pronunciation lexicon not currently available but can be developed upon request
French (France) conversational smartphone
40
Down arrow Product Type ots-sound French (France) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline25 hours Add QuoteFRF_ASR001Appen GlobalConversational SpeechFrenchFranceLow background noise5632Available on request11,9228alawDataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
For the majority of calls, both speakers (in-line/out-line) were collected and transcribed; for a smaller number of calls, however, only one half of the conversation was collected and transcribed
French (France) conversational telephony
39
Down arrow Product Type ots-sound French (France) In-Car
Audio ASR, Virtual Assistant, In Car HMI & EntertainmentMicrophone and mobile phone113 hours Add QuoteFrench SpeechDat-CarNuanceScripted SpeechFrenchFranceMixed (in-car)300537,500Available on request16 and 8Available on requestDataset is fully transcribed and is accompanied by a pronunciation lexicon and validation report
Approximately 125 prompts per speaker including digits, natural numbers, letter strings, personal, place and business names (some spontaneous), generic command and control items, phonetically rich words and sentences and prompts for spontaneous speech
113.7 hours
French (France) In-Car
185
Down arrow Product Type ots-text French (France) Part of Speech Dictionary
Text ASR, TTS, Language ModellingN/A95,000 words Add Quotefra_FRA_POSAppen GlobalPart of Speech DictionaryFrenchFranceN/AN/AN/AN/A95,000N/AtextFrench (France) Part of Speech Dictionary
184
Down arrow Product Type ots-text French (France) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A112,000 words Add Quotefra_FRA_PHONAppen GlobalPronunciation DictionaryFrenchFranceN/AN/AN/AN/A112,000N/AtextFrench (France) Pronunciation Dictionary
41
Down arrow Product Type ots-sound French (France) scripted microphone
Audio ASR, Virtual Assistant, ChatbotMicrophone26 hours Add QuoteFRF_ASR003Global PhoneScripted SpeechFrenchFranceLow background noise (home/office)98110,273Available on request16wavDataset is fully transcribed and the transcription is available both in original script and in Romanized form
Each speaker reads a number of phonetically rich sentences selected from national newspaper articles available from the web to cover a wide domain with large vocabulary
Developed in collaboration with the Karlsruhe Institute of Technology (KIT)
French (France) scripted microphone
37
Down arrow Product Type ots-sound French (France) scripted telephony
Audio ASR, Virtual AssistantLandline only41 hours Add QuoteFrench SpeechDat(II) FDB-1000NuanceScripted SpeechFrenchFranceLow background noise (home/office)1,017148,000Available on request8Available on requestDataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
48 prompts per speaker including digits, natural numbers, letter strings, personal, place and business names, confirmation items (yes, no + fuzzy), generic command and control items and phonetically rich sentences and words
French (France) scripted telephony
38
Down arrow Product Type ots-sound French (France) scripted telephony
Audio ASR, Virtual AssistantLandline only305 hours Add QuoteFrench SpeechDat(II) FDB-5000NuanceScripted SpeechFrenchFranceLow background noise5,0401237,000Available on request8Available on requestDataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
47 prompts per speaker including digits, natural numbers, letter strings, personal, place and business names, confirmation items (yes, no + fuzzy), generic command and control items and phonetically rich sentences and words
French (France) scripted telephony
60
Down arrow Product Type ots-sound French (Luxembourg) telephony
Audio ASR, Virtual AssistantLandline only45 hours Add QuoteLuxembourgish French SpeechDat(II) FDB-500 (FIXED1LF)NuanceScripted SpeechFrenchLuxembourgLow background noise614132,000Available on request8Available on requestDataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
53 prompts per speaker including digits, natural numbers, letter strings, personal, place and business names, confirmation items (yes, no + fuzzy), generic command and control items and phonetically rich sentences and words
French (Luxembourg) telephony
273
Down arrow Product Type ots-sound German (Germany) conversational smartphone
Audio ASR, Conversational AI, Speech AnalyticsMobile phone104 hours Add QuoteDEU_ASR004Appen GlobalConversational SpeechGermanGermanyMixed (home, car, public place, outdoor)1981Available on requestAvailable on request16wavTwo person conversations covering a broad range of generic topics including clothing, culture, education, finance, food, health, history, hospitality, insurance, media/entertainment, sports, travel/holiday, weather and work.
Each speaker participates in up to 12 conversations that are 5-15 minutes long.
Pronunciation lexicon not currently available but can be developed upon request
German (Germany) conversational smartphone
186
Down arrow Product Type ots-text German (Germany) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A146,000 words Add Quotedeu_DEU_PHONAppen GlobalPronunciation DictionaryGermanGermanyN/AN/AN/AN/A146,000N/AtextGerman (Germany) Pronunciation Dictionary
16
Down arrow Product Type ots-sound German (Germany) scripted microphone
Audio ASR, Virtual Assistant, ChatbotMicrophone16 hours Add QuoteDEU_ASR001Appen GlobalScripted SpeechGermanGermanyLow background noise (studio)127212,7006,82616alawDataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
Each speaker read 100 prompts including digits, natural numbers, personal and city names, telephone numbers, generic command and control items, phonetically rich sentences and words
German (Germany) scripted microphone
18
Down arrow Product Type ots-sound German (Germany) scripted microphone
Audio ASR, Virtual Assistant, ChatbotMicrophone25 hours Add QuoteDEU_ASR003Global PhoneScripted SpeechGermanGermanyLow background noise (home/office)77110,085Available on request16wavDataset is fully transcribed and the transcription is available both in original script and in Romanized form
Each speaker reads a number of phonetically rich sentences selected from national newspaper articles available from the web to cover a wide domain with large vocabulary
Developed in collaboration with the Karlsruhe Institute of Technology (KIT)
German (Germany) scripted microphone
42
Down arrow Product Type ots-sound German (Germany) telephony
Audio ASR, Virtual AssistantLandline only31 hours Add QuoteGerman SpeechDat (II) FDB-1000NuanceScripted SpeechGermanGermanyLow background noise (home/office)988143,000Available on request8Available on requestDataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
44 prompts per speaker including digits, natural numbers, letter strings, personal, place and business names, confirmation items (yes, no + fuzzy), generic command and control items and phonetically rich sentences and words
German (Germany) telephony
43
Down arrow Product Type ots-sound German (Germany) telephony
Audio ASR, Virtual AssistantLandline only268 hours Add QuoteGerman SpeechDat(II) FDB-4000NuanceScripted SpeechGermanGermanyLow background noise (home/office)4,0001160,000Available on request8Available on requestDataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
40 prompts per speaker including digits, natural numbers, letter strings, personal, place and business names, confirmation items (yes, no + fuzzy), generic command and control items and phonetically rich sentences and words
German (Germany) telephony
61
Down arrow Product Type ots-sound German (Luxembourg) telephony
Audio ASR, Virtual AssistantLandline only33 hours Add QuoteLuxembourgish German SpeechDat(II) FDB-500 (FIXED1LG)NuanceScripted SpeechGermanLuxembourgLow background noise500126,500Available on request8Available on requestDataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
53 prompts per speaker including digits, natural numbers, letter strings, personal, place and business names, confirmation items (yes, no + fuzzy), generic command and control items and phonetically rich sentences and words
German (Luxembourg) telephony
187
Down arrow Product Type ots-text German (Switzerland) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A15,000 words Add Quotedeu_CHE_PHONAppen GlobalPronunciation DictionaryGermanSwitzerlandN/AN/AN/AN/A15,000N/AtextGerman (Switzerland) Pronunciation Dictionary
94
Down arrow Product Type ots-sound German (Switzerland) scripted microphone
Audio ASR, Virtual Assistant, ChatbotMicrophone53 hours Add QuoteSpeecon German (Switzerland) databaseNuanceScripted SpeechGermanSwitzerlandMixed (office, entertainment, car, public place)600 (550 adult speakers and 50 child speakers)4170,000Available on request16Available on requestDataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
290 prompts per adult speaker and 210 prompts per child speaker including digits, natural numbers, letter strings, personal, place and business names, application words for adult speakers, command (toy, phone and general) for child speakers, phonetically rich words and sentences and free and elicited spontaneous responses for adult speakers
German (Switzerland) scripted microphone
68
Down arrow Product Type ots-sound German (Turkey) telephony
Audio ASR, Virtual AssistantMobile phone and landline31 hours Add QuoteOrienTel German Spoken by TurkishNuanceScripted SpeechGermanTurkeyLow background noise300115,600Available on request8Available on requestDataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
52 prompts per speaker including digits, natural numbers, letter strings, personal, place and business names, confirmation items (yes, no + fuzzy), generic command and control items and phonetically rich sentences and words
German (Turkey) telephony
188
Down arrow Product Type ots-text Greek (Greece) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A5,000 words Add Quoteell_GRC_PHONAppen GlobalPronunciation DictionaryGreekGreeceN/AN/AN/AN/A5,000N/AtextGreek (Greece) Pronunciation Dictionary
117
Down arrow Product Type ots-sound Greek (Greece) scripted smartphone
Audio ASR, Virtual Assistant, ChatbotMobile phone191 hours Add QuoteGRE_ASR001_CNAppen ChinaScripted SpeechGreekGreeceLow background noise (home/office)287154,11368,27116wavDataset contains audio with corresponding text promptsGreek (Greece) scripted smartphone
189
Down arrow Product Type ots-text Guarani (Paraguay) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A35,000 words Add Quotegrn_PRY_PHONAppen GlobalPronunciation DictionaryGuaraniParaguayN/AN/AN/AN/A35,000N/AtextGuarani (Paraguay) Pronunciation Dictionary
190
Down arrow Product Type ots-text Haitian Creole (Haiti) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A15,000 words Add Quotehat_HTI_PHONAppen GlobalPronunciation DictionaryHaitian CreoleHaitiN/AN/AN/AN/A15,000N/AtextHaitian Creole (Haiti) Pronunciation Dictionary
45
Down arrow Product Type ots-sound Hausa (Nigeria) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone33 hours Add QuoteHAU_ASR002Appen GlobalConversational SpeechHausaNigeriaLow background noise2002Available on request7,9498alawDataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
200 telephony conversations are recorded for this project - 100 speakers make 2 calls each (1 from landline, 1 from mobile) to a pool of 100 call receivers
Hausa (Nigeria) conversational telephony
191
Down arrow Product Type ots-text Hausa (Nigeria) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A11,000 words Add Quotehau_NGA_PHONAppen GlobalPronunciation DictionaryHausaNigeriaN/AN/AN/AN/A11,000N/AtextHausa (Nigeria) Pronunciation Dictionary
44
Down arrow Product Type ots-sound Hausa scripted microphone
Audio ASR, Virtual Assistant, ChatbotMicrophone20 hours Add QuoteHAU_ASR001Global PhoneScripted SpeechHausaCameroonLow background noise (home/office)10317,895Available on request16wavDataset is fully transcribed and the transcription is available both in original script and in Romanized form
Each speaker reads a number of phonetically rich sentences selected from national newspaper articles available from the web to cover a wide domain with a large vocabulary
Developed in collaboration with the Karlsruhe Institute of Technology (KIT)
Hausa scripted microphone
46
Down arrow Product Type ots-sound Hebrew (Israel) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline34 hours Add QuoteHEB_ASR001Appen GlobalConversational SpeechHebrewIsraelLow background noise2002Available on request19,2508alaw or wavDataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
200 telephony conversations are recorded for this project - 100 speakers make 2 calls each (1 from landline, 1 from mobile) to a pool of 100 call receivers
Hebrew (Israel) conversational telephony
192
Down arrow Product Type ots-text Hebrew (Israel) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A31,000 words Add Quoteheb_ISR_PHONAppen GlobalPronunciation DictionaryHebrewIsraelN/AN/AN/AN/A31,000N/AtextHebrew (Israel) Pronunciation Dictionary
48
Down arrow Product Type ots-sound Hindi (India) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline32 hours Add QuoteHIN_ASR002Appen GlobalConversational SpeechHindiIndiaMixed9962Available on request12,2668wavDataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
For the majority of calls, both speakers (in-line/out-line) were collected and transcribed; however, for a smaller number of calls, only one half of the conversation was collected and transcribed
Hindi (India) conversational telephony
193
Down arrow Product Type ots-text Hindi (India) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A35,000 words Add Quotehin_IND_PHONAppen GlobalPronunciation DictionaryHindiIndiaN/AN/AN/AN/A35,000N/AtextHindi (India) Pronunciation Dictionary
47
Down arrow Product Type ots-sound Hindi (India) scripted telephony
Audio ASR, Virtual AssistantMobile phone224 hours Add QuoteHIN_ASR001Appen GlobalScripted SpeechHindiIndiaLow background noise1,920196,0009,8538alawFully transcribed to SpeechDAT type conventions
Dataset is accompanied by a pronunciation lexicon [SAMPA] containing all transcribed words
50 prompts per speaker including digits, natural numbers, personal, business and place names, web addresses, confirmation items (yes, no + fuzzy), generic command and control items, phonetically rich sentences and words
Hindi (India) scripted telephony
126
Down arrow Product Type ots-video Human body movement
Video Fitness Applications, Action Classification, Gesture RecognitionMobile phone2000 videos Add QuoteVED_HUMAN_BODY_CNAppen ChinaHuman BodyN/AChinaMixed background and lighting conditions1000N/AN/AN/AN/Amp4Video clips are approximately 10-20 seconds longHuman body movement
194
Down arrow Product Type ots-text Hungarian (Hungary) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A500 words Add Quotehun_HUN_PHONAppen GlobalPronunciation DictionaryHungarianHungaryN/AN/AN/AN/A500N/AtextHungarian (Hungary) Pronunciation Dictionary
118
Down arrow Product Type ots-sound Hungarian (Hungary) scripted smartphone
Audio ASR, Virtual Assistant, ChatbotMobile phone286 hours Add QuoteHUN_ASR001_CNAppen ChinaScripted SpeechHungarianHungaryLow background noise (home/office)254194,031201,92116wavDataset contains audio with corresponding text promptsHungarian (Hungary) scripted smartphone
49
Down arrow Product Type ots-sound Hungarian (Hungary) scripted telephony
Audio ASR, Virtual AssistantLandline only65 hours Add QuoteHungarian SpeechDat(E)NuanceScripted SpeechHungarianHungaryLow background noise1,000148,000Available on request8Available on requestDataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
48 prompts per speaker including digits, natural numbers, letter strings, personal, place and business names, confirmation items (yes, no + fuzzy), generic command and control items and phonetically rich sentences and words
Hungarian (Hungary) scripted telephony
195
Down arrow Product Type ots-text Igbo (Nigeria) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A30,000 words Add Quoteibo_NGA_PHONAppen GlobalPronunciation DictionaryIgboNigeriaN/AN/AN/AN/A30,000N/AtextIgbo (Nigeria) Pronunciation Dictionary
149
Down arrow Product Type ots-text Indonesian (Indonesia) Part of Speech Dictionary
Text ASR, TTS, Language ModellingN/A10,000 words Add Quoteind_IDN_POSAppen GlobalPart of Speech DictionaryIndonesianIndonesiaN/AN/AN/AN/A10,000N/AtextIndonesian (Indonesia) Part of Speech Dictionary
148
Down arrow Product Type ots-text Indonesian (Indonesia) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A95,000 words Add Quoteind_IDN_PHONAppen GlobalPronunciation DictionaryIndonesianIndonesiaN/AN/AN/AN/A95,000N/AtextIndonesian (Indonesia) Pronunciation Dictionary
262
Down arrow Product Type ots-sound Inner Mongolian (China) Conversational Speech
Audio ASR, Conversational AI, Speech AnalyticsMobile phone100 hours Add QuoteNMG_ASR001_CNAppen ChinaConversational SpeechInner MongolianChinaLow background noise200116wavAudio only; transcription not included
Audio recordings cover the following areas: Xilingol League, Tongliao, Hohhot. Each recording session contains about 30 minutes of free dialogue between 2 people.
Inner Mongolian (China) Conversational Speech
32
Down arrow Product Type ots-sound Iranian Persian (Farsi) (Iran) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline30 hours Add QuoteFAR_ASR002Appen GlobalConversational SpeechIranian Persian (Farsi)IranMixed1,0002Available on request12,3588wavDataset is fully transcribed and time stamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
Iranian Persian (Farsi) (Iran) conversational telephony
31
Down arrow Product Type ots-sound Iranian Persian (Farsi) (Iran) scripted telephony
Audio ASR, Virtual AssistantMobile phone and landline85 hours Add QuoteFAR_ASR001Appen GlobalScripted SpeechIranian Persian (Farsi)IranMixed789138,4008,7168alawFully transcribed to OrienTel type conventions
Dataset is accompanied by a pronunciation lexicon [SAMPA] containing all transcribed words
48 prompts per speaker including digits, natural numbers, letter strings, personal, place, and business names, confirmation items (yes, no + fuzzy), generic command and control items, phonetically rich sentences and words
Iranian Persian (Farsi) (Iran) scripted telephony
180
Down arrow Product Type ots-text Iranian Persian (Iran) Part of Speech Dictionary
Text ASR, TTS, Language ModellingN/A1,400,000 words Add Quotepes_IRN_POSAppen GlobalPart of Speech DictionaryIranian PersianIranN/AN/AN/AN/A1,400,000N/AtextIranian Persian (Iran) Part of Speech Dictionary
179
Down arrow Product Type ots-text Iranian Persian (Iran) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A80,000 words Add Quotepes_IRN_PHONAppen GlobalPronunciation DictionaryIranian PersianIranN/AN/AN/AN/A80,000N/AtextIranian Persian (Iran) Pronunciation Dictionary
276
Down arrow Product Type ots-sound Italian (Italy) conversational smartphone
Audio ASR, Conversational AI, Speech AnalyticsMobile phone256 hours Add QuoteITA_ASR005Appen GlobalConversational SpeechItalianItalyMixed (home, car, public place, outdoor)4821Available on requestAvailable on request16wavTwo person conversations covering a broad range of generic topics including clothing, culture, education, finance, food, health, history, hospitality, insurance, media/entertainment, sports, travel/holiday, weather and work.
Each speaker participates in up to 12 conversations that are 5-15 minutes long.
Pronunciation lexicon not currently available but can be developed upon request
Italian (Italy) conversational smartphone
52
Down arrow Product Type ots-sound Italian (Italy) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline36 hours Add QuoteITA_ASR003Appen GlobalConversational SpeechItalianItalyLow background noise2002Available on request18,9748alawDataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
200 telephony conversations are recorded for this project - 100 speakers make 2 calls each (1 from landline, 1 from mobile) to a pool of 100 call receivers
Italian (Italy) conversational telephony
197
Down arrow Product Type ots-text Italian (Italy) Part of Speech Dictionary
Text ASR, TTS, Language ModellingN/A147,000 words Add Quoteita_ITA_POSAppen GlobalPart of Speech DictionaryItalianItalyN/AN/AN/AN/A147,000N/AtextItalian (Italy) Part of Speech Dictionary
196
Down arrow Product Type ots-text Italian (Italy) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A197,000 words Add Quoteita_ITA_PHONAppen GlobalPronunciation DictionaryItalianItalyN/AN/AN/AN/A197,000N/AtextItalian (Italy) Pronunciation Dictionary
50
Down arrow Product Type ots-sound Italian (Italy) scripted microphone
Audio ASR, Virtual Assistant, ChatbotMicrophone44 hours Add QuoteITA_ASR001Appen GlobalScripted SpeechItalianItalyMixed200440,0007,31622alawFully transcribed to SpeechDAT type conventions
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
200 prompts per speaker including 100 command and control type items and 100 phonetically rich sentences
Italian (Italy) scripted microphone
53
Down arrow Product Type ots-sound Italian (Italy) scripted microphone
Audio TTSMicrophone3 hours Add QuoteITA_TTS001Appen GlobalScripted SpeechItalianItalyLow background noise (studio)113,300Available on request22alawDataset is accompanied by a pronunciation lexicon containing all words spoken in the Dataset
3,300 prompts per speaker including phonetically rich sentences
Italian (Italy) scripted microphone
51
Down arrow Product Type ots-sound Italian (Italy) scripted microphone in-car
Audio ASR, Virtual Assistant, In Car HMI & EntertainmentMicrophone47 hours Add QuoteITA_ASR002Appen GlobalScripted SpeechItalianItalyMixed (in-car)205435,87510,36648alawFully transcribed to SpeechDAT type conventions
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
350 prompts per speaker including digits, street names, generic command and control items, phonetically rich sentences and words
Each speaker recorded 1 or 2 sessions, including Session 1 in a parked vehicle with the engine running and Session 2 in a vehicle travelling at 60 mph (100 km/h)
Italian (Italy) scripted microphone in-car
54
Down arrow Product Type ots-sound Italian (Italy) telephony
Audio ASR, Virtual AssistantLandline only38 hours Add QuoteItalian Fixed Network Speech SpeechDat(M) CorpusNuanceScripted SpeechItalianItalyLow background noise (home/office)1,000139,000Available on request8Available on requestDataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
39 prompts per speaker including isolated and connected digits, natural numbers, money amounts, spelled words, time and date phrases, yes/no questions, city names, common application words, application words in phrases and phonetically rich sentences
Italian (Italy) telephony
55
Down arrow Product Type ots-sound Italian (Italy) telephony
Audio ASR, Virtual AssistantLandline only228 hours Add QuoteItalian SpeechDat(II) FDB-3000NuanceScripted SpeechItalianItalyLow background noise (home/office)3,0401134,000Available on request8Available on requestDataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
44 prompts per speaker including digits, natural numbers, letter strings, personal, place and business names, confirmation items (yes, no + fuzzy), generic command and control items and phonetically rich sentences and words
Italian (Italy) telephony
56
Down arrow Product Type ots-sound Italian (Italy) telephony
Audio ASR, Virtual AssistantMobile phone103 hours Add QuoteItalian SpeechDat(II) MDB-250NuanceScripted SpeechItalianItalyLow background noise (home/office)375119,000Available on request8Available on requestDataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
51 prompts per speaker including digits, natural numbers, letter strings, personal, place and business names, confirmation items (yes, no + fuzzy), generic command and control items and phonetically rich sentences and words
Italian (Italy) telephony
89
Down arrow Product Type ots-sound Italian (Italy) telephony
Audio ASR, Virtual AssistantMobile phone13 hours Add QuoteSpeechDat(M) Italian Mobile Network Speech DatabaseNuanceScripted SpeechItalianItalyLow background noise (home/office)342113,500Available on request8Available on requestDataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
40 prompts per speaker including digits, natural numbers, letter strings, personal, place and business names, confirmation items (yes, no + fuzzy), generic command and control items and phonetically rich sentences and words
Italian (Italy) telephony
199
Down arrow Product Type ots-text Japanese (Japan) Part of Speech Dictionary
Text ASR, TTS, Language ModellingN/A265,000 words Add Quotejpn_JPN_POSAppen GlobalPart of Speech DictionaryJapaneseJapanN/AN/AN/AN/A265,000N/AtextJapanese (Japan) Part of Speech Dictionary
198
Down arrow Product Type ots-text Japanese (Japan) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A262,000 words Add Quotejpn_JPN_PHONAppen GlobalPronunciation DictionaryJapaneseJapanN/AN/AN/AN/A262,000N/AtextJapanese (Japan) Pronunciation Dictionary
57
Down arrow Product Type ots-sound Japanese (Japan) scripted microphone
Audio ASR, Virtual Assistant, ChatbotMicrophone33 hours Add QuoteJPN_ASR001Global PhoneScripted SpeechJapaneseJapanLow background noise (home/office)144113,067Available on request16wavDataset is fully transcribed and the transcription is available both in original script and in Romanized form
Each speaker reads a number of phonetically rich sentences selected from national newspaper articles available from the web to cover a wide domain with a large vocabulary
Developed in collaboration with the Karlsruhe Institute of Technology (KIT)
Japanese (Japan) scripted microphone
95
Down arrow Product Type ots-sound Japanese (Japan) scripted microphone
Audio ASR, Virtual Assistant, ChatbotMicrophone57 hours Add QuoteSpeecon JapaneseNuanceScripted SpeechJapaneseJapanMixed (office, entertainment, car, public place)600 (550 adult speakers and 50 child speakers)4170,000Available on request16Available on requestDataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
290 prompts per adult speaker and 210 prompts per child speaker including digits, natural numbers, letter strings, personal, place and business names, application words for adult speakers, command (toy, phone and general) for child speakers, phonetically rich words and sentences and free and elicited spontaneous responses for adult speakers
Japanese (Japan) scripted microphone
133
Down arrow Product Type ots-text Japanese NER news text
Text NER, Content Classification, Search EnginesN/A20,629 sentences Add QuoteJPY_NER001Appen GlobalNews NERJapaneseJapanN/AN/AN/A20,629Available on requestN/AtextJapanese NER news text
200
Down arrow Product Type ots-text Javanese (Indonesia) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A20,000 words Add Quotejav_IDN_PHONAppen GlobalPronunciation DictionaryJavaneseIndonesiaN/AN/AN/AN/A20,000N/AtextJavanese (Indonesia) Pronunciation Dictionary
58
Down arrow Product Type ots-sound Kannada (India) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline15 hours Add QuoteKAN_ASR001Appen GlobalConversational SpeechKannadaIndiaMixed1782Available on request15,6608alawDataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
Kannada (India) conversational telephony
109
Down arrow Product Type ots-sound Kannada (India) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline57 hours Add QuoteKAN_ASR001AAppen GlobalConversational SpeechKannadaIndiaMixed1,0002Available on request15,6608alawApprox. 25% of the dataset sessions are transcribed and time stamped - full transcripts can be made available
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
Kannada (India) conversational telephony
201
Down arrow Product Type ots-text Kannada (India) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A49,000 words Add Quotekan_IND_PHONAppen GlobalPronunciation DictionaryKannadaIndiaN/AN/AN/AN/A49,000N/AtextKannada (India) Pronunciation Dictionary
202
Down arrow Product Type ots-text Kazakh (Kazakhstan) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A30,000 words Add Quotekaz_KAZ_PHONAppen GlobalPronunciation DictionaryKazakhKazakhstanN/AN/AN/AN/A30,000N/AtextKazakh (Kazakhstan) Pronunciation Dictionary
204
Down arrow Product Type ots-text Korean (South Korea) Part of Speech Dictionary
Text ASR, TTS, Language ModellingN/A100,000 words Add Quotekor_KOR_POSAppen GlobalPart of Speech DictionaryKoreanSouth KoreaN/AN/AN/AN/A100,000N/AtextKorean (South Korea) Part of Speech Dictionary
203
Down arrow Product Type ots-text Korean (South Korea) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A100,000 words Add Quotekor_KOR_PHONAppen GlobalPronunciation DictionaryKoreanSouth KoreaN/AN/AN/AN/A100,000N/AtextKorean (South Korea) Pronunciation Dictionary
59
Down arrow Product Type ots-sound Korean (South Korea) scripted microphone
Audio ASR, Virtual Assistant, ChatbotMicrophone20 hours Add QuoteKOR_ASR001Global PhoneScripted SpeechKoreanSouth KoreaLow background noise (home/office)10018,107Available on request16wavDataset is fully transcribed and the transcription is available both in original script and in Romanized form
Each speaker reads a number of phonetically rich sentences selected from national newspaper articles available from the web to cover a wide domain with a large vocabulary
Developed in collaboration with the Karlsruhe Institute of Technology (KIT)
Korean (South Korea) scripted microphone
129
Down arrow Product Type ots-text Korean NER news text
Text NER, Content Classification, Search EnginesN/A25,830 sentences Add QuoteKOR_NER001Appen GlobalNews NERKoreanSouth KoreaN/AN/AN/A25,830Available on requestN/AtextKorean NER news text
205
Down arrow Product Type ots-text Kurmanji (Turkey) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A60,000 words Add Quotekur_TUR_PHONAppen GlobalPronunciation DictionaryKurmanjiTurkeyN/AN/AN/AN/A60,000N/AtextKurmanji (Turkey) Pronunciation Dictionary
206
Down arrow Product Type ots-text Lao (Laos) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A9,000 words Add Quotelao_LAO_PHONAppen GlobalPronunciation DictionaryLaoLaosN/AN/AN/AN/A9,000N/AtextLao (Laos) Pronunciation Dictionary
207
Down arrow Product Type ots-text Lithuanian (Lithuania) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A71,000 words Add Quotelit_LTU_PHONAppen GlobalPronunciation DictionaryLithuanianLithuaniaN/AN/AN/AN/A71,000N/AtextLithuanian (Lithuania) Pronunciation Dictionary
208
Down arrow Product Type ots-text Malayalam (India) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A19,000 words Add Quotemal_IND_PHONAppen GlobalPronunciation DictionaryMalayalamIndiaN/AN/AN/AN/A19,000N/AtextMalayalam (India) Pronunciation Dictionary
209
Down arrow Product Type ots-text Malaysian (Malaysia) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A10,000 words Add Quotemsa_MYS_PHONAppen GlobalPronunciation DictionaryMalaysianMalaysiaN/AN/AN/AN/A10,000N/AtextMalaysian (Malaysia) Pronunciation Dictionary
210
Down arrow Product Type ots-text Mandarin (Simplified) (China) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A35,000 words Add Quotezho_CHN_PHONAppen GlobalPronunciation DictionaryMandarin (Simplified)ChinaN/AN/AN/AN/A35,000N/AtextMandarin (Simplified) (China) Pronunciation Dictionary
211
Down arrow Product Type ots-text Mandarin (Traditional) (Taiwan) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A50,000 words Add Quotezho_TWN_PHONAppen GlobalPronunciation DictionaryMandarin (Traditional)TaiwanN/AN/AN/AN/A50,000N/AtextMandarin (Traditional) (Taiwan) Pronunciation Dictionary
63
Down arrow Product Type ots-sound Mandarin Chinese (China) scripted microphone
Audio ASR, Virtual Assistant, ChatbotMicrophone26 hours Add QuoteMAC_ASR002Global PhoneScripted SpeechMandarin ChineseChinaLow background noise (home/office)132110,225Available on request16wavDataset is fully transcribed and the transcription is available both in original script and in Romanized form
Each speaker reads a number of phonetically rich sentences selected from national newspaper articles available from the web to cover a wide domain with a large vocabulary
Developed in collaboration with the Karlsruhe Institute of Technology (KIT)
Mandarin Chinese (China) scripted microphone
62
Down arrow Product Type ots-sound Mandarin Chinese (China) scripted telephony
Audio ASR, Virtual AssistantMobile phone and landline323 hours Add QuoteMAC_ASR001Appen GlobalScripted SpeechMandarin ChineseChinaMixed2,0001200,0007,1458alawFully transcribed to SpeechDAT type conventions
Dataset is accompanied by a pronunciation lexicon [SAMPA] containing all transcribed words
98 prompts per speaker including digits, natural numbers, letter strings, personal, place, and business names, confirmation items (yes, no + fuzzy), generic command and control items (from a set of 215), phonetically rich sentences and words
Mandarin Chinese (China) scripted telephony
131
Down arrow Product Type ots-text Mandarin NER news text
Text NER, Content Classification, Search EnginesN/A17,313 sentences Add QuoteMAC_NER001Appen GlobalNews NERMandarin ChineseChinaN/AN/AN/A17,313Available on requestN/AtextMandarin NER news text
64
Down arrow Product Type ots-sound Marathi (India) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline15 hours Add QuoteMAR_ASR001Appen GlobalConversational SpeechMarathiIndiaMixed1802Available on request11,9088alawApprox. 29% of the dataset sessions are transcribed and time stamped - full transcripts can be made available
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
Marathi (India) conversational telephony
110
Down arrow Product Type ots-sound Marathi (India) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline52 hours Add QuoteMAR_ASR001AAppen GlobalConversational SpeechMarathiIndiaMixed1,0002Available on request11,9088alawPortion of the dataset sessions are transcribed and time stamped - full transcripts can be made available
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
Marathi (India) conversational telephony
212
Down arrow Product Type ots-text Marathi (India) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A30,000 words Add Quotemar_IND_PHONAppen GlobalPronunciation DictionaryMarathiIndiaN/AN/AN/AN/A30,000N/AtextMarathi (India) Pronunciation Dictionary
213
Down arrow Product Type ots-text Mongolian (Mongolia) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A30,000 words Add Quotemon_MNG_PHONAppen GlobalPronunciation DictionaryMongolianMongoliaN/AN/AN/AN/A30,000N/AtextMongolian (Mongolia) Pronunciation Dictionary
215
Down arrow Product Type ots-text Norwegian (Norway) Part of Speech Dictionary
Text ASR, TTS, Language ModellingN/A3,000 words Add Quotenor_NOR_POSAppen GlobalPart of Speech DictionaryNorwegianNorwayN/AN/AN/AN/A3,000N/AtextNorwegian (Norway) Part of Speech Dictionary
214
Down arrow Product Type ots-text Norwegian (Norway) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A115,000 words Add Quotenor_NOR_PHONAppen GlobalPronunciation DictionaryNorwegianNorwayN/AN/AN/AN/A115,000N/AtextNorwegian (Norway) Pronunciation Dictionary
264
Down arrow Product Type ots-image Object Image Collection
Image Image label recognition trainingMobile phone and camera2196 images Add QuoteIMG_TAG_CNAppen ChinaObject ImageN/AN/AMixed lighting conditionsN/AN/AN/AjpgMulti-scene picture sample library of 2196 images, with the following categories: KTV: 50; Department store: 55; Office: 100; Museum: 63; Electrical appliances: 55; Marine: 191; Car: 50; Handbags: 35; Night view: 54; Sports equipment: 54; Convenience stores: 34; Restaurant: 54; Window scenery: 62; Pets: 82; Ship: 50; Zoo: 70; Clothing store: 53; Beach: 95; Airport: 65 tickets; Gym: 47; Attractions: 77; Crowd: 67; Desert: 73; Beach: 68; Mountain area: 54; Shopping mall: 55; Trees: 85; Sky: 102; Snow: 71; Snow Mountain: 53; Night view: 78; Playground: 94Object Image Collection
216
Down arrow Product Type ots-text Oriya (India) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A15,000 words Add Quoteori_IND_PHONAppen GlobalPronunciation DictionaryOriyaIndiaN/AN/AN/AN/A15,000N/AtextOriya (India) Pronunciation Dictionary
80
Down arrow Product Type ots-sound Panjabi (Pakistan) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline20 hours Add QuotePAP_ASR001Appen GlobalConversational SpeechPanjabiPakistanLow background noise2052Available on request7,2988alawDataset is fully transcribed and time-stamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
For 71% of calls, both speakers (in-line/out-line) were collected and transcribed; for the remaining 29% of calls, only one half of the conversation was collected and transcribed
Panjabi (Pakistan) conversational telephony
74
Down arrow Product Type ots-sound Pashto (Afghanistan) broadcast
Audio ASR, Automatic Captioning, Keyword SpottingMicrophone51 hours Add QuotePAS_BRC001Appen GlobalBroadcast SpeechNorthern Pashto - Southern PashtoAfghanistanLow background noise (studio)N/A1Available on requestAvailable on requestN/AwavDataset is fully transcribed and timestamped
Pronunciation lexicon not currently available but can be developed upon request
Dataset is largely speech only and does not include music or advertisements
Data types include: talk shows, interviews, news broadcasts (excluding news reading by anchors)
Pashto (Afghanistan) broadcast
73
Down arrow Product Type ots-sound Pashto (Afghanistan) conversational microphone
Audio ASR, Conversational AI, Speech AnalyticsMicrophone39 hours Add QuotePAS_ASR002Appen GlobalConversational SpeechNorthern Pashto - Southern PashtoAfghanistanLow background noise402348609,48016wavDataset is fully transcribed and time stamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
A full translation of the transcripts into French is also available as an optional additional purchase
Average length of calls: 120 mins. One speaker acts as the interviewer and the other as the interviewee, in scenarios similar to TransTAC style (e.g. civil affairs, checkpoints, etc.)
The interviewer appears in more than one set of dialogues but the interviewee is unique for each set
Pashto (Afghanistan) conversational microphone
72
Down arrow Product Type ots-sound Pashto (Afghanistan) conversational telephony
Audio ASR, Conversational AI, Speech AnalyticsMobile phone and landline55 hours Add QuotePAS_ASR001Appen GlobalConversational SpeechNorthern Pashto - Southern PashtoAfghanistanLow background noise9672Available on request13,6338wavDataset is fully transcribed and time stamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
For the majority of calls, both speakers (in-line/out-line) were collected and transcribed; however, for a smaller number of calls, only one half of the conversation was collected and transcribed
Pashto (Afghanistan) conversational telephony
217
Down arrow Product Type ots-text Pashto (Afghanistan) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A65,000 words Add Quotepus_AFG_PHONAppen GlobalPronunciation DictionaryPashtoAfghanistanN/AN/AN/AN/A65,000N/AtextPashto (Afghanistan) Pronunciation Dictionary
219
Down arrow Product Type ots-text Polish (Poland) Part of Speech Dictionary
Text ASR, TTS, Language ModellingN/A4,000 words Add Quotepol_POL_POSAppen GlobalPart of Speech DictionaryPolishPolandN/AN/AN/AN/A4,000N/AtextPolish (Poland) Part of Speech Dictionary
218
Down arrow Product Type ots-text Polish (Poland) Pronunciation Dictionary
Text ASR, TTS, Language ModellingN/A40,000 words Add Quotepol_POL_PHONAppen GlobalPronunciation DictionaryPolishPolandN/AN/AN/AN/A40,000N/AtextPolish (Poland) Pronunciation Dictionary
75
Down arrow Product Type ots-sound Polish (Poland) scripted microphone
Audio ASR, Virtual Assistant, ChatbotMicrophone25 hours Add QuotePOL_ASR001Global PhoneScripted SpeechPolishPolandLow background noise (home/office)99110,130Available on request16wavDataset is fully transcribed and the transcription is available both in original script and in Romanized form
Each speaker read