Dataset Text Albanian (Albania) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 12,000 words Add Dataset to Quote sqi_ALB_PHON Appen Global Pronunciation Dictionary Albanian Albania N/A N/A N/A N/A 12,000 N/A text Albanian (Albania) Pronunciation Dictionary
Dataset Text Amharic (Ethiopia) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 49,000 words Add Dataset to Quote amh_ETH_PHON Appen Global Pronunciation Dictionary Amharic Ethiopia N/A N/A N/A N/A 49,000 N/A text Amharic (Ethiopia) Pronunciation Dictionary
Dataset Text Arabic (Algeria) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 11,000 words Add Dataset to Quote ara_DZA_PHON Appen Global Pronunciation Dictionary Arabic Algeria N/A N/A N/A N/A 11,000 N/A text Arabic (Algeria) Pronunciation Dictionary
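The dictionary entries above carry IDs such as sqi_ALB_PHON, amh_ETH_PHON and ara_DZA_PHON. These look like a three-letter language code, a three-letter region code and a type suffix joined by underscores. The sketch below splits such an ID into those parts. The pattern is an assumption read off the listing, not a documented scheme, and the names `DictionaryId` and `parse_dictionary_id` are hypothetical. IDs that do not fit it (for example EAR_ASR001) return `None`.

```python
from typing import NamedTuple, Optional


class DictionaryId(NamedTuple):
    language: str  # lower-case three-letter code, e.g. "sqi"
    region: str    # upper-case three-letter code, e.g. "ALB"
    kind: str      # "PHON" (pronunciation) or "POS" (part of speech)


def parse_dictionary_id(dataset_id: str) -> Optional[DictionaryId]:
    """Split an ID like 'sqi_ALB_PHON'; return None if it does not fit the assumed pattern."""
    parts = dataset_id.split("_")
    if len(parts) != 3:
        return None
    language, region, kind = parts
    if not (len(language) == 3 and language.isalpha() and language.islower()):
        return None
    if not (len(region) == 3 and region.isalpha() and region.isupper()):
        return None
    if kind not in {"PHON", "POS"}:
        return None
    return DictionaryId(language, region, kind)


for dataset_id in ("sqi_ALB_PHON", "amh_ETH_PHON", "ara_DZA_PHON", "EAR_ASR001"):
    print(dataset_id, "->", parse_dictionary_id(dataset_id))
```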
Dataset Audio Arabic (Eastern Algeria) conversational telephony Common Use Cases: ASR, Conversational AI, Speech Analytics Recording Device: Mobile phone and landline Unit: 29 hours Add Dataset to Quote EAR_ASR001 Appen Global Conversational Speech Arabic Algeria Low background noise (home/office) 496 2 Available on request 11,327 8 alaw Dataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
For the majority of calls, both speakers (in-line/out-line) were collected and transcribed; however, for a smaller number of calls, only one half of the conversation was collected and transcribed
8% landline, 92% mobile
Arabic (Eastern Algeria) conversational telephony
Dataset Text Arabic (Egypt) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 40,000 words Add Dataset to Quote ara_EGY_PHON Appen Global Pronunciation Dictionary Arabic Egypt N/A N/A N/A N/A 40,000 N/A text Arabic (Egypt) Pronunciation Dictionary
Dataset Audio Arabic (Egypt) scripted smartphone Common Use Cases: ASR, Virtual Assistant, Chatbot Recording Device: Mobile phone Unit: 352 hours Add Dataset to Quote ARE_ASR001_CN Appen China Scripted Speech Arabic Egypt Low background noise (home/office) 627 1 128,908 207,576 16 wav Dataset contains audio with corresponding text prompts
Text prompts are not vowelised
Arabic (Egypt) scripted smartphone
Dataset Text Arabic (Iraq) Part of Speech Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 13,000 words Add Dataset to Quote ara_IRQ_POS Appen Global Part of Speech Dictionary Arabic Iraq N/A N/A N/A N/A 13,000 N/A text Arabic (Iraq) Part of Speech Dictionary
Dataset Text Arabic (Iraq) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 19,000 words Add Dataset to Quote ara_IRQ_PHON Appen Global Pronunciation Dictionary Arabic Iraq N/A N/A N/A N/A 19,000 N/A text Person names Arabic (Iraq) Pronunciation Dictionary
Dataset Text Arabic (Libya) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 48,000 words Add Dataset to Quote ara_LBY_PHON Appen Global Pronunciation Dictionary Arabic Libya N/A N/A N/A N/A 48,000 N/A text Arabic (Libya) Pronunciation Dictionary
Dataset Audio Arabic (Modern Standard Arabic) scripted microphone Common Use Cases: ASR, Virtual Assistant, Chatbot Recording Device: Microphone Unit: 12 hours Add Dataset to Quote MSA_ASR001 Global Phone Scripted Speech Arabic Tunisia Low background noise (home/office) 78 1 4,908 Available on request 16 wav Dataset is fully transcribed and the transcription is available both in original script and in Romanized form
Each speaker reads a number of phonetically rich sentences selected from national newspaper articles available from the web to cover a wide domain with large vocabulary
Developed in collaboration with the Karlsruhe Institute of Technology (KIT)
Arabic (Modern Standard Arabic) scripted microphone
Dataset Audio Arabic (Morocco) conversational telephony Common Use Cases: ASR, Conversational AI, Speech Analytics Recording Device: Mobile phone and landline Unit: 33 hours Add Dataset to Quote ARY_ASR001 Appen Global Conversational Speech Arabic Morocco Low background noise 180 2 80,430 23,836 8 alaw Each speaker participated in 1 to 4 conversations. Speakers are identified by a unique 4-digit speaker ID which is recorded in the demographic file
Transcription is available in original script and fully reversible Romanised version with accompanying pronunciation lexicon
English translation of product transcription is available (ARY_MT001, ARY_ASRMT001)
Arabic (Morocco) conversational telephony
Dataset Text Arabic (Morocco) conversational telephony translation Common Use Cases: MT, Chatbot , Conversational AI Recording Device: N/A Unit: 80,430 utterances Add Dataset to Quote ARY_MT001 Appen Global Conversational Translation Arabic Morocco N/A 180 N/A 80,430 23,836 N/A text Corresponding audio, transcription, fully reversible romanised transcription and pronunciation lexicon data are available (ARY_ASR001, ARY_ASRMT001) Arabic (Morocco) conversational telephony translation
Dataset Text Arabic (Morocco) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 60,000 words Add Dataset to Quote ara_MAR_PHON Appen Global Pronunciation Dictionary Arabic Morocco N/A N/A N/A N/A 60,000 N/A text Arabic (Morocco) Pronunciation Dictionary
Dataset Text Arabic (MSA) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 40,000 words Add Dataset to Quote arb_MSA_PHON Appen Global Pronunciation Dictionary Standard Arabic N/A N/A N/A N/A N/A 40,000 N/A text Arabic (MSA) Pronunciation Dictionary
Dataset Audio Arabic (Saudi Arabia) scripted smartphone Common Use Cases: ASR, Virtual Assistant, Chatbot Recording Device: Mobile phone Unit: 322 hours Add Dataset to Quote ARS_ASR001_CN Appen China Scripted Speech Arabic Saudi Arabia Low background noise (home/office) 227 1 104,574 156,282 16 wav Dataset contains audio with corresponding text prompts
Text prompts are not vowelised
300-1000 prompts per speaker covering general content including education, sports, entertainment, travel, culture and technology
Arabic (Saudi Arabia) scripted smartphone
Dataset Text Arabic (Sudan) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 17,000 words Add Dataset to Quote ara_SDN_PHON Appen Global Pronunciation Dictionary Arabic Sudan N/A N/A N/A N/A 17,000 N/A text Arabic (Sudan) Pronunciation Dictionary
Dataset Text Arabic (United Arab Emirates (UAE)) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 75,000 words Add Dataset to Quote ara_ARE_PHON Appen Global Pronunciation Dictionary Arabic United Arab Emirates (UAE) N/A N/A N/A N/A 75,000 N/A text Arabic (United Arab Emirates (UAE)) Pronunciation Dictionary
Dataset Audio Arabic (United Arab Emirates (UAE)) scripted smartphone Common Use Cases: ASR, Virtual Assistant, Chatbot Recording Device: Mobile phone Unit: 170 hours Add Dataset to Quote ARU_ASR001_CN Appen China Scripted Speech Arabic United Arab Emirates (UAE) Low background noise (home/office) 133 1 42,352 85,775 16 wav Dataset contains audio with corresponding text prompts
Text prompts are not vowelised
Arabic (United Arab Emirates (UAE)) scripted smartphone
Dataset Audio Arabic (United Arab Emirates (UAE)) scripted telephony Common Use Cases: ASR, Virtual Assistant Recording Device: Mobile phone and landline Unit: 48 hours Add Dataset to Quote OrienTel United Arab Emirates MCA (Modern Colloquial Arabic) Nuance Scripted Speech Arabic United Arab Emirates (UAE) Low background noise 880 1 43,000 Available on request 8 alaw Dataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
49 prompts per speaker including digits, natural numbers, letter strings, personal, place and business names, confirmation items (yes, no + fuzzy), generic command and control items, phonetically rich sentences and words and spontaneous items for control
Arabic (United Arab Emirates (UAE)) scripted telephony
Dataset Audio Arabic (United Arab Emirates (UAE)) scripted telephony Common Use Cases: ASR, Virtual Assistant Recording Device: Mobile phone and landline Unit: 31 hours Add Dataset to Quote OrienTel United Arab Emirates MSA (Modern Standard Arabic) Nuance Scripted Speech Arabic United Arab Emirates (UAE) Low background noise 500 1 24,500 Available on request 8 alaw Dataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
49 prompts per speaker including digits, natural numbers, letter strings, personal, place and business names, confirmation items (yes, no + fuzzy), generic command and control items, phonetically rich sentences and words and spontaneous items for control
Arabic (United Arab Emirates (UAE)) scripted telephony
Dataset Audio Arabic (United Arab Emirates (UAE)/ Saudi Arabia) scripted microphone Common Use Cases: ASR, Virtual Assistant, Chatbot Recording Device: Microphone Unit: 86 hours Add Dataset to Quote CGA_ASR001 Appen Global Scripted Speech Arabic United Arab Emirates (UAE) – Saudi Arabia Low background noise (home/office) 150 4 42,000 19,245 16 raw PCM Fully transcribed with acoustic event tagging derived from the SpeechDAT conventions
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
All transcriptions fully vowelized
280 prompts per speaker including 30 Person names (first name and family name) from a set of 15, 10 single isolated digits 0-10, 8-digit sequences (randomly generated), 200 phonetically balanced sentences, 30 x 10-word phonetically balanced word strings
Arabic (United Arab Emirates (UAE)/ Saudi Arabia) scripted microphone
Dataset Text Arabic NER news text Common Use Cases: NER, Content Classification, Search Engines Recording Device: N/A Unit: 20,774 sentences Add Dataset to Quote ARB_NER001 Appen Global News NER Standard Arabic N/A N/A N/A N/A 20,774 Available on request N/A text News text corpora with entities tagged in XML format: Person, Title, Organization, Location, Geo-political entity, Facility, Religion, Nationality, Quantity Arabic NER news text
Dataset Text Assamese (India) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 40,000 words Add Dataset to Quote asm_IND_PHON Appen Global Pronunciation Dictionary Assamese India N/A N/A N/A N/A 40,000 N/A text Assamese (India) Pronunciation Dictionary
Dataset Audio Baby crying audio Common Use Cases: Baby Monitor, Security & Other Consumer Applications Recording Device: Mobile phone Unit: 70 hours Add Dataset to Quote CRY_ASR001_CN Appen China Human Sound N/A China Low background noise (home/office) 566 1 N/A N/A 16 wav Crying sound of babies 0-3 years old, each lasting around 2 minutes. Audio only. Baby crying audio
Dataset Audio Bahasa Indonesia conversational telephony Common Use Cases: ASR, Conversational AI, Speech Analytics Recording Device: Mobile phone and landline Unit: 31 hours Add Dataset to Quote BAH_ASR001 Appen Global Conversational Speech Indonesian Indonesia Low background noise 1,002 2 30,695 11,480 8 wav Dataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
For a large proportion of calls, only one half of the conversation was collected and transcribed
28% landline, 72% mobile
Bahasa Indonesia conversational telephony
Dataset Text Basque (Spain) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 10,000 words Add Dataset to Quote eus_ESP_PHON Appen Global Pronunciation Dictionary Basque Spain N/A N/A N/A N/A 10,000 N/A text Basque (Spain) Pronunciation Dictionary
Dataset Audio Bengali (Bangladesh) conversational telephony Common Use Cases: ASR, Conversational AI, Speech Analytics Recording Device: Mobile phone and landline Unit: 47 hours Add Dataset to Quote BEN_ASR001 Appen Global Conversational Speech Bengali Bangladesh Mixed (in-car, roadside, home/office) 1,000 2 108,923 17,922 8 wav Dataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
Bengali (Bangladesh) conversational telephony
Dataset Text Bengali (India) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 29,000 words Add Dataset to Quote ben_IND_PHON Appen Global Pronunciation Dictionary Bengali India N/A N/A N/A N/A 29,000 N/A text Bengali (India) Pronunciation Dictionary
Dataset Audio Bulgarian (Bulgaria) conversational telephony Common Use Cases: ASR, Conversational AI, Speech Analytics Recording Device: Mobile phone and landline Unit: 38 hours Add Dataset to Quote BUL_ASR001 Appen Global Conversational Speech Bulgarian Bulgaria Low background noise (home/office) 217 2 86,453 22,342 8 alaw or wav Dataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
200 telephony conversations are recorded for this project – 100 speakers make 2 calls each (1 from landline, 1 from mobile) to a pool of 100 call receivers
49% landline, 51% mobile
Conversations cover a range of topics including: Holiday/Leisure, Movies/TV Shows and Work.
Bulgarian (Bulgaria) conversational telephony
Dataset Text Bulgarian (Bulgaria) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 55,000 words Add Dataset to Quote bul_BGR_PHON Appen Global Pronunciation Dictionary Bulgarian Bulgaria N/A N/A N/A N/A 55,000 N/A text Bulgarian (Bulgaria) Pronunciation Dictionary
Dataset Audio Bulgarian (Bulgaria) scripted microphone Common Use Cases: ASR, Virtual Assistant, Chatbot Recording Device: Microphone Unit: 22 hours Add Dataset to Quote BUL_ASR002 Global Phone Scripted Speech Bulgarian Bulgaria Low background noise (home/office) 77 1 8,674 Available on request 16 wav Dataset is fully transcribed and the transcription is available both in original script and in Romanized form
Each speaker reads a number of phonetically rich sentences selected from national newspaper articles available from the web to cover a wide domain with large vocabulary
Developed in collaboration with the Karlsruhe Institute of Technology (KIT)
Bulgarian (Bulgaria) scripted microphone
Dataset Image Business-to-business printed text document OCR Common Use Cases: Document Processing, Document Search Recording Device: Camera, scan Unit: 5,832 documents Add Dataset to Quote IMG_OCR_B2B Appen Global Document OCR N/A N/A Mixed lighting conditions N/A N/A N/A N/A N/A png Scans and photographs of business-to-business documents containing printed text. 38% Premium Quality images in 10 languages, 25 countries, including Purchase Order, Payment Advice or Remittance Advice, Order Confirmation and Delivery note. 64% Standard Quality images in various challenging conditions in 11 languages, 34 countries, in a wider range of categories including Complaints or Return, Delivery advice, Delivery note, Dunning, Goods receipt, Invoice, Offer, Order confirmation, Pay slip, Payment Advice or Remittance Advice, Purchase Order, Receipt, and Supplier load Business-to-business printed text document OCR
Dataset Text Cantonese (China) Part of Speech Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 10,000 words Add Dataset to Quote yue_HKG_POS Appen Global Part of Speech Dictionary Cantonese China N/A N/A N/A N/A 10,000 N/A text Traditional Cantonese (China) Part of Speech Dictionary
Dataset Text Cantonese (China) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 37,000 words Add Dataset to Quote yue_CHN_PHON Appen Global Pronunciation Dictionary Cantonese China N/A N/A N/A N/A 37,000 N/A text Simplified Cantonese (China) Pronunciation Dictionary
Dataset Text Cantonese (China) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 40,000 words Add Dataset to Quote yue_CHN_PHON Appen Global Pronunciation Dictionary Cantonese China N/A N/A N/A N/A 40,000 N/A text Traditional Cantonese (China) Pronunciation Dictionary
Dataset Text Catalan (Spain) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 10,000 words Add Dataset to Quote cat_ESP_PHON Appen Global Pronunciation Dictionary Catalan Spain N/A N/A N/A N/A 10,000 N/A text Catalan (Spain) Pronunciation Dictionary
Dataset Text Cebuano (Philippines) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 21,000 words Add Dataset to Quote ceb_PHL_PHON Appen Global Pronunciation Dictionary Cebuano Philippines N/A N/A N/A N/A 21,000 N/A text Cebuano (Philippines) Pronunciation Dictionary
Dataset Audio Chinese (multinational foreigner) scripted smartphone Common Use Cases: ASR, Conversational AI, Speech Analytics Recording Device: Mobile phone Unit: 200 hours Add Dataset to Quote FOREIGNER_ASR001_CN Appen China Scripted Speech Mandarin Chinese China Low background noise 309 1 16 wav This database contains 200 hours of Chinese spoken by foreigners from the following places: Argentina, Egypt, Australia, Russia, the Philippines, Kazakhstan, Korea, Kyrgyzstan, Canada, Kuala Lumpur, Kenya, Laos, Malaysia, Mauritius, the United States, Mongolia, South Africa, Japan, Tajikistan, Thailand, Turkey, Hong Kong, Singapore, India, Indonesia, Vietnam
There is no data from South Korea or Brazil, and no data recorded by minors.
Each session lasts about an hour; sentence duration ranges from 3 to 10 seconds
Each recording is of an individual reading aloud, captured on a mobile phone in a home/office environment.
Sensitive data and personal information have been scrubbed.
Chinese (multinational foreigner) scripted smartphone
Dataset Audio Croatian (Croatia) conversational telephony Common Use Cases: ASR, Conversational AI, Speech Analytics Recording Device: Mobile phone and landline Unit: 39 hours Add Dataset to Quote CRO_ASR001 Appen Global Conversational Speech Croatian Croatia Low background noise (home/office) 200 2 Available on request 23,919 8 alaw Dataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
200 telephony conversations are recorded for this project – 100 speakers make 2 calls each (1 from landline, 1 from mobile) to a pool of 100 call receivers
53% landline, 47% mobile
Conversations cover a range of topics including: News & Current Affairs, Health and Sport.
Croatian (Croatia) conversational telephony
Dataset Text Croatian (Croatia) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 19,000 words Add Dataset to Quote hrv_HRV_PHON Appen Global Pronunciation Dictionary Croatian Croatia N/A N/A N/A N/A 19,000 N/A text Croatian (Croatia) Pronunciation Dictionary
Dataset Audio Croatian (Croatia) scripted microphone Common Use Cases: ASR, Virtual Assistant, Chatbot Recording Device: Microphone Unit: 11 hours Add Dataset to Quote CRO_ASR002 Global Phone Scripted Speech Croatian Croatia Low background noise (home/office) 94 1 4,499 23,929 16 wav Dataset is fully transcribed and the transcription is available both in original script and in Romanized form
Each speaker reads a number of phonetically rich sentences selected from national newspaper articles available from the web to cover a wide domain with large vocabulary
Developed in collaboration with the Karlsruhe Institute of Technology (KIT)
Croatian (Croatia) scripted microphone
Dataset Audio Croatian (Croatia) scripted smartphone Common Use Cases: ASR, Virtual Assistant, Chatbot Recording Device: Mobile phone Unit: 263 hours Add Dataset to Quote CRO_ASR003_CN Appen China Scripted Speech Croatian Croatia Low background noise (home/office) 243 1 73,467 136,140 16 wav Dataset contains audio with corresponding text prompts Croatian (Croatia) scripted smartphone
Dataset Text Czech (Czech Republic) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 50,000 words Add Dataset to Quote ces_CZE_PHON Appen Global Pronunciation Dictionary Czech Czech Republic N/A N/A N/A N/A 50,000 N/A text Czech (Czech Republic) Pronunciation Dictionary
Dataset Audio Czech (Czech Republic) scripted microphone Common Use Cases: ASR, Virtual Assistant, Chatbot Recording Device: Microphone Unit: 31 hours Add Dataset to Quote CZE_ASR001 Global Phone Scripted Speech Czech Czech Republic Low background noise (home/office) 102 1 12,425 Available on request 16 wav Dataset is fully transcribed and the transcription is available both in original script and in Romanized form
Each speaker reads a number of phonetically rich sentences selected from national newspaper articles available from the web to cover a wide domain with large vocabulary
Developed in collaboration with the Karlsruhe Institute of Technology (KIT)
Czech (Czech Republic) scripted microphone
Dataset Audio Czech (Czech Republic) scripted telephony Common Use Cases: ASR, Virtual Assistant Recording Device: Landline only Unit: 93 hours Add Dataset to Quote Czech SpeechDat(E) Dataset Nuance Scripted Speech Czech Czech Republic Low background noise 1,000 1 52,000 Available on request 8 alaw Dataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
52 prompts per speaker including digits, natural numbers, letter strings, personal, place and business names, confirmation items (yes, no + fuzzy), generic command and control items, and phonetically rich words and sentences
Czech (Czech Republic) scripted telephony
Dataset Text Danish (Denmark) Part of Speech Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 100,000 words Add Dataset to Quote dan_DNK_POS Appen Global Part of Speech Dictionary Danish Denmark N/A N/A N/A N/A 100,000 N/A text Danish (Denmark) Part of Speech Dictionary
Dataset Text Danish (Denmark) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 107,000 words Add Dataset to Quote dan_DNK_PHON Appen Global Pronunciation Dictionary Danish Denmark N/A N/A N/A N/A 107,000 N/A text Danish (Denmark) Pronunciation Dictionary
Dataset Audio Danish (Denmark) scripted microphone Common Use Cases: ASR, Virtual Assistant, Chatbot Recording Device: Microphone Unit: 53 hours Add Dataset to Quote Speecon Danish Nuance Scripted Speech Danish Denmark Mixed (office, entertainment, car, public place) 600 (550 adult speakers and 50 child speakers) 4 170,000 Available on request 16 Available on request Dataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
290 prompts per adult speaker and 210 prompts per child speaker including digits, natural numbers, letter strings, personal, place and business names, application words for adult speakers, command (toy, phone and general) for child speakers, phonetically rich words and sentences and free and elicited spontaneous responses for adult speakers
Danish (Denmark) scripted microphone
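For the scripted datasets, the listed utterance count can be cross-checked against the speaker and prompt figures. The sketch below does this for the Speecon Danish entry (550 adult speakers at 290 prompts each, 50 child speakers at 210 prompts each). It assumes every speaker recorded every prompt exactly once, which the listing does not state. The function name `expected_utterances` is hypothetical.

```python
def expected_utterances(groups):
    """Sum speakers * prompts_per_speaker over each (speakers, prompts) group."""
    return sum(speakers * prompts for speakers, prompts in groups)


# Speecon Danish: (speakers, prompts per speaker) for adults and children
speecon_danish = [(550, 290), (50, 210)]
total = expected_utterances(speecon_danish)
print(total)  # 170000, matching the 170,000 utterances listed for this dataset
```

The same check gives 500 × 49 = 24,500 for the OrienTel UAE MSA entry and 1,000 × 52 = 52,000 for Czech SpeechDat(E), both equal to their listed counts. Where a listed figure is rounded or some prompts were not recorded, the product is only an approximation.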
Dataset Audio Dari (Afghanistan) broadcast Common Use Cases: ASR, Automatic Captioning, Keyword Spotting Recording Device: Microphone Unit: 51 hours Add Dataset to Quote DAR_BRC001 Appen Global Broadcast Speech Dari Afghanistan Low background noise (studio) N/A 1 Available on request Available on request N/A wav Dataset is fully transcribed and timestamped
Pronunciation lexicon not currently available but can be developed upon request
Dataset is largely speech only and does not include music or advertisements
Data types include: talk shows, interviews, news broadcasts (excluding news reading by anchors)
13% landline, 87% mobile
Dari (Afghanistan) broadcast
Dataset Audio Dari (Afghanistan) conversational telephony Common Use Cases: ASR, Conversational AI, Speech Analytics Recording Device: Mobile phone and landline Unit: 40 hours Add Dataset to Quote DAR_ASR001 Appen Global Conversational Speech Dari Afghanistan Low background noise 500 2 Available on request 11,168 8 alaw Dataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
Dataset is largely speech only and does not include music or advertisements
13% landline, 87% mobile
Dari (Afghanistan) conversational telephony
Dataset Text Dari (Afghanistan) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 31,000 words Add Dataset to Quote prs_AFG_PHON Appen Global Pronunciation Dictionary Dari Afghanistan N/A N/A N/A N/A 31,000 N/A text Dari (Afghanistan) Pronunciation Dictionary
Dataset Text Dholuo (Kenya) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 23,000 words Add Dataset to Quote luo_KEN_PHON Appen Global Pronunciation Dictionary Dholuo Kenya N/A N/A N/A N/A 23,000 N/A text Dholuo (Kenya) Pronunciation Dictionary
Dataset Audio Dongbei dialect (China) Conversational Speech Common Use Cases: ASR, Conversational AI, Speech Analytics Recording Device: Recording pen/microphone Unit: 84.6 hours Add Dataset to Quote DONGBEI_ASR001_CN Appen China Conversational Speech Dongbei dialect China Low background noise 268 1 16 wav Audio only; transcription not included
Audio recordings cover 19 districts: Shenyang Heping District, Shenhe District, Huanggu District, Dadong District, Tiexi District, Lvyuan District, Chaoyang District, Kuancheng District, Erdao District, Nanguan District, Daoli District, Nangang District, Daowai District, Pingfang District, Songbei District, Xiangfang District, Hulan District, Acheng District and Shuangcheng District
Northeast suburb accents are not included, and no minors were recorded.
Each recording session contains 20-30 minutes of free dialogue between 2-5 people.
Sensitive data and personal information have been scrubbed.
Dongbei dialect (China) Conversational Speech
Dataset Audio Dongbei dialect (China) Conversational Speech Common Use Cases: ASR, Conversational AI, Speech Analytics Recording Device: Mobile phone Unit: 75.2 hours Add Dataset to Quote DONGBEI_ASR002_CN Appen China Conversational Speech Dongbei dialect China Low background noise 185 1 8 wav Audio only; transcription not included
Audio recordings cover 19 districts: Shenyang Heping District, Shenhe District, Huanggu District, Dadong District, Tiexi District, Lvyuan District, Chaoyang District, Kuancheng District, Erdao District, Nanguan District, Daoli District, Nangang District, Daowai District, Pingfang District, Songbei District, Xiangfang District, Hulan District, Acheng District and Shuangcheng District
Northeast suburb accents are not included, and no minors were recorded.
Each recording session contains 20-30 minutes of free dialogue between 2-5 people.
Sensitive data and personal information have been scrubbed.
Dongbei dialect (China) Conversational Speech
Dataset Audio Dutch (Belgium) scripted microphone Common Use Cases: ASR, Virtual Assistant, Chatbot Recording Device: Microphone Unit: 47 hours Add Dataset to Quote Speecon Dutch from Belgium Nuance Scripted Speech Dutch Belgium Mixed (office, entertainment, car, public place) 600 (550 adult speakers and 50 child speakers) 4 170,000 Available on request 16 Available on request Dataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
290 prompts per adult speaker and 210 prompts per child speaker including digits, natural numbers, letter strings, personal, place and business names, application words for adult speakers, command (toy, phone and general) for child speakers, phonetically rich words and sentences and free and elicited spontaneous responses for adult speakers
Dutch (Belgium) scripted microphone
Dataset Audio Dutch (Belgium) scripted telephony Common Use Cases: ASR, Virtual Assistant Recording Device: Microphone Unit: 80 hours Add Dataset to Quote Flemish SpeechDat(II) FDB-1000 (FIXED1FL) Nuance Scripted Speech Dutch Belgium Low background noise 1,000 1 52,000 Available on request 8 Available on request Dataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
52 prompts per speaker including digits, natural numbers, letter strings, personal, place and business names, confirmation items (yes, no + fuzzy), generic command and control items, phonetically rich sentences and words and spontaneous items for control
Dutch (Belgium) scripted telephony
Dataset Audio Dutch (Netherlands & Belgium) scripted in-car Common Use Cases: ASR, Virtual Assistant, In Car HMI & Entertainment Recording Device: Microphone and mobile phone Unit: 27 hours Add Dataset to Quote Dutch and Flemish SpeechDat-Car Nuance Scripted Speech Dutch Netherlands – Belgium Mixed (in-car) 302 5 15,100 Available on request 16 and 8 Available on request Dataset is fully transcribed and is accompanied by a pronunciation lexicon and validation report
125 prompts per adult speaker including digits, natural numbers, letter strings, personal, place and business names (some spontaneous), generic command and control items, phonetically rich words and sentences and prompts for spontaneous speech
Dutch (Netherlands & Belgium) scripted in-car
Dataset Audio Dutch (Netherlands) conversational telephony Common Use Cases: ASR, Conversational AI, Speech Analytics Recording Device: Mobile phone and landline Unit: 36 hours Add Dataset to Quote NLD_ASR001 Appen Global Conversational Speech Dutch Netherlands Low background noise 200 2 Available on request 14,964 8 alaw Dataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
200 telephony conversations are recorded for this project – 100 speakers make 2 calls each (1 from landline, 1 from mobile) to a pool of 100 call receivers
51% landline, 49% mobile
Conversations cover a range of topics including: Holiday/Leisure, Work and Sport.
Dutch (Netherlands) conversational telephony
Dataset Text Dutch (Netherlands) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 45,000 words Add Dataset to Quote nld_NLD_PHON Appen Global Pronunciation Dictionary Dutch Netherlands N/A N/A N/A N/A 45,000 N/A text Dutch (Netherlands) Pronunciation Dictionary
Dataset Audio Dutch (Netherlands) scripted microphone Common Use Cases: ASR, Virtual Assistant, Chatbot Recording Device: Microphone Unit: 68 hours Add Dataset to Quote Speecon Dutch from the Netherlands Nuance Scripted Speech Dutch Netherlands Mixed (office, entertainment, car, public place) 600 (550 adult speakers and 50 child speakers) 4 170,000 Available on request 16 Available on request Dataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
290 prompts per adult speaker and 210 prompts per child speaker including digits, natural numbers, letter strings, personal, place and business names, application words for adult speakers, command (toy, phone and general) for child speakers, phonetically rich words and sentences and free and elicited spontaneous responses for adult speakers
Dutch (Netherlands) scripted microphone
Dataset Image East African facial images Common Use Cases: Facial Recognition Recording Device: Camera Unit: 13,500 images Add Dataset to Quote IMG_FACE_KEN_CN Appen China Human Face N/A Kenya Mixed background and lighting conditions 99 N/A N/A N/A N/A jpg Images of 99 participants containing all combinations of 9 different lighting conditions, 2 different distances between the participant's face and the smartphone, and 7 different camera angles. All combinations of these 3 requirements were completed per participant.
A random 32 images per person include occlusions such as sunglasses, masks, wigs or hats
A random 36 shots include different facial expressions including stare, open mouth, pout mouth, smile and frown
Lighting conditions: indoor normal light, outdoor normal light, indoor backlight, outdoor backlight, indoor ordinary dark light, full black screen fill light, point light source (white light, street light), neon light (monochromatic red, green and blue, multi-color mixed light), side glare
Distances: 30cm and 50cm
Camera angles: front, left 45°, right 45°, left 15°, right 15°, top 30°, bottom 30°
East African facial images
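As a rough illustration of the capture grid described for the East African facial images dataset, the sketch below enumerates every combination of the listed lighting conditions, distances and camera angles. The variable and function names are hypothetical and are not part of the dataset or its metadata; the values are taken from the lists above.

```python
from itertools import product

# Values copied from the dataset description; the names are illustrative only.
LIGHTING = [
    "indoor normal light",
    "outdoor normal light",
    "indoor backlight",
    "outdoor backlight",
    "indoor ordinary dark light",
    "full black screen fill light",
    "point light source",
    "neon light",
    "side glare",
]
DISTANCES_CM = [30, 50]
ANGLES = [
    "front",
    "left 45",
    "right 45",
    "left 15",
    "right 15",
    "top 30",
    "bottom 30",
]


def capture_combinations():
    """Return every (lighting, distance, angle) triple in the capture grid."""
    return list(product(LIGHTING, DISTANCES_CM, ANGLES))


if __name__ == "__main__":
    combos = capture_combinations()
    # 9 lighting conditions * 2 distances * 7 angles
    print(len(combos))
```

This covers only the combination grid for one participant; the occlusion and facial-expression shots are described separately and are not modelled here.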
Dataset Audio English (Arabic – Levant/Egypt) conversational telephony Common Use Cases: ASR, Conversational AI, Speech Analytics Recording Device: Mobile phone and landline Unit: 28 hours Add Dataset to Quote ENA_ASR001 Appen Global Conversational Speech English Egypt Low background noise 250 2 Available on request 5,619 8 alaw or wav Dataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
Average length of calls: 10-15 mins
English (Arabic – Levant/Egypt) conversational telephony
Dataset Text English (Australia) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 157,000 words Add Dataset to Quote eng_AUS_PHON Appen Global Pronunciation Dictionary English Australia N/A N/A N/A N/A 157,000 N/A text English (Australia) Pronunciation Dictionary
Dataset Audio English (Australia) scripted telephony Common Use Cases: ASR, Virtual Assistant Recording Device: Mobile phone and landline Unit: 92 hours Add Dataset to Quote AUS_ASR001 Appen Global Scripted Speech English Australia Low background noise (home/office) 500 1 82,500 35,137 8 alaw Fully transcribed to SpeechDAT type conventions
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
162 prompts (read speech) per speaker including digits, natural numbers, letter strings, personal, place, and business names, confirmation items (yes, no + fuzzy), generic command and control items (from a set of 215), phonetically rich sentences and words
English (Australia) scripted telephony
Dataset Audio English (Australia) scripted telephony Common Use Cases: ASR, Virtual Assistant Recording Device: Mobile phone and landline Unit: 118 hours Add Dataset to Quote AUS_ASR002 Appen Global Scripted Speech English Australia Mixed 1,000 1 75,000 18,952 8 alaw Fully transcribed to SpeechDAT type conventions
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
75 prompts per speaker including digits, natural numbers, letter strings, personal, place, and business names, confirmation items (yes, no + fuzzy), generic command and control items, phonetically rich sentences and words
The prompts are a mixture of ‘read’ and ‘elicited’ items where 5 prompts per script are ‘spontaneous free speech’
English (Australia) scripted telephony
Dataset Text English (Canada) Part of Speech Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 3,000 words Add Dataset to Quote eng_CAN_POS Appen Global Part of Speech Dictionary English Canada N/A N/A N/A N/A 3,000 N/A text English (Canada) Part of Speech Dictionary
Dataset Text English (Canada) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 50,000 words Add Dataset to Quote eng_CAN_PHON Appen Global Pronunciation Dictionary English Canada N/A N/A N/A N/A 50,000 N/A text English (Canada) Pronunciation Dictionary
Dataset Audio English (Canada) scripted telephony Common Use Cases: ASR, Virtual Assistant Recording Device: Mobile phone and landline Unit: 144 hours Add Dataset to Quote ENC_ASR001 Appen Global Scripted Speech English Canada Mixed 1,000 1 99,000 12,483 8 alaw or wav Fully transcribed to SALA II/SpeechDAT type conventions
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
99 prompts per speaker including digits, natural numbers, letter strings, personal, place, and business names, confirmation items (yes, no + fuzzy), generic command and control items, phonetically rich sentences and words
English (Canada) scripted telephony
Dataset Text English (Hong Kong) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 18,000 words Add Dataset to Quote eng_HKG_PHON Appen Global Pronunciation Dictionary English Hong Kong N/A N/A N/A N/A 18,000 N/A text English (Hong Kong) Pronunciation Dictionary
Dataset Audio English (India) conversational smartphone Common Use Cases: ASR, Conversational AI, Speech Analytics Recording Device: Mobile phone Unit: 143 hours Add Dataset to Quote ENI_ASR003 Appen Global Conversational Speech English India Mixed (home, car, public place, outdoor) 272 1 Available on request Available on request 48 wav Two person conversations covering a broad range of generic topics including clothing, culture, education, finance, food, health, history, hospitality, insurance, media/entertainment, sports, travel/holiday, weather and work.
Each speaker participates in up to 12 conversations that are 5-15 minutes long.
Pronunciation lexicon not currently available but can be developed upon request
English (India) conversational smartphone
Dataset Audio English (India) conversational telephony Common Use Cases: ASR, Conversational AI, Speech Analytics Recording Device: Mobile phone and landline Unit: 67 hours Add Dataset to Quote ENI_ASR002 Appen Global Conversational Speech English India Low background noise 540 2 77,565 11,646 8 alaw or wav Dataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
271 telephony conversations are recorded for this project
English (India) conversational telephony
Dataset Text English (India) Part of Speech Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 13,000 words Add Dataset to Quote eng_IND_POS Appen Global Part of Speech Dictionary English India N/A N/A N/A N/A 13,000 N/A text English (India) Part of Speech Dictionary
Dataset Text English (India) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 60,000 words Add Dataset to Quote eng_IND_PHON Appen Global Pronunciation Dictionary English India N/A N/A N/A N/A 60,000 N/A text English (India) Pronunciation Dictionary
Dataset Audio English (India) scripted telephony Common Use Cases: ASR, Virtual Assistant Recording Device: Mobile phone and landline Unit: 217 hours Add Dataset to Quote ENI_ASR001 Appen Global Scripted Speech English India Mixed 2,358 1 115,541 9,190 8 alaw or wav Fully transcribed to SpeechDAT type conventions.
Dataset is accompanied by a pronunciation lexicon [SAMPA] containing all transcribed words
49 prompts per speaker including digits, natural numbers, letter strings, personal, place, and business names, confirmation items (yes, no + fuzzy), generic command and control items, phonetically rich sentences and words
English (India) scripted telephony
Dataset Text English (Ireland) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 12,000 words Add Dataset to Quote eng_IRL_PHON Appen Global Pronunciation Dictionary English Ireland N/A N/A N/A N/A 12,000 N/A text English (Ireland) Pronunciation Dictionary
Dataset Text English (NZ) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 28,000 words Add Dataset to Quote eng_NZL_PHON Appen Global Pronunciation Dictionary English NZ N/A N/A N/A N/A 28,000 N/A text English (NZ) Pronunciation Dictionary
Dataset Audio English (Philippines) conversational telephony Common Use Cases: ASR, Conversational AI, Speech Analytics Recording Device: Mobile phone and landline Unit: 53 hours Add Dataset to Quote ENF_ASR001 Appen Global Conversational Speech English Philippines Low background noise 450 2 41,602 7,272 8 alaw or wav Dataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
Average length of calls: 10-15 mins
English (Philippines) conversational telephony
Dataset Text English (Philippines) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 7,000 words Add Dataset to Quote eng_PHL_PHON Appen Global Pronunciation Dictionary English Philippines N/A N/A N/A N/A 7,000 N/A text English (Philippines) Pronunciation Dictionary
Dataset Text English (United Arab Emirates (UAE)) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 5,000 words Add Dataset to Quote eng_ARE_PHON Appen Global Pronunciation Dictionary English United Arab Emirates (UAE) N/A N/A N/A N/A 5,000 N/A text English (United Arab Emirates (UAE)) Pronunciation Dictionary
Dataset Audio English (United Arab Emirates (UAE)) scripted telephony Common Use Cases: ASR, Virtual Assistant Recording Device: Mobile phone and landline Unit: 33 hours Add Dataset to Quote OrienTel English as spoken in the United Arab Emirates Nuance Scripted Speech English United Arab Emirates (UAE) Low background noise 500 1 25,500 Available on request 8 alaw Dataset is fully transcribed to SpeechDAT type conventions and is accompanied by a pronunciation lexicon and validation report
51 prompts per speaker including digits, natural numbers, letter strings, personal, place and business names, confirmation items (yes, no + fuzzy), generic command and control items, phonetically rich sentences and words and spontaneous items for control
English (United Arab Emirates (UAE)) scripted telephony
Dataset Audio English (United Kingdom) conversational telephony Common Use Cases: ASR, Conversational AI, Speech Analytics Recording Device: Mobile phone and landline Unit: 150 hours Add Dataset to Quote UKE_ASR001 Appen Global Conversational Speech English United Kingdom Low background noise 1,175 2 298,562 24,193 8 wav Dataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
This version contains full 15-minute calls – there is a reduced version with 5 min calls named UKE_ASR001B.
English (United Kingdom) conversational telephony
Dataset Audio English (United Kingdom) conversational telephony Common Use Cases: ASR, Conversational AI, Speech Analytics Recording Device: Mobile phone and landline Unit: 50 hours Add Dataset to Quote UKE_ASR001B Appen Global Conversational Speech English United Kingdom Low background noise 1,150 2 Available on request 13,192 8 wav Dataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
This version contains full 5-minute calls – there is an expanded version with 15 min calls named UKE_ASR001.
English (United Kingdom) conversational telephony
Dataset Text English (United Kingdom) Part of Speech Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 155,000 words Add Dataset to Quote eng_GBR_POS Appen Global Part of Speech Dictionary English United Kingdom N/A N/A N/A N/A 155,000 N/A text English (United Kingdom) Part of Speech Dictionary
Dataset Text English (United Kingdom) Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 195,000 words Add Dataset to Quote eng_GBR_PHON Appen Global Pronunciation Dictionary English United Kingdom N/A N/A N/A N/A 195,000 N/A text English (United Kingdom) Pronunciation Dictionary
Dataset Audio English (United Kingdom) scripted microphone – single female Common Use Cases: TTS Recording Device: Headset microphone Unit: 11 hours Add Dataset to Quote TC-STAR female baseline voice Laura Nuance Scripted Speech English United Kingdom Low background noise (studio) 1 1 Available on request Available on request 96 Available on request Dataset includes manual orthographic transcription, automatic segmentation into phonemes, automatic generation of pitch marks (where a certain percentage of phonetic segments and pitch marks has been manually checked)
Dataset is accompanied by a pronunciation lexicon with POS, lemma and phonetic transcription
English (United Kingdom) scripted microphone – single female
Dataset Audio English (United Kingdom) scripted microphone – single male Common Use Cases: TTS Recording Device: Headset microphone Unit: 7 hours Add Dataset to Quote TC-STAR male baseline voice Ian Nuance Scripted Speech English United Kingdom Low background noise (studio) 1 1 Available on request Available on request 96 Available on request Dataset includes manual orthographic transcription, automatic segmentation into phonemes, automatic generation of pitch marks (where a certain percentage of phonetic segments and pitch marks has been manually checked)
Dataset is accompanied by a pronunciation lexicon with POS, lemma and phonetic transcription
English (United Kingdom) scripted microphone – single male
Dataset Audio English (United States – African American) conversational smartphone Common Use Cases: ASR, Conversational AI, Speech Analytics Recording Device: Mobile phone Unit: 50 hours Add Dataset to Quote USE_ASR004 Appen Global Conversational Speech English United States Mixed (home, car, public place, outdoor) 94 1 Available on request Available on request 48 wav Two person conversations recorded on a smartphone covering a broad range of generic topics including clothing, culture, education, finance, food, health, history, hospitality, insurance, media/entertainment, sports, travel/holiday, weather and work.
Each speaker participates in up to 12 conversations that are 5-15 minutes long.
Pronunciation lexicon not currently available but can be developed upon request
English (United States – African American) conversational smartphone
Dataset Text English (United States) Conversation SMS – Threaded Common Use Cases: Virtual Assistant, Chatbot Recording Device: N/A Unit: 1.04M messages Add Dataset to Quote ENG_SMS001 Appen Global SMS text messages English United States N/A 17382 N/A 1,047,415 Available on request N/A text This dataset contains threaded SMS conversations between 2 participants, using iMessage and Android SMS. All messages are in US English. Contains timestamps and text message exchanges, with metadata including gender, age range and relationship between participants. Consent is obtained from all participants and the dataset does not contain PII. English (United States) Conversation SMS – Threaded
Dataset Text English (United States) Conversation SMS – Threaded Common Use Cases: Virtual Assistant, Chatbot Recording Device: N/A Unit: 106,649 messages Add Dataset to Quote ENG_SMS001A Appen Global SMS text messages English United States N/A 390 N/A 106,649 Available on request N/A text This is a subset of ENG_SMS001. This dataset contains threaded SMS conversations between 2 participants, using iMessage and Android SMS. All messages are in US English. Contains timestamps and text message exchanges, with metadata including gender, age range and relationship between participants. Consent is obtained from all participants and the dataset does not contain PII. English (United States) Conversation SMS – Threaded
Dataset Text English (United States) Conversation WhatsApp – Threaded Common Use Cases: Virtual Assistant, Chatbot Recording Device: N/A Unit: 366,380 messages Add Dataset to Quote ENG_SMS002 Appen Global WhatsApp text messages English United States N/A 1780 N/A 366,380 Available on request N/A text This dataset contains threaded text message conversations between 2 participants, using WhatsApp. All messages are in US English. Contains timestamps and text message exchanges, with metadata including gender, age range and relationship between participants. Consent is obtained from all participants and the dataset does not contain PII. English (United States) Conversation WhatsApp – Threaded
Dataset Audio English (United States) conversational smartphone Common Use Cases: ASR, Conversational AI, Speech Analytics Recording Device: Mobile phone Unit: 1000 hours Add Dataset to Quote USE_ASR003 Appen Global Conversational Speech English United States Low background noise 1,856 1 500,000 52,586 16 wav Dataset is fully transcribed and timestamped
Dataset is accompanied by a pronunciation lexicon containing all transcribed words
Conversations cover a wide variety of topics including: study/major/work, hometown, living arrangements, weather and seasons, punctuality, TV programs/film
English (United States) conversational smartphone
Dataset Text English (United States) Medical Terms Pronunciation Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 8,000 words Add Dataset to Quote eng_USA_Med_PHON Appen Global Pronunciation Dictionary English United States N/A N/A N/A N/A 8,000 N/A text Pronunciation dictionary of medical terms with their associated transcriptions and domain tagging.
The data consists of medical words extracted from PubMed abstracts, as well as pharmaceutical drug names collected by Appen through web-spidering. Pronunciations were processed by native speakers of US English, and domain tagging was done by a team of US English native speakers with medical transcription or other medical qualifications and experience.
Domains include: Anatomy, Biochem/biological, Condition, General, Organisation, Person, Pharmaceutical, Procedure.
English (United States) Medical Terms Pronunciation Dictionary
Dataset Text English (United States) Part of Speech Dictionary Common Use Cases: ASR, TTS, Language Modelling Recording Device: N/A Unit: 263,000 words Add Dataset to Quote eng_USA_POS Appen Global Part of Speech Dictionary English United States N/A N/A N/A N/A 263,000 N/A text English (United States) Part of Speech Dictionary