Medical Speech, Transcription, and Intent (English)

8.5 hours of audio utterances paired with text for common medical symptoms.

Overview

This dataset contains thousands of audio utterances for common medical symptoms such as “knee pain” or “headache,” totaling more than 8 hours. Each utterance was recorded by an individual human contributor in response to a given symptom. These audio snippets can be used to train conversational agents in the medical field.

This dataset was created via a multi-job workflow. In the first job, contributors wrote text phrases describing a given symptom. For example, for “headache,” a contributor might write “I need help with my migraines.” Subsequent jobs captured audio utterances for the accepted text strings.

This dataset contains both the audio utterances and corresponding transcriptions.

 

The input data consists of symptom prompts. Human contributors based their text phrases on these prompts, and those phrases were then used to collect audio utterances later in the workflow. The “Data” tab above contains further information and data on the audio recordings that were eventually made from these prompts.

prompt
Muscle pain
Cough
Feeling cold
Hard to breathe
Headache
Stomachache
Joint pain
Internal pain
Earache
Blurry vision

Data

patient_symptom_audio_test.zip | 2.3 GB
patient_symptom_audio_train.zip | 160.2 MB
patient_symptom_audio_validate.zip | 137.7 MB
recordings-overview.csv | 1.7 MB
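
A first look at the downloaded files might go something like the sketch below. It is only an illustration: the column names of recordings-overview.csv are not documented here, so the code reads whatever header the file has rather than assuming one. The helper names (csv_overview, audio_members) are hypothetical, and the ".wav" suffix is an assumption about the audio format inside the archives.

```python
import csv
import zipfile
from pathlib import Path


def csv_overview(csv_path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        columns = list(reader.fieldnames or [])
        row_count = sum(1 for _ in reader)
    return columns, row_count


def audio_members(zip_path, suffix=".wav"):
    """List archive members ending in `suffix` (the audio format is assumed)."""
    with zipfile.ZipFile(zip_path) as zf:
        return [n for n in zf.namelist() if n.lower().endswith(suffix)]


if __name__ == "__main__":
    # File names are taken from the listing above; they are expected to sit
    # in the current working directory.
    overview = Path("recordings-overview.csv")
    if overview.exists():
        columns, rows = csv_overview(overview)
        print("columns:", columns)
        print("rows:", rows)

    for split in ("train", "validate", "test"):
        archive = Path(f"patient_symptom_audio_{split}.zip")
        if archive.exists():
            print(split, "audio files:", len(audio_members(archive)))
```

Printing the header first shows which columns hold the transcription and the symptom prompt before any training code depends on them.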