The input data for this job is disaster- and threat-related information from the Red Cross, covering topics such as illness, injuries, and natural disasters. For each of these categories, the data describes what to do in a given situation and how to use available tools to survive potentially life-threatening situations.
The data includes a Swahili-language translation for each text string. The Appen platform job is designed to correct the Swahili translation (if needed) and then collect an audio snippet of a person speaking the Swahili translation of the phrase. Contributors were also asked to categorize each phrase based on the context of the given topic, for example as an item that will help in a given situation or a behavior that can aid in survival, among various other categorizations.
In total, this dataset contains ten and a half hours of spoken Swahili audio utterances, along with the corresponding English and Swahili text strings.
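As a rough illustration of how one record described above might be represented in code, here is a minimal Python sketch. The dataset's actual file layout and column names are not stated here, so every field name (`english_text`, `swahili_text`, `category`, `audio_path`, `duration_seconds`) and the helper `total_hours` are hypothetical and should be adapted to the delivered files.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    """One dataset record. All field names are hypothetical."""
    english_text: str        # original English text string
    swahili_text: str        # Swahili translation, after any contributor correction
    category: str            # contributor-assigned category for the phrase
    audio_path: str          # location of the recorded Swahili audio snippet
    duration_seconds: float  # length of the audio snippet in seconds


def total_hours(records: List[Utterance]) -> float:
    """Sum the audio duration of all records and convert it to hours."""
    return sum(r.duration_seconds for r in records) / 3600.0
```

With records loaded into this shape, `total_hours(records)` gives a quick sanity check against the stated total audio duration.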
The input data for this job consists of short, disaster-related text prompts translated into Swahili by native Swahili speakers. Example prompts are below:
|Some diseases are preventable by immunization with vaccines.|
|Many different types of vaccines are available.|
|Vaccines can be enormously successful in preventing some of the major communicable diseases.|
|The immune system protects people from infection.|
|Immunization benefits the whole country.|
|It prevents millions of people dying needlessly each year.|
|It has led to some diseases being eradicated from the world.|
|It promotes health and optimal growth and development in children.|
|It releases resources for other health interventions.|