We’re excited to announce that Phil Hall, Senior VP of the Language Resource Division, and Luca Rognoni, Linguistic Product Manager, will be presenting research on speech recognition, speaker comparisons, and Pashto speech data at Interspeech, August 20-24, at Stockholm University in Sweden (read the presentation abstracts below). Dr. Dorota Iskra, Appen VP of Business Development Europe, will be participating on the Young Female Researchers in Speech Science & Technology (YFRSW) senior panel, where she’ll speak about her own research and experience as a woman in the speech community.
We’re also excited to be exhibiting again at Interspeech and would love to meet up with you to discuss your current business initiatives. Come to booth #4 to get the newest Appen t-shirt.
Monday, August 21st, 12:20pm | Aula Magna
“English Conversational Telephone Speech Recognition by Humans and Machines”
Authors: George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, Phil Hall
Automated speech recognition word error rates on the Switchboard conversational corpus that just a few years ago were 14% have dropped to 8.0%, then 6.6% and most recently 5.8%, and are now believed to be within striking range of human performance. This then raises two issues: what is human performance, and how far down can we still drive speech recognition error rates? In trying to assess human performance, we performed an independent set of measurements on the Switchboard and CallHome subsets of the Hub5 2000 evaluation and found that human accuracy may be considerably better than what was earlier reported, giving the community a significantly harder goal to achieve.
Tuesday, August 22nd, 10:00am – 12:00pm | Hall B3
“Speaker Comparison for Forensic and Investigative Applications”
• Jean-François Bonastre, LIA, University of Avignon, France
• Joseph P. Campbell, MIT Lincoln Laboratory, USA
• Anders Eriksson, Stockholm University, Sweden
• Michael Jessen, BKA (Federal Criminal Police Office), Germany
• Reva Schwartz, National Institute of Standards and Technology, USA
• Phil Hall, Appen, Sr. VP – Language Resource Division, Australia
The aim of this special event is to have several structured discussions on speaker comparison for forensic and investigative applications, where many international experts will present their views and participate in the free exchange of ideas. In speaker comparison, speech samples are compared by humans and/or machines for use in investigations or in court to address questions that are of interest to the legal system. Speaker comparison is a high-stakes application that can change people’s lives and it demands the best that science has to offer; however, methods, processes, and practices vary widely. These variations are not necessarily for the better and, although recognized, are not generally appreciated and acted upon. Methods, processes, and practices grounded in science are critical for the proper application (and non-application) of speaker comparison to a variety of international investigative and forensic applications. This event follows the successful Interspeech 2015 and 2016 special events of the same name.
Appen Paper Presentation
Tuesday, August 22nd, 2:50pm | Room C6
“Pashto Intonation Patterns”
• Luca Rognoni, Linguistic Product Manager, Appen
• Judith Bishop, Director of Linguistic Services, Appen
• Miriam Corris, Senior Linguistic Project Manager, Appen
A hand-labelled Pashto speech data set containing spontaneous conversations is analysed in order to propose an intonational inventory of Pashto. Basic intonation patterns observed in the language are summarised. The relationship between pitch accent and part of speech (PoS), which was also annotated for each word in the data set, is briefly addressed. The results are compared with the intonational literature on Persian, a better-described and closely related language. The results show that Pashto intonation patterns are similar to those of Persian, as well as reflecting common intonation patterns such as falling tone for statements and WH-questions, and yes/no questions ending in a rising tone. The data also show that the most frequently used intonation pattern in Pashto is the so-called hat pattern. The distribution of pitch accent is quite free both in Persian and Pashto, but there is a stronger association of pitch accent with content than with function words, as is typical of stress-accent languages. The phonetic realisation of focus appears to be conveyed with the same acoustic cues as in Persian, with a higher pitch excursion and longer duration of the stressed syllable of the word in focus. The data also suggest that post-focus compression (PFC) is present in Pashto.