Abstract
This paper uses a crowd-sourced definition of a speech phenomenon that we call focus. Given sentences presented as text and as speech, both in isolation and in context, we asked annotators to identify what we term the focus word. We report how consistently they identified the focus word when presented with text or speech stimuli. We then build models to show how well the focus word can be predicted from lexical (and higher-level) features. Using spectral and prosodic information, we also show how these focus words differ when spoken with and without context. Finally, we show how focus information can improve speech synthesis of these utterances.
Original language | English |
---|---|
Title of host publication | Proceedings of the 14th Annual Conference of the International Speech Communication Association : Interspeech 2013 |
Number of pages | 5 |
Publisher | International Speech Communication Association (ISCA) |
Publication date | 2013 |
ISBN (Electronic) | 978-1-62993-443-3 |
Publication status | Published - 2013 |