TY - JOUR
T1 - A storyteller's tale
T2 - 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019
AU - Carmi, Nehory
AU - Cohen, Azaria
AU - Avigal, Mireille
AU - Lerner, Anat
N1 - Publisher Copyright: © 2019 ISCA
PY - 2019
Y1 - 2019
N2 - Identifying acoustic properties that characterize the reading of literary genres can help give a more personal and human tone to the speech of bots and automatic readings. In this paper we consider the following question: given speech segments of audiobooks, how well can we classify them according to their literary genres? We consider three literary genres: children's, horror and suspense, and humorous audiobooks, taken from two free audiobook sites: LibriVox and YouTube. We ran four classification experiments: one for each of the three pairs of genres, and one for all three genres together. We ran each experiment twice, once with each of two network architectures: a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN). Note that, throughout a reading, some sections are more typical of the book's genre than others. As the samples were taken sequentially throughout the reading of the books and were short in duration, we did not expect high classification rates. Nevertheless, the accuracy was at least 72% for all pairwise classifications and at least 57% for the three-class classification, with both architectures.
KW - Acoustic features
KW - Deep learning
KW - Literary genres
KW - Prosody
KW - Speech emotion recognition
UR - http://www.scopus.com/inward/record.url?scp=85074703143&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2019-1154
DO - 10.21437/Interspeech.2019-1154
M3 - Conference article
AN - SCOPUS:85074703143
SN - 2308-457X
VL - 2019-September
SP - 3387
EP - 3390
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Y2 - 15 September 2019 through 19 September 2019
ER -