Speech Summarization Using Prosodic Features and 1-D Convolutional Neural Network

Date

2022

Publisher

Institute of Electrical and Electronics Engineers Inc.

Abstract

In this work, we present a method for summarizing audiobook speech without converting the audio into a transcript. The model is a 1-D convolutional neural network. The audio is segmented into sentences based on the silence between consecutive sentences, and acoustic features extracted from each sentence segment serve as input to the model. The model's output is binary, indicating whether a sentence should be included in the summary; speech summarization is thus cast as a classification task. The audio chunks classified as relevant are then concatenated into a single summary, which we compare against a manually produced summary. For additional insight, we use a text summarizer as a reference for what the summary should contain; the transcript is used only for this purpose, so the method itself remains independent of the text. The results indicate the feasibility of a language-independent audio summarizer that retains the original audio quality, since the summary is assembled from the original audio. © 2022 IEEE.
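
To illustrate the classification step described in the abstract, the sketch below shows one plausible 1-D CNN that maps per-frame prosodic features of a sentence segment to a binary include/exclude decision. This is a minimal illustration only: the class name SentenceSelector, the choice of features (pitch, energy, and similar per-frame values), and all layer sizes are assumptions for the example and are not taken from the paper.

# Hypothetical sketch (PyTorch): binary sentence selection from prosodic features.
import torch
import torch.nn as nn

class SentenceSelector(nn.Module):
    """Decide whether a sentence segment belongs in the summary.

    Input shape: (batch, n_features, n_frames), i.e. per-frame prosodic
    features for one silence-delimited sentence. Feature set and layer
    sizes here are illustrative assumptions, not the paper's configuration.
    """
    def __init__(self, n_features: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pools variable-length sentences to a fixed vector
        )
        self.fc = nn.Linear(64, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.conv(x).squeeze(-1)      # (batch, 64)
        return torch.sigmoid(self.fc(h))  # probability of including the sentence

# Usage: one sentence with 4 prosodic features over 200 frames.
model = SentenceSelector(n_features=4)
prob = model(torch.randn(1, 4, 200))
include = prob.item() > 0.5  # if True, keep this original audio chunk for the summary

Segments whose predicted probability exceeds the threshold would then be concatenated in order to form the audio summary, mirroring the pipeline described above.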

Keywords

1-D Convolutional Neural Network, Prosodic/Acoustic features, Speech Summarization

Citation

7th IEEE International Conference on Recent Advances and Innovations in Engineering (ICRAIE 2022) - Proceedings, 2022, pp. 14-19.
