Title: A robust speech rate estimation based on the activation profile from the selected acoustic unit dictionary
Authors: Nagesh, S.
Yarra, C.
Deshmukh, O.D.
Ghosh, P.K.
Issue Date: 2016
Citation: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2016, Vol.2016-May, pp.5400-5404
Abstract: A typical approach to speech rate estimation has two stages: first, a short-time feature contour is computed such that most of its peaks correspond to syllable nuclei; then those peaks are detected. Temporal correlation and selected sub-band correlation (TCSSBC) is often used as the feature contour; it computes correlations within and across a few selected sub-band energies. In this work, instead of a fixed set of sub-bands, we learn them in a data-driven manner using a dictionary learning approach. Similarly, instead of the energy contours, we use the activation profile from the learned dictionary elements. We found that the peaks detected by the data-driven approach significantly improve speech rate estimation when combined with the traditional TCSSBC approach using a proposed peak-merging strategy. Experiments are performed separately on the Switchboard, TIMIT and CTIMIT corpora. Except on Switchboard, the correlation coefficient for speech rate estimation with the proposed approach is higher than that obtained with the TCSSBC technique, with relative improvements of 3.1% and 5.2% on TIMIT and CTIMIT, respectively. © 2016 IEEE.
URI: http://idr.nitk.ac.in/jspui/handle/123456789/7118
Appears in Collections: 2. Conference Papers
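
The two-stage scheme described in the abstract (compute a short-time contour, then count its peaks as syllable nuclei) can be illustrated with a minimal Python sketch. This is not the authors' method: plain frame energy stands in for both the TCSSBC contour and the dictionary activation profile, and the `merge_peaks` rule (a union of two peak lists with a minimum spacing) is only an assumed placeholder for the paper's peak-merging strategy, which the abstract does not specify. All function names, parameters and sample values are hypothetical.

```python
def frame_energy(signal, frame_len, hop):
    """Stage 1 (stand-in): short-time energy, one value per frame."""
    return [
        sum(x * x for x in signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def pick_peaks(contour, threshold):
    """Stage 2: indices of local maxima of the contour above a threshold."""
    return [
        i for i in range(1, len(contour) - 1)
        if contour[i] > threshold
        and contour[i] > contour[i - 1]
        and contour[i] >= contour[i + 1]
    ]


def merge_peaks(peaks_a, peaks_b, min_gap):
    """Assumed merging rule: take the union of both peak lists and drop
    any peak closer than min_gap frames to the previously kept peak."""
    merged = []
    for p in sorted(set(peaks_a) | set(peaks_b)):
        if not merged or p - merged[-1] >= min_gap:
            merged.append(p)
    return merged


def speech_rate(num_peaks, duration_s):
    """Speech rate in syllables per second, one syllable per peak."""
    return num_peaks / duration_s


# Toy usage with hand-made peak lists (frame indices) from two contours.
peaks_contour_a = [10, 30, 52]
peaks_contour_b = [11, 40, 50]
merged = merge_peaks(peaks_contour_a, peaks_contour_b, min_gap=5)
print(merged)                         # [10, 30, 40, 50]
print(speech_rate(len(merged), 2.0))  # 2.0
```

With real audio, `frame_energy` would be replaced by whichever contour is under study, and the threshold and `min_gap` would need tuning per corpus.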
