Conference Papers

Search Results

Now showing 1 - 2 of 2

Locality-constrained linear coding based fused visual features for robust acoustic event classification
(International Speech Communication Association, 2019) Mulimani, M.; Koolagudi, G.K.
In this paper, novel Fused Visual Features (FVFs) are proposed for Acoustic Event Classification (AEC) in meeting room and office environments. Codes of Visual Features (VFs) are computed separately from row vectors and from Scale Invariant Feature Transform (SIFT) vectors of the grayscale Gammatonegram of an acoustic event using Locality-constrained Linear Coding (LLC). The VFs from the row vectors and the SIFT vectors are then fused to obtain the FVFs. The performance of the proposed FVFs is evaluated on acoustic events from the publicly available UPC-TALP and DCASE datasets in clean and noisy conditions. Results show that the proposed FVFs are robust to noise and achieve overall recognition accuracies of 96.40% and 90.45% on the UPC-TALP and DCASE datasets, respectively. © 2019 ISCA
Identification of Palatal Fricative Fronting Using Shannon Entropy of Spectrogram
(Springer Science and Business Media Deutschland GmbH, 2020) Ramteke, P.B.; Supanekar, S.; Aithal, V.; Koolagudi, S.G.
In this paper, an attempt is made to identify palatal fricative fronting in children's speech, where postalveolar /sh/ is mispronounced as dental /s/. In children's speech, the concentration of energy (darkest part) of the spectrogram for /s/ ranges from 4000 Hz to 8000 Hz, whereas it ranges from 3000 Hz to 8000 Hz for /sh/. The Gammatonegram follows the frequency subbands of the ear (wider for higher frequencies). Various spectral properties such as spectral centroid, spectral crest factor, spectral decrease, spectral flatness, spectral flux, spectral kurtosis, spectral spread, spectral skewness, spectral slope and Shannon entropy of the spectrogram (interval of 2000 Hz), extracted from the Gammatonegram, are proposed for the characterization of /sh/ and /s/. The dataset, recorded from 60 native Kannada-speaking children aged 3 1/2 to 6 1/2 years, is taken from the NITK Kids' Speech Corpus. A support vector machine (SVM) is used for classification. Various combinations of the proposed features are evaluated, along with MFCCs(39) and LPCCs(39). The combination of MFCCs(39), LPCCs(39) and Entropy(4) is observed to achieve the highest mispronunciation identification performance of 83.2983%. © 2020, Springer Nature Switzerland AG.
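
The abstract above mentions Shannon entropy of the spectrogram over 2000 Hz intervals but does not give the exact definition. The following is only a minimal sketch of one plausible reading, not the authors' implementation: it assumes the analysis range is 0 to 8000 Hz (which would yield four bands, consistent with "Entropy(4)"), treats the normalized energy inside each band as a probability distribution, and accepts any non-negative time-frequency matrix in place of a true Gammatonegram. The function name `band_shannon_entropy` and its parameters are hypothetical.

```python
import numpy as np

def band_shannon_entropy(spectrogram, freqs, band_width_hz=2000.0, max_freq_hz=8000.0):
    """Shannon entropy (in bits) of the energy distribution in each frequency band.

    spectrogram: array of shape (n_freq_bins, n_frames), non-negative energies.
    freqs: centre frequency in Hz of each row of `spectrogram`.
    The band layout (0 to max_freq_hz in steps of band_width_hz) is an assumption.
    """
    spectrogram = np.asarray(spectrogram, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    # Band edges: 0, 2000, 4000, 6000, 8000 Hz with the default arguments.
    edges = np.arange(0.0, max_freq_hz + band_width_hz, band_width_hz)
    entropies = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        rows = (freqs >= lo) & (freqs < hi)
        energy = spectrogram[rows].ravel()
        total = energy.sum()
        if total <= 0:
            # An empty or silent band carries no distribution; report 0 by convention.
            entropies.append(0.0)
            continue
        p = energy / total
        p = p[p > 0]  # 0 * log(0) is taken as 0
        entropies.append(float(-(p * np.log2(p)).sum()))
    return np.array(entropies)
```

Under this reading, a band whose energy is spread evenly over its time-frequency cells gives the maximum entropy (the base-2 logarithm of the number of cells), while a band with energy concentrated in a single cell gives zero.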
