Vocal and Non-vocal Segmentation based on the Analysis of Formant Structure
Date
2018
Authors
Murthy, Y.V.S.
Koolagudi, S.G.
Swaroop, V.G.
Abstract
Classifying the vocal and non-vocal regions of an audio clip is the basis for many Music Information Retrieval (MIR) tasks. In this work, we compute novel features based on formant structure for segmenting the vocal and non-vocal regions of a given music clip. After thorough analysis, features such as the obtuse angles at formant peaks, valley locations, convexity, and concavity are proposed for this task. The obtuse angles are computed for the second, third, and fourth formants, since the first formant offers little discrimination. The computed formant-related features are appended to the baseline Mel-frequency cepstral coefficients (MFCCs) to improve performance. The singer's formant (F5) is also computed, yielding a 19-dimensional feature vector. As artificial neural networks (ANNs) are well suited to nonlinear data, an ANN is used as the classifier. Further, an 11-point moving window is applied to remove intermittent misclassifications. The proposed approach achieves an accuracy of 88% with the 19-dimensional feature vector. © 2017 IEEE.
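The two most code-like steps in the abstract are the angle at a spectral peak and the 11-point moving-window smoothing of frame labels. The sketch below illustrates both under stated assumptions: the angle is taken between the line segments joining a peak to its neighboring valleys (the paper's exact formulation and frequency/magnitude scaling are not given here), and the smoothing is a majority vote over an 11-frame window — `angle_at_peak` and `smooth_labels` are hypothetical names, not the authors' code.

```python
import numpy as np

def angle_at_peak(freqs, mags, peak, left_valley, right_valley):
    """Angle (degrees) at a spectral peak, formed by the segments to its
    neighboring valleys. Hypothetical formulation: the paper's exact
    computation and axis scaling may differ."""
    p = np.array([freqs[peak], mags[peak]], dtype=float)
    a = np.array([freqs[left_valley], mags[left_valley]]) - p
    b = np.array([freqs[right_valley], mags[right_valley]]) - p
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def smooth_labels(labels, window=11):
    """Majority-vote smoothing of frame-wise vocal (1) / non-vocal (0)
    labels over a sliding window, removing brief spurious flips."""
    labels = np.asarray(labels)
    half = window // 2
    padded = np.pad(labels, half, mode="edge")  # repeat edge frames
    smoothed = np.empty_like(labels)
    for i in range(len(labels)):
        win = padded[i:i + window]
        smoothed[i] = 1 if 2 * int(win.sum()) > window else 0
    return smoothed

# A lone misclassified frame inside a vocal run is corrected:
frames = [1] * 6 + [0] + [1] * 6
print(smooth_labels(frames).tolist())  # → [1]*13
```

A sharp symmetric peak (steep slopes on both sides) yields an angle near 90°, while a shallow peak yields an obtuse angle, which is the discriminative quantity the abstract refers to.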
Citation
2017 9th International Conference on Advances in Pattern Recognition (ICAPR 2017), 2018, pp. 304-309