Content-based Music Information Retrieval (CB-MIR) and its Applications Towards Music Recommender System
Date
2019
Authors
Murthy, Y V Srinivasa.
Publisher
National Institute of Technology Karnataka, Surathkal
Abstract
Music is a pervasive element of everyday human life. Many people listen to music to manage stress and tension, and some create it as well. This importance of music, combined with advances in technology, has produced an enormous number of digital tracks. However, most tracks carry inadequate meta-information, often limited to the song title, album name, singer name and composer. The question, then, is how to organize these tracks so that relevant clips can be retrieved quickly in the absence of richer meta-information such as genre, lyrics, raga, mood and instrument names. Manually labelling meta-information for the millions of tracks in the digital cloud is practically impossible. Hence, the research area known as music information retrieval (MIR) emerged in the early years of the 21st century, and it has attracted considerable attention from researchers since 2005 with the support of the Music Information Retrieval Evaluation eXchange (MIREX) competition. Several works have been proposed for MIR tasks such as singing voice detection, singer identification, genre classification, instrument identification, music mood estimation, lyrics generation, music annotation and so on. However, the main focus has been on Western music, and only a few works on Indian songs are reported in the literature.
Since Indian popular songs contribute a major portion of the global digital cloud, this thesis attempts to develop a few useful MIR tasks in the Indian scenario: vocal and non-vocal segmentation, singer identification, music mood estimation and a music recommender system. Efforts have been made to construct relevant databases that remain coherent across all the tasks mentioned above, and comparative analyses against standard datasets such as MIR-1K and artist20 are given. For each of the four tasks, a novel approach is presented in this thesis.
First, the task of vocal and non-vocal segmentation has been chosen to locate the onset and offset points of singing voice regions. A set of novel features, namely formant attack slope (FAS), formant height from base to peak (FH1), formant angle at peak (FA1), formant angle at valley (FA2), and the singer formant (F5), has been computed and used to discriminate vocal from non-vocal segments; a simplified sketch of this style of formant computation is given below.
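Since the exact definitions of these features appear only in the thesis itself, the following Python sketch is merely an illustrative approximation: it estimates per-frame formant frequencies from LPC roots and fits a linear slope to the first-formant track as a hypothetical stand-in for FAS.

    import numpy as np
    import librosa

    def frame_formants(frame, sr, order=12, n_formants=5):
        # LPC polynomial -> complex roots -> angles -> formant frequencies (Hz)
        a = librosa.lpc(np.ascontiguousarray(frame), order=order)
        roots = np.roots(a)
        roots = roots[np.imag(roots) > 0]            # one root per conjugate pair
        freqs = np.sort(np.angle(roots) * sr / (2 * np.pi))
        return freqs[freqs > 90][:n_formants]        # drop near-DC artefacts

    def formant_attack_slope(y, sr, frame_len=2048, hop=512):
        # Hypothetical "FAS": linear slope of the first-formant track, in Hz/s
        frames = librosa.util.frame(y, frame_length=frame_len, hop_length=hop).T
        f1 = np.array([frame_formants(f, sr)[0] for f in frames])
        t = np.arange(len(f1)) * hop / sr
        slope, _ = np.polyfit(t, f1, 1)
        return slope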
Further, a feature selection algorithm based on the concepts of genetics, known as genetic algorithm based feature selection (GAFS), has been developed, and the observations made from experiments with the selected features on the Indian and Western databases are reported; a sketch of such a selection loop follows.
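The GA operators are not spelled out in this abstract, so the sketch below assumes a conventional setup: bitmask chromosomes over feature columns, one-point crossover, bit-flip mutation, and cross-validated SVM accuracy as the fitness function.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)

    def fitness(mask, X, y):
        # Fitness of a chromosome = CV accuracy on the selected feature columns
        if not mask.any():
            return 0.0
        return cross_val_score(SVC(), X[:, mask], y, cv=3).mean()

    def gafs(X, y, pop_size=20, generations=30, p_mut=0.05):
        n = X.shape[1]
        pop = rng.random((pop_size, n)) < 0.5              # random bitmasks
        for _ in range(generations):
            scores = np.array([fitness(ind, X, y) for ind in pop])
            pop = pop[np.argsort(scores)[::-1]]            # best chromosomes first
            children = []
            while len(children) < pop_size // 2:
                a, b = pop[rng.integers(0, pop_size // 2, 2)]      # top-half parents
                cut = rng.integers(1, n)
                child = np.concatenate([a[:cut], b[cut:]])         # one-point crossover
                children.append(child ^ (rng.random(n) < p_mut))   # bit-flip mutation
            pop = np.vstack([pop[: pop_size // 2]] + children)     # elites + offspring
        scores = np.array([fitness(ind, X, y) for ind in pop])
        return pop[scores.argmax()]                        # boolean mask of kept features

A call like mask = gafs(X, y) followed by X[:, mask] would then feed the vocal/non-vocal classifier only the selected features.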
Second, the task of singer identification (SID) has been considered. A database of songs by 10 male and 10 female singers has been constructed, with songs drawn from two popular film industries of the Indian subcontinent, Tollywood (Telugu) and Bollywood (Hindi). Various timbral and temporal features have been computed to analyse their effect on singer identification with different classifiers. However, the feature-based systems are found to be less effective, and hence convolutional neural networks (CNNs) have been used, with spectrograms of song clips as inputs.
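A minimal sketch of this spectrogram-to-singer route, assuming mel-spectrogram patches as input; the layer sizes and the 128-by-128 input shape are illustrative choices rather than the architecture used in the thesis.

    import torch
    import torch.nn as nn

    class SingerCNN(nn.Module):
        def __init__(self, n_singers=20):              # 10 male + 10 female singers
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.AdaptiveAvgPool2d((4, 4)),
            )
            self.classifier = nn.Linear(32 * 4 * 4, n_singers)

        def forward(self, x):                          # x: (batch, 1, mels, frames)
            return self.classifier(self.features(x).flatten(1))

    # Spectrogram clips (e.g. from librosa.feature.melspectrogram) are reshaped
    # to (batch, 1, n_mels, n_frames) tensors before the forward pass:
    logits = SingerCNN()(torch.randn(8, 1, 128, 128))  # -> (8, 20)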
Identifying the mood of a song has been considered as the third objective of this thesis. Six different moods are identified based on an analysis of the combination of Russell's and Thayer's models (Saari and Eerola, 2014). A two-level classification model for music mood detection has been developed: in the first stage, songs are categorized as energetic or non-energetic, and the actual class label is predicted in the second stage. The performance of this system is found to be better than that of a single-stage classifier.
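A minimal sketch of such a two-level scheme; the random-forest models and the 0/1 energy labels are assumptions, since the abstract names neither the classifiers nor the six moods.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    class TwoStageMood:
        def __init__(self):
            self.stage1 = RandomForestClassifier()        # energetic vs. non-energetic
            self.stage2 = {0: RandomForestClassifier(),   # moods among non-energetic
                           1: RandomForestClassifier()}   # moods among energetic

        def fit(self, X, energy, mood):
            # energy: 0/1 per song; mood: one of the six final labels
            X, energy, mood = map(np.asarray, (X, energy, mood))
            self.stage1.fit(X, energy)
            for e, model in self.stage2.items():
                model.fit(X[energy == e], mood[energy == e])
            return self

        def predict(self, X):
            e = self.stage1.predict(X)                    # stage 1: energy branch
            out = np.empty(len(X), dtype=object)
            for branch, model in self.stage2.items():
                idx = e == branch
                if idx.any():
                    out[idx] = model.predict(X[idx])      # stage 2: final mood label
            return out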
Finally, the development of a music recommender system has been taken up, using labels such as the title of a track, singer name(s), the mood of a song, and duration. A graph-structure-based recommendation approach is proposed in this work to estimate the similarity between the listening patterns of listeners. A graph is constructed for every user with songs as nodes, and the similarities are then estimated from the adjacency matrices obtained over the listening patterns. This approach could be more appropriate for improving the performance of song recommender systems.
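A minimal sketch of this graph view; representing edges as play-order transition counts and comparing users by cosine similarity of their flattened adjacency matrices are illustrative assumptions.

    import numpy as np

    def adjacency(history, n_songs):
        # history: song indices in the order the user played them
        A = np.zeros((n_songs, n_songs))
        for i, j in zip(history, history[1:]):
            A[i, j] += 1                      # edge weight = transition count
        return A

    def user_similarity(h1, h2, n_songs):
        # Cosine similarity between two users' flattened adjacency matrices
        a = adjacency(h1, n_songs).ravel()
        b = adjacency(h2, n_songs).ravel()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom else 0.0

    # e.g. user_similarity([0, 2, 1, 2], [0, 2, 1, 3], n_songs=5) -> ~0.67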
Keywords
Department of Computer Science & Engineering, Convolutional Neural Networks, Genetic algorithm based feature selection (GAFS), Graph based collaborative filtering, Formant Analysis, Music information retrieval, Music mood estimation, Music recommender system, Singer identification, Singing voice detection, Vocal & non-vocal segmentation