Content-based Music Information Retrieval (CB-MIR) and its Applications Towards Music Recommender System

Murthy, Y V Srinivasa.

Please use this identifier to cite or link to this item: https://idr.nitk.ac.in/jspui/handle/123456789/14487

Title:	Content-based Music Information Retrieval (CB-MIR) and its Applications Towards Music Recommender System
Authors:	Murthy, Y V Srinivasa.
Supervisors:	Koolagudi, Shashidhar G.
Keywords:	Department of Computer Science & Engineering;Convolutional Neural Networks;Genetic algorithm based feature selection (GAFS);Graph based collaborative filtering;Formant Analysis;Music information retrieval;Music mood estimation;Music recommender system;Singer identification;Singing voice detection;Vocal & non-vocal segmentation
Issue Date:	2019
Publisher:	National Institute of Technology Karnataka, Surathkal
Abstract:	Music is a pervasive part of everyday human life. Many people listen to music to manage stress and tension, and some create it themselves. Advances in technology have made an enormous number of digital tracks available, but most of them carry inadequate meta-information, typically limited to the song title, album name, singer name and composer. The question is how to organize these tracks so that relevant clips can be retrieved quickly without meta-information such as genre, lyrics, raga, mood and instrument names. Manually labelling this meta-information for millions of tracks in the digital cloud is not practical. An area of research known as music information retrieval (MIR) was therefore introduced in the early years of the 21st century, and it has attracted considerable attention from researchers since 2005 with the support of the Music Information Retrieval Evaluation eXchange (MIREX) competition. Several works have been proposed for MIR tasks such as singing voice detection, singer identification, genre classification, instrument identification, music mood estimation, lyrics generation and music annotation. However, the main focus has been on Western music, and only a few works on Indian songs are reported in the literature. Since Indian popular songs contribute a major portion of the global digital cloud, this thesis attempts to develop a few useful MIR tasks in the Indian scenario: vocal and non-vocal segmentation, singer identification, music mood estimation and a music recommender system. Relevant databases have been constructed, with as much coherence as possible across all of these tasks. 
Results, including a comparative analysis with standard datasets such as MIR-1K and artist20, are given. A novel approach is presented for each of the four tasks. First, the task of vocal and non-vocal segmentation has been chosen to locate the onset and offset points of singing voice regions. A set of novel features, namely formant attack slope (FAS), formant heights from base to peak (FH1), formant angle values at peak (FA1), formant angle values at valley (FA2), and singer formant (F5), has been computed and used to discriminate vocal from non-vocal segments. In addition, a feature selection algorithm based on the concepts of genetics, known as genetic algorithm based feature selection (GAFS), has been developed. The observations from experiments using the selected features on the Indian and Western databases are reported. Second, the task of singer identification (SID) has been considered. A database with the songs of 10 male and 10 female singers has been constructed. The songs are taken from two popular film industries of the Indian subcontinent, Tollywood (Telugu) and Bollywood (Hindi). Various timbral and temporal features have been computed to analyze their effect on singer identification with different classifiers. However, the feature-based systems were found to be less effective, and hence convolutional neural networks (CNNs) have been used with spectrograms of song clips as inputs. Identifying the mood of a song is the third objective of this thesis. Six different moods are identified based on an analysis of the combination of Russell's and Thayer's models (Saari and Eerola, 2014). A two-level classification model has been developed for music mood detection. In the first stage, songs are categorized as energetic or non-energetic. The actual class label is predicted in the second stage. 
The performance of the system is found to be better in this case than with single-phase classification. Finally, the development of a music recommender system has been taken up using labels such as the title of a track, singer name(s), mood of a song, and duration. A graph-structure-based recommendation system has been proposed to estimate the similarity between the listening patterns of listeners. A graph is constructed for every user by considering songs as nodes, and the similarities are then estimated using the adjacency matrices obtained from the listening patterns. This approach could be better suited to improving the performance of song recommender systems.
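The graph-based similarity idea in the abstract can be illustrated with a minimal sketch. The abstract does not state how edges are defined or which similarity measure is used, so the following are assumptions made only for illustration: an edge counts how often one song is played directly after another, and two users are compared by the cosine similarity of their flattened adjacency matrices. The function names and the toy listening histories are hypothetical and are not taken from the thesis.

```python
from math import sqrt


def adjacency_matrix(history, song_index):
    """Build one user's song graph as a matrix of consecutive-play counts.

    Assumption: an edge a -> b is weighted by how many times song b
    was played directly after song a in the user's history.
    """
    n = len(song_index)
    matrix = [[0] * n for _ in range(n)]
    for current, following in zip(history, history[1:]):
        matrix[song_index[current]][song_index[following]] += 1
    return matrix


def cosine_similarity(a, b):
    """Cosine similarity of two equally sized matrices, flattened row by row."""
    flat_a = [value for row in a for value in row]
    flat_b = [value for row in b for value in row]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm = sqrt(sum(x * x for x in flat_a)) * sqrt(sum(y * y for y in flat_b))
    return dot / norm if norm else 0.0


def most_similar(user, graphs):
    """Return the other user whose graph is closest to the given user's graph."""
    others = (u for u in graphs if u != user)
    return max(others, key=lambda u: cosine_similarity(graphs[user], graphs[u]))


# Hypothetical listening histories (ordered lists of song identifiers).
histories = {
    "user_a": ["s1", "s2", "s3", "s1", "s2"],
    "user_b": ["s1", "s2", "s3"],
    "user_c": ["s3", "s1", "s4"],
}

# A shared song index keeps every user's matrix the same shape.
songs = sorted({song for history in histories.values() for song in history})
song_index = {song: i for i, song in enumerate(songs)}
graphs = {user: adjacency_matrix(h, song_index) for user, h in histories.items()}

print(most_similar("user_a", graphs))
```

In a sketch like this, songs heard by the most similar user but not yet by the target user would be natural recommendation candidates; the thesis may rank candidates differently.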
URI:	http://idr.nitk.ac.in/jspui/handle/123456789/14487
Appears in Collections:	1. Ph.D Theses

Files in This Item:

File	Description	Size	Format
135067CS13F05.pdf		8.89 MB	Adobe PDF
