Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506

A novel approach for classification of normal/abnormal phonocardiogram recordings using temporal signal analysis and machine learning
(IEEE Computer Society, 2016) Vernekar, S.; Nair, S.; Vijayasenan, D.; Ranjan, R.
This paper discusses a novel approach for classifying phonocardiogram (PCG) excerpts into normal and abnormal classes as part of the Physionet 2016 challenge [10]. The dataset used for the competition comprises recordings with cardiac abnormalities such as mitral valve prolapse (MVP), benign murmurs, aortic diseases, coronary artery disease and miscellaneous pathological conditions [3]. We present the approach from a general machine learning application standpoint, giving details on feature extraction and the types of classifiers used, and comparing their performance individually and in combination. We propose a technique that builds on previous research on feature extraction and adds a novel approach to modeling the temporal dynamics of the signal using Markov chain analysis [7,9]. These newly introduced Markov features, together with other statistical and frequency-domain features, were used to train an ensemble of artificial neural networks and gradient boosting trees with bagging. This gave an accuracy of 82% on the validation dataset provided in the competition, which was consistent with the test data, where the best result was 78%. © 2016 CCAL.
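The Markov chain features mentioned above can be illustrated with a minimal sketch. It assumes, hypothetically, that the PCG amplitude is quantized into a small number of equal-width states and that the features are the row-normalized state-transition probabilities; the paper's actual state definition is not described in this abstract, and `markov_features` is an illustrative name.

```python
def markov_features(signal, n_states=4):
    """Quantize a 1-D signal into equal-width amplitude states and return the
    flattened, row-normalized transition probability matrix (hypothetical scheme)."""
    lo, hi = min(signal), max(signal)
    width = (hi - lo) / n_states or 1.0  # avoid division by zero for flat signals
    # Map each sample to a state index in [0, n_states - 1].
    states = [min(int((x - lo) / width), n_states - 1) for x in signal]

    # Count transitions between consecutive states.
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1

    # Normalize each row into probabilities; rows never visited stay all-zero.
    features = []
    for row in counts:
        total = sum(row)
        features.extend(c / total if total else 0.0 for c in row)
    return features
```

The resulting fixed-length vector (n_states² values) can be concatenated with statistical and frequency-domain features before training a classifier.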
A novel approach for Robust Detection of Heart Beats in Multimodal Data using neural networks and boosted trees
(IEEE Computer Society, 2016) Vernekar, S.; Vijayasenan, D.; Ranjan, R.
This work describes a novel approach designed for the Physionet 2014 Challenge, Robust Detection of Heart Beats in Multimodal Data [5]. The objective is to detect the locations of R peaks from the QRS complexes of an electrocardiogram (ECG) excerpt. Robust detection of heart beats in a noisy ECG signal is an extremely difficult task. To address this, a blood pressure (BP) signal is recorded simultaneously with the ECG: if a segment of one signal is noisy, the peaks in that segment can be better estimated from the peaks in the corresponding segment of the other signal, provided that segment is clean. The approach uses Machine Learning (ML) methods to identify the locations of R peaks in a given segment of the ECG or BP signal. Peaks in the ECG and BP signals are found separately using a novel feature representation and subsequent ML approaches, which make the R peaks easier to detect with a simple windowing technique. The peaks detected individually in the ECG and BP signals are then analyzed in chunks of equal, short duration, and the better of the two results is chosen for the final peak prediction based on variance comparison techniques. The performance of the system on the training dataset [11] provided in the competition is 99.95%. The performance on the hidden test datasets for phases I, II and III of the competition is 93.27%, 90.28% and 89.74%, respectively. The submission placed 1st in all three phases of the competition. © 2016 CCAL.
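The chunk-wise choice between ECG and BP peaks can be sketched as follows. This is a hypothetical reading of "variance comparison": within each chunk, the peak set whose inter-beat intervals have the lower variance is assumed to come from the cleaner signal. Peak times are in seconds, and `fuse_peaks`, `interval_variance` and `chunk_len` are illustrative names, not the authors' code.

```python
from statistics import pvariance


def interval_variance(peaks):
    """Variance of inter-beat intervals; infinite if too few peaks to judge."""
    intervals = [b - a for a, b in zip(peaks, peaks[1:])]
    return pvariance(intervals) if len(intervals) >= 2 else float("inf")


def fuse_peaks(ecg_peaks, bp_peaks, duration, chunk_len=10.0):
    """For each chunk, keep the peak set (ECG or BP) with more regular intervals."""
    fused = []
    start = 0.0
    while start < duration:
        end = start + chunk_len
        ecg = [t for t in ecg_peaks if start <= t < end]
        bp = [t for t in bp_peaks if start <= t < end]
        # Lower interval variance is taken as a proxy for a cleaner signal;
        # ties (including both sets being too sparse) go to the ECG.
        fused.extend(ecg if interval_variance(ecg) <= interval_variance(bp) else bp)
        start = end
    return fused
```

A practical system would also have to compensate for the delay between an R peak and the corresponding pressure pulse before mixing the two peak sets; that step is omitted here.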
Estimating multiple physical parameters from speech data
(IEEE Computer Society, 2016) Kalluri, S.B.; Vijayakumar, A.; Vijayasenan, D.; Singh, R.
In this work, we explore the prediction of different physical parameters from speech data. In addition to the conventional height and weight parameters, we aim to predict people's shoulder size and waist size from speech data. A dataset with this information is created from 207 volunteers. A bag-of-words representation based on the log magnitude spectrum is used as features. Support vector regression predicts the physical parameters from the bag-of-words representation. The system achieves a root mean square error of 6.6 cm for height estimation, 2.6 cm for shoulder size, 7.1 cm for waist size and 8.9 kg for weight estimation. The height estimation results are on par with state-of-the-art results. © 2016 IEEE.
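A bag-of-words representation of this kind can be sketched as hard vector quantization. The sketch assumes that frame-level log magnitude spectra have already been computed and that a codebook is available (for example from k-means; here it is simply passed in). Each frame is mapped to its nearest codeword, and the normalized histogram of codeword counts becomes the utterance-level vector that a regressor such as SVR would consume. `bag_of_words` is an illustrative name.

```python
import math


def bag_of_words(frames, codebook):
    """Normalized histogram of nearest-codeword assignments for a list of frames."""
    counts = [0] * len(codebook)
    for frame in frames:
        # Nearest codeword by Euclidean distance (math.dist needs Python 3.8+).
        best = min(range(len(codebook)), key=lambda k: math.dist(frame, codebook[k]))
        counts[best] += 1
    n = len(frames)
    return [c / n for c in counts] if n else counts
```

Normalizing by the number of frames makes utterances of different lengths comparable.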
Robust features for automatic estimation of physical parameters from speech
(Institute of Electrical and Electronics Engineers Inc., 2017) Kalluri, K.S.; Vijayasenan, D.
Estimating a speaker's physical parameters such as height, weight and shoulder size can assist voice forensics by providing additional knowledge about the speaker. In this work, statistics of the components of a background GMM are employed as features for estimating the physical parameters. These features improved the performance of height and shoulder size estimation compared with our earlier attempt based on a Bag of Words representation. The robustness of the features is validated using two different training subsets containing different languages. © 2017 IEEE.
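One common way to turn a background GMM into utterance-level features is to accumulate each component's posterior-weighted statistics. The sketch assumes diagonal-covariance Gaussians with known weights, means and variances, and returns the zeroth-order statistics (soft counts) and normalized first-order statistics (posterior-weighted means) for each component. Whether the paper uses exactly these statistics is an assumption, and `gmm_stats` is an illustrative name.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_stats(frames, weights, means, variances):
    """Per-component soft counts and posterior-weighted means over all frames."""
    K, D = len(weights), len(means[0])
    n = [0.0] * K                      # zeroth-order statistics
    f = [[0.0] * D for _ in range(K)]  # first-order statistics
    for x in frames:
        logp = [math.log(w) + log_gaussian(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logp)  # subtract the max before exponentiating, for stability
        post = [math.exp(lp - top) for lp in logp]
        total = sum(post)
        for k in range(K):
            gamma = post[k] / total  # posterior of component k for this frame
            n[k] += gamma
            for d in range(D):
                f[k][d] += gamma * x[d]
    # Normalize first-order stats by soft counts (guard against empty components).
    mean_stats = [[fkd / nk if nk > 0 else 0.0 for fkd in fk] for fk, nk in zip(f, n)]
    return n, mean_stats
```

Concatenating the per-component statistics gives a fixed-length vector regardless of utterance length.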
Prediction of aesthetic elements in Karnatic music: A machine learning approach
(International Speech Communication Association, 2018) Rajan, M.; Vijayakumar, A.; Vijayasenan, D.
Gamakas, the embellishments and ornamentations used to enhance the musical experience, are defining features of Karnatic Music (KM). The appropriate use of a gamaka is an aesthetic judgment that musicians often develop through experience. Therefore, understanding and modeling gamakas is a significant bottleneck in applications such as music synthesis and automatic accompaniment in the context of KM. To this end, we propose to learn both the presence and the type of gamaka in a data-driven manner using annotated symbolic music. In particular, we explore the efficacy of three classes of features (note-based, phonetic and structural) and train a Random Forest Classifier to predict the existence and the type of gamaka. The observed accuracy is ∼70% for gamaka detection and ∼60% for gamaka classification. Finally, we present an analysis of the features and find that the frequency and duration of the neighbouring notes are the most important features. © 2018 International Speech Communication Association. All rights reserved.
Study of Wireless Channel Effects on Audio Forensics
(Institute of Electrical and Electronics Engineers Inc., 2018) Vijayasenan, D.; Kalluri, S.B.; Sreekanth, K.; Issac, A.
In this work, we study the effect of a wireless channel on physical parameter prediction from speech data. Speech data from 207 speakers is collected along with each speaker's height and weight. A three-path Rayleigh fading channel with typical values of Doppler shift, path gain and path delay is used to create the mobile channel output audio. A Bag of Words (BoW) representation based on the log magnitude spectrum is used as features. Support Vector Regression (SVR) predicts the physical parameters of the speaker from the BoW representation. The proposed system achieves a Root Mean Square Error (RMSE) of 6.6 cm for height estimation and 8.9 kg for weight estimation on clean speech. The Rayleigh channel increases the RMSE values to 8.17 cm and 11.84 kg for height and weight, respectively. © 2016 IEEE.
Frequency contour modeling to synthesize natural flute renditions for carnatic music
(Institute of Electrical and Electronics Engineers Inc., 2018) Ashtamoorthy, A.; Prasad, P.; Dhar, S.; Vijayasenan, D.
Hidden Markov Models used for computer music synthesis do not satisfactorily reproduce Indian Carnatic music and also require large training datasets. The essence of Indian Carnatic music lies in its microtonal frequency variations, called Gamakas. In this work, we study the properties of flute notes and the features that characterize Gamakas, and attempt to devise a generalized method for synthesizing Carnatic music flute compositions. Our method uses additive sinusoidal synthesis coupled with a stochastic noise model. In the time domain, splines are used to model the amplitude envelope to ensure a natural reconstruction. Integrated frequency contours are used for smooth concatenation of notes and for modelling Gamakas and notes. To evaluate our synthesis, we use a Mean Opinion Score (MOS) survey to compare our results with the baseline and the original recordings. The MOS of the proposed method is around 3.5, while that of the baseline is 2.3. © 2018 IEEE.
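The role of an integrated frequency contour in additive synthesis can be illustrated with a short sketch. The phase is obtained by accumulating the instantaneous frequency sample by sample, so a glide between notes (as in a gamaka) introduces no phase discontinuity. The harmonic amplitudes, sample rate and linear glide are illustrative assumptions, and the paper's spline amplitude envelope and stochastic noise model are omitted; `synthesize` and `glide` are hypothetical names.

```python
import math


def synthesize(freq_contour, harmonic_amps, sample_rate=16000):
    """Additive sinusoidal synthesis driven by a per-sample f0 contour (Hz)."""
    phase = 0.0
    out = []
    for f0 in freq_contour:
        # Integrate frequency to get phase: continuous even when f0 changes.
        phase += 2 * math.pi * f0 / sample_rate
        phase %= 2 * math.pi  # wrap to keep the value small; sin is unaffected
        # Sum of harmonics h = 1, 2, ... with the given amplitudes.
        sample = sum(a * math.sin((h + 1) * phase) for h, a in enumerate(harmonic_amps))
        out.append(sample)
    return out


def glide(f_start, f_end, n):
    """Linear frequency glide over n >= 2 samples (a crude stand-in for a gamaka)."""
    return [f_start + (f_end - f_start) * i / (n - 1) for i in range(n)]
```

For example, `synthesize(glide(440.0, 494.0, 1600), [1.0, 0.3, 0.1])` renders a 0.1 s slide between two pitches. Switching frequency abruptly without integrating the phase would produce audible clicks at note boundaries.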
A Deep Neural Network Based End to End Model for Joint Height and Age Estimation from Short Duration Speech
(Institute of Electrical and Electronics Engineers Inc., 2019) Kalluri, S.B.; Vijayasenan, D.; Ganapathy, S.
Automatic prediction of a speaker's height and age has a wide variety of applications in speaker profiling, forensics, etc. Often in such applications, only a few seconds of speech data are available to reliably estimate the speaker parameters. Traditionally, age and height were predicted separately using different estimation algorithms. In this work, we propose a unified DNN architecture to predict both the height and age of a speaker from short durations of speech. We introduce a novel initialization scheme for the deep neural architecture that avoids the requirement for a large training dataset. We evaluate the system on the TIMIT dataset, where the mean duration of speech segments is around 2.5 s. The DNN system improves the age RMSE by at least 0.6 years compared with a conventional support vector regression system trained on Gaussian Mixture Model mean supervectors. The system achieves an RMSE of 6.85 cm and 6.29 cm for male and female height prediction, respectively. For age estimation, the RMSE is 7.60 years for male and 8.63 years for female speakers. Analysis of shorter speech segments reveals that even with 1 second of speech input, the performance degradation is at most 3% compared with the full-duration speech files. © 2019 IEEE.
An Integrated Deep Learning Approach towards Automatic Evaluation of Ki-67 Labeling Index
(Institute of Electrical and Electronics Engineers Inc., 2019) Lakshmi, S.; Vijayasenan, D.; Sumam David, S.; Sreeram, S.; Suresh, P.K.
The Ki-67 labeling index is a widely used biomarker for the diagnosis and monitoring of cancer. Many automated techniques have been proposed for evaluating the Ki-67 index. In this paper, we introduce an integrated deep learning based approach. We use a MobileUnet model for segmentation and classification, and a connected-component-based algorithm to estimate the Ki-67 index in bladder cancer cases. The average F1 score is 0.92 and the dice score is 0.96. The mean absolute error in the evaluated Ki-67 index is 2.1. We also explore pre-processing steps that could generalize the segmentation model to at least one other type of cancer. Histogram matching and resizing improve the performance on breast cancer data by 12% in F1 score and 8% in dice score. © 2019 IEEE.
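The connected-component step can be illustrated with a minimal sketch, assuming the segmentation network has already produced two binary masks: one for immunopositive (Ki-67+) nuclei and one for immunonegative nuclei. Each 4-connected blob is counted as one nucleus, and the labeling index is the percentage of positive nuclei among all counted nuclei. Touching nuclei that merge into one blob would be undercounted, which a real pipeline would need to handle. `count_components` and `ki67_index` are illustrative names.

```python
from collections import deque


def count_components(mask):
    """Count 4-connected foreground regions in a binary mask (list of lists)."""
    rows, cols = len(mask), len(mask[0]) if mask else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                count += 1
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:  # breadth-first flood fill of one region
                    y, x = queue.popleft()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def ki67_index(positive_mask, negative_mask):
    """Percentage of Ki-67-positive nuclei among all detected nuclei."""
    pos = count_components(positive_mask)
    neg = count_components(negative_mask)
    total = pos + neg
    return 100.0 * pos / total if total else 0.0
```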
Deep Learning Model based Ki-67 Index estimation with Automatically Labelled Data
(Institute of Electrical and Electronics Engineers Inc., 2020) Lakshmi, S.; Sai Ritwik, K.V.; Vijayasenan, D.; Sumam David, S.; Sreeram, S.; Suresh, P.K.
The Ki-67 labelling index is a biomarker used across the world to predict the aggressiveness of cancer. To compute the Ki-67 index, pathologists normally count the tumour nuclei in slide images manually, so the process is time-consuming and subject to inter-pathologist variability. With the development of image processing and machine learning, many methods have been introduced for automatic Ki-67 estimation. However, most of them require manual annotations and are restricted to one type of cancer. In this work, we propose a pooled Otsu's method to generate labels and train a semantic segmentation deep neural network (DNN). The output is post-processed to find the Ki-67 index. Evaluation on two different types of cancer (bladder and breast) results in a mean absolute error of 3.52%. The DNN trained with automatic labels outperforms the DNN trained with ground truth by an absolute value of 1.25%. © 2020 IEEE.
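Otsu's method picks the gray-level threshold that maximizes the between-class variance of an intensity histogram. One plausible reading of "pooled", assumed here rather than taken from the paper, is that the histogram is accumulated over several images before a single threshold is chosen. The sketch works on flat lists of 8-bit intensities; `otsu_threshold` and `pooled_otsu` are illustrative names.

```python
def otsu_threshold(hist):
    """Return t maximizing between-class variance for classes [0, t] and (t, L-1]."""
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0      # pixel count of the lower class
    sum0 = 0.0  # intensity sum of the lower class
    best_t, best_var = 0, -1.0
    for t, h in enumerate(hist):
        w0 += h
        sum0 += t * h
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty; no valid split here
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        # Unnormalized between-class variance; the 1/total^2 factor
        # does not change which t maximizes it.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def pooled_otsu(images, levels=256):
    """Single Otsu threshold from a histogram pooled over several images."""
    hist = [0] * levels
    for image in images:
        for value in image:
            hist[value] += 1
    return otsu_threshold(hist)
```

Pixels with intensity above the returned threshold would then form one class of the automatically generated labels. Pooling lets images with few foreground pixels borrow evidence from the others, at the cost of a threshold that is not tuned to any single image.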