
Browsing by Author "Vijayasenan, D."

Now showing 1 - 20 of 44
  • A continuous time model for Karnatic flute music synthesis
    (Cogent OA, 2023) Rajan, R.M.; Vijayasenan, D.; Suresh, S.
    Gamakas, the essential embellishments, are integral parts of Karnatic music. Synthesising any form of Karnatic music necessitates proper modelling and synthesis of the different gamakas associated with each note. We propose a spectral model to efficiently synthesise gamakas for Karnatic bamboo flute music from the note, duration and gamaka information. We model three components of the flute sound, namely the pitch contour, harmonic weights and time-domain amplitude envelope. Cubic splines are used to parametrically represent these components. Subjective analysis of the results shows that the proposed method is better than existing spectral methods in terms of the tonal and aesthetic qualities of gamaka rendition. Hypothesis test results show that the observed improvements over other methods are statistically significant at the 95% confidence level. © 2023 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.
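The spline-based parameterization of the pitch contour can be illustrated with a small sketch. The anchor frequencies below are hypothetical, and a uniform Catmull-Rom spline stands in for the paper's cubic-spline model:

```python
import numpy as np

def catmull_rom(points, samples_per_seg=10):
    """Interpolate a smooth curve through anchor points (uniform knots)."""
    p = np.asarray(points, dtype=float)
    # pad endpoints so every interior segment has 4 control points
    p = np.concatenate([[p[0]], p, [p[-1]]])
    out = []
    for i in range(1, len(p) - 2):
        p0, p1, p2, p3 = p[i - 1], p[i], p[i + 1], p[i + 2]
        t = np.linspace(0.0, 1.0, samples_per_seg, endpoint=False)
        out.append(0.5 * ((2 * p1)
                          + (-p0 + p2) * t
                          + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
                          + (-p0 + 3 * p1 - 3 * p2 + p3) * t ** 3))
    out.append(p[[-2]])  # close with the final anchor
    return np.concatenate(out)

# hypothetical pitch anchors in Hz for an oscillating gamaka around a note
anchors = [440.0, 466.0, 440.0, 415.0, 440.0]
contour = catmull_rom(anchors, samples_per_seg=20)
```

The curve passes exactly through each anchor while staying smooth in between, which is the property that makes spline parameterizations attractive for gamaka pitch contours.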
  • A Deep Learning Approach to Enhance Semantic Segmentation of Bacteria and Pus Cells from Microscopic Urine Smear Images Using Synthetic Data
    (Springer Science and Business Media Deutschland GmbH, 2024) Kanabur, V.R.; Vijayasenan, D.; Sumam David, S.; Govindan, S.
    Urine smear analysis aids in the preliminary diagnosis of Urinary Tract Infection, but it is time-consuming and requires considerable medical expertise. Automating the process using machine learning can save time and effort; however, obtaining a large medical dataset is difficult due to data privacy concerns and medical expertise requirements. In this study, we propose a method to synthesize a large dataset of gram-stained microscopic images containing pus cells and bacteria. We train a machine learning model to achieve semantic segmentation of bacteria and pus cells using this dataset. Later we use it to perform transfer learning on a relatively small dataset of gram-stained urine microscopic images. Our approach improved the F1-score from 50% to 63% for bacteria segmentation and from 77% to 83% for pus cell segmentation. This method has the potential to improve the turn-around time and the quality of preliminary diagnosis of Urinary Tract Infection. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
  • A Deep Learning Model for the Automatic Detection of Malignancy in Effusion Cytology
    (Institute of Electrical and Electronics Engineers Inc., 2020) Aboobacker, S.; Vijayasenan, D.; Sumam David, S.; Suresh, P.K.; Sreeram, S.
    The excessive accumulation of fluid between the layers of pleura covering the lungs is known as pleural effusion. Pleural effusion may be due to various infections, inflammations or malignancy. Cytologists visually examine the microscopic slide to detect malignant cells. The process is time-consuming, and the interpretation of reactive cells and cells with ambiguous levels of atypia may differ between pathologists. Considerable research is underway towards the automation of fluid cytology reporting. We propose an integrated deep learning approach, where the network learns directly to detect malignant cells in effusion cytology images. A U-Net architecture is used to learn the malignant and benign cells from the images and to detect the images that contain malignant cells. The model gives a precision of 0.96, recall of 0.96, and specificity of 0.97. The AUC of the ROC curve is 0.97. The model can be used as a screening tool and has a malignant cell detection rate of 0.96 with a low false alarm rate of 0.03. © 2020 IEEE.
  • A Deep Neural Network Based End to End Model for Joint Height and Age Estimation from Short Duration Speech
    (Institute of Electrical and Electronics Engineers Inc., 2019) Kalluri, S.B.; Vijayasenan, D.; Ganapathy, S.
    Automatic height and age prediction of a speaker has a wide variety of applications in speaker profiling, forensics, etc. Often in such applications only a few seconds of speech data are available to reliably estimate the speaker parameters. Traditionally, age and height were predicted separately using different estimation algorithms. In this work, we propose a unified DNN architecture to predict both the height and age of a speaker from short durations of speech. A novel initialization scheme for the deep neural architecture is introduced that avoids the requirement for a large training dataset. We evaluate the system on the TIMIT dataset, where the mean duration of speech segments is around 2.5 s. The DNN system improves the age RMSE by at least 0.6 years compared to a conventional support vector regression system trained on Gaussian Mixture Model mean supervectors. The system achieves an RMSE of 6.85 and 6.29 cm for male and female height prediction, respectively. For age estimation, the RMSEs are 7.60 and 8.63 years for male and female speakers, respectively. Analysis of shorter speech segments reveals that even with 1 second of speech input the performance degradation is at most 3% compared to the full-duration speech files. © 2019 IEEE.
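The joint-estimation setup, one model predicting two targets, can be sketched minimally. The paper trains a DNN; here a multi-output linear least-squares regressor on synthetic stand-in features illustrates the shared-model idea:

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic "utterance embeddings" (stand-ins for DNN-derived features)
X = rng.normal(size=(200, 16))
true_W = rng.normal(size=(16, 2))
# targets: [height_cm, age_years], predicted jointly by one model
Y = X @ true_W + np.array([170.0, 30.0]) + rng.normal(scale=0.1, size=(200, 2))

Xb = np.hstack([X, np.ones((200, 1))])          # bias column
W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)      # one weight matrix, two outputs
pred = Xb @ W
rmse = np.sqrt(((pred - Y) ** 2).mean(axis=0))  # per-target RMSE
```

A single set of parameters serves both regression targets, the same sharing that the unified DNN exploits between height and age.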
  • A hybrid CNN-FC approach for automatic grading of brain tumors from non-invasive MRIs
    (Institute of Electrical and Electronics Engineers Inc., 2024) Bhaskaracharya, B.; Nair, R.P.; Prakashini, K.; Girish Menon, R.; Litvak, P.; Mandava, P.; Vijayasenan, D.; Sumam David, S.
    The grading of brain tumors is essential in treatment planning to effectively control tumor growth and reduce the associated symptoms. Appropriate treatment planning might help in improving the quality of life and patient life span. Gliomas are the most common type of brain tumor, originating from glial cells. Low-grade gliomas (grades 1 or 2) are typically slow-growing, less invasive, and may be suitable for surgical resection or targeted therapies. On the other hand, higher-grade tumors such as grades 3 or 4 are more aggressive and might infiltrate the surrounding brain tissue, making complete resection challenging. In clinical diagnosis, tumor grading traditionally requires resecting a part of the tumor for microscopic examination. To address this, a method to grade the tumor non-invasively using MRIs is proposed. Our work utilizes the BraTS2018 dataset to segment the substructures of brain tumors, which include the necrosis and non-enhancing, edema, and enhancing regions. These regions are then used to train the proposed grading model. Furthermore, we evaluate the performance of our model on a tertiary hospital dataset consisting of 69 samples. The accuracy scores obtained on the BraTS2018 test sample and the tertiary hospital dataset are 0.87 and 0.85, respectively. This consistent score on both public and tertiary hospital datasets indicates reliable and stable performance of the model. © 2024 IEEE.
  • A more generalizable DNN based Automatic Segmentation of Brain Tumors from Multimodal low-resolution 2D MRI
    (Institute of Electrical and Electronics Engineers Inc., 2021) Bhaskaracharya, B.; Nair, R.P.; Prakashini, K.; Girish Menon, R.; Litvak, P.; Mandava, P.; Vijayasenan, D.; Sumam David, S.
    In the field of Neuro-oncology, there is a need for improved diagnosis and prognosis of brain tumors. Brain tumor segmentation is important for treatment planning and assessing treatment outcomes. Manual segmentation of brain tumors is tedious, time-consuming, and subjective. In this work, efficient encoder-decoder based architectures were implemented for automatic segmentation of brain tumors from low-resolution 2D images. An ensemble of the multiple architectures (EMMA) improves the performance of brain tumor segmentation. Furthermore, the computational requirements of the proposed models are lower than those of the BraTS-challenge methods. The average F1-scores on the BraTS-challenge validation dataset for Tumor Core, Whole Tumor, and Enhancing Tumor are 0.82, 0.87, and 0.78, respectively. The average F1-scores on the KMC-Manipal dataset for TC, WT, and ET are 0.74, 0.82, and 0.68, respectively. © 2021 IEEE.
  • A novel approach for classification of normal/abnormal phonocardiogram recordings using temporal signal analysis and machine learning
    (IEEE Computer Society help@computer.org, 2016) Vernekar, S.; Nair, S.; Vijayasenan, D.; Ranjan, R.
    This paper discusses a novel approach used for the classification of phonocardiogram (PCG) excerpts into normal and abnormal classes as part of the Physionet 2016 challenge [10]. The dataset used for the competition comprises cardiac abnormalities such as mitral valve prolapse (MVP), benign murmurs, aortic diseases, coronary artery disease, and miscellaneous pathological conditions [3]. We present the approach used for classification from a general machine learning application standpoint, giving details on feature extraction and the types of classifiers used, comparing their performances individually and in combination. We propose a technique which leverages previous research on feature extraction with a novel approach to modeling the temporal dynamics of the signal using Markov chain analysis [7,9]. These newly introduced Markov features, along with other statistical and frequency domain features, trained over an ensemble of artificial neural networks and gradient boosting trees with bagging, gave us an accuracy of 82% on the validation dataset provided in the competition, consistent with the test data, where the best result was 78%. © 2016 CCAL.
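The idea of Markov-chain features for temporal dynamics can be sketched as follows. This is an illustrative reconstruction, not the authors' exact pipeline: the signal is quantized into amplitude states, and the row-normalized state-transition matrix is flattened into a feature vector:

```python
import numpy as np

def markov_features(signal, n_states=4):
    """Quantize a 1-D signal into amplitude states (by quantiles) and
    return the row-normalized transition matrix as a flat feature vector."""
    s = np.asarray(signal, dtype=float)
    edges = np.quantile(s, np.linspace(0, 1, n_states + 1)[1:-1])
    states = np.digitize(s, edges)  # values in 0..n_states-1
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    rows = counts.sum(axis=1, keepdims=True)
    probs = np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)
    return probs.ravel()

# toy stand-in for a PCG excerpt
t = np.linspace(0, 1, 500)
feats = markov_features(np.sin(2 * np.pi * 5 * t))
```

Each entry of the vector estimates the probability of moving from one amplitude state to another, summarizing the signal's temporal dynamics independently of its length.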
  • A novel approach for Robust Detection of Heart Beats in Multimodal Data using neural networks and boosted trees
    (IEEE Computer Society help@computer.org, 2016) Vernekar, S.; Vijayasenan, D.; Ranjan, R.
    This work describes a novel approach designed for the Physionet 2014 Challenge, Robust Detection of Heart Beats in Multimodal Data [5]. The objective is to detect the location of R peaks of the QRS complex in an electrocardiogram (ECG) excerpt. Robust detection of heart beats in a noisy ECG signal is an extremely difficult task. To overcome this, besides the ECG, a blood pressure (BP) signal is also recorded at the same time; the idea is that if a segment of one signal is noisy, the peaks in that segment can be better estimated from the peaks found in the corresponding segment of the other signal, if it is of good quality. The approach uses Machine Learning (ML) methods to identify the locations of R peaks in a given segment of the ECG or BP signal. Peaks in both the ECG and BP signals are found separately using a novel feature representation and subsequent ML methods that render the R peaks in the signal easier to detect with a simple windowing technique. Individually detected peaks from both ECG and BP are further analyzed in chunks of equal short time periods, and the better of the two is chosen for the final peak prediction based on variance comparison techniques. The performance of the system on the training dataset [11] provided in the competition is 99.95%. The performance on the hidden test datasets for phases I, II and III of the competition is 93.27%, 90.28% and 89.74%, respectively. The submission placed 1st in all three phases of the competition. © 2016 CCAL.
  • A Novel Feature Selection Method for Solar Flare Forecasting
    (Institute of Electrical and Electronics Engineers Inc., 2024) Shenoy, A.N.; Vijayasenan, D.; Bobbi, R.S.; Padinhatteeri, S.; Adithya, H.N.
    Large solar flares (SFs) can disrupt radio communication and harm instruments and astronauts. Hence, it's crucial to predict SFs. However, the mechanism that triggers SFs is not yet known. We only have several physical features believed to be related to the process. This makes choosing the most impactful features for SF production important. We investigate a feature selection method based on the weights learned by a linear classifier. We use the Spaceweather HMI Active Region Patch (SHARP) summary parameters based on the Solar Dynamics Observatory's Helioseismic and Magnetic Imager data records. The records are from May 2010 to December 2019. © 2024 IEEE.
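Weight-based feature selection with a linear classifier can be sketched in a few lines. The data below is synthetic, and a hand-rolled logistic regression stands in for whatever linear classifier the paper uses; features are ranked by the magnitude of their learned weights:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 400, 6
X = rng.normal(size=(n, d))
# only features 0 and 3 actually drive the "flare" label (hypothetical setup)
logits = 3.0 * X[:, 0] - 2.0 * X[:, 3]
y = (logits + rng.normal(scale=0.5, size=n) > 0).astype(float)

# plain logistic regression trained by gradient descent
w = np.zeros(d)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= 0.1 * (X.T @ (p - y)) / n

# rank features by absolute learned weight: the selection criterion
ranking = np.argsort(-np.abs(w))
```

With standardized inputs, large-magnitude weights identify the features the linear decision boundary relies on most, which is the basis of the selection method.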
  • Adversarial Learning Based Semi-supervised Semantic Segmentation of Low Resolution Gram Stained Microscopic Images
    (Springer Science and Business Media Deutschland GmbH, 2024) Singh, H.; Kanabur, V.R.; Sumam David, S.; Vijayasenan, D.; Govindan, S.
    Urinary tract infections (UTIs) are infections that affect the urinary system. They are usually caused by bacteria, with pus cells appearing in the urine in response. Analyzing urine samples, including examining pus cells, is a standard method for diagnosing and monitoring UTIs. However, manually detecting bacteria or pus cells in microscopic urine images is a time-consuming and labour-intensive task for microbiologists. Therefore, the segmentation of microscopic pus cell images will ease the process of detecting UTIs. Low resolution microscopic images are especially hard to annotate; therefore, in this study, we propose an adversarial learning based semi-supervised method for the segmentation of pus cell images at low resolution (40×) using labeled high resolution (100×) images. The proposed methodology aims to ease the process of UTI detection by automating the segmentation of pus cell images. The results demonstrate an increase in the Dice coefficient score of 1%, 1.6% and 2.4% on 40× images when compared to a fully supervised segmentation model trained on only 100× data, using three different architectures: Unet, ResUnet++, and PSPnet, respectively. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
  • An Integrated Deep Learning Approach towards Automatic Evaluation of Ki-67 Labeling Index
    (Institute of Electrical and Electronics Engineers Inc., 2019) Lakshmi, S.; Vijayasenan, D.; Sumam David, S.; Sreeram, S.; Suresh, P.K.
    Ki-67 labeling index is a widely used biomarker for the diagnosis and monitoring of cancer. Many automated techniques have been proposed for evaluating the Ki-67 index. In this paper, we introduce an integrated deep learning based approach. We use a MobileUnet model for segmentation and classification, and a connected-component based algorithm for the estimation of the Ki-67 index in bladder cancer cases. The average F1 score is 0.92 and the dice score is 0.96. The mean absolute error in the evaluated Ki-67 index is 2.1. We also explore possible pre-processing steps to generalize the segmentation model to at least one other type of cancer. Histogram matching and re-sizing improve the performance on breast cancer data by 12% in F1 score and 8% in dice score. © 2019 IEEE.
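The connected-component step of Ki-67 index estimation can be sketched as follows, with toy masks standing in for the network's segmentation output; the index is the percentage of positively stained nuclei among all nuclei:

```python
import numpy as np

def count_components(mask):
    """4-connected component count on a binary mask (iterative flood fill)."""
    mask = mask.astype(bool).copy()
    h, w = mask.shape
    n = 0
    for i in range(h):
        for j in range(w):
            if mask[i, j]:
                n += 1
                stack = [(i, j)]
                mask[i, j] = False
                while stack:
                    a, b = stack.pop()
                    for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        x, y = a + da, b + db
                        if 0 <= x < h and 0 <= y < w and mask[x, y]:
                            mask[x, y] = False
                            stack.append((x, y))
    return n

# toy masks: 3 positively stained nuclei out of 8 total (hypothetical
# output of the segmentation model)
total = np.zeros((20, 20), dtype=int)
pos = np.zeros_like(total)
coords = [(2, 2), (2, 10), (2, 17), (8, 4), (8, 12), (15, 3), (15, 10), (15, 17)]
for k, (r, c) in enumerate(coords):
    total[r:r + 2, c:c + 2] = 1
    if k < 3:
        pos[r:r + 2, c:c + 2] = 1

ki67_index = 100.0 * count_components(pos) / count_components(total)
```

Each connected blob is counted as one nucleus, so the index becomes a simple ratio of component counts.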
  • Artery Vein Segmentation in Handheld Fundus Camera Retinal Images and leveraging Cross Entropy for improved Semantic performance
    (Institute of Electrical and Electronics Engineers Inc., 2024) Yohannan, R.P.; Sumam David, S.; Vijayasenan, D.; Chowdary, R.T.; Girish Menon, R.; Menon, S.G.
    The segmentation of retinal vessels into arteries and veins in retinal images is a crucial task for analysing vascular changes associated with the many diseases that manifest ocular symptoms. However, most existing research has concentrated on fundus images acquired using tabletop cameras, and little has been studied on images captured by handheld cameras. Such cameras are particularly useful for examining bedridden patients, especially those with conditions such as hypertension or diabetes that can affect the retina, since they are portable and can be easily maneuvered by healthcare providers, allowing retinal examinations to be performed conveniently at the patient's bedside. This paper presents an approach to segment such images and assesses the impact of data augmentation on model performance. It further presents a method to compute pixel-level weights during training that allows for fine-grained adjustment of the loss function. © 2024 IEEE.
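Pixel-level loss weighting of the kind described can be sketched with a small numpy example; the probability maps and weights below are illustrative, not the paper's values:

```python
import numpy as np

def weighted_pixel_ce(probs, labels, weights):
    """Per-pixel weighted cross-entropy for a 2-class segmentation map.
    probs: (H, W) predicted probability of class 1; labels: (H, W) in {0, 1};
    weights: (H, W) per-pixel weights (e.g. emphasizing thin vessel pixels)."""
    eps = 1e-9
    ce = -(labels * np.log(probs + eps) + (1 - labels) * np.log(1 - probs + eps))
    return float((weights * ce).sum() / weights.sum())

labels = np.array([[1, 0], [0, 1]], dtype=float)
probs = np.array([[0.9, 0.2], [0.1, 0.6]])
uniform = np.ones_like(labels)
# upweight a hard pixel at (1, 1), e.g. a thin vessel boundary (illustrative)
w = np.array([[1.0, 1.0], [1.0, 4.0]])

loss_uniform = weighted_pixel_ce(probs, labels, uniform)
loss_weighted = weighted_pixel_ce(probs, labels, w)
```

Raising a pixel's weight raises its contribution to the loss, so the optimizer is pushed to fix errors exactly where the weight map says they matter.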
  • Automatic speaker profiling from short duration speech data
    (Elsevier B.V., 2020) Kalluri, S.B.; Vijayasenan, D.; Ganapathy, S.
    Many paralinguistic applications of speech demand the extraction of information about speaker characteristics from as little speech data as possible. In this work, we explore the estimation of multiple physical parameters of the speaker from short durations of speech in a multilingual setting. We explore different feature streams for age and body build estimation derived from the speech spectrum at different resolutions, namely short-term log-mel spectrogram, formant features and harmonic features of the speech. The statistics of these features over the speech recording are used to learn a support vector regression model for speaker age and body build estimation. The experiments performed on the TIMIT dataset show that each of the individual features achieves results that outperform previously published results in height and age estimation. Furthermore, the estimation errors from these different feature streams are complementary, which allows the combination of estimates from the feature streams to further improve the results. The combined system achieves, from short audio snippets, a Mean Absolute Error (MAE) of 5.2 cm and 4.8 cm for male and female speakers respectively in height estimation. Similarly, in age estimation the MAE is 5.2 years and 5.6 years for male and female speakers respectively. We also extend the same physical parameter estimation to other body build parameters like shoulder width, waist size and weight, along with height, on a dataset we collected for speaker profiling. The duration analysis of the proposed scheme shows that state of the art results can be achieved using only around 1–2 s of speech data. To the best of our knowledge, this is the first attempt to use a common set of features for estimating the different physical traits of a speaker. © 2020 Elsevier B.V.
  • CNN Based Tropical Cyclone Intensity Estimation Using Satellite Images Around Indian Subcontinent
    (Springer Science and Business Media Deutschland GmbH, 2024) Jha, P.; Sumam David, S.; Vijayasenan, D.
    In this work, we use deep learning models for estimating tropical cyclone (TC) intensity from satellite images. This is an image-to-regression problem, where an image is given as input and an intensity value is estimated as output. In the literature, various deep learning methods have been proposed for TC intensity estimation, but their focus on cyclones around the Indian subcontinent is limited. We implement three models: a regression model, a classification model, and a multitask model with regression and classification outputs as two tasks. We work with two sets of input data. One set contains a single-channel input: the infrared (IR) brightness temperature satellite image. The other contains two channels: the infrared (IR) brightness temperature satellite image, and the rain rate derived from the passive microwave (PMW) satellite image. We use satellite images of cyclones occurring in the Atlantic, Northeast Pacific, and North Central Pacific regions from 2006 through 2016. For cyclones around the Indian subcontinent, we use satellite images from 2005–2016. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
  • COVID-19 detection from spectral features on the DiCOVA dataset
    (International Speech Communication Association, 2021) Ritwik, K.V.S.; Kalluri, S.B.; Vijayasenan, D.
    In this paper we investigate the cues of COVID-19 in the sustained phonation of Vowel-/i/, deep breathing and number counting data of the DiCOVA dataset. We use an ensemble of classifiers trained on different features, namely super-vectors, formants, harmonics and MFCC features. We fit a two-class weighted SVM classifier to separate the COVID-19 audio from non-COVID-19 audio. Weighted penalties help mitigate the challenge of class imbalance in the dataset. Results are reported on the stationary (breathing, Vowel-/i/) and non-stationary (counting) data using individual features and combinations of features on each type of utterance. We find that formant information plays a crucial role in classification. The proposed system achieves an AUC score of 0.734 for cross validation, and 0.717 on the evaluation dataset. © 2021 ISCA.
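The weighted-penalty idea for class imbalance can be sketched as follows. The paper fits a weighted SVM; here inverse-class-frequency weights plugged into a plain logistic-regression loss illustrate the same rebalancing on a toy imbalanced dataset:

```python
import numpy as np

rng = np.random.default_rng(2)
# imbalanced toy data: 180 non-COVID (0) vs 20 COVID (1), 1-D feature
n0, n1 = 180, 20
X = np.concatenate([rng.normal(-1.0, 1.0, n0), rng.normal(1.0, 1.0, n1)])[:, None]
y = np.concatenate([np.zeros(n0), np.ones(n1)])

# "balanced" penalties: inverse class frequency, n / (n_classes * n_c)
n, k = len(y), 2
class_w = {c: n / (k * (y == c).sum()) for c in (0, 1)}
sample_w = np.where(y == 1, class_w[1], class_w[0])

# class-weighted logistic regression by gradient descent
# (a stand-in for the weighted SVM's per-class penalties)
Xb = np.hstack([X, np.ones((n, 1))])
w = np.zeros(2)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
    w -= 0.5 * (Xb.T @ (sample_w * (p - y))) / sample_w.sum()

recall_minority = float(((Xb @ w > 0) & (y == 1)).sum()) / n1
```

The minority class gets a penalty roughly nine times larger than the majority class here, so its misclassifications dominate the loss and the boundary shifts to recover minority recall.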
  • Deep Learning Model based Ki-67 Index estimation with Automatically Labelled Data
    (Institute of Electrical and Electronics Engineers Inc., 2020) Lakshmi, S.; Sai Ritwik, K.V.; Vijayasenan, D.; Sumam David, S.; Sreeram, S.; Suresh, P.K.
    Ki-67 labelling index is a biomarker used across the world to predict the aggressiveness of cancer. To compute the Ki-67 index, pathologists normally count the tumour nuclei on the slide images manually; hence it is time-consuming and subject to inter-pathologist variability. With the development of image processing and machine learning, many methods have been introduced for automatic Ki-67 estimation, but most of them require manual annotations and are restricted to one type of cancer. In this work, we propose a pooled Otsu's method to generate labels and train a semantic segmentation deep neural network (DNN). The output is post-processed to find the Ki-67 index. Evaluation on two different types of cancer (bladder and breast cancer) results in a mean absolute error of 3.52%. The performance of the DNN trained with automatic labels is better than that of a DNN trained with ground-truth labels by an absolute value of 1.25%. © 2020 IEEE.
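A pooled Otsu's threshold of the kind described can be sketched directly: pixel intensities from several (here synthetic) patches are pooled into one histogram, a single Otsu threshold is computed, and binary labels are generated automatically for each patch:

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Otsu's method: pick the threshold maximizing between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)                 # class-0 probability mass
    w1 = 1.0 - w0                     # class-1 probability mass
    m0 = np.cumsum(p * centers)       # cumulative (unnormalized) mean
    mu_total = m0[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * w0 - m0) ** 2 / (w0 * w1)
    between[~np.isfinite(between)] = 0.0
    return centers[np.argmax(between)]

rng = np.random.default_rng(3)
# pool pixels from several hypothetical stained patches before thresholding:
# two dim modes (~60, ~70) and one bright mode (~190)
patches = [np.clip(rng.normal(m, 10, (32, 32)), 0, 255) for m in (60, 70, 190)]
pooled = np.concatenate([p.ravel() for p in patches])
t = otsu_threshold(pooled)
labels = [(p > t).astype(np.uint8) for p in patches]  # auto-generated masks
```

Pooling before thresholding gives every patch the same decision boundary, which keeps the automatically generated labels consistent across a slide.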
  • Estimating multiple physical parameters from speech data
    (IEEE Computer Society help@computer.org, 2016) Kalluri, S.B.; Vijayakumar, A.; Vijayasenan, D.; Singh, R.
    In this work, we explore the prediction of different physical parameters from speech data. We aim to predict the shoulder size and waist size of people from speech data, in addition to the conventional height and weight parameters. A dataset with this information is created from 207 volunteers. A bag-of-words representation based on the log magnitude spectrum is used as features. A support vector regression predicts the physical parameters from the bag-of-words representation. The system achieves a root mean square error of 6.6 cm for height estimation, 2.6 cm for shoulder size, 7.1 cm for waist size and 8.9 kg for weight estimation. The height estimation results are on par with state-of-the-art results. © 2016 IEEE.
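The bag-of-words spectral representation can be sketched as follows, with random vectors standing in for log-magnitude spectral frames: a k-means codebook quantizes the frames, and the normalized histogram of codeword assignments becomes the recording-level feature:

```python
import numpy as np

rng = np.random.default_rng(4)

def kmeans(frames, k=8, iters=20):
    """Tiny k-means to build the spectral 'codebook'."""
    centers = frames[rng.choice(len(frames), k, replace=False)]
    for _ in range(iters):
        d = ((frames[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for c in range(k):
            if (assign == c).any():
                centers[c] = frames[assign == c].mean(0)
    return centers

def bow_features(frames, centers):
    """Histogram of nearest-codeword assignments = bag-of-words vector."""
    d = ((frames[:, None, :] - centers[None]) ** 2).sum(-1)
    hist = np.bincount(d.argmin(1), minlength=len(centers))
    return hist / hist.sum()

# stand-in log-magnitude spectral frames for one recording (frames x bins)
frames = rng.normal(size=(300, 12))
codebook = kmeans(frames, k=8)
bow = bow_features(frames, codebook)
```

The histogram is a fixed-length vector regardless of utterance length, which is what makes it a convenient input for the support vector regressor.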
  • Frequency contour modeling to synthesize natural flute renditions for Carnatic music
    (2018) Ashtamoorthy, A.; Prasad, P.; Dhar, S.; Vijayasenan, D.
    Hidden Markov Models used for computer music synthesis do not satisfactorily reproduce Indian Carnatic music and also require large training datasets. The essence of Indian Carnatic music is its micro-tonal frequency variations called Gamakas. In this work, we study the flute note properties and the features that characterize Gamakas, and hence attempt to devise a generalized method for synthesizing Carnatic flute compositions. Our method uses additive sinusoidal synthesis coupled with a stochastic noise model. In the time domain, splines are used to model the amplitude envelope to ensure a natural reconstruction. Integrated frequency contours are used for smooth concatenation of notes and modelling of Gamakas and notes. To evaluate our synthesis, we use a Mean Opinion Score (MOS) survey to compare our results with the baseline and the original recordings. The MOS of the proposed method is around 3.5, while the baseline is 2.3. © 2018 IEEE.
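Additive sinusoidal synthesis with an amplitude envelope and a stochastic noise component can be sketched minimally; the fundamental, harmonic weights and envelope breakpoints below are illustrative, and linear interpolation stands in for the spline-modeled envelope:

```python
import numpy as np

sr = 8000           # sample rate in Hz
dur = 0.5           # note duration in seconds
t = np.arange(int(sr * dur)) / sr

# additive synthesis: fundamental plus harmonics with decaying weights
f0 = 440.0                  # illustrative fundamental
harm_w = [1.0, 0.5, 0.25]   # illustrative harmonic weights
tone = sum(w * np.sin(2 * np.pi * (k + 1) * f0 * t)
           for k, w in enumerate(harm_w))

# attack-sustain-release amplitude envelope (stand-in for the spline
# envelope) plus low-level noise for the stochastic component
env = np.interp(t, [0, 0.05, dur - 0.1, dur], [0.0, 1.0, 0.9, 0.0])
rng = np.random.default_rng(5)
flute = env * tone + 0.01 * rng.normal(size=t.size)
```

Shaping the summed harmonics with a smooth envelope is what avoids the clicky onsets and offsets of raw sinusoids; the paper's spline model plays the role of `env` here.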

Maintained by Central Library NITK | DSpace software copyright © 2002-2026 LYRASIS
