Conference Papers

Efficient audio segmentation in soccer videos
(Institute of Electrical and Electronics Engineers Inc., 2016) Raghuram, M.A.; Chavan, N.R.; Koolagudi, S.G.; Ramteke, P.B.
Identifying different audio segments in videos is the first step for many important tasks such as event detection and speech transcription. Approaches using Mel-frequency cepstral coefficients (MFCCs) with Gaussian mixture models (GMMs) and hidden Markov models (HMMs) perform reasonably well in stationary conditions but do not scale to a broad range of environmental conditions. This paper focuses on segmenting the audio of broadcast soccer videos into classes such as Silence, Speech Only, Speech Over Crowd, Crowd Only and Excited, using an alternative feature set that is simple as well as robust to changes in environmental conditions. Support Vector Machines (SVMs), Neural Networks and Random Forest are used for classification. The accuracies achieved with SVMs, Neural Networks and Random Forest are 83.80%, 86.07%, and 88.35%, respectively. The proposed features combined with the Random Forest classifier achieve better accuracy than the other classifiers. © 2016 IEEE.
Damage identification and assessment using image processing on post-disaster satellite imagery
(Institute of Electrical and Electronics Engineers Inc., 2017) Joshi, A.R.; Tarte, I.; Suresh, S.; Koolagudi, S.G.
Natural disasters such as earthquakes and tsunamis often have a devastating effect on human life and cause noticeable damage to infrastructure. Active research has been ongoing to mitigate the impact of these catastrophes and preclude economic losses. Existing methods that use pre-event and post-event images not only require the immediate and guaranteed availability of the appropriate data set but are also encumbered by manual mapping of the images, which requires marking corresponding control points in the two images. This paper uses only post-event imagery, in the absence of reference data, to deliver damage maps in a more timely manner. This eliminates the need for manual georeferencing of images. Our method uses simple linear iterative clustering (SLIC) to segment the images into uniform superpixels and extracts 62 features for each superpixel. Of the classifiers we tried, Random Forest gave a comparatively high accuracy of 90.4%. To evaluate the accuracy of the proposed method, we used 1500 data regions, of which 80% were used for training and 20% for testing. The images taken by GeoEye-1 after the 2011 Christchurch earthquake and the 2011 Japan earthquake and tsunami are used in this study to detect building damage. When ground truth is available, we compare the histograms of the pre- and post-event imagery to quantify similarity as the SSD (Sum of Squared Distances) value; our approach thus produces an assessment as an output map displaying the extent of damage in the area covered by each superpixel. We consider 6 levels of damage ranging from 1 to 6, where 1 signifies no damage and 6 signifies maximum damage. © 2017 IEEE.
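The histogram comparison mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bin count, the 8-bit grayscale pixel range, the normalization, and the function names `histogram` and `ssd` are all assumptions made for illustration, and the mapping from SSD value to the six damage levels is not given in the abstract, so it is left out.

```python
def histogram(pixels, bins=8, max_value=256):
    """Normalized intensity histogram of a flat list of pixel values.

    Assumes integer pixels in [0, max_value); the bin count is a
    hypothetical choice, not taken from the paper.
    """
    if not pixels:
        raise ValueError("pixels must not be empty")
    counts = [0] * bins
    for p in pixels:
        # Map the pixel value to a bin index, clamping to the last bin.
        counts[min(p * bins // max_value, bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]


def ssd(hist_a, hist_b):
    """Sum of squared differences between two equal-length histograms."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    return sum((a - b) ** 2 for a, b in zip(hist_a, hist_b))


# Toy pixel values for one superpixel before and after an event
# (made-up numbers, for illustration only).
pre_pixels = [10, 20, 200, 210, 220, 230]
post_pixels = [10, 20, 30, 40, 50, 220]

score = ssd(histogram(pre_pixels), histogram(post_pixels))
```

Identical regions give an SSD of 0, and the value grows as the two intensity distributions diverge.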
Prediction of aesthetic elements in Karnatic music: A machine learning approach
(International Speech Communication Association, 2018) Rajan, M.; Vijayakumar, A.; Vijayasenan, D.
Gamakas, the embellishments and ornamentations used to enhance musical experience, are defining features of Karnatic Music (KM). The appropriateness of using gamaka is determined by aesthetics and is often developed by musicians with experience. Therefore, understanding and modeling gamaka is a significant bottleneck in applications like music synthesis, automatic accompaniment, etc. in the context of KM. To this end, we propose to learn both the presence and the type of gamaka in a data-driven manner using annotated symbolic music. In particular, we explore the efficacy of three classes of features - note-based, phonetic and structural - and train a Random Forest Classifier to predict the existence and the type of gamaka. The observed accuracy is ~70% for gamaka detection and ~60% for gamaka classification. Finally, we present an analysis of the features and find that frequency and duration of the neighbouring notes prove to be the most important features. © 2018 International Speech Communication Association. All rights reserved.
Non-Invasive Detection of Anemia Using Deep Learning on Conjunctival Images
(Institute of Electrical and Electronics Engineers Inc., 2025) Kedar, D.S.; Pandey, G.; Koolagudi, S.G.
Anemia, characterized by low levels of red blood cells or hemoglobin, affects millions worldwide, with significant consequences for public health. Traditional diagnostic methods, while effective, are invasive, costly, and inaccessible in resource-constrained settings. This paper proposes a non-invasive approach for anemia detection using conjunctival images analyzed through deep learning techniques. The proposed methodology involves capturing high-resolution conjunctival images, pre-processing them, and using a customized Convolutional Neural Network (CNN) for feature extraction and classification. The customized CNN, fine-tuned with a batch size of 16, achieves an accuracy of 96%, precision of 95%, recall of 96%, and ROC-AUC score of 0.99. It outperformed the other models evaluated in this work: Random Forest, XGBoost, SVM, ResNet50, and MobileNetV2. This work highlights the potential for non-invasive diagnostic tools to improve accessibility and efficiency in healthcare, particularly for underserved populations. The findings endorse integrating deep learning in healthcare as a transformative approach to address global challenges such as anemia. © 2025 IEEE.
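For readers unfamiliar with the metrics reported in this abstract, the sketch below shows how accuracy, precision, and recall are computed for a binary classifier. It assumes a two-class setup (1 = anemic, 0 = not anemic), which the abstract does not state explicitly. The function name `binary_metrics` and the labels are hypothetical and are not the paper's data.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

Precision penalizes false alarms, while recall penalizes missed cases; in a screening setting a missed case is usually the costlier error, which is why both are reported alongside accuracy.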
