Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506

Now showing 1 - 7 of 7
  • Item
    Speech enhancement using multiple deep neural networks
    (Institute of Electrical and Electronics Engineers Inc., 2018) Karjol, P.; Kumar, M.A.; Ghosh, P.K.
    In this work, we present a variant of a multiple deep neural network (DNN) based speech enhancement method. We directly estimate the clean speech spectrum as a weighted average of the outputs of multiple DNNs, with the weights provided by a gating network. The multiple DNNs and the gating network are trained jointly, with the objective function set as the mean square logarithmic error between the target clean spectrum and the estimated spectrum. We conduct experiments with two and four DNNs on the TIMIT corpus with nine noise types (four seen and five unseen) taken from the AURORA database at four different signal-to-noise ratios (SNRs). We also compare the proposed method with a single-DNN speech enhancement scheme and existing multiple-DNN schemes using segmental SNR, perceptual evaluation of speech quality (PESQ), and short-time objective intelligibility (STOI) as evaluation metrics. These comparisons show the superiority of the proposed method over the baseline schemes for both seen and unseen noises. Specifically, we observe absolute improvements of 0.07 and 0.04 in the PESQ measure over the single DNN, averaged over all noises and SNRs, for the seen and unseen noise cases respectively. © 2018 IEEE.
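The gating combination described in this abstract can be sketched in a few lines. This is illustrative NumPy only, not the authors' implementation; the softmax gating and the function names are assumptions:

```python
import numpy as np

def msle(target, estimate, eps=1e-8):
    """Mean square logarithmic error between clean and estimated spectra."""
    return np.mean((np.log(target + eps) - np.log(estimate + eps)) ** 2)

def gated_estimate(dnn_outputs, gate_logits):
    """Combine per-DNN spectrum estimates using gating-network weights.

    dnn_outputs: (K, F) array, one spectrum estimate per DNN.
    gate_logits: (K,) array of unnormalised gating scores.
    """
    w = np.exp(gate_logits - gate_logits.max())
    w /= w.sum()            # softmax -> non-negative weights summing to 1
    return w @ dnn_outputs  # weighted average over the K DNN outputs
```

In training, the gradient of `msle` would flow through both the DNN outputs and the gating weights, which is what joint training of the experts and the gate means here.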
  • Item
    Singer Identification from Smaller Snippets of Audio Clips Using Acoustic Features and DNNs
    (Institute of Electrical and Electronics Engineers Inc., 2018) S Murthy, Y.V.; Jeshventh Raja, T.K.R.; Zoeb, M.; Saumyadip, M.; Koolagudi, S.G.
    Singer identification (SID) is one of the crucial tasks in music information retrieval (MIR). The presence of background accompaniment makes the task more complicated. In this work, we analyse the performance of SID with a combination of cepstral and chromagram features. Mel-frequency cepstral coefficients (MFCCs) and linear prediction cepstral coefficients (LPCCs) are computed as cepstral features and appended to a 12-dimensional chroma vector obtained from the chromagram. Two datasets are used for experimentation: the standard artist20 dataset, and an Indian singers database proposed by us comprising 20 Indian singers. Two classifiers, namely random forest (RF) and deep neural networks (DNNs), are considered based on their performance in identifying the singers. The proposed approach is found to be effective even when the input clip is only five seconds long. © 2018 IEEE.
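The feature combination described here can be illustrated by assembling a frame-level vector from the three feature streams. The 13-coefficient MFCC/LPCC dimensions below are common defaults, not values taken from the paper:

```python
import numpy as np

def singer_feature_vector(mfcc, lpcc, chroma):
    """Concatenate cepstral and chroma features into one frame-level vector.

    mfcc   : (13,) mel-frequency cepstral coefficients (assumed dimension)
    lpcc   : (13,) linear prediction cepstral coefficients (assumed dimension)
    chroma : (12,) chroma vector from the chromagram (one bin per pitch class)
    """
    mfcc, lpcc, chroma = map(np.asarray, (mfcc, lpcc, chroma))
    assert chroma.shape == (12,), "a chromagram yields a 12-bin pitch-class vector"
    return np.concatenate([mfcc, lpcc, chroma])
```

The resulting 38-dimensional vector (under the assumed dimensions) is what a classifier such as RF or a DNN would consume per frame or per clip.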
  • Item
    Explainable Deep Neural Models for COVID-19 Prediction from Chest X-Rays with Region of Interest Visualization
    (Institute of Electrical and Electronics Engineers Inc., 2021) Nedumkunnel, I.M.; Elizabeth George, L.; Kamath S․, S.S.; Rosh, N.A.; Mayya, V.
    COVID-19 has been designated a once-in-a-century pandemic, and its impact is still being felt severely in many countries due to the extensive human casualties. While several vaccines are at various stages of development, effective screening procedures that help detect the disease at early stages in a non-invasive and resource-optimized manner are the need of the hour. X-ray imaging is fairly accessible in most healthcare institutions and can prove useful in diagnosing this respiratory disease. Although a chest X-ray scan is a viable method of detecting the presence of the disease, the scans must be analyzed by trained experts accurately and quickly if large numbers of tests are to be processed. In this paper, a benchmarking study of different preprocessing techniques and state-of-the-art deep learning models is presented to provide comprehensive insights into both the objective and subjective evaluation of their performance. To analyze and prevent possible sources of bias, we preprocessed the dataset in two ways: first, we segmented the lungs alone, and second, we formed a bounding box around the lungs and trained on only this area. Among the models chosen for benchmarking (DenseNet201, EfficientNetB7, and VGG-16), DenseNet201 performed best on all three datasets. © 2021 IEEE.
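The second preprocessing variant, cropping to a bounding box around the lungs, can be sketched as follows. This is illustrative NumPy, assuming a binary lung mask is already available from a segmentation step:

```python
import numpy as np

def lung_bounding_box(mask):
    """Return (row0, row1, col0, col1) of the tight box around a binary mask."""
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    return r0, r1 + 1, c0, c1 + 1  # half-open, so image[r0:r1, c0:c1] crops it

def crop_to_lungs(image, mask):
    """Keep only the bounding-box region around the lungs for training."""
    r0, r1, c0, c1 = lung_bounding_box(mask)
    return image[r0:r1, c0:c1]
```

Training on this crop, rather than the full radiograph, is one way to reduce bias from image regions (borders, annotations) unrelated to the lungs.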
  • Item
    Sketch-Based Image Retrieval Using Convolutional Neural Networks Based on Feature Adaptation and Relevance Feedback
    (Springer Science and Business Media Deutschland GmbH, 2022) Kumar, N.; Ahmed, R.; B Honnakasturi, V.; Kamath S․, S.; Mayya, V.
    Sketch-based image retrieval (SBIR) is an approach in which natural images are retrieved according to a given input sketch query. SBIR has many applications, for example, searching for a product in a digital catalog given its sketch pattern, or searching a digital photo repository for missing people given their prominent features. The main challenge in implementing such a system is the absence of semantic information in the sketch query. In this work, we propose a combination of image preprocessing and deep learning-based methods to tackle this issue. A binary image highlighting the edges in the natural image is obtained using the Canny edge detection algorithm. Deep features are extracted with an ImageNet-based CNN model. Cosine similarity and Euclidean distance measures are adopted to generate the ranked list of candidate natural images. Relevance feedback using Rocchio's method is used to adapt the sketch query and feature weights according to the relevant and non-relevant images. In the experimental evaluation, the proposed approach achieved a mean average precision (MAP) of 71.84%, demonstrating promising performance in retrieving relevant images for input sketch queries. © 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
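The ranking and feedback steps described here can be sketched as follows. This is illustrative NumPy; the Rocchio weights alpha, beta, and gamma are conventional defaults, not values from the paper:

```python
import numpy as np

def cosine_rank(query, gallery):
    """Rank gallery feature rows by cosine similarity to the query, best first."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(g @ q))

def rocchio_update(query, relevant, non_relevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio relevance feedback: move the query feature vector toward the
    centroid of relevant images and away from non-relevant ones."""
    q = alpha * query
    if len(relevant):
        q = q + beta * np.mean(relevant, axis=0)
    if len(non_relevant):
        q = q - gamma * np.mean(non_relevant, axis=0)
    return q
```

After each round of user feedback, the updated query is re-ranked with `cosine_rank`, which is the retrieve-feedback-retrieve loop the abstract describes.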
  • Item
    Deep Neural Models for Early Diagnosis of Knee Osteoarthritis and Severity Grade Prediction
    (Springer Science and Business Media Deutschland GmbH, 2022) Shenoy, T.N.; Medayil, M.; Kamath S․, S.
    Osteoarthritis (OA) is a type of arthritis that results in malfunction and eventual loss of the cartilage of joints. It occurs when the cartilage that cushions the ends of the bones wears out. OA is the most common joint disease and frequently occurs after the age of 45 in males and 55 in females. Manual detection of OA is a tedious, labour-intensive task performed by trained specialists. We propose a fully automated computer-aided diagnosis system to detect and grade osteoarthritis severity as per the Kellgren-Lawrence (KL) classification. In this paper, we experiment with various approaches for automated OA detection from X-ray images. Image-level information such as content descriptors and image transforms is identified and assigned weights using Fisher scores. The KL grade is then predicted using weighted nearest neighbours, and the different stages of OA severity are classified. Pre-processing, segmentation, and classification of the X-ray images are achieved using data augmentation, deep neural networks, and residual neural networks. We present experimental results and a discussion of the best-performing models in our experiments. © 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
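The Fisher-score weighting and weighted nearest-neighbour grading described here can be sketched as follows. This is an illustrative NumPy reading of the abstract, not the authors' code:

```python
import numpy as np

def fisher_scores(X, y):
    """Per-feature Fisher score: between-class spread of the class means
    divided by the pooled within-class variance."""
    mu = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
        den += len(Xc) * Xc.var(axis=0)
    return num / (den + 1e-12)

def predict_kl_grade(x, X_train, y_train, weights, k=3):
    """Weighted nearest neighbours: distances use per-feature weights
    (e.g. Fisher scores); the majority KL grade among the k neighbours wins."""
    d = np.sqrt((((X_train - x) ** 2) * weights).sum(axis=1))
    nn = np.argsort(d)[:k]
    vals, counts = np.unique(y_train[nn], return_counts=True)
    return vals[np.argmax(counts)]
```

Discriminative features thus dominate the distance metric, so neighbours are chosen by the image properties that actually separate KL grades.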
  • Item
    A Survey on Semantic Segmentation Models for Underwater Images
    (Springer, 2023) Anand, S.K.; Kumar, P.V.; Saji, R.; Gadagkar, A.V.; Chandavarkar, B.R.
    Semantic segmentation remains a key research field in modern-day computer vision and has been used in a myriad of applications across various fields. It can be extremely beneficial in the study of underwater scenes. Various underwater applications, such as unmanned exploration and autonomous underwater vehicles, require accurate object classification and detection to allow the probes to avoid malicious objects. However, models that work well for terrestrial images rarely work just as well for underwater images, because underwater images suffer from high blue-light intensity as well as other ill effects such as poor lighting and contrast. These can be mitigated using preprocessing techniques that manually improve the image characteristics. Improving the model itself to account for bad image quality is not a good approach, as the model may misidentify noise as an image characteristic. In this chapter, six deep learning semantic segmentation models are explored: SegNet, Pyramid Scene Parsing Network (PSP-Net), U-Net, DNN-VGG (Deep Neural Network-VGG), DeepLabv3+, and SUIM-Net. Their architectures, technical aspects with respect to underwater images, advantages, and disadvantages are all investigated. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
  • Item
    Semantic Segmentation of Underwater Images with CNN Based Adaptive Thresholding
    (Springer Science and Business Media Deutschland GmbH, 2025) Anand, S.K.; Kumar, P.V.; Saji, R.; Gadagkar, A.V.; Chandavarkar, B.R.
    Semantic segmentation remains a key research field in modern-day computer vision and has been used in a myriad of applications across various fields. It can be extremely beneficial in the study of underwater scenes. Various underwater applications, such as unmanned exploration and autonomous underwater vehicles, require accurate object classification and detection to allow the probes to avoid malicious objects. However, models that work well for terrestrial images rarely work just as well for underwater images, because underwater images suffer from high blue-light intensity as well as other ill effects such as poor lighting and contrast. Improving the model itself to account for bad image quality is not a good approach, as the model may misidentify noise as an image characteristic. In this paper, a unique CNN-based approach to post-processing image thresholding is proposed on top of three models used for the semantic segmentation itself: SegNet, U-Net, and DeepLabv3+. The models' outputs are then subjected to the CNN-based post-processing technique, which binarizes them into masks and provides improved segmentation results compared to the base models. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
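The binarization step can be illustrated with a classical adaptive threshold. The paper's post-processing is CNN-based, so Otsu's method below is only a simple stand-in showing how a soft segmentation output becomes a binary mask:

```python
import numpy as np

def otsu_threshold(prob_map, bins=256):
    """Otsu's method: choose the histogram threshold that maximises the
    between-class variance. A classical stand-in for the paper's learned
    (CNN-based) thresholding step."""
    hist, edges = np.histogram(prob_map, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    omega = np.cumsum(p)            # probability mass of class 0 up to each bin
    mu = np.cumsum(p * edges[:-1])  # cumulative mean (using left bin edges)
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    k = np.nanargmax(sigma_b)       # bin k and below become background
    return edges[k + 1]

def binarize(prob_map):
    """Turn a soft segmentation output into a binary mask."""
    return (prob_map >= otsu_threshold(prob_map)).astype(np.uint8)
```

A learned post-processor plays the same role as `binarize` here, but can condition the decision on local context rather than a single global threshold.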