Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506


Now showing 1 - 8 of 8
  • Item
    Gender Identification from Children's Speech
    (Institute of Electrical and Electronics Engineers Inc., 2018) Ramteke, P.B.; Dixit, A.A.; Supanekar, S.; Dharwadkar, N.V.; Koolagudi, S.G.
    Children's speech is characterized by higher pitch and formant frequencies than adult speech. Gender identification from children's speech is difficult because there is no significant difference between the acoustic properties of male and female children. Here, an attempt has been made to explore features effective in discriminating gender from children's speech. Different combinations of spectral features such as Mel-frequency cepstral coefficients (MFCCs), ΔMFCCs and ΔΔMFCCs, formants, and linear predictive cepstral coefficients (LPCCs); shimmer and jitter; and prosodic features such as pitch and its statistical variations, along with Δpitch-related features, are explored. The features are evaluated using nonlinear classifiers, namely Artificial Neural Networks (ANNs), Deep Neural Networks (DNNs) and Random Forest (RF). The results show that RF achieves the highest accuracy, 84.79%, among the three classifiers. © 2018 IEEE.
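As an illustrative sketch of the classification step above (not the authors' code), the snippet below trains a Random Forest on MFCC-style feature vectors for binary gender classification. The features are synthetic stand-ins; in the paper, MFCCs with their deltas, formants, jitter/shimmer, and pitch statistics are extracted from real child speech.

```python
# Sketch: Random Forest gender classification on MFCC-like features.
# The feature vectors below are synthetic, not extracted from speech.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def synthetic_features(n, shift):
    # 13 "MFCC-like" dimensions; `shift` mimics a small class difference
    return rng.normal(loc=shift, scale=1.0, size=(n, 13))

X = np.vstack([synthetic_features(300, 0.0), synthetic_features(300, 0.8)])
y = np.array([0] * 300 + [1] * 300)   # 0 = male, 1 = female (labels arbitrary)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
```

With well-separated synthetic classes the forest reaches high accuracy; on real child speech, where male/female acoustics overlap heavily, the paper reports 84.79%.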
  • Item
    An Integrated Deep Learning Approach towards Automatic Evaluation of Ki-67 Labeling Index
    (Institute of Electrical and Electronics Engineers Inc., 2019) Lakshmi, S.; Vijayasenan, D.; Sumam David, S.; Sreeram, S.; Suresh, P.K.
    The Ki-67 labeling index is a widely used biomarker for the diagnosis and monitoring of cancer. Many automated techniques have been proposed for evaluating the Ki-67 index. In this paper, we introduce an integrated deep-learning-based approach. We use a MobileUnet model for segmentation and classification, and a connected-component-based algorithm for the estimation of the Ki-67 index in bladder cancer cases. The average F1 score is 0.92 and the Dice score is 0.96. The mean absolute error in the evaluated Ki-67 index is 2.1. We also explore possible pre-processing steps to generalize the segmentation model to at least one other type of cancer. Histogram matching and resizing improve the performance on breast cancer data by 12% in F1 score and 8% in Dice score. © 2019 IEEE.
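A minimal sketch of the connected-component counting step described above (not the authors' pipeline): given binary masks of Ki-67-positive and Ki-67-negative nuclei (here tiny hand-made arrays rather than segmentation output), the labeling index is the percentage of positive nuclei among all counted nuclei.

```python
# Sketch: Ki-67 index from connected components of nucleus masks.
import numpy as np
from scipy import ndimage

positive_mask = np.array([[1, 1, 0, 0],
                          [1, 1, 0, 0],
                          [0, 0, 0, 1],
                          [0, 0, 0, 1]])
negative_mask = np.array([[0, 0, 1, 0],
                          [0, 0, 1, 0],
                          [1, 0, 0, 0],
                          [1, 0, 0, 0]])

_, n_positive = ndimage.label(positive_mask)   # connected positive nuclei
_, n_negative = ndimage.label(negative_mask)   # connected negative nuclei
ki67_index = 100.0 * n_positive / (n_positive + n_negative)
```

Here each mask contains two connected blobs, so the sketch yields an index of 50%.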
  • Item
    A Deep Learning Model for the Automatic Detection of Malignancy in Effusion Cytology
    (Institute of Electrical and Electronics Engineers Inc., 2020) Aboobacker, S.; Vijayasenan, D.; Sumam David, S.; Suresh, P.K.; Sreeram, S.
    The excessive accumulation of fluid between the layers of the pleura covering the lungs is known as pleural effusion. Pleural effusion may be due to various infections, inflammations or malignancy. Cytologists visually examine the microscopic slide to detect malignant cells. The process is time-consuming, and the interpretation of reactive cells and cells with ambiguous levels of atypia may differ between pathologists. Considerable research is therefore directed towards the automation of fluid cytology reporting. We propose an integrated approach based on deep learning, where the network learns directly to detect the malignant cells in effusion cytology images. A U-Net architecture is used to learn malignant and benign cells from the images and to detect the images that contain malignant cells. The model gives a precision of 0.96, recall of 0.96, and specificity of 0.97. The AUC of the ROC curve is 0.97. The model can be used as a screening tool and has a malignant cell detection rate of 0.96 with a low false alarm rate of 0.03. © 2020 IEEE.
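The screening metrics quoted above relate to each other through the confusion matrix; the sketch below computes them from made-up counts (not the paper's data) chosen to roughly reproduce the reported values.

```python
# Sketch: precision, recall, and specificity from confusion-matrix counts.
def screening_metrics(tp, fp, tn, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # a.k.a. malignant-cell detection rate
    specificity = tn / (tn + fp)     # equals 1 - false alarm rate
    return precision, recall, specificity

# Illustrative counts only, not taken from the paper.
precision, recall, specificity = screening_metrics(tp=96, fp=3, tn=97, fn=4)
```

Note how a false alarm rate of 0.03 corresponds directly to a specificity of 0.97.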
  • Item
    Loss Optimised Video Captioning using Deep-LSTM, Attention Mechanism and Weighted Loss Metrices
    (Institute of Electrical and Electronics Engineers Inc., 2021) Yadav, N.; Naik, D.
    The aim of the video captioning task is to use multiple natural-language sentences to describe video content. Videos combine photographic, graphical, and auditory data. Our goal is to investigate and recognize a video's visual features and to generate a caption so that anyone can grasp the video's content in a second. Although encoder-decoder models have made significant progress, they still leave much room for improvement. In the present work, we enhance the top-down architecture using Bahdanau attention, Deep Long Short-Term Memory (Deep-LSTM) and a weighted loss function. VGG16 is used to extract features from the frames. To understand the actions in the video, the Deep-LSTM is paired with an attention mechanism. We evaluate our model on the MSVD dataset, where it shows a major improvement over other state-of-the-art models. © 2021 IEEE.
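One plausible reading of the weighted loss mentioned above (the abstract does not give the exact formulation) is a per-token cross-entropy in which padding tokens get weight zero and selected tokens can be re-weighted. A small NumPy sketch under that assumption:

```python
# Sketch: weighted per-token cross-entropy for caption training.
# The weighting scheme here is an assumption, not the paper's definition.
import numpy as np

def weighted_token_loss(logits, targets, weights):
    # logits: (T, V) unnormalised scores; targets: (T,) token ids;
    # weights: (T,) per-token weights (0 for padding)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]
    return float((weights * nll).sum() / weights.sum())

rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 10))      # 5 time steps, vocabulary of 10
targets = np.array([3, 1, 4, 0, 0])    # last two tokens are padding
weights = np.array([1.0, 1.0, 1.0, 0.0, 0.0])
loss = weighted_token_loss(logits, targets, weights)
```

Because the padded positions carry zero weight, they contribute nothing to the averaged loss.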
  • Item
    Semantic Segmentation on Low Resolution Cytology Images of Pleural and Peritoneal Effusion
    (Institute of Electrical and Electronics Engineers Inc., 2022) Aboobacker, S.; Verma, A.; Vijayasenan, D.; Sumam David, S.; Suresh, P.K.; Sreeram, S.
    Automation in the detection of malignancy in effusion cytology helps to save time and workload for cytopathologists. Cytopathologists typically consider a low-resolution image to identify the malignant regions. The identified regions are then scanned at a higher resolution to confirm malignancy by investigating cell-level behaviour. Scanning and processing time can be saved by zooming into only the identified malignant regions instead of entire low-resolution images. This work predicts malignancy in cytology images at a very low resolution (4X). Annotation of cytology images at a very low resolution is challenging due to the blurring of features such as nuclei and texture. We address this issue by upsampling the very low-resolution images using adversarial training. This work develops a semantic segmentation model trained on 10X images and reuses the network on the 4X images. The prediction results on low-resolution images improve by 15% in average F-score for adversarial upsampling compared to a bicubic filter. The high-resolution model gives a 95% average F-score on high-resolution images. Also, the sub-area of the whole slide that needs to be scanned at high magnification is reduced by approximately 61% when using adversarial upsampling compared to a bicubic filter. © 2022 IEEE.
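The F-score used above to compare adversarial and bicubic upsampling is, for binary segmentation masks, the Dice coefficient. A toy sketch (masks invented for illustration, not the paper's data):

```python
# Sketch: Dice / F-score between a predicted and a ground-truth mask.
import numpy as np

def dice_score(pred, truth):
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return 2.0 * intersection / (pred.sum() + truth.sum())

pred  = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])
truth = np.array([[1, 1, 0], [0, 0, 0], [0, 0, 1]])
score = dice_score(pred, truth)
```

Here 2 of the pixels overlap out of 3 predicted and 3 true positives, giving a Dice score of 2/3.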
  • Item
    CNN-MFCC Model for Speaker Recognition using Emotive Speech
    (Institute of Electrical and Electronics Engineers Inc., 2023) Tomar, S.; Koolagudi, S.G.
    Identifying a speaker from their voice is called "speaker recognition". Emotive Environment Speaker Recognition (EESR) identifies speakers from distinctly emotional speech. Speaker recognition across various moods is a real-life requirement for many applications. When there is no emotion in the conversation, speaker recognition algorithms work almost flawlessly. This work aims to improve the accuracy of text-dependent speaker recognition systems in emotional speech contexts. The proposed method uses Mel-Frequency Cepstral Coefficient (MFCC) features with a Convolutional Neural Network (CNN) classifier for various emotions. The system's performance is assessed on emotional datasets from the Kannada language and the Emotional Database (EmoDB). Both datasets contain the emotions happy, sad, angry, fear, and neutral. Due to the complexity of emotions, speaker recognition across emotional states is challenging. The proposed system achieves an accuracy of 96.2% on EmoDB and 97.8% on the Kannada dataset, providing a high recognition rate across different emotions. © 2023 IEEE.
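To make the CNN-on-MFCC idea concrete, here is an illustrative forward pass (not the paper's network): a single 1-D convolution over a sequence of MFCC frames, ReLU, global average pooling, and a softmax over hypothetical speaker classes. The weights and the MFCC matrix are random stand-ins; a real system would train on extracted features.

```python
# Sketch: minimal 1-D CNN forward pass over MFCC-like frames.
import numpy as np

rng = np.random.default_rng(0)
T, F, C, K = 40, 13, 8, 3            # frames, MFCCs, conv filters, kernel width
mfcc = rng.normal(size=(T, F))       # stand-in for real MFCC features
conv_w = rng.normal(size=(C, K, F)) * 0.1
fc_w = rng.normal(size=(C, 4)) * 0.1  # 4 hypothetical speakers

# 1-D convolution along time, then ReLU and global average pooling
conv_out = np.stack([
    [np.tensordot(mfcc[t:t + K], conv_w[c], axes=2) for t in range(T - K + 1)]
    for c in range(C)
])                                          # shape (C, T-K+1)
pooled = np.maximum(conv_out, 0).mean(axis=1)
scores = pooled @ fc_w                      # speaker logits
probs = np.exp(scores - scores.max())
probs /= probs.sum()                        # softmax over speakers
```

The convolution slides over time, so the same filters respond to a speaker's spectral patterns wherever they occur in the utterance.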
  • Item
    Automated Summarization of Gastrointestinal Endoscopy Video
    (Springer Science and Business Media Deutschland GmbH, 2023) Sushma, B.; Aparna, P.
    Gastrointestinal (GI) endoscopy enables many minimally invasive procedures for diagnosing diseases such as esophagitis, ulcers, polyps and cancers. Guided by the endoscope's video sequence, a physician can diagnose the disease and administer treatment. Unfortunately, due to the huge amount of data generated, physicians currently discard the procedural video and rely on a small number of carefully chosen images to record a procedure. In addition, when a patient seeks a second opinion, the assessment of lesions in a huge video stream necessitates a thorough examination, which is a time-consuming process that demands much attention. To reduce the length of the video stream, we propose an automated method that generates a summary of endoscopy video recordings consisting only of abnormal frames, using deep convolutional neural networks trained to classify normal, abnormal and uninformative frames. Results show that our method can efficiently detect abnormal frames and is robust to variations in the frames. The proposed CNN architecture outperforms the other classification models with an accuracy of 0.9698 and fewer parameters. © IFIP International Federation for Information Processing 2023.
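The summarization step described above reduces to frame selection once the classifier has labelled each frame. A sketch of that selection (the CNN classifier itself is assumed, and the labels below are invented):

```python
# Sketch: build a video summary from per-frame classifier labels,
# keeping only frames predicted as abnormal.
NORMAL, ABNORMAL, UNINFORMATIVE = 0, 1, 2

def summarize(frame_labels):
    """Return indices of frames to keep in the summary video."""
    return [i for i, label in enumerate(frame_labels) if label == ABNORMAL]

labels = [NORMAL, NORMAL, ABNORMAL, UNINFORMATIVE, ABNORMAL, NORMAL]
summary_frames = summarize(labels)
```

Discarding normal and uninformative frames is what shrinks a long procedure recording down to the clinically relevant segment.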
  • Item
    Outlier Detection in Streaming Data Using Deep Learning Models
    (Institute of Electrical and Electronics Engineers Inc., 2024) Dudipala, S.; Gangavarapu, S.; Girish, K.K.; Bhowmik, B.
    In the realm of the Internet of Things (IoT), devices continuously generate a vast and relentless stream of data, providing a real-time representation of the digital landscape. The continuous and high-velocity nature of this streaming data poses significant challenges for real-time analysis. Accurate outlier detection within this data is essential, as such anomalies may indicate critical issues, attacks, or errors. Nevertheless, the dynamic and rapidly evolving characteristics of streaming data render traditional outlier detection methods inadequate. This paper investigates the application of Artificial Neural Networks (ANNs), specifically a Multi-Layer Perceptron (MLP), for outlier detection in streaming IoT data. The selection of the MLP from a range of Deep Neural Networks (DNNs) is based on its optimal balance between computational efficiency and model complexity. The model's efficacy is confirmed through rigorous experimentation, demonstrating strong performance across diverse scenarios and data classes. The MLP achieved an accuracy of 99.4%, underscoring its ability to detect even minor deviations from expected patterns. This high level of accuracy establishes the MLP as a robust tool for outlier detection in dynamic IoT environments. © 2024 IEEE.
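A hedged sketch of the MLP approach above (not the paper's setup, datasets, or hyperparameters): an sklearn MLP trained to separate synthetic "normal" sensor readings from injected outliers, mimicking supervised outlier detection on a labelled stream.

```python
# Sketch: MLP outlier detection on synthetic "IoT sensor" data.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 4))    # regular readings
outliers = rng.normal(loc=6.0, scale=1.0, size=(50, 4))   # injected anomalies
X = np.vstack([normal, outliers])
y = np.array([0] * 500 + [1] * 50)                        # 1 = outlier

mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
mlp.fit(X, y)
train_accuracy = mlp.score(X, y)
```

In a true streaming setting the model would be applied window by window as data arrives; this sketch only shows the batch-trained classifier at the core of that loop.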