Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty


Now showing 1 - 9 of 9
  • Item
    Food classification from images using convolutional neural networks
    (Institute of Electrical and Electronics Engineers Inc., 2017) Attokaren, D.J.; Fernandes, I.G.; Sriram, A.; Vishnu Srinivasa Murthy, Y.V.; Koolagudi, S.G.
    Identifying food items from an image is an interesting problem with a wide range of applications. Since food monitoring plays a leading role in health-related problems, it is becoming more essential in our day-to-day lives. In this paper, an approach is presented to classify images of food using convolutional neural networks. Unlike traditional artificial neural networks, convolutional neural networks can estimate the score function directly from image pixels. A 2D convolution layer is utilised, which creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. Multiple such layers are stacked, and their outputs are concatenated to form the final output tensor. Max-pooling is also applied, and the features extracted through it are used to train the network. The proposed implementation achieves an accuracy of 86.97% on the classes of the FOOD-101 dataset. © 2017 IEEE.
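The two building blocks this abstract names, a 2D convolution followed by max-pooling, can be illustrated with a minimal pure-Python sketch. The 5×5 input and the 2×2 kernel below are made-up toy values; this is not the paper's actual network, only the core operations it describes:

```python
# Valid-mode 2D convolution (strictly cross-correlation, as used in CNNs)
# followed by non-overlapping 2x2 max-pooling.

def conv2d(image, kernel):
    """Slide the kernel over the image and sum element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(ow)] for i in range(oh)]

def max_pool2x2(fmap):
    """Keep the maximum of each non-overlapping 2x2 patch."""
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

image = [[1, 2, 0, 1, 3],
         [0, 1, 2, 3, 1],
         [3, 0, 1, 2, 0],
         [1, 2, 3, 0, 1],
         [0, 1, 0, 2, 2]]
kernel = [[1, 0], [0, -1]]   # toy diagonal-difference kernel

fmap = conv2d(image, kernel)   # 4x4 feature map
pooled = max_pool2x2(fmap)     # 2x2 map after pooling
```

In a real network many such kernels run in parallel, their outputs are stacked into a tensor, and pooling halves the spatial resolution between layers, exactly the stacking-and-concatenation the abstract sketches.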
  • Item
    Efficient Traffic Signboard Recognition System Using Convolutional Networks
    (Springer, 2020) Mothukuri, S.K.P.; Tejas, R.; Patil, S.; Darshan, V.; Koolagudi, S.G.
    In this paper, a smart automatic traffic sign recognition system is proposed. Such a signboard recognition system plays a vital role in the automated driving of transport vehicles. The model is built on a convolutional neural network. The German Traffic Sign Detection Benchmark (GTSDB), a standard open-source segmented image dataset with forty-three signboard classes, is used for experimentation. The implementation focuses on both processing speed and classification accuracy, so that the resulting model is suitable for real-time automated driving systems. Comparable experiments are carried out with pre-trained convolutional models, and the proposed model performs better in terms of response time. © Springer Nature Singapore Pte Ltd. 2020.
  • Item
    Kannada Dialect Classification Using CNN
    (Springer Science and Business Media Deutschland GmbH, 2020) Hegde, P.; Chittaragi, N.B.; Mothukuri, S.K.P.; Koolagudi, S.G.
    Kannada is one of the prominent languages spoken in southern India. Since Kannada serves as a lingua franca and is spoken by more than 70 million people, it naturally exhibits dialects. In this paper, we identify five major dialectal regions in Karnataka state. An attempt is made to classify these five dialects from sentence-level utterances. Sentences are segmented from continuous speech automatically using spectral centroid and short-term energy features. Mel frequency cepstral coefficient (MFCC) features are extracted from these sentence units and used to train convolutional neural networks (CNNs). Along with MFCCs, shifted delta and double delta coefficients are also used to train the CNN model. The proposed CNN-based dialect recognition system is also tested on the internationally known Intonational Variation in English (IViE) dataset. The CNN model results in better performance. It is observed that using one convolution layer and three fully connected layers balances computational complexity and yields better accuracy on both the Kannada and English datasets. © 2020, Springer Nature Switzerland AG.
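The two segmentation cues the abstract mentions, short-term energy and spectral centroid, are standard frame-level measures and can be sketched in pure Python. The 1 kHz test tone and sample rate below are illustrative values, not data from the paper:

```python
import math

def short_term_energy(frame):
    """Mean squared amplitude of one analysis frame; drops near zero in pauses."""
    return sum(s * s for s in frame) / len(frame)

def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of a frame's DFT (naive O(n^2) DFT)."""
    n = len(frame)
    mags, freqs = [], []
    for k in range(n // 2):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mags.append(math.hypot(re, im))
        freqs.append(k * sample_rate / n)
    total = sum(mags)
    return sum(f * m for f, m in zip(freqs, mags)) / total if total else 0.0

# A loud 1 kHz frame vs. a near-silent frame: the energy gap is the cue
# used to locate sentence boundaries in the pauses between utterances.
sr = 8000
loud = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(64)]
quiet = [0.001 * s for s in loud]
```

Thresholding these two values frame by frame, low energy marking silence and the centroid distinguishing speech-like spectra from noise, is a common recipe for the automatic sentence segmentation described above.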
  • Item
    Singer identification for Indian singers using convolutional neural networks
    (Springer, 2021) Vishnu Srinivasa Murthy, Y.V.S.; Koolagudi, S.G.; Jeshventh Raja, T.K.
    Singer identification is one of the important aspects of music information retrieval (MIR). In this work, traditional feature-based approaches and trending convolutional neural network (CNN) based approaches are compared for identifying singers. Two datasets, artist20 and an Indian popular singers database with 20 singers, are used to evaluate the proposed approaches. Cepstral features such as Mel-frequency cepstral coefficients (MFCCs) and linear prediction cepstral coefficients (LPCCs) are used to represent timbre information. Shifted delta cepstral (SDC) features are computed besides the cepstral coefficients to capture temporal information. In addition, chroma features are computed from the 12 semitones of a musical octave, forming a 46-dimensional feature vector overall. Experiments are conducted with different feature combinations, and suitable features are selected using a genetic algorithm-based feature selection (GAFS) approach. Two classification techniques, artificial neural networks (ANNs) and random forest (RF), are applied to the features mentioned above. Further, spectrograms and chromagrams of audio clips are fed directly to a CNN for classification. The singer identification results obtained using CNNs are better than those of the traditional isolated and ensemble classifiers. An average accuracy of around 75% is observed with the CNN on the Indian popular singers database, whereas on the artist20 dataset neither the feature-based approach nor the CNN configuration could exceed 60% accuracy. © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
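The chroma features mentioned above fold spectral energy into the 12 semitone classes of an octave. A minimal sketch of that folding, using the standard equal-temperament mapping relative to A4 = 440 Hz (the spectral peaks below are made-up values, and the paper's exact chroma computation may differ):

```python
import math

PITCH_CLASSES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

def pitch_class(freq_hz, ref_a4=440.0):
    """Map a frequency to one of the 12 semitone classes of the octave."""
    semitones = round(12 * math.log2(freq_hz / ref_a4))
    return PITCH_CLASSES[semitones % 12]

def chroma_vector(peaks):
    """Fold (frequency, magnitude) spectral peaks into 12 chroma bins,
    discarding octave information."""
    bins = {pc: 0.0 for pc in PITCH_CLASSES}
    for freq, mag in peaks:
        bins[pitch_class(freq)] += mag
    return bins

# 220 Hz and 880 Hz are both the note A (one and two octaves from A4),
# so they accumulate in the same chroma bin; 261.63 Hz is middle C.
vec = chroma_vector([(220.0, 1.0), (880.0, 2.0), (261.63, 0.5)])
```

These 12 chroma values, concatenated with the cepstral and SDC features, make up part of the 46-dimensional vector the abstract describes.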
  • Item
    Acoustic scene classification using projection Kervolutional neural network
    (Springer, 2023) Mulimani, M.; Nandi, R.; Koolagudi, S.G.
    In this paper, a novel Projection Kervolutional Neural Network (ProKNN) is proposed for Acoustic Scene Classification (ASC). ProKNN combines two special filters, known as the left and right projection layers, with a Kervolutional Neural Network (KNN). KNN replaces the linearity of the Convolutional Neural Network (CNN) with a non-linear polynomial kernel. We extend ProKNN to learn from the features of the two channels of audio recordings in the initial stage. The performance of ProKNN is evaluated on two publicly available datasets: the TUT Urban Acoustic Scenes 2018 and TUT Urban Acoustic Scenes Mobile 2018 development datasets. Results show that the proposed ProKNN outperforms existing systems, with absolute accuracy improvements of 8% and 14% on the TUT Urban Acoustic Scenes 2018 and TUT Urban Acoustic Scenes Mobile 2018 development datasets respectively, compared to the baseline model of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 challenge. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
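The kervolution idea, replacing the linear dot product inside a convolution with a polynomial kernel, can be shown in one dimension. The signal, weights, and kernel hyperparameters (c, degree) below are illustrative; the paper's actual layer configuration is not reproduced here:

```python
def convolve(signal, weights):
    """Standard linear response: a plain dot product at each position."""
    k = len(weights)
    return [sum(signal[i + j] * weights[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def kervolve(signal, weights, c=1.0, degree=2):
    """Kervolution: replace the dot product x.w with the polynomial kernel
    (x.w + c) ** degree, adding non-linearity inside the filter itself."""
    k = len(weights)
    return [(sum(signal[i + j] * weights[j] for j in range(k)) + c) ** degree
            for i in range(len(signal) - k + 1)]

x = [1.0, 2.0, -1.0, 0.5]
w = [1.0, -1.0]
lin = convolve(x, w)    # linear responses
ker = kervolve(x, w)    # polynomial-kernel responses
```

With c = 0 and degree = 1 the kervolution reduces exactly to ordinary convolution, which is why it is described as a drop-in generalisation of the CNN's linear filtering.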
  • Item
    Bi-level Acoustic Scene Classification Using Lightweight Deep Learning Model
    (Birkhauser, 2024) Spoorthy, V.; Koolagudi, S.G.
    Identifying a scene from the environment in which the related audio is recorded is known as acoustic scene classification (ASC). In this paper, a bi-level lightweight Convolutional Neural Network (CNN)-based model is presented to perform ASC. The proposed approach performs classification in two levels. In the first level, the scenes are classified into three broad categories: indoor, outdoor, and transportation. In the second level, these three classes are further categorised into individual scenes. The approach is implemented using three features: log Mel band energies, harmonic spectrograms, and percussive spectrograms. For classification, three CNN classifiers are used: MobileNetV2, Squeeze-and-Excitation Net (SENet), and a combination of the two architectures known as SE-MobileNet. The combined model draws on the advantages of both MobileNetV2 and SENet. Extensive experiments are conducted on the DCASE 2020 (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) Task 1B development and DCASE 2016 ASC datasets. The proposed SE-MobileNet model achieves classification accuracies of 96.9% and 86.6% for the first and second levels, respectively, on the DCASE 2020 dataset, and 97.6% and 88.4%, respectively, on the DCASE 2016 dataset. The model is better in terms of both complexity and accuracy than state-of-the-art low-complexity ASC systems. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
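The bi-level routing itself is simple to sketch: a coarse classifier picks one of the three broad categories, then a category-specific fine classifier picks the individual scene. The stand-in classifiers and the scene labels below are hypothetical (the paper uses CNNs on spectrogram features, and its exact label set may differ):

```python
# Two-level classification: coarse category first, then a fine classifier
# chosen by that coarse decision.

COARSE_TO_FINE = {
    "indoor": ["airport", "shopping_mall", "metro_station"],
    "outdoor": ["park", "street_traffic", "public_square"],
    "transportation": ["bus", "tram", "metro"],
}

def classify_bilevel(features, coarse_clf, fine_clfs):
    coarse = coarse_clf(features)          # level 1: broad category
    fine = fine_clfs[coarse](features)     # level 2: scene within that category
    return coarse, fine

# Toy stand-in classifiers keyed on a single hypothetical feature value.
coarse_clf = lambda f: "indoor" if f["reverb"] > 0.5 else "outdoor"
fine_clfs = {
    "indoor": lambda f: "airport",
    "outdoor": lambda f: "park",
}
```

Splitting the problem this way keeps each classifier small (three coarse classes, then a handful of scenes per branch), which is the source of the model's low complexity.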
  • Item
    Polyphonic Sound Event Detection Using Mel-Pseudo Constant Q-Transform and Deep Neural Network
    (Taylor and Francis Ltd., 2024) Spoorthy, V.; Koolagudi, S.G.
    The task of identifying sound events in a particular surrounding is known as Sound Event Detection (SED) or Acoustic Event Detection (AED). Sound events occur in an unstructured manner and display wide variations in both temporal structure and frequency content. They may be non-overlapped (monophonic) or overlapped (polyphonic). In real-world scenarios, polyphonic SED is far more common than monophonic SED. In this paper, a Mel-Pseudo Constant Q-Transform (MP-CQT) technique is introduced to perform polyphonic SED and to learn both monophonic and polyphonic sound events effectively. A pseudo-CQT technique is adapted to extract features from the audio files and their Mel spectrograms. The Mel scale broadly approximates the human auditory perception system. The classifier used is a Convolutional Recurrent Neural Network (CRNN). A comparison of the performance of the proposed MP-CQT technique with the CRNN is presented, and a considerable performance improvement is observed. The proposed method achieves an average error rate of 0.684 and an average F1 score of 52.3%. The approach is also analysed for robustness by adding noise to the audio files at different Signal-to-Noise Ratios (SNRs). The proposed method displays improved performance on the SED task compared to state-of-the-art SED systems, and the new feature extraction technique shows promising improvement in the performance of the polyphonic SED system. © 2024 IETE.
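The Mel scale underlying the Mel spectrograms above has a standard closed form: mel(f) = 2595 · log10(1 + f/700), roughly linear below 1 kHz and logarithmic above. A small sketch of the conversion and of how equally spaced Mel points translate to widening steps in Hz (the 0–8000 Hz range is just an illustrative choice):

```python
import math

def hz_to_mel(f_hz):
    """HTK-style Mel scale: mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Equal steps on the Mel axis correspond to growing steps in Hz,
# mimicking the ear's coarser frequency resolution at high frequencies.
lo, hi = hz_to_mel(0.0), hz_to_mel(8000.0)
edges_hz = [mel_to_hz(lo + i * (hi - lo) / 4) for i in range(5)]
```

A Mel spectrogram is built by pooling spectral bins between consecutive such edges; combining that warping with a pseudo-CQT is the feature-level idea the abstract describes.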
  • Item
    MICAnet: A Deep Convolutional Neural Network for mineral identification on Martian surface
    (Elsevier B.V., 2024) Kumari, P.; Soor, S.; Shetty, A.; Koolagudi, S.G.
    Mineral identification plays a vital role in understanding the diversity and past habitability of the Martian surface. Mineral mapping by the traditional manual method is time-consuming, and the unavailability of ground-truth data has limited research on building supervised learning models. To address this issue, an augmentation process has already been proposed in the literature that generates training data replicating the spectra in the MICA (Minerals Identified in CRISM Analysis) spectral library while preserving absorption signatures and introducing variability. This study introduces MICAnet, a specialized Deep Convolutional Neural Network (DCNN) architecture for mineral identification using CRISM (Compact Reconnaissance Imaging Spectrometer for Mars) hyperspectral data. MICAnet is inspired by the Inception-v3 and InceptionResNet-v1 architectures but is tailored with 1-dimensional convolutions for processing the spectra at the pixel level of a hyperspectral image. To the best of the authors’ knowledge, this is the first DCNN architecture solely dedicated to mineral identification on the Martian surface. The model is evaluated by matching its output against a TRDR (Targeted Reduced Data Record) dataset obtained using a hierarchical Bayesian model. The results demonstrate an F-score of at least 0.77 across the mineral groups in the MICA library, on par with or better than the unsupervised models previously applied to this objective. © 2024
  • Item
    Video forgery localization using inter-frame denoising and intra-frame segmentation
    (Springer, 2025) Banerjee, D.; Chittaragi, N.B.; Koolagudi, S.G.
    Video forgery detection has become necessary with the recent spurt in fake videos, such as Deepfakes and doctored videos from multiple video-capturing devices. In this paper, we present a novel technique for detecting fake videos by creating an ensemble network, based on statistical and deep learning methods, that detects inter-frame and intra-frame forgery in forged videos separately. Noise-signature extraction for a particular image-capturing sensor and an autoencoder-based Convolutional Neural Network (CNN) model are used to localize the forged regions. The model is trained to localize Deepfake forgeries as well as copy-paste forgeries, with effective results on the test data. The proposed fake-video detector can be applied at the back end of online video-aggregating services to verify the authenticity of videos. The results show better performance in detecting fake videos compared to existing methods. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.
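The sensor noise-signature idea rests on a residual: denoise a frame, subtract, and what remains carries the capture device's noise pattern. A spliced region from another sensor then shows a mismatched residual. The sketch below uses a simple 3×3 mean filter as a stand-in denoiser and synthetic frames; the paper's actual denoising and comparison methods are not specified here:

```python
import random

def mean_filter3(frame):
    """3x3 mean filter with edge clamping: a toy stand-in denoiser."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [frame[ii][jj]
                    for ii in range(max(0, i - 1), min(h, i + 2))
                    for jj in range(max(0, j - 1), min(w, j + 2))]
            out[i][j] = sum(vals) / len(vals)
    return out

def noise_residual(frame):
    """Residual = frame minus its denoised version; carries sensor noise."""
    den = mean_filter3(frame)
    return [[frame[i][j] - den[i][j] for j in range(len(frame[0]))]
            for i in range(len(frame))]

def residual_energy(res):
    return sum(v * v for row in res for v in row)

# Synthetic 8x8 frames: one nearly clean, one with strong sensor noise.
random.seed(0)
smooth = [[100.0 + 0.01 * random.random() for _ in range(8)] for _ in range(8)]
noisy = [[100.0 + 10.0 * random.random() for _ in range(8)] for _ in range(8)]
```

Comparing residual statistics block by block across a frame, rather than globally as here, is what turns this into a localization cue for copy-paste regions.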