Faculty Publications
Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736
Publications by NITK Faculty
Item Novel hybrid feature selection models for unsupervised document categorization (Institute of Electrical and Electronics Engineers Inc., 2017) Bhopale, A.P.; Kamath S., S.
Dealing with high-dimensional data is a challenging and computationally complex task in the data pre-processing phase of text clustering. Conventionally, union and intersection approaches have been used to combine the results of different feature selection methods in order to optimize the relevant feature space for a document collection. The union method selects all features from the considered sub-models, whereas the intersection method selects only the common features identified by the sub-models. However, in practice, any type of feature selection can cause a loss of some potentially important features. In this paper, a hybrid feature selection model called Modified Hybrid Union (MHU) is proposed, which selects features by considering the individual strengths and weaknesses of each constituent component of the model. A comparative evaluation of its performance for K-means clustering and bio-inspired flock-based clustering is also presented on standard data sets such as OWL-S TC and Reuters-21578. © 2017 IEEE.

Item Optimal Selection of Bands for Hyperspectral Images Using Spectral Clustering (Springer Verlag, 2019) Gupta, V.; Gupta, S.K.; Shukla, D.P.
The high spectral resolution of hyperspectral images comes hand in hand with high data redundancy (i.e. multiple bands carrying similar information), which further contributes to high computational cost, complexity and data storage. Hence, in this work, we aim at performing dimensionality reduction by selecting non-redundant bands from the Indian Pines hyperspectral image using spectral clustering. We represent the dataset in the form of similarity graphs computed from metrics such as Euclidean distance and Tanimoto similarity using the K-nearest-neighbour method.
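As an illustration of the graph-construction step just described, the following is a minimal sketch (synthetic stand-in data, not the paper's code): bands are treated as vectors, a k-nearest-neighbour similarity graph is built over them with a Euclidean metric, and a Tanimoto similarity helper is shown for comparison.

```python
# Sketch: kNN similarity graph over hyperspectral bands (toy data).
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
# Toy cube: 220 bands, each flattened to a 100-pixel vector.
bands = rng.random((220, 100))

# Euclidean kNN graph: each band connects to its 10 nearest bands.
knn = kneighbors_graph(bands, n_neighbors=10, metric="euclidean",
                       mode="connectivity", include_self=False)

def tanimoto(a, b):
    """Tanimoto similarity of two band vectors (1.0 = identical)."""
    dot = a @ b
    return dot / (a @ a + b @ b - dot)

print(knn.shape)  # (220, 220) sparse adjacency matrix
```

Spectral clustering would then be run on this adjacency matrix (e.g. `sklearn.cluster.SpectralClustering(affinity="precomputed")`), with one representative band kept per cluster.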
The optimum k for our dataset is identified using methods such as the Distribution Compactness (DC) algorithm, the elbow plot, histograms and visual inspection of the similarity graphs. These methods give us a range for the optimum value of k; the exact number of clusters k is then estimated using the Silhouette, Calinski-Harabasz, Dunn and Davies-Bouldin indices, and the value indicated by the majority of the indices is chosen as k. Finally, we select the bands closest to the cluster centroids computed by the K-means algorithm. Tanimoto similarity suggests 17 of the 220 bands, whereas the Euclidean metric suggests 15. The accuracy of the classified image using a support vector machine (SVM) classifier is 76.94% before band selection, and 75.21% and 75.56% after band selection for the Tanimoto and Euclidean metrics, respectively. © 2019, Springer Nature Singapore Pte Ltd.

Item Performance evaluation of dimensionality reduction techniques on high dimensional data (Institute of Electrical and Electronics Engineers Inc., 2019) Vikram, M.; Pavan, R.; Dineshbhai, N.D.; Mohan, B.R.
With the large amount of data generated each day, analysing and making inferences from data is becoming increasingly challenging. One of the major challenges is the curse of dimensionality, which is dealt with using several popular dimensionality reduction techniques such as ICA, PCA and NMF. In this work, we make a systematic evaluation of the efficiency and effectiveness of various dimensionality reduction techniques, benchmarked rigorously on real-world datasets. This work is intended to assist data science practitioners in selecting the most suitable dimensionality reduction technique based on the trade-off between effectiveness and efficiency.
©2019 IEEE.

Item Quality assessment of dimensionality reduction techniques on hyperspectral data: A neural network based approach (International Society for Photogrammetry and Remote Sensing, 2020) C, C.; Shetty, A.; Narasimhadhan, A.V.
Dimensionality reduction of hyperspectral images plays a vital role in remote sensing data analysis. Rapid advances in hyperspectral remote sensing have given researchers many opportunities to devise advanced algorithms that analyse such voluminous data to better explore earth surface features. Modern machine learning algorithms can be applied to explore the underlying structure of high-dimensional hyperspectral data and to reduce redundant information through feature extraction techniques. Limited studies have been carried out on dimensionality reduction for mineral exploration. The current study focuses on the application of autoencoders for dimensionality reduction and provides a qualitative (visual) analysis of the obtained representations. The performance of autoencoders is investigated on the Cuprite scene, with the co-ranking matrix used as the evaluation criterion. From the obtained results it is evident that deep autoencoders outperform single-layer autoencoders, and that an increase in the number of hidden layers provides a better embedding. Deep autoencoders provide a better transformation for neighbourhood sizes K ≥ 40, whereas single-layer autoencoders show an improved embedding only for K ≥ 80. © 2020 International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences - ISPRS Archives.

Item Clustering Enhanced Encoder–Decoder Approach to Dimensionality Reduction and Encryption (Springer Science and Business Media Deutschland GmbH, 2021) Mukesh, B.R.; Madhumitha, N.; Aditya, N.P.; Vivek, S.; Anand Kumar, M.
Dimensionality reduction refers to reducing the number of attributes under consideration by producing a set of principal variables.
It can be divided into feature selection and feature extraction. Dimensionality reduction addresses one of the preliminary challenges in storage management and is useful for effective transmission over the Internet. In this paper, we propose a deep learning approach using encoder–decoder networks for effective (almost lossless) compression and encryption. The neural network essentially encrypts data into an encoded format which can only be decrypted using the corresponding decoder. Clustering is essential to reduce the variation in the dataset and thereby avoid overfitting; using clustering resulted in a net gain of 1% over the standard encoder architecture across three MNIST datasets, with a compression ratio of 24.6:1. Image datasets are used for visualization only; the proposed pipeline can be applied to textual as well as visual data. © 2021, The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Item Predicting Survival of People with Heart Failure Using Oversampling, Feature Selections and Dimensionality Reduction (Institute of Electrical and Electronics Engineers Inc., 2022) Niharika, G.; Lekha, A.I.; Leela Akshaya, T.; Anand Kumar, A.M.
Cardiovascular diseases are deadly and kill millions of people around the world every year. Heart failure is one such unfortunate consequence, in which the heart is unable to pump enough blood for the body. Medical check-ups of these patients, with attributes including creatinine phosphokinase, ejection fraction, serum creatinine and serum sodium, can be used for analysis. In this paper, we analyse this clinical data and build machine learning models that predict the survival of a person with heart failure. We use various dimensionality reduction techniques to analyse the data with the aim of reducing the dimensions of the dataset.
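A minimal sketch of the kind of dimensionality-reduction step described in this entry, using PCA on synthetic stand-in data (the shapes below are illustrative assumptions, not the paper's actual dataset or pipeline):

```python
# Sketch: reduce a clinical feature table to a few principal components.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Hypothetical table: 299 patients x 12 clinical attributes
# (e.g. ejection fraction, serum creatinine, serum sodium, ...).
X = rng.random((299, 12))

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)  # (299, 2)
```

The reduced matrix would then feed a downstream classifier for survival prediction.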
Finally, we reduce overfitting using the Synthetic Minority Oversampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN). © 2022 IEEE.

Item Performance evaluation of dimensionality reduction techniques on hyperspectral data for mineral exploration (Springer Science and Business Media Deutschland GmbH, 2023) C, D.; Shetty, A.; Narasimhadhan, A.V.
With recent advances in hardware and a wide range of applications, hyperspectral remote sensing is a promising technology for analysing terrain. However, the sheer number of bands, strong inter-band correlation and redundant information make interpretation of hyperspectral data a tedious task. These issues can be addressed to a considerable extent by reducing the dimensionality of the data. Though a plethora of algorithms exist to downsize hyperspectral data, the quality assessment of these techniques remains an open question. Since dimensionality reduction (DR) is a special case of unsupervised learning, classification accuracy cannot be used directly to compare the performance of different DR techniques. Consequently, a different kind of goodness measure is needed, one that is easily interpretable, robust against outliers and applicable to most algorithms and datasets. In this paper, fifteen popular dimensionality reduction algorithms are reviewed, evaluated and compared on a hyperspectral dataset for mineral exploration. The performance of the DR algorithms is tested on hyperspectral mineral data, since extensive studies of DR for mineral mapping are scarce compared to land cover mapping. The DR techniques are evaluated using co-ranking criteria, which are independent of label information; this helps identify robust techniques for mineral mapping and provides meaningful insight into topology preservation.
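The label-free evaluation idea described in this entry can be sketched with scikit-learn's trustworthiness score, a rank-based neighbourhood-preservation measure related to (but simpler than) the full co-ranking matrix; the data below is synthetic and purely illustrative.

```python
# Sketch: score an embedding without labels via neighbourhood preservation.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness

rng = np.random.default_rng(0)
X = rng.random((200, 50))  # e.g. 200 pixels x 50 spectral bands

# Any DR technique could go here; PCA is used as a stand-in.
X_low = PCA(n_components=5).fit_transform(X)

# Score in [0, 1]; 1.0 means k-neighbourhoods are fully preserved.
score = trustworthiness(X, X_low, n_neighbors=10)
print(round(score, 4))
```

Because the score needs only the original data and its embedding, it can compare DR techniques on unlabelled hyperspectral scenes.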
These techniques play a vital role in mineral exploration, since in-field observation is expensive, time-consuming and labour-intensive. From the experimental results it is evident that deep autoencoders provide a better embedding than other existing nonlinear techniques, with a quality index value of 0.9938 at K = 120. The conclusions presented are unique, since previous studies have not evaluated the results qualitatively and comparisons between conventional machine learning and deep learning algorithms are limited. © 2023, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.

Item Crop stage classification of hyperspectral data using unsupervised techniques (2013) Senthilnath, J.; Omkar, S.N.; Mani, V.; Karnwal, N.; Shreyas, P.B.
The presence of a large number of spectral bands in hyperspectral images increases the capability to distinguish between various physical structures, but such images suffer from the high dimensionality of the data. Hence, the processing of hyperspectral images is carried out in two stages: dimensionality reduction and unsupervised classification. The high dimensionality of the data is reduced with the help of Principal Component Analysis (PCA), and the selected dimensions are classified using the Niche Hierarchical Artificial Immune System (NHAIS). NHAIS combines a splitting method, which searches for optimal cluster centres using a niching procedure, with a merging method, which groups data points based on majority voting. Results are presented for two hyperspectral images, an EO-1 Hyperion image and the Indian Pines image. A performance comparison of the proposed hierarchical clustering algorithm with three earlier unsupervised algorithms is presented; from the results obtained, we deduce that NHAIS is efficient.
© 2008-2012 IEEE.

Item Performance prediction of data streams on high-performance architecture (Springer Berlin Heidelberg, 2019) Gautam, B.; Annappa, A.
Worldwide, sensor streams are growing continuously in both volume and velocity. To cope with this growth, large stream data processing systems are moving from homogeneous to rack-scale architectures, which raises serious concerns for workload optimization, scheduling and resource management algorithms. Our proposed framework provides an architecture-independent performance prediction model to enable a resource-adaptive distributed stream data processing platform. It comprises seven pre-defined domains of dynamic data stream metrics, together with a self-driven model that fits these metrics using a ridge-regularized regression algorithm. Another significant contribution is a fully automated performance prediction model for distributed stream processing systems, inherited from state-of-the-art distributed data management systems, which uses Gaussian process regression and clusters metrics with the help of a dimensionality reduction algorithm. We implemented the framework on Apache Heron and evaluated it with a proposed benchmark suite comprising five domain-specific topologies. To assess the proposed methodologies, we deliberately injected tuple skew into the benchmark topologies to establish the ground truth for predictions, and found that the accuracy of predicting the performance of data streams increased from 66.36% to 80.62%, while the error was reduced from 37.14% to 16.06%. © 2019, The Author(s).

Item A metaheuristic framework based automated Spatial-Spectral graph for land cover classification from multispectral and hyperspectral satellite images (Elsevier B.V., 2020) Suresh, S.; Lal, S.
Land cover classification of satellite images has been a prominent research area in recent years.
The increasing amount of information acquired by satellite imaging systems creates a need for automatic classification tools. Satellite images exhibit spatial and/or temporal dependencies on which conventional machine learning algorithms fail to perform well. In this paper, we propose an improved framework for automated land cover classification using Spatial Spectral Schroedinger Eigenmaps (SSSE) optimized by the Cuckoo Search (CS) algorithm. A Support Vector Machine (SVM) is adopted for the final thematic map generation, following dimensionality reduction and clustering by the proposed approach. The novelty of the proposed framework is that the applicability of optimized SSSE to land cover classification of medium- and high-resolution multispectral satellite images is tested for the first time. The proposed method makes the land cover classification system fully automatic by optimizing the algorithm-specific, image-dependent parameter using the CS algorithm. Experiments are carried out on publicly available high- and medium-resolution multispectral satellite image datasets (Landsat 5 TM and IKONOS 2 MS) and hyperspectral satellite image datasets (Pavia University and Indian Pines) to assess the robustness of the proposed approach. Performance comparisons against state-of-the-art multispectral and hyperspectral land cover classification methods reveal the efficiency of the proposed method. © 2020 Elsevier B.V.
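Several entries above share a common pipeline shape: dimensionality reduction followed by SVM classification. A minimal end-to-end sketch of that pattern on synthetic stand-in data (PCA substitutes for the specific DR techniques named in the papers; the shapes and class count are illustrative assumptions):

```python
# Sketch: DR + SVM classification pipeline on synthetic data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.random((500, 100))        # 500 samples x 100 spectral features
y = rng.integers(0, 4, size=500)  # 4 hypothetical land-cover classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Reduce to 10 components, then classify with an RBF-kernel SVM.
clf = make_pipeline(PCA(n_components=10), SVC(kernel="rbf"))
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(f"test accuracy: {acc:.2f}")
```

On random data the accuracy is near chance; with real spectral features this is the structure used to produce figures like the 76.94%/75.21% accuracies reported above.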
