Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506
Search Results (6 items)
Item: Novel hybrid feature selection models for unsupervised document categorization (Institute of Electrical and Electronics Engineers Inc., 2017) Bhopale, A.P.; Kamath S., S.

Dealing with high-dimensional data is a challenging and computationally complex task in the data pre-processing phase of text clustering. Conventionally, union and intersection approaches have been used to combine the results of different feature selection methods to optimize the relevant feature space for a document collection. The union method selects all features from the considered sub-models, whereas the intersection method selects only the features common to all sub-models. In practice, however, either type of combination can discard potentially important features. In this paper, a hybrid feature selection model called Modified Hybrid Union (MHU) is proposed, which selects features by considering the individual strengths and weaknesses of each constituent component of the model. A comparative evaluation of its performance for K-means clustering and bio-inspired Flock-based clustering is also presented on standard data sets such as OWL-S TC and Reuters-21578. © 2017 IEEE.

Item: Optimal Selection of Bands for Hyperspectral Images Using Spectral Clustering (Springer Verlag, 2019) Gupta, V.; Gupta, S.K.; Shukla, D.P.

The high spectral resolution of hyperspectral images comes hand in hand with high data redundancy (i.e. multiple bands carrying similar information), which further contributes to high computational cost, complexity and data storage. Hence, in this work, we aim at performing dimensionality reduction by selecting non-redundant bands from the Indian Pines hyperspectral image using spectral clustering. We represent the dataset in the form of similarity graphs computed from metrics such as Euclidean distance and Tanimoto similarity using the k-nearest-neighbour method.
The optimum k for our dataset is identified using methods such as the Distribution Compactness (DC) algorithm, the elbow plot, histograms and visual inspection of the similarity graphs. These methods give us a range for the optimum value of k; the exact number of clusters k is then estimated using the Silhouette, Calinski-Harabasz, Dunn's and Davies-Bouldin indices, and the value indicated by the majority of the indices is chosen as k. Finally, we select the bands closest to the centroids of the clusters computed by the K-means algorithm. Tanimoto similarity suggests 17 of the 220 bands, whereas the Euclidean metric suggests 15 bands. The accuracy of the image classified with a support vector machine (SVM) classifier is 76.94% before band selection, and 75.21% and 75.56% after band selection for the Tanimoto and Euclidean metrics respectively. © 2019, Springer Nature Singapore Pte Ltd.

Item: Performance evaluation of dimensionality reduction techniques on high dimensional data (Institute of Electrical and Electronics Engineers Inc., 2019) Vikram, M.; Pavan, R.; Dineshbhai, N.D.; Mohan, B.R.

With a large amount of data being generated each day, analyzing and drawing inferences from data is becoming an increasingly challenging task. One of the major challenges is the curse of dimensionality, which is dealt with by using several popular dimensionality reduction techniques such as ICA, PCA and NMF. In this work, we make a systematic performance evaluation of the efficiency and effectiveness of various dimensionality reduction techniques, benchmarked on real-world datasets. This work is intended to assist data science practitioners in selecting the most suitable dimensionality reduction technique based on the trade-off between effectiveness and efficiency.
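As an illustrative aside (not the benchmark from the abstract above), the effectiveness-versus-efficiency trade-off of a dimensionality reduction technique can be measured as reconstruction error against wall-clock time. A minimal sketch with PCA implemented via SVD in plain NumPy, on synthetic data:

```python
import time
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))          # toy high-dimensional dataset
X = X - X.mean(axis=0)                  # centre the data before PCA

def pca_reduce(X, k):
    """Project X onto its top-k principal components via SVD."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    Z = X @ Vt[:k].T                    # reduced representation (n x k)
    X_hat = Z @ Vt[:k]                  # reconstruction back to n x d
    return Z, X_hat

start = time.perf_counter()
Z, X_hat = pca_reduce(X, k=10)
elapsed = time.perf_counter() - start

# Effectiveness: relative reconstruction error; efficiency: wall-clock time.
err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(f"shape {X.shape} -> {Z.shape}, rel. error {err:.3f}, time {elapsed:.4f}s")
```

The same error/time measurement can be repeated for other techniques (e.g. ICA or NMF) to populate a comparison table.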
© 2019 IEEE.

Item: Quality assessment of dimensionality reduction techniques on hyperspectral data: A neural network based approach (International Society for Photogrammetry and Remote Sensing, 2020) C, C.; Shetty, A.; Narasimhadhan, A.V.

Dimensionality reduction of hyperspectral images plays a vital role in remote sensing data analysis. The rapid advances in hyperspectral remote sensing have opened many opportunities for researchers to devise advanced algorithms for analysing such voluminous data and better exploring earth-surface features. Modern machine learning algorithms can be applied to explore the underlying structure of high-dimensional hyperspectral data and reduce redundant information through feature extraction techniques. Limited studies have been carried out on dimensionality reduction for mineral exploration. The current study focuses on the application of autoencoders for dimensionality reduction and provides a qualitative (visual) analysis of the obtained representations. The performance of autoencoders is investigated on the Cuprite scene, with the co-ranking matrix used as the evaluation criterion. The obtained results show that deep autoencoders provide better results than single-layer autoencoders, and that increasing the number of hidden layers yields a better embedding: deep autoencoders provide a better transformation for neighbourhood sizes K ≥ 40, whereas single-layer autoencoders show an improved embedding only for K ≥ 80. © 2020 International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences - ISPRS Archives.

Item: Clustering Enhanced Encoder–Decoder Approach to Dimensionality Reduction and Encryption (Springer Science and Business Media Deutschland GmbH, 2021) Mukesh, B.R.; Madhumitha, N.; Aditya, N.P.; Vivek, S.; Anand Kumar, M.

Dimensionality reduction refers to reducing the number of attributes being considered by producing a set of principal variables.
It can be divided into feature selection and feature extraction. Dimensionality reduction addresses one of the preliminary challenges in storage management and is useful for effective transmission over the Internet. In this paper, we propose a deep learning approach using encoder–decoder networks for effective (almost lossless) compression and encryption. The neural network essentially encrypts data into an encoded format which can only be decrypted using the corresponding decoder. Clustering is essential to reduce the variation in the dataset so that the network can overfit it. Using clustering resulted in a net gain of 1% over the standard encoder architecture across three MNIST datasets, and the compression ratio achieved was 24.6:1. Image datasets are used for visualization only; the proposed pipeline can be applied to textual as well as visual data. © 2021, The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Item: Predicting Survival of People with Heart Failure Using Oversampling, Feature Selections and Dimensionality Reduction (Institute of Electrical and Electronics Engineers Inc., 2022) Niharika, G.; Lekha, A.I.; Leela Akshaya, T.; Anand Kumar, A.M.

Cardiovascular diseases are deadly, killing millions of people around the world every year. Heart failure is one of their unfortunate consequences, in which the heart is unable to pump enough blood for the body. Medical checkups of these patients, with attributes including creatinine phosphokinase, ejection fraction, serum creatinine and serum sodium, can be used for analysis. In this paper, we have analysed this clinical data and built machine learning models that can predict the survival of a person with heart failure. We have used various dimensionality reduction techniques to analyse the data with the aim of reducing the dimensions of the dataset.
Finally, we reduced the overfitting of the data using the Synthetic Minority Oversampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN). © 2022 IEEE.
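For reference, the core idea behind the SMOTE-style oversampling mentioned in the last item is interpolation between minority-class samples. A minimal NumPy sketch follows; `smote_like` is a hypothetical helper written for illustration, not the authors' implementation (which presumably used a library such as imbalanced-learn):

```python
import numpy as np

def smote_like(X_min, n_new, k=5, rng=None):
    """Generate synthetic minority-class samples by interpolating each chosen
    sample toward one of its k nearest minority-class neighbours (SMOTE idea)."""
    rng = rng or np.random.default_rng(0)
    n = len(X_min)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)             # exclude each point from its own neighbours
    nn = np.argsort(d, axis=1)[:, :k]       # indices of k nearest neighbours
    out = []
    for _ in range(n_new):
        i = rng.integers(n)                 # pick a random minority sample
        j = nn[i, rng.integers(k)]          # pick one of its neighbours
        gap = rng.random()                  # interpolation factor in [0, 1)
        out.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(out)

X_min = np.random.default_rng(1).normal(size=(20, 4))  # toy minority class
X_syn = smote_like(X_min, n_new=30)
print(X_syn.shape)                          # (30, 4)
```

ADASYN follows the same interpolation scheme but adaptively generates more samples for minority points that are harder to learn (i.e. surrounded by majority-class neighbours).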
