Browsing by Author "Patil, N."


A Deep Learning Framework for Plant Disease Detection
(Springer Science and Business Media Deutschland GmbH, 2025) Munda, K.K.; Patil, N.
As a major source of nutritious food, the agriculture industry supports economies and feeds people. Yet the production of food is severely hampered by plant diseases. Major crops like wheat (21.5%), rice (30.0%), maize (22.6%), potatoes (17.2%), and soybeans (21.4%) have significant annual output declines due to numerous diseases, according to recent studies. Since deep learning technologies have been developed, image categorization accuracy has increased dramatically. Using CNN and vision transformer models, we examine the Plant Village dataset in this study, which consists of 54,305 sample images that illustrate various plant disease species in 38 classifications. Focusing on potato leaves and a total of 2151 samples, we evaluate the model's performance in comparison to other models in terms of training and testing accuracy, and we obtained impressive results. The models' respective training accuracy is 97.27% for the CNN and 94.7% for the ViT model, while their validation accuracy is 100% and 94.27%. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
A fast and novel approach based on grouping and weighted mRMR for feature selection and classification of protein sequence data
(Inderscience Publishers, 2020) Kaur, K.; Patil, N.
The analysis of protein sequences under bioinformatics has gained wide importance in research area. Newly added protein sequences can be analysed using existing proteins and converting them into feature vector form. However, it emerges as a challenging task to deal with huge number of features obtained using sequence encoding techniques. Since all the features obtained are not actually required, a three-stage feature selection approach has been proposed. In the first stage, features are ranked and most irrelevant features are removed; in the second stage, conflicting features are grouped together; and in third stage, a fast approach based on weighted Minimum Redundancy Maximum Relevance (wMRMR) has been proposed and applied on grouped features. Different classification methods are used to analyse the performance of the proposed approach. It is observed that the proposed approach has increased classification accuracy results and reduced time consumption in comparison to the state-of-the-art methods. © 2020 Inderscience Enterprises Ltd.
A Hybrid Lead Scoring-BiGRU Model for Extractive Summarization of News Articles
(Springer, 2025) Rosamma, K.S.; Patil, N.
With the rapid expansion of digital content, efficient and accurate text summarization methods are essential to condense information effectively. Traditional extractive summarization approaches often fail to capture the most relevant sentences because they rely on simple heuristics. This research introduces a more advanced summarization model that integrates lead scoring with a Bidirectional Gated Recurrent Unit (BiGRU) network. The proposed Hybrid Lead Scoring BiGRU Model leverages the initial relevance of lead sentences while enhancing context comprehension through the BiGRU architecture. Despite advancements in text summarization, existing methods struggle to maintain contextual coherence and accurately identify key sentences. To address these challenges, our model combines the strengths of lead scoring, which selects the first five sentences of an article, with deep learning techniques. The BiGRU then processes these selections bi-directionally to capture dependencies from both past and future contexts, ultimately selecting the top three sentences for the summary. The model was evaluated using the CNN/Daily Mail dataset and showed promising results. During training, the model achieved the best validation loss of 0.4442, with an early stopping mechanism preventing overfitting. The test phase yielded a test loss of 0.4299, demonstrating good generalization performance. Additionally, selected generated summaries produced showed better performance with ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scores, achieving ROUGE-1, ROUGE-2, and ROUGE-L scores of 0.7143, 0.6000, and 0.5714, respectively. © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2025.
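The selection pipeline described in this abstract (keep the first five sentences, score them, keep the top three) can be sketched as below. This is only an illustration under stated assumptions: `score_fn` is a hypothetical stand-in for the trained BiGRU scorer, the function and parameter names are invented here, and returning the chosen sentences in their original order is an assumption the abstract does not confirm.

```python
def summarize(sentences, score_fn, lead_k=5, top_n=3):
    """Lead scoring followed by top-n selection.

    sentences: list of sentence strings in article order.
    score_fn:  hypothetical callable mapping a sentence to a relevance
               score (stands in for the BiGRU model, not reproduced here).
    """
    # Lead scoring: only the first lead_k sentences are candidates.
    candidates = list(enumerate(sentences[:lead_k]))
    # Rank candidates by score, highest first.
    ranked = sorted(candidates, key=lambda pair: score_fn(pair[1]), reverse=True)
    # Keep the top_n and restore article order (an assumption).
    chosen = sorted(ranked[:top_n], key=lambda pair: pair[0])
    return [sentence for _, sentence in chosen]
```

Any scorer with the same call shape can be passed in, for example a toy length-based scorer such as `len` for quick experiments.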
A Hybrid Weighted Loss Function for Enhanced Protein Interaction Site Prediction
(Springer Science and Business Media Deutschland GmbH, 2025) Bhat, P.; Patil, N.
Accurately predicting protein interaction sites is crucial for applications such as protein design, drug discovery, and functional protein analysis. However, a significant challenge in this task arises from the inherent class imbalance between interacting and non-interacting sites in protein datasets. While data augmentation techniques are commonly used to mitigate this imbalance, they often introduce noise, potentially reducing prediction accuracy. In this study, we present a novel approach to improve protein interaction site prediction by developing a customized loss function that combines focal loss and cost-sensitive loss, specifically designed to address class imbalance without relying on data augmentation. Our model, which integrates graph convolutional networks (GCNs) to process evolutionary and structural features of proteins, is evaluated using robust performance metrics suited for imbalanced data: Matthews Correlation Coefficient (MCC) and Area Under Precision-Recall Curve (AUPRC). We evaluate the proposed method on the Test_60 dataset, achieving an MCC of 0.342 and an AUPRC of 0.425, providing a modest improvement over the standard cross-entropy loss function. These findings highlight the effectiveness of our tailored loss function in handling class imbalance and improving prediction performance in protein interaction site prediction. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
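One way a loss that combines focal loss and cost-sensitive loss might look is sketched below. The abstract does not give the actual formulation, so everything specific here is an assumption: the convex mixing weight `alpha`, the focusing parameter `gamma`, the per-class costs `pos_cost` and `neg_cost`, and all function names are hypothetical.

```python
import math

def _prob_of_true_class(p, y):
    """Probability assigned to the true label, clipped to avoid log(0)."""
    p_t = p if y == 1 else 1.0 - p
    return min(max(p_t, 1e-12), 1.0)

def focal_loss(p, y, gamma=2.0):
    """Binary focal loss for one residue; p is the predicted P(interacting)."""
    p_t = _prob_of_true_class(p, y)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def cost_sensitive_loss(p, y, pos_cost=5.0, neg_cost=1.0):
    """Cross-entropy scaled by a class-dependent misclassification cost."""
    p_t = _prob_of_true_class(p, y)
    cost = pos_cost if y == 1 else neg_cost
    return -cost * math.log(p_t)

def hybrid_loss(probs, labels, alpha=0.5, gamma=2.0, pos_cost=5.0, neg_cost=1.0):
    """Mean of a convex mix of the two losses (the mix is an assumption)."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += alpha * focal_loss(p, y, gamma)
        total += (1.0 - alpha) * cost_sensitive_loss(p, y, pos_cost, neg_cost)
    return total / len(labels)
```

With these illustrative defaults, a confidently missed interacting site is penalized more heavily than a confidently missed non-interacting site, which is the behaviour one would want on imbalanced data.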
A multidimensional approach to blog mining
(Springer Verlag, 2018) Sandeep, K.S.; Patil, N.
Blogs are textual web documents published by bloggers to share their experience or opinion about a particular topic(s). These blogs are frequently retrieved by the readers who are in need of such information. Existing techniques for text mining and web document mining can be applied to blogs to ease the blog retrieval. But these existing techniques consider only the content of the blogs or tags associated with them for mining topics from these blogs. This paper proposes a Multidimensional Approach to Blog Mining which defines a method to combine the Blog Content and Blog Tags to obtain Blog Patterns. These Blog Patterns represent a blog better when compared to Blog Content Patterns or Blog Tag Patterns. These Blog Patterns can either be used for Blog Clustering or used by Blog Retrieval Engines to compare with user queries. The proposed approach has been implemented and evaluated on real-world blog data. © Springer Nature Singapore Pte Ltd. 2018.
A novel filter–wrapper hybrid greedy ensemble approach optimized using the genetic algorithm to reduce the dimensionality of high-dimensional biomedical datasets
(Elsevier Ltd, 2019) Gangavarapu, T.; Patil, N.
The predictive accuracy of high-dimensional biomedical datasets is often dwindled by many irrelevant and redundant molecular disease diagnosis features. Dimensionality reduction aims at finding a feature subspace that preserves the predictive accuracy while eliminating noise and curtailing the high computational cost of training. The applicability of a particular feature selection technique is heavily reliant on the ability of that technique to match the problem structure and to capture the inherent patterns in the data. In this paper, we propose a novel filter–wrapper hybrid ensemble feature selection approach based on the weighted occurrence frequency and the penalty scheme, to obtain the most discriminative and instructive feature subspace. The proposed approach engenders an optimal feature subspace by greedily combining the feature subspaces obtained from various predetermined base feature selection techniques. Furthermore, the base feature subspaces are penalized based on specific performance dependent penalty parameters. We leverage effective heuristic search strategies including the greedy parameter-wise optimization and the Genetic Algorithm (GA) to optimize the subspace ensembling process. The effectiveness, robustness, and flexibility of the proposed hybrid greedy ensemble approach in comparison with the base feature selection techniques, and prolific filter and state-of-the-art wrapper methods are justified by empirical analysis on three distinct high-dimensional biomedical datasets. Experimental validation revealed that the proposed greedy approach, when optimized using GA, outperformed the selected base feature selection techniques by 4.17%–15.14% in terms of the prediction accuracy. © 2019 Elsevier B.V.
A novel semi-supervised approach for protein sequence classification
(Institute of Electrical and Electronics Engineers Inc., 2015) Chaturvedi, B.; Patil, N.
Bioinformatics is an emerging research area. Classification of protein sequence datasets is the biggest challenge for researchers. This paper deals with supervised and semi-supervised classification of human protein sequences. Amino acid composition (AAC) is used for feature extraction from the protein sequences. Classification techniques such as Support Vector Machine (SVM), Naive Bayes, K-Nearest Neighbour (KNN), Random Forest, and Decision Tree are used to classify the protein sequence dataset. Among these classifiers, SVM reported the best result with the highest accuracy. The limitation of SVM is that it works only with supervised (labeled) datasets. It does not work with unsupervised or semi-supervised datasets (unlabeled datasets, or a large amount of unlabeled data alongside a small amount of labeled data). A novel semi-supervised support vector machine (SSVM) classifier is proposed which works with a combination of labeled and unlabeled data. The results show that the proposed approach gives higher accuracy on the semi-supervised dataset. Principal component analysis (PCA) is used for feature reduction of the protein sequences. The proposed semi-supervised support vector machine (SSVM) using PCA gives increased accuracy of about 5 to 10%. © 2015 IEEE.
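Amino acid composition, the feature extraction step named in this abstract, is commonly defined as the fraction of each of the 20 standard residues in a sequence. A minimal sketch under that standard definition follows; the function name, the alphabetical residue ordering, and the choice to ignore non-standard symbols are assumptions made here, not details taken from the paper.

```python
# The 20 standard amino acids in one-letter code, alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of residue frequencies.

    Symbols outside the 20 standard residues (e.g. 'X') are ignored,
    which is an assumption of this sketch.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    valid = 0
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
            valid += 1
    if valid == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / valid for aa in AMINO_ACIDS]
```

The resulting fixed-length vector can be fed to any of the classifiers listed in the abstract regardless of sequence length.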
A novel technique of feature selection with relieff and CFS for protein sequence classification
(Springer Verlag, 2019) Kaur, K.; Patil, N.
Bioinformatics has gained wide importance in research area for the last few decades. The main aim is to store the biological data and analyze it for better understanding. To predict the functions of newly added protein sequences, the classification of existing protein sequence is of great use. The rate at which protein sequence data is getting accumulated is increasing exponentially. So, it emerges as a very challenging task for the researcher, to deal with large number of features obtained by the use of various encoding techniques. Here, a two-stage algorithm is proposed for feature selection that combines ReliefF and CFS technique that takes extracted features as input and provides us with the discriminative set of features. The n-gram sequence encoding technique has been used to extract the feature vector from the protein sequences. In the first stage, ReliefF approach is used to rank the features and obtain candidate feature set. In the second stage, CFS is applied on this candidate feature set to obtain features that have high correlation with the class but less correlation with other features. The classification methods like Naive-Bayes, decision tree, and k-nearest neighbor can be used to analyze the performance of proposed approach. It is observed that this approach has increased accuracy of classification methods in comparison to existing methods. © Springer Nature Singapore Pte Ltd. 2019
A pragmatics-oriented high utility mining for itemsets of size two for boosting business yields
(Springer Verlag, 2018) Gahlot, G.; Patil, N.
Retail market has paced with an enormous rate, sprawling its effect over the nations. The B2C companies have been putting lucrative offers and schemes to fetch the customers' attractions in the awe of upbringing the business profits, but with the mindless notion of the same. Knowledge discovery in the field of data mining can be well harnessed to achieve the profit benefits. This article proposes the novel way for determining the items to be given on sale, with the logical clubs, thus extending the Apriori algorithm. The dissertation proposes the high-utility mining for itemsets of size two (HUM-IS2) Algorithm using the transactional logs of the superstores. The pruning strategies have been introduced to remove unnecessary formations of the clubs. The essence of the algorithm has been proved by experimenting with various datasets. © Springer Nature Singapore Pte Ltd. 2018.
A semantic approach to classifying Twitter users
(Springer Verlag, 2018) Joseph, R.J.; Narendra, P.; Shetty, J.; Patil, N.
Social media has grown rapidly in the past several years. Twitter in particular has seen a significant rise in its user audience because of the short and compact Tweet concept (140 characters). As more users come on board, it provides a large market for companies to advertise and find prospective customers by classifying users into different market categories. Traditional classification methods use TF–IDF and bag of words concept as the feature vector which inevitably is of large dimensions. In this paper we propose a method to improve the method of classification using semantic information to reduce dimensions of the feature vectors and validate this method by feeding them into multiple learning algorithms and evaluating the results. © Springer Nature Singapore Pte Ltd. 2018.
An effective feature extraction with deep neural network architecture for protein-secondary-structure prediction
(Springer, 2021) Jayasimha, A.; Mudambi, R.; Pavan, P.; Lokaksha, B.M.; Bankapur, S.; Patil, N.
With the increased importance of proteins in day-to-day life, it is imperative to know the protein functions. Deciphering protein structure elucidates protein functions. Experimental approaches for protein-structure analysis are expensive and time-consuming, and require high dexterity. Thus, finding a viable computational approach is vital. Due to the high complexity of predicting protein structure (tertiary structure) directly, research in this field aims at the protein-secondary-structure prediction which is directly related to its tertiary structure. This research aims at exploring a plethora of features, namely position-specific scoring matrices, hidden Markov model alignment matrices, and physicochemical properties, that carry rich information required to predict the secondary structure. Furthermore, it aims at exploring a suitable combination of the features which could capture diverse information about the protein secondary structure. Finally, a cascaded convolutional neural network and bidirectional long short-term memory architecture is fit on the models, and two evaluation metrics, namely, Q8 score and segment overlap score, are benchmarked on various datasets. Our proposed model trained on data of CB6133 dataset and tested on CB513 dataset beats the benchmark models by a minimum of 2.9%. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature.
An Effective Multi-Label Protein Sub-Chloroplast Localization Prediction by Skipped-Grams of Evolutionary Profiles Using Deep Neural Network
(Institute of Electrical and Electronics Engineers Inc., 2022) Bankapur, S.; Patil, N.
Chloroplast is one of the most classic organelles in algae and plant cells. Identifying the locations of chloroplast proteins in the chloroplast organelle is an important as well as a challenging task in deciphering their functions. Biological-based experiments to identify the Protein Sub-Chloroplast Localization (PSCL) is time-consuming and cost-intensive. Over the last decade, a few computational methods have been developed to predict PSCL in which earlier works assumed to predict only single-location; whereas, recent works are able to predict multiple-locations of chloroplast organelle. However, the performances of all the state-of-the-art predictors are poor. This article proposes a novel skip-gram technique to extract highly discriminating patterns from evolutionary profiles and a multi-label deep neural network to predict the PSCL. The proposed model is assessed on two publicly available datasets, i.e., Benchmark and Novel. Experimental results demonstrate that the proposed work outperforms significantly when compared to the state-of-the-art multi-label PSCL predictors. A multi-label prediction accuracy (i.e., Overall Actual Accuracy) of the proposed model is enhanced by an absolute minimum margin of 6.7 percent on Benchmark dataset and 7.9 percent on Novel dataset when compared to the best PSCL predictor from the literature. Further, result of statistical t-test concludes that the performance of the proposed work is significantly improved and thus, the proposed work is an effective computational model to solve multi-label PSCL prediction. The proposed prediction model is hosted on web-server and available at https://nitkit-vgst727-nppsa.nitk.ac.in/deeplocpred/. © 2004-2012 IEEE.
An efficient colossal closed itemset mining algorithm for a dataset with high dimensionality
(King Saud bin Abdulaziz University, 2022) Vanahalli, M.K.; Patil, N.
The growing research interest in bioinformatics and the ample amount of available data across different domains have paved the way for the generation of datasets with high dimensionality. In such datasets the number of features is very high and the number of rows is small. The significance of Frequent Colossal Closed Itemsets (FCCI) is high for diverse applications, including the field of bioinformatics. FCCI are very prominent in the process of decision making. The amount of information that can be extracted from a dataset with high dimensionality is huge, and this extraction is a non-trivial task. The state-of-the-art algorithms do not prune all the inadmissible features and rows. The proposed work articulates the pruning of all the inadmissible features and rows, an efficient pruning strategy to snip the row enumeration mining search space, and a closure method for checking the closeness of the rowset. An efficient row enumeration algorithm enclosing the rowset closure checking method and pruning strategy is designed to efficiently mine the complete set of FCCI. The experimental results demonstrate the effectiveness of pruning all the inadmissible features and rows. © 2020 The Authors
An efficient dynamic switching algorithm for mining colossal closed itemsets from high dimensional datasets
(Elsevier B.V., 2019) Vanahalli, M.K.; Patil, N.
The abundant data across a variety of domains including bioinformatics has led to the formation of dataset with high dimensionality. The conventional algorithms expend most of their time in mining a large number of small and mid-sized itemsets which does not enclose complete and valuable information for decision making. The recent research is focused on Frequent Colossal Closed Itemsets (FCCI), which plays a significant role in decision making for many applications, especially in the field of bioinformatics. The state-of-the-art algorithms in mining FCCI from datasets consisting of a large number of rows and a large number of features are computationally expensive, as they are either pure row or feature enumeration based algorithms. Moreover, the existing preprocessing techniques fail to prune the complete set of irrelevant features and irrelevant rows. The proposed work emphasizes an Effective Improvised Preprocessing (EIP) technique to prune the complete set of irrelevant features and irrelevant rows, and a novel efficient Dynamic Switching Frequent Colossal Closed Itemset Mining (DSFCCIM) algorithm. The proposed DSFCCIM algorithm efficiently switches between row and feature enumeration methods based on data characteristics during the mining process. Further, the DSFCCIM algorithm is integrated with a novel Rowset Cardinality Table, Itemset Support Table, two efficient methods to check the closeness of rowset and itemset, and two efficient pruning strategies to cut down the search space. The proposed DSFCCIM algorithm is the first dynamic switching algorithm to mine FCCI from datasets consisting of a large number of rows and a large number of features. The performance study shows the improved effectiveness of the proposed EIP technique over the existing preprocessing techniques and the improved efficiency of the proposed DSFCCIM algorithm over the existing algorithms. © 2019 Elsevier B.V.
An efficient parallel row enumerated algorithm for mining frequent colossal closed itemsets from high dimensional datasets
(Elsevier Inc., 2019) Vanahalli, M.K.; Patil, N.
Mining colossal itemsets from high dimensional datasets has gained focus in recent times. The conventional algorithms expend most of the time in mining small and mid-sized itemsets, which do not enclose valuable and complete information for decision making. Mining Frequent Colossal Closed Itemsets (FCCI) from a high dimensional dataset plays a highly significant role in decision making for many applications, especially in the field of bioinformatics. To mine FCCI from a high dimensional dataset, the existing preprocessing techniques fail to prune the complete set of irrelevant features and irrelevant rows. Besides, the state-of-the-art algorithms for the same are sequential and computationally expensive. The proposed work highlights an Effective Improved Parallel Preprocessing (EIPP) technique to prune the complete set of irrelevant features and irrelevant rows from a high dimensional dataset and a novel efficient Parallel Frequent Colossal Closed Itemset Mining (PFCCIM) algorithm. Further, the PFCCIM algorithm is integrated with a novel Rowset Cardinality Table (RCT), an efficient method to check the closeness of a rowset and also an efficient pruning strategy to cut down the mining search space. The proposed PFCCIM algorithm is the first parallel algorithm to mine FCCI from a high dimensional dataset. The performance study shows the improved effectiveness of the proposed EIPP technique over the existing preprocessing techniques and the improved efficiency of the proposed PFCCIM algorithm over the existing algorithms. © 2018 Elsevier Inc.
An Efficient Rainfall Prediction Model Using Deep Learning Method
(Institute of Electrical and Electronics Engineers Inc., 2023) Verma, V.K.; Janagama, H.S.; Patil, N.
Rainfall is a crucial aspect of the Earth's natural cycle and it is necessary for various activities such as agriculture, water supply and hydroelectric power generation. However, excessive rainfall can lead to floods, landslides and other destructive consequences, while insufficient rainfall can cause droughts and water shortages. Therefore, accurate estimation of rainfall is essential to manage and mitigate the impacts of rainfall. In this study, the dataset is collected from the NASA POWER database [22] to predict the annual rainfall in Mangalore (Karnataka), India. The data is collected from January 1, 2003 to February 04, 2023 using the NASA POWER API. The study used four models, MLP [15], LSTM, BiLSTM, and CNN, to predict the daily average precipitation that contributes to the annual rainfall. The input parameters considered for the prediction are maximum monthly temperature, minimum monthly temperature, humidity, atmospheric pressure and wind speed [9]. The models' performance is measured using mean squared error (MSE) and mean absolute error (MAE) of the predicted values with a training and testing ratio of 80:20. The CNN (Convolutional Neural Network) model outperforms the others, with an MSE of 0.0041 and an MAE of 0.0456. © 2023 IEEE.
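The two error metrics and the 80:20 split mentioned in this abstract can be sketched as follows. MSE and MAE follow their standard definitions; the helper names are invented for illustration, and splitting the series chronologically (first 80% for training) is an assumption, since the abstract does not say how the split was drawn.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def chronological_split(series, train_ratio=0.8):
    """Split a time-ordered list into (train, test) without shuffling.

    Keeping time order is an assumption of this sketch.
    """
    cut = int(len(series) * train_ratio)
    return series[:cut], series[cut:]
```

Because MSE squares each error, it penalizes occasional large misses (for example an unpredicted heavy-rain day) more strongly than MAE does, which is why the two are often reported together.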
An Enhanced Protein Fold Recognition for Low Similarity Datasets Using Convolutional and Skip-Gram Features with Deep Neural Network
(Institute of Electrical and Electronics Engineers Inc., 2021) Bankapur, S.; Patil, N.
The protein fold recognition is one of the important tasks of structural biology, which helps in addressing further challenges like predicting the protein tertiary structures and its functions. Many machine learning works are published to identify the protein folds effectively. However, very few works have reported the fold recognition accuracy above 80% on benchmark datasets. In this study, an effective set of global and local features are extracted from the proposed Convolutional (Conv) and SkipXGram bi-gram (SXGbg) techniques, and the fold recognition is performed using the proposed deep neural network. The performance of the proposed model reported 91.4% fold accuracy on one of the derived low similarity (< 25%) datasets of latest extended version of SCOPe_2.07. The proposed model is further evaluated on three popular and publicly available benchmark datasets such as DD, EDD, and TG and obtained 85.9%, 95.8%, and 88.8% fold accuracies, respectively. This work is first to report fold recognition accuracy above 85% on all the benchmark datasets. The performance of the proposed model has outperformed the best state-of-the-art models by 5% to 23% on DD, 2% to 19% on EDD, and 3% to 30% on TG dataset. © 2002-2011 IEEE.
An enhanced protein secondary structure prediction using deep learning framework on hybrid profile based features
(Elsevier Ltd, 2020) Kumar, P.; Bankapur, S.; Patil, N.
Accurate protein secondary structure prediction (PSSP) is essential to identify structural classes, protein folds, and its tertiary structure. To identify the secondary structure, experimental methods exhibit higher precision with the trade-off of high cost and time. In this study, we propose an effective prediction model which consists of hybrid features of 42-dimensions with the combination of convolutional neural network (CNN) and bidirectional recurrent neural network (BRNN). The proposed model is assessed on four benchmark datasets, namely CB6133, CB513, CASP10, and CASP11, using Q3, Q8, and segment overlap (Sov) metrics. The proposed model reported Q3 accuracy of 85.4%, 85.4%, 83.7%, and 81.5%, and Q8 accuracy of 75.8%, 73.5%, 72.2%, and 70% on the CB6133, CB513, CASP10, and CASP11 datasets respectively. The results of the proposed model are improved by a minimum factor of 2.5% and 2.1% in Q3 and Q8 accuracy respectively, as compared to the popular existing models on CB513 dataset. Further, the quality of the Q3 results is validated by structural class prediction and compared with PSI-PRED. The experiment showed that the quality of the Q3 results of the proposed model is higher than that of PSI-PRED. © 2019 Elsevier B.V.
An exhaustive review of computational prediction techniques for PPI sites, protein locations, and protein functions
(Springer, 2023) Bhat, P.; Patil, N.
The field of proteomics encompasses a comprehensive examination of proteins, encompassing their structural properties, interactions with other biomolecules, subcellular localization, functional roles, interaction sites, regions of disorder, and exploring novel protein designs. Each of these domains interlinks, contributing valuable information to the study of each other part. Extensive research in most of these areas has given rise to many more challenges that require further exploration. This review mainly concentrates on prediction approaches for protein–protein interaction sites, protein subcellular locations, and protein functions. We provide an exhaustive collection of several latest works in the above three domains, along with a digest of their descriptions in the most recent times. We conclude the review by highlighting the existing challenges and emphasizing the need for a deeper exploration of the research gaps in these studies. © 2023, The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature.