Faculty Publications
Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736
Publications by NITK Faculty
10 results
Search Results
Item Efficient and Effective Multiple Protein Sequence Alignment Model Using Dynamic Progressive Approach with Novel Look Back Ahead Scoring System (Springer Verlag, 2017) Bankapur, S.; Patil, N.
Multiple protein sequence alignment is an elementary hurdle on the way to further challenges such as prediction of protein structure and function, protein sub-cellular localization, and drug discovery. Over the last three decades, numerous models have been proposed to address this challenge; however, they are either computationally complex or ineffective in terms of alignment quality. In this paper, a computationally efficient and effective model is proposed to solve multiple protein sequence alignment. The proposed model follows a dynamic progressive global alignment approach in which a sequence pair is merged dynamically based on a novel scoring system named Look Back Ahead (LBA). The model's results were validated against aligned reference results on benchmark datasets (PREFAB4refm and SABrem) using four metrics: Sum-of-Pairs (SP), Total Gap Penalty (TGP), Column Score (CS) and Total Mutation Count Pair-wise (TMCP). Experimental results demonstrate that the proposed method outperforms the benchmark reference results in any three evaluation metrics by 77.46% and 68.65% for the PREFAB4refm and SABrem datasets, respectively.
© 2017, Springer International Publishing AG.

Item Position-residue specific dynamic gap penalty scoring strategy for multiple sequence alignment (Association for Computing Machinery, 2017) Bankapur, S.; Patil, N.
Multiple Sequence Alignment (MSA) is a basic tool for biological sequence analysis and a crucial step used by biologists in analyzing phylogenetics, gene regulation, and homology markers, in drug discovery, and in predicting protein structure and function. Effective alignment of multiple sequences with biological relevance is still an open problem.
Accuracy of MSA is highly dependent on the scoring function, which aligns a given residue to its appropriate position during alignment. The scoring function has three possible cases when scoring a pair of residues: (i) a residue with the same residue, (ii) a residue with a different residue, and (iii) a residue with a gap. A number of biologically meaningful approaches have been developed for the first two cases. For the third case, however, most approaches use a default gap penalty score provided as input by an expert. In this study, we propose a new, biologically relevant, position-residue specific dynamic scoring approach for the gap penalty. The Position-Residue Specific Dynamic Gap Penalty (PRSDGP) scoring function is tested on the BAliBASE benchmark dataset. The proposed PRSDGP scoring approach is compared with the CLUSTAL O program, and the improvement in the Quality metric ranges from 46.2% to 81.5%.
© 2017 Association for Computing Machinery.

Item Protein secondary structural class prediction using effective feature modeling and machine learning techniques (Institute of Electrical and Electronics Engineers Inc., 2018) Bankapur, S.; Patil, N.
Protein Secondary Structural Class (PSSC) prediction is an important step towards determining a protein's folds, tertiary structure and functions, which in turn have potential applications in drug discovery. Various computational methods have been developed to predict the PSSC; however, predicting PSSC on the basis of protein sequences is still a challenging task. In this study, we propose an effective approach to extract features using two techniques: (i) SkipXGram bi-gram, in which skipped bi-gram features are extracted, and (ii) character embedded features, in which features are extracted using a word embedding approach. The combined feature sets from the proposed feature modeling approach are explored using various machine learning classifiers. The best performing classifier (i.e.
Random Forest) is benchmarked against state-of-the-art PSSC prediction models. The proposed model was assessed on two low sequence similarity benchmark datasets, i.e. 25PDB and FC699. The performance analysis demonstrates that the proposed model consistently outperformed state-of-the-art models by 3% to 23% and 4% to 6% for the 25PDB and FC699 datasets, respectively.
© 2018 IEEE.

Item Analysis and Prediction of Fantasy Cricket Contest Winners Using Machine Learning Techniques (Springer Science and Business Media Deutschland GmbH, 2021) Karthik, K.; S. Krishnan, G.S.; Shetty, S.; Bankapur, S.; Kolkar, R.; Ashwin, T.S.; Vanahalli, M.K.
Cricket is one of the best-known sports in the world. Growing interest in cricket in recent years has led to newer formats such as T20 and T10, alongside the Test and one-day formats. The popularity of all these formats has now carried over into online fantasy cricket league games. Dream11 is the most popular app of this kind, along with many similar apps. Creating a dream team of 11 players from the playing elevens of both teams involves skill, ideas and luck. Predicting a winner among all the joined contestants based on historical data is a challenging task. In this paper, we used a feed-forward deep neural network (DNN) classifier to predict the winning contestants for the top three positions in a fantasy league cricket contest. The performance of the DNN approach was compared against that of state-of-the-art machine learning approaches such as k-nearest neighbours (KNN), logistic regression (LR), Naive Bayes (NB), random forest (RF), and support vector machines (SVM) in predicting the fantasy cricket contest winners.
Among the methods used, DNN showed the best results for all three positions, demonstrating its consistency in predicting the winners, and outperformed the state-of-the-art machine learning classifiers by 13%, 8% and 9% for the first, second and third winning positions, respectively.
© 2021, The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Item An enhanced protein secondary structure prediction using deep learning framework on hybrid profile based features (Elsevier Ltd, 2020) Kumar, P.; Bankapur, S.; Patil, N.
Accurate protein secondary structure prediction (PSSP) is essential for identifying structural classes, protein folds, and tertiary structure. Experimental methods for identifying the secondary structure offer higher precision, but at the cost of considerable expense and time. In this study, we propose an effective prediction model that combines 42-dimensional hybrid features with a convolutional neural network (CNN) and a bidirectional recurrent neural network (BRNN). The proposed model is assessed on four benchmark datasets, namely CB6133, CB513, CASP10, and CASP11, using the Q3, Q8, and segment overlap (Sov) metrics. The proposed model reported Q3 accuracy of 85.4%, 85.4%, 83.7%, and 81.5%, and Q8 accuracy of 75.8%, 73.5%, 72.2%, and 70% on the CB6133, CB513, CASP10, and CASP11 datasets, respectively. Compared with popular existing models on the CB513 dataset, the proposed model improves Q3 and Q8 accuracy by at least 2.5% and 2.1%, respectively. Further, the quality of the Q3 results is validated by structural class prediction and compared with PSI-PRED. The experiment showed that the quality of the Q3 results of the proposed model is higher than that of PSI-PRED.
© 2019 Elsevier B.V.

Item ProgSIO-MSA: Progressive-based single iterative optimization framework for multiple sequence alignment using an effective scoring system (World Scientific Publishing Co.
Pte Ltd, 2020) Bankapur, S.; Patil, N.
Aligning more than two biological sequences is termed multiple sequence alignment (MSA). MSA is one of the primary activities in analyzing biological sequences, with potential applications in phylogenetics, homology markers, protein structure prediction, gene regulation, and drug discovery. The MSA problem is considered NP-complete. Moreover, with the advancement of Next-Generation Sequencing techniques, gene and protein databases are continually loaded with vast amounts of raw sequence data that are neither analyzed nor annotated. To analyze these growing volumes of raw sequences, there is a strong need for computationally efficient (polynomial-time) models that produce accurate alignments. In this study, a progressive-based alignment model named ProgSIO-MSA is proposed, which consists of an effective scoring system and an optimization framework. The proposed scoring system aligns sequences effectively by combining two scoring strategies: Look Back Ahead, which scores a residue pair dynamically based on the status information of the previous position to improve the sum-of-pairs score, and Position-Residue-Specific Dynamic Gap Penalty, which dynamically penalizes a gap using a mutation matrix on the basis of the residue and its position information. The proposed single iterative optimization (SIO) framework identifies and optimizes local optima traps to improve the alignment quality. The proposed model is evaluated against progressive-based state-of-the-art models on two benchmark datasets, i.e. BAliBASE and SABmark. The alignment quality (biological accuracy) of the proposed model improved by 17.7% on the BAliBASE dataset. The proposed model's efficiency is compared with state-of-the-art models using time complexity as well as runtime analysis.
The results of the Wilcoxon signed-rank test indicate that the alignment quality of the proposed model is significantly better than that of progressive-based state-of-the-art models.
© 2020 World Scientific Publishing Europe Ltd.

Item An Enhanced Protein Fold Recognition for Low Similarity Datasets Using Convolutional and Skip-Gram Features with Deep Neural Network (Institute of Electrical and Electronics Engineers Inc., 2021) Bankapur, S.; Patil, N.
Protein fold recognition is one of the important tasks of structural biology, which helps in addressing further challenges such as predicting protein tertiary structures and functions. Many machine learning works have been published to identify protein folds effectively. However, very few have reported fold recognition accuracy above 80% on benchmark datasets. In this study, an effective set of global and local features is extracted using the proposed Convolutional (Conv) and SkipXGram bi-gram (SXGbg) techniques, and fold recognition is performed using the proposed deep neural network. The proposed model achieved 91.4% fold accuracy on one of the derived low similarity (< 25%) datasets of the latest extended version of SCOPe_2.07. The proposed model is further evaluated on three popular, publicly available benchmark datasets, namely DD, EDD, and TG, and obtained 85.9%, 95.8%, and 88.8% fold accuracies, respectively. This work is the first to report fold recognition accuracy above 85% on all the benchmark datasets. The proposed model outperformed the best state-of-the-art models by 5% to 23% on the DD dataset, 2% to 19% on the EDD dataset, and 3% to 30% on the TG dataset.
© 2002-2011 IEEE.

Item Enhanced protein structural class prediction using effective feature modeling and ensemble of classifiers (Institute of Electrical and Electronics Engineers Inc., 2021) Bankapur, S.; Patil, N.
Protein Secondary Structural Class (PSSC) information is important in investigating further challenges of protein sequences such as protein fold recognition, protein tertiary structure prediction, and analysis of protein functions for drug discovery. Identification of PSSC using biological methods is time-consuming and cost-intensive. Several computational models have been developed to predict the structural class; however, they generalize poorly. Hence, predicting PSSC based on protein sequences is still proving to be an uphill task. In this article, we propose an effective, novel and generalized prediction model consisting of feature modeling and an ensemble of classifiers. The proposed feature modeling extracts discriminating information (features) by leveraging three techniques: (i) Embedding, in which features are extracted on the basis of spatial residue arrangements of the sequences using word embedding approaches; (ii) SkipXGram Bi-gram, in which various sets of skipped bi-gram features are extracted from the sequences; and (iii) General Statistical (GS), in which features covering the global information of structural sequences are extracted. The combined effective sets of features are trained and classified using an ensemble of three classifiers: Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Machines (GBM). The proposed model, when assessed on five benchmark datasets (high and low sequence similarity), viz. z277, z498, 25PDB, 1189, and FC699, reported an overall accuracy of 93.55, 97.58, 81.82, 81.11, and 93.93 percent, respectively. The proposed model is further validated on a large-scale updated low similarity (≤25%) dataset, where it achieved an overall accuracy of 81.11 percent.
The proposed generalized model is robust and consistently outperformed several state-of-the-art models on all five benchmark datasets.
© 2021 Institute of Electrical and Electronics Engineers Inc. All rights reserved.

Item An effective feature extraction with deep neural network architecture for protein-secondary-structure prediction (Springer, 2021) Jayasimha, A.; Mudambi, R.; Pavan, P.; Lokaksha, B.M.; Bankapur, S.; Patil, N.
With the increased importance of proteins in day-to-day life, it is imperative to know protein functions. Deciphering protein structure elucidates protein function. Experimental approaches for protein-structure analysis are expensive and time-consuming, and require high dexterity. Thus, finding a viable computational approach is vital. Because predicting protein structure (tertiary structure) directly is highly complex, research in this field aims at protein-secondary-structure prediction, which is directly related to tertiary structure. This research explores a range of features, namely position-specific scoring matrices, hidden Markov model alignment matrices, and physicochemical properties, that carry the rich information required to predict the secondary structure. Furthermore, it explores a suitable combination of these features that could capture diverse information about the protein secondary structure. Finally, a cascaded convolutional neural network and bidirectional long short-term memory architecture is fit on the models, and two evaluation metrics, namely the Q8 score and the segment overlap score, are benchmarked on various datasets. Our proposed model, trained on the CB6133 dataset and tested on the CB513 dataset, beats the benchmark models by a minimum of 2.9%.
© 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Austria, part of Springer Nature.

Item An Effective Multi-Label Protein Sub-Chloroplast Localization Prediction by Skipped-Grams of Evolutionary Profiles Using Deep Neural Network (Institute of Electrical and Electronics Engineers Inc., 2022) Bankapur, S.; Patil, N.
The chloroplast is one of the most classic organelles in algae and plant cells. Identifying the locations of chloroplast proteins within the chloroplast organelle is an important as well as challenging task in deciphering their functions. Biological experiments to identify Protein Sub-Chloroplast Localization (PSCL) are time-consuming and cost-intensive. Over the last decade, a few computational methods have been developed to predict PSCL: earlier works predicted only a single location, whereas recent works are able to predict multiple locations within the chloroplast organelle. However, the performance of all the state-of-the-art predictors is poor. This article proposes a novel skip-gram technique to extract highly discriminating patterns from evolutionary profiles and a multi-label deep neural network to predict the PSCL. The proposed model is assessed on two publicly available datasets, i.e., Benchmark and Novel. Experimental results demonstrate that the proposed work significantly outperforms the state-of-the-art multi-label PSCL predictors. The multi-label prediction accuracy (i.e., Overall Actual Accuracy) of the proposed model improves by an absolute margin of at least 6.7 percent on the Benchmark dataset and 7.9 percent on the Novel dataset when compared to the best PSCL predictor from the literature. Further, the results of a statistical t-test indicate that the performance of the proposed work is significantly improved, and thus the proposed work is an effective computational model for multi-label PSCL prediction.
The proposed prediction model is hosted on a web server and is available at https://nitkit-vgst727-nppsa.nitk.ac.in/deeplocpred/.
© 2004-2012 IEEE.
