Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

Search Results

  • Item
    NLP based sentiment analysis on Twitter data using ensemble classifiers
    (Institute of Electrical and Electronics Engineers Inc., 2015) Kanakaraj, M.; Guddeti, G.
    Most sentiment analysis systems use a bag-of-words approach to mine sentiment from online reviews and social media data. Rather than considering the whole sentence or paragraph, the bag-of-words approach uses only individual words and their counts as feature vectors. This can mislead the classification algorithm, especially in problems such as sentiment classification. Traditional machine learning algorithms such as Naive Bayes, Maximum Entropy, and SVM are widely used to solve classification problems, but they often suffer from bias towards a particular class. In this paper, we propose a Natural Language Processing (NLP) based approach that enhances sentiment classification by adding semantics to the feature vectors and using ensemble methods for classification. Adding semantically similar words and context-sense identities to the feature vectors will increase the accuracy of prediction. Experiments demonstrate that the semantics-based feature vector with an ensemble classifier outperforms the traditional bag-of-words approach with a single machine learning classifier by 3-5%. © 2015 IEEE.
  • Item
    Classification of protein sequences by means of an ensemble classifier with an improved feature selection strategy
    (Springer Verlag, 2018) Sriram, A.; Sanapala, M.; Patel, R.; Patil, N.
    With the decreasing cost of biological sequencing, the influx of new sequences into biological databases such as NCBI, SwissProt, and UniProt is increasing at an ever-growing pace. Annotating these newly sequenced proteins will aid groundbreaking discoveries for developing novel drugs and potential therapies for diseases. Previous work in this field has harnessed the high computational power of modern machines to achieve good prediction quality, but at the cost of high dimensionality. To address this disparity, we propose a novel word segmentation-based feature selection strategy to classify protein sequences using a highly condensed feature set. An incremental classifier selection strategy was seen to yield better results than all existing methods. The antioxidant protein data curated in previous work was used to provide a level ground for evaluation and comparison of results. The proposed method was found to outperform all existing works on this data with an accuracy of 95%. © Springer Nature Singapore Pte Ltd. 2018.
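
Both abstracts above build count-based feature vectors from word-like units and combine several classifiers. The following is a minimal sketch of those ideas, not either paper's method. The function names are hypothetical, overlapping k-mers are only an assumed stand-in for the word segmentation of protein sequences (the abstract does not describe the actual scheme), and majority voting is just one common way to build an ensemble.

```python
from collections import Counter


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens, ignoring word order."""
    return Counter(text.lower().split())


def kmer_words(sequence, k=3):
    """Count overlapping length-k substrings ("words") of a sequence.

    Assumption: k-mers stand in for the paper's segmentation scheme.
    """
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def majority_vote(predictions):
    """Return the label predicted most often by the individual classifiers."""
    return Counter(predictions).most_common(1)[0][0]


# Word order is lost: these two sentences get identical feature vectors,
# which is the weakness of bag-of-words that the first abstract points out.
same = bag_of_words("not good but great") == bag_of_words("not great but good")

# Hypothetical toy sequence, segmented into overlapping 3-letter words.
kmers = kmer_words("MKVMKV", k=3)

# Three hypothetical classifier outputs combined into one ensemble label.
label = majority_vote(["positive", "negative", "positive"])
```

Here `same` is true because both sentences contain the same four words once each. A condensed feature set, as in the second abstract, could then be approximated by keeping only the most informative counts, though the selection criterion used in the paper is not stated in the abstract.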