Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506

  • Item
    Automatic identification and ranking of emergency aids in social media macro community
    (CEUR-WS, 2017) Gautam, B.; Annappa, B.
    Online social microblogging platforms, including Twitter, are increasingly used to aid relief operations during disaster events. During most calamities, whether natural disasters or armed attacks, non-governmental organizations look for critical information about resources to support affected people. Despite recent advances in natural language processing with deep neural networks, retrieving and ranking short text remains a challenging task because a lot of conversational and sympathy content is merged with the critical information. In this paper, we address the problem of categorical information retrieval and ranking of the most relevant information while accounting for the short, multilingual text that arises during such events. Our proposed model forms embedding vectors with the help of textual and statistical preprocessing; finally, the entire training set of 2,100,000 vectors was normalized using a feed-forward neural network for need and availability tweets. Another important contribution of this paper is a novel weighted Ranking Key algorithm, based on the top five general terms, that ranks the classified tweets in order of relevance to their classification. Lastly, we test our model on the Nepal Earthquake dataset (which contains short-text and multilingual tweets) and achieve 6.81% mean average precision on 5,250,000 unlabeled embedding vectors of disaster relief tweets.
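The abstract above reports mean average precision (MAP) as its ranking metric. As a minimal sketch of how that metric is commonly computed (not the authors' evaluation code), the block below assumes each ranked list is given as a sequence of relevance flags and that average precision is normalized by the number of relevant items actually retrieved. The function names and the toy rankings are hypothetical.

```python
from typing import Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average precision for one ranked list of relevance flags.

    Assumption: normalizes by the number of relevant items that appear
    in the list, not by an external count of all relevant items.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-list average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Hypothetical toy data: two ranked lists of tweets, True = relevant.
toy_rankings = [
    [True, False, True],
    [False, True],
]
map_score = mean_average_precision(toy_rankings)
```

For the first toy list, precision is 1/1 at rank 1 and 2/3 at rank 3, so its average precision is 5/6; the second list scores 1/2, and the mean of the two is 2/3.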
  • Item
    When and where?: Behavior dominant location forecasting with micro-blog streams
    (IEEE Computer Society, 2018) Gautam, B.; Annappa, B.; Singh, A.; Agrawal, A.
    The proliferation of smartphones and wearable devices has increased the availability of large amounts of geospatial streams that enable significant automated knowledge discovery in pervasive environments, but the most prominent information related to changing interests has not yet been adequately capitalized on. In this paper, we provide a novel algorithm that exploits the dynamic fluctuations in a user's points of interest while forecasting the future place of visit with fine granularity. Our proposed algorithm is based on the dynamic formation of collective personality communities using different languages, opinions, and geographical and temporal distributions to find optimized equivalent content. We performed extensive empirical experiments involving real-time streams derived from 0.6 million micro-blog stream tuples comprising a fusion of 1945 social persons, with a graph algorithm and a feed-forward neural network model as the predictive classification model. Lastly, the framework achieves 62.10% mean average precision on 120,000 embeddings of unlabeled users and, surprisingly, an 85.92% increase over the state-of-the-art approach. © 2018 IEEE.
  • Item
    Detecting Semantic Similarity of Documents Using Natural Language Processing
    (Elsevier B.V., 2021) Agarwala, S.; Anagawadi, A.; Reddy Guddeti, R.M.
    The similarity of documents in natural languages can be judged based on how similar the embeddings corresponding to their textual content are. Embeddings capture the lexical and semantic information of texts, and they can be obtained through bag-of-words approaches using the embeddings of constituent words or through pre-trained encoders. This paper examines various existing approaches to obtain embeddings from texts, which are then used to detect similarity between them. A novel model that builds upon the Universal Sentence Encoder is also developed to do the same. The explored models are tested on the SICK dataset, and the correlation between the ground truth values given in the dataset and the predicted similarity is computed using the Pearson, Spearman and Kendall's Tau correlation metrics. Experimental results demonstrate that the novel model outperforms the existing approaches. Finally, an application is developed using the novel model to detect semantic similarity between a set of documents. © 2021 Elsevier B.V. All rights reserved.
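The abstract above scores document similarity from embeddings and then correlates predictions with ground-truth values. The sketch below illustrates that general pipeline with cosine similarity, Pearson correlation, and Spearman correlation (Pearson on average ranks). It is an illustration under stated assumptions, not the paper's model: cosine similarity is an assumed choice of similarity measure, the vectors and scores would come from some encoder that is not shown, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # Convention chosen here for zero vectors.
    return dot / (norm_u * norm_v)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman correlation computed as Pearson on the ranks."""
    return pearson(ranks(xs), ranks(ys))
```

In use, one would compute `cosine_similarity` for each sentence pair and pass the list of predictions and the list of gold scores to `pearson` and `spearman`. Kendall's Tau is omitted here to keep the sketch short.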
  • Item
    Impact of Vector Embeddings on the Performance of Tolerance Near Sets-based Sentiment Classifier for Text Classification
    (Elsevier B.V., 2023) Hegde, T.; Sanjay, K.S.; Thomas, S.M.; Kambhammettu, R.; Anand Kumar, M.; Ramanna, S.
    In recent years, Natural Language Processing (NLP) has gained significant attention, and sentiment analysis is an essential subfield of NLP that deals with identifying the sentiment or emotion conveyed in text. Tolerance near sets (TNS) is a mathematical framework that has shown promising results in sentiment analysis tasks. However, the choice of word embeddings can significantly impact the performance of TNS-based classifiers. This paper investigates the impact of using different embeddings on the performance of tolerance near sets-based sentiment classifiers. It compares different embeddings, including DistilBERT, MiniLM, and Word Embeddings, and their combinations, to understand their impact on TNS-based sentiment analysis. The TSC 2.0 model proposed in this paper achieves a weighted F1 score of 92.1% on one of the datasets, an improvement due to the sentence embeddings used. Experimental results have led to the observation that tie-breaking and variance-based classification may have led to a noticeable improvement in cases with more than three. © 2023 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0)
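The abstract above reports a weighted F1 score. As a hedged illustration of the usual definition (per-class F1 averaged with weights proportional to each class's share of the true labels), the sketch below uses hypothetical labels and a hypothetical function name; it is not the paper's evaluation code.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Per-class F1, weighted by each class's support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        true_pos = sum(1 for t, p in pairs if t == label and p == label)
        false_pos = sum(1 for t, p in pairs if t != label and p == label)
        false_neg = count - true_pos
        denominator = 2 * true_pos + false_pos + false_neg
        f1 = 2 * true_pos / denominator if denominator else 0.0
        # Weight this class's F1 by its share of the true labels.
        score += f1 * count / total
    return score


# Hypothetical toy labels for a two-class sentiment task.
gold = ["pos", "pos", "neg", "neg"]
predicted = ["pos", "neg", "neg", "neg"]
toy_score = weighted_f1(gold, predicted)
```

On the toy labels, the "pos" class has F1 = 2/3 and the "neg" class has F1 = 4/5; with equal support the weighted score is 11/15.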
  • Item
    fastText-Based Siamese Network for Hindi Semantic Textual Similarity
    (Springer Science and Business Media Deutschland GmbH, 2025) Chandrashekar, A.; Rushad, M.; Nambiar, A.; Rashmi, V.; Koolagudi, S.G.
    Semantic textual similarity is a measure of the degree of semantic similarity or equivalence between two sentences. Semantically equivalent sentence pairs can substitute for each other and retain their meaning. Various rule-based and machine learning models have quickly gained prominence in the field, especially for a language like English, where there is an abundance of lexical tools and resources. However, other languages like Hindi have not seen much improvement in state-of-the-art methods and models to evaluate the semantic similarity of text data. This paper proposes a fastText-based Siamese neural network architecture to evaluate the semantic equivalence of a Hindi sentence pair. The pair is scored on a scale of 0–5, where 0 indicates least similar and 5 indicates most similar. The corpus combines two datasets containing manually scored sentence pairs. The performance parameters used to evaluate this approach are model accuracy and model loss over a training period of multiple epochs. The proposed architecture incorporates a fastText-based embedding layer and a bi-directional Long Short-Term Memory layer to produce a similarity score. It can extract semantic and various global features of the text to determine that score. This model achieves an accuracy of 85.5% on a compiled Hindi-Hindi sentence pair dataset, which is a considerable improvement over existing rule-based and supervised systems. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.