Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506
Search Results
Item: Hate Speech Detection Using Audio in Portuguese Language
(Springer Science and Business Media Deutschland GmbH, 2024) Tembe, L.A.; Anand Kumar, M.
This study focuses on detecting hate speech in the Portuguese language from audio and introduces a novel methodology that integrates audio-to-text and self-image technologies to tackle this problem. We use Machine Learning and Deep Learning models to differentiate between hate speech and normal speech. The research used a total of 200 datasets, which were categorized into hate speech and normal speech. These datasets were collected by me personally for this project. Four distinct models are presented in the analysis: LSTM, SVM, CNN, and Random Forest. The findings highlight the superior performance of the CNN model when applied to spectrogram data, where it achieves an accuracy of 90%. Conversely, the Random Forest model outperforms the others on text data, achieving an accuracy of 73.1%. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.

Item: SCaLAR NITK at Touché: Comparative Analysis of Machine Learning Models for Human Value Identification
(CEUR-WS, 2024) Praveen, K.; Darshan, R.K.; Reddy, C.T.; Anand Kumar, M.
This study delves into the task of detecting human values in textual data using Natural Language Processing (NLP) techniques. With the increasing use of social media and other platforms, an abundance of data is being generated. Finding human values in this text data helps us understand and analyze human behavior better, because these values are the core principles that influence human behavior. Analyzing human values helps not only in research but also in practical applications such as sentiment evaluation, market analysis, and personalized recommendation systems. The study evaluates the performance of different existing models and proposes novel techniques.
The models used in this study range from simple machine learning models such as SVM, KNN, and Random Forest, which classify embeddings obtained from BERT, to transformer models such as BERT and RoBERTa for text classification, and Large Language Models such as Mistral-7b. The task is a multilabel, multitask classification. The QLoRA quantization method is used to reduce the size of the model weights, which makes training computationally less expensive, and the Supervised Fine Tuning (SFT) trainer is used to fine-tune the LLMs for this specific task. It was found that LLMs performed better than all other models. © 2024 Copyright for this paper by its authors.
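As a rough illustration of what "multilabel" means in the abstract above: each text can carry several value labels at once, so every label is decided independently instead of picking a single best class. The sketch below is not the authors' pipeline. The label names, the 0.5 threshold, the example scores, and the choice of macro-averaged F1 as the metric are all assumptions made for illustration; in practice the per-label scores would come from one of the classifiers named in the abstract.

```python
from typing import Dict, List, Set

# Illustrative label names only; the abstract does not list the label set.
LABELS: List[str] = ["achievement", "security", "benevolence"]


def decide_labels(scores: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Multilabel decision: accept each label independently of the others."""
    return {label for label in LABELS if scores.get(label, 0.0) >= threshold}


def f1_for_label(label: str, gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """F1 for one label, counted over all examples."""
    tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
    fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
    fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Unweighted mean of the per-label F1 scores (an assumed metric)."""
    return sum(f1_for_label(label, gold, pred) for label in LABELS) / len(LABELS)


# Hypothetical per-label scores for three texts, standing in for model output.
scores_batch = [
    {"achievement": 0.9, "security": 0.2, "benevolence": 0.7},
    {"achievement": 0.1, "security": 0.8, "benevolence": 0.4},
    {"achievement": 0.6, "security": 0.6, "benevolence": 0.1},
]
gold = [{"achievement", "benevolence"}, {"security"}, {"achievement"}]

pred = [decide_labels(scores) for scores in scores_batch]
print(pred)
print(round(macro_f1(gold, pred), 4))
```

The third text receives two labels because both scores clear the threshold, which a single-label (argmax) classifier could not express.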
