Conference Papers

Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506

Search Results

Now showing 1 - 10 of 83
  • Item
    Overview of Arnekt IECSIL at Fire-2018 track on information extraction for conversational systems in Indian languages
    (CEUR-WS, 2018) Barathi Ganesh, H.; Padannayil, K.P.; Reshma, U.; Kale, M.; Mankame, P.; Kulkarni, G.; Kale, A.; Anand Kumar, M.
    This overview paper describes the first shared task on Information Extraction for Conversational Systems in Indian Languages (IECSIL), which was organized by FIRE 2018. Motivated by the need for information extraction, corpora have been developed to perform Named Entity Recognition (Task A) and Relation Extraction (Task B) for five Indian languages (Hindi, Tamil, Malayalam, Telugu and Kannada). Task A is to identify the named entities and classify each into one of many classes, and Task B is to extract the relation among the entities present in a sentence. Altogether, nearly 100 submissions from 10 different teams were evaluated. In this paper, we give an overview of the approaches and discuss the results that the participating teams attained. © 2018 CEUR-WS. All Rights Reserved.
  • Item
    Overview of the second shared task on Indian native language identification (INLI)
    (CEUR-WS, 2018) Anand Kumar, M.; Barathi Ganesh, H.; Ajay, S.G.; Padannayil, K.P.
    This overview paper describes the second shared task on Indian Native Language Identification (INLI), which was organized by FIRE 2018. Given a corpus of comments in English from various Facebook newspaper pages, the objective of the task is to identify the commenter's native language among the following six Indian languages: Bengali, Hindi, Kannada, Malayalam, Tamil, and Telugu. Altogether, 31 approaches from 14 different teams were evaluated. In this paper, we give an overview of the participants' systems and the results of the second INLI shared task. We also compare the results with those of the first INLI shared task, conducted with FIRE-2017. © 2018 CEUR-WS. All Rights Reserved.
  • Item
    KCe_Dalab@maponsms-Fire2018: Effective word and character-based features for multilingual author profiling
    (CEUR-WS, 2018) Sharmila Devi, V.; Subramanian, S.; Ravikumar, G.; Anand Kumar, M.
    This paper describes work on identifying gender and age group in the Multilingual Author Profiling on SMS messages (MAPonSMS) shared task conducted at the Forum for Information Retrieval and Evaluation (FIRE 2018). To support development of multilingual author profiling systems, the organizers released a training corpus that includes multilingual (Roman Urdu and English) SMS messages and their corresponding profiles. In gender identification, a profile may be either male or female. The author's age group falls into one of three categories: 15-19, 20-24, 25-xx. We developed the author profiling system using word- and character-based Term Frequency-Inverse Document Frequency (TF-IDF) features, classified with a Support Vector Machine classifier. The proposed system achieved state-of-the-art performance in the multilingual author profiling on SMS task. The accuracy obtained for identification of age group is 65%, and for gender it is 87%. Performance is also evaluated jointly, where the accuracy obtained is 57%. We also experimented with the system by changing different parameters and report the cross-validation accuracy. © 2018 CEUR-WS. All Rights Reserved.
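The combination of word-level and character-level TF-IDF features mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the character n-gram length of 3, the smoothed IDF formula, and the toy messages are all assumptions made for illustration, and the Support Vector Machine classification step is omitted.

```python
import math
from collections import Counter


def word_tokens(text):
    """Split a message into lowercase whitespace-separated word tokens."""
    return text.lower().split()


def char_ngrams(text, n=3):
    """Return overlapping character n-grams (n=3 is an assumed setting)."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def tfidf(docs, tokenizer):
    """Compute one sparse TF-IDF vector (a dict) per document.

    Uses relative term frequency and a smoothed IDF,
    log((1 + N) / (1 + df)) + 1, which is one common variant.
    """
    tokenized = [tokenizer(d) for d in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # count each term once per document
    n_docs = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty messages
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def combined_features(docs):
    """Merge word-level and character-level TF-IDF into one feature dict per message."""
    word_vecs = tfidf(docs, word_tokens)
    char_vecs = tfidf(docs, char_ngrams)
    combined = []
    for w, c in zip(word_vecs, char_vecs):
        feats = {"w:" + k: v for k, v in w.items()}        # word features
        feats.update({"c:" + k: v for k, v in c.items()})  # character features
        combined.append(feats)
    return combined


# Toy messages, invented for illustration (not taken from the MAPonSMS corpus).
messages = ["kya haal hai", "how are you", "kya kar rahe ho"]
features = combined_features(messages)
```

Each entry of `features` is a sparse dictionary that could then be passed to any linear classifier. Character n-grams are often useful for romanized text such as Roman Urdu because spelling varies between writers, although whether that motivated the authors' choice is not stated in the abstract.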
  • Item
    Indian native language identification - INLI 2018
    (Association for Computing Machinery, 2018) Anand Kumar, M.; Barathi Ganesh, H.B.; Padannayil, K.P.; Ajay, S.G.
    The growth of digital platforms enables industries to provide user-specific services. Most of the time, information about internet users is not explicitly available, which acts as a constraint in developing personalized applications. This creates the need for author profiling tasks, which aim to predict internet users' characteristics from their texts. Native language identification is one such author profiling task: it predicts an author's native language from texts written in another language. We have proposed the Indian Native Language Identification task, in which internet users' texts are written in English and participants need to determine whether the user's native language is Tamil, Malayalam, Kannada, Telugu, Bengali or Hindi. The corpus is collected from regional newspaper pages on Facebook, under the hypothesis that a user who belongs to a particular region will read news from the respective regional newspaper. © 2018 Association for Computing Machinery.
  • Item
    Information extraction for conversational systems in Indian languages - Arnekt IECSIL
    (Association for Computing Machinery, 2018) Hb, B.G.; Kp, S.; Reshma, U.; Kale, M.; Mankame, P.; Kulkarni, G.; Kale, A.; Anand Kumar, M.
    With data being the new source of wealth, mining intelligence from every possible unit of it has become a salient feature of many fields today. Text data is not limited to one language, and this has shown its usefulness in creating multiple applications across various languages. Development for Indian languages is steadily improving, both in terms of resources and of specific applications. Information Extraction for Conversational Systems in Indian Languages - Arnekt IECSIL has taken a step toward creating its own resource in Indian languages (Hindi, Kannada, Malayalam, Tamil and Telugu) for Named Entity Recognition (NER) and Information Extraction (IE) tasks. This overview paper details existing Indian language corpus development and the steps taken to build our own corpus, along with its statistics. © 2016 Association for Computing Machinery.
  • Item
    Bot and gender identification from twitter notebook for PAN at CLEF 2019
    (CEUR-WS, 2019) Radarapu, R.; Vishwakarma, Y.; Sai Gopal, A.S.; Anand Kumar, M.
    The popularity of social media raises a concern about the quality of content on its platforms. The quality of data is important, especially for fair and considerable predictive analysis. If the quality of data is low, it may result in wrong predictions about an event. This causes misleading trends and, more importantly, sensitive stock prices may fluctuate. The content available on social media can be corrupted and flooded by bots. There are a variety of bots, such as spam bots, influence bots, etc. Our target is to identify such bots on Twitter. Twitter data is mostly used by data analysts for applications related to scientific predictions or opinion analysis. This working note builds on earlier approaches and Machine Learning (ML) approaches to classify a user as a bot or a human and, further, to identify gender, for studies in areas such as crime detection. Using many attributes shared across user profiles, we have identified patterns that indicate whether a given user is a bot or a human based on the tweets posted. © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2019, 9-12 September 2019, Lugano, Switzerland.
  • Item
    Automated Traffic Light Signal Violation Detection System Using Convolutional Neural Network
    (Springer, 2020) Bordia, B.; Nishanth, N.; Patel, S.; Anand Kumar, M.; Rudra, B.
    An automated traffic light violation detection system relies on detecting the traffic light color from video captured by a CCTV camera, and on detecting the white safety line before the traffic signal and the vehicles. Detection of vehicles crossing traffic signals is generally done with the help of sensors that are triggered when the traffic signal turns red or yellow. Sometimes these sensors are triggered when a person or an animal crosses the line, or because of bad weather, which gives false results. In this paper, we present software based on image processing and a convolutional neural network that detects the traffic signals, the vehicles, and the white safety line in front of the traffic signals. We present an efficient way to detect the white safety line, combined with detection of traffic lights trained on the Bosch dataset and vehicle detection using the TensorFlow object detection SSD model. © 2020, Springer Nature Singapore Pte Ltd.
  • Item
    Overview of the track on HASOC-offensive Language Identification-DravidianCodeMix
    (CEUR-WS, 2020) Chakravarthi, B.R.; Anand Kumar, M.; Mccrae, J.P.; Premjith, B.; Padannayil, K.P.; Mandl, T.
    We present the results and main findings of the HASOC-Offensive Language Identification track on code-mixed Dravidian languages. The track featured two tasks. Task 1 is offensive language identification in Malayalam, where the comments were written in both native script and Latin script. Task 2 is offensive language identification in Tamil and Malayalam, where the comments were written in Latin script (non-native script). For both tasks, given a comment, participants should develop a system to classify the text as offensive or not-offensive. In total, 96 participants took part and 12 submitted papers. In this paper, we present the task, the data, and the results, and discuss the system submissions and methods used by participants. © 2020 Copyright for this paper by its authors.
  • Item
    NITK NLP at FinCausal-2020 Task 1 Using BERT and Linear models.
    (Association for Computational Linguistics (ACL), 2020) LekshmiAmmal, R.L.; Anand Kumar, M.
    FinCausal-2020 is a shared task that focuses on causality detection in factual data for financial analysis. Financial data facts don’t provide much explanation of the variability of these data. This paper aims to propose an efficient method to classify whether or not the data contains a financial cause. Of the many models used to classify the data, the SVM model gave an F-score of 0.9435, and BERT with specific fine-tuning achieved the best result with an F-score of 0.9677. © 2020 FNP-FNS 2020 - 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation, Proceedings. All rights reserved.
  • Item
    Leveraging multimodal behavioral analytics for automated job interview performance assessment and feedback
    (Association for Computational Linguistics (ACL), 2020) Agrawal, A.; George, R.A.; Ravi, S.S.; Kamath S., S.; Anand Kumar, M.
    Behavioral cues play a significant part in human communication and cognitive perception. In most professional domains, employee recruitment policies are framed such that both professional skills and personality traits are adequately assessed. Hiring interviews are structured to evaluate expansively a potential employee’s suitability for the position - their professional qualifications, interpersonal skills, ability to perform in critical and stressful situations, in the presence of time and resource constraints, etc. Therefore, candidates need to be aware of their positive and negative attributes and be mindful of behavioral cues that might have adverse effects on their success. We propose a multimodal analytical framework that analyzes the candidate in an interview scenario and provides feedback for predefined labels such as engagement, speaking rate, eye contact, etc. We perform a comprehensive analysis that includes the interviewee’s facial expressions, speech, and prosodic information, using the video, audio, and text transcripts obtained from the recorded interview. We use these multimodal data sources to construct a composite representation, which is used for training machine learning classifiers to predict the class labels. Such analysis is then used to provide constructive feedback to the interviewee for their behavioral cues and body language. Experimental validation showed that the proposed methodology achieved promising results. © 2017 Association for Computational Linguistics