IIMH: Intention Identification in Multimodal Human Utterances

dc.contributor.author: Keerthan Kumar, T.G.
dc.contributor.author: Dhakate, H.
dc.contributor.author: Koolagudi, S.G.
dc.date.accessioned: 2026-02-06T06:34:27Z
dc.date.issued: 2023
dc.description.abstract: Intention identification is a challenging problem in the fields of natural language processing, speech processing, and computer vision. People often use contradictory or ambiguous words in different contexts, which can make it difficult to identify the intention behind an utterance. Intention identification has many practical applications in natural language processing, sentiment analysis, social media analysis, robotics, and human-computer interaction, where it can provide valuable insights into user behavior. In this work, we propose a model to determine whether an utterance made by a person is intentional or non-intentional. To achieve this, we collected a multimodal dataset containing text, video, and speech from various TV shows, movies, and YouTube videos and labeled them with their corresponding intentions. Feature extraction is done at both the utterance and word levels to obtain useful information from all three modalities. We trained a baseline model using a support vector machine (SVM) to set a benchmark performance. We designed an architecture that detects contradictions between positive spoken words and negative facial expressions or speech in order to identify an utterance as non-intentional. Along with the architecture, we tried different classification approaches and obtained the best results with an SVM classifier using an RBF kernel, which reached an accuracy of 78.83% and outperformed the baseline approach. © 2023 ACM.
dc.identifier.citation: ACM International Conference Proceeding Series, 2023, p. 337-344
dc.identifier.uri: https://doi.org/10.1145/3607947.3608016
dc.identifier.uri: https://idr.nitk.ac.in/handle/123456789/29224
dc.publisher: Association for Computing Machinery
dc.subject: BERT
dc.subject: Deep Learning
dc.subject: Intention
dc.subject: Multimodal
dc.subject: NLP
dc.subject: Sentiment
dc.subject: SVM
dc.subject: Utterance-level features
dc.subject: Word-level features
dc.title: IIMH: Intention Identification in Multimodal Human Utterances