Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

Search Results

Now showing 1 - 3 of 3

Multi-stream Multi-attention Deep Neural Network for Context-Aware Human Action Recognition
(Institute of Electrical and Electronics Engineers Inc., 2022) Rashmi, M.; Guddeti, R.M.R.
Technological innovations in deep learning models have enabled reasonably close solutions to a wide variety of computer vision tasks such as object detection, face recognition, and many more. On the other hand, Human Action Recognition (HAR) is still far from human-level ability due to several challenges such as diversity in performing actions. Due to data availability in multiple modalities, HAR using video data recorded by RGB-D cameras is frequently used in current research. This paper proposes an approach for recognizing human actions using depth and skeleton data captured using the Kinect depth sensor. Attention modules have been introduced in recent years to assist in focusing on the most important features in computer vision tasks. This paper proposes a multi-stream deep learning model with multiple attention blocks for HAR. At first, the depth and skeletal modalities' action data are represented using two distinct action descriptors. Each generates an image from the action data gathered from numerous frames. The proposed deep learning model is trained using these descriptors. Additionally, we propose a set of score fusion techniques for accurate HAR using all the features and trained CNN + LSTM streams. The proposed method is evaluated on two benchmark datasets using the well-known cross-subject evaluation protocol. The proposed technique achieved 89.83% and 90.7% accuracy on the MSRAction3D and UTDMHAD datasets, respectively. The experimental results establish the validity and effectiveness of the proposed model. © 2022 IEEE.
Exploiting skeleton-based gait events with attention-guided residual deep learning model for human identification
(Springer, 2023) Rashmi, M.; Guddeti, R.M.R.
Human identification using unobtrusive visual features is a daunting task in smart environments. Gait is a suitable biometric feature when the camera cannot correctly capture the human face due to environmental factors. In recent years, gait-based human identification using skeleton data has been intensively studied using a variety of feature extractors and more sophisticated deep learning models. Although skeleton data is susceptible to changes in covariate variables, resulting in noisy data, most existing algorithms employ a single feature extraction technique for all frames to generate frame-level feature maps. This results in degraded performance and additional features, necessitating increased computing power. This paper proposes a robust feature extractor that extracts a quantitative summary of gait event-specific information, thereby reducing the total number of features throughout the gait cycle. In addition, a novel Attention-guided LSTM-based deep learning model with residual connections is proposed to learn the extracted features for gait recognition. The proposed approach outperforms state-of-the-art works on five publicly available datasets across various benchmark evaluation protocols and metrics. Further, the CMC test revealed that the proposed model obtained higher than 97% accuracy in lower-level ranks on these datasets. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
Human action recognition using multi-stream attention-based deep networks with heterogeneous data from overlapping sub-actions
(Springer Science and Business Media Deutschland GmbH, 2024) Rashmi, M.; Guddeti, R.M.R.
Vision-based Human Action Recognition is difficult owing to the variations in the same action performed by various people, the temporal variations in actions, and the difference in viewing angles. Researchers have recently adopted multi-modal visual data fusion strategies to address the limitations of single-modality methodologies. Many researchers strive to produce more discriminative features because most existing techniques’ success relies on feature representation in the data modality under consideration. Human action consists of several sub-actions whose durations vary between individuals. This paper proposes a multifarious learning framework employing action data in depth and skeleton formats. Firstly, a novel action representation named Multiple Sub-action Enhanced Depth Motion Map (MS-EDMM), integrating depth features from overlapping sub-actions, is proposed. Secondly, an efficient method is introduced for extracting spatio-temporal features from skeleton data. This is achieved by dividing the skeleton sequence into sub-actions and summarizing skeleton joint information for five distinct human body regions. Next, a multi-stream deep learning model with Attention-guided CNN and residual LSTM is proposed for classification, followed by several score fusion operations to reap the benefits of streams trained with multiple data types. The proposed method outperformed an existing method that utilized skeleton and depth data by 1.62%, achieving an accuracy of 89.76% on the single-view UTD-MHAD dataset. Furthermore, on the multi-view NTU RGB+D dataset, it demonstrated encouraging performance with an accuracy of 89.75% in cross-view and 83.8% in cross-subject evaluations. © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024.
