Multi-stream Multi-attention Deep Neural Network for Context-Aware Human Action Recognition
No Thumbnail Available
Date
2022
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Institute of Electrical and Electronics Engineers Inc.
Abstract
Technological innovations in deep learning models have enabled reasonably close solutions to a wide variety of computer vision tasks such as object detection, face recognition, and many more. On the other hand, Human Action Recognition (HAR) is still far from human-level ability due to several challenges such as diversity in performing actions. Due to data availability in multiple modalities, HAR using video data recorded by RGB-D cameras is frequently used in current research. This paper proposes an approach for recognizing human actions using depth and skeleton data captured using the Kinect depth sensor. Attention modules have been introduced in recent years to assist in focusing on the most important features in computer vision tasks. This paper proposes a multi-stream deep learning model with multiple attention blocks for HAR. At first, the depth and skeletal modalities' action data are represented using two distinct action descriptors. Each generates an image from the action data gathered from numerous frames. The proposed deep learning model is trained using these descriptors. Additionally, we propose a set of score fusion techniques for accurate HAR using all the features and trained CNN + LSTM streams. The proposed method is evaluated on two benchmark datasets using well known cross-subject evaluation protocol. The proposed technique achieved 89.83% and 90.7% accuracy on the MSRAction3D and UTDMHAD datasets, respectively. The experimental results establish the validity and effectiveness of the proposed model. © 2022 IEEE.
Description
Keywords
Attention, CNN, Computer Vision, depth data, Human Action Recognition (HAR), Skeleton data
Citation
2022 IEEE Region 10 Symposium, TENSYMP 2022, 2022, Vol., , p. -
