Multi-stream Multi-attention Deep Neural Network for Context-Aware Human Action Recognition

Date

2022

Journal Title

Journal ISSN

Volume Title

Publisher

Institute of Electrical and Electronics Engineers Inc.

Abstract

Technological innovations in deep learning have enabled reasonably accurate solutions to a wide variety of computer vision tasks, such as object detection and face recognition. Human Action Recognition (HAR), on the other hand, remains far from human-level ability due to several challenges, such as the diversity with which actions are performed. Because action data are available in multiple modalities, HAR using video recorded by RGB-D cameras is common in current research. This paper proposes an approach for recognizing human actions using depth and skeleton data captured with the Kinect depth sensor. Attention modules, introduced in recent years, help models focus on the most important features in computer vision tasks. Accordingly, this paper proposes a multi-stream deep learning model with multiple attention blocks for HAR. First, the action data from the depth and skeletal modalities are represented using two distinct action descriptors, each of which generates an image from action data gathered across numerous frames. The proposed deep learning model is trained on these descriptors. Additionally, we propose a set of score fusion techniques for accurate HAR using all the features and trained CNN + LSTM streams. The proposed method is evaluated on two benchmark datasets using the well-known cross-subject evaluation protocol, achieving 89.83% and 90.7% accuracy on the MSR Action3D and UTD-MHAD datasets, respectively. The experimental results establish the validity and effectiveness of the proposed model. © 2022 IEEE.
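The score-fusion step described above can be sketched as follows. This is a minimal illustration only: the fusion rules shown (average, max, and weighted average of per-stream softmax scores) are common choices, and the function name and signature are hypothetical — the paper's exact fusion techniques are not detailed in this record.

```python
import numpy as np

def fuse_scores(stream_scores, method="average", weights=None):
    """Fuse per-stream class-score vectors into one prediction.

    stream_scores: list of 1-D arrays, one softmax score vector per
    trained stream (e.g. one for the depth descriptor, one for the
    skeleton descriptor). Returns the fused score vector and the
    index of the predicted class.
    """
    scores = np.stack(stream_scores)  # shape: (num_streams, num_classes)
    if method == "average":
        fused = scores.mean(axis=0)
    elif method == "max":
        fused = scores.max(axis=0)
    elif method == "weighted":
        # Weighted average, e.g. to trust one modality more than another.
        w = np.asarray(weights, dtype=float)
        fused = (w[:, None] * scores).sum(axis=0) / w.sum()
    else:
        raise ValueError(f"unknown fusion method: {method}")
    return fused, int(fused.argmax())

# Example: two streams (depth and skeleton) over 3 action classes.
depth_scores = np.array([0.2, 0.5, 0.3])
skel_scores = np.array([0.1, 0.7, 0.2])
fused, label = fuse_scores([depth_scores, skel_scores], method="average")
```

In this sketch, averaging yields fused scores [0.15, 0.6, 0.25], so class 1 is predicted; a weighted variant would shift the result toward the more reliable stream.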

Keywords

Attention, CNN, Computer Vision, depth data, Human Action Recognition (HAR), Skeleton data

Citation

2022 IEEE Region 10 Symposium (TENSYMP 2022), 2022.
