Deep learning-based multi-view 3D-human action recognition using skeleton and depth data

Ghosh, S.K.; Rashmi, M.; Mohan, B.R.; Guddeti, R.M.R.
2026-02-04
2023
Multimedia Tools and Applications, 2023, 82, 13, pp. 19829-19851
ISSN 1380-7501
https://doi.org/10.1007/s11042-022-14214-y
https://idr.nitk.ac.in/handle/123456789/21919

Human Action Recognition (HAR) is a fundamental challenge that smart surveillance systems must overcome. With the rising affordability of capturing human actions with more advanced depth cameras, HAR has garnered increased interest over the years; however, the majority of these efforts have focused on single-view HAR. Recognizing human actions from arbitrary viewpoints is more challenging, as the same action is observed differently from different angles. This paper proposes a multi-stream Convolutional Neural Network (CNN) model for multi-view HAR using depth and skeleton data. We also propose a novel and efficient depth descriptor, Edge Detected-Motion History Image (ED-MHI), based on Canny Edge Detection and Motion History Image. In addition, the proposed skeleton descriptor, Motion and Orientation of Joints (MOJ), represents the appropriate action by using joint motion and orientation. Experimental results on two human action datasets, NUCLA Multiview Action3D and NTU RGB-D, using a cross-subject evaluation protocol demonstrate that the proposed system outperforms state-of-the-art works, with 93.87% and 85.61% accuracy, respectively. © 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.

Convolution; Deep neural networks; Musculoskeletal system; Neural network models; Security systems; Convolutional neural network; Deep learning; Depth camera; Features fusions; Human actions; Human-action recognition; Motion history images; Multi-views; Score fusion; Smart surveillance systems; Convolutional neural networks
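
The abstract names ED-MHI as a depth descriptor built from edge detection and a Motion History Image but does not give its construction. The sketch below is only one plausible reading, not the authors' method: it applies the standard motion history update (set a pixel to `tau` where motion is detected, otherwise decay by one) to edge maps of consecutive frames. The edge detector here is a plain gradient-magnitude threshold used as a stand-in for Canny, and the names `edge_map`, `edge_motion_history`, `tau`, and `threshold` are hypothetical.

```python
import numpy as np


def edge_map(frame, threshold=0.2):
    """Binary edge map from normalized gradient magnitude.

    Assumption: a simple stand-in for Canny edge detection, not Canny itself.
    """
    gy, gx = np.gradient(frame.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    peak = magnitude.max()
    if peak == 0:
        return np.zeros(frame.shape, dtype=bool)
    return (magnitude / peak) > threshold


def edge_motion_history(frames, tau=None):
    """Motion history image over the edge maps of a frame sequence.

    Standard MHI update: a pixel whose edge state changed is set to tau,
    every other pixel decays by 1 (floored at 0). The result is scaled
    to [0, 1], so the most recent motion is brightest.
    """
    frames = list(frames)
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    if tau is None:
        tau = len(frames) - 1  # assumption: history spans the whole clip

    mhi = np.zeros(frames[0].shape, dtype=np.float64)
    previous = edge_map(frames[0])
    for frame in frames[1:]:
        current = edge_map(frame)
        moved = current != previous  # edge pixels that appeared or vanished
        mhi = np.where(moved, float(tau), np.maximum(mhi - 1.0, 0.0))
        previous = current
    return mhi / tau
```

Given a list of 2D depth frames, `edge_motion_history(frames)` returns a single 2D array of the same shape that could serve as an image-like input to a CNN stream; how the paper actually feeds ED-MHI to its network is not stated in this record.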