Unobtrusive Context-Aware Human Identification and Action Recognition System for Smart Environments
Date
2023
Authors
M, Rashmi
Publisher
National Institute of Technology Karnataka, Surathkal
Abstract
A smart environment can securely integrate multiple technological solutions to manage its assets, such as the information systems of local government departments, schools, transportation networks, hospitals, and other community services. It uses low-power sensors, cameras, and software with Artificial Intelligence to continuously monitor the system's operation. Smart environments require appropriate monitoring technologies for secure living and efficient management.
Global security threats have produced considerable demand for intelligent surveillance systems in smart environments. Consequently, the number of cameras deployed in smart environments to record events in their vicinity is increasing rapidly. In recent years, the proliferation of cameras such as Closed Circuit Television (CCTV), depth sensors, and mobile phones used to monitor human activities has led to an explosion of visual data, which requires considerable effort to interpret and store. Numerous applications in intelligent environments rely on the content of captured videos, including smart video surveillance to monitor human activities, crime detection, intelligent traffic management, and human identification.
Intelligent surveillance systems must perform unobtrusive human identification and human action recognition to ensure a secure and pleasant life in a smart environment. This thesis presents various approaches based on advanced deep learning for unobtrusive human identification and human action recognition from visual data in multiple modalities. It explores the unobtrusive identification of humans based on skeleton and depth data, and presents several methods for recognizing human actions using RGB, depth, and skeleton data.
Initially, a domain-specific human action recognition system employing RGB data is introduced for a computer laboratory in a college environment. A dataset of human actions particular to the computer laboratory environment is generated from spontaneous video data captured by cameras installed in the laboratories; it contains several instances of five distinct human actions. In addition, a transfer learning-based human action recognition system is presented for locating and recognizing multiple human actions in an RGB image.
Human action recognition systems based on skeleton data are developed and evaluated on publicly available datasets using benchmark evaluation protocols and metrics. Skeleton data-based action recognition mainly concentrates on the 3D coordinates of various skeleton joints of the human body. This thesis presents several efficient methods for representing actions from sequences of skeleton frames. One skeleton data-based system places the skeleton joints in a specific order and extracts the distances between joints as features; a multi-layer deep learning model is proposed to learn these features and recognize human actions.
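The joint-ordering and pairwise-distance idea described above can be sketched as follows. The joint count, ordering, and use of all unordered pairs are illustrative assumptions, not the thesis's exact configuration:

```python
import numpy as np

def pairwise_joint_distances(skeleton, order=None):
    """skeleton: (num_joints, 3) array of 3D joint coordinates.
    Optionally reorders the joints, then extracts all pairwise
    Euclidean distances as a flat per-frame feature vector."""
    if order is not None:
        skeleton = skeleton[order]
    n = skeleton.shape[0]
    # Upper-triangular indices enumerate each unordered joint pair once.
    i, j = np.triu_indices(n, k=1)
    return np.linalg.norm(skeleton[i] - skeleton[j], axis=1)

# Example: 25 joints (as in a Kinect v2 skeleton) -> 25*24/2 = 300 distances.
frame = np.random.rand(25, 3)
features = pairwise_joint_distances(frame)
print(features.shape)  # (300,)
```

Such per-frame vectors, stacked over time, form the kind of input sequence a multi-layer deep learning model can consume for action classification.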
Human gait is one of the most useful biometric features for human identification, and vision-based gait data allows humans to be identified unobtrusively. This thesis presents deep learning-based human identification systems using gait data in skeleton format. We present an efficient feature extraction method that captures the spatial and temporal features of human skeleton joints during walking, focusing specifically on the different gait events within the entire gait cycle. Deep learning models are developed to learn these features for accurate human identification. The developed models are evaluated on publicly available single- and multi-view gait datasets using various evaluation protocols and performance metrics.
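One minimal way to separate the spatial and temporal aspects of a gait cycle, in the spirit of the description above, is sketched below. The choice of joint 0 as a root/hip joint and the specific feature definitions are assumptions for illustration:

```python
import numpy as np

def gait_cycle_features(cycle):
    """cycle: (num_frames, num_joints, 3) skeleton sequence for one gait cycle.
    Spatial: per-frame distance of each joint from a root joint (pose shape).
    Temporal: frame-to-frame displacement of each joint (motion across events)."""
    root = cycle[:, :1, :]  # hypothetical root joint at index 0 (e.g. hip)
    spatial = np.linalg.norm(cycle - root, axis=2)           # (frames, joints)
    temporal = np.linalg.norm(np.diff(cycle, axis=0), axis=2)  # (frames-1, joints)
    return spatial, temporal

# Example: a 30-frame gait cycle with 25 joints.
cycle = np.random.rand(30, 25, 3)
spatial, temporal = gait_cycle_features(cycle)
print(spatial.shape, temporal.shape)  # (30, 25) (29, 25)
```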
In addition, multi-modal human action recognition and human identification systems are developed using skeleton and depth data. This work presents efficient image representations of human actions constructed from sequences of frames in skeleton and depth formats. Various deep learning models using CNNs, LSTMs, and advanced techniques such as attention are presented to extract and learn features from these image representations of actions. Another contribution presents a method that focuses on overlapping sub-actions of an action, in depth and skeleton formats, for action representation and feature extraction. An image representation of the gait cycle in skeleton and depth data is also proposed, along with a deep learning model. Multi-stream deep learning models are proposed to learn features from multi-modal data for human action recognition and human identification, and various score fusion operations are proposed to merge the results from the multiple streams to ensure efficient performance. The developed systems are evaluated on publicly available multi-modal datasets for human actions and human gait using standard evaluation protocols.
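Late score fusion of multiple streams, as mentioned above, can be sketched as follows. The specific fusion operations (weighted average and element-wise maximum) and the example scores are illustrative, not the thesis's exact operators:

```python
import numpy as np

def fuse_scores(score_lists, weights=None, op="mean"):
    """Late fusion of class-probability vectors from multiple streams.
    op: 'mean' (weighted average) or 'max' (element-wise maximum)."""
    scores = np.stack(score_lists)  # (num_streams, num_classes)
    if op == "max":
        return scores.max(axis=0)
    if weights is None:
        weights = np.ones(len(score_lists)) / len(score_lists)
    return np.average(scores, axis=0, weights=weights)

# Hypothetical softmax outputs from a skeleton stream and a depth stream.
skeleton_scores = np.array([0.7, 0.2, 0.1])
depth_scores    = np.array([0.4, 0.5, 0.1])
fused = fuse_scores([skeleton_scores, depth_scores])
print(fused)          # [0.55 0.35 0.1 ]
print(fused.argmax())  # 0
```

Weighted averaging lets a more reliable modality dominate, while max fusion keeps whichever stream is most confident per class; both are common, simple late-fusion baselines.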
Keywords
Attention, Deep learning, Depth data, Human action recognition