Faculty Publications
Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736
Publications by NITK Faculty
Search Results
4 results
Item: A novel real-time face detection system using modified affine transformation and Haar cascades (Springer Verlag, 2019). Sharma, R.; Ashwin, T.S.; Guddeti, R.M.R.
Human face detection is an important problem in computer vision. Several approaches detect faces in a given image frame, but most fail on faces that are tilted, occluded, or captured under varying illumination. In this paper, we propose a novel real-time face detection system that detects faces that are tilted, occluded, under varying illumination, or in other difficult poses. The proposed system is a desktop application with a user interface that not only collects images from a web camera but also detects the faces in them using a Haar cascade classifier built on Modified Census Transform features. The problem with the cascade classifier is that it does not detect tilted or occluded faces under varying illumination. To overcome this, we propose a system combining Modified Affine Transformation with Viola-Jones. Experimental results demonstrate that the proposed face detection system outperforms the Viola-Jones method by about 6 percentage points (99.7% accuracy for the proposed system compared to 93.5% for Viola-Jones) on three datasets: FDDB, YALE, and the "Google top 25 'tilted face'" image dataset. © Springer Nature Singapore Pte Ltd. 2019

Item: Multimodal group activity state detection for classroom response system using convolutional neural networks (Springer Verlag, 2019). Sebastian, A.G.; Singh, S.; Manikanta, P.B.T.; Ashwin, T.S.; Guddeti, R.M.R.
Human-computer interaction is a crucial and emerging field in computer science, because computers are taking over many service roles from humans and therefore need to interact with people the way people interact with one another. When humans talk to each other, they gain feedback from how the other person responds non-verbally. Since computers now interact with humans, they need to detect these facial cues and adjust their services based on this feedback. Our proposed method builds a multimodal group activity state detection system for a classroom response system that recognizes the learning behavior of a classroom and provides effective feedback and inputs to the teacher. The key challenges are detecting and analyzing as many students as possible for an unbiased evaluation of the students' mood, and classifying them into three defined activity states: active, passive, and inactive. © Springer Nature Singapore Pte Ltd. 2019

Item: Face Detection and Recognition Using OpenCV and Vision Transformer (Institute of Electrical and Electronics Engineers Inc., 2023). Kumar, K.; Pingale, N.; Rudra, B.
Face recognition technology is vital in the real world, with diverse applications in security, law enforcement, personalization, healthcare, and education. Face recognition systems use biometric features such as facial landmarks, texture, and shape to identify and verify individuals. The suggested approach employs a transformer-based architecture that relies solely on self-attention and does not use convolutional layers. This design choice lets the model be trained efficiently with minimal computational power and fewer parameters than a CNN. The Vision Transformer (ViT) has been highly successful across computer vision tasks, making it a state-of-the-art approach, and given its performance we explore whether ViT can enhance the accuracy of face recognition. In this paper, we show that ViT can be a useful technique for facial recognition. Since there was no predefined dataset for face recognition, a PCI dataset was built for this investigation. Along with the PCI dataset, two more well-known datasets, AT&T and 5-Celebrity, were used to examine performance. Our model showed that ViT could identify human faces on the PCI dataset with a 99% accuracy rate and perform much better than other face recognition algorithms such as Eigenface, FisherFace, and LBPH. © 2023 IEEE

Item: Video summarization and captioning using dynamic mode decomposition for surveillance (Springer Science and Business Media B.V., 2021). Radarapu, R.; Gopal, A.S.S.; Nh, M.; Anand Kumar, M.
Video surveillance has become a major tool in security maintenance, but reviewing footage in playback to detect motion is tedious because motion occurs in only a short part of the video. Much time is wasted analyzing the footage, and it is hard to pinpoint the exact frame where a transition occurs, so a summary video that captures any changes or motion is needed. With advancements in image processing using OpenCV and deep learning, video summarization is no longer impractical. Captions are generated for the summarized videos using an encoder-decoder captioning model; with large, well-labeled video datasets such as Common Objects in Context and Microsoft Video Description, video captioning is a feasible task. Since the arrival of long short-term memory (LSTM) networks, encoder-decoder models have been used extensively to generate text from visual features, and attention mechanisms are widely applied in the decoder for video captioning. Keyframes are obtained from very long videos using methods such as dynamic mode decomposition, an algorithm from fluid dynamics, and OpenCV's absdiff(). We propose these tools for motion detection and video/image captioning on the very long videos common in surveillance. © 2021, Bharati Vidyapeeth's Institute of Computer Applications and Management
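
The surveillance-summarization item above gates keyframes on frame-to-frame change, citing OpenCV's absdiff(). As a rough numpy-only sketch of that gating idea (the function name, thresholds, and synthetic frames here are illustrative, not taken from the paper):

```python
import numpy as np

def motion_frames(frames, threshold=25, min_changed_ratio=0.01):
    """Return indices of frames whose absolute difference from the
    previous frame exceeds a pixel threshold over enough of the image,
    the same |a - b| gating that cv2.absdiff enables."""
    keep = []
    prev = None
    for i, frame in enumerate(frames):
        if prev is not None:
            # widen dtype so uint8 subtraction cannot wrap around
            diff = np.abs(frame.astype(np.int16) - prev.astype(np.int16))
            if (diff > threshold).mean() >= min_changed_ratio:
                keep.append(i)
        prev = frame
    return keep

# Synthetic clip: four identical dark frames, then one where a bright
# region appears, standing in for motion entering the scene.
frames = [np.zeros((64, 64), dtype=np.uint8) for _ in range(5)]
frames[4][10:30, 10:30] = 255
print(motion_frames(frames))  # → [4]
```

Only the frames returned here would be passed on to keyframe selection and captioning; the static stretches are dropped.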

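The first item's strategy for tilted faces — warp the frame so the face becomes upright before running an upright-trained Haar cascade — rests on a 2x3 affine rotation matrix. A minimal numpy sketch of that matrix (same form as OpenCV's cv2.getRotationMatrix2D returns; the helper name and the candidate angles are my own, not the paper's):

```python
import numpy as np

def rotation_matrix(center, angle_deg, scale=1.0):
    """2x3 affine matrix rotating image coordinates about `center`,
    in the form OpenCV's cv2.getRotationMatrix2D produces."""
    a = scale * np.cos(np.radians(angle_deg))
    b = scale * np.sin(np.radians(angle_deg))
    cx, cy = center
    return np.array([[a,  b, (1 - a) * cx - b * cy],
                     [-b, a, b * cx + (1 - a) * cy]])

# Sketch of the rotate-then-detect loop (OpenCV calls shown as
# comments, since they need a real frame and cascade file):
#   for angle in (-30, 0, 30):
#       M = rotation_matrix((w / 2, h / 2), angle)
#       rotated = cv2.warpAffine(frame, M, (w, h))
#       faces = cascade.detectMultiScale(rotated)
#       # map any detections back through the inverse transform

M = rotation_matrix((0, 0), 0)   # zero angle: identity, no translation
```

Applying the matrix to a homogeneous point `[x, y, 1]` gives the rotated pixel location, which is how detections on the warped frame are mapped back to the original.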