Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506
2 results
Search Results
Multi-Vehicle Tracking and Speed Estimation Model using Deep Learning (Association for Computing Machinery, 2022)
Prajwal, K.; Navaneeth, P.; Tharun, K.; Anand Kumar, M.A.
Speed estimation of vehicles is one of the prime applications of speed estimation of moving objects. The YOLOv5 model has proven to have very good accuracy in detecting moving objects in real time. The vehicles on the road are extracted from each frame of the video by running it through a custom YOLOv5 object detector. The YOLO model splits the frame into a grid, and each grid cell detects a vehicle within itself. An instance identifier tracks the vehicle across the frames. The tracking algorithm computes deep features for every bounding box and utilizes the similarities within the deep features to identify and track the object. The pixel-per-meter metric has to be adjusted based on perspective, after which the speed of the vehicle can be estimated. Finally, a comparison of our model metrics with the existing state-of-the-art models is provided. © 2022 ACM.

Language Detection in Overlapping Multilingual Speech: A Focus on Indian Languages (Institute of Electrical and Electronics Engineers Inc., 2025)
Kolsur, A.A.; Prajwal, K.; Vijayasenan, D.
The growing demand for technology capable of recognizing spoken languages and extracting information from real-world audio, especially in scenarios with overlapping speech, has become a significant focus of research due to its essential role in improving global connectivity and accessibility. In our paper, we focus on identifying languages present in audio files that consist of overlapping speech. We have focused our research particularly on Indian languages, as there is limited research on identifying low-resource languages in overlapping speech. Due to the lack of overlapping Indian speech data, we have synthesized a custom dataset from the VoxLingua107 dataset. Further, we have developed a novel solution that first separates the overlapped audio using a speaker separation model and then uses a language recognition model to detect the languages present in the separated audio. We have compared the results obtained through our method with the current state-of-the-art model, Whisper, and concluded that our solution significantly outperforms the Whisper model. The results highlight the potential for significant improvements in multilingual communication systems and speech processing applications, paving the way for more inclusive and accurate language recognition technologies. © 2025 IEEE.
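
The separate-then-identify idea in the second abstract can be illustrated with a minimal sketch. The abstract does not say how the overlapping clips were synthesized or which separation and recognition models were used, so everything here is an assumption: `mix_overlapping` is a hypothetical sample-wise mixer, and `separate` and `identify` are placeholder callables standing in for a speaker separation model and a language recognition model.

```python
from typing import Callable, List, Sequence

Waveform = List[float]


def mix_overlapping(a: Sequence[float], b: Sequence[float],
                    gain_b: float = 1.0) -> Waveform:
    """Sum two mono waveforms sample by sample (assumed mixing scheme).

    The shorter clip is zero-padded; the result is rescaled only if the
    sum would leave the [-1, 1] range.
    """
    n = max(len(a), len(b))
    mixed = []
    for i in range(n):
        sa = a[i] if i < len(a) else 0.0
        sb = b[i] if i < len(b) else 0.0
        mixed.append(sa + gain_b * sb)
    peak = max((abs(x) for x in mixed), default=0.0)
    if peak > 1.0:
        mixed = [x / peak for x in mixed]
    return mixed


def detect_languages(
    mixture: Waveform,
    separate: Callable[[Waveform], List[Waveform]],  # hypothetical separation model
    identify: Callable[[Waveform], str],             # hypothetical language-ID model
) -> List[str]:
    """Separate the mixture first, then label each separated stream."""
    streams = separate(mixture)
    # A set removes duplicates when both speakers use the same language.
    return sorted({identify(stream) for stream in streams})
```

In practice the two callables would wrap trained models; the sketch only fixes the order of operations (separation before recognition) that the abstract describes.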
