Conference Papers
Permanent URI for this collection: https://idr.nitk.ac.in/handle/123456789/28506
Search Results (5 items)
Item: Factor analysis methods for joint speaker verification and spoof detection (Institute of Electrical and Electronics Engineers Inc., 2017)
Dhanush, B.K.; Suparna, S.; Aarthy, R.; Likhita, C.; Shashank, D.; Harish, H.; Ganapathy, S.
The performance of a speaker verification system is severely degraded by spoofing attacks generated from artificial speech synthesizers. Recently, several approaches have been proposed for classifying natural and synthetic speech (spoof detection), which can be used in conjunction with a speaker verification system. In this paper, we develop a joint modelling approach that can detect the presence of spoofing attacks while also performing the speaker verification task. We propose a factor modelling approach in which the spoof variability subspace and the speaker variability subspace are jointly trained. The lower-dimensional projections in these subspaces are used for both the speaker verification and spoof detection tasks. We also investigate the benefits of linear discriminant analysis (LDA), widely used in speaker recognition, for the spoof detection task. Several experiments are performed using the spoofing and anti-spoofing (SAS) database. For speaker verification, we compare the performance of the proposed method with a baseline that fuses a conventional speaker verification system and a spoof detection system. In these experiments, the proposed approach provides substantial improvements for spoof detection (a relative improvement of 20% in EER over the baseline) as well as for speaker verification under spoofing conditions (a relative improvement of 40% in EER over the baseline). © 2017 IEEE.

Item: A Deep Neural Network Based End to End Model for Joint Height and Age Estimation from Short Duration Speech (Institute of Electrical and Electronics Engineers Inc., 2019)
Kalluri, S.B.; Vijayasenan, D.; Ganapathy, S.
Automatic height and age prediction of a speaker has a wide variety of applications in speaker profiling, forensics, etc.
Often in such applications, only a few seconds of speech data are available to reliably estimate the speaker parameters. Traditionally, age and height were predicted separately using different estimation algorithms. In this work, we propose a unified DNN architecture to predict both the height and age of a speaker from short durations of speech. A novel initialization scheme for the deep neural architecture is introduced that avoids the need for a large training dataset. We evaluate the system on the TIMIT dataset, where the mean duration of the speech segments is around 2.5 s. The DNN system improves the age RMSE by at least 0.6 years compared to a conventional support vector regression system trained on Gaussian mixture model mean supervectors. The system achieves an RMSE of 6.85 cm and 6.29 cm for male and female height prediction, respectively. For age estimation, the RMSEs are 7.60 and 8.63 years for male and female speakers, respectively. Analysis of shorter speech segments reveals that even with 1 second of speech input, the performance degradation is at most 3% compared to the full-duration speech files. © 2019 IEEE.

Item: NISP: A Multi-lingual Multi-accent Dataset for Speaker Profiling (Institute of Electrical and Electronics Engineers Inc., 2021)
Kalluri, S.B.; Vijayasenan, D.; Ganapathy, S.; Rajan, M.; Krishnan, P.
Many commercial and forensic applications of speech demand the extraction of information about speaker characteristics, which falls under the broad category of speaker profiling. The speaker characteristics needed for profiling include physical traits such as the height, age, and gender of the speaker, along with the speaker's native language. Many of the available datasets have only partial information for speaker profiling. In this paper, we attempt to overcome this limitation by developing a new dataset that contains speech data from five different Indian languages along with English.
Metadata for speaker profiling applications, such as linguistic information, regional information, and physical characteristics of each speaker, is also collected. We call this dataset the NITK-IISc Multilingual Multi-accent Speaker Profiling (NISP) dataset. A description of the dataset, its potential applications, and baseline results for speaker profiling on this dataset are provided in this paper. © 2021 IEEE.

Item: The DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments (International Speech Communication Association, 2023)
Baghel, S.; Ramoji, S.; Sidharth; Ranjana, H.; Singh, P.; Jain, S.; Chowdhuri, P.R.; Kulkarni, K.; Padhi, S.; Vijayasenan, D.; Ganapathy, S.
In multilingual societies, social conversations often involve code-mixed speech. Current speech technology may not be well equipped to extract information from multilingual multi-speaker conversations. The DISPLACE challenge entails a first-of-its-kind task to benchmark speaker and language diarization on the same data, as the data contains multi-speaker conversations in multilingual code-mixed speech. The challenge attempts to highlight outstanding issues in speaker diarization (SD) in multilingual settings with code-mixing. Further, language diarization (LD) in multi-speaker settings also introduces new challenges, where the system has to disambiguate speaker switches from code switches. For this challenge, a natural multilingual, multi-speaker conversational dataset is distributed for development and evaluation purposes. The systems are evaluated on single-channel far-field recordings. We also release a baseline system and report the highlights of the system submissions. © 2023 International Speech Communication Association.
All rights reserved.

Item: The Second DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments (International Speech Communication Association, 2024)
Kalluri, S.B.; Singh, P.; Roy Chowdhuri, P.; Kulkarni, A.; Baghel, S.; Hegde, P.; Sontakke, S.; Deepak, K.T.; Mahadeva Prasanna, S.R.; Vijayasenan, D.; Ganapathy, S.
The DIarization of SPeaker and LAnguage in Conversational Environments (DISPLACE) 2024 challenge is the second in the series of DISPLACE challenges, comprising speaker diarization (SD) and language diarization (LD) tasks on a challenging multilingual conversational speech dataset. In the DISPLACE 2024 challenge, we also introduced an automatic speech recognition (ASR) task on this dataset. The dataset, containing 158 hours of speech and consisting of both supervised and unsupervised mono-channel far-field recordings, was released for the LD and SD tracks. Further, 12 hours of close-field mono-channel recordings were provided for the ASR track, which was conducted on 5 Indian languages. The details of the dataset, the baseline systems, and the leaderboard results are highlighted in this paper. We also compare our baseline models and the teams' performances on the DISPLACE 2023 evaluation data to emphasize the advancements made in this second version of the challenge. © 2024 International Speech Communication Association. All rights reserved.
