Faculty Publications

Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736

Publications by NITK Faculty

  • Item
    Estimating multiple physical parameters from speech data
    (IEEE Computer Society, 2016) Kalluri, S.B.; Vijayakumar, A.; Vijayasenan, D.; Singh, R.
    In this work, we explore the prediction of several physical parameters from speech data. In addition to the conventional height and weight parameters, we aim to predict people's shoulder size and waist size. A dataset with this information was created from 207 volunteers. A bag-of-words representation based on the log magnitude spectrum is used as the feature set, and support vector regression predicts the physical parameters from this representation. The system achieves a root mean square error (RMSE) of 6.6 cm for height, 2.6 cm for shoulder size, 7.1 cm for waist size, and 8.9 kg for weight. The height estimation results are on par with state-of-the-art results. © 2016 IEEE.
  • Item
    Robust features for automatic estimation of physical parameters from speech
    (Institute of Electrical and Electronics Engineers Inc., 2017) Kalluri, K.S.; Vijayasenan, D.
    Estimating a speaker's physical parameters, such as height, weight, and shoulder size, can assist voice forensics by providing additional knowledge about the speaker. In this work, statistics of the components of a background Gaussian mixture model (GMM) are employed as features for estimating the physical parameters. Compared with our earlier attempt based on a bag-of-words representation, these features improve height and shoulder size estimation. The robustness of the features is validated using two different training subsets containing different languages. © 2017 IEEE.
  • Item
    NISP: A multi-lingual multi-accent dataset for speaker profiling
    (Institute of Electrical and Electronics Engineers Inc., 2021) Kalluri, S.B.; Vijayasenan, D.; Ganapathy, S.; Rajan, M.; Krishnan, P.
    Many commercial and forensic applications of speech require extracting information about speaker characteristics, which falls into the broad category of speaker profiling. The characteristics needed for profiling include physical traits such as height, age, and gender, along with the speaker's native language. Many available datasets contain only partial information for speaker profiling. In this paper, we attempt to overcome this limitation by developing a new dataset with speech data from five Indian languages along with English. Metadata for speaker profiling applications, such as linguistic information, regional information, and the speaker's physical characteristics, were also collected. We call this dataset the NITK-IISc Multilingual Multi-accent Speaker Profiling (NISP) dataset. This paper provides a description of the dataset, its potential applications, and baseline speaker profiling results on it. © 2021 IEEE.
  • Item
    Automatic speaker profiling from short duration speech data
    (Elsevier B.V., 2020) Kalluri, S.B.; Vijayasenan, D.; Ganapathy, S.
    Many paralinguistic applications of speech require extracting information about speaker characteristics from as little speech data as possible. In this work, we explore estimating multiple physical parameters of a speaker from short-duration speech in a multilingual setting. We explore different feature streams for age and body build estimation derived from the speech spectrum at different resolutions, namely the short-term log-mel spectrogram, formant features, and harmonic features. The statistics of these features over the speech recording are used to train a support vector regression model for speaker age and body build estimation. Experiments on the TIMIT dataset show that each individual feature stream achieves results that outperform previously published results in height and age estimation. Furthermore, the estimation errors of these feature streams are complementary, so combining their estimates further improves the results. On short audio snippets, the combined system achieves a mean absolute error (MAE) of 5.2 cm and 4.8 cm in height estimation for male and female speakers, respectively. Similarly, the age estimation MAE is 5.2 years and 5.6 years for male and female speakers, respectively. We also extend the same approach to other body build parameters such as shoulder width, waist size, and weight, in addition to height, on a dataset we collected for speaker profiling. A duration analysis of the proposed scheme shows that state-of-the-art results can be achieved with only around 1–2 s of speech data. To the best of our knowledge, this is the first attempt to use a common set of features for estimating the different physical traits of a speaker. © 2020 Elsevier B.V.
  • Item
    Influence of V2O5 addition as a dopant and dispersed content in barium borophosphate glass on structural and optical properties
    (Elsevier Ltd, 2024) Rashmi, I.; Ingle, A.; Raghuvanshi, V.; Shashikala, H.D.; Nagaraja, H.S.
    Barium borophosphate glass systems with molar compositions 40P2O5–25B2O3–(35−x)BaO–xV2O5 and 40P2O5–25B2O3–35BaO–xV2O5 (x = 0, 1, 3, 5 mol%) were synthesized using the melt-quenching method. A comprehensive investigation of the structural and optical properties was conducted to compare the effects of V2O5 as a dopant and as an addition to the glass matrix. Physical parameters were assessed by measuring density. The influence of V2O5 introduction on vibrational modes was studied through Fourier-transform infrared (FTIR) and Raman spectroscopy. UV–visible absorbance analysis revealed multiple valence states of vanadium (V3+, V4+, and V5+). The reduction in bandgap was determined from Tauc plots, while refractive index measurements showed how the index varies with V2O5 content. Photoluminescence (PL) spectroscopy was employed to explore intrinsic defects within the glass matrix and the impact of V2O5 on the emission spectra. Furthermore, the CIE chromaticity coordinates of the synthesized samples fell in both the white and blue regions, suggesting potential application in display devices. Notably, the glass doped with 1 mol% V2O5 showed CIE coordinates x = 0.288 and y = 0.386, closely matching the white region as well as the bandpass filter. The introduction of transition metal oxide dopants into borophosphate glass yielded exceptional emission properties. The ability to tune their optical properties makes these glasses promising for applications such as optical filters and displays. © 2024 Elsevier Ltd and Techna Group S.r.l.
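For the 2016 entry "Estimating multiple physical parameters from speech data", the following is a minimal sketch of a bag-of-words representation built on the log magnitude spectrum, together with the RMSE metric the abstract reports. It assumes a codebook has already been learned (for example by k-means on training frames, which the abstract does not specify) and uses a naive DFT for clarity. All function names are hypothetical, and the support vector regression step is omitted; a library regressor such as scikit-learn's `SVR` could map the histograms to height.

```python
import math
from typing import List, Sequence


def log_magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Log magnitude of a naive DFT of one frame (bins 0..N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        # A small floor avoids log(0) for silent frames.
        spectrum.append(math.log(math.hypot(re, im) + 1e-10))
    return spectrum


def nearest_codeword(vector: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codeword closest to `vector` in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vector, codebook[i])),
    )


def bag_of_words(frames: Sequence[Sequence[float]], codebook: Sequence[Sequence[float]]) -> List[float]:
    """Normalized histogram of codeword assignments over all frames of a recording."""
    counts = [0] * len(codebook)
    for frame in frames:
        counts[nearest_codeword(log_magnitude_spectrum(frame), codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error between predictions and ground truth."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Toy usage: a 2-word codebook over 3 spectral bins (frame length 4).
codebook = [[0.0, 0.0, 0.0], [-23.0, -23.0, -23.0]]
frames = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
print(bag_of_words(frames, codebook))             # roughly [0.333, 0.667]
print(rmse([170.0, 160.0], [166.0, 164.0]))       # 4.0
```

The histogram is length-normalized so recordings of different durations yield comparable feature vectors; whether the original work normalizes this way is an assumption.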
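For the 2017 entry "Robust features for automatic estimation of physical parameters from speech", the abstract says only that statistics of the background GMM components serve as features. One common reading, assumed here and not confirmed by the source, is the zeroth-order (occupancy) and first-order (posterior-weighted mean) statistics of a diagonal-covariance GMM, stacked into one vector per recording. The GMM parameters are hypothetical inputs, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v for xi, m, v in zip(x, mean, var)
    )


def component_posteriors(x, weights, means, variances) -> List[float]:
    """Posterior of each GMM component for one frame (log-sum-exp for stability)."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def gmm_statistics(frames, weights, means, variances) -> List[float]:
    """Stack [occupancy_1..occupancy_C, mean_1 (D values), ..., mean_C (D values)].

    occupancy_c: average posterior of component c over all frames.
    mean_c: posterior-weighted mean of the frames for component c.
    """
    num_comp, dim = len(weights), len(means[0])
    zeroth = [0.0] * num_comp
    first = [[0.0] * dim for _ in range(num_comp)]
    for x in frames:
        gamma = component_posteriors(x, weights, means, variances)
        for c in range(num_comp):
            zeroth[c] += gamma[c]
            for d in range(dim):
                first[c][d] += gamma[c] * x[d]
    features = [z / len(frames) for z in zeroth]
    for c in range(num_comp):
        if zeroth[c] > 1e-12:
            features.extend(f / zeroth[c] for f in first[c])
        else:
            # Component essentially unused: fall back to its prior mean.
            features.extend(means[c])
    return features


# Toy usage: a 2-component, 1-D background GMM.
feats = gmm_statistics([[0.0], [0.2], [10.0]], [0.5, 0.5], [[0.0], [10.0]], [[1.0], [1.0]])
print(feats)  # approximately [0.667, 0.333, 0.1, 10.0]
```

The resulting fixed-length vector (C + C·D values) can be fed to a regressor regardless of recording length.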
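For the 2020 entry "Automatic speaker profiling from short duration speech data", the sketch below shows three ingredients the abstract names: recording-level statistics of frame features, the MAE metric, and combining estimates from several feature streams. The abstract does not say how estimates are combined; a (weighted) average is assumed here purely for illustration, and the heights are toy numbers, not results from the paper.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence


def utterance_statistics(frames: Sequence[Sequence[float]]) -> List[float]:
    """Per-dimension mean and population standard deviation over all frames."""
    dims = list(zip(*frames))
    return [mean(d) for d in dims] + [pstdev(d) for d in dims]


def mae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def fuse_estimates(
    stream_estimates: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Weighted average of per-stream estimates for each speaker (equal weights by default)."""
    if weights is None:
        weights = [1.0] * len(stream_estimates)
    total = sum(weights)
    num_speakers = len(stream_estimates[0])
    return [
        sum(w * s[i] for w, s in zip(weights, stream_estimates)) / total
        for i in range(num_speakers)
    ]


# Toy example of complementary errors (illustrative numbers only).
truth = [170.0, 160.0]
stream_a = [174.0, 156.0]  # errors +4, -4
stream_b = [166.0, 164.0]  # errors -4, +4
print(mae(stream_a, truth), mae(stream_b, truth))        # 4.0 4.0
print(mae(fuse_estimates([stream_a, stream_b]), truth))  # 0.0
```

When the streams err in opposite directions for the same speaker, averaging cancels part of the error, which is the intuition behind the abstract's remark that complementary errors allow the combination to improve results.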
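For the 2024 glass entry, the bandgap was obtained from Tauc plots. In the standard Tauc relation, (αhν)^(1/n) is linear in photon energy hν above the gap, and the optical bandgap is where the fitted line extrapolates to zero. The minimal sketch below assumes the absorption coefficient α is already computed and that the linear fitting window is chosen by hand; the exponent n depends on the transition type, which the abstract does not state, and the synthetic data are not from the paper.

```python
from typing import Optional, Sequence, Tuple


def tauc_bandgap(
    photon_energy_ev: Sequence[float],
    alpha: Sequence[float],
    n: float = 2.0,
    fit_range: Optional[Tuple[float, float]] = None,
) -> float:
    """Estimate the optical bandgap (eV) by linear extrapolation of a Tauc plot.

    Plots y = (alpha * E) ** (1 / n) against E and fits a least-squares line over
    the chosen energy window; the bandgap is where that line crosses y = 0.
    n = 2 for indirect allowed transitions, n = 1/2 for direct allowed ones.
    """
    if fit_range is None:
        fit_range = (min(photon_energy_ev), max(photon_energy_ev))
    lo, hi = fit_range
    points = [(e, (a * e) ** (1.0 / n)) for e, a in zip(photon_energy_ev, alpha) if lo <= e <= hi]
    if len(points) < 2:
        raise ValueError("need at least two points in the fit range")
    m = len(points)
    x_mean = sum(x for x, _ in points) / m
    y_mean = sum(y for _, y in points) / m
    slope = sum((x - x_mean) * (y - y_mean) for x, y in points) / sum(
        (x - x_mean) ** 2 for x, _ in points
    )
    intercept = y_mean - slope * x_mean
    # The line y = slope * E + intercept crosses y = 0 at E = -intercept / slope.
    return -intercept / slope


# Synthetic check: build data with (alpha * E) ** 0.5 = 1.5 * (E - 3.0) above 3.0 eV.
energies = [2.8, 3.2, 3.4, 3.6, 3.8]
alpha = [0.0] + [(1.5 * (e - 3.0)) ** 2 / e for e in energies[1:]]
print(tauc_bandgap(energies, alpha, n=2.0, fit_range=(3.1, 4.0)))  # about 3.0
```

Restricting the fit to the steep linear region matters: including sub-gap points (such as 2.8 eV here) would bend the fit and bias the extrapolated bandgap.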