Browsing by Author "Sreenivas, T.V."
Now showing 1 - 6 of 6
- Results Per Page
- Sort Options
Item Comparison of low-dimension speech segment embeddings: Application to speaker diarization(2019) Chetupalli, S.R.; Sreenivas, T.V.; Gopalakrishnan, A.Segment clustering is a crucial step in unsupervised speaker diarization. Bottom-up approaches, such as, hierarchical agglomerative clustering technique are used traditionally for segment clustering. In this paper, we consider the top-down approach to clustering, in which a speaker sensitive, low-dimensional representation of segments (speaker space) is obtained first, followed by Gaussian mixture model (GMM) based clustering. We explore three methods of obtaining the low dimension segment representation: (i) multi-dimensional scaling (MDS) based on segment to segment stochastic distances; (ii) traditional principal component analysis (PCA), and (iii) factor analysis (i-vectors), of GMM mean super-vectors. We found that, MDS based embeddings result in better representation and hence result in better diarization performance compared to PCA and even i-vector embeddings. � 2019 IEEE.Item Comparison of low-dimension speech segment embeddings: Application to speaker diarization(Institute of Electrical and Electronics Engineers Inc., 2019) Chetupalli, S.R.; Sreenivas, T.V.; Gopalakrishnan, A.Segment clustering is a crucial step in unsupervised speaker diarization. Bottom-up approaches, such as, hierarchical agglomerative clustering technique are used traditionally for segment clustering. In this paper, we consider the top-down approach to clustering, in which a speaker sensitive, low-dimensional representation of segments (speaker space) is obtained first, followed by Gaussian mixture model (GMM) based clustering. We explore three methods of obtaining the low dimension segment representation: (i) multi-dimensional scaling (MDS) based on segment to segment stochastic distances; (ii) traditional principal component analysis (PCA), and (iii) factor analysis (i-vectors), of GMM mean super-vectors. We found that, MDS based embeddings result in better representation and hence result in better diarization performance compared to PCA and even i-vector embeddings. © 2019 IEEE.Item Feature selection and model optimization for semi-supervised speaker spotting(2016) Chetupalli, S.R.; Gopalakrishnan, A.; Sreenivas, T.V.We explore, experimentally, feature selection and optimization of stochastic model parameters for the problem of speaker spotting. Based on an initially identified segment of speech of a speaker, an iterative model refinement method is developed along with a latent variable mixture model so that segments of the same speaker are identified in a long speech record. It is found that a GMM with moderate number of mixtures is better suited for the task than a large number mixture model as used in speaker identification. Similarly, a PCA based low-dimensional projection of MFCC based feature vector provides better performance. We show that about 6 seconds of initially identified speaker data is sufficient to achieve > 90% performance of speaker segment identification. � 2016 IEEE.Item Feature selection and model optimization for semi-supervised speaker spotting(European Signal Processing Conference, EUSIPCO, 2016) Chetupalli, S.R.; Gopalakrishnan, A.; Sreenivas, T.V.We explore, experimentally, feature selection and optimization of stochastic model parameters for the problem of speaker spotting. Based on an initially identified segment of speech of a speaker, an iterative model refinement method is developed along with a latent variable mixture model so that segments of the same speaker are identified in a long speech record. It is found that a GMM with moderate number of mixtures is better suited for the task than a large number mixture model as used in speaker identification. Similarly, a PCA based low-dimensional projection of MFCC based feature vector provides better performance. We show that about 6 seconds of initially identified speaker data is sufficient to achieve > 90% performance of speaker segment identification. © 2016 IEEE.Item Low complexity bit allocation algorithms for MP3/AAC encoding(2008) Nithin, S.; Suresh, K.; Sreenivas, T.V.We have developed two reduced complexity bit-allocation algorithms for MP3/AAC based audio encoding, which can be useful at low bit-rates. One algorithm derives optimum bit-allocation using constrained optimization of weighted noise-to-mask ratio and the second algorithm uses decoupled iterations for distortion control and rate control, with convergence criteria. MUSHRA based evaluation indicated that the new algorithm would be comparable to AAC but requiring only about 1/10 th the complexity.Item Low complexity bit allocation algorithms for MP3/AAC encoding(2008) Nithin, S.; Suresh, K.; Sreenivas, T.V.We have developed two reduced complexity bit-allocation algorithms for MP3/AAC based audio encoding, which can be useful at low bit-rates. One algorithm derives optimum bit-allocation using constrained optimization of weighted noise-to-mask ratio and the second algorithm uses decoupled iterations for distortion control and rate control, with convergence criteria. MUSHRA based evaluation indicated that the new algorithm would be comparable to AAC but requiring only about 1/10 th the complexity.
