Comparison of low-dimension speech segment embeddings: Application to speaker diarization

dc.contributor.author	Chetupalli, S.R.
dc.contributor.author	Sreenivas, T.V.
dc.contributor.author	Gopalakrishnan, A.
dc.date.accessioned	2026-02-06T06:37:29Z
dc.date.issued	2019
dc.description.abstract	Segment clustering is a crucial step in unsupervised speaker diarization. Bottom-up approaches, such as hierarchical agglomerative clustering, are traditionally used for segment clustering. In this paper, we consider a top-down approach to clustering, in which a speaker-sensitive, low-dimensional representation of segments (speaker space) is obtained first, followed by Gaussian mixture model (GMM) based clustering. We explore three methods of obtaining the low-dimensional segment representation: (i) multi-dimensional scaling (MDS) based on segment-to-segment stochastic distances; (ii) traditional principal component analysis (PCA) and (iii) factor analysis (i-vectors), both of GMM mean super-vectors. We found that MDS-based embeddings give a better representation, and hence better diarization performance, than PCA and even i-vector embeddings. © 2019 IEEE.
dc.identifier.citation	25th National Conference on Communications, NCC 2019, 2019
dc.identifier.uri	https://doi.org/10.1109/NCC.2019.8732210
dc.identifier.uri	https://idr.nitk.ac.in/handle/123456789/31098
dc.publisher	Institute of Electrical and Electronics Engineers Inc.
dc.title	Comparison of low-dimension speech segment embeddings: Application to speaker diarization
