Constructing an enriched domain taxonomy for Hindi using word embeddings

Date

2018

Authors

Keshava, V.
Avvara, P.
Sowmya, Kamath S.
Geetha, V.

Abstract

Domain-specific taxonomies are a valuable resource because they support information retrieval activities such as browsing, searching, recommendation and personalization. Such taxonomies can bridge the gap between users' lack of domain-specific querying knowledge and the actual content. For multilingual content, taxonomies can play a pivotal role in boosting search performance across language barriers. In this paper, a domain-agnostic framework is proposed for building an evolving, domain-specific taxonomy for Hindi, given a set of well-organized data points. The approach is intended for designing a hierarchical taxonomy enriched with synonyms and other morphological variants, using WordNet and Word2vec models respectively. The hierarchical structure acts as a base that binds the taxonomy to a given domain, and the enrichment can improve taxonomy coverage within that domain. The focus is also on building a taxonomy that can self-evolve over time, with high precision and recall and minimal manual effort. © 2017 IEEE.

Citation

Proceedings of the 2017 International Conference on Asian Language Processing, IALP 2017, 2018, Vol.2018-January, pp.127-130
