Faculty Publications
Permanent URI for this community: https://idr.nitk.ac.in/handle/123456789/18736
Publications by NITK Faculty
Search Results
2 results
Item
Long Short Term Memory Networks for Lexical Normalization of Tweets (Institute of Electrical and Electronics Engineers Inc., 2021) Nayak, P.; Praneeth, G.; Kulkarni, R.; Anand Kumar, M.
Lexical normalization is the task of converting non-standard text into standard text that is more readable and universal. Data obtained from social media sites and tweets often contain much noise and use non-canonical sentence structures such as non-standard abbreviations, skipped words, spelling errors, etc. Hence, such data need to be appropriately processed before they can be used. This processing can be done by lexical normalization, which reduces randomness and converts the sentence structure to a predefined standard. Lexical normalization can therefore help improve the performance of systems that take user-generated text as input. There are several ways to perform lexical normalization, such as dictionary lookups and most-frequent replacements. However, we aim to explore the domain of deep learning to find approaches that can be used to normalize texts lexically. © 2021 IEEE.

Item
OntoPred: An Efficient Attention-Based Approach for Protein Function Prediction Using Skip-Gram Features (Springer, 2023) Chintawar, S.; Kulkarni, R.; Patil, N.
Proteins play an essential role in performing many cellular functions in organisms and are responsible for various biochemical activities. The main objective of protein function prediction is to annotate protein sequences with their correct functions, which are represented by Gene Ontology (GO) terms. Recently, the number of newly released proteins has been increasing. As the experimental approach to annotating these proteins is very time-consuming, the need for faster annotation techniques has arisen. Approaches using deep learning and machine learning have been shown to be beneficial in this regard.
In this research, we propose a novel approach, OntoPred, for the task of function prediction, which takes standalone protein sequences and annotates them with their corresponding functions (GO terms). The core idea is to use an attention mechanism to identify which parts of a sequence influence the presence of a function. The model uses a combination of n-gram and skip-gram features extracted from the sequences. The proposed model was evaluated on multiple datasets, including the CAFA3 evaluation benchmark. The maximal F1 scores obtained on the molecular function (MF), biological process (BP), and cellular component (CC) aspects of the CAFA3 evaluation benchmark are 0.494, 0.480, and 0.637, respectively. © 2023, The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd.
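The OntoPred abstract mentions n-gram and skip-gram features extracted from protein sequences but does not spell out the extraction itself. A minimal sketch of what such features could look like, assuming character-level n-grams over the amino-acid alphabet and a one-residue gap for skip-grams (the exact n and gap used by OntoPred are not stated in the abstract):

```python
def ngrams(seq, n=3):
    """All contiguous substrings of length n (character n-grams)."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

def skipgrams(seq, n=3, skip=1):
    """Length-n subsequences that skip `skip` residues between characters.

    For n=3 and skip=1, this collects residues at positions i, i+2, i+4,
    capturing longer-range composition than contiguous n-grams.
    """
    step = skip + 1
    span = (n - 1) * step + 1  # window covered by one skip-gram
    return [seq[i:i + span:step] for i in range(len(seq) - span + 1)]

seq = "MKVLAT"  # illustrative amino-acid fragment
print(ngrams(seq))     # ['MKV', 'KVL', 'VLA', 'LAT']
print(skipgrams(seq))  # ['MVA', 'KLT']
```

In a full pipeline, these substrings would typically be mapped to embedding vectors and fed to the attention-based model; the sketch covers only the feature-extraction step.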

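The first abstract contrasts deep-learning approaches with simpler baselines such as dictionary lookups and most-frequent replacements. A minimal sketch of the dictionary-lookup baseline, with an illustrative (hypothetical) replacement table rather than one learned from data:

```python
# Illustrative lookup table; a real system would build these mappings
# from an annotated corpus of tweet/normalization pairs.
NORM_DICT = {"u": "you", "gr8": "great", "pls": "please", "2moro": "tomorrow"}

def normalize(tweet):
    """Replace each token found in the dictionary; leave others unchanged."""
    return " ".join(NORM_DICT.get(tok.lower(), tok) for tok in tweet.split())

print(normalize("pls call me 2moro"))  # please call me tomorrow
```

This baseline cannot handle out-of-dictionary variants or context-dependent replacements, which is the gap the LSTM-based approach in the paper aims to address.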