Transformer assisted framework for automated multi-class abnormality classification for video capsule endoscopy

Date

2025

Publisher

Institute of Physics

Abstract

Video Capsule Endoscopy (VCE) is a minimally invasive imaging technique used for diagnosing gastrointestinal (GI) disorders, enabling detailed visualization of the digestive tract. This study introduces CASCRNet, a novel and parameter-efficient deep learning architecture designed to enhance interpretability and computational efficiency in multi-class abnormality classification for VCE. CASCRNet integrates focal loss, Atrous Spatial Pyramid Pooling, and Shared Channel Residual blocks to improve feature extraction and address class imbalance. In addition to CASCRNet, this study conducts a comprehensive evaluation of several deep learning models, including ResNet50, DenseNet121, RCCGNet, Hiera, and AIMv2. Among these, AIMv2, a fine-tuned transformer-based model, achieved the highest overall performance, serving as a new benchmark for accuracy. The proposed framework demonstrates robust results on the Capsule Vision 2024 dataset and highlights the potential of both lightweight and transformer-based solutions to improve diagnostic efficiency and clinical workflow in gastrointestinal imaging. © 2025 IOP Publishing Ltd. All rights, including for text and data mining, AI training, and similar technologies, are reserved.
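The abstract names focal loss as the component that addresses class imbalance. The following is a minimal sketch of that idea, assuming the standard multi-class formulation FL = -alpha * (1 - p_t)^gamma * log(p_t). This record does not state the exact variant or hyperparameters used in CASCRNet, so the function names and the defaults `gamma=2.0` and `alpha=1.0` are illustrative assumptions, not the authors' settings.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Plain cross-entropy for one sample: -log(p_t)."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample (assumed standard form).

    The factor (1 - p_t) ** gamma shrinks the loss of well-classified
    samples, so training is dominated less by easy, frequent classes.
    gamma and alpha are illustrative defaults, not values from the paper.
    """
    p_t = softmax(logits)[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy = [4.0, 0.0, 0.0]  # confident, correct prediction for class 0
    hard = [0.0, 1.0, 0.0]  # uncertain prediction for class 0
    for name, logits in (("easy", easy), ("hard", hard)):
        ce = cross_entropy(logits, 0)
        fl = focal_loss(logits, 0)
        print(f"{name}: CE={ce:.4f} FL={fl:.4f} ratio={fl / ce:.4f}")
```

With `gamma=0` the expression reduces to ordinary cross-entropy. Larger values of `gamma` down-weight confidently classified samples more strongly, which is why this loss is commonly chosen for datasets with skewed class frequencies.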

Keywords

Benchmarking, Classification (of information), Computational efficiency, Computer aided diagnosis, Data mining, Deep learning, Learning systems, Medical imaging, Automated medical diagnosis, Digestive tract, Gastrointestinal disorders, Gastrointestinal imaging, Interpretability, Learning architectures, Minimally invasive imaging, Multi-class abnormality classification, Video capsule endoscopies, Video capsule endoscopy, Endoscopy

Citation

Engineering Research Express, 2025, 7, 4, pp. -
