Performance Comparison of Transformers and Convolutional Neural Networks (CNNs) Based Architecture on Endoscopy Images

Date

2024

Publisher

Institute of Electrical and Electronics Engineers Inc.

Abstract

Endoscopy is a widely used medical imaging technique for diagnosing gastrointestinal (GI) disorders. It allows doctors to visualize the internal organs without performing major surgery. The endoscope contains a light and a camera, which allow the medical practitioner to see the lining of the digestive tract on a monitor. Wireless Capsule Endoscopy (WCE), a nonsurgical procedure, uses a capsule with an embedded camera to take images, whereas the traditional endoscopy approach allows a more targeted examination of specific areas of the digestive tract and provides detailed, high-resolution images. In this study, we introduce a novel approach for the classification of GI images using a hybrid CNN-ViT architecture, a state-of-the-art architecture for learning the global features of medical images, and compare the results of the hybrid architecture with those of Convolutional Neural Networks (CNNs) and Vision Transformers (ViT). The study focuses on the classification of six distinct classes of endoscopy images, namely Ulcerative colitis, Polyp, Esophagitis, Angioectasia, Erosion, and Normal, capturing various states of gastrointestinal conditions, and achieves test accuracy, precision, recall, and F1 score of 97.91%, 98.01%, 97.91%, and 97.92%, respectively. © 2024 IEEE.
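The abstract does not say how the per-class scores were averaged over the six classes. The reported recall equals the reported accuracy (97.91%), which is consistent with support-weighted averaging, since weighted recall is always equal to accuracy; this is an inference, not something the record confirms. The sketch below shows how the four metrics could be computed from a confusion matrix under that assumption. The function name `weighted_metrics` and the matrix `toy_cm` are hypothetical, and the counts are made up for illustration; they are not the paper's data.

```python
# Class names are taken from the abstract; everything else is illustrative.
CLASSES = ["Ulcerative colitis", "Polyp", "Esophagitis",
           "Angioectasia", "Erosion", "Normal"]


def weighted_metrics(cm):
    """Return accuracy and support-weighted precision, recall and F1.

    cm[i][j] is the number of images whose true class is i and whose
    predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total

    precision = recall = f1 = 0.0
    for i in range(n):
        support = sum(cm[i])                         # true images of class i
        predicted = sum(cm[r][i] for r in range(n))  # images predicted as i
        p = cm[i][i] / predicted if predicted else 0.0
        r = cm[i][i] / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total                     # weight by class frequency
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up 6x6 confusion matrix (rows: true class, columns: predicted class).
toy_cm = [
    [48, 1, 0, 0, 1, 0],
    [0, 59, 0, 0, 0, 1],
    [0, 0, 40, 0, 0, 0],
    [0, 0, 0, 29, 1, 0],
    [1, 0, 0, 1, 33, 0],
    [0, 0, 0, 0, 0, 70],
]

for name, value in weighted_metrics(toy_cm).items():
    print(f"{name}: {value:.4f}")
```

With macro averaging instead (an unweighted mean over classes), recall would generally differ from accuracy whenever the classes are imbalanced.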

Keywords

Deep Learning, Medical image, Vision Transformer

Citation

Proceedings of CONECCT 2024 - 10th IEEE International Conference on Electronics, Computing and Communication Technologies, 2024
