Performance Comparison of Transformers and Convolutional Neural Networks (CNNs) Based Architecture on Endoscopy Images
Date
2024
Publisher
Institute of Electrical and Electronics Engineers Inc.
Abstract
Endoscopy is a widely used medical imaging technique for diagnosing gastrointestinal (GI) disorders. It allows doctors to visualize internal organs without major surgery. The endoscope carries a light and a camera, which let the medical practitioner view the lining of the digestive tract on a monitor. Wireless Capsule Endoscopy (WCE) is a nonsurgical procedure that uses a capsule with an embedded camera to take images, whereas traditional endoscopy allows a more targeted examination of specific areas of the digestive tract and provides detailed, high-resolution images. In this study, we introduce a novel approach for classifying GI images using a hybrid CNN-ViT architecture, a state-of-the-art architecture for learning the global features of medical images, and compare its results with those of Convolutional Neural Networks (CNNs) and Vision Transformers (ViT). The study classifies endoscopy images into six distinct classes, namely Ulcerative colitis, Polyp, Esophagitis, Angioectasia, Erosion, and Normal, capturing various states of gastrointestinal conditions, and achieves test accuracy, precision, recall, and F1 score of 97.91%, 98.01%, 97.91%, and 97.92%, respectively. © 2024 IEEE.
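The four reported metrics can be read more easily with a small worked example. The sketch below is not the authors' evaluation code: the confusion matrix is invented for illustration, and the function name `weighted_metrics` is hypothetical. It assumes support-weighted averaging across the six classes, which the abstract does not state. Under that assumption, weighted recall is mathematically equal to accuracy, which is consistent with both being reported as 97.91%.

```python
# Class names come from the abstract; everything else here is an assumption.
CLASSES = ["Ulcerative colitis", "Polyp", "Esophagitis",
           "Angioectasia", "Erosion", "Normal"]

# Made-up confusion matrix: row = true class, column = predicted class.
HYPOTHETICAL_CONFUSION = [
    [95, 2, 1, 0, 2, 0],
    [1, 97, 0, 1, 1, 0],
    [0, 0, 99, 0, 0, 1],
    [0, 1, 0, 96, 3, 0],
    [2, 1, 0, 2, 94, 1],
    [0, 0, 1, 0, 0, 99],
]


def weighted_metrics(confusion):
    """Return accuracy and support-weighted precision, recall, and F1."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[i][i] for i in range(n)) / total

    precision = recall = f1 = 0.0
    for i in range(n):
        tp = confusion[i][i]
        support = sum(confusion[i])                         # true images of class i
        predicted = sum(confusion[r][i] for r in range(n))  # images predicted as class i
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total                            # class share of the test set
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    for name, value in weighted_metrics(HYPOTHETICAL_CONFUSION).items():
        print(f"{name}: {value:.4f}")
```

With macro averaging (an unweighted mean over classes) recall and accuracy coincide only when every class has the same number of test images, so the averaging scheme is worth checking in the full paper before comparing these figures with other work.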
Keywords
Deep Learning, Medical image, Vision Transformer
Citation
Proceedings of CONECCT 2024 - 10th IEEE International Conference on Electronics, Computing and Communication Technologies, 2024
