Smart clinical decision support system for Identification of Novel Features from Cancer Data Sets
| dc.contributor.author | Koppad, Saraswati | |
| dc.date.accessioned | 2026-01-22T11:45:00Z | |
| dc.date.issued | 2024 | |
| dc.description.abstract | Colorectal cancer (CRC) ranks as the third leading cause of cancer-related deaths globally and stands as the fourth most frequently diagnosed cancer. Despite its prevalence, the lack of diagnostic biomarkers and of a comprehensive understanding of its molecular mechanisms contributes to an escalating mortality rate. The dynamic nature of CRC occurrence and progression, coupled with varying expression levels of specific molecules across different stages, compounds the challenges of early detection and diagnosis, underscoring the urgency for precise and reliable CRC biomarkers. Advancements in high-throughput sequencing have facilitated exploration into novel gene expressions, targeted therapies, and the pathogenesis of colon cancer. These efforts yield large, high-dimensional datasets whose analysis increasingly relies on machine learning (ML) algorithms. However, handling such data presents challenges in storage, analysis, and sharing, which are being addressed through innovations in computational technologies, particularly in cloud computing. This work comprehensively reviews advanced cloud-based and big data technologies for omics data processing and analysis, comparing them across various technical aspects such as data size, cost, applications, advantages, and limitations. Drawing insights from this review, a cloud-based framework was developed utilizing big data systems to analyze cancer-based multi-omics datasets from diverse sources. This work involves an ML-based experimental design to study CRC gene associations, employing six different machine-learning methods as classifiers. Evaluation across various combinations of training and test datasets using different algorithms revealed that random forest models consistently outperformed others in terms of accuracy and area under the receiver operating characteristic curve (AUROC).
Subsequently, 34 genes were identified for pathway and gene set enrichment analysis, with further mapping revealing interesting miRNA hub genes. These genes represent a diagnostic panel for CRC, identified with high accuracy. Furthermore, artificial intelligence has demonstrated its worth in diagnosing CRC at its initial phase through the analysis of clinical imaging and data. Computer-aided detection systems assist clinicians in identifying suspicious lesions during colonoscopies or CT scans, potentially enhancing early-stage tumor detection. This research uses a transfer learning-based model for CRC prediction, combining Convolutional Neural Networks (CNN) and VGG16. VGG16 serves as the base model, with its convolutional layers extracting features for classification. Compared with the CNN, the proposed model achieves approximately 6% higher accuracy. High-throughput experiments often require target validation, which typically relies on association. Data-driven approaches offer the means to discern causal relationships between molecules or features, facilitating the systematic discovery of molecules pivotal in cancer development and progression while also aiding in understanding underlying disease mechanisms. This study employs structural equation modeling (SEM) to explore the interrelationships among the 34 genes identified in this work. Structural relationships among these genes are examined, and models are tested using confirmatory factor and path analyses. Various fit indices, including RMSEA, CFI, NFI, GFI, and TLI, indicate a reasonably good fit. Hence, the proposed models can infer potential relationships between latent and observed variables. Through SEM, 34 genes were identified as implicated in specific biological processes or pathways, suggesting a significant direct impact on colorectal cancer. The integration of SEM with other biological evidence offers a robust approach to identifying causal genes. | |
| dc.identifier.uri | https://idr.nitk.ac.in/handle/123456789/18742 | |
| dc.language.iso | en | |
| dc.publisher | National Institute of Technology Karnataka, Surathkal. | |
| dc.subject | Colorectal cancer | |
| dc.subject | Cancer omics data sets | |
| dc.subject | Cancer candidate | |
| dc.subject | diagnostic genes | |
| dc.subject | Big data in Cancer research | |
| dc.title | Smart clinical decision support system for Identification of Novel Features from Cancer Data Sets | |
| dc.type | Thesis |
