Optimizing Feature Selection in Big Data: A Hybrid Spark and Fuzzy Approach

dc.contributor.author: Hada, A.S.
dc.contributor.author: Sahoo, G.S.
dc.contributor.author: Vamsi, C.K.
dc.contributor.author: Hegde, A.
dc.contributor.author: Bhowmik, B.
dc.date.accessioned: 2026-02-06T06:33:38Z
dc.date.issued: 2024
dc.description.abstract: The exponential growth of big data presents both immense opportunities and significant challenges. While vast datasets hold the key to unlocking groundbreaking insights, efficiently extracting value from them requires sophisticated feature selection techniques. Traditional methods often struggle with the sheer volume and complexity of big data. This paper addresses this challenge by proposing a novel hybrid feature selection algorithm that leverages Apache PySpark's distributed computing power. By combining a robust feature selection technique with a novel weighting scheme, our method outperforms existing hypercuboid and fuzzy Rough Set methods. The hybrid approach achieves a superior accuracy of 72.1% with a reduced feature set, demonstrating its effectiveness in identifying salient features for big data analysis. © 2024 IEEE.
dc.identifier.citation: COSMIC 2024 - IEEE International Conference on Computing, Semiconductor, Mechatronics, Intelligent Systems and Communications, Proceedings, 2024, Vol., , p. 195-199
dc.identifier.uri: https://doi.org/10.1109/COSMIC63293.2024.10871408
dc.identifier.uri: https://idr.nitk.ac.in/handle/123456789/28771
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.subject: Apache PySpark
dc.subject: Big Data
dc.subject: Fuzzy Rough Set
dc.subject: Hybrid Feature Selection
dc.subject: Hypercuboid Method
dc.title: Optimizing Feature Selection in Big Data: A Hybrid Spark and Fuzzy Approach
