Comparison of the Multiple Imputation Approaches for Imputing Rainfall Data: A Humid Tropical River Basin Case Study

dc.contributor.author: Kumar, G.P.
dc.contributor.author: Dwarakish, G.S.
dc.date.accessioned: 2026-02-03T13:19:34Z
dc.date.issued: 2025
dc.description.abstract: Accurate rainfall data is crucial for agriculture, hydrology, and climate research because it guides water management, crop planning, and disaster preparedness. Missing data reduces reliability and calls for effective imputation. This study addresses the challenge of imputing missing daily rainfall data, which is especially difficult given rainfall's nonlinear distribution and the variability in missingness patterns. The research aims to develop and evaluate advanced imputation algorithms to improve data completeness and integrity in humid tropical regions. Ten imputation algorithms are evaluated: K-nearest neighbors (KNN), classification and regression trees (CART), predictive mean matching (pmm), random forest (rf), the mean method, and Bayesian methods (norm.boot, lasso.norm, norm, norm.nob, midastouch). The proposed algorithms incorporate descriptive statistics of the rainfall time series and are evaluated using 37 years of daily data from thirteen selected rainfall stations in the Kali River Basin, a humid tropical region. Model performance was assessed at four missingness levels (1%, 5%, 10%, and 20%) using four accuracy metrics: root mean square error (RMSE), mean absolute error (MAE), index of agreement (d), and RMSE-observations standard deviation ratio (RSR). Among the evaluated methods, KNN consistently performed best across all levels of missingness (RMSE = 13.22 to 15.42; MAE = 4.68 to 6.08; d = 0.87 to 0.90; RSR = 0.57 to 0.61), followed by CART (RMSE = 16.48 to 20.77; MAE = 6.20 to 8.31; d = 0.81 to 0.86; RSR = 0.71 to 0.79). Overall, KNN, CART, pmm, and rf emerged as reliable methods for imputing missing rainfall data of varying lengths, contributing to more accurate weather forecasting and climate change analyses. © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2025.
dc.identifier.citation: Water Conservation Science and Engineering, 2025, 10, 2, pp. -
dc.identifier.issn: 23663340
dc.identifier.uri: https://doi.org/10.1007/s41101-025-00403-x
dc.identifier.uri: https://idr.nitk.ac.in/handle/123456789/20147
dc.publisher: Springer Nature
dc.subject: Bayesian networks
dc.subject: Crops
dc.subject: Data accuracy
dc.subject: Data reduction
dc.subject: Decision trees
dc.subject: Disaster prevention
dc.subject: Information management
dc.subject: Nearest neighbor search
dc.subject: Rain
dc.subject: Random forests
dc.subject: Regression analysis
dc.subject: Rivers
dc.subject: Time series
dc.subject: Tropical engineering
dc.subject: Tropics
dc.subject: Classification trees
dc.subject: Data quality
dc.subject: Imputation methods
dc.subject: K-near neighbor
dc.subject: Missing data
dc.subject: Nearest-neighbour
dc.subject: Performance
dc.subject: Performance score
dc.subject: Rainfall data
dc.subject: Regression trees
dc.subject: Mean square error
dc.subject: algorithm
dc.subject: comparative study
dc.subject: data quality
dc.subject: machine learning
dc.subject: nearest neighbor analysis
dc.subject: rainfall
dc.subject: river basin
dc.subject: spatial distribution
dc.subject: tropical region
dc.subject: Sharda Basin
dc.title: Comparison of the Multiple Imputation Approaches for Imputing Rainfall Data: A Humid Tropical River Basin Case Study
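
The abstract names four accuracy metrics (RMSE, MAE, index of agreement d, and RSR) and four missingness levels, but this record does not give the formulas or the masking procedure. The following is a minimal sketch under stated assumptions: d is taken to be Willmott's index of agreement, RSR is taken to be RMSE divided by the standard deviation of the observations, and values are hidden completely at random. The function names, the random masking scheme, and the use of the mean method as the imputer are illustrative choices, not the authors' implementation.

```python
import math
import random


def rmse(obs, pred):
    """Root mean square error between observed and imputed values."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error between observed and imputed values."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def index_of_agreement(obs, pred):
    """Willmott's d (assumed definition): 1 - SSE / potential error."""
    obar = sum(obs) / len(obs)
    sse = sum((p - o) ** 2 for o, p in zip(obs, pred))
    potential = sum((abs(p - obar) + abs(o - obar)) ** 2 for o, p in zip(obs, pred))
    return float("nan") if potential == 0 else 1.0 - sse / potential


def rsr(obs, pred):
    """RMSE divided by the standard deviation of observations (assumed definition)."""
    obar = sum(obs) / len(obs)
    spread = math.sqrt(sum((o - obar) ** 2 for o in obs))
    if spread == 0:
        return float("nan")
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred))) / spread


def evaluate_mean_imputation(series, missing_fraction, seed=0):
    """Hide a random fraction of a complete series, fill it with the mean
    of the remaining values, and score the fill against the hidden truth."""
    rng = random.Random(seed)
    n_masked = max(1, round(missing_fraction * len(series)))
    masked_idx = sorted(rng.sample(range(len(series)), n_masked))
    masked = set(masked_idx)
    kept = [v for i, v in enumerate(series) if i not in masked]
    fill = sum(kept) / len(kept)  # the "mean method" from the abstract
    obs = [series[i] for i in masked_idx]
    pred = [fill] * n_masked
    return {
        "n_masked": n_masked,
        "RMSE": rmse(obs, pred),
        "MAE": mae(obs, pred),
        "d": index_of_agreement(obs, pred),
        "RSR": rsr(obs, pred),
    }


if __name__ == "__main__":
    # Synthetic daily rainfall (mm), for illustration only: mostly dry days.
    rng = random.Random(42)
    synthetic = [0.0 if rng.random() < 0.6 else rng.expovariate(1 / 12.0)
                 for _ in range(1000)]
    for level in (0.01, 0.05, 0.10, 0.20):  # levels named in the abstract
        print(level, evaluate_mean_imputation(synthetic, level))
```

Lower RMSE, MAE, and RSR and a d closer to 1 indicate better agreement, which matches the ordering the abstract reports for KNN over CART. Replacing the mean fill with another imputer would leave the scoring code unchanged.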
