Comparison of the Multiple Imputation Approaches for Imputing Rainfall Data: A Humid Tropical River Basin Case Study
Date
2025
Publisher
Springer Nature
Abstract
Accurate rainfall data are crucial for agriculture, hydrology, and climate research because they guide water management, crop planning, and disaster preparedness. Missing data reduce reliability and call for effective imputation. This study addresses the challenge of imputing missing daily rainfall data, which is especially difficult given rainfall's nonlinear distribution and the variability in missingness patterns. It aims to develop and evaluate advanced imputation algorithms that improve data completeness and integrity in humid tropical regions. Ten imputation algorithms are evaluated: K-nearest neighbors (KNN), classification and regression trees (CART), predictive mean matching (pmm), random forest (rf), the mean method, and Bayesian methods (norm.boot, lasso.norm, norm, norm.nob, midastouch). The algorithms incorporate descriptive statistics of the rainfall time series and are evaluated using 37 years of daily data from thirteen selected rainfall stations in the Kali River Basin, a humid tropical region. Model performance was assessed at four missingness levels (1%, 5%, 10%, and 20%) using four accuracy metrics: root mean square error (RMSE), mean absolute error (MAE), index of agreement (d), and RMSE-observations standard deviation ratio (RSR). Among the evaluated methods, KNN consistently demonstrated superior performance across all levels of missingness (RMSE = 13.22 to 15.42; MAE = 4.68 to 6.08; d = 0.87 to 0.90; RSR = 0.57 to 0.61), followed closely by CART (RMSE = 16.48 to 20.77; MAE = 6.20 to 8.31; d = 0.81 to 0.86; RSR = 0.71 to 0.79).
Overall, KNN, CART, pmm, and rf emerged as reliable methods for imputing missing rainfall data of varying lengths, contributing to more accurate weather forecasting and climate change analyses. © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2025.
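The evaluation design described in the abstract (mask a known fraction of observed values, impute them, then score the imputed values against the withheld observations) can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes Willmott's original form of the index of agreement and the usual definition of RSR as RMSE divided by the population standard deviation of the observations; the abstract does not state which variants were used. The rainfall series is synthetic, the function names are hypothetical, and only the mean method is shown because it is the simplest of the ten algorithms listed.

```python
import math
import random


def rmse(obs, sim):
    """Root mean square error between observed and imputed values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs, sim):
    """Mean absolute error between observed and imputed values."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def index_of_agreement(obs, sim):
    """Willmott's d (assumed variant): 1 means perfect agreement."""
    mean_o = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((abs(s - mean_o) + abs(o - mean_o)) ** 2 for o, s in zip(obs, sim))
    return 1.0 - num / den if den > 0 else float("nan")


def rsr(obs, sim):
    """RMSE divided by the (population) standard deviation of the observations."""
    mean_o = sum(obs) / len(obs)
    sd = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / len(obs))
    return rmse(obs, sim) / sd if sd > 0 else float("inf")


def mask_at_random(series, fraction, seed=0):
    """Replace a random fraction of values with None; return the masked copy and indices."""
    rng = random.Random(seed)
    n_missing = max(1, round(fraction * len(series)))
    idx = sorted(rng.sample(range(len(series)), n_missing))
    masked = list(series)
    for i in idx:
        masked[i] = None
    return masked, idx


def impute_mean(masked):
    """Mean method: fill every gap with the mean of the remaining observed values."""
    observed = [v for v in masked if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in masked]


def run_demo():
    """Score mean imputation at the four missingness levels named in the abstract."""
    rng = random.Random(42)
    # Synthetic daily rainfall (illustrative only): dry days are 0, wet days are skewed.
    series = [rng.expovariate(1 / 10) if rng.random() < 0.4 else 0.0 for _ in range(3650)]
    results = {}
    for fraction in (0.01, 0.05, 0.10, 0.20):
        masked, idx = mask_at_random(series, fraction, seed=1)
        imputed = impute_mean(masked)
        obs = [series[i] for i in idx]   # withheld true values
        sim = [imputed[i] for i in idx]  # values produced by the imputer
        results[fraction] = {
            "RMSE": rmse(obs, sim),
            "MAE": mae(obs, sim),
            "d": index_of_agreement(obs, sim),
            "RSR": rsr(obs, sim),
        }
    return results


if __name__ == "__main__":
    for level, scores in run_demo().items():
        print(level, {name: round(value, 3) for name, value in scores.items()})
```

Any other imputer can be compared under the same protocol by swapping it in for `impute_mean`, provided it takes a series containing `None` gaps and returns a fully filled series of the same length.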
Keywords
Bayesian networks, Crops, Data accuracy, Data reduction, Decision trees, Disaster prevention, Information management, Nearest neighbor search, Rain, Random forests, Regression analysis, Rivers, Time series, Tropical engineering, Tropics, Classification trees, Data quality, Imputation methods, K-near neighbor, Missing data, Nearest-neighbour, Performance, Performance score, Rainfall data, Regression trees, Mean square error, algorithm, comparative study, data quality, machine learning, nearest neighbor analysis, rainfall, river basin, spatial distribution, tropical region, Sharda Basin
Citation
Water Conservation Science and Engineering, 2025, 10, 2, pp. -
