Mathematics and Statistics Vol. 14(1), pp. 130 - 146
DOI: 10.13189/ms.2026.140112
Fusion Sampling Validation in Data Partitioning for Machine Learning
Christopher Godwin Udomboso 1, Caston Sigauke 2,*, Ini Adinya 3
1 Department of Statistics, University of Ibadan, Nigeria
2 Department of Mathematical and Computational Sciences, University of Venda, South Africa
3 Department of Mathematics, University of Ibadan, Nigeria
ABSTRACT
Effective data partitioning is crucial in machine learning. Traditional cross-validation methods such as K-fold Cross-Validation (KFCV) enhance model robustness, but their high computational demands and extensive data shuffling can compromise the assessment of generalisation. Simple Random Sampling (SRS) is a simpler alternative, but although it generally yields representative samples, a single draw can produce non-representative sets when the data are imbalanced. To address these issues, the study introduces a hybrid method, Fusion Sampling Validation (FSV), which combines SRS and KFCV to optimise data partitioning. FSV aims to minimise bias while merging the simplicity of SRS with the accuracy of KFCV. The study used three datasets of 10,000, 50,000, and 100,000 samples, generated from a normal distribution (mean 0, variance 1) with random seed 42. KFCV was performed with five folds and ten repetitions, incorporating a scaling factor to ensure robust performance estimation and generalisation capability. FSV incorporated a weighting factor to further enhance performance and generalisation. Evaluations focused on the mean estimate (ME), variance estimate (VE), mean squared error (MSE), bias, the rate of convergence of the mean estimates (ROC ME), and the rate of convergence of the variance estimates (ROC VE). The results indicated that FSV consistently outperformed SRS and KFCV, with an ME of 0.000863, VE of 0.949644, MSE of 0.952127, bias of 0.016288, ROC ME of 0.005199, and ROC VE of 0.007137. These results demonstrate the superior accuracy (reduced error and bias) and reliability (stable performance with increasing sample size and repeated trials) of FSV. Moreover, because FSV reduces the computational demands of repeated KFCV while preserving representativeness, it is particularly well suited to resource-constrained environments and large-scale datasets. FSV therefore offers a practical solution for improving model evaluation where both efficiency and accuracy are critical.
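The abstract describes the partitioning schemes only at a high level. As a rough illustration of the experimental setup, the Python sketch below draws N(0, 1) data with seed 42, estimates the mean and variance once by simple random sampling and once by repeated 5-fold splitting with 10 repetitions, and then blends the two estimates with a fixed weight. This is not the authors' FSV implementation: the sampling fraction `frac`, the weight `w`, the use of `numpy.random.default_rng`, the function names, and the bias definition are all assumptions, and the scaling and weighting factors mentioned in the abstract are not reproduced.

```python
import numpy as np


def generate_data(n: int, seed: int = 42) -> np.ndarray:
    """Draw n samples from N(0, 1). Seed 42 follows the abstract; the use of
    NumPy's default_rng (rather than another generator) is an assumption."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=1.0, size=n)


def srs_estimates(x: np.ndarray, frac: float = 0.2, seed: int = 42) -> tuple[float, float]:
    """Simple random sampling: draw one subset without replacement and estimate
    the mean and (unbiased, ddof=1) variance from it.
    frac=0.2 is a hypothetical sampling fraction, not taken from the paper."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(x), size=int(frac * len(x)), replace=False)
    sample = x[idx]
    return float(sample.mean()), float(sample.var(ddof=1))


def repeated_kfold_estimates(
    x: np.ndarray, k: int = 5, repeats: int = 10, seed: int = 42
) -> tuple[float, float]:
    """Repeated K-fold (k=5, 10 repetitions as in the abstract): reshuffle the
    indices each repetition, split them into k folds, and average the per-fold
    mean and variance estimates over all k * repeats folds.
    The paper's scaling factor is NOT modelled here."""
    rng = np.random.default_rng(seed)
    means, variances = [], []
    for _ in range(repeats):
        perm = rng.permutation(len(x))
        for fold in np.array_split(perm, k):
            means.append(x[fold].mean())
            variances.append(x[fold].var(ddof=1))
    return float(np.mean(means)), float(np.mean(variances))


def fusion_estimates(
    x: np.ndarray, w: float = 0.5, frac: float = 0.2, k: int = 5,
    repeats: int = 10, seed: int = 42,
) -> tuple[float, float]:
    """HYPOTHETICAL fusion: a convex combination w * SRS + (1 - w) * KFCV of the
    two estimators. The weight w=0.5 is an assumption; the paper's actual
    weighting factor is not specified in the abstract."""
    srs_m, srs_v = srs_estimates(x, frac=frac, seed=seed)
    kf_m, kf_v = repeated_kfold_estimates(x, k=k, repeats=repeats, seed=seed)
    return w * srs_m + (1 - w) * kf_m, w * srs_v + (1 - w) * kf_v


if __name__ == "__main__":
    for n in (10_000, 50_000, 100_000):  # the three sample sizes in the abstract
        x = generate_data(n)
        results = {
            "SRS": srs_estimates(x),
            "KFCV": repeated_kfold_estimates(x),
            "fusion (sketch)": fusion_estimates(x),
        }
        for name, (m, v) in results.items():
            # Bias here means |ME - true mean| with true mean 0 (an assumed definition).
            print(f"n={n:>7,} {name:<16} ME={m:+.6f} VE={v:.6f} bias={abs(m):.6f}")
```

Because each repetition's equal-sized folds partition the whole dataset, the averaged KFCV mean in this sketch coincides (up to rounding) with the full-sample mean, so differences between the schemes show up mainly in the SRS estimate and in the variance estimates.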
KEYWORDS
Data Partitioning, Cross-validation, Hybridisation, Machine Learning, Sampling
Cite This Paper in IEEE or APA Citation Styles
(a). IEEE Format:
[1] Christopher Godwin Udomboso, Caston Sigauke, Ini Adinya, "Fusion Sampling Validation in Data Partitioning for Machine Learning," Mathematics and Statistics, Vol. 14, No. 1, pp. 130 - 146, 2026. DOI: 10.13189/ms.2026.140112.
(b). APA Format:
Christopher Godwin Udomboso, Caston Sigauke, Ini Adinya (2026). Fusion Sampling Validation in Data Partitioning for Machine Learning. Mathematics and Statistics, 14(1), 130 - 146. DOI: 10.13189/ms.2026.140112.