Mathematics and Statistics Vol. 11(2), pp. 229 - 244
DOI: 10.13189/ms.2023.110201


MTSClust with Handling Missing Data Using VAR-Moving Average Imputation


Embay Rohaeti 1,2, I Made Sumertajaya 1,*, Aji Hamim Wigena 1, Kusman Sadik 1
1 Department of Statistics, Faculty of Mathematics and Natural Sciences, IPB University, Bogor, 16680, West Java, Indonesia
2 Department of Mathematics, Faculty of Mathematics and Natural Sciences, Pakuan University, Bogor, 16129, West Java, Indonesia

ABSTRACT

Modeling and forecasting multivariate time series (MTS) data with multiple objects can be challenging, especially when the data are volatile and contain missing values. Several approaches for inflation data have been proposed, but they either did not use MTS data or did not consider missing data. This study aims to develop an approach that can obtain general models and forecasts for MTS data with volatility and missing data. We propose Vector Autoregressive Moving Average Imputation Method - Multivariate Time Series Clustering (VAR-IMMA - MTSClust) to group the objects into clusters. The clusters can then be used to obtain general models and forecasts. The study consists of three stages. The first is the imputation simulation stage, in which 10%, 20%, and 30% of the MTS data were randomly removed and then imputed using the original VAR-IM and the proposed VAR-IMMA. The second is the clustering stage, in which six clustering methods (K-means Euclidean, K-means Manhattan, K-means DTW, PAM Euclidean, PAM Manhattan, and PAM DTW) were applied to both the complete data and the imputed data from the first stage. The third is the modeling and forecasting stage, in which the clusters from the second stage are used to obtain a general model and forecasts for each cluster. The simulations were repeated 1000 times and evaluated using RMSE, RMSSTD, R-squared, ARI, and balanced accuracy. The results showed that VAR-IMMA could increase the imputation accuracy by 10% in 50% of cases and even more in another 25% of cases. This increase in imputation accuracy proved beneficial in the second stage, where clustering on the imputed data formed clusters that remain similar to the complete-data clusters despite the missing data. K-means Euclidean and PAM Euclidean were among the best-performing methods. Finally, the use of VAR-IMMA and PAM Euclidean on inflation rate data with missing data was illustrated. The imputed clusters have an ARI score of 0.57 and a balanced accuracy of 92%, leading to models and forecasts similar to those obtained from the complete data.
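
The imputation simulation stage described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' VAR-IMMA procedure: the data are synthetic random walks, the imputation is a plain trailing moving average used as a stand-in, and the function names (`remove_at_random`, `moving_average_impute`, `imputation_rmse`) are hypothetical. It shows the general shape of the experiment: remove 10%, 20%, and 30% of the entries at random, impute them, and score the imputation with RMSE on the removed entries.

```python
import numpy as np


def remove_at_random(data, fraction, rng):
    """Return a copy of `data` with `fraction` of its entries set to NaN,
    plus the boolean mask marking the removed entries."""
    n_missing = int(round(fraction * data.size))
    idx = rng.choice(data.size, size=n_missing, replace=False)
    mask = np.zeros(data.size, dtype=bool)
    mask[idx] = True
    mask = mask.reshape(data.shape)
    masked = data.astype(float).copy()
    masked[mask] = np.nan
    return masked, mask


def moving_average_impute(series, window=3):
    """Fill each NaN with the mean of up to `window` preceding values
    (observed or already imputed). A missing first value falls back to
    the mean of the observed values. Stand-in only, not VAR-IMMA."""
    filled = series.copy()
    fallback = np.nanmean(series)
    for t in np.flatnonzero(np.isnan(series)):
        past = filled[max(0, t - window):t]
        filled[t] = past.mean() if past.size else fallback
    return filled


def imputation_rmse(truth, imputed, mask):
    """RMSE computed only over the entries that were removed."""
    diff = truth[mask] - imputed[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic MTS for one object: 120 time points, 3 variables (assumed sizes).
rng = np.random.default_rng(0)
complete = np.cumsum(rng.normal(size=(120, 3)), axis=0)

results = {}
for fraction in (0.1, 0.2, 0.3):
    masked, mask = remove_at_random(complete, fraction, rng)
    imputed = np.column_stack(
        [moving_average_impute(masked[:, j]) for j in range(masked.shape[1])]
    )
    results[fraction] = imputation_rmse(complete, imputed, mask)

for fraction, rmse in results.items():
    print(f"missing={fraction:.0%}  RMSE={rmse:.3f}")
```

In the study itself this loop would be repeated 1000 times per missing rate, with VAR-IM and VAR-IMMA in place of the moving-average stand-in.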

KEYWORDS
K-means Euclidean, Missing Data, Moving Average, MTSClust, PAM Euclidean, VAR-IMMA

Cite This Paper in IEEE or APA Citation Styles
(a). IEEE Format:
[1] Embay Rohaeti, I Made Sumertajaya, Aji Hamim Wigena, Kusman Sadik, "MTSClust with Handling Missing Data Using VAR-Moving Average Imputation," Mathematics and Statistics, Vol. 11, No. 2, pp. 229 - 244, 2023. DOI: 10.13189/ms.2023.110201.

(b). APA Format:
Embay Rohaeti, I Made Sumertajaya, Aji Hamim Wigena, Kusman Sadik (2023). MTSClust with Handling Missing Data Using VAR-Moving Average Imputation. Mathematics and Statistics, 11(2), 229 - 244. DOI: 10.13189/ms.2023.110201.