Environment and Ecology Research Vol. 9(3), pp. 114 - 118
DOI: 10.13189/eer.2021.090303
A RPCA-Based Tukey's Biweight for Clustering Identification on Extreme Rainfall Data
Siti Mariana Che Mat Nor 1, Shazlyn Milleana Shaharudin 1,*, Shuhaida Ismail 2, Kismiantini 3
1 Department of Mathematics, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, 35900 Tanjong Malim, Perak, Malaysia
2 Department of Mathematics and Statistics, Faculty of Applied Sciences and Technology, Universiti Tun Hussein Onn Malaysia, Malaysia
3 Department of Statistics, Faculty of Mathematics and Natural Sciences, Universitas Negeri Yogyakarta, Indonesia
ABSTRACT
In high-dimensional data analysis, Principal Component Analysis (PCA) based on the Pearson correlation remains widely used to reduce data dimensionality and to improve the effectiveness of cluster partitioning. However, besides being sensitive to non-Gaussian data, this approach assigns equal weight to every pair of observations, which can distort the cluster partitions and produce highly imbalanced clusters. To address the unbalanced clusters that arise in hydrological studies from the skewed character of the data, this study proposes a robust PCA (RPCA) based on Tukey's biweight correlation as an alternative to classical PCA, both for reducing a high-dimensional dataset to a lower dimension and for obtaining balanced clustering results. RPCA downweights outliers that lie far from the center of the data and thereby improves the cluster partitions. The two methods are compared in terms of the number of components and clusters obtained and in terms of clustering validity; internal and stability validation criteria are used to assess the quality of the clusters produced by each method. The findings show that the number of clusters improved significantly with RPCA compared with classical PCA. This indicates that the proposed approach is more resistant to outliers than classical PCA, because it assesses every observation and downweights those that are distant from the data center.
KEYWORDS
Principal Component Analysis (PCA), Pearson Correlation, Tukey's Biweight Correlation, cluster analysis
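The downweighting idea described in the abstract can be illustrated with a short sketch. It assumes one common formulation of a biweight-based correlation, the biweight midcorrelation (median-centered, MAD-scaled, with tuning constant c = 9); the estimator and tuning constant actually used in the paper are not stated in the abstract and may differ. The function names `biweight_midcorrelation` and `robust_pca` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def biweight_midcorrelation(x, y, c=9.0):
    """Biweight midcorrelation of two 1-D samples (assumed formulation)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def weighted_deviations(v):
        med = np.median(v)
        mad = np.median(np.abs(v - med))
        if mad == 0:
            raise ValueError("MAD is zero; biweight weights are undefined")
        u = (v - med) / (c * mad)
        # Tukey's biweight: weight shrinks with distance from the median
        # and is exactly zero once |u| >= 1 (far-from-center observations).
        w = np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)
        d = (v - med) * w
        return d / np.sqrt(np.sum(d**2))

    return float(np.sum(weighted_deviations(x) * weighted_deviations(y)))


def robust_pca(data, c=9.0):
    """Eigen-decompose the pairwise biweight correlation matrix.

    `data` has observations in rows and variables in columns.
    Returns eigenvalues (descending) and the matching eigenvectors.
    """
    data = np.asarray(data, dtype=float)
    p = data.shape[1]
    corr = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            r = biweight_midcorrelation(data[:, i], data[:, j], c)
            corr[i, j] = corr[j, i] = r
    eigvals, eigvecs = np.linalg.eigh(corr)  # symmetric matrix
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


# One gross outlier flips the sign of the Pearson correlation,
# while the biweight version gives that observation zero weight.
x = np.arange(1.0, 11.0)
y = x.copy()
y[-1] = -100.0
pearson = float(np.corrcoef(x, y)[0, 1])
robust = biweight_midcorrelation(x, y)
```

In this toy example `pearson` is negative while `robust` stays positive, which is the behavior the abstract attributes to downweighting observations far from the data center. The leading eigenvectors returned by `robust_pca` would then play the role of the retained components before clustering.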
Cite This Paper in IEEE or APA Citation Styles
(a). IEEE Format:
[1] Siti Mariana Che Mat Nor, Shazlyn Milleana Shaharudin, Shuhaida Ismail, Kismiantini, "A RPCA-Based Tukey's Biweight for Clustering Identification on Extreme Rainfall Data," Environment and Ecology Research, Vol. 9, No. 3, pp. 114 - 118, 2021. DOI: 10.13189/eer.2021.090303.
(b). APA Format:
Siti Mariana Che Mat Nor, Shazlyn Milleana Shaharudin, Shuhaida Ismail, Kismiantini (2021). A RPCA-Based Tukey's Biweight for Clustering Identification on Extreme Rainfall Data. Environment and Ecology Research, 9(3), 114 - 118. DOI: 10.13189/eer.2021.090303.