Mathematics and Statistics Vol. 7(4A), pp. 49 - 57
DOI: 10.13189/ms.2019.070707
Investigation on the Clusterability of Heterogeneous Dataset by Retaining the Scale of Variables
Norin Rahayu Shamsuddin 1,*, Nor Idayu Mahat 2
1 Faculty of Computer & Mathematical Sciences, Merbok, 08400, Kedah, Malaysia
2 School of Quantitative Sciences, Universiti Utara Malaysia, 06010 Changlun, Kedah, Malaysia
ABSTRACT
Clustering a dataset with heterogeneous variables is a challenging process because the variables are measured on different scales. This paper uses the SimMultiCorrData package in R to generate artificial datasets for clustering. Constructing artificial datasets from a variety of distributions helps to mimic the nature of real datasets. Our experiments show that the clusterability of a dataset is influenced by several factors, such as overlapping clusters, noise, sub-clusters, and an unbalanced number of objects within the clusters.
KEYWORDS
Gower's Distance, k-medoids, Mixed Variables, SimMultiCorrData
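Gower's distance, listed among the keywords, is a common way to compare objects described by mixed variables without converting them to a single scale. The sketch below is illustrative only and is not the authors' implementation (the paper works in R): it assumes complete data with no missing values, treats each numeric variable through a range-normalised absolute difference and each categorical variable through a simple mismatch, and then picks a medoid, the basic building block of k-medoids. The function names `gower_matrix` and `medoid` and the toy records are hypothetical.

```python
def gower_matrix(rows, numeric_cols):
    """Pairwise Gower distances for rows of mixed numeric/categorical values."""
    n_cols = len(rows[0])
    # Range of each numeric column, used to scale differences into [0, 1].
    ranges = {}
    for j in numeric_cols:
        values = [r[j] for r in rows]
        ranges[j] = max(values) - min(values)

    def distance(a, b):
        total = 0.0
        for j in range(n_cols):
            if j in ranges:
                # Numeric: range-normalised absolute difference
                # (contributes 0 when the column is constant).
                total += abs(a[j] - b[j]) / ranges[j] if ranges[j] else 0.0
            else:
                # Categorical: 0 if the categories match, 1 otherwise.
                total += 0.0 if a[j] == b[j] else 1.0
        # Average the per-variable contributions (equal weights assumed).
        return total / n_cols

    return [[distance(a, b) for b in rows] for a in rows]


def medoid(dist):
    """Index of the object with the smallest total distance to all others."""
    return min(range(len(dist)), key=lambda i: sum(dist[i]))


# Hypothetical toy records: (age, income, colour)
rows = [
    (20, 1000.0, "red"),
    (30, 2000.0, "red"),
    (40, 3000.0, "blue"),
]
D = gower_matrix(rows, numeric_cols={0, 1})
centre = medoid(D)
```

Because every variable contributes a value between 0 and 1, no variable dominates the distance merely because of its units, which is one reading of what "retaining the scale of variables" asks of a dissimilarity measure.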
Cite This Paper in IEEE or APA Citation Styles
(a). IEEE Format:
[1] Norin Rahayu Shamsuddin, Nor Idayu Mahat, "Investigation on the Clusterability of Heterogeneous Dataset by Retaining the Scale of Variables," Mathematics and Statistics, Vol. 7, No. 4A, pp. 49-57, 2019. DOI: 10.13189/ms.2019.070707.
(b). APA Format:
Norin Rahayu Shamsuddin, Nor Idayu Mahat (2019). Investigation on the Clusterability of Heterogeneous Dataset by Retaining the Scale of Variables. Mathematics and Statistics, 7(4A), 49-57. DOI: 10.13189/ms.2019.070707.