Inferences of Coordinates in Multidimensional Scaling by a Bootstrapping Procedure in R

Recently, multidimensional scaling (MDS) has been used to identify and evaluate latent profiles of cognitive ability in a population. However, MDS dimension coordinates carry no statistical properties such as standard errors. To address this limitation of MDS, we investigated the common aspects of various studies that apply bootstrapping to MDS, and we provide an R function that implements the procedure.


Introduction
In recent decades, multidimensional scaling (MDS) has been used by several authors [1-3] to identify and evaluate latent profiles of cognitive ability in a population. However, MDS dimension coordinates carry no statistical properties (e.g., standard errors for testing the statistical significance of the coordinates). Therefore, interpretation of the dimension coordinates can be arbitrary and misleading. To address this lack of statistical inference in MDS, several authors [2, 4-6] proposed methods based on bootstrapping [7]. In this paper, we investigated the aspects common to these studies, namely the generation of the empirical distribution and the statistical inferential procedure based on the bootstrap method. We also provide a publicly available R function that implements the procedure on the R platform. We hope it will enhance the analytical methodology available to educational and behavioral scientists who want to identify latent profile patterns using the bootstrap approach embedded in MDS.
Typically, the scores of individuals in educational and psychological studies consist of several subtest scores. In personality assessment [4,6], cognitive ability [1-3], and educational measurement [8], transformed scores rather than the original subtest scores are used to identify distinct characteristics of individuals. The MDS model has been adopted to transform the original score scale into MDS coordinates. These coordinates reflect distinct latent profile patterns that encapsulate all possible observed score profiles of individuals in a population. However, statistical inferences for such coordinates are not provided, so interpretation of individuals' observed scores can be arbitrary and misleading.
The bootstrap method produces the empirical distribution of the coordinates, which permits inferences such as standard errors and empirical confidence intervals for the MDS coordinates; conventional MDS provides no statistical inference for the coordinates. Consider an MDS model fitted to the observation matrix X, whose rows are individuals and whose columns are subtest scores. The bootstrapping procedure consists of three steps. In the first step, bootstrap samples are generated at random from the given observations, which are treated as a finite population; that is, we randomly select rows of X with replacement. In the second step, each bootstrap sample is analyzed by MDS to obtain its coordinates. In the third step, the coordinates from all bootstrap samples are aggregated to form the empirical distribution of the coordinates. Based on this empirical distribution, statistical inferences about the coordinates can be made. We provide the R function "BootMDS", which implements the bootstrapping procedure to construct the empirical coordinate distributions.
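The three steps can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the BootMDS source: the function name boot_mds_sketch is hypothetical, it substitutes base R's classical scaling (stats::cmdscale) for smacof, and it computes Euclidean distances between subtests (columns) across individuals. As an added assumption, each bootstrap dimension is sign-flipped to agree with the original solution, because MDS axes are only determined up to reflection (smacof solutions may additionally need Procrustes rotation).

```r
# Hypothetical sketch of the three-step bootstrap for MDS coordinates.
# Assumptions: classical MDS via stats::cmdscale() stands in for smacof;
# dissimilarities are Euclidean distances between columns (subtests).
boot_mds_sketch <- function(x, nBoot = 200, nprofile = 2) {
  x <- as.matrix(x)
  fit_mds <- function(m) {
    # distances between subtests, computed across individuals (rows)
    cmdscale(dist(t(m)), k = nprofile)
  }
  ori <- fit_mds(x)  # coordinates from the original sample
  draws <- array(NA_real_, dim = c(nBoot, ncol(x), nprofile),
                 dimnames = list(NULL, colnames(x), NULL))
  for (b in seq_len(nBoot)) {
    # Step 1: resample rows (individuals) with replacement
    idx <- sample.int(nrow(x), replace = TRUE)
    # Step 2: fit MDS to the bootstrap sample
    cb <- fit_mds(x[idx, , drop = FALSE])
    # Align reflections with the original solution (added assumption)
    for (k in seq_len(nprofile)) {
      if (sum(cb[, k] * ori[, k]) < 0) cb[, k] <- -cb[, k]
    }
    draws[b, , ] <- cb
  }
  # Step 3: the stacked draws form the empirical distribution
  list(original = ori, draws = draws)
}
```

With the result res, apply(res$draws, c(2, 3), sd) returns a bootstrap standard error for every coordinate of every dimension.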
Using the bootstrapping procedure in "BootMDS", researchers can (1) generate the empirical distribution of the MDS coordinates, (2) estimate various statistics based on the empirical distribution, (3) perform statistical inference on the coordinates, and (4) plot various aspects of the empirical distribution of the coordinates.

Availability
"BootMDS" is written in R, a statistical system freely available from CRAN (Comprehensive R Archive Network) at http://CRAN.r-project.org/ that runs under Windows, Linux, and MacOS. Note that the R package "smacof" must be installed before "BootMDS" can be used. The source code of the "BootMDS" function and the cognitive ability test data "wj7.txt" are freely available from the author's website http://dasan.sejong.ac.kr/~dhkim/BootMDS.html, where sample code and data for a step-by-step tutorial can also be downloaded.

Empirical Analysis
We analyze the "wj7" data to illustrate potential applications of the proposed bootstrapping procedure in R. The "wj7" data consist of scores on seven Woodcock-Johnson III cognitive ability tests [9]: Comprehension Knowledge (CK), Long-Term Memory (LT), Visual-Spatial Thinking (VS), Auditory Processing (AP), Fluid Reasoning (FR), Processing Speed (PS), and Short-Term Memory (ST). We first load the R functions and read the data file into R as the observation matrix.
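Loading the data can be sketched as follows. This is an illustration under assumptions: it supposes that wj7.txt is a whitespace-delimited text file with a header line of subtest names and one row per individual, and that the function source is saved as BootMDS.R. Both file layouts and the helper name read_observations are hypothetical.

```r
# Hypothetical helper: read a whitespace-delimited score file into a numeric
# matrix with one row per individual and one column per subtest.
read_observations <- function(path, header = TRUE) {
  dat <- read.table(path, header = header)
  x <- as.matrix(dat)
  if (!is.numeric(x)) stop("all columns must be numeric subtest scores")
  x
}

# Assumed file names; adjust to the files downloaded from the author's website.
# source("BootMDS.R")                      # assumed to define BootMDS() and figureMDS()
# testdata <- read_observations("wj7.txt")
```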
To run "BootMDS", seven input arguments must be specified as follows.
1. x : the observation matrix (individuals in rows, subtest scores in columns).
2. mds : the MDS method, either "smacof" or "classical" (classical metric scaling).
3. distance : the distance measure. This must be one of "euclid" for Euclidean distance or "sqeuclid" for squared Euclidean distance.
4. scale : whether the observations are standardized (TRUE) or not (FALSE).
5. nBoot : the number of bootstrap samples.
6. nprofile : the number of dimensions (profiles).
7. cl : the confidence level of the empirical confidence interval for each profile.
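To illustrate what the scale and distance arguments control, here is a sketch of one plausible way to build the subtest-by-subtest dissimilarity matrix that an MDS routine would receive. It is an assumption that BootMDS computes distances between the columns of x in this way, and the function name subtest_dist is hypothetical.

```r
# Hypothetical sketch: dissimilarities between subtests (columns of x),
# computed across individuals, following the scale and distance arguments.
subtest_dist <- function(x, distance = c("euclid", "sqeuclid"), scale = FALSE) {
  distance <- match.arg(distance)
  x <- as.matrix(x)
  if (scale) x <- base::scale(x)     # standardize each subtest: mean 0, SD 1
  d <- as.matrix(dist(t(x)))         # Euclidean distances between columns
  if (distance == "sqeuclid") d <- d^2
  as.dist(d)                         # return a "dist" object
}
```

Classical scaling could then be applied with cmdscale(subtest_dist(testdata), k = 3), for example; a smacof fit would take the same dissimilarities as input.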
The "BootMDS" function generates bootstrap samples at random from the given observations, analyzes each bootstrap sample with one of two MDS methods ("smacof" or "classical", i.e., classical metric scaling), and aggregates the MDS coordinates to form their empirical distribution. In this study, 2,000 bootstrap samples are generated from the original sample and analyzed by the "smacof" procedure with squared Euclidean distance (they can also be analyzed by "classical", although we do not report those results here). The coordinates of the three profiles are then aggregated to form the empirical distribution of each profile. The following R code implements this procedure, obtaining the empirical distribution of each profile and 95% confidence intervals for the MDS coordinates, and saves the result in the "empprofile" object. To make the result reproducible, we fix the random number seed.
#### Generating empirical distribution of
#### profiles from observation by bootstrapping
library(smacof)   # load smacof package
set.seed(1)       # fix the seed so that the result can be duplicated
empprofile <- BootMDS(x=testdata, mds="smacof", distance="sqeuclid",
                      scale=FALSE, nBoot=2000, nprofile=3, cl=0.95)

The "BootMDS" function produces the empirical distribution of the profiles as well as a statistical summary of each profile. The output arguments of the "BootMDS" function are as follows:
1. stress : the stress value of the MDS solution.
Based on the empirical distribution of each profile, statistical inferences can be made: we can calculate standard errors to test the statistical significance of the coordinates and construct confidence intervals for them. The "empprofile" object returned by "BootMDS" includes the bootstrap samples of the dimension profiles and their summary statistics; the summary statistics of the i-th dimension profile are stored in summary[[i]] of the "empprofile" object. For example, the summary statistics for the first dimension profile (Table 1) can be obtained with the following R command.
#### The summary statistics for each
#### dimension profile
i <- 1     # for the first dimension profile
#i <- 2    # for the second dimension profile
#i <- 3    # for the third dimension profile
empprofile$summary[[i]]

Here "Ori" gives the coordinates estimated from the original sample, "SE" the bootstrap standard errors of the coordinates, "Mean" the mean coordinates over the 2,000 bootstrap replicates, "Lower" and "Upper" the lower and upper bounds of the 95% empirical confidence interval, and "WD" the width of the confidence interval. Based on the bootstrap empirical confidence intervals, we can infer that the LT, VS, and AP coordinates are significant in constructing the first dimension profile, since their empirical confidence intervals do not include zero.
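The columns of the summary table can be reproduced from bootstrap draws with a few lines of base R. The sketch below is illustrative: it assumes percentile intervals (BootMDS may compute its interval differently), the function name summarize_coordinate is hypothetical, and the extra "Sig" column, which flags intervals that exclude zero as in the criterion above, is an addition not present in Table 1.

```r
# Hypothetical sketch: summary statistics for the bootstrap draws of one
# dimension profile. `draws` is an nBoot x p matrix (one column per
# coordinate); `ori` holds the original-sample coordinates.
# Percentile intervals are an assumption.
summarize_coordinate <- function(draws, ori, cl = 0.95) {
  alpha <- 1 - cl
  lower <- apply(draws, 2, quantile, probs = alpha / 2, names = FALSE)
  upper <- apply(draws, 2, quantile, probs = 1 - alpha / 2, names = FALSE)
  data.frame(
    Ori   = ori,
    SE    = apply(draws, 2, sd),     # bootstrap standard error
    Mean  = colMeans(draws),         # mean over bootstrap replicates
    Lower = lower,
    Upper = upper,
    WD    = upper - lower,           # width of the interval
    Sig   = lower > 0 | upper < 0    # TRUE if the interval excludes zero
  )
}
```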

From the empirical distribution of each profile, we can obtain a graphical summary of the bootstrap empirical confidence interval of each dimension profile or each coordinate. These plots can be produced by the function "figureMDS" with four input arguments as follows.
figureMDS(result, type=c("ci", "hist"), dimension, coordinates)
1. result : the object returned by the "BootMDS" function.
2. type : the figure type, either "ci" for the bootstrap empirical confidence intervals of a dimension profile or "hist" for a histogram and Q-Q plot of coordinates.
3. dimension : the dimension (profile) to plot.
4. coordinates : the coordinates to plot.
From the bootstrap samples, we can draw the empirical distribution of each dimension profile and conduct any statistical procedure. For example, the bootstrap sample of the j-th coordinate of the i-th dimension profile is stored in profile[[i]][,j] of the "empprofile" object. Figure 2 shows the histogram and Q-Q plot for the LT and AP coordinates of the first profile, which suggest that the distribution of the AP coordinate is right-skewed. The dotted lines in the histograms of Figure 2 indicate the 95% empirical confidence interval of each coordinate. The LT and AP coordinates are significant for the first profile since their 95% bootstrap intervals do not include zero. In this way, the distributional characteristics of the MDS coordinates can be discovered and statistical inferences can be made. The following code produces Figure 2.
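A display of the same kind as Figure 2 can be sketched in base R for a single coordinate: a histogram with dotted lines at the empirical interval bounds, next to a normal Q-Q plot. This is not figureMDS itself; the function name plot_coordinate is hypothetical, percentile bounds are an assumption, and reading the draws as empprofile$profile[[1]][, j] assumes the profile component is accessed with $.

```r
# Hypothetical sketch (not figureMDS): histogram with empirical interval
# bounds and a normal Q-Q plot for the bootstrap draws of one coordinate.
plot_coordinate <- function(draws, cl = 0.95, label = "coordinate") {
  alpha <- 1 - cl
  ci <- quantile(draws, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  old <- par(mfrow = c(1, 2))        # two panels side by side
  on.exit(par(old))                  # restore graphics settings afterwards
  hist(draws, main = label, xlab = "Bootstrap coordinate")
  abline(v = ci, lty = 3)            # dotted lines at the interval bounds
  qqnorm(draws, main = paste("Q-Q plot:", label))
  qqline(draws)                      # reference line for normality
  invisible(ci)                      # return the interval bounds
}
```

A right-skewed coordinate shows up as points bending above the reference line in the upper tail of the Q-Q panel.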

Concluding Remarks
We used the bootstrap method to produce the empirical distribution of the coordinates for statistical inference. Based on the bootstrap empirical distribution, we constructed empirical confidence intervals for the MDS coordinates of cognitive ability test data. As the results show, the LT, VS, and AP coordinates of the first dimension profile were statistically significant. Researchers and practitioners can adapt, modify, and extend the bootstrap procedure for their own purposes. With the R function "BootMDS" used in this study, one can readily apply the bootstrapping method to real data; it is freely available from the author's website. We hope that this implementation of the bootstrap procedure for constructing the empirical distribution of the MDS coordinates helps researchers conduct statistical inference on the dimension coordinates and interpret them in their own applications.