The Consistency of Blindfolding in the Path Analysis Model with Various Number of Resampling

Regression analysis cannot handle complex relationships that involve several response variables and intervening endogenous variables. Path analysis is able to handle these problems. Path analysis rests on several assumptions, one of which is the normality of residuals. If the residual normality assumption is not met, parameter estimation can produce biased estimators with large and inconsistent ranges. Violations of residual normality can be overcome by using resampling. Therefore, in this study, a simulation study was conducted to apply resampling with the blindfolding method in path analysis when the normality assumption is not met, using various numbers of resamples. Based on the simulation results, consistency is reached at different numbers of resamples for different levels of closeness of the relationship between variables. At a low level of closeness, consistency is reached at 1000 resamples. At a moderate level, it is reached at 500 resamples. At a high level of closeness, it is reached at 1400 resamples.


Introduction
According to [1], path analysis is a form of multiple regression statistical analysis used to analyze causal relationships in which independent variables produce both direct and indirect effects on a dependent variable. In path analysis, when the normality assumption of residuals cannot be fulfilled, several remedies are available, for example transforming the data, trimming outliers, or adding observations. Another method that can be used to overcome violations of the normality assumption of residuals is resampling.
Resampling is the activity of sampling from an existing data sample to draw a new sample with a larger size. According to the central limit theorem, the sampling distribution of the sample mean approaches a normal distribution for samples from a non-normal distribution when the sample size is sufficiently large [2]. Types of resampling include bootstrapping, jackknifing, blindfolding, k-nearest neighbor, the randomization exact test, and cross-validation.
Basically, resampling is the method of repeatedly drawing samples from the original data sample. One of the most basic resampling techniques is blindfolding. The advantage of the blindfolding method is its flexibility, because each resample keeps part of the data and omits the rest.
One issue to consider in resampling is the number of resamples, i.e. the number of replications of the sampling. The larger the number of resamples, the more consistent the estimators obtained [3]. A simulation study can be used to investigate the number of resamples at which the estimators in path analysis approach consistency.
Based on the description above, this simulation study examines the effect of the number of resamples on the consistency of path coefficient estimators when the normality assumption of residuals is not fulfilled, under various levels of closeness of the relationship between variables. The study aims to identify the consistency of blindfolding, which resamples using part of the data set. This consistency is needed for path coefficient estimation.

Materials and Methods
Path analysis is an extension of multiple linear regression. Path analysis, developed by Sewall Wright as in [7], is a method for studying the direct and indirect effects of variables in which some of the variables are often taken as causes, whereas others are taken as effects. Meanwhile, according to [8], path analysis is defined as an extension of the regression model used to test the fit of the correlation matrix against two or more causal models which are being compared by the researcher. The model is usually depicted in a circle-and-arrow figure in which single-headed arrows indicate causation. A regression is done for each variable in the model as a dependent on others which the model indicates are causes. The regression weights predicted by the model are compared with the observed correlation matrix for the variables, and a goodness-of-fit statistic is calculated [8].
In path analysis, the terms exogenous and endogenous variables are used. An exogenous variable is one whose variation is assumed to be determined by causes outside the model. It means that exogenous variables are not determined in the model. An endogenous variable, conversely, is one whose variation is explained by exogenous or other endogenous variables in the model. Endogenous variables are divided into intervening endogenous variables and pure endogenous variables. An intervening variable is one whose variation is explained by other variables and which also explains the variation of other variables. A pure endogenous variable, in contrast, is one whose variation is explained by other variables but which does not explain any other variable. In path analysis, there is at least one exogenous variable, one intervening endogenous variable, and one pure endogenous variable [9].

Steps of Path Analysis
According to Solimun (2010), the steps in path analysis are:
1. Designing a model based on concepts and theories.
The designed model will be presented in the form of a path diagram or in the form of an equation. For the equation model, because path analysis consists of several equations, a system of equations will be formed.

Path analysis distinguishes three types of effects [10] with the following explanations.

The Direct Effect
The direct effect occurs when the relationship between the exogenous variable and the endogenous variable is not mediated by an intermediary variable. The direct effect is illustrated in Figure 1, where the magnitude of the direct effect of the exogenous variable X on the endogenous variable Y is the path coefficient $\beta_{XY}$.

The Indirect Effect
The indirect effect occurs when the relationship between the exogenous variable and the endogenous variable is mediated by the intermediary variable. The indirect effect can be described as follows.

The Total Effect
The total effect is the sum of the direct and indirect effects. From the examples in Figures 1 and 2, the total effect of X on Y is the direct effect plus the indirect effect transmitted through the intermediary variable.

A Path Diagram
One important component in path analysis is a path diagram. A path diagram is used to illustrate the causal relationships between variables [7]. In the path diagram, the direct and indirect effects of each variable can be seen.
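The three types of effects described above can be illustrated with a small numerical sketch. The variable names (`beta_xy`, `beta_xm`, `beta_my`) and the coefficient values are hypothetical and chosen only for illustration. The sketch relies on the standard path-analysis rule that an indirect effect along one mediated route is the product of the coefficients on that route.

```python
# Hypothetical standardized path coefficients for a three-variable example:
# X -> Y (direct), and X -> M -> Y (indirect, through the intermediary M).
beta_xy = 0.30  # direct path X -> Y (illustrative value)
beta_xm = 0.50  # path X -> M (illustrative value)
beta_my = 0.40  # path M -> Y (illustrative value)


def effects(direct, *indirect_paths):
    """Return (direct, indirect, total) effects.

    Each entry of indirect_paths is a sequence of coefficients along one
    mediated route; that route contributes the product of its coefficients.
    """
    indirect = 0.0
    for path in indirect_paths:
        product = 1.0
        for coefficient in path:
            product *= coefficient
        indirect += product
    return direct, indirect, direct + indirect


direct, indirect, total = effects(beta_xy, (beta_xm, beta_my))
print(f"direct={direct:.2f}, indirect={indirect:.2f}, total={total:.2f}")
```

With several mediated routes, each route is passed as a separate tuple and the contributions are summed.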
An example of the relationship between the exogenous variable and the endogenous variables can be illustrated in the following path diagram.
In Figure 3, X is the exogenous variable and $Y_1$, $Y_2$, and $Y_3$ are endogenous variables. The causal relationships in the path diagram are drawn in the direction of the arrows. Each variable is standardized, for example for the variable X:
$$Z_{X_i} = \frac{X_i - \bar{X}}{S_X}, \quad i = 1, 2, \ldots, n \qquad (1)$$
where
$X_i$ : the value of the variable X for the i-th observation
$\bar{X}$ : the mean value of the variable X
$S_X$ : the standard deviation of the variable X
$n$ : sample size
After standardization, each variable has mean = 0 and variance = 1. Standardization rescales a variable but does not by itself make it normally distributed.
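The standardization step can be sketched as follows. This is a minimal illustration only: the function name `standardize`, the sample values, and the use of the sample standard deviation with an n - 1 divisor (`ddof=1`) are assumptions, since the source does not state which divisor is used.

```python
import numpy as np


def standardize(x):
    """Return z-scores: (x - mean) / standard deviation."""
    x = np.asarray(x, dtype=float)
    # ddof=1 (n - 1 divisor) is an assumption; the source does not specify it.
    return (x - x.mean()) / x.std(ddof=1)


x = np.array([2.0, 4.0, 4.0, 6.0, 9.0])  # illustrative values only
z = standardize(x)

# The standardized variable has mean 0 and variance 1 (up to rounding error),
# but its shape (e.g. skewness) is unchanged.
print(z.mean(), z.var(ddof=1))
```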
The path analysis model consists of a system of equations. This model can be formed based on a path diagram. The system of equations needs to be solved simultaneously, starting from parameter estimation, hypothesis testing, to interpretation.
The system of equations obtained from the path diagram in Figure 3 is shown in equation (2).
The system above can be written in matrix form as follows.

Blindfolding Resampling
Characteristics that describe a population are called parameters. Information about the population is needed to estimate parameters. However, in the application, not all observations in the population can be known. It can be caused by several factors, for example, considerable cost and time that are required to obtain complete data. Thus, sampling is more often done.
If the number of observations taken as a sample is insufficient, parameter estimation from the sample becomes less accurate. To overcome this problem, resampling, a tool that does not depend on theoretical distributions, can be used. As the name implies, resampling is done by repeated sampling within the same sample. Resampling is used in hypothesis testing. It can also be used to enumerate all possible combinations of the sample, which is highly time-consuming and therefore requires computation [11].
One resampling method is the blindfolding method. Blindfolding is a sample re-use technique that systematically deletes data points and provides a prognosis of their original values. For this purpose, the procedure requires an omission distance D. A value of D between 5 and 12 is recommended in the literature [11]. An omission distance of seven (D = 7) implies that every seventh data point of a latent variable's indicators is eliminated in a single blindfolding round. Since the blindfolding procedure has to omit and predict every data point of the indicators used in the measurement model of the selected latent variable, an omission distance of D = 7 results in seven blindfolding rounds. Hence, the number of blindfolding rounds always equals the omission distance.
Blindfolding employs a resampling algorithm that creates a number of resamples in which a certain number of rows are replaced with the means of the respective columns (variables). The replacement starts with the first row and continues with the second row until the last row. The number of rows modified in each resample equals the sample size divided by the number of resamples. For example, if the sample size is 108 and the number of resamples selected is 108, then each resample will have 1 row modified. As with the bootstrapping method, the blindfolding method obtains convergent sample estimators with at least 100 replications [5].
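The two mechanisms described above can be sketched in a few lines. This is a rough illustration under stated assumptions, not the procedure of any particular software package: the function names `omission_rounds` and `mean_replacement_resamples` are hypothetical, rows are split into consecutive blocks, and the sketch only covers the case where the number of resamples does not exceed the sample size.

```python
import numpy as np


def omission_rounds(n_points, distance):
    """Indices omitted in each blindfolding round for omission distance D.

    Round d omits points d, d + D, d + 2D, ... so that after D rounds
    every data point has been omitted exactly once.
    """
    return [list(range(start, n_points, distance)) for start in range(distance)]


def mean_replacement_resamples(data, n_resamples):
    """Create resamples in which a block of rows is replaced by column means.

    Assumption: rows are taken in consecutive blocks (first rows first),
    and n_resamples is between 1 and the number of rows.
    """
    data = np.asarray(data, dtype=float)
    n_rows = data.shape[0]
    if not 1 <= n_resamples <= n_rows:
        raise ValueError("n_resamples must be between 1 and the number of rows")
    column_means = data.mean(axis=0)
    resamples = []
    for rows in np.array_split(np.arange(n_rows), n_resamples):
        resample = data.copy()
        resample[rows] = column_means  # replace this block with the column means
        resamples.append(resample)
    return resamples


rounds = omission_rounds(14, 7)      # D = 7 gives seven rounds
data = np.arange(12, dtype=float).reshape(6, 2)  # illustrative 6 x 2 data set
resamples = mean_replacement_resamples(data, 3)  # 6 / 3 = 2 rows per resample
print(len(rounds), rounds[0], len(resamples))
```

With 108 rows and 108 resamples, `mean_replacement_resamples` modifies exactly one row per resample, which matches the example in the text.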
For example, suppose the variable X has seven observations and five of them are taken in each resample. The resampling process in the blindfolding method begins with the first resampling and is repeated for each subsequent resample. Based on this process, $x_1$ and $x_2$ are always included in every blindfold sample. The blindfolding process is illustrated in Figure 4.

Path Coefficient Estimation
A path coefficient indicates the magnitude of the direct effect of an exogenous variable on an endogenous variable in a system of equations. One method that can be used to estimate path coefficients is Ordinary Least Squares (OLS).
Solving the optimization problem in equation (4) yields the OLS estimator of the path coefficients.
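As a rough illustration of OLS path coefficient estimation, the sketch below fits one regression per endogenous variable for a model with one exogenous variable X, two intervening variables Y1 and Y2, and one pure endogenous variable Y3. The recursive structure (X to Y1; X and Y1 to Y2; X, Y1, and Y2 to Y3), the function names, and the generated data are assumptions made for illustration; the structure is chosen to give six coefficients.

```python
import numpy as np


def standardize(v):
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std(ddof=1)


def ols(predictors, response):
    """Least-squares coefficients without intercept (variables are standardized)."""
    design = np.column_stack(predictors)
    coefficients, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coefficients


def estimate_path_coefficients(x, y1, y2, y3):
    """Estimate six path coefficients, one OLS fit per endogenous variable."""
    zx, z1, z2, z3 = (standardize(v) for v in (x, y1, y2, y3))
    (b_x_y1,) = ols([zx], z1)
    b_x_y2, b_y1_y2 = ols([zx, z1], z2)
    b_x_y3, b_y1_y3, b_y2_y3 = ols([zx, z1, z2], z3)
    return {
        "X->Y1": b_x_y1, "X->Y2": b_x_y2, "X->Y3": b_x_y3,
        "Y1->Y2": b_y1_y2, "Y1->Y3": b_y1_y3, "Y2->Y3": b_y2_y3,
    }


# Illustrative data only (not the data of the study).
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y1 = 0.5 * x + rng.normal(size=50)
y2 = 0.3 * x + 0.4 * y1 + rng.normal(size=50)
y3 = 0.2 * x + 0.3 * y1 + 0.4 * y2 + rng.normal(size=50)

coefficients = estimate_path_coefficients(x, y1, y2, y3)
for name, value in coefficients.items():
    print(f"{name}: {value:.3f}")
```

For the equation with a single predictor, the standardized coefficient equals the Pearson correlation between the two variables, which gives a quick sanity check.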

Hypothesis Testing and Consistency with Blindfolding
The blindfolding steps to estimate the standard error are as follows:
The use of a resampling method makes the data distribution-free, so the assumptions of normally distributed data and a large sample are not required. Thus, hypothesis testing can be done using the t-test on each coefficient obtained, using the following formula.
$$t_i = \frac{\hat{\beta}_i}{se(\hat{\beta}_i)}, \quad i = 1, 2, \ldots, k \qquad (8)$$
In equation (8), k is the number of estimated coefficients. The statistical hypothesis used is as follows.
If the absolute value of the t-test statistic is greater than the critical value from the t table, then the null hypothesis is rejected. It means that one variable has a significant effect on another. In this case, the effects tested are the effect of the exogenous variable on the endogenous variables and the effect of one endogenous variable on another endogenous variable.
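The decision rule above can be sketched as follows, assuming the standard error of a coefficient is taken as the standard deviation of its estimates across the resamples. The function name `resampling_t_test`, the numbers, and the critical value of 1.96 are illustrative assumptions; in practice the critical value would be read from the t table for the chosen significance level and degrees of freedom.

```python
import numpy as np


def resampling_t_test(estimate, resample_estimates, critical_value):
    """Return (t statistic, reject flag) for one path coefficient.

    The standard error is estimated by the standard deviation of the
    coefficient across the resamples (an assumption of this sketch).
    """
    resample_estimates = np.asarray(resample_estimates, dtype=float)
    standard_error = resample_estimates.std(ddof=1)
    t_statistic = estimate / standard_error
    reject = bool(abs(t_statistic) > critical_value)
    return t_statistic, reject


# Hypothetical values for one coefficient and its resample estimates.
estimate = 0.40
resample_estimates = [0.35, 0.45, 0.40, 0.38, 0.42]
t_statistic, reject = resampling_t_test(estimate, resample_estimates, critical_value=1.96)
print(f"t = {t_statistic:.2f}, reject H0: {reject}")
```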
According to [6], the consistency of an estimator in blindfolding can be shown by the bias value, which is the difference (distance) between the estimator and the parameter. The following is the formula for calculating the estimator bias in resampling for the relationship between the exogenous variable X and the endogenous variable Y.
Good resampling results will follow a Monte Carlo simulation approach based on means.
In equation (10), $\hat{\theta}^{*}(\cdot)$ is the mean of the parameter estimators obtained from the resampling process. Thus, the bias based on B replications is obtained by replacing the expected value of the estimator with this mean. The bias in equation (11) can be used to determine the consistency of estimators obtained from blindfold samples.
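A minimal sketch of this bias calculation for a single coefficient is given below. It assumes that the bias is the mean of the B resample estimates minus the known parameter value, which is available in a simulation study. The function name `resampling_bias` and all numbers are hypothetical.

```python
import numpy as np


def resampling_bias(resample_estimates, parameter):
    """Bias based on B replications: mean of the resample estimates minus the parameter."""
    mean_estimate = float(np.mean(resample_estimates))
    return mean_estimate - parameter


# Hypothetical estimates of one path coefficient from B = 4 resamples,
# with a known (simulated) parameter value of 0.20.
bias = resampling_bias([0.22, 0.18, 0.25, 0.19], parameter=0.20)
print(f"bias = {bias:.4f}")
```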

Methods
This research is a simulation study using data generation with the following criteria.
1) Cross-section data with four variables, i.e. one exogenous variable, two intervening endogenous variables, and one pure endogenous variable. The four variables were measured directly (observable variables), so they did not require a measurement model. Thus, the data had interval or ratio scales.

Forming Simulation Data
The following are the steps conducted in this study to form simulation data.
Three sample sizes n were used, i.e. small n = 25, medium n = 50, and large n = 100. The distance between observations on the exogenous variable was made the same using the arithmetic progression $a_n = a + (n - 1)b$.

Making a path diagram
A path diagram illustrates the causal relationships between variables. In this study, the path diagram shown in Figure 3 was used. Furthermore, a system of equations was formed according to the path diagram using standardized variables as in equation (16).

Making the path coefficient values from data generation
The path coefficient values $\beta$ lay in the range $-1 < \beta < 1$ due to the standardization of the exogenous variable. In this study, the path coefficient values were divided into three conditions, i.e. those that describe the low, moderate, and high levels of closeness of the relationship between variables.

Generating the residual values
In this study, the residual $\varepsilon$ followed an exponential distribution. The aim was to examine the resampling results on data that did not fulfill the normality assumption of residuals.

Forming data for the intervening and pure endogenous variables
The intervening endogenous variables $Y_1$ and $Y_2$, as well as the pure endogenous variable $Y_3$, were calculated based on the system of equations formed in step two.

Forming simulation data
Simulation data was obtained by combining the exogenous variable, the intervening endogenous variables, and the pure endogenous variable obtained.
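The data generation steps above can be sketched as follows. This is an illustration under several assumptions that the source does not confirm: the function name `simulate_path_data`, the starting value and step of the arithmetic progression, the scale of the exponential residuals, the coefficient values, and the choice to feed the generated Y1 and Y2 into the later equations without re-standardizing them.

```python
import numpy as np


def simulate_path_data(n, coefficients, start=1.0, step=1.0, scale=1.0, seed=0):
    """Generate X, Y1, Y2, Y3 with exponentially distributed residuals."""
    rng = np.random.default_rng(seed)
    # Exogenous variable: equally spaced values, a_n = a + (n - 1) b.
    x = start + step * np.arange(n)
    zx = (x - x.mean()) / x.std(ddof=1)
    # Non-normal residuals, one vector per endogenous variable.
    e1, e2, e3 = rng.exponential(scale, size=(3, n))
    # Recursive system: each endogenous variable depends on earlier ones.
    y1 = coefficients["X->Y1"] * zx + e1
    y2 = coefficients["X->Y2"] * zx + coefficients["Y1->Y2"] * y1 + e2
    y3 = (coefficients["X->Y3"] * zx + coefficients["Y1->Y3"] * y1
          + coefficients["Y2->Y3"] * y2 + e3)
    return np.column_stack([x, y1, y2, y3])


# Illustrative coefficients in the low range (0.05 to 0.25).
coefficients = {
    "X->Y1": 0.10, "X->Y2": 0.15, "X->Y3": 0.20,
    "Y1->Y2": 0.05, "Y1->Y3": 0.25, "Y2->Y3": 0.10,
}
data = simulate_path_data(25, coefficients)
print(data.shape)
```

Changing `n` to 50 or 100 and the coefficient values to the moderate or high range would give the other simulation conditions.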
Next, the following steps were taken to do resampling with the Blindfolding method on simulation data.

1) Path coefficient estimation on simulation data
Path coefficient estimation on simulation data was performed using the OLS method. From this process, six path coefficients were obtained.

3) Path coefficient estimation in each blindfold sample
Path coefficient estimation was conducted on each of the B blindfold samples. Thus, this process generated B sets of path coefficients, indexed by $b = 1, 2, \ldots, B$. After that, the average path coefficient over the B samples was calculated.

4) Hypothesis testing of blindfolding
Hypothesis testing was performed for each path coefficient value.

5) Calculation of the bias
The bias is the difference (distance) between the estimator and the parameter. There were six estimators and six parameters in this study. The formula for calculating the bias is shown in equation (2.21), which involves the six path coefficients $\beta_{XY_1}, \beta_{XY_2}, \beta_{XY_3}, \beta_{Y_1Y_2}, \beta_{Y_1Y_3}, \beta_{Y_2Y_3}$. Thus, the bias was calculated as the distance between two points, i.e. the vector of estimators and the vector of parameters.
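The distance between the vector of six averaged estimators and the vector of six parameters can be sketched as below. The Euclidean norm is an assumption, since the source only refers to a vector norm, and the function name `bias_norm` and all numbers are hypothetical.

```python
import numpy as np


def bias_norm(mean_estimates, parameters):
    """Euclidean distance between the estimator vector and the parameter vector."""
    difference = np.asarray(mean_estimates, dtype=float) - np.asarray(parameters, dtype=float)
    return float(np.linalg.norm(difference))


# Hypothetical parameter values and averaged resample estimates, in the order
# X->Y1, X->Y2, X->Y3, Y1->Y2, Y1->Y3, Y2->Y3.
parameters = [0.10, 0.15, 0.20, 0.05, 0.25, 0.10]
mean_estimates = [0.12, 0.14, 0.23, 0.05, 0.21, 0.11]
print(f"bias (vector norm) = {bias_norm(mean_estimates, parameters):.4f}")
```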

Results and Discussion
The resampling process with the Blindfolding method was performed on simulation data whose residuals did not fulfill the normality assumption. In this case, violations of the normality assumption of residuals were represented by exponentially distributed residuals. The following were the calculation results of the bias at various levels of the closeness of the relationship between variables, various conditions of the number of resampling, and different sample sizes.

Low Level of the Closeness of the Relationship between Variables
The level of the closeness of the relationship between variables included in the low category was indicated by path coefficient values in the range of 0.05 to 0.25.

Sample Size n = 25
Calculation of the bias was performed using the vector norm. In Figure 5 below, a comparison of the bias values in each condition of the number of resampling is presented.
Based on Figure 5, the bias values get smaller as the number of resamples increases. The difference between the bias at B = 750 and the bias at B = 1000 was very small, only 0.0448. After that, the bias decreased only slightly. Thus, the consistency of the path coefficient estimators under the unfulfilled normality assumption and the low closeness of the relationship between variables was reached at 750 resamples.
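The reasoning used here, i.e. looking for the number of resamples after which the bias changes only slightly, can be sketched as a simple rule. The rule, the function name `first_stable_b`, the tolerance, the grid of B values, and the bias values below are all assumptions made for illustration and are not the results of this study.

```python
def first_stable_b(b_values, biases, tolerance):
    """Smallest B whose bias differs from the bias at the next B by less than tolerance.

    Returns None if no pair of consecutive bias values is that close.
    """
    for index in range(len(b_values) - 1):
        if abs(biases[index] - biases[index + 1]) < tolerance:
            return b_values[index]
    return None


# Hypothetical grid of resample counts and bias values (not the study's results).
b_values = [100, 250, 500, 750, 1000, 1200]
biases = [1.20, 0.90, 0.60, 0.35, 0.31, 0.29]
stable_b = first_stable_b(b_values, biases, tolerance=0.05)
print(stable_b)
```

The choice of tolerance determines the answer, so it would need to be fixed in advance and applied uniformly across all simulation conditions.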

Sample Size n = 50
Calculation of the bias was performed using the vector norm. In Figure 6, a comparison of the bias values in each condition of the number of resampling is presented.
Based on Figure 6, the bias values get smaller as the number of resamples increases. The difference between the bias at B = 1000 and the bias at B = 1200 was very small, only 0.0372. After that, the bias decreased only slightly. Thus, the consistency of the path coefficient estimators under the unfulfilled normality assumption and the low closeness of the relationship between variables was reached at 1000 resamples.

Sample Size n = 100
Calculation of the bias was performed using the vector norm. In Figure 7, a comparison of the bias values in each condition of the number of resampling is presented.
Based on Figure 7, the bias values get smaller as the number of resamples increases. The difference between the bias at B = 1000 and the bias at B = 1200 was very small, only 0.0276. After that, the bias decreased only slightly. Thus, the consistency of the path coefficient estimators under the unfulfilled normality assumption and the low closeness of the relationship between variables was reached at 1000 resamples.

Moderate Level of the Closeness of the Relationship between Variables
The level of the closeness of the relationship between variables included in the moderate category was indicated by path coefficient values in the range of 0.30 to 0.50.

Sample Size n = 25
Calculation of the bias was performed using the vector norm. In Figure 8, a comparison of the bias values in each condition of the number of resampling is presented.
Based on Figure 8, the bias values get smaller as the number of resamples increases. The difference between the bias at B = 500 and the bias at B = 750 was very small, only 0.0345. After that, the bias decreased only slightly. Thus, the consistency of the path coefficient estimators under the unfulfilled normality assumption and the moderate closeness of the relationship between variables was reached at 500 resamples.

Sample Size n = 50
Calculation of the bias was performed using the vector norm. In Figure 9, a comparison of the bias values in each condition of the number of resampling is presented.
Based on Figure 9, the bias values get smaller as the number of resamples increases. The difference between the bias at B = 1000 and the bias at B = 1200 was very small, only 0.0367. After that, the bias decreased only slightly. Thus, the consistency of the path coefficient estimators under the unfulfilled normality assumption and the moderate closeness of the relationship between variables was reached at 1000 resamples.

Sample Size n = 100
Calculation of the bias was performed using the vector norm. In Figure 10, a comparison of the bias values in each condition of the number of resampling is presented.
Based on Figure 10, the bias values get smaller as the number of resamples increases. The difference between the bias at B = 500 and the bias at B = 750 was very small, only 0.0483. After that, the bias decreased only slightly. Thus, the consistency of the path coefficient estimators under the unfulfilled normality assumption and the moderate closeness of the relationship between variables was reached at 500 resamples.

High Level of the Closeness of the Relationship between Variables
The level of the closeness of the relationship between variables included in the high category was indicated by path coefficient values in the range of 0.6 to 0.9.

Sample Size n = 25
Calculation of the bias was performed using the vector norm. In Figure 11, a comparison of the bias values in each condition of the number of resampling is presented.
Based on Figure 11, the bias values get smaller as the number of resamples increases. The difference between the bias at B = 1400 and the bias at B = 1600 was very small, only 0.0541. After that, the bias decreased only slightly. Thus, the consistency of the path coefficient estimators under the unfulfilled normality assumption and the high closeness of the relationship between variables was reached at 1400 resamples.

Sample Size n = 50
Calculation of the bias was performed using the vector norm. In Figure 12, a comparison of the bias values in each condition of the number of resampling is presented.
Based on Figure 12, the bias values get smaller as the number of resamples increases. The difference between the bias at B = 1400 and the bias at B = 1600 was very small, only 0.0257. After that, the bias decreased only slightly. Thus, the consistency of the path coefficient estimators under the unfulfilled normality assumption and the high closeness of the relationship between variables was reached at 1400 resamples.

Sample Size n = 100
Calculation of the bias was performed using the vector norm. In Figure 13, a comparison of the bias values in each condition of the number of resampling is presented.
Based on Figure 13, the bias values get smaller as the number of resamples increases. The difference between the bias at B = 1000 and the bias at B = 1200 was very small, only 0.0584. After that, the bias decreased only slightly. Thus, the consistency of the path coefficient estimators under the unfulfilled normality assumption and the high closeness of the relationship between variables was reached at 1400 resamples.

Consistency of Blindfolding under Unfulfilled Normality Assumption Condition
This section presents a comparison of the bias values obtained when the normality assumption of residuals is not fulfilled. Table 1 below presents the number of resamples at which the path coefficient estimators reached consistency for each condition. In Table 1, it can be seen that, for the low closeness of the relationship between variables, consistency was achieved at 750 resamples with a sample size of 25, at 1000 resamples with a sample size of 50, and at 1000 resamples with a sample size of 100. For the moderate closeness of the relationship between variables, consistency was achieved at 500 resamples with a sample size of 25, at 1000 resamples with a sample size of 50, and at 500 resamples with a sample size of 100. For the high closeness of the relationship between variables, consistency was achieved at 1400 resamples with a sample size of 25, at 1400 resamples with a sample size of 50, and at 1400 resamples with a sample size of 100.

Conclusions
Based on the simulation study results, it can be concluded that:
1) The number of resamples with the blindfolding method needed to achieve consistent path coefficient estimators in data with exponentially distributed residuals, for the condition of low closeness of relationships between variables, was 1000.
2) The number of resamples with the blindfolding method needed to achieve consistent path coefficient estimators in data with exponentially distributed residuals, for the condition of moderate closeness of relationships between variables, was 500.
3) The number of resamples with the blindfolding method needed to achieve consistent path coefficient estimators in data with exponentially distributed residuals, for the condition of high closeness of relationships between variables, was 1400.