Test Efficiency Analysis of Parametric, Nonparametric, Semiparametric Regression in Spatial Data

Regression analysis has three approaches to estimating the regression curve: parametric, nonparametric, and semiparametric. Several studies have discussed modeling with these three approaches for cross-section data, where observations are assumed to be independent of each other. In this study, we propose a new method for estimating parametric, nonparametric, and semiparametric regression curves in spatial data. In spatial data, each observation point has coordinates that indicate its position, so observations are assumed to have different variances. The model developed in this research accommodates the influence of the predictor variables on the response variable globally for all observations, and adds the coordinates of each observation point to capture local influence. With Mean Square Error (MSE) as the model selection criterion, modeling with the nonparametric approach produces the smallest MSE value, so these application data are modeled more precisely by the nonparametric truncated spline approach. Of the eight possible models formed in this research, the nonparametric model is better than the parametric model because its MSE value is smaller. Among the semiparametric regression models, the best one has X2 as a parametric component and X1 and X3 as nonparametric components (Model 2). The nonparametric regression curve estimation model tends to be more efficient than Model 2 because the linearity assumption test shows that all the predictor variables have a non-linear relationship with the response variable. Therefore, in this study, spatial data with a non-linear relationship between the predictor variables and the response tend to be better modeled with a nonparametric approach.


Introduction
Regression analysis is one method that can be used to determine the relationship between the variables involved in a study (Draper & Smith, 1992). Regression analysis with one response variable and several predictor variables is multiple linear regression analysis. According to Kutner et al. (2005), multiple linear regression analysis requires several assumptions to be fulfilled: linearity, normality of errors, homogeneity of error variance, non-autocorrelation, and non-multicollinearity. There are three approaches to regression analysis: parametric, nonparametric, and semiparametric.
If the research data show that the shape of the regression curve is unknown and the linearity assumption is not met, the data should be modeled with a nonparametric approach (Fernandes et al., 2015).
Regression models with nonparametric approaches that researchers often use include Spline, Kernel, Fourier, and Wavelet (Eubank, 1999). Spline regression is a method for estimating nonparametric regression models, and data whose pattern changes at certain subintervals are modeled very well with splines (Hardle, 1990). A spline is a piecewise polynomial: the polynomial pieces are joined at knot points, so each piece describes the data on one subinterval between consecutive knots. One approach for parameter estimation in nonparametric regression models is the truncated spline, which can capture data patterns that change at certain subintervals.
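The piecewise structure can be made concrete with a minimal sketch of the truncated power basis, assuming NumPy and a first-order spline. The helper names `truncated_power` and `truncated_spline_basis` are illustrative, not functions from the paper.

```python
import numpy as np

def truncated_power(x, knot, order):
    """Truncated power function (x - knot)_+^order: zero to the left of the knot."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= knot, (x - knot) ** order, 0.0)

def truncated_spline_basis(x, knots, order=1):
    """Design columns 1, x, ..., x^order followed by one truncated term per knot."""
    x = np.asarray(x, dtype=float)
    poly = [x ** h for h in range(order + 1)]                 # x^0, x^1, ..., x^m
    trunc = [truncated_power(x, k, order) for k in knots]     # (x - K_l)_+^m
    return np.column_stack(poly + trunc)

# Illustrative data: one knot at 3 gives columns [1, x, (x - 3)_+].
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
B = truncated_spline_basis(x, knots=[3.0], order=1)
print(B)
```

Each truncated column is zero to the left of its knot, which is what lets a single least-squares fit change slope at the knot while staying continuous.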
One extension of multiple linear regression is statistical modeling based on regional characteristics, in which the fitted model is influenced by the geographical location of each region (Lu et al., 2014). Differences in geographical location affect the potential that an area has or uses. Therefore, a statistical modeling method is needed that takes geographical location, that is, the location of each observation, into account.
This study compares the three regression approaches when they are applied to data whose model is influenced by the geographical location of regions. The best model is selected based on the Mean Square Error (MSE). This research is expected to show which regression analysis approach is appropriate when the research data do not follow a known pattern. The regression curve is estimated by the Weighted Least Squares (WLS) method, and the estimated regression curve applies both globally and locally.

Spatial Data
Spatial data are data that contain geographical information, so they can be shown on a map. In spatial data, there is dependency between observation locations. What distinguishes spatial data from other data is the coordinates that indicate the location of each point according to geographical conditions (Anselin, 1988).

Regression Analysis
Regression analysis approaches can be classified by data pattern into parametric, nonparametric, and semiparametric regression. If the pattern of the relationship between the response variable and the predictor is known, the analysis is parametric regression (Kutner et al., 2005); if the shape of the regression curve is unknown, or there is no past information about the data pattern, nonparametric regression is used (Fernandes et al., 2014). In addition to these two approaches, the semiparametric regression approach is used when the shape of the regression curve is partly known and partly unknown (Eubank, 1999).

Parametric Regression
In parametric regression, several classical assumptions must be fulfilled. One of them is that the shape of the regression curve is known, for example linear, quadratic, cubic, a higher-degree polynomial, exponential, and so on. A linear parametric regression function can be written as

$$y_i = \beta_0 + \sum_{k=1}^{p} \beta_k x_{ik} + \varepsilon_i, \qquad i = 1, 2, \ldots, n \qquad (2)$$

where $y_i$ is the response variable for the i-th observation, $x_{ik}$ is the value of the i-th observation on the k-th parametric predictor variable ($k = 1, 2, \ldots, p$), $\beta_k$ are the regression parameters, and $\varepsilon_i$ is the random error. If parametric regression is applied to spatial data, the following equation is obtained:

$$y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i \qquad (3)$$

The coordinates $(u_i, v_i)$ indicate the location of each observation. In parametric modeling of spatial data, there are as many models as there are observation locations, which means that the predictor variables have a different effect on the response variable at each observation location (Lu et al., 2014). Combining equations (2) and (3) therefore gives a parametric regression curve estimation model that applies globally to all locations and locally to each observation location.
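To illustrate how equation (3) yields one coefficient vector per location, the sketch below computes local estimates $\hat{\beta}(u_i, v_i) = (X^\top W_i X)^{-1} X^\top W_i y$ in the style of geographically weighted regression. The Gaussian kernel, the `bandwidth` value, and the function names are assumptions made for illustration; the weighting function used in the paper is not specified in this section.

```python
import numpy as np

def gaussian_weights(coords, i, bandwidth):
    """Assumed Gaussian kernel: w_j = exp(-0.5 * (d_ij / bandwidth)^2)."""
    d = np.linalg.norm(coords - coords[i], axis=1)   # distance from location i to every j
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def local_coefficients(X, y, coords, bandwidth):
    """Return one row [beta_0(u_i, v_i), ..., beta_p(u_i, v_i)] per observation."""
    n = X.shape[0]
    D = np.column_stack([np.ones(n), X])             # intercept + parametric predictors
    rows = []
    for i in range(n):
        w = gaussian_weights(coords, i, bandwidth)
        DtW = D.T * w                                 # same as D.T @ diag(w)
        rows.append(np.linalg.solve(DtW @ D, DtW @ y))
    return np.array(rows)

# Synthetic example (not the paper's data): y = 1 + 2x holds at every location,
# so every local estimate should be close to [1, 2].
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(15, 2))
X = rng.uniform(0, 5, size=(15, 1))
y = 1.0 + 2.0 * X[:, 0]
betas = local_coefficients(X, y, coords, bandwidth=3.0)
```

In a real analysis the bandwidth would normally be chosen by a data-driven criterion rather than fixed by hand; it is fixed here only to keep the sketch short.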

Nonparametric Regression
Nonparametric regression is a statistical method used to estimate the pattern of the relationship between predictor variables and responses when no information is available about the shape of the regression function or curve. Nonparametric regression does not require the classical assumptions of parametric regression (Fernandes et al., 2019). Given n independent observations $(x_i, y_i)$, the form of the relationship between $x_i$ and $y_i$ is unknown.
According to Hardle (1990), a nonparametric regression model is as follows:

$$y_i = f(x_i) + \varepsilon_i, \qquad i = 1, 2, \ldots, n$$

where $\varepsilon_i$ is a random error assumed to be independent with zero mean and variance $\sigma^2$. The truncated spline is one of the most frequently used approaches in nonparametric regression. A truncated spline consists of polynomial pieces that are segmented and continuous. One advantage of the truncated spline approach is that the estimated regression function adapts to the pattern of the data.
Here $f(x_i)$ is a regression curve whose shape is unknown and is assumed to be additive. If the nonparametric regression function is approximated by a truncated spline of order m with knots $K_1, K_2, \ldots, K_r$, it can be written as

$$f(x_i) = \sum_{h=0}^{m} \beta_h x_i^h + \sum_{l=1}^{r} \beta_{m+l} (x_i - K_l)_+^m$$

If there is more than one nonparametric predictor variable, the data are arranged as $(x_{1i}, x_{2i}, \ldots, x_{qi}, y_i)$, and the nonparametric regression model is

$$y_i = \beta_0 + \sum_{j=1}^{q} \left[ \sum_{h=1}^{m} \beta_{hj} x_{ji}^h + \sum_{l=1}^{r} \beta_{(m+l)j} (x_{ji} - K_{lj})_+^m \right] + \varepsilon_i \qquad (7)$$

with the truncated function

$$(x_{ji} - K_{lj})_+^m = \begin{cases} (x_{ji} - K_{lj})^m, & x_{ji} \ge K_{lj} \\ 0, & x_{ji} < K_{lj} \end{cases}$$

Sifriyani et al. (2018) developed the truncated spline nonparametric regression model for data with coordinate locations, that is, spatial data, so the estimated parameters are local to each observation location. The truncated spline approach is used to solve spatial data analysis problems in which the shape of the regression curve is unknown (Sifriyani et al., 2017).
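A minimal sketch of fitting the additive truncated spline model (7) by least squares, assuming NumPy, first-order pieces, and knots that are already known. The shared intercept and the helper names are illustrative choices, and the data are synthetic.

```python
import numpy as np

def truncated_power(x, knot, order):
    """(x - knot)_+^order: (x - knot)^order where x >= knot, otherwise 0."""
    return np.where(x >= knot, (x - knot) ** order, 0.0)

def additive_design(X, knots_per_var, order=1):
    """Shared intercept, then for each predictor j: x_j^1..x_j^m and (x_j - K_lj)_+^m."""
    n, q = X.shape
    cols = [np.ones(n)]
    for j in range(q):
        xj = X[:, j]
        cols += [xj ** h for h in range(1, order + 1)]
        cols += [truncated_power(xj, k, order) for k in knots_per_var[j]]
    return np.column_stack(cols)

def fit_additive_spline(X, y, knots_per_var, order=1):
    """Least-squares coefficients and fitted values for the additive truncated spline."""
    D = additive_design(X, knots_per_var, order)
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    return beta, D @ beta

# Synthetic check: two predictors, one knot each (x1 at 3, x2 at 5).
rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(0, 6, 40), rng.uniform(0, 10, 40)])
y = (1.0 + 0.5 * X[:, 0] + 2.0 * truncated_power(X[:, 0], 3.0, 1)
     - 1.0 * X[:, 1] + 1.5 * truncated_power(X[:, 1], 5.0, 1))
beta, fitted = fit_additive_spline(X, y, knots_per_var=[[3.0], [5.0]])
```

Using one shared intercept instead of one per predictor keeps the design matrix full rank; separate per-predictor intercepts would not be identifiable.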
Equation (9) is a global nonparametric model, in which the predictor variables have the same effect on the response variable at every observation location. Applying equation (9) to spatial data yields a nonparametric regression curve estimation model that applies globally to all locations and locally to each observation location.

Semiparametric Regression
Semiparametric regression combines parametric and nonparametric regression. According to Eubank (1999), in semiparametric regression the regression curve is partly known and partly unknown. The truncated spline semiparametric regression equation with more than one nonparametric predictor variable, equation (10), contains p parametric predictor variables and q nonparametric predictor variables. Equation (10) includes both global and local influences. A global influence does not involve the coordinates of the observation points, so it is the same at all observation locations (Fernandes et al., 2020), whereas a local influence differs at each observation location. The regression curve in all three approaches is estimated by the Weighted Least Squares (WLS) method (Fernandes et al., 2017), where the weighting accounts for heterogeneity of variance globally for all observations and locally for each observation location.
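The sketch below shows one way to combine linear parametric columns with first-order truncated spline columns and estimate them by WLS, assuming NumPy and fixed knots. The observation weights $w_i$ are simply taken as given (for example, inverse error variances); this section does not specify how the paper constructs them, so the weights here are an assumption. Scaling each row by $\sqrt{w_i}$ and solving ordinary least squares is equivalent to $\hat{\beta} = (D^\top W D)^{-1} D^\top W y$.

```python
import numpy as np

def truncated_power(x, knot, order=1):
    """(x - knot)_+^order: zero to the left of the knot."""
    return np.where(x >= knot, (x - knot) ** order, 0.0)

def semiparametric_design(X_par, X_np, knots_per_var):
    """Intercept, linear parametric columns, then first-order spline columns per nonparametric predictor."""
    n = X_par.shape[0]
    cols = [np.ones(n)] + [X_par[:, k] for k in range(X_par.shape[1])]
    for j in range(X_np.shape[1]):
        z = X_np[:, j]
        cols.append(z)
        cols += [truncated_power(z, knot) for knot in knots_per_var[j]]
    return np.column_stack(cols)

def wls(D, y, w):
    """WLS via row scaling: equivalent to (D' W D)^{-1} D' W y with W = diag(w)."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(D * sw[:, None], y * sw, rcond=None)
    return beta

# Synthetic check: one parametric predictor and one nonparametric predictor (knot at 2).
rng = np.random.default_rng(2)
X_par = rng.uniform(0, 5, size=(30, 1))
X_np = rng.uniform(0, 4, size=(30, 1))
y = 2.0 + 3.0 * X_par[:, 0] - 0.5 * X_np[:, 0] + 4.0 * truncated_power(X_np[:, 0], 2.0)
w = rng.uniform(0.5, 2.0, size=30)      # assumed positive weights
beta = wls(semiparametric_design(X_par, X_np, [[2.0]]), y, w)
```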

Testing Linear Assumptions
The linearity assumption states that the relationship between the response variable and the predictor variable can be specified correctly, meaning that the regression curve can be expressed in a linear, quadratic, or cubic form. If the linearity assumption is not met, linear regression analysis with the parametric approach is not suitable for analyzing the data. One method for testing the linearity assumption is the Regression Specification Error Test (RESET), introduced by Ramsey in 1969.
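According to Gujarati (2003), the RESET is carried out in the following steps:
1. Perform a regression analysis using one predictor variable to obtain the fitted values of the response variable from the following equation:
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \qquad (11)$$
The parameters of equation (11) are estimated by the Ordinary Least Squares (OLS) method, and the coefficient of determination is
$$R_1^2 = 1 - \frac{SSE_1}{SST_1} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_{1i})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (12)$$
where $SSE_1$ is the sum of squared errors from equation (11), $SST_1$ is the total sum of squares from equation (11), $y_i$ is the response variable, $\hat{y}_{1i}$ is the predicted value of the response variable from equation (11), and $\bar{y}$ is the mean of the response variable.
2. Perform a regression analysis that enters the fitted values from equation (11), in polynomial form such as $\hat{y}_{1i}^2$ and $\hat{y}_{1i}^3$, as new predictor variables:
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 \hat{y}_{1i}^2 + \beta_3 \hat{y}_{1i}^3 + \varepsilon_i \qquad (13)$$
Based on equation (13), the coefficient of determination is
$$R_2^2 = 1 - \frac{SSE_2}{SST_2} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_{2i})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (14)$$
where $SSE_2$ is the sum of squared errors from equation (13), $SST_2$ is the total sum of squares from equation (13), and $\hat{y}_{2i}$ is the predicted value of the response variable from equation (13).
3. Then, the value $R_1^2$ from equation (11) and the value $R_2^2$ from equation (13) are used to compute
$$F = \frac{(R_2^2 - R_1^2)/m}{(1 - R_2^2)/(n - k)}$$
where m is the number of new predictor variables and k is the number of parameters in equation (13). A significant F value indicates that the linearity assumption is not met.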
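The three RESET steps can be sketched in NumPy as follows, assuming, as in Gujarati (2003), that the squared and cubed fitted values are the added regressors. The function names are illustrative, the data are synthetic, and the resulting F would be compared with an F table value with $(m, n - k)$ degrees of freedom.

```python
import numpy as np

def ols_fitted(D, y):
    """OLS fitted values for design matrix D."""
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    return D @ beta

def r_squared(y, y_hat):
    """Coefficient of determination R^2 = 1 - SSE/SST."""
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst

def reset_f(x, y):
    """Ramsey RESET F statistic using yhat^2 and yhat^3 as added regressors."""
    n = len(y)
    D1 = np.column_stack([np.ones(n), x])            # step 1: equation (11)
    y1 = ols_fitted(D1, y)
    r2_old = r_squared(y, y1)
    D2 = np.column_stack([D1, y1 ** 2, y1 ** 3])     # step 2: equation (13)
    r2_new = r_squared(y, ols_fitted(D2, y))
    m = D2.shape[1] - D1.shape[1]                    # number of added regressors
    k = D2.shape[1]                                  # parameters in equation (13)
    F = ((r2_new - r2_old) / m) / ((1.0 - r2_new) / (n - k))
    return F, (m, n - k)

# Synthetic data: a quadratic relation should give a large F, a linear one a small F.
rng = np.random.default_rng(3)
x = np.linspace(1, 10, 50)
F_curved, df = reset_f(x, x ** 2 + rng.normal(0, 1, 50))
F_linear, _ = reset_f(x, 1 + 2 * x + rng.normal(0, 1, 50))
```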

Research Methodology
This study examines farmer satisfaction with fertilizer subsidized by the government. The research variables are Courage of a field counselor (Y), Nation Culture (X1), Reward Financial Courage of a field counselor (X2), and Leadership Role (X3). The research data cover three cultures in East Java Province; each culture consists of five regencies, and each regency has the coordinates of its observation location. Farmer satisfaction with subsidized fertilizer was modeled both for all cultures together and for each culture separately. Modeling uses three approaches, parametric, nonparametric, and semiparametric, in order to find the model that best represents the farmer satisfaction data.

Testing Linear Assumptions
Linearity assumption testing determines whether the relationship between the response variable and the predictor variables can be specified by a known form, that is, whether the regression curve can be expressed in a linear, quadratic, or cubic form.
Table 1 shows that all the predictor variables have a non-linear relationship with the response variable, so there is insufficient evidence of a linear relationship pattern. The next step is therefore to model the data with all three approaches: parametric, nonparametric, and semiparametric. Comparing them identifies the most efficient approach for data with a non-linear relationship between the predictor variables and the response variable. In all three approaches, the classical regression model is combined with spatial regression that involves the coordinates of each observation location.

Parametric Regression
The parametric approach yields regression curve estimates that apply globally to all observations and locally according to the coordinates of each observation location. The local estimates arise from the coordinates at each observation location. Table 3 gives the locally estimated regression curves for the three regions: Culture 1, Culture 2, and Culture 3.

Nonparametric Regression
Modeling with a nonparametric approach is done globally and locally, and all the predictor variables in the research are assumed to have non-linear relationships with the response variable. The global nonparametric regression curve is estimated with three nonparametric predictor variables.
The nonparametric regression curve is estimated with a first-order truncated spline at the optimum knot points. Table 5 gives the local estimates of the nonparametric regression curve, using the first-order truncated spline approach, for each culture.
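The section does not state how the optimum knot is chosen. A criterion often used with truncated splines is Generalized Cross Validation, $GCV(K) = MSE(K) / [n^{-1}\,\mathrm{trace}(I - A(K))]^2$, where $A(K)$ is the hat matrix. The sketch below applies it to a first-order spline with one knot, assuming NumPy and a grid of candidate knots; it is an illustrative assumption, not necessarily the procedure used in the paper.

```python
import numpy as np

def design(x, knot):
    """First-order truncated spline with one knot: columns 1, x, (x - K)_+."""
    return np.column_stack([np.ones_like(x), x, np.where(x >= knot, x - knot, 0.0)])

def gcv(x, y, knot):
    """GCV(K) = MSE(K) / (trace(I - A(K)) / n)^2 with hat matrix A(K)."""
    D = design(x, knot)
    A = D @ np.linalg.pinv(D)              # projection onto the column space of D
    n = len(y)
    resid = y - A @ y
    mse = resid @ resid / n
    return mse / (np.trace(np.eye(n) - A) / n) ** 2

def best_knot(x, y, candidates):
    """Candidate knot with the smallest GCV score."""
    scores = [gcv(x, y, k) for k in candidates]
    i = int(np.argmin(scores))
    return candidates[i], scores[i]

# Synthetic data with a slope change at x = 4.
rng = np.random.default_rng(4)
x = np.linspace(0, 10, 60)
y = 1 + x + 3 * np.where(x >= 4, x - 4, 0.0) + rng.normal(0, 0.1, 60)
knot, score = best_knot(x, y, list(np.linspace(1, 9, 17)))
```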

Semiparametric Regression
Semiparametric regression modeling combines parametric and nonparametric components in a single model. Six possible models can be formed with the semiparametric approach, as discussed below.
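The six semiparametric models follow from splitting the three predictors into non-empty parametric and nonparametric groups ($2^3 - 2 = 6$ splits); together with the fully parametric and fully nonparametric models this gives the eight candidate models. A short Python sketch of the enumeration, in which the generation order happens to match Model 1 to Model 6:

```python
from itertools import combinations

def semiparametric_splits(names):
    """All (parametric, nonparametric) splits where both groups are non-empty."""
    splits = []
    for r in range(1, len(names)):                    # size of the parametric group
        for par in combinations(names, r):
            nonpar = tuple(v for v in names if v not in par)
            splits.append((par, nonpar))
    return splits

splits = semiparametric_splits(["X1", "X2", "X3"])
for number, (par, nonpar) in enumerate(splits, start=1):
    print(f"Model {number}: parametric {par}, nonparametric {nonpar}")
```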

Model 1
The first model treats X1 as a parametric component and X2 and X3 as nonparametric components. Table 6 gives the semiparametric curve estimate computed globally for all observations, under the assumption that part of the regression curve has a known shape and part has an unknown shape.

Model 2
The second model treats X2 as a parametric component and X1 and X3 as nonparametric components. Table 8 gives the global estimate for all observation locations, obtained with the first-order truncated spline approach at the optimum knot points. Table 9 presents the estimated semiparametric regression curves for each observation site: Culture 1, Culture 2, and Culture 3.

Model 3
The third model treats X3 as a parametric component and X1 and X2 as nonparametric components.

Model 4
The fourth model treats X1 and X2 as parametric components and X3 as a nonparametric component, and the global regression curve is estimated under this split.

Model 5
The fifth model treats X1 and X3 as parametric components and X2 as a nonparametric component. Table 14 shows the global semiparametric regression curve estimate, in which the effect is assumed to be the same at all observation locations. Table 15 shows that the predictor variables have a different effect on the response variable at each observation location.

Model 6
The sixth model treats X2 and X3 as parametric components and X1 as a nonparametric component.
Global semiparametric regression curve estimates from the sixth model based on

The Efficiency Model
Curve estimation models were obtained under all three regression approaches, and the best model is selected from among them using the Mean Square Error (MSE) of each model. Table 18 shows that the nonparametric model has a smaller MSE value than the parametric model. Table 19 lists the semiparametric models formed from the regression curve estimates; of the six, Model 2 has the smallest MSE value.
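The selection rule can be sketched as follows, assuming MSE is computed as $\frac{1}{n}\sum_{i}(y_i - \hat{y}_i)^2$ (the section does not state whether a degrees-of-freedom correction is used). Model names and values are toy placeholders, not results from Tables 18 and 19.

```python
import numpy as np

def mse(y, y_hat):
    """Mean square error: average squared residual."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.mean((y - y_hat) ** 2))

def select_by_mse(y, fitted_by_model):
    """Return the model name with the smallest MSE, plus all MSE values."""
    scores = {name: mse(y, fit) for name, fit in fitted_by_model.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Toy values for illustration only.
y = [1.0, 2.0, 3.0]
best, scores = select_by_mse(y, {"model_a": [1.0, 2.0, 3.5], "model_b": [1.0, 2.0, 3.0]})
```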

Conclusions
This study estimates the regression curve with three approaches: parametric, nonparametric, and semiparametric. Each approach combines global and local curve estimation. There are three locations in this study: Culture 1, Culture 2, and Culture 3. A globally estimated regression curve gives the same effect of the predictor variables on the response variable at every location, while a locally estimated curve gives different results at each observation location. The best of the estimated models is selected using MSE: the smaller the MSE value, the better the model. Of the eight possible models, the nonparametric model is better than the parametric model because its MSE value is smaller. Among the semiparametric regression models, Model 2 has the smallest MSE value; in Model 2, X2 is a parametric component while X1 and X3 are nonparametric components.
The nonparametric regression curve estimation model tends to be more efficient than Model 2 because the linearity assumption test shows that all the predictor variables have a non-linear relationship with the response variable. Therefore, in this study, spatial data with a non-linear relationship between the predictor variables and the response tend to be better modeled with a nonparametric approach.