Per Capita Expenditure Modeling Using Spatial EBLUP Approach – SAE

Per capita expenditure is an indicator of community welfare in an area and reflects the economic capacity to meet basic needs. Bali is the second richest province in Indonesia. This study aims to model the per capita expenditure of Bali at the sub-district level using the Spatial EBLUP (SEBLUP) approach in small area estimation (SAE). SAE is an indirect estimation approach that increases the effective sample size and reduces variance. The heterogeneity of an area is influenced by the surrounding areas: everything is related to everything else, but nearer things are more influential than distant ones. The spatial effect can therefore be included in the random effect of a small area model, which is then called the SEBLUP model. The choice of spatial weights matrix, which represents the neighborhood relationship between spatial observation units, is very important in spatial data modeling. A SEBLUP model needs a spatial weights matrix, which can be based on distance (radial distance and power distance), contiguity (queen), or a combination of distance and contiguity (radial distance and queen contiguity). Applying the SEBLUP approach to per capita expenditure in Bali shows that the SEBLUP model with the radial distance spatial weights matrix is the best model, with the smallest ARMSE. South Denpasar Sub-district is the most prosperous sub-district, with the highest per capita expenditure in Bali, while Abang Sub-district has the lowest per capita expenditure.


Introduction
The National Socio-Economic Survey (SUSENAS), conducted by the Central Statistics Agency (BPS), collects data on population, health, education, family planning, housing, and consumption and expenditure. SUSENAS is conducted every three years and is designed around three modules: the household consumption/expenditure module, the social, culture and education module, and the housing and health module [1]. The household consumption/expenditure module provides information about the welfare of the community in an area. Community welfare describes the ability or inability of the community to meet basic needs. According to BPS [2], poverty is measured by the ability to meet basic food and non-food needs, as represented by expenditure. Poor people are therefore defined as people whose average monthly per capita expenditure is below the poverty line.
One of the government's efforts to reduce poverty is to identify poor areas at the small area level, such as districts/cities, sub-districts, and villages. Because the population survey is based on sampling, the number of units observed in each small area is limited. As a result, direct estimation cannot produce accurate estimates and the variance is high. An indirect estimation method that uses information from the surrounding areas is a solution for reducing this high variance; this approach is called small area estimation (SAE) [3]. According to Rao [4], indirect estimation in a small area can be used to produce better predictions. One of the methods in SAE is the Empirical Best Linear Unbiased Prediction (EBLUP) method. Estimation using EBLUP on continuous data needs to be evaluated because the estimators in a small area are biased but have minimum variance. The purpose of estimation using the EBLUP method is to obtain an efficient estimator.
The SAE method uses an indirect estimator, that is, an estimator obtained by weighting the value of a random variable from an area in order to increase the effective sample size and reduce variance. The diversity of an area can be affected by the surrounding areas, so a spatial effect can be included in the area random effect. The spatial effect is the effect that one area has on another through the interaction between areas. A small area model that includes the spatial effect is called the Spatial EBLUP (SEBLUP) model. Cressie [5] investigated the first SEBLUP estimator by including the spatial correlation of small area random effects in the small area estimation model. Other research on SEBLUP was conducted by [6][7][8][9][10], as well as by [11], who included the nearest-neighbor spatial weights matrix in the EBLUP model.
The spatial effect of each area is accommodated in the model by the spatial weights matrix. There are two ways to determine the spatial weights matrix: by the neighborhood relationship between areas or by the distance from one area to another. One weighting method based on the neighborhood relationship is spatial contiguity, in which weights are assigned to areas that share a side and/or a corner with a given area. The other weighting method uses a distance function between areas. Although many previous studies have discussed the spatial approach to small area estimation, the selection of the spatial weights matrix remains a problem in modeling. According to Getis and Aldstadt [12], the spatial weights matrix is an important part of spatial modeling that represents the spatial effect of an area. Therefore, the selection of a spatial weights matrix is very important in small area modeling.
Based on previous studies, this study examines the best choice of spatial weights matrix (based on distance, contiguity, or a combination of distance and contiguity) for modeling per capita expenditure in Bali with the SEBLUP model. The best spatial weights matrix is determined from the Average Root Mean Squared Error (ARMSE) of the SEBLUP estimates: the smaller the ARMSE of a SEBLUP model, the better its spatial weights matrix. This study is expected to contribute to the development of statistics, particularly in determining the spatial weights matrix in small area modeling. The best model in the study will also provide an overview of per capita expenditure in Bali. Bali was chosen because it is the province with the second-lowest poverty rate in Indonesia. However, Bali Island itself is the second poorest island after Maluku and Papua. This research is also expected to provide input for the government at both the provincial and district levels regarding per capita expenditure in Bali, which can serve as an indicator of poverty. Poverty is still a big challenge for the government in the socio-economic field.

Materials and Methods
The data in this study were secondary data from BPS [2]. The response variable was the average per capita expenditure per sub-district. The predictor variables were total population, number of public health centers, number of public primary schools, and number of families using PLN (Perusahaan Listrik Negara, the state electricity company). The population consisted of 54 sub-districts in Bali. The operational definition of each variable in this study is summarized in Table 1. The steps of analysis in this study are as follows:
1. Examine SEBLUP models and various spatial weights matrices based on distance (power distance and radial distance), spatial contiguity (queen contiguity), and a combination of distance and spatial contiguity (radial distance and queen contiguity).
2. Prepare the direct estimator data, namely the average per capita expenditure at the sub-district level in Bali obtained by direct estimation.
3. Prepare the predictor variable data, namely total population, number of public health centers, number of public primary schools, and number of families using PLN.
4. Determine the centroid coordinates of each sub-district in Bali using ArcGIS.
5. Form a distance matrix between areas (sub-districts) in Bali based on Euclidean distance using R 3.6.2.
6. Form spatial weights matrices based on distance using the power distance and radial distance functions.
7. Form a spatial weights matrix based on spatial contiguity of the queen type.
8. Form a spatial weights matrix based on a combination of distance and spatial contiguity, namely radial distance and queen contiguity.
9. Estimate per capita expenditure at the sub-district level in Bali with the SEBLUP model, using the spatial weights matrices (based on distance, spatial contiguity, and their combination) from steps (6), (7), and (8).
10. Compare the estimates of per capita expenditure at the sub-district level in Bali from the SEBLUP models in step (9).
11. Select the best SEBLUP model among the four spatial weights matrices in steps (6), (7), and (8) based on the MSE of SEBLUP using equation (7).
12. Check the normality assumption on the errors of the best SEBLUP model from step (11).
13. Test the significance of the predictor variables in the best SEBLUP model from step (11).
14. Interpret the best model from step (11).
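Steps (5) and (6) can be illustrated with a minimal Python sketch (the study itself used R 3.6.2). It assumes the common textbook forms of the two distance-based weights: a radial distance weight of 1 when two distinct areas lie within a threshold distance and 0 otherwise, and a power distance weight equal to the distance raised to the power of minus alpha. The threshold, the exponent `alpha`, the row standardization, the function names, and the four centroids are all illustrative assumptions, not values or code from this study.

```python
import math

def euclidean_distance_matrix(centroids):
    """Pairwise Euclidean distances between (x, y) centroid coordinates."""
    return [[math.dist(a, b) for b in centroids] for a in centroids]

def radial_distance_weights(dist, threshold):
    """Assumed form: w_ij = 1 if i != j and d_ij <= threshold, else 0."""
    n = len(dist)
    return [[1.0 if i != j and dist[i][j] <= threshold else 0.0
             for j in range(n)] for i in range(n)]

def power_distance_weights(dist, alpha=1.0):
    """Assumed form: w_ij = d_ij ** (-alpha) for i != j, 0 on the diagonal.

    Centroids must be distinct, otherwise d_ij = 0 raises ZeroDivisionError.
    """
    n = len(dist)
    return [[dist[i][j] ** (-alpha) if i != j else 0.0
             for j in range(n)] for i in range(n)]

def row_standardize(weights):
    """Scale each row to sum to 1; rows without neighbours stay all zero."""
    out = []
    for row in weights:
        total = sum(row)
        out.append([w / total if total > 0 else 0.0 for w in row])
    return out

# Hypothetical centroids for four areas (not the Bali sub-districts).
centroids = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]
dist = euclidean_distance_matrix(centroids)
w_radial = radial_distance_weights(dist, threshold=5.5)
w_power = row_standardize(power_distance_weights(dist, alpha=2.0))
```

With real data, `centroids` would hold the 54 sub-district centroids from step (4), and the threshold and exponent would need to be chosen and justified for the study area.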

Results and Discussion
Bali is a province whose capital is Denpasar. Geographically, Bali consists of several islands in the Central Indonesian Time zone (WITA) and comprises eight districts and one city: Badung District, Bangli District, Buleleng District, Denpasar City, Gianyar District, Jembrana District, Karangasem District, Klungkung District, and Tabanan District. Bali is the province with the second-highest level of community welfare in Indonesia. Community welfare reflects the ability to meet needs, which is closely related to both income and consumption. In terms of consumption, people's ability to meet needs can be seen from monthly per capita household expenditure. Descriptive statistics of per capita expenditure at the sub-district level in Bali are summarized in Table 2. Based on Table 2, the lowest average per capita expenditure, IDR 104,257, is in Marga Sub-District, Tabanan District, while people in South Denpasar Sub-District, Denpasar City, have the highest average per capita expenditure, IDR 6,476,787. This shows that people in South Denpasar Sub-District are far from the poverty line. In contrast, Abang Sub-District has an average per capita expenditure far below the Bali average. In general, this indicates an economic condition that still needs special attention from the local government.
The SEBLUP model in this study uses a spatial weights matrix (W) based on geographic proximity. The first law of geography was introduced by Tobler, who argues that "Everything is related to everything else, but near things are more related than distant things" [13][14]. Weights matrices based on geographic proximity are grouped into those based on distance, those based on contiguity (boundary), and those based on a combination of distance and contiguity [15]. This study examines which spatial weights matrix is most suitable for modeling per capita expenditure in Bali, covering 54 sub-districts, with the SEBLUP model. According to [16], for a moderate number of areas (n = 64 areas), the recommended spatial weights matrices are the power distance matrix, the KNN matrix, and the combination of radial and queen matrices.
In this study, two distance-based spatial weights matrices were chosen: the radial distance matrix and the power distance matrix. For spatial contiguity, only the queen contiguity matrix was chosen. For the combination of distance and contiguity, a combined radial-queen weights matrix was chosen. The logical conjunction operator (∧) is used to form this combined matrix: two areas are neighbors under the combined criterion only if they are neighbors by radial distance and also neighbors by queen contiguity. The Euclidean distance matrix is based on physical distance, whereas the spatial weights matrices are based on radial distance, power distance, queen contiguity, and the combination of radial distance and queen contiguity.
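The conjunction rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: both input matrices are taken to be binary (1 for neighbors, 0 otherwise), and the two 4×4 matrices and the function name `combine_by_conjunction` are hypothetical, not taken from the Bali data or from the study's R code.

```python
def combine_by_conjunction(w_radial, w_queen):
    """Return w_ij = 1 only where both inputs mark i and j as neighbours."""
    n = len(w_radial)
    return [[1 if w_radial[i][j] == 1 and w_queen[i][j] == 1 else 0
             for j in range(n)] for i in range(n)]

# Hypothetical binary neighbour matrices for four areas.
w_radial = [
    [0, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
]
w_queen = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
w_radial_queen = combine_by_conjunction(w_radial, w_queen)
```

In this toy case the fourth area has radial neighbors but shares no boundary with any area, so its row in the combined matrix is all zeros. Isolated areas of this kind are worth checking for before the combined matrix is row-standardized.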
The SEBLUP model, in general, can be written as $y = X\beta + Zv + e$, where $v = (I - \rho W)^{-1} u$ is a small area random effect containing the spatial autoregression coefficient and the spatial weights matrix [17]. The SEBLUP model in this study includes the vector $y$ of size 54×1, which is the per capita expenditure per sub-district obtained from direct estimation; the predictor variable matrix $X$ of size 54×4; the regression coefficient vector $\beta$ of size 4×1; the incidence matrix $Z$ (here the identity matrix of size 54×54); the identity matrix $I$ of size 54×54; the spatial autoregression coefficient $\rho$; the spatial weights matrix $W$ of size 54×54; the error vector of the random effect $u$ of size 54×1; and the sampling error vector $e$ of size 54×1.
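As a hedged aside, the covariance structure implied by this model can be written out. The sketch assumes that $u$ has mean zero and covariance $\sigma_u^2 I$, that the sampling errors $e$ are independent of $u$ with a diagonal covariance matrix $R$, and that $I - \rho W$ is nonsingular. These are the usual assumptions for an area-level model and are not stated explicitly above; the symbols $\sigma_u^2$, $G$, $R$, and $V$ are notation introduced here for illustration.

```latex
\begin{aligned}
v &= (I - \rho W)^{-1} u, \qquad \operatorname{E}(u) = 0, \quad \operatorname{Var}(u) = \sigma_u^2 I, \\
G &= \operatorname{Var}(v)
   = (I - \rho W)^{-1} \, \sigma_u^2 I \, \bigl[(I - \rho W)^{-1}\bigr]^{\top}
   = \sigma_u^2 \bigl[(I - \rho W)^{\top} (I - \rho W)\bigr]^{-1}, \\
V &= \operatorname{Var}(y) = Z G Z^{\top} + R = G + R \qquad (\text{since } Z = I).
\end{aligned}
```

Under these assumptions the spatial weights matrix enters the model only through $G$, which is why different choices of $W$ lead to different estimates and different MSE values.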
Because the units of the predictor variables used in this study were different, standardization was carried out on the initial data. After going through the process of estimating the parameters of the SEBLUP model with the spatial weights matrix, the results of the estimated parameters can be seen in Table 3.
The parameter estimates in Table 3 are not much different between the SEBLUP model with the queen weights matrix (SEBLUP$_{KQ}$) and the model with the combined radial-queen matrix (SEBLUP$_{RQ}$) for all predictor variables. The SEBLUP$_{JR}$ model, that is, the SEBLUP model with the radial distance spatial weights matrix, tends to produce slightly different $\hat{\beta}$ coefficients compared to the other three SEBLUP models. In addition to the SEBLUP model parameters, estimating per capita expenditure requires estimates of the small area random effect ($\hat{v}$) and the spatial autoregressive coefficient ($\hat{\rho}$) of the SEBLUP model. The spatial autoregressive coefficient shows the strength of the spatial relationship between the area random effects (between sub-districts). The value of $\rho$ ranges from -1 to 1.
In the SEBLUP model, the random effect components are assumed to follow the normal distribution. The normality assumption on the small area random effects was tested using the Anderson-Darling test (α = 0.05) with the following hypotheses: $H_0: v \sim N(0, \sigma^2)$ (the random effects have a normal distribution) vs $H_1: v \nsim N(0, \sigma^2)$ (the random effects do not have a normal distribution). The spatial autocorrelation coefficient of each model, along with the Anderson-Darling test results, is presented in Table 4.
The spatial autoregressive coefficients of the four SEBLUP models in Table 4 are all positive ($\hat{\rho} > 0$). A positive spatial autoregressive coefficient indicates that a sub-district with a high parameter value tends to be surrounded by other sub-districts with high parameter values. The normality test of the random effects in Table 4 shows that all p-values are very small (< 0.05), so $H_0$ is rejected and it is concluded that the normality assumption on the random effects is not fulfilled in any of the four models. Nevertheless, we first look at the estimates of per capita expenditure in Bali Province from direct estimation and from the four SEBLUP models, which are summarized in Table 5.
Based on Table 5, the highest aggregate per capita expenditure in Bali is given by direct estimation, at IDR 75,804,607.29, while the lowest is given by the SEBLUP model with the radial distance spatial weights matrix, at IDR 73,789,265.25. The estimated per capita expenditure of each sub-district in Bali under the four SEBLUP models and direct estimation in Table 5 is compared graphically in Figure 1. These estimates cannot be separated from the errors generated by the model. The SEBLUP model assumes that the errors follow the normal distribution. The results of testing the error normality assumption using the Anderson-Darling test (α = 0.05) are shown in Table 6.
The test of the error normality assumption in Table 6 shows that all p-values are greater than 0.5 (> 0.05), so $H_0$ is not rejected, and it is concluded that the normality assumption on the errors is fulfilled in all four models. The significance of the SEBLUP model parameters was then tested using the t-test (α = 0.05) with the hypotheses: $H_0: \beta_j = 0$ (the variable does not have a significant effect) vs $H_1: \beta_j \neq 0$ (the variable has a significant effect). The results of the significance tests for the parameters of the four SEBLUP models are summarized in Table 7.
Table 7 shows that, in all four SEBLUP models, per capita expenditure is significantly influenced by population size (JP), the number of elementary schools (JSDN), and the number of families using PLN (JKPLN). This is indicated by p-values smaller than 0.05 for these three variables. In contrast, the number of public health centers (JPus) does not have a significant effect.
Along with the estimates of per capita expenditure for each sub-district in Bali from direct estimation and the four SEBLUP models, the mean squared error (MSE) of each sub-district under each model is obtained. The smaller the MSE value, the better the model. The complete MSE values of the four SEBLUP models are presented in Figure 2. The Average Root MSE (ARMSE) of each of the four SEBLUP models is given in Table 8.
The evaluation of the best small area estimation model can be seen from the ARMSE value in Table 8. Table 8 shows that the ARMSE value of the SEBLUP model with a spatial weights matrix of radial distance is smaller than the ARMSE of the SEBLUP model with a spatial weights matrix of power distance, queen contiguity, and radial-queen combination. Thus, it can be said that the SEBLUP model with a radial distance spatial weights matrix is better than other SEBLUP models for estimating per capita expenditure per sub-district in Bali.
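The ARMSE comparison can be sketched as follows. The sketch assumes that ARMSE is the simple average over areas of the square root of each area's MSE; this definition is inferred from the name, since the formula is not written out in this section. The MSE values and the names `armse`, `mse_by_model`, and `best_model` are hypothetical and are not the values in Table 8.

```python
import math

def armse(mse_values):
    """Average root MSE: the mean of sqrt(MSE_i) over all areas."""
    if not mse_values:
        raise ValueError("mse_values must not be empty")
    return sum(math.sqrt(m) for m in mse_values) / len(mse_values)

# Hypothetical per-area MSEs for two candidate weights matrices.
mse_by_model = {
    "radial": [4.0, 9.0, 16.0],
    "queen": [9.0, 16.0, 25.0],
}
armse_by_model = {name: armse(values) for name, values in mse_by_model.items()}

# The preferred model is the one with the smallest ARMSE.
best_model = min(armse_by_model, key=armse_by_model.get)
```

Because the selection rule relies on a single summary number, it is also worth looking at the per-area MSE values (as in Figure 2) to check that the preferred model is not much worse for particular sub-districts.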

Conclusions
This study on the selection of a spatial weights matrix for the Spatial EBLUP (SEBLUP) model of per capita expenditure per sub-district in Bali shows that the best SEBLUP model, with the smallest ARMSE value, is the one with the radial distance spatial weights matrix. The aggregate per capita expenditure for Bali based on the SEBLUP$_{JR}$ model is IDR 73,789,265.25; South Denpasar Sub-District has the highest per capita expenditure and Abang Sub-District has the lowest. The SEBLUP$_{JR}$ model also shows that per capita expenditure is significantly influenced by population size, the number of elementary schools, and the number of families using PLN.