A Goal Programming Approach for Multivariate Calibration Weights Estimation in Stratified Random Sampling

Calibration estimation is one of the most important ways to improve the precision of the survey estimates. It is a method in which the designs weights are modified as little as possible by minimizing a given distance measure to the calibrated weights respecting a set of constraints related to suitable auxiliary information. This paper proposes a new approach for Multivariate Calibration Estimation (MCE) of the population mean of a study variable under stratified random sampling scheme using two auxiliary variables. Almost all literature on calibration estimation used Lagrange multiplier technique in order to estimate the calibrated weights. While Lagrange multiplier technique requires all equations included in the model to be differentiable functions, some undifferentiable functions may be faced in some cases. Hence, it is essential to look for using another technique that can provide more flexibility in dealing with the problem. Accordingly, in this paper, using goal programming approach is newly suggested as a different approach for MCE. The theory of the proposed calibration estimation is presented and the calibrated weights are estimated. A comparison study is conducted using actual and generated data to evaluate the performance of the proposed approach for multivariate calibration estimator with other existing calibration estimators. The results of this study prove that using the proposed GP approach for MCE is more flexible and efficient compared to other calibration estimation methods of the population mean.


Introduction
Statisticians in survey sampling are often interested in the precision of the estimates of the population parameters. This paper focuses on MCE approach, which is considered to improve the efficiency of the estimates of the population parameters. MCE is a systematic way to incorporate auxiliary information in the estimation procedure. With respect to the auxiliary information, it is the additional information correlated in some way with the study variable. It can be available for the target population, either in an aggregated form or in a detailed form for each individual population unit.
In General, any estimation approach includes attaching weights to sample data and then the weighted averages are computed. These attached weights require a little modification making use of the available auxiliary information through calibration estimation. Hence, the main goal of calibration estimation is to get new weights that modify as little as possible the design weights. These design weights have the desirable property of yielding unbiased estimates. Accordingly, the survey statistician wants to stay as close as possible to these design weights. There is no guarantee that the new weights still give unbiased estimates, but it is expected to be close to the unbiasedness [1].
Deville and Sarndal [1] first introduced calibration estimation in survey sampling. Later, many researchers have contributed to calibration estimation approach which provides calibrated estimators for population parameters using various calibration constraints under different sampling designs. Singh [2] is considered the first study that extended the calibration approach to Stratified Random Sampling (SRS) design. According to [2], many researchers have contributed to the theory of calibration estimation under SRS [3][4][5].
This study proposes, mainly, new MCE through developing a new set of calibrated weights that can be used to increase the precision of the estimates under SRS by incorporating two auxiliary variables. Therefore, the estimation of the calibrated weights is considered as a Mathematical Programming Problem (MPP) in which the Manhattan distance is minimized respecting a set of calibration constraints. In addition, the constraints that improve the precision of the calibrated estimator should be taken into consideration. The considered optimization problem will be solved using Goal Programming technique; the computational details of the proposed procedure will be illustrated in the existence of two auxiliary variables.
The remaining part of this paper is organized as follows. Section 2 presents notations about calibration approach in SRS, while the literature of the calibration estimation approach is reviewed in section 3. Section 4 presents the suggested mathematical programming model used in estimating the calibrated weights. Section 5 illustrates the performance of the proposed mathematical programming model. This is done through comparing the suggested model to the previously introduced methods using a numerical example and through a simulation study that is demonstrated in section 6. Finally, section 7 summarizes the main conclusions and contribution of this paper. = n, where n is the total sample size.

Basic Notations of Calibration Estimation Approach in Stratified Random Sampling
Suppose the i th unit value of the study variable selected from the h th stratum is denoted by y hi ; i=1, 2…. n h . It must be noted that the unbiased estimator of the population mean under SRS design = ∑ ℎ ℎ ℎ=1 is given by [6] denotes the h th stratum sample mean. Additionally, the estimated variance of st under the SRSWOR scheme is given by [6] ℎ ) 2 is the sample variance of Y in the stratum h, and f h = n h / N h is the sampling fraction for the h th stratum; h=1,2,…L.
Assume that X hji denotes the i th unit value of the j th auxiliary variable in the stratum h; h=1,2,….L, i= 1,2,…., N h and j=1,2. The stratum means ℎ = ℎ ) 2 of each auxiliary variable are assumed to be known.
The main purpose of any calibration estimation approach is to provide an improved estimator of the parameter of the population by incorporating information from number of auxiliary variables. Hence, the calibration estimator of the population mean, under SRS, can be given by [3] where  h ; h=1,2….L denote the estimated calibration weights that are chosen to minimize a given distance function, subject to some specific constraints. In addition, the variance of the calibrated estimator can be computed by substituting the design weights, W h with the calibration weights  h ; h=1, 2…L in (2.2). Accordingly, the variance of the calibrated estimator can be given by [3] ѵ ( y st (

Review for the Previous Calibration Estimators in Stratified Random Sampling
Mathematical programming tools offer researchers the advantages of optimizing an objective function respecting a set of system constraints that identify the structure of the main problem. This is one of the main features that have been utilized by many authors in the field of calibration estimation. In this section, some of the mathematical programming models, which were suggested in the literature to determine the optimal calibrated weights, are presented.
Most of the literature considered the calibration weights estimation as an optimization problem, in which the chi-square distance function is minimized subject to some important constraints. Finally, they used Lagrange multiplier technique to estimate the optimum calibrated weights.
The literature can be classified according to the number of incorporated auxiliary variables into two groups. The first group represents the studies that contributed to univariate calibration estimation by incorporating one auxiliary variable, while the second group introduces some multivariate calibration estimators using two auxiliary variables.
With respect to the first group, the calibration estimator introduced by Singh [3] is considered one of the main estimators for the population mean in SRS using one auxiliary variable. Singh [3] minimized the chi-square distance function subject to two main constraints. However, the first constraint assumes that the weighted sum of mean of the auxiliary variable has to be equal to the known parameter of that auxiliary variable. The second constraint indicates that the sum of the calibrated weights is equal to 1. Therefore, Singh [3] optimization problem was formulated as follows are the sample and population means of auxiliary variable in h th stratum respectively. Q h is appropriately set constant that determine the final form of the calibrated estimators [9]. Moreover, Tracy [4] introduced their calibration estimator by minimizing the objective function in (3.1) subject to two constraints. The first one is the same as the constraint in (3.2), while the second constraint used the second order moments of the auxiliary variable. Hence, the second constraint can be expressed as follows: are the auxiliary variable's sample and population variance in the h th stratum respectively. Motivated by Singh [3] and Tracy [4], Koyuncu and Khadilar [7] proposed their calibration estimator. They also minimized the same objective function in (3.1) subject to three constraints which are the same as the constraints proposed in [3] and [4]. However, they considered using them simultaneously at one optimization problem. Hence, it can be stated that they minimized (3.1) subject to the three constraints expressed in (3.2), (3.3) and (3.4) together simultaneously [7].
Regarding the second group that focused on MCE, Rao [5] can be considered the first study that proposed MCE for the population mean in SRS through incorporating two auxiliary variables. They considered minimizing (3.1) subject to one constraint in which the weighted sum of the sample means of j th auxiliary variable is equal to the known parameter for that auxiliary variable. Their optimization problem is expressed by are the sample of j th auxiliary variable and population means in h th stratum respectively. Further, Rao [8] proposed MCE for the population mean in SRS using different constraints related to two auxiliary variables. They also suggested minimizing (3.1) as their objective function subject to the same three constraints used in [7] but with respect to multi-auxiliary variables.
Almost all previously mentioned literature used the Lagrange multiplier technique in order to derive the values of their calibrated weights. While Lagrange multiplier technique requires all equations included in the model to be differentiable functions, some un-differentiable functions may be faced in some cases. Hence, it is essential to look for another technique that can provide more flexibility in dealing with different optimization problems. In the following section, MPP will be proposed along with its suitable solving method.

The Suggested Multivariate Mathematical Programming Model
In this section a Multivariate Mathematical Programming (MMP) for MCE in SRS using two auxiliary variables is presented in order to estimate the values of the calibrated weights of strata. The objective function in the proposed model is presented in subsection 4.1 while the constraints included in the model and the suggested MMP will be introduced in the subsections 4.2 and 4.3, respectively.

The Objective Function
This model is concerned with minimizing an objective function that represents a distance measure between the design weights ( ℎ ; h=1…L) and the calibrated weights ( ℎ ; h=1…L). It must be noted that most of the literature on calibration estimation in SRS considered minimizing the Chi-squared distance function as their objective function. Contrary to most of this literature, this paper considers the Manhattan distance (L1 Norm) from the design weights ℎ as the main objective function. The main advantage behind using the Manhattan distance measure is decreasing the effect of the existence of the outlier. If the design weights W h contain outliers, the variance of the estimator may be large and inference on the population parameter may be inefficient.
Moreover, the decision variables in the suggested model are the L calibrated weights  h ; h=1, 2…L assuming that the population under the study consists of L strata. Accordingly, the considered objective function is presented as follows: where W h and  h ; (h=1, 2…L) are the design weight and calibrated weight for the h th stratum respectively.

The Constraints
This model considers two sets of constraints. The first set represents the calibration constraints while the second set contains the constraints that concentrate on improving the precision of the calibrated estimator. The following three constraints belong to the first set that represents the main key of the calibration procedure. In these calibration constraints, both of the 1 st and 2 nd order moments of each auxiliary variable are used. Hence, the first constraint, in which the weighted sum of the j th auxiliary variable equals the known parameter for that auxiliary variable, can be expressed as follows The second constraint, in which the weighted sum of the sample variances of j th auxiliary variable equals population variance for that auxiliary variable, can be represented as follows The third constraint, which guarantees that the sum of the estimated calibration weights doesn't exceed 1, can be given by Regarding to the second set of the constraints, two constraints will be included in the MMP model. The first constraint is designed to guarantee that the estimated variance of the calibrated estimator is less than or equal to a specified positive constant ζ, and it can be expressed as follows: On the other hand, the second constraint is designed to guarantee that the improved calibrated estimator is still close to unbiasness. It can be expressed as follows represents the calibration estimator of the population mean, and Y denotes its parameter, ϵ is a specified positive constant. Accordingly, |∑  h y h L h=1 − Y | denotes the estimated biasness of the calibrated estimator.

The MMP Model
Based on the elements of the model presented in the previous subsections, the main structure for the MMP model can be expressed as follows: Subject to  On the other hand, Goal programming approach (GP) has been accepted as a basic mathematical programming method for solving different optimization problems. Practically, in some mathematical programming problems that contain too restrictive hard constraints, it is still available to find a solution for that problem using GP technique. This can be obtained by modifying these hard constraints to be soft constraints, and then these soft constraints will be treated as goals to be achieved by adding penalty term in the objective function. Finally, the overall objective is to get as close as possible to these goals [9]. Nevertheless, up to researcher's knowledge, there is not any study that considered using GP Technique for calibration estimation in stratified sampling. Accordingly, based on the wide flexibility of GP technique, it is suggested to convert the original MMP model (4.7-4.13) into a GP model for estimating the calibrated weights of strata. In the suggested GP model, the constraints of the entire problem are considered as goals to be attained and satisfied.

The GP Model
In order to solve any mathematical programming model using Goal programming technique, positive and/or negative deviational variables have to be added to the main objective function in addition to each goal has to be attained. Further, it can be proved that minimizing the absolute deviation between any parameter and its estimate is equivalent to minimizing the sum of the positive and negative deviational variables added to this function [10].
In another words, minimizing the sum of positive and negative deviational variables for any objective (goal constraint) is equivalent to minimizing the absolute deviation between the right-hand side and the left-hand side of this objective (goal constraint). Here is an example that can be used to clarify the previous statement. Assuming that the objective (goal constraint) expressed in (4.14) is considered where A is the target value for this objective (goal constraint), 1 Further, to construct the Goal programming model for the previous MMP [4.7-4.13], each objective (goal constraint) will be divided by its right-hand side value in order to avoid the effect of different measurement units between different constraints assuming that the right-hand side value doesn't equal zero.
Theoretically, there is a tradeoff between getting a calibrated estimator with a minimum variance and a minimum bias simultaneously. Minimizing the variance of the calibrated estimator by adding the goal constraint (4.19) without adding the goal constraint (4.24) may cause getting a biased calibrated estimator. Hence, it is important to add both of the two constraints in order to control the overall mean squared error of the calibrated estimator.
Therefore, according to this theoretical basis, three cases for the proposed GP approach will be considered. The main difference between these cases depends on the relative importance of each goal (variance and bias). This relative importance can be determined by the priority weights given to the deviational variables added to each goal constraint, specifically; dp 2 and (dn 7 + dp 7 ) deviational variables.
The three cases can be further explained as follows: Case 1: Both of the variance and the bias of the calibrated estimator has the same relative importance so that equal priority weights are given to both dp 2 and (dn 7 + dp 7 ) deviational variables of the variance and the bias goal constraints respectively. Case 2: The variance of the calibrated estimator is relatively more important than its unbiasedness. Hence, the priority weight given to dp 2 is equal to 1, while the priority weight for (dn 7 + dp 7 ) is zero.
Case 3: This case is considered the inverse of case 2, where having unbiased calibrated estimator is more important than decreasing its variance. Hence, the priority weight given to dp 2 is equal to zero, while the priority weight for (dn 7 + dp 7 ) is one.
In the following two sections, the performance of the proposed approach with its three cases will be demonstrated and compared with some literature using a numerical example and a simulation study.

Numerical Illustrations and Comparison Study
For explaining the computational details of the proposed multivariate calibration estimator of the population mean, a tobacco data containing 106 units for three variables: area, yield, and production is used. The data is obtained from the Agriculture Statistics 1999 reported in [3] and commonly used in the literature [3, 5, and 8]. The population in this data is divided into L = 10 subgroups (strata). Using proportional allocation, a sample of 40 observations is selected.

Numerical Illustration
Assuming that estimating the average production of tobacco ̅ using both of area and yield as the auxiliary variables is considered. To estimate the multivariate calibration weights and then compute the calibration estimator of ̅ in SRS, the same sample units taken in [3] is used. The information needed for solving the model and computational details is summarized in Tables 1 & 2. GAMS software is used to solve the suggested GP. It solves the nonlinear programming problems using CONOPT 3.15L solver (by default) and the estimated calibration weights are obtained and given in table 3.

Comparison Study
In this section, the performance of the proposed multivariate calibration estimators and other existing calibration estimators will be compared using the Relative Efficiency RE. The proposed GP approach will be compared with following existing calibration estimators:  Tracy [4] estimator: .  Singh [3] estimator: .  Rao [5] estimator: .  Rao [8] estimator: To assess the performance of different calibration estimators compared to the unbiased estimator y st , the measure of Relative Efficiency (RE) is computed as follows [8]: It is the estimated variance of the unbiased estimator y st , v(Ŷ) is the estimated variance of the calibrated estimators, that can be computed through substituting by the calibrated weights instead of the design weights in (5.2) [8].
Moreover, table 4 presents the variance v(Ŷ) of calibration estimator and its relative efficiency (RE) using tobacco data. The results in table 4 show that the proposed GP approach is more efficient than other calibration estimation methods. This can be observed in the increase in the efficiency of the proposed multivariate calibration estimator over the unbiased stratified estimator which is 4799% compared to 462.8%, 495.4% and 551% for the estimators of [3, 4, and 8] respectively. This emphasizes the importance of using GP approach for multivariate calibration estimation that provides more flexibility in dealing with the constraints of the model.

Simulation Study
In this section, a simulation study is conducted and demonstrated to assess the performance of the proposed estimator compared to the same methods used in the numerical example. A population with three strata is generated; the size of the stratum is taken as 3000, 2000, 5000 respectively. The study variable and two auxiliary variables are generated using R software from Multivariate Normal Distribution with parameters μ = (0, 0, 0) and ∑. The covariance matrix ∑ is determined based on the correlation levels ρ between the study and auxiliary variables. Therefore, the covariance matrix ∑ can be given by where the main diagonal's elements represent the variances of the three variables and the elements of the off diagonal represent the covariance between each two variables. Using the proportional allocation, a sample with size n h =1000 is selected by SRSWR, so n h can be given by Detailed information about the simulated data is given in table (5).
In order to compare the proposed GP approach for multivariate calibration estimation with the other existing calibration methods through the simulation study, a total of 1000 datasets are generated where each dataset represents a population of size 10,000 observations. Moreover, a sample with size 1000 observations is proportionally selected from each population under stratified sampling scheme. For each population and its selected sample, the means and variances for the three generated variables are calculated within the three strata. Since the values of the population parameters are unknown, they can be calculated from a pilot survey or obtained from a previous study. These calculated values for the means and the variances represent the input for the models under comparison.  As mentioned above, five models are considered in the comparison. Four models are from the literature and the fifth model is the proposed one. It must be noted that all of the models under comparison are solved for a total number of 1000 runs using GAMS software.
The performance of the different methods for calibration estimators under comparison are evaluated by the Relative Root Mean Squared Error "RRMSE" that can be defined by [11].
where Y denotes the population parameter and y (α) denotes the calibration estimators; specifically, α = Singh [3], Tracy [4], Rao [5], Rao [8] & the proposed GP estimator. Table (6) summarizes the simulation results for the three cases. Table 6 shows that in case 1, where equal weights are given for both of the variance and the bias of the calibrated estimator, the suggested multivariate calibration estimator is better than other existing calibration estimators. The simulated data's results showed that its RRMSE is lower than the RRMSE of [3], [4], [5] and [8] calibration estimators. Moreover, the proposed calibrated estimator in case 2 has a lower RRMSE than [8] calibration estimator but higher than the other estimators. Finally, the proposed multivariate calibration estimator in case 3 is considered the best among all estimators even the estimator in case 1.

Conclusions
This paper proposed a new approach for multivariate calibration estimation for the population mean of the study variable under SRS. New calibration weights are developed by minimizing the Manhattan distance function subject to new calibration constraints using Goal Programming approach. A comparison study using both of numerical example and generated data is performed to assess the precision of the proposed estimator. The results showed that the proposed estimator provides better estimates compared to other existing calibration methods. Specifically, using GP approach provides more efficient and flexible calibration estimator than other existing calibration estimation in the stratified sampling design.
For future research, the Euclidean distance (L2 Norm) could be used rather than the Manhattan distance (L1 Norm).