Methods of Stratification for a Generalised Auxiliary Variable Optimum Allocation

In stratified sampling, research on optimum stratification has progressed along many lines since Dalenius [1] first took up the problem. Noteworthy developments include different sample selection methods and allocations, stratification based on the study variable, stratification based on an auxiliary variable, superpopulation models, extension to two study variables with a single auxiliary variable, and extension to two stratification variables with a single study variable. However, although live populations are generally heteroscedastic, it was Gupt and Ahamed [2,3] who considered the optimum stratification of heteroscedastic populations, for a few allocations under a heteroscedastic regression superpopulation (HRS) model. As a sequel to that work, this paper considers the problem of optimum stratification for an objective variable 𝑦 based on a concomitant variable 𝑥 under the HRS model, for an allocation proposed by Gupt [4,5] and termed the Generalised Auxiliary Variable Optimum Allocation (GAVOA). Methods of stratification are obtained in the form of equations, and of approximate solutions to those equations, which stratify populations at optimum strata boundaries (OSB) and approximately optimum strata boundaries (AOSB), respectively. Mathematical analysis is used to minimise the sampling variance of the estimator of the population mean and to derive all the proposed methods of stratification. The proposed equations divide heteroscedastic populations, whether symmetrical, moderately skewed or highly skewed, at OSB, but the equations are implicit and not easy to solve. Therefore, a few methods of finding AOSB are deduced from the equations through analytically justified steps of approximation.
The methods may provide practically feasible solutions in survey planning for stratifying heteroscedastic populations of any level of heteroscedasticity, and the work may contribute, to some extent, to the theory of the research area. The methods are examined empirically on a few generated heteroscedastic data sets of varied shapes with some assumed levels of heteroscedasticity and are found to perform with high efficiency. The proposed methods of stratification are restricted to the particular allocation used.

Introduction
In sample surveys, the precision of an estimator of a population parameter depends on the heterogeneity of the population units as well as on the sample size and sampling fraction, so stratified sampling comes into play as one way to enhance the precision of the estimator. In stratified sampling, a heterogeneous population is divided into a number of strata so as to increase the homogeneity among population units within strata, and a sample is then drawn from each stratum by any suitable sample selection method. The main aspects to be handled carefully in enhancing the precision of an estimator of a population parameter are the construction of strata, the number of strata, and the allocation of the sample size to strata. In the construction of strata, the major concerns are the determination of OSB and the choice of the best characteristic.
Tschuprow [6] and Neyman [7] were the first to develop a method of allocating the sample size to strata in stratified sampling based on the characteristic under study. Cochran [8] showed that a superpopulation model could be constructed such that the finite population under study can be considered a simple random sample from the superpopulation, provided that information on an auxiliary variable highly correlated with the study variable is available. Among the many sample size allocations to strata, it is pertinent to mention that Hanurav [9] and Rao [10] used information on the auxiliary variable for allocating the sample size to strata under the following superpopulation model:
$$\mathcal{E}(y_i \mid x_i) = a + b x_i, \qquad \mathcal{V}(y_i \mid x_i) = \sigma^2 x_i^{g}, \qquad \mathcal{C}(y_i, y_j \mid x_i, x_j) = 0 \quad (i \neq j), \tag{1}$$
where $a$, $b$, $\sigma^2$ and $g$ are the superpopulation parameters and the script letters $\mathcal{E}$, $\mathcal{V}$ and $\mathcal{C}$ denote conditional expectation, variance and covariance given the $x_i$'s, respectively. Gupt and Rao [11] considered the problem of optimum allocation of the sample size to strata for probability proportional to size with replacement (PPSWR) sampling under a particular case of the superpopulation model (1), namely zero intercept, $a = 0$.
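As a point of reference for the allocations discussed above, the Tschuprow-Neyman optimum allocation assigns the sample in proportion to the product of stratum size and stratum standard deviation. The following is a minimal sketch of that rule only; the function name, the stratum sizes and the standard deviations are hypothetical, and the generalised allocations of Gupt [4,5] are not reproduced here.

```python
import numpy as np

def neyman_allocation(stratum_sizes, stratum_sds, n):
    """Fractional Tschuprow-Neyman allocation: n_h proportional to N_h * S_h."""
    weights = np.asarray(stratum_sizes, dtype=float) * np.asarray(stratum_sds, dtype=float)
    return n * weights / weights.sum()

# Hypothetical strata: sizes N_h and standard deviations S_h are made up for illustration.
alloc = neyman_allocation([500, 300, 200], [2.0, 4.0, 10.0], n=100)
print(alloc.round(2))
```

The result is left fractional; rounding to integers that still sum to the total sample size is a separate design choice.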
It was Dalenius [1] who pioneered the work on determining OSB for stratifying a population based on the characteristic under study. Dalenius and Gurney [12] conjectured that, by taking $W_h \sigma_h$ constant, the optimum points of the study variable $y$ that divide the population into strata could be determined, where $W_h$ and $\sigma_h$ are the stratum weight and the standard deviation of the characteristic in the $h$-th stratum. Mahalanobis [13] and Hansen et al. [14] postulated that OSB of the study variable could be determined by keeping $W_h \mu_h$ constant, where $\mu_h$ is the mean of $y$ in the $h$-th stratum. Dalenius and Hodges [15] endorsed the conjecture of Dalenius and Gurney [12], and Dalenius and Hodges [16] then proposed the method of cumulating the values of $\sqrt{f(y)}$ for finding OSB. Ekman [17] proposed that OSB of the study variable could be obtained by keeping $W_h (y_h - y_{h-1})$ constant. Sethi [18] demonstrated that the postulates made by Hansen et al. [14] did not give OSB for certain types of population.
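The cumulative square root of frequency rule of Dalenius and Hodges mentioned above can be sketched as follows. This is an illustrative implementation under stated assumptions: the density is approximated by a histogram, the number of bins (100) and the function name are arbitrary choices, and the sample is a hypothetical exponential population.

```python
import numpy as np

def cum_sqrt_f_boundaries(values, n_strata, n_bins=100):
    """Approximate strata boundaries by the cumulative sqrt(f) rule."""
    counts, edges = np.histogram(np.asarray(values, dtype=float), bins=n_bins)
    cum = np.cumsum(np.sqrt(counts))  # cumulative square root of bin frequencies
    # Equal steps on the cumulative sqrt(f) scale, one target per interior boundary.
    targets = cum[-1] * np.arange(1, n_strata) / n_strata
    idx = np.searchsorted(cum, targets)  # first bin whose cumulative total reaches each target
    return edges[idx + 1]  # the upper edge of that bin is taken as the boundary

# Hypothetical skewed population used only for illustration.
rng = np.random.default_rng(0)
sample = rng.exponential(size=10_000)
print(cum_sqrt_f_boundaries(sample, n_strata=4))
```

Boundaries found this way are only as fine as the histogram bins, so a larger number of bins gives a finer grid of candidate cut points at the cost of noisier frequencies.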
It is unrealistic to assume that stratification can be based on the study variable, whose values are not available in practice; some other known variable that is highly correlated with the study variable should therefore be used. For a given number of strata, Dalenius [19] obtained equations giving OSB based on an auxiliary variable for proportional allocation and for the Tschuprow [6] and Neyman [7] optimum allocation (TNOA). Taga [20] too obtained OSB for the objective variable based on the concomitant variable and showed that the optimum stratification method he proposed reduces to the optimum decomposition of the distribution function of the random variable $z$ defined by the regression function of $y$ on $x$. Singh and Sukhatme [21] made the first breakthrough, obtaining not only equations giving OSB but also methods of approximation for obtaining AOSB, for TNOA and proportional allocation based on an auxiliary variable $x$, when both the form of the regression of the study variable $y$ on $x$ and the variance function $V(y \mid x)$ are known. Since then a number of researchers, such as Singh [22], Singh and Sukhatme [23], Singh [24-26], Singh and Prakash [27], and Yadava and Singh [28], to mention a few notable ones among many, have furthered the work in this direction under various allocations and sample selection methods. The theory of optimum stratification based on a single study variable, with and without an auxiliary variable, was extended to more than one variate by, inter alia, Ghosh [29], Sadasivan and Aggarwal [30], Gupta and Seth [31], Rizvi et al. [32], Rizvi et al. [33,34] and Verma [35], under various allocations, sample selection methods and some other exceptional conditions. Rabee et al. [36] proposed a multivariate calibration estimation of the population mean of a study variable under a stratified random sampling scheme using two auxiliary variables.
Gupt [4,5] dealt with the problem of allocating the sample size to strata by modifying model (1) into a more general form that takes into account correlation among the units within strata, and hence obtained a few generalised model-based allocations. Gupt and Ahamed [2,3] obtained methods of stratifying heteroscedastic populations to find OSB and AOSB for two of these generalised model-based allocations under simple random sampling with and without replacement (SRSWR and SRSWOR) designs. Gupt et al. [37] also obtained equations giving OSB and AOSB for the auxiliary variable optimum allocation (AVOA) proposed by Hanurav [9]. One of the above-mentioned generalised allocations proposed by Gupt [4,5], derived under the presumption that the coefficient of variation of $x^{g/2}$ is equal in all strata, is the allocation referred to as (2), which holds provided the associated ratios are equal in all strata.
In this paper, we deal with the problem of finding OSB and AOSB for allocation (2), i.e., GAVOA, under the SRSWR design; the results also hold for the SRSWOR design when the finite population correction is neglected.
The paper comprises five sections. Section 2 contains the derivation of the equations giving OSB, and Section 3 derives approximate solutions of those equations to obtain a few methods of stratification giving AOSB. In Section 4, all the proposed methods are illustrated empirically on some generated populations and the results are discussed. Section 5 concludes.

Method of Obtaining OSB
The GAVOA given by (2) can be represented as in (3), where $L$ is the number of strata into which the population of size $N$ is divided such that $\sum_{h=1}^{L} N_h = N$, and $n_h$ is the size of the sample to be selected from the $N_h$ units of the $h$-th stratum such that $\sum_{h=1}^{L} n_h = n$.
Using (1) and (3), the sampling variance of the estimator under stratified sampling can be expressed as in (4), and from (4) we can again obtain (5). Gupt [4,5] presumed that the coefficient of variation of $x^{g/2}$ in the $h$-th stratum is equal for all $h$ and obtained allocation (2) provided a further quantity is equal for all $h$; accordingly, we assume constants in lieu of these quantities in (5), which gives (6). Since $\sigma^2$ and the other constants involved are positive real-valued quantities, minimising the sampling variance of the estimator of the population mean in (6) is equivalent to minimising expression (7).
Differentiating expression (7) partially with respect to $x_h$ and equating the result to zero, we get (8). Considering $f(x)$ as the probability density function of the auxiliary variable $x$, we have (9) and (10). From (8), we have (11). Denoting by two constants the sums over strata that appear in (11), and using (9) and (10), we obtain the terms of equations (11) as in (12). From (11) and (12), we get equations (13), which give OSB of the objective variable $y$ in terms of the auxiliary variable $x$.

Approximate Solutions to the Equations Giving OSB to Obtain Methods of Finding AOSB
In this section, we derive approximate solutions of equations (13) that give AOSB for stratifying heteroscedastic populations. The techniques used by Singh and Sukhatme [21], Gupt and Ahamed [2,3] and others are followed in obtaining series expansions of the system of equations (13). The technique adopted by Singh and Sukhatme [21], who used Ekman's [38] identity to carry out series expansions of the conditional mean and variance, is also followed and extended in this section in the same way as explored by Gupt and Ahamed [2,3]. We introduce a function of $x$ equal to $x^{g/2}$, and we further assume that the functions involved have continuous first and second partial derivatives with respect to $x_h$ on $[x_{h-1}, x_{h+1}]$ for all values of $h$.
Considering the parameters, terms and expressions on the right-hand side of equations (13), evaluating all the derivatives at the lower boundary $x_h$ of the interval $[x_h, x_{h+1}]$, and denoting the interval width $x_{h+1} - x_h$ by a single symbol, we obtain expansions (14) and (15). Adding (14) and (15) gives (16), and from (15) we again get (17). Multiplying (16) and (17) gives (18), and (19) is obtained similarly. Using (18) and (19), we obtain the right-hand side of equations (13); the left-hand side of (13) is obtained similarly. Equations (13) can thus be rewritten in expanded form and, on squaring both sides and expanding by the binomial theorem, reduced further. The identity obtained by Singh and Sukhatme [21] and used by Yadava and Singh [28], Gupt and Ahamed [2,3], Gupt et al. [37] and others is then invoked. Using (20) and (21), we arrive at the methods (22) and (23). Thus, the AOSB are given by (22) and, equivalently, by (23), in which the values of the two constants can be evaluated approximately, where the points of stratification $x_h$ are assumed to lie between given lower and upper bounds. The approximate solutions for OSB, i.e., the $x_h$'s, can be calculated by either of the methods (22) and (23), given the lower boundaries $x_{h-1}$.
From the above analytical work involved in finding AOSB, the following theorem has been inferred. However, in applying the methods (22) and (23) for finding AOSB, it is necessary to determine the values of the constants involved under the conditions of AOSB, for which the following lemma is used.

Empirical Illustrations of the Proposed Methods of Stratification and Discussion on Their Results
In the empirical illustration, as was done by Singh and Sukhatme [21], Gupt and Ahamed [2,3], Gupt et al. [37] and others, data generated from uniform, right triangular, exponential and chi-square probability density functions (pdfs) over given ranges are used. We calculate OSB by using the proposed equations (13) and, for AOSB, of the two equivalent methods of approximation (22) and (23), we use (23) for convenience. The population generated by each pdf for every assumed level of heteroscedasticity ($g = 1$, $g = 1.5$ and $g = 2$) is divided into $L = 2, 3, 4, 5, 6$ strata. For any number of strata, when the proposed methods are used to stratify the populations to get OSB and AOSB, successive iterations are executed until the optimum points converge, and the resulting sampling variances are then calculated. At the same time, for each population, equal interval stratification is carried out for each considered number of strata and the corresponding sampling variances are calculated. The efficiency of each method of stratification is found by comparing its variance at the optimum points with that of equal interval stratification, in each population and for each considered number of strata. The comparisons are shown in Tables 1 to 12. In the tables, equal interval stratification is denoted by the abbreviation EIS and relative efficiencies by RE.
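The comparison against equal interval stratification described above can be sketched as follows. This is only an illustration under explicit assumptions: the variance used is the standard Neyman-allocation variance (finite population correction ignored), swapped in for the GAVOA variance, which is not reproduced here; RE is taken as 100 times the EIS variance divided by the variance at the candidate boundaries; and the population, function names and quantile-based boundaries are hypothetical.

```python
import numpy as np

def equal_interval_boundaries(x, n_strata):
    """Interior cut points that split the range of x into equal-width strata."""
    return np.linspace(x.min(), x.max(), n_strata + 1)[1:-1]

def neyman_variance(y, x, boundaries, n):
    """Variance of the stratified mean of y under Neyman allocation, fpc ignored.

    Stand-in for the GAVOA variance: V = (sum_h W_h * S_h)^2 / n,
    with strata formed on the auxiliary variable x.
    """
    strata = np.digitize(x, boundaries)  # stratum index of every unit
    weighted_sd = 0.0
    for h in np.unique(strata):
        y_h = y[strata == h]
        weighted_sd += (y_h.size / y.size) * y_h.std()
    return weighted_sd**2 / n

def relative_efficiency(y, x, boundaries, n):
    """RE (%) of the given boundaries relative to equal interval stratification."""
    eis = equal_interval_boundaries(x, len(boundaries) + 1)
    return 100.0 * neyman_variance(y, x, eis, n) / neyman_variance(y, x, boundaries, n)

# Hypothetical skewed population: x exponential, y = x plus heteroscedastic noise.
rng = np.random.default_rng(0)
x = rng.exponential(size=20_000)
y = x + 0.3 * np.sqrt(x) * rng.standard_normal(x.size)
quantile_cuts = np.quantile(x, [1 / 3, 2 / 3])  # illustrative candidate boundaries
print(relative_efficiency(y, x, quantile_cuts, n=100))
```

By construction, a value above 100 means the candidate boundaries give a smaller variance than equal interval stratification for the same number of strata.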
In generating the populations, the regression $\mathcal{E}(y \mid x) = a + bx$ is taken to be linear with a slope of 45° and $a = 0$ is also assumed. The constant $\sigma^2$ in the conditional variance $\mathcal{V}(y \mid x) = \sigma^2 x^{g}$ is determined, for $g = 1$, $g = 1.5$ and $g = 2$, so that 90% of the total variation is accounted for by the regression. We truncate the right triangular distribution so that the area under the curve to the right of the truncation point is 0.05. For the exponential and chi-square distributions, truncation is done so that the area under the curve to the left as well as to the right is 0.05. Numerical differentiation and integration methods are used in solving (23).
For uniform populations, it is logical to regard equal interval stratification as an efficient stratification method. The proposed methods of stratification (13) and (23) stratify the populations at all three considered levels of heteroscedasticity, for each number of strata, mostly with almost the same or slightly higher efficiency than equal interval stratification. The proposed stratification methods can therefore be considered efficient.
In the skewed populations generated by the right triangular distribution, the proposed methods (13) and (23) perform with much higher efficiency than equal interval stratification for every number of strata, although with 6 strata the efficiency is lower than with fewer strata in each of the three populations. The proposed methods therefore perform efficiently in stratifying populations at all the considered levels of heteroscedasticity, and both work with more or less the same efficiency.
For populations generated by the exponential distribution, the proposed methods (13) and (23) perform with much higher efficiency than equal interval stratification. For $g = 1$, except for 2 strata, the method of approximation (23) performs with slightly higher efficiency than equations (13) giving OSB. For $g = 1.5$ and $g = 2$, equations (13) perform slightly better than (23) with 2, 3 and 4 strata, whereas method (23) performs slightly better than (13) with 5 and 6 strata.
In the populations generated by the chi-square distribution too, the proposed methods stratify the populations with much higher efficiency than equal interval stratification. In the heteroscedastic population for $g = 1$, the method of approximation (23) performs slightly better than equations (13). For $g = 1.5$ and $g = 2$, equations (13) perform slightly better than the method of approximation (23) for every considered number of strata, except 6 strata for $g = 2$.
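The population generation described in this section, with a 45° regression line through the origin and a variance constant chosen so that the regression accounts for 90% of the total variation, can be sketched as follows. The sketch rests on assumptions not stated in the text: normally distributed errors, a uniform auxiliary variable on a hypothetical range of 1 to 2, and the decomposition Var(y) = b² Var(x) + σ² E(x^g) to solve for σ²; the function name is illustrative.

```python
import numpy as np

def generate_hrs_population(x, g, b=1.0, r2=0.9, rng=None):
    """Generate y with E(y|x) = b*x and V(y|x) = sigma2 * x**g.

    sigma2 is chosen so that the regression explains a share r2 of Var(y),
    using Var(y) = b^2 Var(x) + sigma2 * E(x^g).  Normal errors are an assumption.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    explained = b**2 * x.var()  # variation due to the regression
    sigma2 = explained * (1.0 - r2) / (r2 * np.mean(x**g))
    y = b * x + np.sqrt(sigma2 * x**g) * rng.standard_normal(x.size)
    return y, sigma2

# Hypothetical auxiliary variable; slope b = 1 corresponds to a 45-degree line.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 2.0, size=10_000)
for g in (1.0, 1.5, 2.0):
    y, sigma2 = generate_hrs_population(x, g, rng=rng)
    share = x.var() / (x.var() + sigma2 * np.mean(x**g))  # regression share of variation
    print(g, sigma2, share)
```

The printed share is the design value implied by the chosen σ², not an estimate from the simulated y, so it does not depend on the random errors.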

Conclusions
This paper has demonstrated that the proposed methods of stratification for GAVOA stratify heteroscedastic populations efficiently. Both methods, equations (13) giving OSB and the method of approximation (23) giving AOSB, are not only obtained analytically but are also found to stratify populations of varied nature, whether symmetrical, moderately skewed or highly skewed, at different levels of heteroscedasticity, with higher efficiency than equal interval stratification. Although equations (13) are arduous to apply in practice, the methods (22) and (23) giving AOSB are easy to use. It is also demonstrated that equations (13) giving OSB and the method (23) giving AOSB work with more or less the same efficiency in a number of populations. The methods of approximation (22) and (23) are proven to be equivalent. Therefore, either of the two proposed methods of approximation can be easily used in stratified sampling.