A Robust Estimator of R = P(X > Y) of Heavy-tailed Distributions and its Sampling Distributions

Heavy-tailed distributions have wide applications in lifetime contexts, especially in reliability and risk modeling. We consider the problem of estimating the reliability R = P(X > Y) for small samples, when X and Y are two independent but not identically distributed random variables belonging to the family of heavy-tailed distributions, using a robust estimator, namely the harmonic moment estimator. Extensive simulation studies are carried out to study the performance of this estimator. The efficiency of the estimator relative to the well-known Hill estimator is studied. We obtain the sampling distributions of the estimators of the distribution parameters and of the estimator of R, which help us to study the properties of the estimators. We also obtain asymptotic confidence intervals for R and study their performance, with respect to average width and coverage probability, through simulations.


Introduction
Heavy tails are characteristic of many phenomena in which a single huge value has a heavy impact. Record-breaking insurance losses, financial log returns, sizes of files stored on a server, file transmission rates, etc. are examples of random variables following heavy-tailed distributions. The distribution function F of a random variable X is said to have a heavy tail if
$$1 - F(x) = x^{-\alpha} L(x), \quad x > 0, \ \alpha > 0, \qquad (1)$$
where L is slowly varying. That is, for x > 0, $\lim_{t \to \infty} L(tx)/L(t) = 1$.
Regardless of the distribution for small values of the random variable, if the asymptotic shape of the distribution is hyperbolic, it is heavy-tailed. Heavy-tailed distributions have wide applications in lifetime contexts, especially in reliability and risk modeling. We consider the estimation problem of reliability R = P(X > Y) for small and large samples, when X and Y are two independent but not identically distributed random variables belonging to the family of heavy-tailed distributions. Heavy-tailed distributions such as the Student's t and Pareto families have been used to model data with high kurtosis; see [4]. The simplest heavy-tailed distribution is the Pareto distribution, which is hyperbolic over its entire range.
The Pareto distribution is a power-law probability distribution that describes social, scientific, geophysical, actuarial and many other types of observable phenomena. The univariate Pareto distribution is a simple model for non-negative data with a power-law probability tail, at least approximately for large values of X. It is a useful model in the analysis of income data, reliability studies, risk modeling and business failure data [14]. Reference [1] gave an extensive historical survey of its uses in the context of income distribution. References [6], [16] and [19] discussed the application of the Pareto distribution in various fields. Moreover, the Pareto Type I distribution has a position of importance in the field of life testing, because it can be regarded as the distribution of the failure time of a component, as follows.
Suppose the lifetime of a particular component has an exponential distribution with failure rate λ and threshold parameter (guarantee time) c. In a population of components, there could be variation in the values of λ because of small fluctuations in manufacturing tolerances, so the component can be regarded as one belonging to a random subpopulation (see [15]). The effect of the changes in λ is accommodated by treating λ as a random variable following a gamma distribution with scale parameter k and shape parameter α.
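The mixing argument can be written out explicitly. The derivation below is a sketch under an assumed parametrisation that the text does not spell out: the conditional density is taken as $f(x \mid \lambda) = \lambda e^{-\lambda(x-c)}$ for $x > c$, and k enters the gamma density of λ through $e^{-k\lambda}$.

```latex
% Assumed parametrisation (not stated explicitly in the text):
%   f(x | \lambda) = \lambda e^{-\lambda (x - c)},  x > c
%   g(\lambda)     = k^{\alpha} \lambda^{\alpha - 1} e^{-k \lambda} / \Gamma(\alpha),  \lambda > 0
\begin{align*}
f(x) &= \int_0^\infty \lambda e^{-\lambda (x - c)}
        \,\frac{k^{\alpha}}{\Gamma(\alpha)}\, \lambda^{\alpha - 1} e^{-k\lambda}\, d\lambda \\
     &= \frac{k^{\alpha}}{\Gamma(\alpha)} \int_0^\infty \lambda^{\alpha} e^{-\lambda (k + x - c)}\, d\lambda \\
     &= \frac{k^{\alpha}}{\Gamma(\alpha)} \cdot \frac{\Gamma(\alpha + 1)}{(k + x - c)^{\alpha + 1}}
      = \frac{\alpha k^{\alpha}}{(k + x - c)^{\alpha + 1}}, \qquad x > c.
\end{align*}
% In the special case k = c this reduces to \alpha c^{\alpha} / x^{\alpha + 1}, x > c.
```

Under these assumptions the mixture has a power-law tail with index α, and the special case k = c has the Pareto Type I form.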
Then the failure time X of a component selected at random from such a mixed population has a Pareto distribution [12].
A stress-strength model describes the life of a component which has a random strength (X) and is subject to a random stress (Y). The component fails at the instant that the stress applied to it exceeds the strength, and it functions satisfactorily as long as X > Y. The reliability of a component (R) during a given period is taken to be the probability that the strength exceeds the stress, P(X > Y), during the interval.
The problem of estimating R when X and Y are independent exponential random variables is discussed in [18], [13] and [3]. In [2], the author studied the case when X and Y are independent Burr random variables of Type XII. For further studies see [17], [7], [9], [8] and [5].
Because of the real-life applications of the Pareto distribution, and because most of the real heavy-tailed data considered recently are Pareto-tailed, the estimation of R and the related inference are important. In this study we therefore estimate the probability R = P(X > Y), where X and Y are two independent but not identically distributed random variables following Pareto type I distributions with parameters $\alpha_1$ and $\alpha_2$. This paper is organized as follows. Section 2 discusses the Pareto type I distribution. Section 3 focuses on the estimation of the tail index using the harmonic moment estimation method. The estimation of the reliability function of the Pareto type I distribution using this tail index estimator, together with its performance, is studied in Section 4. We obtain the sampling distributions of the harmonic moment estimator of the tail index and of the reliability R in Section 5. In Section 6 we obtain the asymptotic confidence interval of R and study its performance.

Pareto Type-I Distribution
The simplest heavy-tailed distribution is the Pareto distribution, which is hyperbolic over its entire range. Its probability density function is
$$f(x) = \frac{\alpha c^{\alpha}}{x^{\alpha+1}}, \quad x \ge c > 0, \ \alpha > 0, \qquad (2)$$
and its cumulative distribution function is given by
$$F(x) = 1 - \left(\frac{c}{x}\right)^{\alpha}, \quad x \ge c,$$
where α is the shape parameter (the tail index) and c is the smallest value that the random variable can take. As α decreases, an arbitrarily large portion of the probability mass may be present in the tail of the distribution.
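As an illustration of the Pareto type I model, the following is a minimal sketch of inverse-CDF sampling using only the Python standard library. The function names `pareto_cdf` and `sample_pareto` are hypothetical helpers introduced here; they are not part of the original study, which used SAS.

```python
import random


def pareto_cdf(x, alpha, c=1.0):
    """CDF of Pareto type I: F(x) = 1 - (c / x)^alpha for x >= c, and 0 below c."""
    if x < c:
        return 0.0
    return 1.0 - (c / x) ** alpha


def sample_pareto(n, alpha, c=1.0, rng=None):
    """Draw n values by inverting the CDF: X = c * U^(-1/alpha), U uniform on (0, 1]."""
    rng = rng or random.Random()
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1] and is never zero.
    return [c * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


xs = sample_pareto(20000, alpha=1.5, c=2.0, rng=random.Random(0))
empirical = sum(x <= 4.0 for x in xs) / len(xs)
print(empirical, pareto_cdf(4.0, 1.5, c=2.0))
```

The empirical fraction of draws below a point should be close to the value of the CDF at that point, which gives a quick sanity check on the sampler.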

Harmonic Moment Estimator
To obtain the estimate of R, we have to estimate $\alpha_1$ and $\alpha_2$. For this we use the harmonic moment estimator.
The harmonic moment estimator was introduced in [10] for estimating the tail index of distributions in the maximum domain of attraction of the Fréchet distribution, with F(x) as in (1). Some distributions satisfying (1) are the Burr, Fréchet, F, inverse gamma, log-gamma and Pareto distributions. For more details see [10]. This tail index estimator is important because one uses only a small number of the largest order statistics when estimating the tail index.
For the Pareto type I distribution with probability density function (2), the estimate $\hat{\alpha}$ of α is obtained under the assumption that there is an exact Pareto tail beyond some high threshold u, while the form of F(x) is not specified for x < u. The harmonic moment estimator of α is then computed from the upper order statistics (in descending order) above a random threshold; its explicit form is given in [10]. Moreover, it is a robust estimator and satisfies certain very important properties. For details see [10].
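To make the idea of a harmonic-moment-type tail estimator concrete, here is a hedged sketch. The formula in `harmonic_moment_alpha` is an assumption, not a transcription of the estimator in [10]: it is a moment estimator built on the identity $E[(u/X)^{1/\theta} \mid X > u] = \alpha\theta/(\alpha\theta + 1)$, which holds for an exact Pareto tail above u. The function name and the parameters `k` and `theta` are illustrative.

```python
import random


def harmonic_moment_alpha(sample, k, theta):
    """Tail-index estimate from the k largest observations.

    ASSUMED form (not copied from [10]): with threshold u = X_(k+1) and
    M = mean of (u / X_(i))^(1/theta) over the k largest values, an exact
    Pareto tail gives E[M] = alpha*theta / (alpha*theta + 1), hence
    alpha = M / (theta * (1 - M)).
    """
    xs = sorted(sample, reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < len(sample)")
    u = xs[k]  # (k+1)-th largest value, used as the random threshold
    m = sum((u / x) ** (1.0 / theta) for x in xs[:k]) / k
    return m / (theta * (1.0 - m))


# Illustrative check on simulated Pareto data with true alpha = 1.5 and c = 1.
rng = random.Random(1)
sample = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(5000)]
print(harmonic_moment_alpha(sample, k=1000, theta=2.0))
```

Each term $(u/X_{(i)})^{1/\theta}$ lies in (0, 1], so a single extreme observation has a bounded effect on M, which is the intuition behind the robustness of estimators of this kind.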

Estimation of R
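Let X and Y be independent random variables following Pareto type I distributions with shape parameters $\alpha_1$ and $\alpha_2$ respectively and a common scale parameter c. The probability density functions of X and Y are given by
$$f(x) = \frac{\alpha_1 c^{\alpha_1}}{x^{\alpha_1+1}}, \quad x \ge c, \qquad (5)$$
and
$$g(y) = \frac{\alpha_2 c^{\alpha_2}}{y^{\alpha_2+1}}, \quad y \ge c. \qquad (6)$$
Using equations (5) and (6), the function R will be
$$R = P(X > Y) = \int_c^{\infty} \left(\frac{c}{y}\right)^{\alpha_1} g(y)\, dy = \frac{\alpha_2}{\alpha_1 + \alpha_2}. \qquad (7)$$
Clearly R depends only on $\alpha_1$ and $\alpha_2$.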
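The closed form $R = \alpha_2/(\alpha_1 + \alpha_2)$, which holds for two independent Pareto type I variables with a common scale, can be checked by a short Monte Carlo experiment. This is a sketch with illustrative names (`reliability`, `simulate_reliability`) and an arbitrary seed.

```python
import random


def reliability(alpha1, alpha2):
    """Closed form R = P(X > Y) for Pareto type I variables with a common scale."""
    return alpha2 / (alpha1 + alpha2)


def simulate_reliability(alpha1, alpha2, c=1.0, n_pairs=100_000, seed=0):
    """Monte Carlo estimate of P(X > Y) from independent Pareto draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_pairs):
        x = c * (1.0 - rng.random()) ** (-1.0 / alpha1)
        y = c * (1.0 - rng.random()) ** (-1.0 / alpha2)
        hits += x > y
    return hits / n_pairs


print(reliability(0.8, 1.5), simulate_reliability(0.8, 1.5))
```

The smaller the shape parameter of X relative to that of Y, the heavier the tail of X and the larger R becomes.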
We have conducted a simulation study to estimate R for the Pareto type I distribution using simulated observations from the distributions with $\alpha_1 = 0.8$ and $1.6$, and $\alpha_2 = 1.5$.
The following steps are used to obtain the numerical results.
Step I Generate 1000 random samples from the Pareto distribution with parameter $\alpha_1$.
Step II Using the harmonic moment estimator, obtain 1000 estimates $\hat{\alpha}_1$.
Step III Similarly, generate 1000 random samples from the Pareto distribution with parameter $\alpha_2$ and obtain $\hat{\alpha}_2$.
Step IV Using equation (7), obtain the estimate of R from the harmonic moment estimates of $\alpha_1$ and $\alpha_2$.
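The steps above can be sketched as a small simulation loop. The tail-index estimator is passed in as a function, and, as a clearly labelled stand-in, the example uses the maximum-likelihood estimator for a Pareto sample with known scale c = 1 rather than the harmonic moment estimator of [10]. It also assumes the plug-in form $\hat{R} = \hat{\alpha}_2/(\hat{\alpha}_1 + \hat{\alpha}_2)$. All function names are hypothetical.

```python
import math
import random


def sample_pareto(n, alpha, rng, c=1.0):
    """Inverse-CDF draws from Pareto type I with shape alpha and scale c."""
    return [c * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def mle_alpha(sample, c=1.0):
    """Stand-in estimator (NOT the harmonic moment estimator): Pareto MLE with known c."""
    return len(sample) / sum(math.log(x / c) for x in sample)


def simulate_r_estimates(alpha1, alpha2, n, reps=1000, estimator=mle_alpha, seed=0):
    """Return (average bias, average MSE) of the plug-in estimate of R over reps samples."""
    rng = random.Random(seed)
    true_r = alpha2 / (alpha1 + alpha2)
    estimates = []
    for _ in range(reps):
        a1 = estimator(sample_pareto(n, alpha1, rng))  # Steps I and II
        a2 = estimator(sample_pareto(n, alpha2, rng))  # Step III
        estimates.append(a2 / (a1 + a2))               # Step IV
    bias = sum(estimates) / reps - true_r
    mse = sum((r - true_r) ** 2 for r in estimates) / reps
    return bias, mse


print(simulate_r_estimates(0.8, 1.5, n=20))
```

Any other tail-index estimator with the same call signature can be supplied through the `estimator` argument to compare average bias and MSE.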

Performance Study
An extensive numerical investigation is carried out to study the performance of the estimator, in terms of average bias and average mean squared error (MSE), using simulated random samples from the Pareto Type I distribution.
The results are given in Tables 1 and 2. The absolute bias of the harmonic moment estimate is very small. The MSE is symmetric with respect to $\alpha_1$ and $\alpha_2$. Although we considered different choices of the parameter $\alpha_1$, there is no appreciable difference in the average bias or the average MSE.

Remark:
Given $X_{(k+1)} \ge u$, the estimator depends on the choice of the tuning parameter θ. Henry [10] compared the harmonic moment estimator with the well-known Hill estimator [11] and showed that the harmonic moment estimator outperforms the Hill estimator for estimating the tail index of a distribution in small-sample situations, in terms of bias, variance and mean squared error. The harmonic moment estimator also works well in terms of efficiency for large θ. Specifically, the asymptotic relative efficiency of $\mathrm{Hill}_{k,n}$ to $\hat{\alpha}_{k,n}(\theta)$ is
$$\frac{\mathrm{AVar}(\mathrm{Hill}_{k,n})}{\mathrm{AVar}(\hat{\alpha}_{k,n}(\theta))} = \frac{\theta\alpha(\alpha\theta + 2)}{(\alpha\theta + 1)^2} = 1 - (\alpha\theta + 1)^{-2},$$
which exceeds $1 - \epsilon$ whenever $\theta > (\sqrt{1/\epsilon} - 1)/\alpha$.
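The efficiency expression in the Remark is easy to evaluate numerically. The sketch below computes the ratio $1 - (\alpha\theta + 1)^{-2}$ and the smallest θ for which the efficiency loss is at most a chosen ε. The function names are illustrative.

```python
import math


def relative_efficiency(alpha, theta):
    """AVar(Hill) / AVar(harmonic moment estimator) = 1 - (alpha*theta + 1)^(-2)."""
    return 1.0 - (alpha * theta + 1.0) ** -2


def min_theta(alpha, eps):
    """Boundary value of theta above which the efficiency exceeds 1 - eps."""
    return (math.sqrt(1.0 / eps) - 1.0) / alpha


alpha = 1.5
print(relative_efficiency(alpha, 2.0))                      # 1 - 1/16
print(min_theta(alpha, 0.05))                               # theta for at most 5% loss
print(relative_efficiency(alpha, min_theta(alpha, 0.05)))   # approximately 0.95
```

Larger values of θ move the efficiency towards that of the Hill estimator, so the choice of θ trades efficiency against robustness.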
Moreover, the harmonic moment estimator is a robust estimator. For the Hill estimator, the asymptotic breakdown point (B.P.) is zero, whereas for the harmonic moment estimator the asymptotic B.P. lies in $(0, \tfrac{1}{2}]$, depending on the true value of α and the choice of the tuning parameter θ.

Sampling Distribution of Estimators
In this section we obtain the sampling distributions of the harmonic moment estimators of the parameters of the Pareto type I distribution and of the reliability R. We apply Pearson's technique to the two estimators to obtain their sampling distributions. Knowing the distribution, we can study the properties of the estimators.
Pearson's criterion for fixing the distribution family in a particular case is
$$K = \frac{\beta_1(\beta_2 + 3)^2}{4(4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6)},$$
where $\beta_1$ and $\beta_2$ are the measures of skewness and kurtosis respectively. According to the value of K, there are seven types of Pearson curves:
Type I when K < 0
Type II when K = 0, $\beta_1 = 0$ and $\beta_2 < 3$
Type III when $2\beta_2 - 3\beta_1 - 6 = 0$ and K = ∞
Type IV when 0 < K < 1
Type V when K = 1
Type VI when K > 1
Type VII when K = 0, $\beta_1 = 0$ and $\beta_2 > 3$
The steps to be followed for obtaining the sampling distribution of the estimators are:
Step 1: Generate 1000 random samples $X_1, X_2, \ldots, X_n$ from the Pareto type I distribution with sample sizes n = 15, 20, 30 and shape parameter $\alpha_1$ = 0.5 and 1.6. Using these data, obtain 1000 harmonic moment estimates of $\alpha_1$. Similarly, obtain the harmonic moment estimates for the Pareto type I distribution with parameter $\alpha_2$. Then, using (7), obtain 1000 estimates of R.
Step 2: Calculate Pearson's coefficient K and then determine the type of Pearson curve.
Table 3 shows the sampling distributions of the estimators of the parameter $\alpha_1$ and of the reliability R. Since the harmonic moment estimators perform best for small samples, we conduct the numerical study for sample sizes n = 15, 20 and 30. The computations were carried out using the SAS package.
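Pearson's criterion can be computed directly from sample moments. The sketch below assumes the usual definitions $\beta_1 = \mu_3^2/\mu_2^3$ and $\beta_2 = \mu_4/\mu_2^2$ in terms of central moments, and the standard form $K = \beta_1(\beta_2+3)^2 / [4(4\beta_2-3\beta_1)(2\beta_2-3\beta_1-6)]$. The tolerance `tol` used to decide the boundary cases is an illustrative choice, and the original computations were done in SAS, not with this code.

```python
import math


def sample_betas(data):
    """Return (beta1, beta2) = (mu3^2 / mu2^3, mu4 / mu2^2) from central sample moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 ** 2 / m2 ** 3, m4 / m2 ** 2


def pearson_kappa(beta1, beta2):
    """Pearson's criterion K; infinite when the denominator vanishes with beta1 > 0."""
    if beta1 == 0.0:
        return 0.0
    denom = 4.0 * (4.0 * beta2 - 3.0 * beta1) * (2.0 * beta2 - 3.0 * beta1 - 6.0)
    if denom == 0.0:
        return math.inf
    return beta1 * (beta2 + 3.0) ** 2 / denom


def pearson_type(beta1, beta2, tol=1e-9):
    """Classify into Pearson types I to VII following the rules listed in the text."""
    if abs(beta1) <= tol:
        if abs(beta2 - 3.0) <= tol:
            return "normal"
        return "II" if beta2 < 3.0 else "VII"
    if abs(2.0 * beta2 - 3.0 * beta1 - 6.0) <= tol:
        return "III"
    k = pearson_kappa(beta1, beta2)
    if k < 0.0:
        return "I"
    if abs(k - 1.0) <= tol:
        return "V"
    return "IV" if k < 1.0 else "VI"


print(pearson_type(*sample_betas([1.0, 2.0, 3.0, 4.0, 5.0])))
```

In a simulation study, `sample_betas` would be applied to the 1000 simulated estimates of $\alpha_1$ or of R, and `pearson_type` then identifies the family of the sampling distribution.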
From Table 3, it can be noted that the sampling distributions of the harmonic moment estimators of the shape parameter (α) are of types I and IV both when $\alpha_1 < \alpha_2$ and when $\alpha_1 > \alpha_2$, whereas the sampling distributions of the harmonic moment estimates of the reliability function R are of types I and IV when $\alpha_1 < \alpha_2$ and of type I only when $\alpha_1 > \alpha_2$. Each type in Pearson's system has a specific probability density function. The density function of the type I Pearson curve has parameters $a_1$, $a_2$, $L_1$ and $L_2$ and a constant k; type I is the beta distribution.
The density function of the type IV Pearson curve has parameters a, λ and ν and a constant k.

Asymptotic Confidence Interval of R
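The point estimator of R is obtained using the harmonic moment estimates of $\alpha_1$ and $\alpha_2$ as $\hat{R}_{Har} = R(\hat{\alpha}_1, \hat{\alpha}_2)$.
Then, using the delta method with $R(\alpha_1, \alpha_2) = \alpha_2/(\alpha_1 + \alpha_2)$,
$$\frac{\partial R}{\partial \alpha_1} = -\frac{\alpha_2}{(\alpha_1 + \alpha_2)^2} \quad \text{and} \quad \frac{\partial R}{\partial \alpha_2} = \frac{\alpha_1}{(\alpha_1 + \alpha_2)^2},$$
and as $m \to \infty$, $n \to \infty$, we have
$$\frac{\hat{R}_{Har} - R}{\sqrt{\mathrm{Var}(\hat{R}_{Har})}} \xrightarrow{d} N(0, 1), \qquad \mathrm{Var}(\hat{R}_{Har}) \approx \left(\frac{\partial R}{\partial \alpha_1}\right)^2 \mathrm{Var}(\hat{\alpha}_1) + \left(\frac{\partial R}{\partial \alpha_2}\right)^2 \mathrm{Var}(\hat{\alpha}_2).$$
Then the asymptotic 95% confidence interval for R is given by
$$\hat{R}_{Har} \pm 1.96 \sqrt{\widehat{\mathrm{Var}}(\hat{R}_{Har})},$$
and the average length of the asymptotic 95% confidence interval is the average of $2 \times 1.96 \sqrt{\widehat{\mathrm{Var}}(\hat{R}_{Har})}$ over the simulated samples.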
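A delta-method interval of this kind can be sketched as follows, assuming $R(\alpha_1, \alpha_2) = \alpha_2/(\alpha_1 + \alpha_2)$ and independence of the two estimators. The variance estimates `var1` and `var2` of $\hat{\alpha}_1$ and $\hat{\alpha}_2$ are left as inputs, because the exact variance expression used in the paper is not reproduced here. The function name `delta_ci` is hypothetical.

```python
import math


def delta_ci(a1_hat, a2_hat, var1, var2, z=1.96):
    """Asymptotic confidence interval for R = alpha2 / (alpha1 + alpha2).

    var1 and var2 are estimated variances of a1_hat and a2_hat, supplied by the
    caller (an assumption of this sketch). Independence of the two estimators is
    assumed, so no covariance term appears.
    """
    s = a1_hat + a2_hat
    d1 = -a2_hat / s ** 2  # partial derivative of R with respect to alpha1
    d2 = a1_hat / s ** 2   # partial derivative of R with respect to alpha2
    se = math.sqrt(d1 ** 2 * var1 + d2 ** 2 * var2)
    r_hat = a2_hat / s
    return r_hat - z * se, r_hat + z * se


lower, upper = delta_ci(1.0, 1.0, 0.04, 0.04)
print(lower, upper, upper - lower)
```

The interval is not truncated to [0, 1]; for estimates of R close to 0 or 1 a transformation such as the logit may behave better, although that goes beyond what the text describes.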

Performance Study
We now study the performance of the asymptotic confidence interval with respect to the average confidence length and the coverage probability. The results are given in Tables 4 and 5 respectively.
From Tables 4 and 5, it is clear that the average confidence length is reasonably small and the coverage probability is close to the nominal value for the asymptotic confidence interval. The asymptotic confidence interval can therefore be used as a confidence interval for R.
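Coverage probability and average width can be estimated with a simulation of the following kind. This is a sketch only: it swaps in the Pareto maximum-likelihood estimator with known scale c = 1 (asymptotic variance $\alpha^2/n$) as a stand-in for the harmonic moment estimator, assumes $R = \alpha_2/(\alpha_1 + \alpha_2)$, and uses arbitrary illustrative settings, so its output is not comparable with Tables 4 and 5.

```python
import math
import random


def mle_alpha(sample):
    """Stand-in estimator (NOT the harmonic moment estimator): Pareto MLE with c = 1."""
    return len(sample) / sum(math.log(x) for x in sample)


def coverage_and_width(alpha1, alpha2, m, n, reps=2000, z=1.96, seed=0):
    """Estimate coverage probability and average width of the delta-method interval."""
    rng = random.Random(seed)
    true_r = alpha2 / (alpha1 + alpha2)
    covered = 0
    total_width = 0.0
    for _ in range(reps):
        xs = [(1.0 - rng.random()) ** (-1.0 / alpha1) for _ in range(m)]
        ys = [(1.0 - rng.random()) ** (-1.0 / alpha2) for _ in range(n)]
        a1, a2 = mle_alpha(xs), mle_alpha(ys)
        s = a1 + a2
        r_hat = a2 / s
        # Delta method with Var(a1) ~ a1^2 / m and Var(a2) ~ a2^2 / n (MLE asymptotics).
        var_r = (a2 / s ** 2) ** 2 * a1 ** 2 / m + (a1 / s ** 2) ** 2 * a2 ** 2 / n
        half = z * math.sqrt(var_r)
        covered += (r_hat - half) <= true_r <= (r_hat + half)
        total_width += 2.0 * half
    return covered / reps, total_width / reps


print(coverage_and_width(0.8, 1.5, m=30, n=30))
```

Replacing `mle_alpha` and the two variance terms with those of another estimator gives the corresponding coverage study for that estimator.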

Conclusion
We have estimated R = P(X > Y) for two independent Pareto type I distributions with shape parameters $\alpha_1$ and $\alpha_2$. This study can be used in two ways: firstly, for comparing two distributions with a common base distribution (for example, in medical studies, for studying the effectiveness of drugs), and secondly, as a measure of reliability or for conducting stress-strength analysis. When the Pareto type I distribution is used as a failure time distribution, R can be used as a measure of reliability.
In practical situations we often have to deal with small samples. For small samples, the harmonic moment estimator maintains satisfactorily high performance over a specified range of departures from the ideal model and is not much less efficient than the M.L.E. and the Hill estimator. The simulation study also suggests the effectiveness of the harmonic moment estimate of R for small samples. Moreover, it is a robust estimator, but it does not perform as well for large samples. We also found the sampling distributions of both the tail index estimator and the estimator of R, which help us to study the properties of the estimators. We obtained the asymptotic confidence interval, which performs well with respect to the average length of the confidence interval and the coverage probability. Further, this study can be extended to the general class of distributions with a Pareto tail having index α, for which the MLE is not a robust estimator.