B-spline Estimation for Force of Mortality

The paper focuses on the estimation of the force of mortality of living time distribution. We use a third-order B-spline function to construct the logarithm for force of mortality of living time. The number of the knots, their locations and B-spline coefﬁcients based on a sample of observations are estimated by the maximum likelihood estimation method. Evaluation of B-spline parameters estimated by maximum likelihood estimation tested with criteria of the modiﬁed chi-squared goodness of the ﬁt statistic. An algorithm developed to calculate Sequential Procedure for the modiﬁed chi-squared goodness of the ﬁt testing. The Matlab code was written using the algorithm. Within this evaluation, the number of knots in the model has signiﬁcantly reduced. The developed method was used to explain the mortality rate of women aged 0 to 69 among the Mongolian population in 2019 and estimate the life expectancy of Mongolians. The results of this experiment provided an excellent estimation of the force of mortality. Construction of a mortality rate estimation gives possibilities to determine mortality trends and force of mortality. Here, force of mortality is further used to construct a survival function, a lifetime distribution function, and a lifetime distribution probability density function. The method can also be used in ﬁnancial market models and in models that estimate the useful life of equipment.


Introduction
In any nation, setting up the force of mortality and the living time distribution are crucially important in actuarial science, life insurance, health, and demographic surveillance system. Thus, estimation of the force of mortality has become the most important issues among the researchers.
De Moivre (1729) [1] applied a survival model in actuarial science. He estimated that the function of the force of mortality is λ(x) = 1 w−x , where survival function is s(x) = 1 − x w when 0 ≤ x < w. Since then other scientists have made important contributions. Gompertz (1825) [2] made an attempt to model the force of mortality. He proposed the force of mortality as with parameters B > 0, c > 1, x ≥ 0. In this case, we can find the survival function for the Gompertz distribution: In 1860, Makeham [3] extended Gompertz' equation by adding a constant, A > 0: Accointing to (3), the survival function can be found: when B > 0, A ≥ −B, c > 1, x ≥ 0. The Weibull [4] model has been applied in a mortality context, though it was developed for the failure of technical systems due to wear and tear. He suggested the force of mortality as with the survival function: A three-component, competing-risk mortality model, developed for animal survival data, has been proposed by Siler [5]. This model aims at portraying the whole of the age range with five parameters. On the other hand, Anson [6] proposed a fifth degree polynomial to represent the hazard rate for humans. A comprehensive review of the models for human population over ages has been provided by Gavrilov and Gavrilova [7]. The survival function and the force of mortality were also proposed by [8]. Here, the structure of the survival model and the force of mortality are defined as: and respectively. Here, cross validity prediction power, (n is the number of observations, k is the number of predictors in the model, R is the correlation between observed and predicted values of the dependent variable), was applied for the testing of the stability of the fitted model. The method for calculating the (n-1)-th order of B-Spline, number of knots, k, knots location, t k,n , and the regression coefficients θ are proposed in [9]. Here, the linear space of all n-th order spline functions defined on a set of non- denoted by S t k,n , where t n = a, t n+k+1 = b. They used splines with simple knots, except for the n left and right most knots which were assumed to be coalescent, i.e. t k,n = {t 1 = . . . = t n < t n+1 < . . . < t n+k < t n+k+1 = . . . = t 2n+k }.
A spline regression function f ∈ S t k,n , can be expressed to be where θ = (θ 1 , . . . , θ p ) is a vector of real valued regression coefficients and N n (x) = (N 1,n (x), . . . , N p,n (x)) , p = n + k are the Bsplines of order n, defined on t k,n .
It is well known that j i=j−n+1 N i,n (t) = 1, for any t ∈ [t j , t j+1 ), j = n, . . . , n + k and N i,n (t) = 0 for t / ∈ [t i , t i+n ] respectively. Vladimir K. Kaishev and etc [9] defined 5% as relative error of the estimation of θ parameter that has the least squared sum of a distance between the empirical distribution of life time and the distribution with force of mortality. This estimation defined the approximate value of the experimental distribution. Our main goal is to evaluate the theoretical distribution. The asymptotic distribution needs to be determined in order to verify that this assessment is consistent with the theoretical distribution. Since this asymptotic distribution has not defined in the study [9], we used maximum likelihood estimation-θ, for which the asymptotic distribution determined. Therefore, we used modified chi-square goodness of fit test on the hypothesis of whether the theoretical distribution was included in the family distribution with λ(x, θ) force of mortality. The logarithmic assessment of the lifetime of the human life of the total population of Mongolia developed by O. Tserenbat and etc [10]. In this assessment, they have used the data of population deaths in 2003-2008, with use of the method described in [9] for the evaluation of the B-spline function parameters. The results from the assessment lead to the obtaining the quadratic spline fit with k = 20 knots for both total population. Khaoula Aidi, Sanki Dey and Azeem Ali developed in a new bounded distribution from the exponentiated Weibull distribution by transformation of the type x = T /(1 + T ), where T has the exponentiated Weibull distribution [11]. They obtained a new distribution with support on (0, 1), which call it bounded exponentiated Weibull (BEW) distribution.
This distribution was capable of modelling decreasing and bathtub shaped hazard rate. They also obtained maximum likelihood estimators for unknown parameters of the model based on right-censored data.
They used modified chi-squared statistic developed by Bagdonavicius and Nikulin(2011) for some parametric accelerated failure times models [12]. It is well known that, if the sample size is large, then the evaluation of the distribution parameters by MLE is preferable. An example of this is presented in the recent work of Siti Aisyah Zakaria and etc [13], where they used the MLE in the evaluation the distribution parameters.
In this paper, we evaluate the logarithm of the force of mortality in the form of the third order B-spline, where its parameters have been estimated by the Maximum Likelihood Estimation. We examine the fit of the obtained distribution with the empirical distribution by using chi-square goodness of fit statistics in [14,15,18]. Here, 3rd order of the spline,with k = 4 knots were defined to be in good fit statistics.

Living time distribution and its empirical estimation
Let's denote by X is the living time of people. Then, by assuming X ≥ 0 to be continuous random variable, we define distribution function F X (x) as and Here, the function is called the survival functions of the random variable X. From the definition of conditional probability, for X ≥ x In this case, the conditional probability death of person with age x within the times interval x + ∆x, for given X ≥ x is: Here, f X (x) = F X (x) the density function of living time distribution. The function is called the force of mortality function. Survival function is expressed as a function of the force of mortality and can be written in the following Thus, the cumulative distribution and the density functions are expressed via the force of mortality as: and Let n x denotes the number of death of people aged in the interval x − 1, x. x = 1, 2, 3, · · · , 100. and ν 101 the total number of death of people above age N = 101 x=1 n x . Then empirical estimation for the function of living time distribution and empirical estimation for the force of mortality are defined aŝ respectively.
3 B-spline estimation for the logarithmic force of mortality by the MLE method

B-spline
Consider a set of B-spline functions of the nth order defined on a set of knots g(x, θ, c k,n ) ∈ S k B-splines are defined on the set of knots through recurence relation (T is the symbol transposed in the matrix) Basis B-splines are defined on set of knots c k,n through the Mansfield-De Boor-Cox recurrence relation from which it can be seen that

Maximum Likelihood Estimation
Suppose that a sample X 1 , X 2 , . . . , X N of random variables chosen according to of family of distribution F (x, θ). Here, θ = (θ 1 , θ 2 , . . . , θ s ) ∈ Θ ⊂ R s , θ, R s and Θ are s-dimensional unknown parameter, s-dimensional the space of real vectors and s-dimensional open set respectively. Let's denote by θ 0 the true value of θ. Then, the principle of maximum likelihood yields a choice of the estimator θ as the value for the parameter that makes the observed data most probable.
The likelihood function is the density function regarded as a function of θ, The maximum likelihood estimator (MLE) is the value of θ defined to be If behavior at θ = θ 0 , the density probability function f (x, θ) satisfies a regularity condition, where √ N ( θ − θ 0 ) vector statistics has the zero expectation and an asymptotic multidimensional normal distribution with a covariance matrix I −1 (θ 0 ) (N → ∞). Here, I(θ) is the Fisher information matrix and written as where E(·)− the expectation operator. When s = 1, for any consistent estimator θ * = θ * (X 1 , X 2 , . . . , X N ) of θ parameter or maximum likelihood estimation θ is an efficient estimator of the limit. Also, when s > 1 The above matrix is a nonnegative definite matrix or an asymptotic efficient estimator in the Cramer-Rao sense. Since we deal with population data, we used MLE that is appropriate for the large size sample.

Estimation of the force of mortality
An additional condition λ(x) ≥ 0 is required to evaluate the theoretical force of mortality λ(x). However, no additional condition is required to evaluate the logarithm of the force of mortality function by B-spline.
Now, by approximating the force of mortality as the probability density function is expressed by B-spline: For a fixed order of the B-spline n, given a sample of observa- intervals, n i is the frequency of x i ), we estimate the number of knots k, their locations c k,n and the value of θ parameters by the maximum likelihood estimator. Then, the logarithm likelihood function is defined as: The value of θ and c k,n that maximizes the function (33) will be solutions to following optimization problem.
T is the symbol transposed in the matrix.
If null hypothesis is true value, then next limit theorem is valid, Here, H k−1 (x) is a chi-square distribution with (k−1) degrees freedom. Based on this limit theorem, the χ 2 1N (θ 0 ) statistics is used as testing statistics whether the H 0 hypothesis is accepted. However, since θ 0 is unknown, χ 2 1N (θ) statistic cannot be used as testing statistic whether H hypothesis is accepted. Instead, let's assume that χ 2 1N ( θ) statistics can be used. Here, θ = θ(X 1 , X 2 , . . . , X N ) is maximum likelihood estimation.
Evaluation of B-spline parameters θ estimated by MLE was tested with criteria of the modified chi-squared goodness of the fit statistic. In order to test the composite hypothesis we need to divide the real numbers into k disjoint intervals [0, b].
In present work, the interval [0, b] is divided into subintervals by using the above mentioned method, where calculation of the modified chi-square testing value is performed using (37-45). Then the modified χ 2 goodness of fit test is derived as: Here, χ 2 N,k statistics in the limit at N → ∞ has a chi-square distribution with k − 1 degree of freedom. Theorem: At n th order B-spline S k is: for (53) If θ, c n,k denotes the Maximum Likelihood Estimation of the parameter θ, c n,k based on the sample, then the modified statistic χ 2 N,k ( θ, c k,n ) is defined as (51). If the hypothesis H k is true, then as the sample size N → ∞, the distribution function of χ 2 N,k converges to H k−1 (x). i.e is wriiten as: The logarithm likelihood function is defined in the following.
3. When the knot has two internal knot points i.e k = 2, the 3rd order B-spline functions set in the segment [a, b] is written as: Then the logarithm likelihood function becomes Here, θ = {θ 1 , θ 2 , θ 3 , θ 4 , θ 5 } and c 1 , c 2 parameter estimation denoted by θ = { θ 1 , θ 2 , θ 3 , θ 4 , θ 5 }, c 1 , c 2 . In this case, we write the H 2 and alternative hypotheses as: We accept the hypothesis H 2 when χ 2 k, 2 < χ 2 k−1,α or K 1 when χ 2 k, 2 > χ 2 k−1,1−α and move to the next step. Otherwise, by accepting the K 2 hypothesis and moving to the next step we move to the rth step The ln λ(x) valuation subset of the third-order B-spline function H r hypothesis: alternatives hypothesis: We accept and stopped this sequential procedure the hypothesis H r when χ 2 k, r < H k−1,α .

Numerical experiments
The MATLAB program was used to compute the death rate data of women aged 0-69 in the Mongolian population of 2019. The modified test: χ 2 k, r = 4.6527 · 10 3 , H k−1,α = 1.6303 · 10 6 . Since χ 2 k,4 < H k−1,0.95 is satisfied when k = 4, force of mortality becomes The figure 1 the graphs of the force of mortality, with these knot points.

Conclusion
In this study, we constructed the approximate function living time distribution and the force of mortality. In the current context, we estimate the logarithm of the force of mortality in the form of the third order B-spline.
Estimation parameters have been evaluated by the maximum likelihood estimation. The fit of the obtained distribution with the empirical distribution was examined using sequential procedure for the modified χ 2 goodness of fit statistics.
We performed a numerical experiment to construct a logarithmic estimation of life expectancy at the 2019 population statistics of Mongolia using a 3rd order B-spline function. As a result of numerical experiments, the parameters of the B-spline function were found and tested with the modified chi-square testing.
This algorithm can be used further to determine the depreciation period of financial markets and machinery. B-spline estimation is shown to be the an optimal model with low variance for the large amount of data. This method drastically reduces the number of knots in the B-spline function, making the design easier.