An Analysis about Fourier Series Estimator in Nonparametric Regression for Longitudinal Data

Fourier series are widely used in mathematics and statistics, especially for modeling. Here, a Fourier series is constructed as an estimator in nonparametric regression. Nonparametric regression is applied not only to cross-section data but also to longitudinal data. Several nonparametric regression estimators, such as kernel and spline estimators, have been developed for the longitudinal case. In this study, we concentrate on developing inference for the Fourier series estimator in nonparametric regression for longitudinal data. Nonparametric regression based on Fourier series can model relationships with fluctuating or oscillating patterns, which are represented by sine and cosine functions. For point estimation, Penalized Weighted Least Squares (PWLS) is used to determine an estimator of the parameter vector in nonparametric regression. In contrast to previous studies, PWLS is used to obtain a smooth estimator. The result is an estimator of the nonparametric regression curve for longitudinal data based on the Fourier series approach. In addition, this study investigates the asymptotic properties of the estimator, specifically linearity and consistency. Several case studies from previous research and one new case study are presented to show that the Fourier series estimator performs well in longitudinal data modeling. This study is important for developing further statistical inference, such as interval estimation and hypothesis testing, for nonparametric regression with the Fourier series estimator for longitudinal data.


Introduction
In mathematics and statistics, Fourier series are popular for modeling, especially for periodic functions, because they are flexible enough to estimate data with oscillating or seasonal patterns. In statistical methodology, Fourier series are often used in time series analysis and in regression analysis, especially nonparametric regression. In regression analysis generally, if the data pattern does not satisfy the model assumptions, or there are many insignificant parameters, one solution is to use nonparametric regression. Nonparametric regression is flexible in modeling unknown data patterns, so that the regression curve adapts to the data pattern [1].
The estimator based on the Fourier series is one of several nonparametric regression estimators that are of interest for study. The Fourier series estimator can handle data whose patterns can be approximated by trigonometric functions. The trigonometric form is expressed through sine and cosine functions, as in the usual finite Fourier series [2]. Data patterns that suit the Fourier series approach are repeating patterns. The repetition occurs in the values of the response variable across different values of the predictor variables [3]. The Fourier series in nonparametric regression is highly flexible in modeling relationships between predictors and responses that show an oscillating pattern or a combination of trend and seasonal patterns [4]. The Fourier series allows solutions to be obtained and can be implemented in many problems [5].
The first study of the Fourier series estimator in nonparametric regression is Bilodeau [6]. Among subsequent studies, Biedermann et al. [7] examined the optimal design for obtaining smoothing parameters of the nonparametric regression model using the Fourier series. Dette et al. [8] extended the study of Biedermann et al. [7] by adding some constraints. Tjahjono et al. [9] proposed a Fourier series estimator for the biresponse case and applied it to predict electricity consumption. Mardianto et al. [10] applied the Fourier series estimator to predict rice production in the provinces of Indonesia. All of these studies use cross-section data. Cross-section data are a data structure in which an object, influenced by other related objects, is observed at a single point in time. Cross-section data are often used in regression analysis.
Along with the development of data analysis, data with a more complex structure than the commonly used cross-section data are needed. Recently, much research has applied regression analysis to longitudinal data. The longitudinal data structure is more complex because it contains both cross-section and time series elements. Longitudinal data are more reliable for answering questions about the dynamics of change and can potentially provide more complete information. Another advantage of longitudinal data is that they reveal the changes that occur within a subject, because the observation is repeated for each subject. Longitudinal data also yield efficient estimates, because they draw on both repeated observations and different subjects [11].
Several nonparametric regression studies, both theoretical and applied, have used longitudinal data, such as the spline estimator by Wu and Zhang [11] and Fernandes et al. [12], and the kernel estimator by Wu and Chiang [13] and Whang [14]. However, a study of the Fourier series estimator for longitudinal data is needed to accommodate repetitive data patterns, in this case seasonal, periodic, and combined trend-seasonal patterns, because in applications many cases across fields have data patterns suited to the Fourier series estimator. In this study we use Penalized Weighted Least Squares (PWLS) optimization to determine the estimator of the parameter vector. According to Gao and Fang [15], PWLS optimization can accommodate outliers that appear in a plot of the data. The research interest of this article is the theoretical analysis of the Fourier series estimator in nonparametric regression for longitudinal data based on PWLS optimization. The analysis consists of point estimation, to obtain the estimator of the regression curve, and its asymptotic properties, namely linearity and consistency. The result is important for developing further statistical inference for nonparametric regression with the Fourier series estimator for longitudinal data.

Materials and Methods
In this analysis, nonparametric regression with the Fourier series estimator is applied to longitudinal data. This section therefore presents the longitudinal data structure and the nonparametric regression equation based on Fourier series that is suited to longitudinal data.

Longitudinal Data Structure
Longitudinal data are data with more than one observation for each subject. In longitudinal data analysis, the subjects are assumed to be mutually independent, whereas the observations within a subject are dependent and therefore correlated [11]. The main goal of longitudinal data analysis is to study how the subjects change over time. Longitudinal data consist of cross-section and time series components. The structure of the longitudinal data in this study is given in Table 1.
The data presented in Table 1 consist of $n$ subjects. Each subject is observed $n_i$ times. Consider the pairs of data $(x_{ijl}, y_{ij})$, where $i = 1, 2, ..., n$ indexes the subjects, $j = 1, 2, ..., n_i$ indexes the observations for each subject, and $l = 1, 2, ..., p$ indexes the predictors. Here, $x_{ijl}$ is the value of the $l$-th predictor for the $i$-th subject at the $j$-th observation, and $y_{ij}$ is the value of the response variable for the $i$-th subject at the $j$-th observation.

Nonparametric Regression based on Fourier series for Longitudinal Data
In this subsection we introduce the Fourier series equation, in general form and in vector form, that is used to estimate the regression curve in the nonparametric case for longitudinal data.
Definition 1 Consider longitudinal data whose data pairs are expressed as $(x_{ijl}, y_{ij})$, where $i = 1, 2, ..., n$ indexes the subjects, $j = 1, 2, ..., n_i$ indexes the observations for each subject, and $l = 1, 2, ..., p$ indexes the predictors. Here, $x_{ijl}$ is the value of the $l$-th predictor for the $i$-th subject at the $j$-th observation, and $y_{ij}$ is the value of the response variable for the $i$-th subject at the $j$-th observation. The paired data follow a nonparametric regression model for longitudinal data as follows:
$$y_{ij} = \sum_{l=1}^{p} g_{il}(x_{ijl}) + \varepsilon_{ij}, \tag{1}$$
where $g_{il}(x_{ijl})$ is the nonparametric regression function for the $l$-th predictor, $i$-th subject, and $j$-th observation, and $\varepsilon_{ij}$ is a random error for the $i$-th subject and $j$-th observation that is independent and identically normally distributed with mean 0 and variance $\sigma^2$ [4].
In Definition 1, the regression curve $g_{il}(x_{ijl})$ is approximated by the complete Fourier series with cosine and sine components, so that it can be defined as follows:
Definition 2 Consider the nonparametric regression equation for longitudinal data in (1). If the regression function is approximated by a complete Fourier series with cosine and sine components, then
$$g_{il}(x_{ijl}) = \gamma_{il} x_{ijl} + \frac{\alpha_{0il}}{2} + \sum_{k=1}^{K} \left( \alpha_{kil} \cos k x_{ijl} + \beta_{kil} \sin k x_{ijl} \right), \tag{2}$$
where $\frac{\alpha_{0il}}{2}$, $\gamma_{il}$, $\alpha_{kil}$, and $\beta_{kil}$ are regression parameters whose values are obtained from the estimation process. The oscillation parameter is denoted by $k$ [4].
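By substituting (2) into (1) and expanding the summation in front of $g_{il}(x_{ijl})$ over each part, the nonparametric regression equation based on Fourier series for longitudinal data is as follows:
$$y_{ij} = \sum_{l=1}^{p} \gamma_{il} x_{ijl} + \sum_{l=1}^{p} \left[ \frac{\alpha_{0il}}{2} + \sum_{k=1}^{K} \left( \alpha_{kil} \cos k x_{ijl} + \beta_{kil} \sin k x_{ijl} \right) \right] + \varepsilon_{ij}, \tag{3}$$
where the first sum represents the trend components and the second sum represents the oscillation components. In vector form, (3) can be written as
$$\boldsymbol{y} = \boldsymbol{Z}\boldsymbol{\gamma} + \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},$$
with $\boldsymbol{y}$ the response vector, $\boldsymbol{\varepsilon}$ the error vector, $\boldsymbol{Z}\boldsymbol{\gamma}$ the trend component of the data, and $\boldsymbol{X}\boldsymbol{\beta}$ the oscillation component of the data. The vector and matrix components, obtained by stacking the per-subject blocks over $i = 1, 2, ..., n$, are defined as follows:
$\boldsymbol{y}_i = (y_{i1}, y_{i2}, ..., y_{in_i})^T$ is a vector of size $n_i \times 1$ for each subject $i = 1, 2, ..., n$ that contains the values of the response variable.
$\boldsymbol{\varepsilon}_i = (\varepsilon_{i1}, \varepsilon_{i2}, ..., \varepsilon_{in_i})^T$ is a vector of size $n_i \times 1$ for each subject $i = 1, 2, ..., n$ that contains the random errors.
$\boldsymbol{Z}_i$ is a matrix of size $n_i \times p$ for each subject $i = 1, 2, ..., n$ that contains the predictors without trigonometric operations, with $(j, l)$ entry $x_{ijl}$, and $\boldsymbol{\gamma}_i = (\gamma_{i1}, \gamma_{i2}, ..., \gamma_{ip})^T$ is a vector of size $p \times 1$ for each subject $i = 1, 2, ..., n$ that contains the trend parameters. Since the approach used is nonparametric regression, with the same predictors, the matrix $\boldsymbol{X}$ is defined as follows: $\boldsymbol{X}_i$ is a matrix of size $n_i \times p(2K + 1)$ for each subject $i = 1, 2, ..., n$ that contains the predictors with trigonometric operations, that is, for each predictor the constant term and the terms $\cos k x_{ijl}$ and $\sin k x_{ijl}$, $k = 1, 2, ..., K$. The vector $\boldsymbol{\beta} = (\boldsymbol{\beta}_1^T, \boldsymbol{\beta}_2^T, ..., \boldsymbol{\beta}_n^T)^T$ has size $np(2K + 1) \times 1$, where $\boldsymbol{\beta}_i$ is a vector of size $p(2K + 1) \times 1$ for each subject $i = 1, 2, ..., n$ that contains the oscillation parameters.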
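The construction of the trend and oscillation matrices can be sketched in code. The following is a minimal illustration, not the authors' implementation: the function names `trend_block`, `fourier_block`, and `block_diag` are hypothetical, the column ordering inside each subject block (constant, cosines, sines per predictor) is an assumption, and the per-subject blocks are assumed to be placed on the diagonal so that each subject has its own parameters.

```python
import numpy as np

def trend_block(x_i):
    """Z_i: the n_i x p matrix of raw predictors for one subject."""
    return np.asarray(x_i, dtype=float)

def fourier_block(x_i, K):
    """X_i: n_i x p(2K+1) matrix. For each predictor: a constant column
    (multiplying alpha_0, hence the value 1/2), then cos(kx) and sin(kx)
    for k = 1..K. The column ordering is an assumption of this sketch."""
    x_i = np.asarray(x_i, dtype=float)
    k = np.arange(1, K + 1)
    cols = []
    for l in range(x_i.shape[1]):
        x = x_i[:, [l]]                      # n_i x 1 column of predictor l
        cols.append(np.full_like(x, 0.5))    # constant term alpha_0 / 2
        cols.append(np.cos(x * k))           # n_i x K cosine terms
        cols.append(np.sin(x * k))           # n_i x K sine terms
    return np.hstack(cols)

def block_diag(blocks):
    """Place per-subject blocks on the diagonal of one large matrix."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out

# Simulated example: n = 2 subjects, p = 2 predictors, K = 3, n_1 = 4, n_2 = 5.
rng = np.random.default_rng(0)
subjects = [rng.uniform(0, np.pi, size=(4, 2)),
            rng.uniform(0, np.pi, size=(5, 2))]
K = 3
Z = block_diag([trend_block(x) for x in subjects])       # N x np
X = block_diag([fourier_block(x, K) for x in subjects])  # N x np(2K+1)
```

With these inputs, `Z` has $N = 9$ rows and $np = 4$ columns, and `X` has $np(2K+1) = 28$ columns, matching the sizes stated in the text.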

Point Estimation Results
In this section, the estimation of the regression curve based on PWLS and the estimation of the covariance matrix are discussed mathematically. Several related theorems, formulated by the authors, are given as the results for point estimation.

Curve Regression Estimation based on PWLS
In point estimation, the PWLS optimization method is used to obtain the regression parameter estimator and the regression curve. The PWLS optimization consists of two components, namely the goodness-of-fit component and the penalty component, and is defined as follows:
Definition 3 Consider pairs of longitudinal data that follow the regression equation in (1). The nonparametric regression function estimator for longitudinal data is obtained by solving the PWLS optimization in (7), in which $\lambda_i$ is a smoothing parameter for the $i$-th subject and $\boldsymbol{W}$ is related to the covariance matrix estimate.
Based on (7) in Definition 3, there are two components: the weighted goodness-of-fit component in the first part, and the penalty component in the second part. The penalty component is rewritten in vector form in Theorem 4 as follows:
Theorem 4 If the penalty component of (7) in Definition 3, denoted by $P$ in (8), is considered, then $P$ can be stated as follows:
$$P = \boldsymbol{\beta}^T \boldsymbol{L} \boldsymbol{\beta}, \tag{9}$$
where $\boldsymbol{L}$ is called the penalty matrix.
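The quadratic form in Theorem 4 can be illustrated numerically for one subject and one predictor. The sketch below is not the paper's derivation: it assumes the basis $(1/2, \cos kx, \sin kx)$, $k = 1, ..., K$, a roughness penalty equal to $\lambda$ times the integral of the squared second derivative, an integration interval $[0, \pi]$, no extra normalizing constant, and trapezoidal quadrature. The names `second_derivative_basis` and `penalty_matrix` are hypothetical.

```python
import numpy as np

def second_derivative_basis(x, K):
    """Second derivatives of the basis (1/2, cos kx, sin kx), k = 1..K.
    Returns an array of shape (len(x), 2K + 1)."""
    k = np.arange(1, K + 1)
    x = np.asarray(x, dtype=float)[:, None]
    const = np.zeros_like(x)                 # the constant term vanishes
    return np.hstack([const, -k**2 * np.cos(k * x), -k**2 * np.sin(k * x)])

def penalty_matrix(K, lam, a=0.0, b=np.pi, m=2001):
    """L = lam * integral over [a, b] of B''(x)^T B''(x) dx, computed by
    the trapezoidal rule. The interval [a, b] is an assumption."""
    x = np.linspace(a, b, m)
    w = np.full(m, (b - a) / (m - 1))        # trapezoid weights
    w[0] = w[-1] = w[0] / 2                  # half weight at both endpoints
    B2 = second_derivative_basis(x, K)
    L = lam * (B2 * w[:, None]).T @ B2
    return L, w, B2

K, lam = 3, 0.5
L, w, B2 = penalty_matrix(K, lam)

# Compare beta' L beta with the directly integrated squared second derivative.
rng = np.random.default_rng(1)
beta = rng.normal(size=2 * K + 1)
P_quadratic = beta @ L @ beta
P_direct = lam * np.sum(w * (B2 @ beta) ** 2)
```

Here `P_quadratic` and `P_direct` agree, which is the content of the identity $P = \boldsymbol{\beta}^T \boldsymbol{L} \boldsymbol{\beta}$ under the stated assumptions. Computing every entry of `L` by quadrature avoids assuming in advance which cross terms vanish on the chosen interval.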

Proof of Theorem 4:
First, Theorem 4 is proved by determining the second derivative of $g_i(x_i)$, which gives (10). Using the result of (10), the integration in (8) is then resolved. The integration is divided into three parts, with results (12), (13), and (14), from which (11) is evaluated. Expanding the summations yields two components, related to $\alpha$ and $\beta$, that correspond to the entries of $\boldsymbol{\beta}$. The penalty matrix $\boldsymbol{L}$, of size $np(2K + 1) \times np(2K + 1)$, is therefore defined in terms of the matrices $\boldsymbol{L}_i$, each of size $p(2K + 1) \times p(2K + 1)$.
The nonparametric regression estimator based on Fourier series for longitudinal data is obtained by solving the PWLS optimization in (7) within the Reproducing Kernel Hilbert Space (RKHS) $H_2^2$. Theorem 5 is the main theorem and gives the result of the PWLS optimization.

Theorem 5 Consider pairs of longitudinal data that follow the regression equation in (1), with the related vector equation
$$\boldsymbol{y} = \boldsymbol{g} + \boldsymbol{\varepsilon}; \quad \boldsymbol{\varepsilon} \sim N(\boldsymbol{0}, \sigma^2 \boldsymbol{I}), \tag{16}$$
where, based on (6), $\boldsymbol{g} = \boldsymbol{Z}\boldsymbol{\gamma} + \boldsymbol{X}\boldsymbol{\beta}$. Then, based on PWLS optimization, the Fourier series estimator of the nonparametric regression curve $\boldsymbol{g}$ in the longitudinal data case is as follows:
$$\hat{\boldsymbol{g}} = \boldsymbol{Z}\hat{\boldsymbol{\gamma}} + \boldsymbol{X}\hat{\boldsymbol{\beta}} = \boldsymbol{H}\boldsymbol{y}, \tag{17}$$
with $\boldsymbol{D} = \boldsymbol{X}^T \boldsymbol{W} \boldsymbol{X} + N\boldsymbol{L}$, where $\boldsymbol{W}$ is related to the covariance matrix estimate and $N$ is the total number of observations over all subjects.

Proof of Theorem 5:
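Based on result (9) in Theorem 4, substituting (9) into (7) gives the PWLS optimization
$$\min_{\boldsymbol{g} \in H_2^2} \left\{ \frac{1}{N} (\boldsymbol{y} - \boldsymbol{g})^T \boldsymbol{W} (\boldsymbol{y} - \boldsymbol{g}) + \boldsymbol{\beta}^T \boldsymbol{L} \boldsymbol{\beta} \right\}, \tag{23}$$
with $\boldsymbol{g} = \boldsymbol{Z}\boldsymbol{\gamma} + \boldsymbol{X}\boldsymbol{\beta}$. Expanding (23) gives the objective function $Q(\boldsymbol{\beta}, \boldsymbol{\gamma})$, where
$$Q(\boldsymbol{\beta}, \boldsymbol{\gamma}) = \frac{1}{N} \left[ \boldsymbol{y}^T \boldsymbol{W} \boldsymbol{y} - 2\boldsymbol{\gamma}^T \boldsymbol{Z}^T \boldsymbol{W} \boldsymbol{y} - 2\boldsymbol{\beta}^T \boldsymbol{X}^T \boldsymbol{W} \boldsymbol{y} + \boldsymbol{\gamma}^T \boldsymbol{Z}^T \boldsymbol{W} \boldsymbol{Z} \boldsymbol{\gamma} + \boldsymbol{\gamma}^T \boldsymbol{Z}^T \boldsymbol{W} \boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{\beta}^T \boldsymbol{X}^T \boldsymbol{W} \boldsymbol{Z} \boldsymbol{\gamma} + \boldsymbol{\beta}^T \boldsymbol{X}^T \boldsymbol{W} \boldsymbol{X} \boldsymbol{\beta} \right] + \boldsymbol{\beta}^T \boldsymbol{L} \boldsymbol{\beta}.$$
The solution is obtained by taking the partial derivatives of $Q(\boldsymbol{\beta}, \boldsymbol{\gamma})$ with respect to $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ and setting the results equal to zero. Based on the desirable properties of an estimator, algebraic substitution is then applied to obtain estimators that are free of unknown parameters, consistent with the notion of sufficient statistics. It can therefore be concluded that, based on PWLS optimization, the Fourier series estimator of the nonparametric regression curve $\boldsymbol{g}$ in the longitudinal data case is
$$\hat{\boldsymbol{g}} = \boldsymbol{Z}\hat{\boldsymbol{\gamma}} + \boldsymbol{X}\hat{\boldsymbol{\beta}} = \boldsymbol{H}\boldsymbol{y},$$
where $\boldsymbol{W}$ is related to the covariance matrix estimate and $N$ is the total number of observations over all subjects.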
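The minimization of $Q(\boldsymbol{\beta}, \boldsymbol{\gamma})$ can be sketched numerically by solving the two first-order conditions jointly. This is a minimal illustration under stated assumptions, not the closed form derived in the paper: the function name `pwls_fit` is hypothetical, the data are simulated for a single subject and a single predictor, the weight matrix is taken as the identity, and the penalty matrix is assumed diagonal with entries $\lambda k^4$ for the cosine and sine terms and zero for the constant term.

```python
import numpy as np

def pwls_fit(y, Z, X, W, L):
    """Minimize (1/N)(y - Z g - X b)' W (y - Z g - X b) + b' L b
    by solving the joint normal equations for (gamma, beta)."""
    N = y.shape[0]
    D = X.T @ W @ X + N * L                   # D as defined in Theorem 5
    A = np.block([[Z.T @ W @ Z, Z.T @ W @ X],
                  [X.T @ W @ Z, D]])
    C = np.hstack([Z, X])
    H = C @ np.linalg.solve(A, C.T @ W)       # hat matrix: fitted = H y
    theta = np.linalg.solve(A, C.T @ W @ y)
    q = Z.shape[1]
    return theta[:q], theta[q:], H

# Simulated single-subject, single-predictor example (illustrative only).
rng = np.random.default_rng(0)
N, K, lam = 60, 2, 1e-3
x = np.sort(rng.uniform(0.0, np.pi, N))
k = np.arange(1, K + 1)
Z = x[:, None]                                # trend column
X = np.hstack([np.full((N, 1), 0.5),          # constant, cosines, sines
               np.cos(x[:, None] * k),
               np.sin(x[:, None] * k)])
W = np.eye(N)                                 # assumed weights
L = lam * np.diag(np.concatenate([[0.0], k**4, k**4]).astype(float))
y = 1.0 + 0.5 * x + 2.0 * np.cos(x) + rng.normal(scale=0.2, size=N)

gamma_hat, beta_hat, H = pwls_fit(y, Z, X, W, L)
g_hat = Z @ gamma_hat + X @ beta_hat          # fitted curve, equal to H @ y
```

The fitted vector `g_hat` coincides with `H @ y`, which illustrates that the estimator is linear in the observations. Solving the block system directly avoids forming the substituted expressions for $\hat{\boldsymbol{\gamma}}$ and $\hat{\boldsymbol{\beta}}$ separately.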
To verify that $\hat{\boldsymbol{g}}$, which consists of the estimators $\hat{\boldsymbol{\beta}}$ and $\hat{\boldsymbol{\gamma}}$, minimizes the PWLS criterion, a Hessian matrix of second derivatives is first constructed. The Hessian matrix $\boldsymbol{H}^*$ is as follows:
$$\boldsymbol{H}^* = \begin{pmatrix} \dfrac{\partial^2 Q(\boldsymbol{\beta}, \boldsymbol{\gamma})}{\partial \boldsymbol{\beta} \, \partial \boldsymbol{\beta}^T} & \dfrac{\partial^2 Q(\boldsymbol{\beta}, \boldsymbol{\gamma})}{\partial \boldsymbol{\beta} \, \partial \boldsymbol{\gamma}^T} \\ \dfrac{\partial^2 Q(\boldsymbol{\beta}, \boldsymbol{\gamma})}{\partial \boldsymbol{\gamma} \, \partial \boldsymbol{\beta}^T} & \dfrac{\partial^2 Q(\boldsymbol{\beta}, \boldsymbol{\gamma})}{\partial \boldsymbol{\gamma} \, \partial \boldsymbol{\gamma}^T} \end{pmatrix}. \tag{25}$$
Next, the derivative in each component of the Hessian matrix in (25) is determined, with results (26), (27), (28), and (29). Substituting these results into (25) gives the Hessian matrix (30). Furthermore, for a given vector $\boldsymbol{y}^* = (\boldsymbol{y}_1^{*T}, \boldsymbol{y}_2^{*T})^T$, the Hessian matrix in (30) is used to examine whether the quadratic form satisfies $\boldsymbol{y}^{*T} \boldsymbol{H}^* \boldsymbol{y}^* \geq 0$, that is, whether $\boldsymbol{H}^*$ is positive semidefinite. Based on these conditions, it can be concluded that the Fourier series estimator in nonparametric regression for longitudinal data minimizes the PWLS criterion.
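The second-order check can be written out as a short worked derivation. The block below is a sketch that follows directly from the expanded form of $Q(\boldsymbol{\beta}, \boldsymbol{\gamma})$ with $\boldsymbol{g} = \boldsymbol{Z}\boldsymbol{\gamma} + \boldsymbol{X}\boldsymbol{\beta}$; it assumes that $\boldsymbol{W}$ and $\boldsymbol{L}$ are symmetric and positive semidefinite, and it is not claimed to reproduce the paper's equations (26) to (31) term by term.

```latex
% Assumption: W and L are symmetric positive semidefinite.
\begin{align*}
\frac{\partial^2 Q}{\partial \boldsymbol{\beta}\,\partial \boldsymbol{\beta}^T}
  &= \frac{2}{N}\left(\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X} + N\boldsymbol{L}\right)
   = \frac{2}{N}\boldsymbol{D}, &
\frac{\partial^2 Q}{\partial \boldsymbol{\beta}\,\partial \boldsymbol{\gamma}^T}
  &= \frac{2}{N}\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{Z}, \\
\frac{\partial^2 Q}{\partial \boldsymbol{\gamma}\,\partial \boldsymbol{\beta}^T}
  &= \frac{2}{N}\boldsymbol{Z}^T\boldsymbol{W}\boldsymbol{X}, &
\frac{\partial^2 Q}{\partial \boldsymbol{\gamma}\,\partial \boldsymbol{\gamma}^T}
  &= \frac{2}{N}\boldsymbol{Z}^T\boldsymbol{W}\boldsymbol{Z}.
\end{align*}
% Quadratic form for y* = (y1*, y2*), with y1* conformable with beta
% and y2* conformable with gamma:
\begin{equation*}
\boldsymbol{y}^{*T}\boldsymbol{H}^{*}\boldsymbol{y}^{*}
  = \frac{2}{N}\left(\boldsymbol{X}\boldsymbol{y}_1^{*} + \boldsymbol{Z}\boldsymbol{y}_2^{*}\right)^T
    \boldsymbol{W}
    \left(\boldsymbol{X}\boldsymbol{y}_1^{*} + \boldsymbol{Z}\boldsymbol{y}_2^{*}\right)
  + 2\,\boldsymbol{y}_1^{*T}\boldsymbol{L}\,\boldsymbol{y}_1^{*} \;\geq\; 0 .
\end{equation*}
```

Both terms are nonnegative under the stated assumptions, so the stationary point of $Q$ is a minimizer.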

Covariance Matrix Estimation
Generally, in nonparametric regression modeling, the error covariance matrix is assumed to be known, as in Wu and Zhang [11]. In reality, however, the covariance matrix contains population parameters, such as variances and covariances, that are unknown. The solution is to estimate the error covariance matrix. Theorem 6 concerns covariance matrix estimation.
Theorem 6 Consider pairs of longitudinal data that follow the regression equation in (1), with the related vector equation in (16). If the matrix associated with the error covariance is denoted by $\boldsymbol{W} = \boldsymbol{V}^{-1}$, then a covariance matrix estimator is obtained for each subject $i = 1, 2, ..., n$.

Proof of Theorem 6:
Maximum Likelihood Estimation (MLE) is used to obtain the estimator of the covariance matrix that serves as the weight in the model. In MLE, the multivariate normal distribution is assumed, as in general regression modeling. Based on the multivariate normal distribution, the likelihood function (33) is obtained, whose exponent contains the quadratic form $(\boldsymbol{y}_i - \boldsymbol{g}_i)^T \boldsymbol{W} (\boldsymbol{y}_i - \boldsymbol{g}_i)$. Based on studies such as Johnson and Wichern [16], Fernandes et al. [17], and Ramli et al. [18], maximizing the likelihood function is equivalent to maximizing each of its factors. The maximum value of each factor is attained when the matrices $\boldsymbol{W}_i$, $i = 1, 2, ..., n$, take the form of the estimator stated in Theorem 6.

Linearity and Consistency from The Estimator
This section examines the asymptotic properties of the Fourier series estimator $\hat{\boldsymbol{g}}$ for the nonparametric regression model for longitudinal data, specifically the linearity and consistency of $\hat{\boldsymbol{g}}$. Lemma 7 concerns the linearity of $\hat{\boldsymbol{g}}$.
Lemma 7 If $\hat{\boldsymbol{g}}$ is the solution that minimizes the PWLS in (23), then $\hat{\boldsymbol{g}}$ belongs to the class of estimators that are linear in the observations $\boldsymbol{y}$.

Proof of Lemma 7:
Based on Theorem 5, the Fourier series estimator of the nonparametric regression curve for longitudinal data that minimizes the PWLS optimization has been obtained. Based on (17), $\hat{\boldsymbol{g}}$ belongs to the class of estimators that are linear in the observations $\boldsymbol{y}$, as represented by the hat matrix $\boldsymbol{H}$.
Mathematics and Statistics 9(4): 501-510, 2021
Furthermore, the consistency of the Fourier series estimator of the nonparametric regression curve for longitudinal data is investigated based on Chebyshev's inequality. First, however, the expectation and variance of $\hat{\boldsymbol{g}}$ are determined. The expectation $E(\hat{\boldsymbol{g}})$ is determined first, and $\mathrm{Var}(\hat{\boldsymbol{g}})$ is then determined as follows:
$$\mathrm{Var}(\hat{\boldsymbol{g}}) = \mathrm{Var}(\boldsymbol{Z}\hat{\boldsymbol{\gamma}} + \boldsymbol{X}\hat{\boldsymbol{\beta}}) = \mathrm{Var}(\boldsymbol{Z}\hat{\boldsymbol{\gamma}}) + \mathrm{Var}(\boldsymbol{X}\hat{\boldsymbol{\beta}}) = \mathrm{Var}(\boldsymbol{Z}\boldsymbol{E}^{-1}\boldsymbol{Z}^T\boldsymbol{W}\boldsymbol{F}\boldsymbol{y}) + \mathrm{Var}(\boldsymbol{X}\boldsymbol{D}^{-1}\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{G}\boldsymbol{y}).$$
Based on the properties of variance, the two terms are then evaluated. After that, Lemma 8, which discusses the consistency of $\hat{\boldsymbol{g}}$, is given as follows:
Lemma 8 If $\hat{\boldsymbol{g}}$ is the solution that minimizes the PWLS in (23), then $\hat{\boldsymbol{g}}$ is a consistent estimator of $\boldsymbol{g}$ based on Chebyshev's inequality.

Proof of Lemma 8:
To establish the consistency of the Fourier series estimator of the nonparametric regression curve for longitudinal data, it is investigated whether $\hat{\boldsymbol{g}} \xrightarrow{p} \boldsymbol{g}$ by using Chebyshev's inequality for an arbitrary positive constant. Based on the limit of the bound in Chebyshev's inequality, it can be seen that $\hat{\boldsymbol{g}} \xrightarrow{p} \boldsymbol{g}$. This is sufficient to show that $\hat{\boldsymbol{g}}$ is a consistent estimator of $\boldsymbol{g}$.

Application for Review
Nonparametric regression with the Fourier series estimator for longitudinal data can be applied in various fields. What needs to be considered, however, is how to choose the best model. Generally, the best model for the Fourier series estimator is chosen based on the optimal oscillation parameter ($k$), which has the smallest GCV value, and several studies also consider model parsimony [19]. Since the optimization method used is PWLS, which involves $k$ as well as the smoothing parameter $\lambda$, two parameters are involved in determining the best model. In nonparametric regression based on the Fourier series estimator for longitudinal data, $k$ takes positive integer values, $k = 1, 2, ..., K$, while $\lambda$ is a smoothing parameter that controls the penalty in order to obtain a smooth regression curve. An optimal value of $\lambda$ is therefore needed to obtain the best regression curve estimator, and the selection of the optimal $\lambda$ is very important for estimating the nonparametric regression curve. For a specified oscillation parameter, the optimal value of $\lambda$ is reflected in the goodness of the model, such as a small Mean Square Error (MSE), a minimum Generalized Cross Validation (GCV) value, and a high determination coefficient ($R^2$). Definition 9 gives the MSE, and the last definition gives $R^2$.

Definition 11
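The determination coefficient for the nonparametric regression model based on the Fourier series estimator for longitudinal data is defined in terms of the response vector $\boldsymbol{y}$, the fitted vector $\hat{\boldsymbol{g}}$, and the mean vector $\bar{\boldsymbol{y}}$.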
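The model selection criteria can be sketched in code for a linear smoother with hat matrix $\boldsymbol{H}$. The formulas below are common textbook forms and are assumptions of this sketch, since the paper's own definitions may differ (for example by including the weight matrix): $\mathrm{MSE} = N^{-1} \lVert \boldsymbol{y} - \boldsymbol{H}\boldsymbol{y} \rVert^2$, $\mathrm{GCV} = \mathrm{MSE} / (N^{-1} \mathrm{trace}(\boldsymbol{I} - \boldsymbol{H}))^2$, and $R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$. The names `fit_criteria` and `hat_matrix`, the simulated data, the unit weights, and the diagonal $k^4$ penalty are all hypothetical choices for illustration.

```python
import numpy as np

def fit_criteria(y, H):
    """MSE, GCV and R^2 for a linear smoother with hat matrix H.
    Common textbook forms, assumed here for illustration."""
    N = y.shape[0]
    resid = y - H @ y
    sse = resid @ resid
    mse = sse / N
    gcv = mse / (np.trace(np.eye(N) - H) / N) ** 2
    r2 = 1.0 - sse / np.sum((y - y.mean()) ** 2)
    return mse, gcv, r2

def hat_matrix(C, P, lam):
    """Hat matrix of a penalized least squares fit with unit weights."""
    N = C.shape[0]
    return C @ np.linalg.solve(C.T @ C + N * lam * P, C.T)

# Simulated data: one subject, one predictor, K = 3 (illustrative only).
rng = np.random.default_rng(2)
N, K = 80, 3
x = np.sort(rng.uniform(0.0, 2.0 * np.pi, N))
k = np.arange(1, K + 1)
C = np.hstack([x[:, None], np.full((N, 1), 0.5),
               np.cos(x[:, None] * k), np.sin(x[:, None] * k)])
# No penalty on the trend and constant columns; k^4 on cos and sin columns.
P = np.diag(np.concatenate([[0.0, 0.0], k**4, k**4]).astype(float))
y = 0.3 * x + np.cos(2 * x) + rng.normal(scale=0.1, size=N)

# Grid search: choose the smoothing parameter with the smallest GCV.
lambdas = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2]
results = {lam: fit_criteria(y, hat_matrix(C, P, lam)) for lam in lambdas}
best_lam = min(results, key=lambda lam: results[lam][1])
mse, gcv, r2 = results[best_lam]
```

The same loop can be nested over candidate values of $k$ to select the oscillation parameter and the smoothing parameter together.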
The Fourier series estimator in nonparametric regression for longitudinal data has been applied to several related data sets, mostly using Weighted Least Squares (WLS) optimization. Mardianto et al. [19] estimated the number of students studying at several tutoring agencies, a phenomenon with a trend-seasonal pattern. Mardianto et al. [20] stated that the Fourier series estimator for longitudinal data performed well in estimating trend-seasonal and oscillating data patterns when compared with a linear estimator. Mardianto et al. [20] used simulated data to present the performance of the Fourier series estimator in nonparametric regression for longitudinal data. Besides the comparison with the linear estimator, Mardianto et al. [4] compared the Fourier series estimator with spline and kernel estimators for longitudinal data. Mardianto et al. [4] used meteorological data for 10 cities, where the data tended to have an oscillating pattern and some had a trend-seasonal pattern. According to Mardianto et al. [4], the Fourier series estimator performed well when compared with other nonparametric regression estimators for longitudinal data, such as spline and kernel estimators. As a review, Figure 1 presents the data pattern used to show the performance of the Fourier series estimator in nonparametric regression for longitudinal data based on PWLS optimization. There were 5 subjects, each observed 120 times. The subjects were 5 different tutoring agencies in a city. The response variable ($y$) was the number of students joining each tutoring agency, giving $y_1$, $y_2$, $y_3$, $y_4$, and $y_5$, which were correlated with each other. The first predictor variable ($x_1$) was the return on equity of each tutoring agency, giving $x_{11}$, $x_{21}$, $x_{31}$, $x_{41}$, and $x_{51}$. The second predictor variable ($x_2$) was the return on assets of each tutoring agency, giving $x_{12}$, $x_{22}$, $x_{32}$, $x_{42}$, and $x_{52}$. Based on Figure 1, the scatter plot of $x_1$ against $y$ generally tended to show a trend-seasonal pattern, which suits the Fourier series estimator, and the scatter plot of $x_2$ against $y$ tended to show an irregular pattern, which suits nonparametric regression. In general, however, all predictors can be said to have an oscillating pattern in relation to the response.
A Fourier series estimator with a cosine basis was used to estimate the response data, chosen for having the smallest optimal $k$ (the oscillation parameter), as presented in Table 2. Table 2 shows that, besides having the smallest optimal $k$, the cosine basis also had the smallest GCV, equal to 171.608. For all subjects with $k = 8$, the smoothing parameter $\lambda$ was approximately 0.000040, so the selected values are $k = 8$ and $\lambda = 0.000040$. Figure 2 shows visually that $\lambda = 0.000040$ is the optimal $\lambda$. The Fourier series estimator in nonparametric regression for longitudinal data with $k = 8$ and $\lambda = 0.000040$, besides having the smallest GCV of 171.608, yielded a small MSE of 58.6058 and a large determination coefficient of 0.9805. This supports previous studies [2, 3, 8, 17-20]: the Fourier series estimator in nonparametric regression performs well in estimating data with trend-seasonal and oscillating patterns, in this case for longitudinal data.

Conclusions
The Fourier series estimator for nonparametric regression in longitudinal data can be constructed based on PWLS optimization. The regression function estimator based on Fourier series for nonparametric regression in longitudinal data satisfies the linearity and consistency properties. The Fourier series estimator performs well in estimating trend-seasonal and oscillating longitudinal data patterns in the nonparametric regression setting. Several case studies from previous research have been summarized. Furthermore, using the result of PWLS optimization, the Fourier series estimator for nonparametric regression in longitudinal data performs well in estimating the number of students joining tutoring agencies based on return on equity and return on assets. The selected estimator, a cosine Fourier series with $k = 8$ and $\lambda = 0.000040$ for all subjects, has small MSE and GCV values and a large determination coefficient. This research can be developed further in theoretical directions, such as inference on interval estimation and hypothesis testing. In applications, relevant data from other fields can be modeled with the Fourier series estimator in nonparametric regression for longitudinal data.
their comments and suggestions which improve the paper significantly especially from reviewer.