Fuzzy Estimations for Detecting Abrupt Changes: Cases on Tourism Series

This article deals with problems of detecting abrupt changes in time series ba Change Point Model (CPM) framework. We propose a fuzzification in a Fuzzy Time Series (FTS) model to eliminate a trend in a contaminated dependent series. The independent residuals are then inputed on the CPM method. In simulating an abrupt change, an ARIMA(1,1,1) and variance of the model are considered. The abrupt change is modelled as an AO (Additive Outlier) type of outliers. The minimum weight or breaksize of the abrupt change is defined based on the ARIMA variance formulated in this article. The percentage of uncorrelated residuals obtained by the FTS model and the percentage of correct detection of the proposed procedure are shown by simulation. The proposed detecting algorithm is implemented to detect abrupt changes in monthly tourism series in literature, i.e., in Taiwan and in Bali. The first series shows a slowly increasing trend with one abrupt change while the second series exhibits not only a slowly increasing trend but also a strong seasonal pattern with two abrupt changes. For comparison, we detect the changes in the empirical examples on an existing automatic detection procedure using tso package in R. For the first example, the results show that both detecting procedures give exactly a similar location of one change point where the package recognises it as an AO type of outliers. The abrupt change is related to the period of SARS outbreak in Taiwan. On the second example, the proposed procedure locates 4 change points which form two locations of changes, i.e., the first two change points are within 2 time points so do the last two change points. The locations are closed to times of Bali Bombing events. Meanwhile, the automatic procedure recognizes only one AO outlier on the series.


Introduction
Structural breaks in a time series statistically mean that there is at least a shift in their distributional parameters, e.g., in mean and/or in variance. Some certain U.S series show structural changes as a result of changes of various international events including the two Oil Price Shocks and in International Monetary System [1]. A decline trend of tourist arrivals following 2002 Bali Bombing had a short run-effect on Bali and other famous tourist destinations in Indonesia [2] . A good news, however, the effects of terrorist attacks in 2002 and 2005 on the number of tourist arrivals in Bali is transitory [3]. Still in tourism research, an abrupt change is one out of four distinct patterns in tourism series discussed in [4].
We define abrupt changes as stated in [5], "By abrupt changes, we mean changes in characteristics that occur very fast with respect to the sampling period of the measurements, if not instantaneously". Further, the authors imply that these considered changes occur at any instant times during the observations, treating, in some sense, that characteristics of the series are constants before and after the changes. Due to temporary effect, structural breaks in a time series are shown by changes only on some parts in the time series plot.
In time series analysis, the study of impacts of changes when the time of changes are undefined is known as outlier detection analysis. Commonly, the first step in the outlier detection is to model the series in the presence of outliers [6], [7]. The resid-748 Fuzzy Estimations for Detecting Abrupt Changes: Cases on Tourism Series uals are then formed into several models of outliers known as outlier effects. One may argue that the standard deviation of the residuals may be overestimated as it is obtained from a contaminated series. In the latter work, the contaminated standard deviation of the residuals is reestimated. The standardized versions of the outlier effects follow an approximately normal distribution. Therefore, the existence of an outlier in each model is detected by using a critical value determined from the distribution. Having been all detected, the remaining series is a free outlier model. The process of detecting all possible outliers involving iterations as the effects of recently detected outliers are to be subtracted from the recent data set and then the residuals are reupdated.
Change point formulation was originally introduced by Page in 1954 to detect changes in density of an independent and identicaly distributed (i.i.d) sample. The formula, known as CUSUM, is closely connected to the maximum likelihood (ML) method. Under assumption on the series distribution, parametric change detection analysis had been developed in two directions, Bayesian and MLE methods. Readers refer to [5], [8] for parametric approach in change point detection. On the other hand, change point model (CPM) which is statistically simple and numerically effective was proposed [9]. A two-sample statistics with unknown parameters is employed and a recursive calculation of the statistics is used. Although, originally formulated to detect changes in mean of the Gaussian series in control quality processes, the method has been expanded into a set of tools to allow for detecting arbitrary changes with and without distributional assumption on the sample. The CPMs are available in the cpm package in R (R team core 2013) and the manual in [10].
The CPM method is based on the standard two-sample statistic test where the assumption for the input series is of i.i.d observations. It is said on the manual that for autocorrelated series, however, the detectin of changes can be performed either on the residuals of estimations or of one-step ahead predictions of a fit model inasmuch giving i.i.d residuals. A similar notion is given in [5], pg. 203, "A traditional approach to failure detection consists of considering that the design of detection algorithms is basically made of two steps : 1. generation of 'residuals,' namely of artificial measurements that reflect the changes of interest; for example, these signals are ideally close to zero before the change and nonzero after; 2. design of decision rules based upon these residuals." Hence, two main steps are simultaeously performed in implementing the CPM in a time series analysis, i.e., generating the residuals and detecting the change points. The first step is to obtain the i.i.d residuals and the second step is to determine the change points. Henceforth, we refer the detecting procedure as the two steps, i.e., generating residuals and detecting change points.
The first step on the detection algorithm in CPM framework requires an i.i.d series for the input. Hence, prior to detecting we need to check for the i.i.d assumption on the input series. The process is similar to preliminary forecasting process in time series analysis in which the stationarity property is checked prior to modelling. The classical decomposition model represents any time series x t as a realization of the pro-cess, where m t , s t , and Y t are the trend, the seasonal and the stationary random noise components, respectively. By careful examination of a time series plot, one can have a good idea of these components. After removing the trend and/or the seasonal components such that the series is stationary, we can proceed to construct models under the theory of stationary processes [11].
A preliminary transformation in forecasting involves a nonparametric method to eliminate the trend and or seasonal components. Some techniques are discussed in [11]. In this article, we apply a fuzzy time series (FTS) model on in-sample data to eliminate the trend. FTS of [12] was stated as an appropriate time series model to handle a trend in a series [32]. As a nonparametric method, estimating using FTS needs no distributional assumption on the sample. The uncorrelated residuals of FTS is of interest as it provides a chance to incorporate the estimation method on the first step of the detecting algorithm. The FTS model was introduced by [12], [13]. It takes into account the fuzziness in data gathering in estimation process. The model has been extended in all steps of the fuzzy procedures. A brief review on forecasting using FTS can be found in [14]. A comparison of some FTS models in forecasting is discussed in [31].
A number of research studies in change detection have employed fuzzy approach, i.e., [32], [33], [19], [20], [22], [15]. We review some of them which discuss the approach in time series. The first literature uses deviation of two fuzzy numbers of LR-type in regression trees framework while the second literature employs fuzzy clustering and defines an α -level of detection statistics. In the first literature [15], the authors use fuzzy approach in analysing Italian sovereign debt and temperature in Rome. The first data set is of rating scales while the second data set is crisp data. The method is based on the concept of contagious partitions introduced by Fisher in 1958 where the author proposed an exact grouping of n elements into G mutually exclusive and exhaustive subgroups by minimizing the sums of squares of within-group such that the subgroups having maximum homogeneity. When the data points are a priori ordered, such a case in time series, seeking minimum sum of within-group sums of squares is equivalent to segmenting a time interval into G subintervals which two successive subintervals have different model parameters. The concept was explored by [17] who apply global search using a dynamic programming to get G partitions. In contrast to the previous research which use non hyrarchi partition, Cappeli et. al. show that the partition can also be obtained hyrarcicaly by using regression trees which they then propose Atheoretical Regreession Tree (ART) method [16]. An attempt to assess the use of ART directly on time series to detect changes shows that the method is robust for series with negative correlation and some small positive correlation [18]. The ART method applies binary splitting to divide an interval into two subintervals. The splitting criterion involves additions of sums of square residuals on both subintervals. By using deviation measures of two LR-fuzzy data in place of residuals of the crisp counterpart, the authors proceed the splitting procedure. Hence, the method sounds fuzzy approach.
Another fuzzy detection to detect change periods was proposed by [20]. The method uses α-level change period detection proposed in [19] based on a fuzzy clustering and an extended CUSUM in [21]. The fuzzy detecting algorithm is applied to detect change periods in Taiwan monthly arrivals. Fuzzy clustering and the α-level change period detection were also used to detect change points and change periods [22]. Changes in the level and slope in between two consecutives regimes are detected by using Chow-test. The model is then applied to detect change points and change periods of six Asian Countries exchange rates.
The above fuzzy detection algorithms show that crisp detecting methods can be set up so that they work well in the fuzzy system, either by choosing a suitable fuzzification of the data and/or modifying the method such that it accommodates fuzzy procedures. In this article, a procedure to detect abrupt changes in a time series by using CPM is presented. A fuzzification is introduced in an existing FTS model to generate the i.i.d residuals.
Our motivations leading to the change detection framework can be summarized as follows. Theoretically, it allows us to use fuzzy approach in analyzing abrupt changes and it is a natural counterpart to the conventional framework as well; From practical point of view, as a statistical decision tool for detecting changes which is of great potential interest either in industry or in economy.
The outline of the article is as follows. Introduction is on section 1. The model of the abrupt change is explained in section 2. In section 3, the statistic methods employed in the proposed detecting algorithm are discussed and the proposed procedure is given. Simulation study showing the percentage of uncorrelated residuals obtained under FTS estimates and the percentage of correct detection of the detecting algorithm is presented in section 4. Implementation of the proposed detecting procedure on two empirical examples and comparison of the proposed detecting procedure to the existing automatic detection is given in section 5. Conclusions are provided in the last section.

The Model
The proposed detection algorithm may be applied for general nonseasonal and seasonal ARIMA models. To simplify, we use nonseasonal ARIMA without drift to show the performance of fuzzy estimation in generating the i.i.d residuals. Let Z t be a stochastic process following an autoregressive integrated moving average of order p, d and q; that is where B is the backshift operator such that BZ t = Z t−1 ; φ p (B) = 1 − φB − . . . − φ p B p , θ q (B) = 1 − θB − . . . − θ q B q are two polynomials in B with zeros all outside the unit circles and a t are i.i.d N (0, σ 2 a ). One of the parsimonous ARIMA models is ARIMA(1,1,1). Parsimonous ARIMA models and the extended models are widely employed to describe data set in economy and industry [34]. We use the ARIMA model, in the simulation to show the performance of the proposed detecting procedure. In order to do that, a number of series under the model, contaminated by a change point to some degree of break, are simulated. The mean and variance of the model are used to generate the contaminated series. We define the mean and variance of the model in the following subsection.

Mean and Variance of ARIMA(1,1,1)
For the purpose of simulation which we need to generate some series with a breakpoint where the underlying model is ARIMA(1,1,1), we need to determine the mean and variance of the model. Let, p = 1, d = 1, and q = 1 in (1). We rewrite the model as follows: which gives, Given that the mean and variance of ARIMA are time dependent [23], the mean of ARIMA(1,1,1) at time t is determined as follows: Hence, Further, we set the time reference at n 0 and follow the way variance of IMA(1,1) is derived in [23]. Following (3), Z t−1 = (1 + φ)Z t−2 − φZ t−3 + a t−1 − θa t−2 . After substituting it into (3) and rearranging the equation, we get, After substituting it into the last equation, simplifying it and doing so until the-k step, we get, Let Z t−k = Z n0 and the subscripts and superscripts are changed, i.e., t − k = n 0 , t − (k − 1) = n 0 + 1 and 750 Fuzzy Estimations for Detecting Abrupt Changes: Cases on Tourism Series The above equation can be written in a shorter expression using sum notations as follows, Because the terms, Z n0 , Z n0+1 and a n0+1 , are assumed as the predefined known values, they are statistically constant. Consequently, their variances are zero. Furthermore, as the noise are independently distributed, the variance of the series equals the total of their serial variances. Hence, Therefore, the variance of ARIMA(1,1,1) at time t is as follows: (5) When we put the AR coefficient to be zero and choose the reference point at n 0 + 1, i.e., φ = 0, m 0 = n 0 + 1, (5) reduces to the variance of IMA(1,1) that is equation (4.3.4) of [23] As the variance of ARIMA model depends on the time observation, we report the variance of ARIMA models at particular time. Let t = 50, we simulate ARIMA(1,1,1) on their stable condition, i.e., −1 < φ, θ < 1 with the standard normal error and then calculate the variance at time t. The plot shows that variance of the process is directly proportional to the AR parameters and inversely proportional to the MA parameters. The plot is not displayed here for reason of spaces. We focus our interest on the processes with small variances by choosing the following intervals for the parameters, −1 < φ ≤ −0.5 and 0.5 ≤ θ < 1. We also compare, graphically, the simulated variance to the theoritical variance of (5) as shown in Fig. 1. As can be seen, overall, both subplots share a similar shape. However, the surface plot of the simulated series is rather undulating ( Fig. 1A) compared to the smooth surface plot of the theoretical series (Fig. 1B). These plots qualitatively prove that our finding, the theoretical formula, in (5) is a well representative model for variance of ARIMA(1,1,1). Therefore, we can use the formula for the purpose of analysis related to the variance of ARIMA (1,1,1) at particular time t.

An ARIMA Model with A Breakpoint at Particular
Time In this subsection we define an ARIMA model which contains an abrupt change. We say that the model has an additive change as we add a degree of deviation at the mean of the observation at the targeted change point T . Given ARIMA (1,1,1) as in (2). We can write, Zt be the process variance at time t. We define the process with a breakpoint at time t = T as follows; where w is a constant representing the size of break or the weight and I (T ) t = 1 for t = T and zero otherwise. When, we set wσ Zt = ω in (6), the model is known as the AO (additive outlier) type of outliers of Fox (1972) in [7] where ω is the magnitude of the outlier effects.
One important issue in simulating series with outliers is setting the breaksize. A large breaksize produces a single outlier that may wash out the underlying characteristics of the series [23] which may result in a white noise phenomenon. On the other hand, setting a rather small breaksize may result in failure detection of the outliers as they may be masked by the underlying series. In the simulation, we set the range for w based on the variance of the observation at the change point and at the last simulated observation. This is due to the fact that the variance of ARIMA(1,1,1) increases as the time increases. Also, the outlier should lies in some distance which should be much larger than the standard deviation of the simulated series. Let sd(Z T ) and sd(Z n ) be standard deviation of the obseravtion at the change point and at the last observation, respectively. We define the minimum break size for an abrupt change as follows: When analyzing the use of ART in detecting change points in time series, [18] concludes that the best location for simulated breakpoint is in the middle of the series under investigation. Following it, in the simulation, a change point is introduced in the middle of the simulated ARIMA.
Let φ = −0.5, θ = 0.9, and σ 2 a = 1 be the ARIMA parameters, i.e., the model is written as follows : (8) Figure 2 shows one realization of the model and the simulated series with a break of size 4, 5, and 6. The number of observations on each series is 100 and the break is at the 50th observation.

Methods
In this section we discuss the methods used in constructing the algorithm of detection. We first describe the FTS models. We also describe the proposed fuzzification process in the model. As mentioned in the introduction, the model is constructed to eliminate trend on raw data in order to get i.i.d residuals. We then, review the CPM used for detecting changepoints on the residuals. We also review some tools available in the literatures related to i.i.d observations. Finally, we outline the proposed detection algorithm and give some remarks on it.

Fuzzy Time Series
We briefly review some definitions in fuzzy sets, fuzzy time series, and fuzzy logical relationship in [24], [12] and [13], respectively. We then describe step by step procedures used in the recent study.
Let U = {a i ; i = 1, . . . , p} be a universe of discourse which is discrete and finite. A fuzzy set A of U is defined as, The plus sign represents a union and the horizontal division line represents a separator in the fuzzy set theory. Some literatures use commas and brackets instead of plus and curly brackets in writing a fuzzy number. A fuzzy set of A, by definition, is determined by its membership degrees of µ A (a i ).
Let {y t ∈ R; t = 1, 2, . . . , n} be a universe of discourse in which a fuzzy set {µ i (t) ∈ R; i = 1, 2, . . .} is defined and F (t) be the set of all {µ i }. Here, F (t) is called as the fuzzy time series of {y t ; t = 1, 2, . . . , n}. In time series analysis, recent values can be described by information on previous values in the form of an analytical equation. By analogy, if a fuzzy time series at time t, F (t) is explained by its previous value F (t−1) then this relationship represented as F (t − 1) → F (t). This expression is called as fuzzy logical relationship (FLR).
We adopt the framework in [25] in setting up the fuzzy procedures. There are 5 steps discussed in the reference. We modify the formula used in partitioning universe of discourse which is on the first step. The fuzzification is set such that heights of fuzzy numbers of the extreme values are their crisp values. It turns out that the defuzzification of these fuzzy numbers are their crisps. Hence, we are likely to produce zero residuals for these observations. We employ procedures provided in [26] for the remaining steps. Let x 1 , . . . , x n be the time series of interest.
Step 1. Partition of universal of discourse. We create subintervals a i = [lb i , ub i ) for i = 1, . . . , n based on the series. We define D = 1 n (max{x t } − min{x t }) and d = min{x t } − 0.5D. The lower and upper bounds of the-ith subinterval are given as lb i = d + (i − 1)D and ub i = d + iD, respectively.
Step 2. Define linguistic terms. Linguistic terms are given on intervals which contain at least one crisp datum. Let p ≤ n be the number of intervals which has at least one crisp datum. Then we are going to have p linguistic terms notated as A i , i = 1, . . . , p, i.e., a set of fuzzy numbers. Fuzzy number A i is a discrete fuzzy number which has a membership function of degree one in a i , of degree 0.5 in both a i−1 and a i+1 and of 752 Fuzzy Estimations for Detecting Abrupt Changes: Cases on Tourism Series degree zero in other intervals. The fuzzy number is written as, . . , 0 ap . Meanwhile, for i = 1 and i = p, we define fuzzy numbers A 1 and A p as, ap , respectively.
Step 3. Fuzzify crisp data. In fuzzification process, a crisp datum is assigned a fuzzy number. That is when a crisp number belong to interval a i , the fuzzy number is the corresponding linguistic term A i . We also collect the index of the linguistic term, i, as the weight of fuzzy number A i .
Step 4. Set Fuzzy Logical Relationship (FLR) and Fuzzy Logical Relationship Groups (FLRGs). Let A i and A j be fuzzy numbers of x t−1 and x t , respectively. The FLR of order one for the observations is given as A i → A j . There are (n − 1) FLRs of order one. A fuzzy number on the left hand side of an FLR arrow is also named LHS. Similarly, for the fuzzy number on the right hand side, we have RHS. Next, FLRGs are created based on the FLRs. When some FLRs have one similar LHS but of different RHSs, the FLRs are grouped into one FLRG. For example, let there be FLRs of the following forms: The left hand side of an FLRG is a unique fuzzy number whereas the right hand side comprises of at least one fuzzy number. Hence, there would be less numbers of FLRGs compare with FLRs.
Step. 5. Defuzzify fuzzy numbers. Let A i be a fuzzy number and x t be fuzzified into A i . Defuzzification of A i means that we calculatex t , the estimate of x t . We defuzzify all fuzzy numbers on the left hand side of FLRG one by one. Let A i be a fuzzy number on the LHS and A j , A k , A l be fuzzy numbers on the RHS of an FLRG, i.e., the FLRG is of the form: A i → A j , A k , A l . Also, let the heights and weights of A j , A k , A l be c j , c k , c l and w j , w k , w l , respectively. The formula of defuzzification of fuzzy number A i is given as follows: The form of FLRs of an FTS model indicates the order used in the model. For FTS of order one, the recent value is determined by one previous value as in the 4 th step above. Further, the model gives estimates for the 2 nd to the last observation. Therefore, a series with n number of observations will have (n − 1) residuals.

Change Point Model (CPM)
Change point detection is related to detecting change points in a sequence of data. Statistically, the changes are defined as changes in distribution. To model it, let x 1 , x 2 , . . . , x n be a sequence of independent observations drawn from random variables X 1 , X 2 , . . . , X n . In the absence of changes, we can model the random variables to follow a distribution, let say F 0 , i.e., X i ∼ F 0 . Given that the sequence undergoes changes at r points of time c 1 , c 2 , . . . , c r then we can model the sequence as [9] used the conventional two-sample t-statistics test as the test to detect a change point in mean in statistical process control framework. Given n observations drawn from normaly independent and identically distributed random variables, a twosample t-test is formulated to test the hypothesis that the mean changes after some observations. The following is the test to detect one change point on a sequence of n observations which the mean changes after the-kth, 1 ≤ k < n, observation. By changes in mean, we imply that µ k = µ n−k and σ 2 k = σ 2 n−k = σ 2 where µ k and σ 2 k are the mean and variance of the first k observations, µ n−k and σ 2 n−k are the mean and variance of the remaining observations and σ is the series standard deviation. The hypothesis is µ k = µ n−k versus µ k = µ n−k . The statistics test is given as follows: are the MLE estimators of µ k , µ n−k and σ 2 , respectively. The last term is the sequence polled variance. Under the null hypothesis, test statistics T k follows Student-t distribution with (n − 2) degree of freedom. Let T * k be the maximum of T k , 1 ≤ k < n. The test signals a change point at k, when T * k > h n , a critical value defined based on the distribution.
In this article, the CPM is used in the second step in the detection algorithm that is for detecting the change point on i.i.d residuals. Unless otherwise indicated we use the CPM method as implemented in the R software of R Development Core Team (2015) contributed package cpm of [10].

Residuals
Diagnostic checking of residuals is a step in modeling time series to check for the goodness of model fitting. Residuals are differences between the model and observations. Hence, it comprises portion of information on the observations not explained by the model. The best model produces residuals that satisfy properties of i.i.d random variables, i.e., that there is no dependence among the observations. A number of tools are utilized to check for the property in a series [11]. Checking independency of a sequence qualitatively is assessed by using the properties of autocorrelation of independence series while quantitatively independent observations can be analized by testing null hypothesis of no autocorrelations or testing null hypothesis of random observations on the residuals. Without loss of generality, in the following, to check for independence of a sequence of observations, we review only two tools that we mainly use in the article.
We can check i.i.d series qualitatively on the autocorrelations. The sample autocorrelations of n independent observations with finite variance follow a normal distribution with zero mean and constant variance of 1/n [27]. In other words, at a significant level α, N (0, 1). Hence, at α = 5% and α = 1% (n−k) where ρ j is the autocorrelation at lag j, can be used to test a null hypothesis of zero autocorrelation at the first k lags. Under the null hypothesis, Q follows χ 2 (k) distribution. Hence, at a significant level α, we reject the null hypothesis for Q > χ 2 1−α (k). When, we apply the test on residuals obtained by fitting a model with m parameters, we then need to adjust the degree of freedom of the Chi-square from k into (k − m).

Proposed Detecting Algorithm
The proposed detecting procedure is constructed to detect abrupt changes in a time series using CPM model. The procedure consists of 4 main steps. The first step is aimed to eliminate trend by using a nonparametric FTS model. The proposed algorithm to detect change points are as follows: Step 1. Apply FTS of order one on the raw data using FTS model in Subsection 3.1; Step 2. Check the i.i.d property of the residuals using ACF and PACF analysis; Step 3. If the residuals show some correlated series, apply ARIMA on the residuals. Go to Step 2.
Step 4. Apply CPM method on the i.i.d residuals.
The first step in the proposed procedure using a nonparametric method may result in correlated residuals. Therefore, diagnotic checking for i.i.d assumption has to be implemented on the residuals. In the following section, we show how sampling behavior of the autocorrelation function of the FTS residuals in terms of the percentage of uncorrelated residuals. Given that some autocorrelations in the second step are still significant, we proceed to ARIMA modelling. In the preliminary forecasting, transformed series with significant correlations at some lags either in ACF and/or PACF indicates a pattern of ARIMA models. Following the methodology in ARIMA modelling, we are able to get the i.i.d residuals. The detecting procedure using CPM method is on the last step. In order to have an efficient detection using CPM, we need background knowledge related to the changes in the raw data, i.e., one change point versus multiple change points, types of structural changes in the series and underlying assumption on the raw data if it is available.

Simulation Study
The proposed procedure is designed to detect abrupt changes in time series. A simulation study is carried out to analyze the performance of the proposed procedure. The analysis is conducted for single outlier of AO type which is located in the middle of the simulated series. The underlying series in the simulation is of ARIMA(1,1,1).
We use the ARIMA model as defined in (8). Let T = 100, and n 0 = 0 be the simulated change point and the reference point. A simulated series with an abrupt change at time T is constructed as follows: 1) we simulate 150 observations of the ARIMA model; 2) we set the 100th observation to be the expectation minus a weight of its standard deviation. The mean and variance of the model at time t = T are determined according to (4) and (5), respectively. In other words, we define the breakpoint at t = 100 as follows: Z 100 = (1 + φ)Z 99 − φZ 98 − wsd(Z 100 ); 3) In order to avoid the pre sample effects on simulated data, we use the last remaining 100 observations for simulation, i.e., we use the 51th to the end observation; By doing this, we have a series of length 100 contaminated with a breakpoint of size w at the 50th observation.

Sampling Behavior of Sample ACF of Residuals
As it has been said in Section 2, we set the break size such that the deviation of the change point is around 4 times larger than the maximum standard deviation on the simulated series, that is the standard deviation of the last observation, i.e., see (7). In the case with the ARIMA model chosen, the breaksize is greater than 4.2. Further, we use break size greater than or equal to 5, but we may also display some results using break size less than 5 in order to see the pattern.
One realization of the ARIMA model with a break size of 6 and the sample ACF and PACF plots are given in Fig. 3  series. The residuals is shown in Fig. 4A and the sample ACF and PACF plot is in Fig. 4B. Although, plot of the residuals show a similar pattern to that of the raw series, the sample ACF and PACF of the residuals show an i.i.d sequence.
In the simulation, the sampling behavior of the autocorrelations of the residuals is shown as the percentage of uncor-  We report some of the results for N = 100 in Fig. 5. We record all the results in Table 1 and Table 2.
There are 4 subfigures in Fig 5. On each subfigures, there are 100 sample ACFs of residuals from 100 simulated series with particular breaksize. The sample ACF plots are displayed as line plots. We plot this way in order to see the trend more clear graphicaly. The 1% significant autocorrelation bands (shown as green horizontal-dashed line) are added on the plots to compare it to the default significant level (shown as the red-dashed line). As shown in the figures, overall, the larger the insignificant intervals are, the more the number of uncorrelated residuals is. This trend is similar to the size of breaks. In other words, the percentage of uncorrelated residuals is an increasing function of the break size. For higher number of realizations, on average, the percent- age of uncorrelated residuals on each case of w does not change. As mention previously, we define abrupt changes with size w ≥ 5. As seen on Table 2, at the break size of w = 5 on the default significant level, i.e. α = 0.05, only around 49% to 58% of the residuals are uncorrelated. These percentages increase to around 86% to 92% using lower significant level (see Table 1).

Percentage of Correct Detection
The next step is to apply CPM method on the residuals to detect the change points.
As mention previously, some background information about the raw data are needed in using the detection method effectively. We use detectChangePointBatch function because we want to detect only one change point. Given that we set the changes in variance of the change point and the residuals are generated using normal standard error, we set the type of CPM model as Bartlett type and use the default parameter for significant level to determined the treshold for the test statistics, i.e. α = 0.05 . The model is used to detect changes in variance of an i.i.d normal sequence. We generate 10, 000 realizations of the ARIMA model and set the break size from 5 to 9.5 with increment 0.5. For each case, we apply the FTS and collect residuals. Then, we check for the uncorrelated property on the residuals on both significant levels. Next, we apply CPM method on all the residuals.   The percentage of correct detection regardless of the i.i.d assumption on the residuals is shown in Table 3. The column label 'cp' in this table summarizes the percentage of correct detection, defined as a correct identification of the change point location. Meanwhile, the column label 'cp±1', 'cp±2', and 'cp±4' record the percentage of correct period detection, defined as a correct identification of a change point located within one, two and four observations of the simulated change point, respectively. The column label 'miss' reports the percentage of misidentified change point, i.e., the number of observations in a series that are identified as change point while they are not in the 'cp±4' interval. We show the percentage of undetected change point in the colum 'undetec' that is the number of series which the method fails to detect any change point. The 'total' column contains the row summation of all percentages of three columns right on the left.
A similar setting is set for the cases when we consider only uncorrelated residuals. We display the results for detecting change points using uncorrelated residuals at 1% and 5% significant level in Table 4 and Table 5, respectively.
Overall, the percentage of correct detection is an increasing function of the break size and the significant level. Particularly, at significant level 0.05 and break size w = 5, percentage of correct detection of the change point is 21%, but it becomes adequate and high as we consider larger intervals around the true change point, i.e., at cp±1, cp±2 and cp±4, the percentage of correct period detection are 31%, 35% and 41%, respectively. Figure 6 display the number of the detected residuals on their detected change points for all sizes of break. The percentage versions of these subfigures are shown previously in Table 4 and 5. As it can be seen in both figures, the detected change points are consentrated around the simulated change points. The numbers of correct detection (around the change point) in 0.01 uncorrelated residuals (bottom) seems larger than that of the 0.05 uncorrelated residuals (top). However, the number of undetected series in the 0.01 uncorrelated residuals is larger than that of the other correlated residuals. It is important to note that we have set the significant level for threshold in the cpm script by using the default score. All in all, a good performance of the proposed procedure is reached when the significant level for the sample autocorrelation is set at 5%.

Empirical Examples
In this section, we consider the abrupt changes in two tourism data mentioned in [28] and in [3], [2]. The first series exhibits an increasing trend while the other depicts both an increasing trend and a strong seasonal pattern. We demonstrate the detailed steps of the proposed procedure and compare the results obtained by the poposed detecting algorithm to that of the existing detecting algorithm. i.e., automatic detecting procedure in tsoutliers package version 0.6-8 in R which is based on the detecting algorithm in [7].
Forecasting Taiwan tourism series using fuzzy time series is discussed in [28], [29]. The authors acknowledge that there is a possible outbreak of Severe Acute Syndrome (SARS) reported in 2002/04 on the series. Further, the authors consider the SARS period to be from 2002/11 to 2003/06. Meanwhile, it has been reported that there are two terrorist attacks in Bali (popularly known as Bali bombing) happened at a quite closed time. The first attack was in October 2002 and the second was in October 2005 [2]. The unexpected outbreak and bombings had affected the number of tourist arrivals as presented in the references. We aim to detect the first time the tourist numbers undergoing abrupt changes in their structure. In other words, we aim to detect the change points due to the intervention.

Example 1 : Detecting SARS in Taiwan Tourism
In forecasting monthly Taiwan tourist arrivals using FTS in [28], the ARIMA model for the series is also discussed. The in-sample data, i.e. on time interval 1984/01 to 2000/4, is modeled as ARIMA (1,1,1). By knowing that there is an event of SARS in Taiwan, Chow-test was used to detect the break on the out-of-sample data (2000/5 to 2005/9). Having rejecting the null hypothesis of no changes, the estimation and forecast time interval are then reset to the time before the breakpoint. For our example, we apply proposed algorithm on the out-ofsample data to detect the abrupt change on the series due to the intervention of SARS. We obtain the monthly series from data CEIC.

Generating Residuals : Applying FTS on Raw Data
We display the Taiwan tourist series in Fig. 7. One can see that there is a dramatic drop nearly on the middle of the overall increasing trend in the series (see Fig. 7A). We highlight the short interval where the series undergoes the break on the second subfigure, Fig. 7B. The sample ACF and PACF of the series is displayed in Fig. 7C. where we can see that the series is correlated. We would like to estimate the series such that it gives i.i.d residuals conditional to the change point.  We apply FTS of order one on the Taiwan data set and display the estimates in Fig. 8A. The residuals shown on Fig. 8B still depict the similar drop pattern as does the raw data. Figure  9 shows diagnostic plots of the residuals. The ACF and PACF plots show a white noise phenomenon. Statistics test of Ljung-Box Q shows insignificant correlations for the first 20 lags of the residuals. The AICC plot shows that the minimum AICC is when the autoregressive order is 0. All plots indicate independent observations. The last two subplots show comparison of the sample distribution to the standard normal. The presence of one outlier is clearly seen on the quantile plot. However, overall spread of the residuals shows a normal distribution.

Detecting Change Points : CPM (Change Point Method)
As said in the introduction, cpm package provides a set of tools to detect changes either for parametric or nonparametric approcah. We use detectChangePointBatch function because we want to detect change points in a fixed-sample size or we want to do restropective detection problem. Based on diagnostic checking on the residuals, one can have a good idea of the type of CPM model should be used. Another option is of trial and error method. For the residuals of Taiwan series, our first attempt is to employ CPM model assuming normal distri-    Figure 10 displays the D k,n statistics of the residuals. We add a vertical-dash line on the default plot to show the maximum point of the statistics which is the detected change point. The change point is located at the 35th residual. Therefore, on the raw data, location of the abrupt change is on the 36th observation, that is on 2003/03 (see Fig. 11).   Figure 11. An abrupt change location on Taiwan tourist series.

Example 2 : Detecting Bali Bombings
Monthly numbers of the tourist arrivals in Bali from January 1982 to August 2019 are shown in Fig. 12A. Overall, the series shows an increasing trend and a seasonal pattern starting round 90s. The data set used in the example is the shorter interval shown in Fig. 12B. The series themselves and the sample ACF and PACF within the period suggest nonstationary and seasonal behavior indicated by the slow decreasing and oscillating ACFs.

Generating Residuals of Bali Data Set
Aiming at eliminating trend on the series which at the same time reveals the seasonal pattern, we apply FTS of order one the short interval. We plot FTS estimates superimposed by the observations and display the residuals in Fig. 13A and Fig.  13B, respectively. The sample ACF and PACF (in Fig. 13C) show a stationary pattern with significant correlations only at lag 1 and 12. One can see that the sample ACF oscillates within the significant bands. This suggests that the entertained model is a seasonal multiplicative ARIMA(1,0,0)(1,0,0)12. We estimate the parameter model using arima and estimate function in Matlab. The result below shows a significant existence of a seasonal autoregressive behavior in the series.    Diagnostic plots of the seasonal ARIMA residuals in Fig.  14 show an i.i.d Gaussian series. The sample ACF and PACF show a white noise phenomenon. The fact is supported by the Ljung-Box Q test and the AICC plot that shows uncorrelated observations at all lags up to 20 and up to 10, respectively. The QQ plot and the comparison to the standard normal plot show that the residuals seem to be asymptotically normal.

Detecting Change Points in Bali Data Set
We input the residuals obtained on the previous subsection into the cpm package in R. Because we want to detect multiple changes, we treat the residuals as being a data stream which observations are received and processed sequentially. We use processStream function. We try to detect using all CPM types assuming normal distribution. A trial signals change points when we use GLR. The CPM model generates statistic tests to detect changes in mean and in variance of an i.i.d Gaussian stream. The R script is attached on the appendix. Plot of the residuals along with the detection times and the change points is shown in Fig. 15  We display the change points obtained by the proposed algorithm in the Bali data set. On the raw data, the multiple change points are on the 33, 35, 71 and 72 th observation. We need to add 1 to the vector as the input to the cpm is the residuals of the second to the last observation of the raw data. Figure 16

Taiwan Data Set
The following R script is used to detect the abrupt change on the Taiwan data set. We input the raw data and choose automatic detection function, tso(). The result gives a significant AO (additive outlier) type occuring at time 2003/03. This result is exactly equal to the detected change point obtained by the proposed procedure. This result confirms that the proposed algorithm provides an alternative tool in change detection analysis.

Bali Data Set
Following [30], we detect outliers in Bali tourist arrival series using the arima framework in the tso package. The tso() function gives one AO type of outlier effects detected at time 2002/11 and one TC (temporary changes) at time 2003/04. Under the script, it seems that the function recognizes only the first event in Bali bombings.

Conclusions
We propose a procedure to detect abrupt changes involving fuzzy approach in change point model (CPM) framework. The proposed fuzzy procedure is used as an estimation model to capture trend in the raw data. We model the change as an AO (additive outlier) type as in time series outlier analysis. We use a parsimonous ARIMA model contaminated by a single abrupt change to simulate the sampling behavior of the autocorrelations of the Fuzzy Time Series (FTS) residuals. We also provide the detecting performance for the proposed detection procedure in the simulation. The proposed procedure gives a satisying result in detecting the SARS outbreak in the increasing trend series of tourist arrivals in Taiwan. Meanwhile, 4 change points are detected in the seasonal Bali tourism series where the first two points are closed together, as well as the last points. The locations of the group of change points are related to the two well-known terorist attacks in Bali, respectively. The change points of the two cases are also captured using an automatic detection package of tso in R. The package gives an AO type of outliers at exactly similar location to the one obtained by the proposed method for the Taiwan's series. However, the package gives only one AO type of outliers for the Bali's series.
Eliminating trend in time series analysis is a preliminary process in forecasting aimed to get the series stationary in mean. Meanwhile, eliminating trend in change detection in the framework of CPM is aimed to get the i.i.d residuals. We have shown by simulation that FTS of order one, as proposed in this article, can be used to eliminate a trend in a contaminated time series. The residuals can be used later as the input on the CPM to detect the changes. However, some notes need to mention. We define the abrupt change as the AO type of outliers. Only single break located in the middle of the simulated 760 Fuzzy Estimations for Detecting Abrupt Changes: Cases on Tourism Series series is considered in the simulation. Hence, further research on other types of outlier effects with multiple change points is a promising topic. Parameters of the ARIMA model are set such that we work on moderately small variance series. The setting are purposedly chosen such that we are able to describe the series. Also more importantly, the simulated model resembles the empirical series which shows a slow increasing trend with and without a seasonal pattern. Therefore, simulating ARIMA models with high variances may offers an opportunity to observe new findings such as implementing other types of FTS models in generating i.i.d residuals. Lastly, given that the variance of ARIMA models is an increasing function of the time observation, we define the minimum breaksize of the abrupt changes as a function of the model standard deviation at the simulated change point and at the last time point. As there is no maximum size of the break, we do not discuss it on the article. library ( data . table ) mydata <-fread ( " T ai w a n T o u r i s t E r r o r 1 . csv " ) x = mydata $ residual library ( cpm ) resultsM <-d e t e c t C h a n g e P o i n t B a t c h (x , " GLR " , alpha = 0.05) # plot s t a t i s t i k Dk , n plot ( resultsM $ Ds , type = " l " , xlab = " residuals " , ylab = expression ( D [ t ]) , bty = " l " ) abline ( h = resultsM $ threshold , lty = 2) abline ( v = resultsM $ changePoint , lty =2) cp = resultsM $ changePoint write . csv ( cp , " ch an g e p o i n t T a i w a n . csv " ) 2. R Script for Detecting Change Points on Bali Data Set library ( data . table ) mydata <-fread ( " BaliError . csv " ) x = mydata $ e library ( astsa ) # package for time series y = sarima (x ,1 ,0 ,0 ,1 ,0 ,0 ,12) summary ( y $ fit ) SARIMA _ error <-resid ( y $ fit ) library ( cpm ) res <-processStream ( SARIMA _ error , " GLR " , 500 , 20) plot ( SARIMA _ error , type = " l " ,) abline ( v = res $ detectionTimes , lty = 2) res $ detectionTimes abline ( v = res $ changePoints ) res $ changePoints