An Experimental Stochastic Model of the El Niño – Southern Oscillation System at Climatic Time Scales

An explicit bi-variate stochastic model of the El Niño – Southern Oscillation system (ENSO) is built for the first time directly from observations of the Southern Oscillation Index (SOI) and sea surface temperature (SST) in the Niño region 3.4 (1876-2011). According to the resulting stochastic difference equation, the system remembers its past for two years, has the eigenfrequency 0.24 year and the damping factors 0.5 and 0.05. Statistics of SST variations, including properties of El Niño and La Niña, are also estimated through a Monte Carlo experiment. The time-domain model is used to describe ENSO in the frequency domain, including maximum entropy estimates of spectra, coherent spectra, coherence function, and frequency response function. The spectra are peaked at 0.22 year; the coherence increases from 0.6 at the lowest frequencies to over 0.9 between 0.18 yearand 0.33 year exceeding 0.8 at all frequencies above 0.12 year. This means that SST and SOI determine each other by up to 85% everywhere but at the lowest frequencies. The gain factor is peaked at 0.24 year while the phase factor’s magnitude is close to π everywhere. Therefore, the statistical predictability of the bi-variate ENSO time series does not improve much over the scalar case predictability.


Introduction
The El Niño -Southern Oscillation (ENSO) system is regarded as "the strongest internal climate mode on interannual time scales" [1] and "as one of the best studied climate phenomena, both observational and theoretical" [2]. Yet, in the multitude of publications on ENSO we could not find an explicit experimental (based upon analysis of observations) description of ENSO as a bi-variate stochastic system in the time and frequency domains. Our paper fills this gap.
Using the actual time series of mean annual Southern Oscillation Index (SOI) and sea surface temperature (SST) in the ENSO region 3.4, we will construct a stochastic model of the bi-variate ENSO time series and study properties of sea surface temperature in the time and frequency domains using observations as well as Monte Carlo simulations of the system's behavior over long time intervals. Our goal is to study statistical properties of the ENSO system -the sea surface temperature and, to a lesser extent, the Southern Oscillation Index -as components of a bi-variate random process; studying properties of El Niño and La Niña events constitutes only a part of this work. Statistical properties of the ENSO system are examined here at climatic time scales, which begin, roughly, with year-to-year variability; this is why the unit time interval ∆t is selected to be 1 year. Also, at ∆t = 1 month the energy of both SST and SOI oscillations at frequencies exceeding 0.5 year -1 are orders of magnitude smaller than at the lower frequencies.
We believe that our detailed description of the ENSO stochastic system based on experimental data rather than on conceptual theoretical constructions can be used for validation of both theoretical models and the results of numerical simulations of climate within the IPCC program.

Data and Methods
The initial data for the study include the time series of SOI mean annual SST within the ENSO region 3.4 (averaged from 5°S to 5°N and from 120°W to 170°W) , both from 1876 through 2011. The SOI data are published by the Australian Bureau of Meteorology [3] while the SST data were extracted from the file HadCRUT3 [4].
The observed time series 1, 2, [ , ] , 1,..., , n n n x x n N ′ = = x Where N= 136 (and the strike means transposition), is regarded here as a sample of a bi-variate linearly regular random process, and our task is to evaluate its statistical properties in the time and frequency domains. Being a sample of a linearly regular process, the time series x n can be Universal Journal of Geoscience 1(1): 28-36, 2013 29 presented as the output of a linear filter with a bi-variate white noise at the input. Specifically, it is assumed here that x n is a sample of a zero mean autoregressive process of order p, or AR(p): where ( ) ( ) 11 12 ( ) ( ) 21 22 are matrix AR coefficients and 1, 2, [ , ] n n n a a ′ = A is a bi-variate innovation (white noise) sequence with the covariance matrix 11 12 21 22 .
Here R 11 and R 22 are the variances and R 12 = R 21 covariances, of the white noise components 1,n a and 2,n a .
This is a proper and efficient approach in time series analysis, especially in situations when the number of observations is limited, as in our case. Note also that the AR models are capable of presenting random processes with practically any given spectrum. Estimates of the AR order p, matrix AR coefficients j Φ , j = 1, …,p, and the white noise covariance matrix A R are required in order to describe properties of the time series x n in the time and frequency domains. For the sake of definiteness, we will be regarding the time series of SST 1,n x and SOI 2,n x as the output and input, respectively, of a linear system, which is described in the time domain with the above given stochastic difference equation (1). The matrices (2) and (3) are estimated here by using the multi-variate version of the Levinson's algorithm while the optimal order p of autoregression is selected on the basis of four criteria: Akaike's AIC, Parzen's BIC, Hennan-Quinn's CAT, and Schwarz-Rissanen's Ψ [5,6]. The spectral matrix s(f) that corresponds to Eq. (1) is The elements of the spectral matrix 11 12 (4) are the spectral densities 11 ( ) s f and 22 ( ) s f of the output 1,n x (SST 3.4) and the input 2,n x (SOI), respectively, and 12 21 ( ) ( ) s f s f = are the complex-valued cross-spectra (the bar means complex conjugation). These quantities are used to calculate other functions that describe the response of the output component 1,n x of the ENSO system to the input 2 The confidence intervals for these spectral characteristics can be calculated in accordance with the number of degrees of freedom ν for the spectral estimates [7]; for the case of autoregressive analysis of bi-variate time series, ν can be assumed equal, as a first approximation, to N/2p [8]. Spectral characteristics obtained in this manner for Gaussian time series with a properly chosen AR order present maximum entropy estimates.
The autoregressive approach to the analysis of multi-variate time series has at least two advantages over the non-parametric approach. Firstly, it results in an explicit description of the time series in the time domain with bi-variate (in our case) stochastic difference equations. Secondly, through assuming a specific parametric form for the time series, one can obtain relatively reliable estimates of statistics for short time series whose length is comparable to the time scales of interest.

Time domain
As can be seen from Figure 1, the behavior of ENSO from 1876 through 2011 does not seem to contradict the hypothesis of stationarity for the random process that generated the time series n x .If the time series is split in halves, each containing 68 values, the statistical moments such as mean values, medians, variances, correlation 30 An Experimental Stochastic Model of the El Niño -Southern Oscillation System at Climatic Time Scales functions, and spectral densities estimated for these shorter time series do not differ significantly from each other or from respective estimates for the entire time series. The estimates of skewness and excess for the halves of the time series do not differ much from each other and satisfy the Gaussianity conditions. Thus, the time series of SOI and SST can be regarded as samples of Gaussian stationary processes.
When approximating the bi-variate ENSO time series n x with autoregressive models of orders from one through five the AR order p = 2 has been selected by all four criteria, with the respective linear stochastic difference equation The numbers in parentheses in Eqs. (5) and (6) show the rms errors of respective estimates. (Only two decimal digits are kept here.) The system (5) can be re-written as where 3,n x and 4,n x are new variables. In a matrix form: and, according to Eqs. (5) and (7) 0.55 0.03 0.37 0.01 2.57 0.06 5.09 0.16 The respective system of homogeneous equations has four linearly independent solutions 1 2 3 4 , , , and n n n n λ λ λ λ , where , 1, , 4, i i λ = … is the i-th eigenvalue of the matrix Ψ (e.g., [9]). In our case, Note that according to this statistically-optimal bi-variate model of ENSO the current values of SOI and SST depend upon a linear combination of SOI and SST values for the two previous years. This means that the model (7) that was obtained directly from analysis of observations is more complicated than conceptual physical or stochastic models developed under assumptions about the oscillatory form of the basic equations that describe the ENSO system (e.g., [10][11][12]).
The presence of a feedback loop in the stochastic equations (5) that describe annual variations of ENSO can lead to mutual modulation of SST by SOI and SOI by SST. Examples are given in Figure 2.
This oscillatory behavior of the SST−SOI system obtained from our experimental stochastic model is similar to the behavior of the bi-variate system SST−thermocline depth in the equatorial Pacific as described with the conceptual model developed in Jin [10]. According to Jin's conceptual model, the ENSO system has a robust oscillatory mode in the range from 3 to 5 years; according to our observation-based model, the period of oscillations is In the recent publication [13] dedicated to the effects of random forcing upon the behavior of SST at smaller time scales, the eigenfrequency of the underlying oscillation was assumed to be 0.25 year -1 ; this assumption is quite close to our experimental result.
The statistical predictability of interannual variations of SST as a scalar process is known to be low [14]. Our results show that it does not become much better in the bi-variate case. As seen from the white noise covariance matrix A R above, the one-step relative SST prediction error d (1), which is equal to the ratio , amounts to 0.84, and, of course, the error d(τ) cannot improve as the lead time τ grows. Note, however, that the quality of prediction of SST as a scalar process (independent of SOI) is even worse: d(1) = 0.90. In other words, the transition from a scalar to a bi-variate model does not significantly improve the statistical predictability of SST within the Kolmogorov-Wiener theory. As the ENSO time series can be regarded as Gaussian, this poor statistical predictability statement holds for nonlinear approaches to statistical extrapolation of ENSO (or any other Gaussian process) as well [15]. All in all, the description of the ENSO system in the time domain with the bi-variate stochastic difference equation (5) shows that the system • is inertial (the dependence on the past extends for two years at ∆t = 1 year); • has an oscillatory mode with the characteristic time scale of about 4.2 years and the damping factors 0.05 and 0.5; • contains a feedback loop which seems to be able to excite damped periodic oscillations within the SOI/SST system though the main contribution to SST variations comes from the external stochastic forcing; • has poor statistical predictability properties.
Equations (5) and (6) can be used to simulate the bi-variate time series x n for arbitrarily long time intervals and thus to study its statistical properties including those that cannot be deduced with sufficient reliability from the observed data (such as higher statistical moments, the range, etc.). An example of simulated SST data is given in Figure 3. The analysis of a long (50,000 years) time series of SST simulated in accordance with Eqs. (5) and (6) showed that the simulated series has the same parametersautoregression order, coefficients and the noise covariance matrices -as the observed data.
We will concentrate on the analysis of the SST component of the time series. As seen from Table 1, the first four statistical moments of the simulated SST do not differ significantly from those obtained for the observed time series. The Gaussian approximation for the probability density is acceptable for both the observed and simulated data. This agreement between observations and simulations allows one to study additional statistical characteristics of SST using simulated data. The two columns of the table describe statistically equivalent (at the 95% confidence level) sets of random variables whose probability densities are either Gaussian or close to it. The results of analysis of El Niño and La Niña maximum and minimum temperatures in the actual and simulated data are shown in Table 2. It is assumed here that we have an El Niño (La Niña) event if the annual temperature exceeds the temperature averaged over the entire SST time series by at least 0.5°C (drops below the average by at least 0.5°C).

32
An Experimental Stochastic Model of the El Niño -Southern Oscillation System at Climatic Time Scales As seen from the table, the numbers of occurrences of El Niño and La Niña events and their major properties are rather symmetric with respect to the mean SST. This should have been expected because SST variations are Gaussian.
The higher statistical moments of respective probability densities of the simulated data -skewness and excesshave much larger magnitudes than for the observed data and indicate that the probability distributions are by no means Gaussian as they are bounded on one side by 0.5 (or by -0.5). Further analysis shows that maximum (minimum) temperatures during El Niño (La Niña) occurrences can be approximated rather satisfactorily with Gumbel maximum (minimum) probability distribution.
The results of analysis of intervals between successive El Niño (La Niña) occurrences are shown in Table 3. According to both the actual and simulated data, the average interval between successive El Niño's (or La Niña's) amounts to 7 years; it can be as long as 17 (16) years (observations) and almost 50 years (simulations). The probability densities of the lengths of intervals between successive El Niño's and between successive La Niña's are best approximated with a Poisson distribution with parameter λ = 7. The estimates of probability density types for extreme temperatures and intervals between events for the observation data are unreliable because of the small number of data (20 points or less).
The longest run with El Niño events in the 50,000 years simulated data set was 8 consecutive years (7 years for La Niña). However, the probability of such extreme events is negligibly small (≈10-4) while most El Niño's and La Niña's runs contain one or two years (66% − 67% and 27% − 28%, respectively); events that occur three years in a row have probabilities about 0.05. Having in mind the short length of the observed time series, these results do not seem to disagree with observations.

Frequency Domain
The maximum entropy estimates of SOI and SST spectra are shown in Figure 4. Note first that the results of spectral analysis of the simulated ENSO data (shown with crosses) that were discussed earlier practically coincide with respective spectra estimated from the observed time series. This statement is true of all other frequency domain characteristic and it proves the high quality of stochastic simulations of the ENSO system. The presence of maxima somewhere between 0.15 year -1 and 0.3 year -1 in observation-based estimates of SST and SOI spectra in this area of the Pacific is well-known (e.g., [16,17]). We can see here that there is just one maximum at about 0.22 year -1 . This and the behavior of the spectral density at lower frequencies makes the ENSO spectra rather unique in climate.
The correlation coefficient between SOI and SST is known to be high (about -0.83 [e.g., 18,19]), which is unusual for climatic scale processes. However, the correlation coefficient should not generally be used as an indicator of the degree of linear dependence between random processes. High values of linear correlation do mean that the time series are closely related (as in our case) but the dependence of the relation on frequency remains unknown. Low values of correlation do not necessarily mean the lack of a strong linear dependence. A simple example would be the correlation coefficient or even the entire cross-correlation function between a stationary random process and its first increment: it can be close to zero everywhere in spite of the strictly linear relation between the two processes. Relations between time series should be described in the frequency domain using the coherence function, coherent spectra, and the frequency response function with its gain and phase factors. An example of such calculations for the Southern Oscillation Index contribution to the global temperature time series was given in Privalsky and Jensen [20].
The coherence function for the ENSO system ( Figure 5) stays statistically significant everywhere, exceeds 0.9 within the frequency band between 0.18 year -1 and 0.33 year -1 , and never goes below 0.81 at frequencies above approximately 0.12 year -1 . This means, in particular, that the linear relation between SOI and SST is responsible for at least 65% and up to 85% of the power of their variations at f > 0.12 year -1 and in the band between 0.18 year -1 and 0.33 year -1 , respectively. The smallest but still significant coherence (0.56 and higher) occurs at the lowest frequencies. Such behavior is unusual for coherence function estimates at climatic scales. To the best of the authors' knowledge, this issue has not been studied yet but it looks like the "normal" behavior of the coherence function between climatic processes would be a more or less monotonic decrease with frequency (see examples in Figure 5).The ENSO system seems to be an exception.
The high coherence between SOI and SST explains why the coherent spectrum of SST (the spectrum of the part of SST strictly coherent to SOI) is so close to the full spectrum, especially at frequencies above 0.15 year -1 ( Figure 6). Note also that the largest share of the SST spectrum which is not related at all to SOI belongs to the lowest frequencies.

34
An Experimental Stochastic Model of the El Niño -Southern Oscillation System at Climatic Time Scales  As seen from the figure, the gain factor G 21 (f) that describes how SST is transformed into SOI is quite similar to the SOI-SST gain factor G 12 (f) which means that the components of the bi-variate ENSO process excite each other in the same manner.
The phase factor ϕ 12 (f) (Figure 8) is close to -π practically everywhere, which means that the oscillations of SOI and SST occur mostly in opposite directions at all frequencies: SST increases when SOI decreases and vice versa. In the frequency band from 0.2 year -1 to 0.4 year -1 , the phase factor may differ statistically significantly from -π but this does not help with forecasting SOI or SST because the phase difference in that band, when transformed into time units, amounts to less than 1 year, the unit time step. This is another feature that confirms the low statistical predictability within the system. Obviously, the phase factor SOI-SST will be a mirror image of the factor SST-SOI with respect to π. The above empirical results seem to agree with the recent theoretical stochastic model of ENSO suggested by Kleeman [12] and based upon the assumption that the ENSO system behaves as a bi-variate Ornstein-Uhlenbeck process. Firstly, the shape of the spectra in Figure 4 is similar to the theoretical spectrum suggested in that work for the dissipation parameter values between 0.2 and 0.8. Secondly, Kleeman's model explains the very high coherence observed between the components of the ENSO bi-variate time series (see Figure 5). On the other hand, the Ornstein-Uhlenbeck process is a Markov process, which corresponds to a bi-variate autoregressive process of order p = 1. In our discrete-time bi-variate case the AR order p = 2 which means that the observation-based model (5) is not Markovian. Its Ornstein-Uhlenbeck (Markovian) analog would be a fourvariate AR process of order p = 1 [see Eq. (7)].

Conclusions
In this work, we applied methods of autoregressive analysis of multi-variate time series and maximum entropy spectral estimation to observations of the annual Southern Oscillation Index and the annual sea surface temperature in the Niño region 3.4 for 1876-2011.
Briefly, the major results obtained here through analysis of observation data can be formulated as follows. · It was found that the observation data are best approximated with a second-order bi-variate stochastic difference equation; its coefficients and the white noise covariance matrix were estimated along with respective rms errors; the equation describes a damped oscillatory system with the eigenfrequency ~0. 24 year -1 and the damping factors 0.05 and 0.5. ·Statistics of SST as a component of the above bi-variate model, including properties of El Niño and La Niña phenomena were estimated through the Monte Carlo approach. ·The time-domain model was used to give a comprehensive description of the ENSO bi-variate process in the frequency domain; the description includes not just the SOI and SST spectra (their behavior had been generally known before) but also the coherence function, coherent spectra, and the frequency response function.
The time-domain model describes the "deterministic" relations between the two components of the ENSO system expressed in the form of a linear dependence between them, plus a substantial contribution from the external stochastic forcing. The second order of the equation means that statistically significant links between SOI and SST exist at time lags of up to two years. The connections do not play a dominant role in the ENSO system predictability properties which depend mostly upon the unpredictable stochastic forcing.
The existence of the eigenfrequency at 0.24 year -1 in the ENSO system can be regarded as an experimental confirmation of the theoretical model proposed by Jin in [10,11]. The linear feedback between the ENSO components along with the input white noise can excite the damped periodic oscillations in the system. As in the Jin's model, oscillations can be generated by the deterministic part of the linear system and by the stochastic forcing. According to our results, the latter play a more important role.
Variations of SOI and SST were simulated over a long time interval (50,000 years) as components of a bi-variate random process and the simulated data were used to determine statistical properties of ENSO variability including those that cannot be estimated reliably on the basis of the observed time series. It was shown that ·extreme temperatures during El Niño (La Niña) events can be as high as 28°C (as low as 23°C), ·the average interval between successive El Niño's or La Niña's amounts to 7 years, while the maximum interval between successive El Niño (La Niña) events can be as long as almost 50 years; ·though individual El Niño's or La Niña's can last up to 8 or 7 years, 95% of events last 1 or 2 years (67% and28% cases, respectively).
The behavior of the ENSO time series in the frequency domain reveals a number of properties uncommon for atmospheric, oceanic, or land processes at climatic time scales: ·the spectra of both SOI and SST do not decrease with growing frequency; ·the spectra have a single broad maximum centered at f ≈ 0.22 year -1 ; this maximum reflects the existence of the damped periodic oscillation obtained from the solution of the system of stochastic linear difference equations that describe the observed bi-variate ENSO time series; ·the linear interdependence between the components of the ENSO time series is shown to be the strongest at intermediate and high frequencies; · the coherence between SOI and SST is very high, especially in the band from 0.18 year -1 to 0.33 year -1 where it reaches 0.93; this means that the interaction between SOI and SST is responsible for up to 85% or the variance of respective processes in that frequency band; the coherence does not go below 0.8 within the entire frequency band from 0.12 year -1 to 0.50 year -1 ; ·the part of the spectrum of SST variations strictly coherent to variations of SOI is close to the full spectrum of SST, especially at frequencies where the coherence is particularly high; a similar statement holds for the SOI spectrum; ·the respective gain factors show that the mutual transfer of energy between SOI and SST is stronger at frequencies above approximately 0.15 year -1 rather than at lower frequencies; ·the phase factor magnitude is close to π, which means that the presence of the damped periodic oscillation in ENSO cannot substantially improve its statistical predictability. All these conclusions hold for Niño regions SST12, SST3, and SST4, with some variations in the numerical values of the eigenfrequency, maximum coherence, and some other statistical characteristics.
Years ago, it had been shown both theoretically [21] and experimentally [22][23][24] that the spectra of climatic processes generally decrease as frequency increases and do not contain significant peaks. The ENSO system behaves in a different manner. The theoretical dynamic and stochastic models developed, respectively, by Jin [10,11] and Kleeman [12], lead to the ENSO system's spectra of the bell-shape type. The coherence function estimate obtained here behaves in the manner predicted by Kleeman [12]. Our experimental results provide some verification for those theoretical models as well as a detailed description of other statistical properties of the ENSO system at climatic time scales in both time and frequency domains.