Chaos Theory Modelling for Temperature Time Series at Malaysian High Population Area during Dry Season

The aim of this study is to model the temperature time series of a Malaysian high population area during the dry season through chaos theory. The selected high population area is Shah Alam, located in the state of Selangor, Malaysia. Chaos theory modelling is divided into two parts, namely analysis and prediction. Analysis by the phase space plot showed that the observed temperature time series is chaotic in nature. Hence, the time series is predicted via the chaotic model. Results from the chaotic model showed that the temperature time series is well predicted, with a Pearson correlation coefficient close to 1. The result is compared with the traditional autoregressive linear model. Based on the computed values of average absolute error, root mean squared error and Pearson correlation coefficient, the chaotic model is found to be better at predicting the temperature time series of the Shah Alam area during the dry season. This indicates that chaos theory is applicable to temperature time series in Malaysian high population areas. This finding is expected to assist stakeholders such as the Malaysian Meteorological Department and the Department of Environment Malaysia in managing temperature and climate change problems.


Introduction
Developments in information technology and computing ability over the last few decades have made in-depth exploration of complex systems in the natural sciences possible. The main issue is to understand the dynamics of complex systems with a disordered structure. This structure needs to be explored before any further action, such as prediction of the complex system, can be taken. One of the theories that can be applied is chaos theory.
In modern meteorological and hydrological studies, some complex systems such as river flow, rainfall, air pollution, sea level and temperature need to be modelled in order to understand the structure as well as the nature of the data. In this study, chaos theory is applied in order to model the chosen complex data and study the nature of the data.
High temperature has adverse health effects on the human population and can lead to mortality (1). Rapid urbanization in any settlement can increase local community activities, which leads to heat production (2). Therefore, the study of temperature systems is crucial.
In order to identify the nature of temperature systems, chaos theory is applied. The identification of chaos in atmospheric systems started with an accidental discovery by Edward Lorenz (3), a meteorologist, when he solved a weather system numerically. The solutions of the weather system were found to be unstable, and almost all of them were nonperiodic. Later on, Lorenz discovered that weather systems exhibit a chaotic nature, which allows only short-term prediction.
Chaos theory has been applied to temperature time series worldwide, for example in India (4,5), Nigeria (6), Iran (7), Denmark (8), Canada, the United States, Brazil, the United Kingdom, South Africa, China, Japan and Australia (9,10). In Malaysia, chaos theory has been successfully applied to time series of ozone (11)(12)(13), river flow (14)(15)(16) and particulate matter (PM10) (17). Thus, chaos theory is expected to be useful in analysing and predicting temperature time series as well.
The application of this approach is still new in Malaysia. Since rapid urbanization can increase local community activities, which leads to heat production, this study serves as a pilot study applying chaos theory to the temperature time series of a Malaysian high population area. In this study, the power of the chaotic model in analysing and predicting the temperature time series is examined.
This study aims to: (i) identify the nature of the temperature time series at a Malaysian high population area through chaos theory, (ii) determine the minimum number of factors that affect the temperature time series at a Malaysian high population area through chaos theory, (iii) predict the temperature time series at a Malaysian high population area through the chaotic model and a traditional method and (iv) determine the best method for predicting the temperature time series. (18). The dry season in most states in Malaysia occurs between June and August each year, during the Southwest Monsoon. Malaysia also experiences periods of haze. Haze consists of particles, dust and smoke that accumulate in the lower levels of the atmosphere. Among the causes of haze are regional factors, namely suspended particles from open burning activities in Indonesia that are carried by wind to Peninsular Malaysia during the Southwest Monsoon (19)(20)(21).

Temperature Time Series
The dry season during the Southwest Monsoon is synonymous with hot conditions and high temperatures.
Heat exhaustion and heat stroke take thousands of lives every summer. Therefore, the examined time series was taken from the Southwest Monsoon period. The study began in 2017. Thus, the time series chosen was recorded during the Southwest Monsoon of 2017. In this work, the Southwest Monsoon is referred to as the dry season.
A sudden increase in temperature is harmful to health, and severe cases of high temperature can lead to death. Hence, a prediction model for temperature needs to be developed. Among the sources of increasing temperature are vehicle smoke and human activities. This often happens in areas with a high population.
According to the Department of Statistics Malaysia, the four states in Malaysia with high population are Selangor, Johor, Sabah and Perak. Therefore, the study should be conducted in these states. In this work, chaos theory is applied to the temperature time series of the Shah Alam area, in Selangor, during the dry season.
The total area of Shah Alam is about 290.3 km². Shah Alam is one of the busiest towns, with industrial and education areas spread across it. From 2017 demographic data, it was reported that the total population of Shah Alam was 189,000. Hence, the prediction of temperature at Shah Alam is crucial to maintaining public health.
A time series is a quantitative record of a complex system. The time series, denoted by $x_t$, is recorded at time $t$. In this work, the chosen time series is temperature. The temperature series used are secondary data obtained from the Malaysian Meteorological Department and the Department of Environment Malaysia. The data were recorded hourly in °C for three months, from 1st June until 31st August 2017. Thus, the observed temperature time series at the Shah Alam area during the dry season consists of 2208 hourly values. Mathematically, the temperature time series is written in the form of:

$$X = \{x_1, x_2, x_3, \ldots, x_N\} \qquad (1)$$

where $x_t$ is the time series at the $t$-th hour and $N$ is the total number of hours of observation. In this study, $N = 2208$ hours. Only one value is missing from the total of 2208 data. The missing value was replaced by the value recorded at the same hour on the previous day.

A time series plot is a visual method which involves plotting the state variable of the system and observing the trend. Figure 1 shows the hourly temperature time series at the Shah Alam area during the dry season.

The Malaysian climate is categorized as tropical rainforest: humid, with abundant rainfall and high temperatures throughout the year. Throughout the year, the Malaysian average temperature ranges from 28°C to 32°C. However, the temperature observed at Shah Alam was slightly different, ranging from 21°C to 36°C. The high values in the temperature time series might be associated with the dry season that occurred during the observed period, from June to August. Therefore, it is important to analyse and predict the Shah Alam temperature time series.

Table 1 presents a statistical description of the temperature time series observed at Shah Alam during the dry season of 2017. Referring to Table 1, the range between the minimum and maximum values was quite wide. Therefore, this data set is suitable for testing the power of chaos theory in analysing a time series that spans both low and high temperatures.
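The replacement of the missing reading described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the study: the function name `fill_from_previous_day` is hypothetical, and the series is a synthetic stand-in with the same length (92 days of 24 hours), not the Shah Alam data.

```python
import numpy as np

def fill_from_previous_day(series, period=24):
    """Replace each NaN with the value recorded `period` steps earlier.

    `period=24` corresponds to "same hour, previous day" for hourly data.
    """
    filled = np.asarray(series, dtype=float).copy()
    # flatnonzero returns indices in ascending order, so an earlier gap is
    # already filled by the time a later gap needs it.
    for i in np.flatnonzero(np.isnan(filled)):
        if i < period or np.isnan(filled[i - period]):
            raise ValueError(f"no previous-day value available for index {i}")
        filled[i] = filled[i - period]
    return filled

# Synthetic stand-in for the hourly readings (NOT the observed data).
hours = np.arange(92 * 24)                      # 1 June to 31 August: 2208 hours
temps = 28.0 + 5.0 * np.sin(2 * np.pi * (hours - 9) / 24)
temps[1000] = np.nan                            # simulate the single missing record
clean = fill_from_previous_day(temps)
```

With one gap in 2208 values, the choice of imputation rule is unlikely to matter much; the previous-day rule simply keeps the daily cycle intact.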

Theoretical Framework
Systems such as river flow, rainfall, air pollution, sea level and temperature are usually reported as complex, nonperiodic and nonlinear. This complexity makes identifying the nature of the time series a complicated and hard task. Chaos theory helps to reveal the hidden information and to determine the nature of the observed time series.
The tools of nonlinear analysis employed in chaos theory include qualitative tools based on observation of the state variables, namely the phase space plot and the reconstruction of phase space. In this paper, the phase space plot is used to identify the nature of the observed temperature time series. Furthermore, a chaotic model is developed using the concept of phase space reconstruction.

Nature of Observed Temperature Time Series Identification
In this study, the phase space plot is applied to identify the nature of the observed temperature time series at a Malaysian high population area during the dry season. The time series $X$ of Equation (1) is used to graph the phase space plot of $\{x_t, x_{t+\tau}\}$, where $\tau$ is the delay time. As noted for Equation (1), the time series is collected hourly. Therefore, in order to preserve the originality of the time series, $\tau = 1$ is chosen. With $\tau$ defined, the phase space plot is graphed.
If a well-defined attractor is observed on the phase space plot, then the nature of the time series is chaotic (22).
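As a rough sketch of how the points of such a phase space plot can be formed, the snippet below pairs each value with the value one delay step later. The helper name `delay_pairs` and the five sample values are illustrative assumptions, not taken from the study.

```python
import numpy as np

def delay_pairs(series, tau=1):
    """Return an array whose rows are the points (x_t, x_{t+tau})."""
    x = np.asarray(series, dtype=float)
    if tau < 1 or tau >= len(x):
        raise ValueError("tau must satisfy 1 <= tau < len(series)")
    return np.column_stack((x[:-tau], x[tau:]))

# Five made-up hourly readings in degrees Celsius.
x = np.array([24.0, 25.0, 27.0, 30.0, 29.0])
pairs = delay_pairs(x, tau=1)

# Plotting pairs[:, 0] against pairs[:, 1] as a scatter plot gives the phase
# space plot; points gathering in a bounded, structured region are read as
# an attractor, whereas a random series tends to fill the plane.
```

A series of length N yields N - tau points, so the hourly series with tau = 1 produces one point fewer than the number of observations.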

Development of Chaotic Model
In order to develop the chaotic model, the time series in Equation (1) is separated into two parts, namely $X_{\text{train}}$ and $X_{\text{test}}$. $X_{\text{train}}$ is the set of training data used to develop the chaotic model, while $X_{\text{test}}$ is the set of testing data used to examine the performance of the prediction model.

In this work, the temperature time series at a Malaysian high population area, namely Shah Alam, was selected. The hourly time series was observed during the dry season from 1st June to 31st August and recorded as $X$ (Equation (1)). The first 2040 hourly values were selected as $X_{\text{train}}$ and the last week of August (168 hours) was chosen as $X_{\text{test}}$. The overall number of time series values used is presented in Table 2.

The first step in developing the chaotic model is the reconstruction of phase space. The scalar time series from Equation (1) is reconstructed into:

$$Y_t^m = \{x_t, x_{t+\tau}, x_{t+2\tau}, \ldots, x_{t+(m-1)\tau}\} \qquad (2)$$

with delay time $\tau$ and embedding dimension $m$. Equation (2) is also known as the $m$-dimensional phase space. The values of both parameters must be identified. As noted for Equation (1), the time series is collected hourly. Therefore, in order to preserve the originality of the data, $\tau = 1$ is chosen. Since the phase space plot is graphed in the two-dimensional plane $\{x_t, x_{t+1}\}$, $m = 2$ is used.

To predict $Y_{j+1}^m$, the $k$ nearest neighbours of $Y_j^m$ are selected based on the minimum values of $\|Y_j^m - Y_{j'}^m\|$, where $j' < j$ and $\|\cdot\|$ is the Euclidean distance. According to Jayawardena (23), the basic chaotic model predicts $Y_{j+1}^m$ using the local average approximation method, which takes the average of the $Y_{j'+1}^m$ values. Therefore, the equation is:

$$Y_{j+1}^m = \frac{1}{k}\sum_{q=1}^{k} Y_{j'_q+1}^m$$

The performance of the chaotic model and of the traditional approach in predicting the temperature time series at Shah Alam is measured through the average absolute error (AAE), root mean squared error (RMSE) and Pearson's correlation coefficient ($r$). The AAE and RMSE values portray the difference between the observed and predicted time series. The lower the values of AAE and RMSE, the better the performance of the model. The $r$ value, on the other hand, reflects the linear relationship between the observed and predicted time series. The closer $r$ is to +1, the better the agreement between the observed and predicted time series; a value close to -1 would indicate an equally strong but inverse relationship.
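The reconstruction and the local average approximation described above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the code used in the study: the function names `embed` and `predict_next` are hypothetical, and the number of neighbours `k=3` is an arbitrary placeholder because the value used by the authors is not given here.

```python
import numpy as np

def embed(series, m=2, tau=1):
    """Rows are the delay vectors (x_t, x_{t+tau}, ..., x_{t+(m-1)tau})."""
    x = np.asarray(series, dtype=float)
    n_vectors = len(x) - (m - 1) * tau
    if n_vectors < 2:
        raise ValueError("series too short for this m and tau")
    return np.column_stack([x[i * tau: i * tau + n_vectors] for i in range(m)])

def predict_next(history, m=2, tau=1, k=3):
    """One-step-ahead prediction by local average approximation.

    k is an assumed neighbour count, chosen only for illustration.
    """
    x = np.asarray(history, dtype=float)
    vectors = embed(x, m, tau)
    current = vectors[-1]            # most recent state
    candidates = vectors[:-1]        # earlier states whose successor is known
    distances = np.linalg.norm(candidates - current, axis=1)
    nearest = np.argsort(distances)[:k]
    # The successor of vector j is vector j + 1; its last component is
    # x[j + 1 + (m - 1) * tau], which is the value being predicted.
    successors = x[nearest + 1 + (m - 1) * tau]
    return float(successors.mean())

# Toy periodic series: after the pattern (1, 2) the next value is always 3.
toy = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2]
forecast = predict_next(toy, m=2, tau=1, k=2)
```

For a one-hour-ahead test over 168 hours, one plausible arrangement is to call `predict_next` once per test hour, each time passing all values observed up to that hour; whether neighbours were drawn from the training set only is not stated in the text.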

Chaotic Analysis
The first aim of this study is to identify the nature of the temperature time series at a Malaysian high population area through chaos theory. In order to achieve this aim, the phase space plot is graphed.
From the methodology section, it has been determined that $\tau = 1$. Hence, the phase space of $\{x_t, x_{t+1}\}$ is plotted.
The phase space plot indicates the chaotic nature of the hourly temperature. Figure 2 corresponds to the reconstruction in two dimensions with delay time $\tau = 1$ for the temperature time series at Shah Alam. The projection of the attractor on the plane $\{x_t, x_{t+1}\}$ is observed. For the hourly temperature time series at the Malaysian high population area, the projection yields a clear attractor in a well-defined region. Phase space represents a dynamical system, where each point in phase space corresponds to a particular state of the system at a particular time. Phase space representation is a versatile tool that can be applied in time series analysis. For a random time series, the points are scattered and fill the phase space. In contrast, Figure 2 clearly shows that most of the points converge to a well-defined region, known as an attractor (22). The existence of an attractor suggests that the nature of the studied temperature time series is chaotic. Since the phase space plot indicates that the observed temperature time series is chaotic, a prediction model based on chaos theory, namely the chaotic model, can be developed.

Chaotic Model
The phase space plot indicates that the observed temperature time series exhibits a chaotic nature. According to the literature, such as Lorenz (3), Abarbanel (24) and Sprott (25), a characteristic of such time series is sensitive dependence on initial conditions. Hence, only short-term prediction is possible. Therefore, in this paper, one-hour-ahead prediction is performed. The prediction duration is 168 hours. Based on Figure 2, the chaotic nature is detected in a two-dimensional graph; therefore, $m = 2$ is chosen.
The prediction results obtained through the local average approximation method are shown in Figure 3. The predicted and observed data are close to each other. Furthermore, it can be seen that the increases and decreases in the temperature time series at Shah Alam are well predicted.
A scatter diagram can also be used to represent the relationship between the observed and predicted time series. The nearer the points are to the middle line, the closer the relationship. From Figure 4, it can be concluded that the values of the observed and predicted time series are close. Therefore, the local average approximation method successfully predicts the temperature time series at the Malaysian high population area of Shah Alam.

Comparison between Chaotic and Traditional Model
The performance of the models is tested through the computation of AAE, RMSE and $r$. The AAE and RMSE values give an idea of the difference between the observed and predicted time series. The lower the value, the better the performance of the model. Table 3 shows a comparison of the performance indexes of the traditional model (autoregressive linear) and the chaotic model (local average approximation method) for the observed temperature time series at the Malaysian high population area of Shah Alam. From the calculated performance indexes, it is found that the chaotic model is better at predicting the temperature time series. Most existing studies are more concerned with the calculation of Pearson's $r$. Thus, from Table 3 it can be seen that the best prediction model for the Shah Alam temperature time series during the dry season is the local average approximation method with parameters $\tau = 1$ and $m = 2$. Furthermore, from Table 4, it can be seen that the local average approximation method gives the best prediction for the Shah Alam temperature time series, as it increases the correlation coefficient by 1.697% and decreases the two error measures by 17.546% and 11.262%. Therefore, the results demonstrate that the chaotic model performs best in predicting the temperature time series at the Malaysian high population area of Shah Alam during the dry season.
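The three performance indexes and the percentage comparison between two models can be computed as in the sketch below. The formulas are the standard textbook definitions, which are assumed here to match the ones used in the study; the function names and the four sample values are illustrative only and are not results from Table 3 or Table 4.

```python
import numpy as np

def average_absolute_error(observed, predicted):
    """Mean of |observed - predicted|."""
    o, p = np.asarray(observed, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(o - p)))

def root_mean_squared_error(observed, predicted):
    """Square root of the mean squared difference."""
    o, p = np.asarray(observed, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((o - p) ** 2)))

def pearson_r(observed, predicted):
    """Pearson correlation coefficient between the two series."""
    return float(np.corrcoef(observed, predicted)[0, 1])

def percent_change(baseline, new):
    """Relative change of `new` with respect to `baseline`, in percent."""
    return 100.0 * (new - baseline) / abs(baseline)

# Made-up observed and predicted temperatures (degrees Celsius).
observed = [30.0, 31.0, 33.0, 32.0]
predicted = [30.0, 32.0, 32.0, 32.0]

aae = average_absolute_error(observed, predicted)
rmse = root_mean_squared_error(observed, predicted)
r = pearson_r(observed, predicted)

# Example: an error measure dropping from 2.0 (baseline model) to 1.5.
change = percent_change(2.0, 1.5)
```

A negative percentage change is an improvement for the two error measures, while a positive change is an improvement for the correlation coefficient.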

Factors Influencing Temperature Time Series
Research conducted by Adiwijaya et al. (26) found that temperature time series are influenced by humidity, air pressure, rainfall, length of solar radiation and wind speed. Furthermore, according to Samuel et al. (27), the seasonal variation of dry and wet seasons also has an impact on temperature. Research by Hashim (1) found that urbanization, industrialization and housing development contribute to increasing temperature. In addition, a study by Dutta et al. (28) revealed that the location of observation is also one of the factors that can influence temperature.
As $m = 2$, this suggests that at least two factors affect the observed temperature time series at the Malaysian high population area of Shah Alam during the dry season. The factors listed above indicate that more than two factors influence the temperature time series. The two findings are consistent with each other. Hence, $m = 2$ is reliable as a minimum.

Conclusion According to Research Objectives
i. Identify the nature of the temperature time series at a Malaysian high population area through chaos theory.
Results from the phase space plot showed that the nature of the temperature time series at the Malaysian high population area of Shah Alam during the dry season is chaotic.
ii. Determine the minimum number of factors that affect the temperature time series at a Malaysian high population area through chaos theory.
The phase space is plotted in a two-dimensional graph. Since $m = 2$, this suggests that at least two factors influence the observed temperature time series at the Malaysian high population area of Shah Alam during the dry season. From the literature, temperature time series are influenced by human activities such as urbanization, industrialization, and house and building development, as well as by climate factors such as humidity, air pressure, rainfall, length of solar radiation and wind speed.
iv. Determine the best method for predicting the temperature time series.
Compared to the traditional method, the chaotic model with the local average approximation method is better at predicting the temperature time series at the Malaysian high population area. From the calculation of $r$, the best prediction model is the chaotic model with parameters $\tau = 1$ and $m = 2$.

Future Research
In this study, the nature of the temperature time series was investigated using the phase space plot. In future work, other techniques such as the Cao method, the Lyapunov exponent and the correlation dimension are suggested.
The outcomes of the prediction show that chaos theory is applicable to temperature time series. Thus, it is suggested that chaos theory be applied to other environmental variables such as wind speed and relative humidity, as well as air pollutants such as carbon monoxide and sulphur dioxide.
As urban areas experience uncertain temperatures due to heavy pollution, the research area can be extended to other Malaysian urban areas such as the Klang Valley, Johor Bahru in Johor and George Town in Penang.

Research Contributions
i. Identification of the nature of the time series can narrow the scope for the development of the prediction model. For example, if the temperature time series is found to be chaotic in nature, then a prediction model based on chaos theory can be developed.
ii. The calculation of the number of variables affecting the temperature series gives stakeholders an idea of how to control the causes of temperature uncertainty.
iii. The elaboration of the factors affecting the temperature series can raise public awareness, so that precautions can be taken to control temperature.
iv. Identification of the best method for predicting the temperature time series shows that there is an alternative method, namely chaos theory, that can be used to build a temperature time series prediction model.
v. A prediction model for the temperature time series is essential for obtaining an early warning of increases or decreases in temperature and for helping stakeholders to prepare when faced with an uncertain climate.