Probability Distribution Fitting of Rainfall Patterns in Philippine Regions for Effective Risk Management

This study aims to determine the best-fit frequency distribution of rainfall patterns for event forecasting in order to address potential disasters. Monthly rainfall data obtained from PAGASA were analyzed using Chi-Square and Kolmogorov-Smirnov (K-S) goodness-of-fit tests. Rainfall data covering the past 26 to 30 years were fitted to more than 60 probability distribution functions (PDFs) to determine the distribution pattern. The PDFs ranked best by the Kolmogorov-Smirnov and Chi-Square tests were used in statistical inferences. Findings showed that each site has specific theoretical probability distribution functions from which rainfall events can be inferred. The probabilities of varying levels of rainfall events were read from the cumulative distribution function (CDF). Consequently, the statistical inferences found in this study are important for designing optimum flood control facilities. They also contribute to the effective management of the irrigation systems of the National Irrigation Administration for a more efficient water supply to the agricultural sector.


Introduction
Rainfall is the most important environmental factor contributing to the agricultural activities of the Filipino people across the regions of the country. Irrigation supply depends heavily on the availability of water, and the volume distributed is based on the available water resources such as springs, rivers, and rainfall. Water supply through irrigation is an important strategy in alleviating the current food crisis. However, rain-fed agriculture is still the dominant practice in most upland areas. Soil moisture management in some areas of the country is faced with limited and unreliable rainfall and high variability in rainfall pattern.
Collecting and measuring hydrological data has proved difficult for hydrologists because the data are limited and the series contain gaps. It is vital to study the variability of rainfall patterns to address the climate changes that have resulted in floods and droughts in several regions. Dry and wet seasons can be felt simultaneously in the country. When parts of Luzon are flooded, most areas in Mindanao experience the dry season, and vice versa, as claimed by South Travels [1]. The Philippines has been devastated by these calamities for quite some time now. News of flash flood events has come from various parts of the country, the latest of which occurred in Northern Mindanao, while other areas (Luzon, for instance) are experiencing limited rainfall. Similar variations can also be noted in other parts of the globe, as reported by the JPL TRMM Team [2] and Melville [3].
Such variation can also be noted within adjacent regions or provinces where dry season is widespread in most regions, but some geographical areas are severely hit by excessive monsoon rain leading to flash floods. One example of the incident earlier cited was the December 2010 storm that swept the cities of Iligan and Cagayan de Oro into a severe state of calamity. In contrast, most areas in other regions such as Region XII did not experience heavy rainfall during the time that the storm smashed the two cities.
Other related studies which conducted rainfall observations in their areas of jurisdiction showed similar disparity in the occurrences of rainfall distribution. Precipitation anomalies were also described in the articles of Hillis [4], the Bureau of Meteorology [5] and Kumar [6].
However, most of those studies used tabular and graphical presentations to describe rainfall intensities and distribution frequencies. Such information does not tell us the extent of erratic rainfall at a time when extreme seasonal anomalies are apparent not only in the Philippines but across the globe. A review of the published papers of Tilahun [7] and Persson [8] revealed that a specific probability distribution can best describe rainfall patterns for prediction purposes. It is in this light that the researcher conducted a study of rainfall patterns in selected regions of Mindanao, with the anticipation that the rainfall variability of the selected study sites will fit a specific theoretical distribution function that can be used to predict the probability of events. The baseline is supported with maps, tabular presentations, graphs and illustrations. This study includes the analysis of the frequency distribution of rainfall events, extreme rainfall and return periods to observe the impact of climatic anomalies or extreme weather cases such as drought and flood risks. The information gathered in this study provides a significant contribution to agricultural sectors, water resource projects, hydropower plants, rural development planning and disaster management in terms of flood estimation, forecasting of extreme events and their return periods, water management and civil works.
Environment and Ecology Research 6(3): 178-186, 2018

Objectives
This study aims to determine the distribution of rainfall patterns of Cotabato City, General Santos City, Lumbia Misamis Oriental and Kabacan North Cotabato. Specifically, it aims to investigate the shape of frequency distribution curves of rainfall; verify the theoretical distribution function best fit to describe the monthly and annual rainfall patterns; determine the exceedance probability and the return period of occurrence; and calculate probability of events of the four sites.

Research Design
The study used the available rainfall data of Cotabato City (1986-2011), General Santos City (1982-2011), Kabacan North Cotabato (1985-2011) and Lumbia Misamis Oriental in Mindanao. The statistical analysis employed distribution fitting and probability estimates. To test the samples for normality, a simulation model with proven reliability was used for the distribution fitting, utilizing the Chi-Square and Kolmogorov-Smirnov goodness-of-fit tests.

Collection of Data and its Reliability
Rainfall data of the study sites from 1982 to 2011 were collected from the Climatology Division of the PAGASA Main Headquarters and from the USM Agromet Station. Because sources were limited, an independent check of data reliability could not be pursued; the data taken from the official documents of PAGASA and the local Agromet Stations were therefore considered reliable.

Site Characteristics
The range of rainfall data, the elevation and the hydroclimatic type of the four study sites were identified and supported with maps and tables. The annual rainfall data of the sites were also presented for easy categorization. Based on the Modified Coronas Classification devised by Fr. J. Coronas in 1920, General Santos City and Kabacan North Cotabato have the same climatic pattern, Type 4, with rainfall intensities that are evenly distributed throughout the year. The former is considered one of the driest places in the country. Kabacan is characterized by a dry season of one to three months with less than 76 millimeters of rainfall per month. The wettest month has more than three times the rainfall of the driest month. This type of climate is conducive to intensive rice cultivation and the planting of bananas and other fruit trees, according to Wikipedia [9]. On the other hand, Cotabato City, with 70% of its total land area below sea level, belongs to the Type 3 category, in which the seasons are not very pronounced but are relatively dry from November to April and wet during the rest of the year, while Lumbia Misamis Oriental is Type 2 (Figure 1). Cotabato City has heavier rainfall from May to November, while General Santos City has no pronounced wet season and its rainfall is evenly distributed throughout the year. Meanwhile, Lumbia Misamis Oriental has its wet months from June to October, while Kabacan has its pronounced rainy season from May to November; the rest of the months are dry. The geographical locations of the four study sites are shown in Figure 2. Among the four stations, Lumbia is located the highest, at 182 meters above sea level, while General Santos City is situated the lowest, at 15 meters above sea level, and has the lowest average rainfall, recorded at only 949.70 mm. Cotabato City has the highest average precipitation, 2,464.3 mm, for the period covered in this study (Table 1).

Distribution Fitting
The data were analyzed using a simulation model to determine the specific theoretical probability distribution that best describes the annual and monthly rainfall. Note that the monthly rainfall for each site was treated separately. For example, the rainfall data for the month of January from 1986 to 2011 for station 1 were processed in the distribution fitting software. As a result, the Chi-Square and K-S goodness-of-fit (GoF) tests each generated a best-fit frequency curve for that month. Either of these curves is a PDF that can best describe the rainfall pattern of the station for that month (January). The same procedure was applied to all stations from January to December, and to the total annual and average annual rainfall data for the period covered at each site.
This software was chosen by the author because it supports sixty (60) probability distribution functions and includes the Chi-Square and K-S goodness-of-fit tests. These tests measure the "distance" between the data and the distribution being tested. The fit is considered good if the distance (the test statistic) is less than the critical value, which depends on the sample size and the significance level chosen. Once the distributions are fitted, the software displays reports that include the test statistics and the critical values calculated for various significance levels. The report also provides a recommendation on whether to reject or accept the fitted distribution at various significance levels (0.2, 0.1, 0.05, 0.02, 0.01). This study uses the 0.05 significance level in selecting the best-fit curves. The probability distribution function (PDF) for each study site (monthly, total annual and average annual rainfall) was then presented in graphs and in tabular form.
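The fit-and-rank procedure described above can be sketched in Python with SciPy. This is only an illustration under stated assumptions, not the software used in the study: the rainfall sample is synthetic, the four candidate distributions are a hypothetical subset of the roughly 60 the study's software supports, and the decision at the 0.05 level is made on the K-S p-value, which is equivalent to comparing the statistic with a critical value. Because the parameters are estimated from the same sample, the p-values tend to be optimistic.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for 26 years of January rainfall (mm); NOT PAGASA data.
rng = np.random.default_rng(42)
january_mm = rng.gamma(shape=2.0, scale=40.0, size=26)

# Hypothetical candidate set: (distribution, keyword arguments for fitting).
# Location is fixed at zero for the distributions defined on positive values.
candidates = {
    "gamma": (stats.gamma, {"floc": 0}),
    "weibull_min": (stats.weibull_min, {"floc": 0}),
    "lognorm": (stats.lognorm, {"floc": 0}),
    "norm": (stats.norm, {}),
}

ALPHA = 0.05  # significance level used in the study


def rank_by_ks(sample, dists):
    """Fit each candidate by maximum likelihood and rank by K-S statistic."""
    results = []
    for name, (dist, fit_kwargs) in dists.items():
        params = dist.fit(sample, **fit_kwargs)
        ks = stats.kstest(sample, dist.cdf, args=params)
        results.append((name, ks.statistic, ks.pvalue))
    # Smaller K-S distance means a closer fit, so sort ascending.
    return sorted(results, key=lambda row: row[1])


ranking = rank_by_ks(january_mm, candidates)
for rank, (name, d_stat, p_value) in enumerate(ranking, start=1):
    decision = "not rejected" if p_value > ALPHA else "rejected"
    print(f"{rank}. {name:12s} D={d_stat:.3f} p={p_value:.3f} ({decision})")
```

Running the same function once per month and per station would give a table of first-ranked distributions similar in form to the ones reported in this study.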

Tests Reliability
In the study of Tilahun [7], two statistical tools, namely the Chi-Square and Kolmogorov-Smirnov goodness-of-fit tests, were used to identify which theoretical probability distribution function best fits the rainfall data. The same technique was used in this study, and the two tests generated different results.
This may leave the reader wondering which of the two is more reliable. In most instances the Chi-Square test gave its best-fitted theoretical probability distribution based on the Gaussian assumption of normality. It must be noted that the Chi-Square goodness-of-fit test depends only on the set of observed and expected frequencies and on the degrees of freedom. To be valid, the Chi-Square test relies heavily on sample size, and it can be used for any sample population provided that the minimum expected cell frequency is not less than five (5). This kind of test is non-parametric in the sense that it does not involve any population parameters or characteristics. In contrast, the distribution of the K-S test statistic does not depend on the underlying cumulative distribution function being tested. It is an exact test which assumes that the data follow a specified distribution. In this study it is regarded as a parametric statistic that performs well under a wide range of distributional assumptions and that is, in general, more powerful than the non-parametric techniques. However, the use of a non-parametric tool is preferred if the distributional assumption is not justified.
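The expected-frequency condition for the Chi-Square test can be illustrated with a short Python sketch. It assumes a synthetic 30-year sample and a gamma candidate, neither of which comes from the study, and it uses equal-probability bins so that every expected cell count is the same and at least five. The degrees of freedom are taken as the number of bins minus one minus the number of estimated parameters, which is a common convention rather than something stated in this paper.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for 30 years of rainfall for one month (mm); not station data.
rng = np.random.default_rng(7)
rain_mm = rng.gamma(shape=2.0, scale=40.0, size=30)

# Fit a hypothetical candidate: gamma with location fixed at zero.
shape, loc, scale = stats.gamma.fit(rain_mm, floc=0)
fitted = stats.gamma(shape, loc=loc, scale=scale)

# Equal-probability bins: with n = 30 and 5 bins each expected count is 6 (>= 5).
n_bins = 5
edges = fitted.ppf(np.linspace(0.0, 1.0, n_bins + 1))  # runs from 0 to infinity
observed = np.array([
    np.sum((rain_mm > lo) & (rain_mm <= hi))
    for lo, hi in zip(edges[:-1], edges[1:])
])
expected = np.full(n_bins, rain_mm.size / n_bins)

# Chi-square statistic and its critical value at the 0.05 significance level.
n_estimated = 2  # shape and scale were estimated from the sample
chi2_stat = float(np.sum((observed - expected) ** 2 / expected))
dof = n_bins - 1 - n_estimated
critical = float(stats.chi2.ppf(0.95, dof))

print(f"observed counts: {observed.tolist()}")
print(f"chi2 = {chi2_stat:.3f}, critical (df={dof}) = {critical:.3f}")
print("reject fit" if chi2_stat > critical else "do not reject fit")
```

With only 26 to 30 observations per month, the bin count cannot exceed five or six without violating the minimum expected frequency, which is one reason the Chi-Square and K-S rankings can disagree.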

Best-fitted Theoretical Distribution
The rainfall data of each of the four sites were fitted to determine the distribution that can best describe the monthly and annual rainfall. The model used in distribution fitting supports three types of reliability tests, namely Kolmogorov-Smirnov, Anderson-Darling and Chi-Square. In this study, the first-ranked results generated by the K-S and Chi-Square tests were considered the best PDF curves to describe the monthly and annual rainfall at each of the four sites. In most instances, the two tests identified different best theoretical distributions for a given data set. Table 2 exhibits two PDF curves for each case: one ranked number one by the Kolmogorov-Smirnov test and the other ranked number one by the Chi-Square goodness-of-fit test. The model generated more than 60 distribution types, which ranked differently in the distribution-based and Gaussian-based tests. Only the curves ranked first were therefore selected as the principal models for estimating the probability of events at the sites in terms of total annual and average annual rainfall patterns. For clarification, the best-fit PDFs for total annual rainfall are useful for predicting exceedance probabilities and return periods. Such information is vital for civil engineering works such as the design of dams and other flood control structures. The data on average annual rainfall are useful for irrigation projects, where an effective rainfall that will be exceeded with a given percent chance of occurrence can be estimated, as suggested in USDA, SCS (1967), based on the study of Nieber [10]. Distribution fitting showed that the total annual rainfall of Cotabato City can be fitted to the Weibull 3P and Johnson SB distributions (Figure 6). On the other hand, Figure 7 illustrates the Error and Log Pearson 3 distributions used to forecast events associated with the total annual rainfall of Lumbia Misamis Oriental.
For General Santos City, the software identified Kumaraswamy and Log Pearson 3 (Figure 8) as the principal models to describe the total annual data. Finally, Weibull and Rice are the two best-fit theoretical distributions for the total annual rainfall of Kabacan North Cotabato (Figure 9).
Table 3 shows the various PDF curves that ranked number one in each statistical test. To cite one example, the rainfall data for the month of January for Cotabato City can be best described by a Johnson SB distribution (ranked no. 1 by K-S) or by an Erlang 3P distribution (ranked no. 1 by Chi-Square). Each month of the year is fitted to two distribution types, which vary from month to month because of the diversity of rainfall. In other words, the monthly rainfall data of the station for the period of 26 years are best described by up to 19 theoretical probability distribution functions. Each of the other three sites can also be described by more than ten probability distributions.

Forecasting Using CDF Graphs
The cumulative distribution function (CDF) graphs for the stations generated by the statistical software are also shown below, from which the probability of various rainfall events at each site can be read. An illustration of how to use the CDF graph to make statistical inferences is also demonstrated.
Other possible rainfall occurrences, including extreme events, can be evaluated using the CDF graphs, and the result can be checked using the distribution equations available in existing resources. When using the equations, note that the parameters are provided together with the corresponding fitted PDF results given by the distribution fitting software. Calculations using the distribution equations give the same values as readings from the CDF graph.
The computation results indicate that the total annual precipitation of Lumbia has a 34% chance of falling between 1,500 mm and 1,800 mm and a 33% chance of falling between 1,800 mm and 2,200 mm. A total of less than 1,500 mm has a 26% chance of occurring, and a total beyond 2,200 mm has a 7% chance.
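Reading interval probabilities from a CDF amounts to taking differences of CDF values at the interval limits. The Python sketch below shows the arithmetic for the same rainfall limits quoted for Lumbia. The distribution and its parameters are hypothetical stand-ins chosen only for illustration; they are not the fitted Error or Log Pearson 3 parameters reported by the study, so the printed percentages should not be expected to match the figures above.

```python
from scipy import stats

# Hypothetical stand-in for a fitted total annual rainfall distribution (mm).
annual = stats.norm(loc=1715.0, scale=335.0)

# Interval limits (mm) taken from the rainfall ranges discussed in the text.
thresholds = [1500.0, 1800.0, 2200.0]


def interval_probabilities(dist, cuts):
    """Return P(X <= c1), P(c1 < X <= c2), ..., P(X > c_last)."""
    cdf_values = [0.0] + [float(dist.cdf(c)) for c in cuts] + [1.0]
    return [hi - lo for lo, hi in zip(cdf_values[:-1], cdf_values[1:])]


probs = interval_probabilities(annual, thresholds)
labels = ["< 1500", "1500-1800", "1800-2200", "> 2200"]
for label, p in zip(labels, probs):
    print(f"{label:>10s} mm: {p:.1%}")
```

The four probabilities always sum to one, which gives a quick consistency check on values read from a CDF graph (26% + 34% + 33% + 7% = 100% in the reported case).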

Determination of Exceedance Probability (EP) and the Return Period (RP)
The exceedance probability (EP) levels of the four stations are presented as expected monthly rainfall, with the corresponding return periods (RP), in tabular form (Table 9). One can reasonably expect that the tabulated rainfall will be equalled or exceeded at the higher exceedance probability levels, particularly at 60% and 80%. At the 20% level, however, one should be conservative and not rely too much on this rainfall, but plan instead for a supplemental water source for water management sectors such as the Local Water Districts whose water comes from springs. A good advance forecast of monthly rainfall is also required for the optimal management of hydroelectric power production systems.
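The relation between exceedance probability and return period can be sketched in Python. The sketch assumes the usual convention that the return period is the reciprocal of the exceedance probability, with one observation of the month per year, and it uses a hypothetical gamma distribution for one month's rainfall rather than any distribution fitted in this study.

```python
from scipy import stats

# Hypothetical fitted distribution for one month's rainfall (mm).
monthly = stats.gamma(a=2.5, scale=60.0)

# Exceedance probability levels discussed in the text.
EP_LEVELS = [0.2, 0.6, 0.8]


def exceedance_table(dist, levels):
    """Return (EP, rainfall equalled or exceeded with that EP, return period)."""
    rows = []
    for ep in levels:
        rainfall_mm = float(dist.isf(ep))  # inverse survival function: P(X > x) = ep
        return_period_years = 1.0 / ep     # assumed convention: RP = 1 / EP
        rows.append((ep, rainfall_mm, return_period_years))
    return rows


table = exceedance_table(monthly, EP_LEVELS)
for ep, rainfall_mm, rp in table:
    print(f"EP={ep:.0%}: rainfall >= {rainfall_mm:7.1f} mm, RP={rp:.2f} years")
```

Rainfall at the 80% level is the smallest of the three amounts and the most dependable, while the 20% amount is larger but is reached in only about one year out of five, which is why supplemental sources are advisable when planning around it.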
Correspondingly, relevant information can be read from CDFs of average annual rainfall to give a direction in managing the water requirements of the agricultural sector by the National Irrigation Administration (NIA).
In addition, exceedance probability estimates for the total annual rainfall can be determined using the same procedure (CDF reading) for purposes of designing dams, landfill drainage, rock slopes, bridges and other public safety facilities.
The following highlights have been drawn. In a study conducted by Tilahun [7], he was able to show the suitability of using continuous probability distributions to describe extremely variable data for arid and semi-arid regions in Ethiopia. Similarly, the results of this study showed that the rainfall data of the sites fitted several probability density functions specific to each station. The technique used by Persson [8], based on goodness-of-fit tests, also proved advantageous in the distribution fitting of the rainfall data of all study sites.
The monthly rainfall variables are fitted to several probability density functions. As described above, certain types of variables follow specific distributions that can best describe the data. Since the PDF curves are already fitted for each of the monthly rainfall data of the four study sites, one can therefore use these distribution curves for prediction of events such as maximum rains, or minimum available rainfall occurring within a specific month.
The total annual and average annual rainfall data of the four stations showed different probability distributions in the sense that the data for the total annual rainfall was taken from the sum of the total monthly rainfall of sites in a year. In contrast, the average annual rainfall was taken from the sum of the twelve monthly average rainfall data. There is a variation in the figures for the total and average annual rainfall as evidenced by different frequency curves generated through the simulation process. The results for the total annual rainfall should not be confused with that of the average annual rainfall because the information for the former cannot be applied to the latter. The utilization of information depends upon which data is needed by the end-user, as explained earlier.
The statistical inferences above are crucial for the agricultural sector to be able to plan ahead of time (Pulhin [11]), particularly now that the Philippines has recorded several typhoons, flash floods and dry spells (WHO [12]) as a result of climate change, which is also experienced by other countries. Reference [13] recorded a long drought in Southern Mindanao and a tropical cyclone, both of which adversely affected the agricultural sector. In fact, reports said that there have been alterations in the farmers' planting calendar due to the changing rainfall pattern, as documented by IRIN [14].

Conclusions
Based on the above findings, it is often desirable to understand the shape of the underlying distribution of the population for sound predictive purposes. One can draw a conclusion with confidence since the best-fitted theoretical distribution for a given data set is determined using both parametric and non-parametric tools. There are differences in the PDFs generated, and these are attributed to the variation of the sample population. There is a specific frequency distribution for each test statistic performed, or each rainfall data set follows a specific PDF, hence the differences observed in the results. Moreover, the question of which one to choose, the PDF generated by the parametric tool or that of the non-parametric tool, depends on the judgement of the end-user. If they choose the former, the hypothesis is accepted at the 95% confidence level, that is, with a 5% risk of error. The same is true when they choose the PDF from the latter. The only difference between the two, to reiterate, is that the Chi-Square test depends on the set of observed and expected frequencies and the degrees of freedom, while the K-S test does not depend on the cumulative distribution function being tested because it assumes that the data follow a specified distribution function. Which of the two is used is not much of an issue. The best-fitted PDFs can be used to forecast rainfall patterns of the sites, such as the probability of occurrence of rainfall intensities, and these predictions are considered to have higher reliability. The argument as to whether the parametric tool is better than the non-parametric tool is beyond the scope of this study.
Since the research findings revealed that each rainfall data set follows a specific frequency curve, these results are made available in this study for future hydrological processing of information. It is not enough to be aware of what has taken place in the area over the last three decades. Merely knowing that a place is vulnerable to the effects of extreme rainfall is futile without a statistical tool to responsibly forecast climatic patterns with greater confidence. The researcher therefore concludes that the findings discussed in this paper can be used as a guide by development planners, agricultural sectors, water management agencies, civil works (design of dams, bridges, drainage and other related structures) and hydroelectric power plants, as well as in hydrological planning (flood and drought estimation) and disaster management.