Power Law Behavior and Tail Modeling on Low Income Distribution

Authors agree that this article remains permanently open access under the terms of the Creative Commons Attribution

Abstract
Poverty is an important issue that needs to be addressed by all countries. Poverty is related to a group of people earning a low income (lower-tail of the income distribution). In Malaysia, low-income earners are classified as the B40 group. This study aims to describe the behavior of the low-income distribution using the power law model. For this purpose, an inverse Pareto model was applied for describing the lower tail data of Malaysian household income. A robust and efficient estimator, called the probability integral transform statistic estimator, was utilized for estimating the shape parameter of the inverse Pareto distribution. Based on the fitted inverse Pareto model, not all households in the B40 group complied with the power law behavior. However, the power law was able to provide a good description for the group of B40 that was below the poverty line. Based on the inverse Pareto model, the parametric Lorenz curve and the Gini index were derived to provide a robust measure of the income inequality of poor households in Malaysia.


Introduction
Poverty refers to a deficiency in the resources needed to meet living requirements. Poverty can cause a person to be discriminated against within their community, to lose their sense of belonging to the community and, even worse, to become more vulnerable to criminal activities (UN-HABITAT 2003). In Malaysia, poverty is defined as the low-income group below a poverty line income (PLI). The government of Malaysia has put a lot of effort into developing policies to reduce poverty. Based on the report by the Economic Planning Unit (EPU Malaysia 2010;2012), poverty in Malaysia has decreased from 52.4% in 1970 to 1.7% in 2012, while extreme poverty has decreased from 6.9% in 1984 to 0.2% in 2012. Although records show a significant reduction in poverty, many problems and challenges remain among the low-income group in Malaysia, particularly income inequality. Per capita income for ethnic Chinese and Indians is generally greater than that of Malays, which indicates income disparity and imbalance among the races. The Malaysian income distribution is very different from that of other developing countries such as Vietnam and Chile. In these countries, ethnic minorities have low incomes (Agostini et al. 2010, van de Walle & Gunewardena 2001), but in Malaysia, the majority ethnic group, the Malays, has a low income compared to other ethnic groups. As mentioned by Alesina and Glaeser (2004), ethnically homogeneous populations tend to have a fairer income distribution. Thus, the issues of inequality and disparity of income are less serious in a country with homogeneous ethnicity than in a country with a variety of ethnicities. This implies that income inequality is a major concern, especially for multi-racial countries like Malaysia.
An increase in poverty and income inequality will affect economic growth and people's welfare. In Malaysia, in addition to the PLI measure, the group of households with the lowest 40 percent of household income, known as the B40 group, has also been defined as a low-income group (Department of Statistics Malaysia, 2016a). The problems faced by people in the B40 group include low income, debt, difficulty in home ownership and job permanence, and children who cannot afford to pursue higher education (Lee 2017). Hence, besides the poor (below PLI), the B40 group is also considered and assisted by the Malaysian government.
Statistical models and inequality indices are among the popular methods used to describe and analyze the characteristics of income distribution and inequality. Over the last decade, the world has witnessed an interest in the application of the power law in economics (for example, see Clementi & Gallegati 2005;Gabaix 2016;Reed 2001). The search for a general model of income and wealth distribution began over one hundred years ago. In fact, the work by Pareto (1895) has become very popular among economic researchers. Pareto (1895) suggested that the tail of the income and wealth distribution follows a power law, in which the probability $P(X > x) \propto x^{-\alpha}$, where α is a positive shape parameter. Many empirical distributions in economics and other fields have been found to follow the characteristics of the power law (for example, see Ishikawa 2005;Oancea et al. 2018;Klass et al. 2006;Safari et al. 2018b).
Although previous research has mostly focused on the upper tail of the income distribution, there is new evidence that the power law (with positive power, α) is also valid for the lower tail of the income distribution (Reed 2003;Toda 2011). A study by Devadoss and Luckstead (2016) on city sizes in the US found that the lower tail of the size distribution follows an inverse Pareto distribution. In a recent work, Safari et al. (2019) proposed an inverse Pareto model for the lower tail of the income distribution and demonstrated empirical evidence of power law behavior. Thus, in this study, we further investigate the application of an inverse Pareto model to describe and analyze the distribution of B40 and poor household income groups in Malaysia.

Data Usage
Data used in this study was obtained from the Department of Statistics Malaysia (DOSM) through Bank Data of UKM located at the School of Mathematical Sciences, Faculty of Science and Technology. The data was collected by DOSM through an official survey called the Household Income Survey (HIS). In this study, the data set considered consists of the net gross household income data for 2014.

Inverse Pareto Model Based on Power Law
The lower tail of the income distribution represents the poorest income earners. The properties of this lower tail can be explained using the power law, which leads to the definition of the inverse Pareto distribution described by the following cumulative distribution function (CDF):
$$F(x) = \left(\frac{x}{x_l}\right)^{\alpha}, \quad 0 < x \le x_l, \qquad (1)$$
where $x_l$ is a threshold value and α > 0 is a shape parameter. The value of α describes the heaviness of the inverse Pareto tail: the smaller α, the heavier the tail, which is interpreted as high income inequality within the poor household income group. Based on Equation (1), the probability density function of the inverse Pareto can be derived as:
$$f(x) = \frac{\alpha x^{\alpha-1}}{x_l^{\alpha}}, \quad 0 < x \le x_l. \qquad (2)$$
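As an illustration, the inverse Pareto model can be sketched in a few lines of Python. The sketch assumes the CDF F(x) = (x/x_l)^α on 0 < x ≤ x_l, with density α x^(α−1) / x_l^α. The function names and the inverse-transform sampler are illustrative choices and are not taken from the original study.

```python
import random


def inv_pareto_cdf(x, alpha, x_l):
    """Assumed inverse Pareto CDF: F(x) = (x / x_l)**alpha on 0 < x <= x_l."""
    if x <= 0:
        return 0.0
    if x >= x_l:
        return 1.0
    return (x / x_l) ** alpha


def inv_pareto_pdf(x, alpha, x_l):
    """Density obtained by differentiating the CDF above."""
    if x <= 0 or x > x_l:
        return 0.0
    return alpha * x ** (alpha - 1) / x_l ** alpha


def inv_pareto_quantile(u, alpha, x_l):
    """Quantile function F^{-1}(u) = x_l * u**(1/alpha) for u in [0, 1]."""
    return x_l * u ** (1.0 / alpha)


def inv_pareto_sample(n, alpha, x_l, seed=0):
    """Draw n values by inverse-transform sampling (hypothetical helper)."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so no simulated income is exactly zero.
    return [inv_pareto_quantile(1.0 - rng.random(), alpha, x_l) for _ in range(n)]
```

A smaller α moves probability mass toward zero, which is the sense in which the lower tail becomes heavier.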

Comparison of the Threshold Value of the B40 Group and Poor Income Group
The suitability of the threshold $x_l$ for the B40 and poor income groups was evaluated through a statistical goodness-of-fit criterion, the Kolmogorov-Smirnov (K-S) statistic (Safari et al. 2018a;Safari et al. 2018b). Based on this criterion, a threshold is optimal when it minimizes the K-S statistic. According to the report by DOSM (2016b), the threshold value for the poor income group in Malaysia for 2014 was approximately RM 950 for monthly income, which is equivalent to RM 11400 for annual income. Based on the HIS 2014 data, the threshold was approximately RM 38324.80 for the B40 group. Let $X_1, X_2, \ldots, X_n$ be a random sample of income data. The distribution functions $F_n(x_i)$ and $F_{n-1}(x_i)$ can be written as:
$$F_n(x_i) = \frac{1}{n}\sum_{j=1}^{n} I(X_j \le x_i) \qquad (3)$$
and
$$F_{n-1}(x_i) = \frac{1}{n}\sum_{j=1}^{n} I(X_j < x_i), \qquad (4)$$
where $I(\cdot)$ is an indicator function. Thus, the threshold value $x_l$ is chosen as the one that minimizes the K-S statistic $D_l$ in Equation (7), given by:
$$D_l^{+} = \max_{1 \le i \le n} \left| F_n(x_i) - F(x_i) \right|, \qquad (5)$$
$$D_l^{-} = \max_{1 \le i \le n} \left| F(x_i) - F_{n-1}(x_i) \right|, \qquad (6)$$
$$D_l = \max\left(D_l^{+}, D_l^{-}\right), \qquad (7)$$
where $F$ is the fitted inverse Pareto CDF in Equation (1).
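The threshold comparison can be sketched as follows. This is a minimal illustration rather than the study's implementation: it assumes the inverse Pareto CDF F(x) = (x/x_l)^α, takes the empirical probabilities as i/n and (i−1)/n for the sorted data (which presumes distinct values), and re-estimates α at each candidate threshold with the closed-form maximum likelihood estimator n / Σ ln(x_l/x_i), used here only to keep the sketch short. All function names are hypothetical.

```python
import math


def ks_statistic(tail, alpha, x_l):
    """Two-sided K-S distance between the data and (x / x_l)**alpha."""
    xs = sorted(tail)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = (x / x_l) ** alpha  # assumed inverse Pareto CDF
        d = max(d, abs(i / n - f), abs(f - (i - 1) / n))
    return d


def mle_alpha(tail, x_l):
    """Stand-in estimator (MLE); the study uses PITSE instead."""
    return len(tail) / sum(math.log(x_l / x) for x in tail)


def pick_threshold(incomes, candidates):
    """Return (x_l, K-S statistic, alpha) for the best candidate threshold."""
    best = None
    for x_l in candidates:
        tail = [x for x in incomes if 0 < x <= x_l]
        if len(tail) < 2:
            continue  # too few observations to fit anything
        alpha = mle_alpha(tail, x_l)
        d = ks_statistic(tail, alpha, x_l)
        if best is None or d < best[1]:
            best = (x_l, d, alpha)
    return best
```

For the two thresholds discussed here, the call would look like `pick_threshold(incomes, [11400.0, 38324.80])`, where `incomes` is a list of annual household incomes.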

Parameter Estimation Based on the Robust Method
A robust method, called the probability integral transform statistic estimator (PITSE), was used to estimate the shape parameter of the inverse Pareto model. As shown by Safari et al. (2019), PITSE provides a more precise estimate than the maximum likelihood approach, particularly when extreme values exist in the lower tail data. Through PITSE, the estimator $\hat{\alpha}$ of the inverse Pareto shape parameter is the solution of:
$$\frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i}{x_l}\right)^{\hat{\alpha} t} = \frac{1}{t+1}, \qquad (8)$$
where t > 0 is a tuning parameter. As t increases, PITSE gains robustness but loses relative efficiency. In this study, t = 0.883 was used, which corresponds to 78% asymptotic relative efficiency.
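A minimal sketch of the PITSE idea is given below, under the assumption that the estimator solves (1/n) Σ (x_i/x_l)^(αt) = 1/(t+1). This condition holds in expectation because (X/x_l)^α is uniform on (0, 1) under the inverse Pareto model. The left-hand side decreases in α, so a simple bisection is sufficient. The function name, bracket search and tolerance are illustrative assumptions, not details from the original study.

```python
def pitse_alpha(tail, x_l, t=0.883, tol=1e-10):
    """Solve mean((x / x_l)**(alpha * t)) = 1 / (t + 1) for alpha by bisection."""
    ratios = [x / x_l for x in tail if 0 < x <= x_l]
    if not ratios:
        raise ValueError("no observations in (0, x_l]")
    target = 1.0 / (t + 1.0)

    def g(alpha):
        # Sample mean of the transformed data; equals 1 at alpha = 0.
        return sum(r ** (alpha * t) for r in ratios) / len(ratios)

    lo, hi = 0.0, 1.0
    while g(hi) > target:  # expand the bracket until the root is enclosed
        hi *= 2.0
        if hi > 1e6:
            raise ValueError("no root found; data may all equal x_l")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Each observation contributes a value bounded between 0 and 1 to the sample mean, so a single extremely small income can shift the estimate only slightly, whereas it enters the maximum likelihood estimator through an unbounded logarithm.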

Diagnostic Model
The fitted inverse Pareto model was evaluated using an inverse Pareto plot and the $R^2$ coefficient given by: (9) where i = 1, 2, …, n and $F_n(x_i)$ are the empirical cumulative probabilities. A high value of $R^2$ indicates that the inverse Pareto model can adequately explain the lower tail data of household incomes.

Lorenz Curve and Gini Index
The Lorenz curve, which was introduced by Lorenz (1905), is a graph that represents inequality by plotting the cumulative share of income earned against the cumulative share of the population. The 45º line indicates perfect equality of income distribution. The greater the distance between the curve and the 45º line, the greater the income gap among the population. The empirical formula for the Lorenz curve is given as:
$$L(p) = \frac{1}{n\mu}\sum_{i=1}^{\lfloor np - p + 1 \rfloor} x_{(i)}, \qquad (10)$$
where $x_{(i)}$ is the i-th order statistic of household income for i = 1, 2, …, n, μ is the sample mean of household incomes, and $\lfloor np - p + 1 \rfloor$ is the largest integer not greater than np − p + 1 (Cowell and Flachaire, 2015). However, if the inverse Pareto model can describe the behavior of the lower tail data, the Lorenz curve can be derived from the inverse Pareto distribution using the following equation:
$$L(p) = \frac{1}{E(X)}\int_0^p F^{-1}(u)\,du, \qquad (11)$$
where p ∈ [0, 1], and $E(X)$ and $F^{-1}(u)$ are the mean and quantile function of the inverse Pareto distribution, respectively. With $E(X) = \alpha x_l/(\alpha+1)$ and $F^{-1}(u) = x_l u^{1/\alpha}$, the simplified version of Equation (11) is given as:
$$L(p) = p^{(\alpha+1)/\alpha}. \qquad (12)$$
Apart from the Lorenz curve, the most commonly used inequality measure is the Gini index. The Gini index takes values between 0 and 1. A value of 0 indicates perfect equality, while a value of 1 indicates perfect inequality in income distribution. The empirical Gini index is given as:
$$G = \frac{2\sum_{i=1}^{n} i\,x_{(i)}}{n\sum_{i=1}^{n} x_{(i)}} - \frac{n+1}{n}. \qquad (13)$$
Based on the inverse Pareto model, the Gini index is derived from the following formula:
$$G = 1 - 2\int_0^1 L(p)\,dp, \qquad (14)$$
where $L(p)$ is the Lorenz curve of the inverse Pareto distribution. The simplified version of Equation (14) is given as:
$$G = \frac{1}{2\alpha+1}. \qquad (15)$$
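The parametric and empirical inequality measures can be compared with a short sketch. It assumes the closed forms L(p) = p^(1+1/α) and G = 1/(2α+1), which follow from the inverse Pareto CDF F(x) = (x/x_l)^α, together with one common order-statistic form of the empirical Lorenz curve and Gini index. The function names are hypothetical, and the empirical formulas may differ in small-sample details from those used in the study.

```python
import math


def lorenz_inv_pareto(p, alpha):
    """Parametric Lorenz curve L(p) = p**(1 + 1/alpha), for p in [0, 1]."""
    return p ** (1.0 + 1.0 / alpha)


def gini_inv_pareto(alpha):
    """Parametric Gini index G = 1 - 2 * integral of L(p) = 1 / (2*alpha + 1)."""
    return 1.0 / (2.0 * alpha + 1.0)


def lorenz_empirical(incomes, p):
    """Share of total income held by the lowest floor(n*p - p + 1) incomes."""
    xs = sorted(incomes)
    k = math.floor(len(xs) * p - p + 1)
    return sum(xs[:k]) / sum(xs)


def gini_empirical(incomes):
    """Order-statistic Gini: 2*sum(i*x_(i)) / (n*sum(x)) - (n + 1)/n."""
    xs = sorted(incomes)
    n = len(xs)
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * sum(xs)) - (n + 1.0) / n
```

Under these assumptions the threshold x_l cancels out, so both parametric measures depend only on the shape parameter α; a larger α gives a Lorenz curve closer to the 45º line and a smaller Gini index.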

Results and Discussion
Tables 1 and 2 show the descriptive statistics for the distribution of annual income earned by Malaysian citizens (B40 and poor households) based on the HIS 2014 data. For the B40 group, the overall mean and median were RM 25364 and RM 25789, respectively. The difference between the overall mean and median was small, which indicates that both the mean and median can be used as a central measure of B40 income. For the area factor, the mean and median indicate that urban citizens earn a larger income than rural citizens. The mean/median income for urban citizens is greater than the overall mean/median, while the mean/median income for rural citizens is less than the overall mean/median for the B40 group. This is a normal scenario that could happen in any country. However, the differences in mean and median income between rural and urban areas for the B40 group were small. Thus, the disparity between rural and urban areas is not serious. For the ethnic factor in the B40 group, the data show that non-native citizens have a larger mean and median income than native citizens. The mean/median income for non-native citizens is greater than the overall mean/median, while the mean/median income for native citizens is less than the overall mean/median for the B40 group. Although the difference is small, the government should consider this problem seriously because native citizens make up more than 60% of the total population of Malaysia. A low mean/median income for native citizens indicates a large portion of poor people in Malaysia. The education factor shows a normal behavior: citizens with a higher education level are able to obtain greater income than those with a lower level of education. Regarding the state factor, the mean income varies from around RM 22797 to RM 29955, while the median income varies from around RM 22256 to RM 31320. Citizens in more developed states, such as W.P. Kuala Lumpur, W.P. Putrajaya, Selangor and Melaka, are able to earn a higher income than those in other states. Apart from that, the overall minimum and maximum income for the B40 group show a very large difference, specifically RM 2369 and RM 38324, respectively. The same scenario occurs for the factors of area, ethnicity, education, and state. The minimum and maximum income for all of these factors show a very large difference and disparity among citizens in the B40 group.
For the poor income group, the behavior followed nearly the same scenario as that of the B40 group. Based on Table 2, the overall mean and median were RM 9059 and RM 9472, respectively. The difference between the overall mean and median for the poor group was also small, which indicates that both the mean and median can also be used as a central measure. For the area factor, the mean and median indicate that urban citizens earn a larger income than rural citizens. The mean/median income for urban citizens was greater than the overall mean/median, while the mean/median income for rural citizens was less than the overall mean/median. However, the income disparity between rural and urban citizens was smaller in the poor group than in the B40 group. For the ethnic factor in the poor group, the data also show that non-native citizens have a larger mean and median income than native citizens. However, the income disparity between native and non-native citizens was smaller in the poor group than in the B40 group. For the education factor, the poor income group shows that citizens with a degree certificate earned more income. However, citizens with a diploma (ranked 2nd, after a degree) earned a lower income than citizens with only an STPM certificate (ranked 3rd). This suggests that, within the poor income group, career opportunities are more secure for citizens with an STPM certificate than for those with a diploma certificate. The rest of the education levels follow the same scenario as in the B40 group. The state factor for the poor income group shows a different behavior than for the B40 group. Based on Table 2, none of the citizens in the poor income group live in W.P. Putrajaya. W.P. Labuan had the largest mean/median income for the poor group. However, it was surprising that W.P. Kuala Lumpur was among the states with the lowest mean/median income. This contrast between the B40 group and the poor income group suggests a larger inequality of income distribution for citizens who live in W.P. Kuala Lumpur.
Table 3 shows the estimated parameters, K-S statistic and $R^2$ coefficient values. Based on the much smaller K-S statistic value in Table 3, the threshold of the poor income group is closer to optimal than the threshold of the B40 income group. Moreover, based on the $R^2$ value, the inverse Pareto model fitted the poor income group data much better than the B40 group data. Based on Figures 1 and 2, the inverse Pareto model was well fitted to the poor income group data. Therefore, the inverse Pareto model can adequately explain the lower tail data of poor households in Malaysia. For the B40 group, the inverse Pareto model failed to describe the distribution of its income data. Thus, the parametric inequality measures, namely the Lorenz curve and the Gini index in Equations (12) and (15), could only be used for the poor income group. For the B40 group, the Lorenz curve and the Gini index can only be approximated using the empirical approach in Equations (10) and (13), respectively.
Figure 3 shows the parametric Lorenz curve of the poor income group based on the fitted inverse Pareto model and the Lorenz curve of the B40 group based on the empirical formula. The Lorenz curve of the poor income group was closer to the "line of equality" than the Lorenz curve of the B40 income group. This suggests that income is more evenly distributed among the poor households than among the B40 households. If the poor households are divided into two groups, the bottom 80% and the top 20%, as shown in Figure 3, the bottom 80% earned around 75.07% of the total household income of the poor while the top 20% only earned around 24.93% of the total household income. In addition, for the B40 group, the bottom 80% earned around 71.53% of the total household income and the top 20% only earned around 28.47% of the total household income. Apart from that, Table 4 provides information about the inequality measure of income distribution for both the B40 and poor income groups. Since the inverse Pareto model was well fitted to the data of the poor income group, the Gini index could be computed using the formula derived from the inverse Pareto model as shown in Equation (14). For the B40 group, the Gini index could only be approximated using the empirical computation in Equation (13). Based on Table 4, the disparity of income in the B40 group is larger than that of the poor income group.
Mathematics and Statistics 7(3): 70-77, 2019

Conclusions
This study analyzes the behavior of the lower tail of the income distribution in Malaysia, particularly for citizens who belong to the lowest 40 percent of household income (B40 group) and those with a very low income below a measured poverty line income (poor income group). This study used an inverse Pareto model to approximate the power law behavior of the lower tail data for both groups. A robust method, known as PITSE, was used for estimating the shape parameter of the inverse Pareto model. The threshold values corresponding to the B40 group and the poor income group were compared using the K-S statistic. The threshold value for the poor income group was better than the threshold value for the B40 group for the application of inverse Pareto modeling. Based on the fitted model, only the poor income group data were well approximated by the inverse Pareto model. This result indicates power law behavior for the poor income group in Malaysia. For the B40 group, the inverse Pareto model failed to describe the distribution of its income data. Thus, the inequality measures derived from the inverse Pareto model, such as the Gini index and the Lorenz curve, could only be used for the poor income group. For the B40 income group, the Gini index and the Lorenz curve can only be approximated using the empirical approach. Based on the fitted Lorenz curve, nearly 80% of the total household income of the poor was earned by the poorest 80%, whereas the remainder was earned by the richest 20%. Thus, the 80/20 rule of the Pareto principle follows an inverse order when considering the income distribution of poor households.
However, the issue of income inequality can also be controversial. For example, income inequality can be caused by something as simple as a doctor having more income than a driver. If both are paid the same, citizens will be hesitant to become doctors because the commitment required is out of balance with the income earned. If all citizens receive the same amount of income, the productivity of the nation will be negatively affected. However, the government should always monitor and play a role in controlling this aspect of capitalism. Severe income inequality will lead to economic, social and political problems. Increasing inequality will cause the majority to be dissatisfied with their lives compared to high-income groups. Therefore, the government should always make an effort to reduce income inequality as much as possible without causing people to lose motivation for productive growth.