Analyzing Returns and Pattern of Financial Data Using Log-linear Modeling

Technical analysis is useful for forecasting price movements through the analysis of historical data. Such movements are also subject to the Turn of the year effect and are useful for short-term prediction. If the direction of price is the same for two or more assets, it becomes necessary to analyze the returns as well. We first use the optimal band to predict the direction of price and create a contingency table of the data to analyze the pattern (movement) against returns. We use log-linear modeling to analyze this contingency table. We next include the volume of transactions as a further variable in the contingency table. The resulting table of three variables, Pattern, Returns and Volume, is again analyzed by log-linear modeling. We test various hypotheses of association among these variables by using the Chi-square test for contingency tables.


Introduction
The stock market attracts investors who hope to earn large profits by investing according to their own choices. The fundamental driver of profit maximization is the strategy for buying and selling stocks. The buying and selling behaviour of investors is also affected by the Turn of the year [Ritter (1988)]. It is well documented that, at the turn of the year, the average ratio of buying to selling is higher in the first 9 days of January than from mid January to mid December and in the last 9 days of December. Rozeff and Kinney (1976) also explained the January effect, namely that the average return of stocks is higher in January than in other months. A number of articles discuss the Turn of the year effect. Ritter (1988) proposed a theory based on tax-loss selling, named "parking the proceeds", to explain the Turn of the year effect in NYSE daily returns from 17 Dec 1970 to 16 Dec 1985 using the t-statistic. Barber and Odean (2008) tested a hypothesis based on attention-grabbing stocks. These statistical tests confirm that the behaviour of individuals and institutions differs when buying and selling stocks.
Researchers and financial experts have proposed several technical indicators to predict this buying and selling pattern of stocks. These include the Bollinger Band [Bollinger (2001)], Moving Average, Moving Average Convergence/Divergence, Relative Strength Index, Confidence Index [Hoque and Gias (2009)] and Optimal Band [Vijay and Paul (2015)].
Most of these indicators are based on past returns, their moments and/or the volume of transactions. For short-term investors and traders, this analysis is important for making investment decisions. However, if the indicators exhibit a similar pattern for two or more stocks, the decision is made on the basis of the return and its association with the pattern. Here, we classify the historical data according to their pattern by using the optimal band [Vijay and Paul (2015)]. For each category of pattern, we further divide the data into different categories of returns. If the interest lies in classifying the pattern, then historical values of returns are used to predict it, but if one is interested in forecasting the returns, then the historical values of the pattern become more useful [Vijay and Paul (2015)]. Therefore, it is important to analyze the strength of dependence between the two variables, returns and pattern.
First, we use the historical data to identify the buying and selling pattern by using the optimal band [Vijay and Paul (2015)]. The pattern data are then divided into three categories, namely Sell ($Y_S$), Neutral ($Y_N$) and Buy ($Y_B$). These are further used to estimate the future category of returns: High, Moderate or Low. The whole data set is then presented as a 2-dimensional contingency table in the variables returns and pattern, each of which has three categories. In technical analysis, one of the fundamental drivers is the volume of transactions. We include volume as a third variable with two categories, namely Up and Down. This division of volume is primarily based on the range of historic returns. This creates a 3-dimensional contingency table. A partial table is the cross-classification of two of these three variables for a fixed level of the remaining one [Kateri (2014)]. Thus, there are two possible sets of partial tables corresponding to the variable volume, and we test different hypotheses for these tables.
The hypotheses for the 3-dimensional contingency table are based upon:
1. Association between buying and high-return under up-volume and down-volume.
2. Relation between selling and high-return/low-return under up-volume and down-volume.
3. Relation between neutral and all categories of returns under up-volume and down-volume.
A first sensible assumption is that the association between pattern and returns exhibits a linear trend. The linear trend is measured by Pearson's correlation coefficient, defined through the categories of the two variables [Anderson (1996)]. To test a hypothesis H, we use the observed and expected frequencies to compute the test statistic Z(H). It is known that Z(H) follows a Chi-square distribution with degrees of freedom equal to the number of unconstrained log-linear model parameters that are set to zero under H, and H is tested at a chosen level of significance [Boulesteix (2006)]. All the hypotheses of independence or conditional independence can be equivalently represented in terms of the interaction parameters of a log-linear model. Log-linear modeling is a widely used method for the analysis of contingency tables.
The parameters of a log-linear model describe the interaction/association among two or more variables. One advantage of log-linear models is that they go beyond a single summary statistic and specify how the cell counts depend on the levels of the categorical variables. They model the association and interaction pattern among categorical variables. They are appropriate when there is no clear distinction between response and explanatory variables, or when there are more than two responses [Vellaisamy and Vijay (2007)]. If a hypothesis of independence is accepted, then the corresponding interaction parameters can be assumed to be zero. If the hypothesis of independence is rejected, the values of these interaction parameters help in analyzing the influence of the different categories of the variables. Log-linear modeling, therefore, helps us identify the level of a variable that has a strong influence on another variable. Hence, this approach is useful not only for predicting the pattern but also for studying its association with other variables. We demonstrate the process of classifying the data in the form of a contingency table. Various hypotheses are tested by using the $\chi^2$ test of independence/conditional independence. The association, if it exists, is described by the parameters of the log-linear model.
The structure of the paper is as follows. Section-2 deals with the formation of contingency tables by using the trading band approach. Section-3 briefly presents log-linear modeling for 2- and 3-dimensional contingency tables, and represents various hypotheses of association in terms of interaction parameters. The analysis of the contingency tables is shown in Section-4. Conclusions and future aspects are presented in Section-5.
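The comparison of observed and expected frequencies can be sketched as follows. This is a minimal illustration that assumes Z(H) takes the Pearson chi-square form (the paper's statistic may instead be the likelihood-ratio form); the function name `pearson_chi_square` and the toy counts are hypothetical and are not the paper's data.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for the hypothesis of
    independence in a two-way table given as a list of rows of counts.
    Assumes every row and column total is positive."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: (row total * column total) / n
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Toy 3 x 3 counts, for illustration only (not taken from the paper).
toy = [[30, 10, 5], [10, 40, 10], [5, 10, 30]]
stat, dof = pearson_chi_square(toy)
CRITICAL_95_DF4 = 9.488  # tabulated 0.95 quantile of chi-square with 4 d.f.
reject_independence = stat > CRITICAL_95_DF4
```

For a 3 × 3 table the hypothesis of independence sets four free interaction parameters to zero, which is why the statistic is compared with the quantile for 4 degrees of freedom.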

Contingency table for Returns, Pattern and Volume of transactions
Consider the series $X_1, X_2, \ldots, X_n$ of returns of a stock. We describe the construction of a contingency table for the pattern and returns of the series. We first use the optimal band [Vijay and Paul (2015)] to divide the data into three categories of pattern, namely Sell, Neutral and Buy.
Once the data are divided, the cardinality of each of these subsets of the time series represents the count of the corresponding category of pattern. For each category of pattern, we further divide these subsets into subsets corresponding to the returns, that is, High, Moderate and Low. We use the following algorithm to construct a 2-dimensional contingency table.
The parameters a, b, c and d are obtained in Step-3 below.
Step-3 We obtain the parameters a, b, c and d by solving the optimization problem

Max
Step-4 We next define, for $1 \le i \le n-4$,
Let us now denote by $Y_S$, $Y_N$ and $Y_B$ the subsets corresponding to the categories Sell, Neutral and Buy of pattern, respectively. We have the following rule:
The total cell counts corresponding to Sell, Neutral and Buy are given by the cardinalities of the sets $Y_S$, $Y_N$ and $Y_B$.
Computational Research 3(1): 1-7, 2016
Next, we divide each of the subsets $Y_S$, $Y_N$ and $Y_B$ into High, Moderate and Low returns.
Step-1 Consider the set $Y_B$, and denote its maximum, minimum and average values by $Y_B^{\max}$, $Y_B^{\min}$ and $Y_B^{\mathrm{Ave}}$, respectively. Define the intervals
Step-2 The classification is defined by the following rule: Let $y \in Y_B$, then
Here, $Y_{BH}$, $Y_{BM}$ and $Y_{BL}$ are the subsets of $Y_B$ corresponding to the categories High, Moderate and Low of returns.
Similarly, we obtain the subsets
The 2-dimensional contingency table for the variables pattern and returns is formed by the counts given by the cardinalities of these subsets.
The table is represented above. We next present a concrete example.
Example: We consider the daily returns of Maruti Suzuki Co. from 23 Nov 2007 to 23 Nov 2009. The total number of data points is n = 478.
Step-1
Step-2 We now find a linear function f defined as
The initial values of the parameters are chosen as
Note that different initial values of the parameters may give different estimates, but the value of the function remains unchanged.
Step-4 We define the following bands:
Now, we divide the data for each category of pattern into the categories of returns (High, Moderate and Low). As an example, we use the following criteria for the set $Y_B$.
Step-1
Step-2 The total numbers of data points in the categories $Y_{BH}$, $Y_{BM}$ and $Y_{BL}$ are given by
In a similar way, we divide the data corresponding to $Y_S$ and $Y_N$ to obtain the following 2-dimensional Table-2.
For short-term investors, another key factor is the volume of transactions. We next include the volume of transactions as a third variable to construct a 3-dimensional contingency table. Each subset corresponding to the categories of Pattern and Returns is further divided into Up and Down categories of volume. We define a constant $q \in (0,1)$ such that the data in a given set, for example $Y_{BH}$, are divided into two categories of volume by the following relation:
$$q \cdot \max(\mathrm{vol}) < \text{Up Volume} \le \max(\mathrm{vol}), \qquad \min(\mathrm{vol}) \le \text{Down Volume} \le q \cdot \max(\mathrm{vol}).$$
Here, q depends on the volume data. The two categories of data, denoted by $Y_{BHU}$ and $Y_{BHD}$, correspond to Up and Down volume of transactions.
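The volume rule can be sketched in a few lines. This is an illustration only: the function name `split_by_volume` and the sample volumes are hypothetical, and it is assumed that the maximum is taken over the list of volumes passed in (the text does not say whether the maximum refers to the subset or to the whole series).

```python
def split_by_volume(volumes, q):
    """Label each volume 'Up' if it exceeds q * max(volumes), else 'Down'.

    Assumption: the maximum is taken over the supplied list of volumes.
    """
    if not 0 < q < 1:
        raise ValueError("q must lie strictly between 0 and 1")
    threshold = q * max(volumes)
    return ["Up" if v > threshold else "Down" for v in volumes]


# Hypothetical volumes; with q = 0.4 the threshold is 0.4 * 1000 = 400.
labels = split_by_volume([120, 450, 900, 1000, 380], q=0.4)
```

A volume exactly equal to the threshold falls in the Down category, in line with the non-strict upper bound on Down volume.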
For each of the two categories of volume, we use the process given above to obtain the 3-dimensional contingency table. With q = 0.4, the range of volume of transactions is given by
Using the above rule, we get the following 3-dimensional table.
Note that if the above 3-dimensional table is marginalized over the third variable, that is, volume, we obtain Table-2. We next give a brief description of log-linear modeling for the analysis of contingency tables. We also present a class of hypotheses for the log-linear models.
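The relation between the 3-dimensional table and its 2-dimensional margin can be illustrated as follows. The sketch assumes that each observation has already been labelled with a (pattern, returns, volume) triple; the function names and the sample observations are hypothetical.

```python
from collections import Counter

PATTERNS = ("Sell", "Neutral", "Buy")
RETURNS = ("High", "Moderate", "Low")
VOLUMES = ("Up", "Down")


def three_way_table(observations):
    """Count labelled (pattern, returns, volume) triples into a 3 x 3 x 2 table."""
    counts = Counter(observations)
    return {(p, r, v): counts.get((p, r, v), 0)
            for p in PATTERNS for r in RETURNS for v in VOLUMES}


def marginalize_volume(table3):
    """Sum the 3-dimensional table over volume to get the pattern x returns table."""
    table2 = {(p, r): 0 for p in PATTERNS for r in RETURNS}
    for (p, r, _volume), count in table3.items():
        table2[(p, r)] += count
    return table2


# Hypothetical labelled observations, for illustration only.
observations = [
    ("Buy", "High", "Up"),
    ("Buy", "High", "Down"),
    ("Sell", "Low", "Up"),
    ("Neutral", "Moderate", "Down"),
    ("Buy", "High", "Up"),
]
table3 = three_way_table(observations)
table2 = marginalize_volume(table3)
```

Each cell of the 2-dimensional table is the sum of the Up and Down cells for the same pattern and returns categories, so the total count is preserved.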

Log linear modeling
Log-linear modeling for 2-dimensional table: A log-linear model for a 2-dimensional table describes the association between two categorical variables. It expresses the cell counts in terms of the levels of the two categorical variables.
Here, $\lambda^{AB}_{ij}$ is the interaction effect of variables A and B, $\lambda^{A}_{i}$ ($\lambda^{B}_{j}$) is the main effect of A (B) and $\lambda$ is the overall effect. All these parameters satisfy the following constraints (Vellaisamy and Vijay (2007)).
These parameters are estimated by their maximum likelihood estimators, given below.
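For reference, the standard saturated log-linear model for an $I \times J$ table, with the usual sum-to-zero identifiability constraints, is sketched below. It is assumed here that this matches the form of model (1) and of the constraints cited from Vellaisamy and Vijay (2007); $m_{ij}$, the expected count in cell $(i, j)$, is a symbol introduced for illustration.

```latex
\[
\log m_{ij} = \lambda + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{AB}_{ij},
\qquad i = 1, \dots, I, \quad j = 1, \dots, J,
\]
\[
\sum_{i} \lambda^{A}_{i} = \sum_{j} \lambda^{B}_{j} = 0,
\qquad
\sum_{i} \lambda^{AB}_{ij} = \sum_{j} \lambda^{AB}_{ij} = 0 .
\]
```

Under this parameterization, independence of A and B is equivalent to $\lambda^{AB}_{ij} = 0$ for all $i$ and $j$.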
For more on log-linear modeling and the effect of the levels of variables, see Vijay (2011).
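Under the commonly used sum-to-zero constraints, the maximum likelihood estimates of the saturated two-way model have closed forms in terms of the logarithms of the observed counts: the overall effect is the grand mean of the log counts, the main effects are row and column means minus the grand mean, and the interactions are the remainders. The sketch below assumes those constraints and strictly positive counts; the function name and the toy table are hypothetical.

```python
import math


def saturated_loglinear_estimates(table):
    """Estimates of (overall, row effects, column effects, interactions)
    for the saturated two-way log-linear model under sum-to-zero constraints.
    Assumes every observed count is strictly positive."""
    logs = [[math.log(count) for count in row] for row in table]
    n_rows, n_cols = len(logs), len(logs[0])
    grand = sum(sum(row) for row in logs) / (n_rows * n_cols)
    row_eff = [sum(row) / n_cols - grand for row in logs]
    col_eff = [sum(logs[i][j] for i in range(n_rows)) / n_rows - grand
               for j in range(n_cols)]
    inter = [[logs[i][j] - grand - row_eff[i] - col_eff[j]
              for j in range(n_cols)] for i in range(n_rows)]
    return grand, row_eff, col_eff, inter


# Toy 3 x 3 counts, for illustration only (not taken from the paper).
toy = [[30, 10, 5], [10, 40, 10], [5, 10, 30]]
overall, row_eff, col_eff, inter = saturated_loglinear_estimates(toy)
```

The largest interaction estimate in a row indicates the column category most strongly associated with that row category, which is how the interaction tables are read in the analysis.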
Log-linear modeling for 3-dimensional table: The log-linear model for a 3-dimensional contingency table is a straightforward extension of (1), and is given by
(2)
Maximum likelihood estimates of the parameters are defined similarly. The above model is useful in explaining several interaction effects [Vellaisamy and Vijay (2007)], for example,
that is, A and B are independent given C. Similarly, all the hypotheses of independence/conditional independence for a 3-dimensional table are presented below.
We consider 2-dimensional and 3-dimensional contingency tables for analysis. The Chi-square test statistic is used to test whether a set of log-linear parameters is zero or, equivalently, to test the hypotheses of independence. If any hypothesis of independence or conditional independence is rejected, the parameters are used to analyze the influence of the categories of the variables.
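As an illustration of one such hypothesis, conditional independence of A and B given C can be tested by applying the independence test within each level of C and summing the statistics and degrees of freedom over the levels. The sketch below assumes the Pearson form of the statistic and positive margins in every layer; the function name and the layout `table3[k][i][j]` are hypothetical choices.

```python
def conditional_independence_stat(table3):
    """Pearson statistic and degrees of freedom for 'A and B are independent
    given C', where table3[k][i][j] is the count for level k of C,
    level i of A and level j of B."""
    stat, dof = 0.0, 0
    for layer in table3:
        row_totals = [sum(row) for row in layer]
        col_totals = [sum(col) for col in zip(*layer)]
        n_k = sum(row_totals)
        for i, row in enumerate(layer):
            for j, observed in enumerate(row):
                # Expected count under independence within this level of C
                expected = row_totals[i] * col_totals[j] / n_k
                stat += (observed - expected) ** 2 / expected
        dof += (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Two hypothetical 3 x 3 layers, each exactly independent, so the statistic is ~0.
layers = [
    [[1, 2, 3], [2, 4, 6], [3, 6, 9]],
    [[2, 2, 2], [4, 4, 4], [1, 1, 1]],
]
stat, dof = conditional_independence_stat(layers)
```

For a 3 × 3 × 2 table this gives (3 − 1)(3 − 1) × 2 = 8 degrees of freedom.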

Analysis of 2-dimensional table
The 2-dimensional table of pattern and returns is created for the daily returns of Maruti Suzuki Co., using the process given in Section-2. We use the closing prices of the company's stock from 23 Nov 2007 to 23 Nov 2009 [Table-2]. We test the hypothesis of no association for this table by using
The Z(H) value is 182.634, which is greater than $\chi^2_{0.95}$ with 4 degrees of freedom. This leads to rejection of the hypothesis of independence. That is, the variables A (Pattern) and B (Returns) are dependent on each other [Anderson (1996) pp 27-28]. We now obtain the maximum likelihood estimates of the log-linear parameters. The main effect parameters are given in Table-4 and Table-5. The interaction parameters are presented in Table-6.
-0.1317 0.1367 -0.0051
Note that the parameters $\hat{\lambda}^{AB}_{1j}$ and $\hat{\lambda}^{AB}_{2j}$ attain their maximum value for j = 3, that is, for Low returns. This implies that when the data exhibit a selling pattern, the next return is more likely to be Low than Moderate or High. On the other hand, $\hat{\lambda}^{AB}_{3j}$ is maximum for j = 1, which shows that when a buying pattern is exhibited, the next return is more likely to be High.
The main effect $\hat{\lambda}^{A}_{i}$ shows that the stock remains Neutral most of the time, and $\hat{\lambda}^{B}_{j}$ shows that the returns of the stock are mostly Moderate. The other values of the interaction parameters can be interpreted similarly.
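The test decision reported in this section can be checked by hand. For a 3 × 3 table, the hypothesis of independence sets the four free interaction parameters to zero, and the tabulated 0.95 quantile of the chi-square distribution with 4 degrees of freedom is approximately 9.488:

```latex
\[
\mathrm{df} = (3-1)(3-1) = 4,
\qquad
\chi^2_{0.95}(4) \approx 9.488,
\qquad
Z(H) = 182.634 > 9.488 .
\]
```

The reported statistic exceeds the critical value by a wide margin, so the rejection does not hinge on the choice of significance level among the conventional ones.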

Analysis of 3-dimensional table
We consider the returns for two different periods.

The analysis for returns for the period 23 Nov 2007 to 23 Nov 2009
We now divide the data given in the above table according to the intensity of volume, that is, Up and Down. This creates the 3-dimensional Table-3 for volume of transactions (Up, Down), pattern and returns. We test the seven hypotheses given in Section-3 by using the test statistic
The following table presents the value of the test statistic and the standard chi-square value used to test it at the 95% confidence level (5% level of significance).

Data between Jan 2009-May 2009
Again, we take data for the same company but for a different period. We create a similar 3-dimensional table for the period Jan 2009 to May 2009. The contingency table is given below:
0.1609 0.0248 -0.1857
Once again, the tables of parameters exhibit a similar pattern, that is, when the pattern is Sell, the return is more likely to be Low. For this data we get similar results.

Conclusion and future aspect
We analyze the relationships among pattern, returns and volume of transactions of stock market data. The data are presented in the form of contingency tables. These tables are analyzed by using log-linear modeling, and the hypotheses of interaction are tested by using the chi-square test statistic. We use these tables further to assess the strength of the relationships among the variables through the maximum likelihood estimates of the interaction parameters. The Maruti Suzuki Co. stock data clearly show that pattern and returns are independent of the volume of transactions. Also, the log-linear model parameters show that the influence of the categories of each of these variables does not depend uniformly on the categories of the other variables. The selling pattern and a low next-day return are more strongly associated than the other pairs of categories. Also, this analysis is not affected by the Turn of the year effect.
One can similarly include more variables and analyze the resulting multi-dimensional contingency tables. The categories can also be constructed by using other technical indicators, such as relative strength index, principal volume oscillator etc. These tables may be further expanded to include more categories of each variable; here, for example, we have used two categories of volume and three categories of each of the other two variables.