Utilizing Data Mining and Factor Analysis for Identifying Activity-Based Costing Cost Drivers in an Iranian Bank

Most firms want to know the real cost of the products and services they provide. Several methods and techniques for measuring this real cost have been proposed in the literature, and one of the most important and prominent is the Activity-Based Costing (ABC) method. Determining cost drivers in ABC is regarded as a very difficult and laborious task. Although various methods have been introduced for this purpose, each suffers from certain disadvantages. This paper introduces the Data Mining (DM) and Factor Analysis (FA) methods to improve the effectiveness of the ABC method. The proposed combined methods let decision makers consider several cost drivers simultaneously, according to their effects on costing. In 2010, we applied the proposed method at the Export Development Bank of Iran (EDBI), one of Iran's leading banks. The results confirmed the usefulness and effectiveness of the proposed methods for the bank's ABC system and its economic decisions.


Introduction
All the methods used for identifying, exploring, and determining the costs of products and services are called costing (Babad and Balachandran, 1993). Each of these methods meets a specific part of an organization's requirements, so the choice of a product or service costing method depends on management's information requirements and ultimate objectives. What matters today, however, is that organizations need an approach that is simple and inexpensive and that also gives managers as much information as possible about processes, activities, and services. Traditional costing methods therefore seem inappropriate (Kaplan and Anderson, 2007).
Activity-Based Costing (ABC) is a method that, in addition to the benefits mentioned above, meets a wider range of managers' requirements. The main idea behind ABC is that cost objects (e.g., products and services) are generated by activities, which require and consume resources such as labor and equipment (Demeere and Stouthuysen, 2009).
According to Lievens et al. (2003), an ABC system is a two-stage cost allocation process. In the first stage, organizational activities are identified and indirect costs are allocated to activity centers. This stage uses resource cost drivers (C.Ds), which represent each activity's usage of indirect costs. In the second stage, the costs in each activity center are assigned to cost objects using activity C.Ds, so the activity cost consumed by each product or service is determined by the activity C.Ds. These steps provide a great deal of information about processes and activities in addition to the cost of products and services (Schniederjans and Garvin, 1997).
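The two-stage allocation described above can be sketched in a few lines of Python. This is a minimal illustration under assumed inputs, not the bank's model: the resource pools, activities, services, and driver quantities below are all hypothetical, and each stage simply splits a cost pool in proportion to its driver quantities.

```python
# Hypothetical two-stage ABC allocation: resources -> activities -> services.

# Stage 1 inputs: resource cost pools and resource-driver quantities per activity.
resource_costs = {"salaries": 1000.0, "it_systems": 600.0}
resource_driver_usage = {
    "salaries": {"account_opening": 30.0, "loan_processing": 70.0},    # staff hours
    "it_systems": {"account_opening": 40.0, "loan_processing": 20.0},  # CPU hours
}

# Stage 2 inputs: activity-driver quantities consumed by each service.
activity_driver_usage = {
    "account_opening": {"deposit_service": 80.0, "loan_service": 20.0},  # accounts opened
    "loan_processing": {"loan_service": 50.0},                           # loan files
}


def allocate(pool_costs, driver_usage):
    """Split each pool's cost over its targets in proportion to driver quantities."""
    allocated = {}
    for pool, cost in pool_costs.items():
        quantities = driver_usage[pool]
        total = sum(quantities.values())
        for target, qty in quantities.items():
            allocated[target] = allocated.get(target, 0.0) + cost * qty / total
    return allocated


activity_costs = allocate(resource_costs, resource_driver_usage)  # stage 1 (resource C.Ds)
service_costs = allocate(activity_costs, activity_driver_usage)   # stage 2 (activity C.Ds)

print(activity_costs)  # {'account_opening': 700.0, 'loan_processing': 900.0}
print(service_costs)   # {'deposit_service': 560.0, 'loan_service': 1040.0}
```

Because each stage distributes the whole pool, the total cost of the services equals the total resource cost (1,600 here); the choice of driver only changes how that total is split.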
According to the findings of Bhimani and Pigott (1992), Cooper and Kaplan (1992), Gordon and Silvester (1999), Innes (1990), and Turney and Anderson (1989), the potential advantages of ABC systems include: (1) helping to identify non-value-added activities, (2) improving managers' ability to make pricing, production, and investment decisions by providing more accurate product and process costs, and (3) creating the conditions needed to improve market share and to better control and monitor costs. Despite ABC's considerable advantages and its power in providing the information managers require, it suffers from certain limitations that may cause significant deviations in the results. One of these limitations lies in the choice of cost drivers: an inaccurate selection may lead to misleading results. Therefore, in addition to introducing the Data Mining (DM) method, this paper describes the application of Factor Analysis (FA) to cost driver determination through an example.

Cost Driver Determination Methods
A cost driver is a measurable and logical variable that can be used to determine the amount of resources consumed by each activity and the amount of each activity consumed by cost objects (Zafar, 1998). What matters in every costing method is the selection of the cost driver. Unlike traditional costing methods, ABC applies multiple cost drivers in two stages, which leads to a more accurate and specific allocation of costs; proper selection of cost drivers therefore has a significant impact on the model's reliability (Khademi and Aliheidari Bioki, 2009). Several methods for cost driver determination have been proposed. In 2001, Prof. Kocakulah of the University of Southern Indiana compared the profitability of a bank's business loans under ABC and under traditional costing. In that study, the cost driver determination method was of great importance and was examined using t-tests and correlation and regression coefficients (Kocakulah and Diekmann, 2001). In 1993, Babad and Balachandran used greedy algorithms to optimize the cost driver determination process (Babad and Balachandran, 1993). Schniederjans and Garvin note that one year later, Dekin used logical relationships for this purpose, and in 1997 Schniederjans applied the AHP method to the optimized selection of cost drivers (Schniederjans and Garvin, 1997).

Cost Driver and Data Mining
DM is the process of extracting previously unknown, valid, and potentially useful information from data. It can be viewed as a set of techniques for identifying information or decision-making knowledge in data so that this knowledge can be used for decision making, forecasting, prediction, or estimation (Giudici, 2003).
Previous research relates DM to costing methods from two perspectives. First, at which stages of a costing method can DM techniques be used? Second, how can the results of a costing method be used in DM projects?
In 2003, Kim and Han published an article titled "Application of the Genetic Algorithm and Neural Networks in Activity Based Costing". Its main focus was the optimal selection of cost drivers using a genetic algorithm, with a nonlinear cost function modeled by neural networks (Kim and Han, 2003).
Regarding the second question, the application of costing methods to DM, only a limited number of studies have been conducted, some of which propose a costing model for DM projects. Masand and Shapiro (1996), Domingos (1998), Boehm (2000), and Putnam (2000) tried to measure the cost of DM activities and their profitability in areas such as CRM, but none of them proposed a model for determining costs before a data mining project is launched. In 2008, Menasalvas addressed this issue. In the proposed model, a stage was added to the DM process: at the business definition stage, the total process cost is estimated and, if the project is feasible, the associated budget is allocated (Menasalvas, 2008). A review of other studies shows that there are several methods for determining cost drivers, such as the logical method, the cause-and-effect method, the Analytic Hierarchy Process (AHP), Delphi, and regression methods. In this paper, the use of DM and FA as methods for cost driver determination is studied through a case study.

Factor Analysis
FA is a generic name for a set of multivariate statistical techniques whose goal is to define the hidden structure within data. In general, factor analysis examines the correlation structure among a number of variables by defining a set of common hidden dimensions called factors. Factor analysis enables the researcher first to identify the separate dimensions of the structure and then to determine the extent to which each variable is explained by each dimension. Once the factors and the description of each variable are determined, the two primary uses of factor analysis, data summarization and data reduction, can be achieved. In data summarization, the hidden factors are revealed and interpreted, so that the data can be described with fewer variables. Data reduction is accomplished by calculating a score for each hidden factor and substituting it for the original variables (Thompson, 2004).
About a century ago, Charles Spearman published an article on factor analysis in an American psychology journal. Because the early mathematical factor analysis methods assumed one, two, or more factors with a predetermined structure, the interpretation of factors proceeded smoothly until the mid-1930s. The main purpose of FA was to validate the obtained structure through experimental solutions and quantitative analysis of tests. Only the multiple factor analysis proposed by Thurstone in 1935 made it possible to derive the predetermined factor structures. However, all multiple FA methods, such as the centroid method, are complex and time-consuming and in some cases yield different factor structures, which led to a wave of despair, especially in efforts to interpret the factors meaningfully. This, in turn, encouraged the search for more practical and simpler ideas (Lee Daniel, 2000).
The failure to interpret factors even led to "functionalism" in some ideas and theories, such as Thurstone's, which regarded factor analysis as a scientific approach for validating hypotheses about the nature of processes (Lee Daniel, 2000).
Around 1950, the reputation of FA was damaged by its own proponents through three erroneous assumptions made by some authors. First, FA was treated as something more than an ordinary statistical approach. Second, problems were framed as abstract and implicit issues even when more appropriate experimental approaches were available. Third, the sets of variables analyzed were very large and extensive. As a result, efforts focused on extending FA far beyond its capabilities. Hotelling's principal components method offered a way out of this deadlock by making it possible to calculate a unique matrix of orthogonal factors. Although this method requires demanding mathematical calculations, the development of computer systems resolved this problem, and many researchers were soon attracted to it (Harman, 1967). The emergence of this method was a renaissance for FA, which became recognized as a structured research method in almost every scientific field.

Data Mining for Determining Cost Driver
Nowadays, one of the major issues businesses face is rapid data generation. The explosive growth of stored data is the main reason why novel technologies and automated tools are needed to transform large amounts of data into knowledge and information (Khademi and Aliheidari Bioki, 2009). Data mining, a suitable solution to this problem, is an automated process for extracting patterns that reflect the tacit knowledge hidden in data warehouses, large databases, and other large data repositories (Hand et al., 2001). Researchers have proposed different sets of steps, all of which treat DM as an end-to-end process. This process can be summarized in six major steps: (1) business problem recognition, (2) data understanding, (3) data pre-processing, (4) modeling, (5) model evaluation, and (6) model interpretation and expansion (Hand et al., 2001).
A study titled "Design of a banking service costing structure based on a DM based ABC method" was conducted in 2008 at Export Development Bank of Iran (EDBI). EDBI was established on July 10, 1991, and began its activities in September 1992. EDBI acts as Iran's Ex-Im bank and plays a pivotal role in providing financing facilities and banking services to Iranian exporters and to buyers of goods and services of Iranian origin. The algorithm illustrated in Figure 1 was applied for cost driver determination (Khademi and Aliheidari Bioki, 2009); it is based on the six-step DM process.
In this paper, in addition to introducing the application of DM in the ABC method, we apply the FA method to cost driver determination.

Factor Analysis for Determining Cost Driver
One of the outputs of FA is the factor matrix, also called the factor pattern matrix, which holds the factor loadings. Factor loadings are the correlations between factors and variables and are expressed by Formula 1:

$$a_{ij} = \frac{W_{ij}\sqrt{\lambda_i}}{S_j} \qquad (1)$$

In Formula 1, $a_{ij}$ is the loading of the j-th variable on the i-th factor, $W_{ij}$ denotes the weight of the j-th variable on the i-th factor, $S_j$ denotes the standard deviation of the j-th variable, and $\lambda_i$ is the eigenvalue of the i-th factor. A high loading indicates a strong correlation between the factor and the variable. Loadings are used when there is no significant correlation between factors and variables (Harman, 1967). The fundamental assumptions of FA are more conceptual than statistical. From a statistical point of view, deviations from normality matter only to the extent that they reduce the observed correlations. Normality is required only if a statistical test of the significance of the factors is applied, and such tests are rarely used.
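A minimal numpy sketch of the loading relation $a_{ij} = W_{ij}\sqrt{\lambda_i}/S_j$ built from the $W_{ij}$, $S_j$ and $\lambda_i$ defined above, assuming principal-component extraction on the sample covariance matrix. The function name `pca_loadings` and the simulated data (three hypothetical drivers sharing one latent factor) are illustrative, not the paper's. The arrays are stored variables by factors, so `loadings[j, i]` corresponds to $a_{ij}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)  # hypothetical common factor behind three drivers
X = np.column_stack([
    5.0 * latent + rng.normal(scale=1.0, size=n),
    2.0 * latent + rng.normal(scale=0.5, size=n),
    8.0 * latent + rng.normal(scale=2.0, size=n),
])


def pca_loadings(X):
    """Eigenvalues, eigenvectors and loadings a_ij = W_ij * sqrt(lambda_i) / S_j.

    Arrays are variables x factors, so loadings[j, i] is the loading of
    variable j on factor i.
    """
    cov = np.cov(X, rowvar=False)        # sample covariance matrix (ddof=1)
    lam, W = np.linalg.eigh(cov)         # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]        # reorder so factor 0 is the largest
    lam, W = lam[order], W[:, order]
    S = np.sqrt(np.diag(cov))            # standard deviation of each variable
    loadings = W * np.sqrt(lam) / S[:, None]
    return lam, W, loadings


lam, W, loadings = pca_loadings(X)
print(np.round(loadings[:, 0], 3))  # loadings of the three drivers on factor 1
```

Each loading equals the ordinary correlation between a variable and the corresponding component score, which is why a high value signals a strong factor-variable relationship. Eigenvector signs are arbitrary, so a whole column of loadings may come out negated.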
Beyond these statistical assumptions, the researcher must ensure that the data matrix contains sufficient correlations to justify the use of factor analysis. If the assessment reveals no substantial correlations greater than 0.3, FA is likely to be ineffective.
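The 0.3 rule of thumb can be checked directly on the correlation matrix. This is a minimal sketch, assuming the candidate cost drivers are the columns of a numpy array; the helper name `share_of_substantial_correlations` and the simulated data are hypothetical.

```python
import numpy as np


def share_of_substantial_correlations(X, threshold=0.3):
    """Fraction of variable pairs whose |correlation| exceeds the threshold."""
    R = np.corrcoef(X, rowvar=False)
    upper = np.triu_indices_from(R, k=1)  # each pair once, diagonal excluded
    return float(np.mean(np.abs(R[upper]) > threshold))


rng = np.random.default_rng(1)
latent = rng.normal(size=500)
# Simulated drivers that share one latent factor (FA is worth trying).
related = latent[:, None] * np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.5, size=(500, 3))
# Simulated independent drivers (FA would find no common structure).
unrelated = rng.normal(size=(500, 3))

print(share_of_substantial_correlations(related))
print(share_of_substantial_correlations(unrelated))
```

A share near zero, as for the independent columns, is the situation in which the text warns that FA may prove ineffective.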
There are various criteria for determining the number of factors to extract; the eigenvalue criterion, the variance criterion, and the scree plot can all be useful. The eigenvalue criterion is the simplest: most researchers, including Kaiser, retain factors whose eigenvalues exceed 1. The variance criterion is another approach, in which the decision is based on the cumulative percentage of variance explained.
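Both the eigenvalue (Kaiser) criterion and the cumulative-variance criterion can be read off the eigenvalues of the correlation matrix. In this sketch the 60% variance target is an assumed threshold chosen for illustration, not one stated in this paper, and the data are simulated; a scree plot would simply chart the same `eig` array against component number.

```python
import numpy as np


def number_of_factors(X, variance_target=0.6):
    """Apply the eigenvalue (>1) and cumulative-variance criteria.

    variance_target is an assumed threshold chosen for illustration.
    """
    R = np.corrcoef(X, rowvar=False)
    eig = np.sort(np.linalg.eigvalsh(R))[::-1]   # eigenvalues, largest first
    kaiser_k = int(np.sum(eig > 1.0))            # Kaiser criterion
    cum_share = np.cumsum(eig) / eig.sum()       # eig.sum() equals number of variables
    # First k whose cumulative explained share reaches the target.
    variance_k = int(np.searchsorted(cum_share, variance_target) + 1)
    return kaiser_k, variance_k, eig, cum_share


rng = np.random.default_rng(2)
latent = rng.normal(size=300)
X = latent[:, None] * np.array([1.0, 2.0, 3.0, 4.0]) + rng.normal(scale=0.3, size=(300, 4))
kaiser_k, variance_k, eig, cum_share = number_of_factors(X)
print(kaiser_k, variance_k, np.round(100 * cum_share, 2))
```

With one strong common factor, both criteria agree on a single factor, which mirrors the situation reported for the two cost groups in this study.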
A summary of the process of determining cost drivers using FA is shown in Figure 2. As shown in Figure 2, the researcher first specifies the original variables (candidate cost drivers). The required data are then collected from the organizational databases. Before FA is started, FA validation tests must be conducted. In this paper, these tests show that the technique is suitable and effective for determining the cost driver. In the factor extraction step, the results show that the effects of the candidate cost drivers are significant and that the effects of all variables must be considered when determining the cost driver. In the final step, the sum of the factor scores is calculated as the ultimate cost score.

A Conceptual Model for ABC in a Bank
As described earlier, the ABC method applies cost drivers in two separate steps: one to calculate activity costs and another to calculate the cost group. In the first step, the resource cost drivers are chosen, and in the second step, the activity cost drivers are determined. In this paper, data from Export Development Bank of Iran were used to develop and implement the proposed model. The proposed model for banking service costing is illustrated in Figure 3 (Khademi and Aliheidari Bioki, 2009). In this model, costs are allocated from resources to services in two steps, and the service cost at each branch is then calculated individually.

Data Analysis and Findings
Currently, at Export Development Bank of Iran (EDBI), the DM process is used to determine some of the cost drivers described in Figure 3, such as the cost of the information plan. The results of implementing the DM process in this bank show that the method makes it possible to benefit from the massive data and information stored in the organizational databases (Khademi and Aliheidari Bioki, 2009). In our study, according to expert opinion and the results of a regression model, there are linear relationships between the dependent and independent variables, where the cost drivers are the independent variables and the cost group is the dependent variable. The results of the cost driver selection process are shown in Table 1. In Table 1, two cost drivers are chosen: total records ($T_r$) and total cash records ($T_{cr}$). Total records counts every transaction recorded in each accounting document, whereas total cash records counts only the cash transactions recorded in each accounting document. In this paper, we also introduce the FA method in order to take into account the overall effects of the initially selected cost drivers for costing purposes. In this section, the results of applying FA to determine the level-1 cost drivers (hardware and software costs, and branch costs) are presented in four steps.
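Under the linear-relationship assumption stated above, a simple way to screen candidate drivers is to regress the cost group on each candidate separately and compare R². This is a hypothetical sketch, not the selection algorithm of Figure 1: the series names (`total_records`, `total_cash_records`, `staff_count`) and the six monthly figures are invented for illustration, and the cost series was constructed to be exactly linear in `total_records`.

```python
def simple_ols(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


def rank_drivers(candidates, cost):
    """Rank candidate drivers (name -> monthly series) by R^2 against the cost series."""
    fits = {name: simple_ols(series, cost) for name, series in candidates.items()}
    return sorted(fits.items(), key=lambda item: item[1][2], reverse=True)


# Hypothetical six months of data for one cost group at one branch.
candidates = {
    "total_records": [1200, 1500, 1100, 1800, 1650, 1400],
    "total_cash_records": [300, 420, 260, 500, 430, 390],
    "staff_count": [10, 10, 11, 10, 11, 10],
}
cost = [700.0, 850.0, 650.0, 1000.0, 925.0, 800.0]

for name, (a, b, r2) in rank_drivers(candidates, cost):
    print(f"{name}: cost = {a:.1f} + {b:.3f} * driver, R^2 = {r2:.3f}")
```

Ranking drivers one at a time like this ends with one or two "winners"; the FA steps that follow instead combine the effects of all candidate drivers.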
Step 1: Selection of the primary variables
The first step in the FA method is to select the primary variables (primary cost drivers). In this step, the primary variables were chosen based on experts' viewpoints; they are listed in Table 2 for each cost group. The data used for FA are from 2007; they were retrieved from the Oracle databases using SQL Server software and analyzed in SPSS for each branch at monthly intervals.
Step 2: Suitability test for FA application
The KMO and Bartlett tests were used to check the suitability of the data for FA. The results of each test for each cost group are provided in Table 3. As shown in Table 3, the KMO values for the two cost groups are 0.775 and 0.872, respectively; since both exceed the recommended minimum of 0.5, FA can be used as a data reduction technique. The Bartlett test is also significant (Sig. = .000), indicating significant correlations among the variables.
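The KMO measure and Bartlett's test of sphericity that SPSS reports can be reproduced with numpy from their standard formulas: KMO compares the sum of squared correlations with the sum of squared correlations plus squared partial correlations, and Bartlett's statistic is $-(n - 1 - (2p + 5)/6)\ln|R|$ with $p(p-1)/2$ degrees of freedom. The sketch below uses simulated data, not EDBI data, and the function names are hypothetical; the p-value would come from a chi-square distribution with `df` degrees of freedom (for example `scipy.stats.chi2.sf(chi2, df)`).

```python
import numpy as np


def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)            # partial correlations between variable pairs
    off = ~np.eye(R.shape[0], dtype=bool)      # off-diagonal mask
    r2 = np.sum(R[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return float(r2 / (r2 + p2))


def bartlett_sphericity(X):
    """Bartlett's chi-square statistic and df for H0: correlation matrix = identity."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)           # log|R|, numerically stable
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    return float(chi2), p * (p - 1) // 2


rng = np.random.default_rng(3)
latent = rng.normal(size=120)                  # hypothetical 120 branch-month observations
X = latent[:, None] * np.array([1.0, 2.0, 3.0, 4.0]) + rng.normal(scale=0.5, size=(120, 4))

print(round(kmo(X), 3))          # compare with the 0.5 threshold
print(bartlett_sphericity(X))    # (chi-square statistic, degrees of freedom)
```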
Step 3: Factor derivation
In this step, the scree plot criterion and the eigenvalue-greater-than-1 criterion were used to determine the number of factors to derive. As shown in Figure 4, the number of selected factors for each of the two cost groups is 1. The scree plot alone cannot determine the number of factors with certainty, so other criteria such as explained variance or eigenvalues can also be consulted. Because only one factor is derived for each of the two cost groups, these criteria are not discussed further; the explained variance and eigenvalues in Table 4 confirm that deriving a single factor for each group is appropriate. Table 4 shows that one factor explains 97.002 and 96.08 percent of the variance of all variables in the two cost groups, respectively.

Figure 4. Scree plot (eigenvalues by component number)
Step 4: Application of the factor scores
When the objective of FA is to replace a set of variables with a new, smaller set of variables for further analysis, factor scores must be calculated and substituted for the original values. Statistical software such as SPSS can calculate factor scores and apply them in subsequent analyses. In this paper, since the objective of the factor analysis is to replace the original variables with a new variable that combines them, the factor scores were calculated and stored in Excel for the subsequent calculation of activity costs, where they serve as the cost driver. Table 5 presents part of the input and output of the FA process.
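Factor scores for a single extracted factor can be computed as standardized principal-component scores. This is a minimal sketch, assuming principal-component extraction on the correlation matrix; SPSS offers several scoring methods and the paper does not state which one was used, so the function name, the scoring choice, and the simulated data are assumptions. For principal-component extraction, the regression scoring method yields exactly these standardized scores, and with one factor the "sum of the factor scores" described earlier reduces to this single score.

```python
import numpy as np


def first_factor_scores(X):
    """Standardized scores on the first principal-component factor.

    Returns (scores, share_of_variance). For principal-component extraction
    these equal regression-method factor scores.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # z-score each variable
    R = np.corrcoef(X, rowvar=False)
    lam, W = np.linalg.eigh(R)                          # ascending eigenvalues
    w, l1 = W[:, -1], lam[-1]                           # largest eigenpair
    if w.sum() < 0:                                     # eigenvector sign is arbitrary
        w = -w
    scores = Z @ w / np.sqrt(l1)                        # unit-variance factor scores
    return scores, l1 / R.shape[0]


rng = np.random.default_rng(4)
latent = rng.normal(size=24)     # hypothetical: 24 monthly observations for one branch
X = latent[:, None] * np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.3, size=(24, 3))
scores, share = first_factor_scores(X)
print(f"one factor explains {100 * share:.2f}% of the variance")
print(np.round(scores[:6], 3))   # first six monthly scores, e.g. for export to a spreadsheet
```

The scores have mean 0 and unit variance, so they rank observations on the combined driver; how they are turned into allocation quantities is a separate design decision not shown here.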
According to expert opinion, the proposed method has several advantages. Because it uses a large amount of data related to each cost group, it can perform better than traditional methods for selecting cost drivers. We presented the results of this study to more than 20 experts and asked them to compare the reliability of the proposed model with that of the traditional method; they confirmed that the proposed method outperforms the traditional method. Other important advantages of the proposed method are its ability to capture the effects of several cost drivers simultaneously and to evaluate several scenarios quickly and at low cost.

Conclusions and Final Remarks
In every costing method, the key task is to determine and select proper and effective cost drivers. Because ABC, unlike traditional costing methods, applies several cost drivers in two steps, it allocates costs to activities more accurately. Proper selection of cost drivers therefore significantly affects the reliability of the model. Because cost allocation plays an important role in organizational decision making, determining cost drivers through the proposed method is also very important: inattention to this problem may cause significant deviations in ABC results.
In this paper, two methods of cost driver determination have been introduced. The first is the DM process, which focuses on making optimal use of the raw data collected from daily operations and stored in a bank's databases; exploiting the hidden knowledge in these data can help improve the results of the ABC method. In this method, however, only one or two final drivers are selected from among the original cost drivers, whereas the FA method has been introduced to highlight the importance of the combined effects of all cost drivers on decision making. Selecting cost drivers through the proposed method thus makes it possible to consider several cost drivers with maximum effect for costing purposes. The advantage of the proposed method over traditional methods is that the effects of multiple cost drivers are considered simultaneously. Finally, the broad scope of application of multivariate techniques, together with the need to improve the ABC method and the commitment of firms and organizations to determining actual costs, may create opportunities for further research and development.