CHAID Analysis to Determine Socioeconomic Variables that Explain Students' Academic Success

This study aims to determine students’ characteristics that predict their academic success. The study group consisted of 4229 students studying at middle schools in Burdur. The data were collected using a questionnaire in the 2014-2015 academic year and analyzed using CHAID (Chi-squared Automatic Interaction Detection) analysis, a type of decision-tree technique. The CHAID analysis was completed with 10 branches and 18 nodes, and indicated that students’ academic success (end-year success grade) was explained by certain factors including mothers’ and fathers’ educational status, going to an educational center, parents’ employment status and parents’ monthly income. The most important parental characteristic for a student’s end-year success grade was found to be the father’s educational status.


Introduction
With the increasing need for information in a changing world, it has become important for countries to train highly qualified students. Failure has always been regarded as one of the most significant problems of education systems, as it hinders meeting the need for a qualified labor force while wasting the country's resources. Therefore, multilateral studies have been conducted around the world to minimize student failure at school and to determine the factors that affect student success. These studies indicate that students' success is affected by many factors and that achieving educational objectives at the highest level depends on the compatibility of these factors with one another. When even one factor is below acceptable levels, the educational process is negatively affected. Students' success depends not only on school-related factors such as teaching methods or educational resources, or on individual characteristics such as gender, age or intelligence, but also on the socioeconomic status of the student's family [1][2][3][4][5][6][7][8][9][10][11].
Although low success does not always result from poor socioeconomic characteristics, the strong effect of such characteristics on students' success has been emphasized since the mid-1960s. Coleman et al., known for their studies on this subject, revealed that students' academic success is determined by their families' socioeconomic characteristics and by factors related to the social environment outside school [8]. Ichado [12] indicated that school performance is mostly affected by students' environment. Croll (2004) stated that students with a more advantaged socioeconomic background obtain better results and continue their education for a longer time. Studies conducted by the OECD also indicate that families' socioeconomic status and students' educational results are strongly related and that students' success increases as families' socioeconomic status improves [4,13]. Students with a more advantaged background score 38 points higher than others, which corresponds to approximately one academic year [13]. The size of this effect varies with the country's level of development. The effect of families' socioeconomic characteristics on students' success increases, and the effect of school decreases, as the country's level of development increases. Conversely, the lower the country's level of development, the smaller the effect of families' socioeconomic characteristics on students' success and the greater the effect of the school [14]. Although one-fifth of the variation in students' success in OECD countries is related to their socioeconomic variables, there are great differences from country to country. For example, socioeconomic level explains 23% of students' academic success in Germany, while it explains only 12% in Japan.
Some of the studies analyzing the relationship between students' academic success and families' socioeconomic status indicate that parents' educational status and family income are the long-term determinants of academic success. Therefore, it is recommended that policies be developed which will affect these factors in order to increase academic success in the long term [15]. Harmon and Walker [16], Chevalier and Lanot [2], and Brunello and Checci [17] stated that although family income is closely related to academic success, families' education level is the most important factor in this subject. As families with higher educational status can provide better academic support [18] and economic and social resources to aid their children's success [19], a family's educational status is considered to be the most significant reason for the differences in students' success [2,3]. PISA reports also indicated that students whose mothers are high school graduates are more successful than others. It was reported that in OECD countries, students whose parents are university graduates obtain better results than those whose parents are not university graduates, and this effect is quite distinctive in some countries [17]. Kean and Tsai [7] also highlighted the significant effect of parental education level on students' academic success.
Some of the studies analyzing the relationship between socioeconomic characteristics and success indicate that family income is a strong and important determinant of students' success. For example, Blanden and Gregg [20] emphasized that family income affects students' educational gains. Chevalier and Lanot [2] reported that children of poor families show a lower success rate, and that the great differences in academic success arise from families' economic status. Tansel [21] stated that higher family income increases academic success at every level of education. According to Chiu [3], there is a strong relationship between a country's income distribution and students' success. In an analysis of past achievement, students in rich countries had higher scores in science. Students in countries with a more equal income distribution are more successful than those in countries with an unbalanced income distribution. Children of families with a good financial status are more successful than those of families with a poor financial status, although the relationship between families' economic status and students' academic success may become more complex depending on countries' social policies. Students whose families were wealthier had higher grades than those whose families were poorer [26].
In addition to these factors, demographic factors such as family size, number of siblings and birth order also affect students' success. As the number of family members increases, the children may be provided with fewer opportunities at home and their educational success may worsen [5,23,24]. It was found that academic success decreases as the number of siblings increases. The highest success is observed among students with only one sibling, and the lowest success is observed among students with seven or more siblings [5]. The number of siblings is also a determining factor in access to education. Since the existing resources are divided among the children, a high number of siblings also negatively affects the opportunity to go to school. Birth order is another factor that affects students' success, particularly in developing countries where older children are expected to help their parents inside and outside the home and to contribute to the domestic economy when necessary. Therefore, there is a possibility for older children to be more successful and access a higher education level. Absence of one of the parents, the environment where the family lives, and the occupation of the family can also affect students' success [25]. Astone and McLanahan (1991) reported that children living with a single parent or foster parents are provided with less support and receive less help in their school lives than children living with both parents [25].
Study results reveal that students' academic success can be predicted from the socioeconomic variables in their lives. However, this effect has been shown to vary with the level of education, the social policies and development level of a country, and time [26,27]. For example, socioeconomic status has a stronger effect on students' success in Turkey than in other countries with similar characteristics. In Turkey, 19% of the differences in students' success are directly explained by students' economic and sociocultural status. The success gap between students in the lowest and the highest quartiles of the economic and sociocultural status index is greater in Turkey than in the OECD countries. While the success gap is 88 points on average in OECD countries, it is 92 points in Turkey, although the difference between students' grades is smaller in Turkey than in OECD countries. This implies that, although student grades differ less in Turkey, the differences are more strongly related to students' socioeconomic background [13]. Determining the familial characteristics that affect students' academic success in Turkey will provide data for micro- and macro-level policies to be developed to reduce inequality and increase quality in education. The literature includes many studies conducted in Turkey on this subject. These studies mainly analyze the relationship of students' success in language, mathematics and science with socioeconomic variables. They were usually conducted with data from PISA and TIMSS [4,29,30,32,33], with scores from exams such as the Student Achievement Assessment Test (ÖBBS), Student Selection Examination (ÖSS), High School Entrance Examination (OKS), and Placement Test (SBS) [5,28,34,35], or with data obtained from students of the same age or grade [36].
The present study aims to determine the socioeconomic variables explaining students' academic success as indicated by end-year success grades. The end-year success grade is the arithmetic mean of a student's end-year grades in all courses in the curriculum [37]. This study did not limit students' success to a particular field. The study was conducted using data obtained from students at all grades of middle school (5th, 6th, 7th and 8th). Using the CHAID decision-tree method to analyze the effect of socioeconomic variables on success provides an advantage in that it indicates differences between the groups. The study was conducted in Burdur, a city known for its success rates in national exams. Therefore, this study is expected to be a guide for both educators and policy implementers and to contribute to the relevant literature.

Materials and Methods
This study aimed to determine the socioeconomic variables explaining students' academic success and employed a relational screening model.

Population and Sampling
The study data were obtained from 66 of the 78 middle schools in Burdur (data of 12 schools could not be accessed) in the 2014-2015 academic year. The study group attending these schools included 4229 students. Of the schools accessed in the study, 13 were in the city center, 23 were in district centers, and 30 were in villages. Of the schools that could not be accessed, 7 were in villages, and 5 were in city and district centers. The researchers assumed that the data that could not be accessed were randomly distributed, since the villages that could not be accessed were located in different districts and at least one school was included in the study sample. In addition, because the village schools that could not be accessed had student populations of fewer than 50, their absence was considered not to change the study findings. Considering that more than 90% of the schools in the city and district centers were accessed and that plenty of data were used in the study, the researchers decided to continue the study. During the study, the schools were divided into three categories according to the number of students: schools with fewer than 100 students, 100 to 200 students, and more than 200 students. Afterwards, one class from each grade in the schools with fewer than 100 students, two classes from each grade in the schools with 100 to 200 students, and three classes from each grade in the schools with more than 200 students were selected randomly. Table 1 shows the distribution of schools and students according to type of locality. Table 1 indicates that the study included 930 students from villages, 1462 students from districts, and 1837 students from the city center of Burdur. Only the students who correctly answered all questions were taken into consideration, so the analysis was conducted with 3849 students.
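The class-selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary layout of a school and the example class labels are hypothetical and do not come from the study; only the size thresholds and the number of classes per grade are taken from the text.

```python
import random

def classes_per_grade(n_students):
    """Number of classes sampled per grade, following the size rule in the text."""
    if n_students < 100:
        return 1
    if n_students <= 200:
        return 2
    return 3

def sample_classes(school, rng):
    """Randomly pick classes for each grade of one school.

    `school` is a hypothetical dict:
    {"n_students": int, "classes": {grade: [class labels]}}.
    If a grade has fewer classes than the rule asks for, all are taken.
    """
    k = classes_per_grade(school["n_students"])
    return {
        grade: rng.sample(labels, min(k, len(labels)))
        for grade, labels in school["classes"].items()
    }

rng = random.Random(0)  # fixed seed so the sketch is reproducible
school = {  # hypothetical school with 150 students
    "n_students": 150,
    "classes": {5: ["5A", "5B", "5C"], 6: ["6A", "6B"], 7: ["7A", "7B"], 8: ["8A"]},
}
picked = sample_classes(school, rng)
```

For this hypothetical school of 150 students, two classes are drawn from each grade where two or more exist, and the single 8th-grade class is taken as is.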

Data Collection Tools and Process
The data were collected using a personal information questionnaire for students. The questionnaire included questions on students' gender, family income, parents' education level and employment status, the people students lived with, number of siblings, whether they attended a training center or course or received private courses, the duration and price of the courses, and the place of residence. In the schools in the city center, the questionnaires were administered by the researcher, who explained the questionnaire to the students. For the schools in villages and districts, the school administrators were informed by phone about the questionnaire. The questionnaires were left for collection at these schools' document divisions in the Directorate of National Education. School administrators were asked to administer the questionnaires to one class at each grade and return the completed questionnaires to the document divisions for their school. The researcher reminded the school administrators by phone in order to increase the response rate, but 12 schools did not respond.

Data Analysis
The variables that explain the end-year success grades of the students were determined using CHAID, a decision-tree data-mining method, based on the students' answers. The success grades were treated as a categorical dependent variable with three categories: being granted the certificate of merit, being granted the certificate of appreciation, and not being granted any certificate of achievement.
A decision-tree method offers a number of advantages over more commonly used statistical techniques [38]. For instance, it is nonparametric and nonlinear. These features mean that missing data are not a problem, and that the analysis does not require normality and homogeneity assumptions of the data, as decision trees have a strong iteration algorithm. Linear relations between variables are neither assumed nor necessary [39]. Regression analysis mostly uses intra-group variability while establishing the model. Therefore, it may be weak in reflecting the differences between groups. Assael (1970) reported that a decision-tree method can be used to overcome this problem. A decision tree is a predictive model that looks like a tree. Every branch of this tree is a category and indicates the test's result. Its leaves are parts of the dataset which belong to these categories and represent the classes [40].
Various methods exist to make categorization and segmentation in decision trees. Three algorithms can be used: CHAID, C&RT (Classification and Regression Trees) and QUEST (Quick, Unbiased, Efficient Statistical Tree). All of these algorithms are mainly used to outline the relationships between variables and statistically significant structures and to divide the data sets into subgroups using decision rules [38,41]. CHAID's advantages are that its output is highly visual and easy to interpret with multiple trees [42].
In CHAID analysis, the categorical data are divided into subgroups and their effects on the dependent variable are tested. For example, suppose the decision to reply or not to reply to a mail is the dependent variable, and age and income are independent variables. The analysis can then reveal the effect on the dependent variable of individuals between the ages of 18 and 24 who have an income of more than X TL [43]. The CHAID algorithm was first defined by Kass (1980) for categorical dependent variables. However, it can be applied to continuous or discrete dependent and independent variables. In this analysis, the variable that best explains the dependent variable is selected by comparing all independent variables, and the dataset is categorized into subgroups according to the selected independent variable. These subgroups continue to generate more subgroups for significant variables [44], so the relationships between the subgroups can also be seen. For each explanatory variable, categories that do not differ significantly are merged, and the Bonferroni p values and χ² statistics are calculated by creating contingency tables according to the dependent variable. The explanatory variables are compared with each other, and the data are separated into subgroups according to the categories of the explanatory variable with the lowest Bonferroni p value, which gives the best division. Within each subgroup, the remaining explanatory variables are compared again, and the separation is performed according to the best explanatory variable. CHAID analysis shows the most significant explanatory variables and their interactions with the dependent variable via a tree diagram using the chi-square statistic, the Bonferroni method and the category merger algorithm [44].
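The split-selection step described above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the procedure used in the study: category merging is omitted, the Bonferroni adjustment simply multiplies each p value by the number of predictors compared (Kass's CHAID bases the multiplier on the number of possible category mergers), and the function names and toy records are hypothetical.

```python
import math
from collections import Counter

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:  # closed form for even degrees of freedom
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    total = math.erfc(math.sqrt(h))  # closed form for odd degrees of freedom
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (i - 0.5) / math.gamma(i + 0.5)
    return total

def chi2_test(pairs):
    """Pearson chi-square test for a list of (predictor, outcome) pairs."""
    cells = Counter(pairs)
    rows = Counter(p for p, _ in pairs)
    cols = Counter(o for _, o in pairs)
    n = len(pairs)
    stat = 0.0
    for r, r_n in rows.items():
        for c, c_n in cols.items():
            expected = r_n * c_n / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    p = chi2_sf(stat, df) if df > 0 else 1.0
    return stat, df, p

def best_split(records, predictors, outcome):
    """Return the predictor with the lowest adjusted p value (simplified)."""
    results = []
    for name in predictors:
        stat, df, p = chi2_test([(r[name], r[outcome]) for r in records])
        adjusted = min(1.0, p * len(predictors))  # placeholder Bonferroni factor
        results.append((adjusted, name, stat, df))
    adjusted, name, stat, df = min(results)
    return name, stat, df, adjusted

# Toy data, invented for illustration only (not the study's data).
records = []
for father, merit, none in [("middle school or below", 2, 8), ("university", 8, 2)]:
    for course in ("yes", "no"):
        records += [{"father_edu": father, "course": course, "certificate": "merit"}] * merit
        records += [{"father_edu": father, "course": course, "certificate": "none"}] * none

name, stat, df, p_adj = best_split(records, ["father_edu", "course"], "certificate")
```

In this toy example the certificate outcome depends on the father's education but not on course attendance, so `father_edu` is chosen as the first split. A full CHAID implementation would then repeat the same comparison within each resulting subgroup.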
The algorithm of decision-tree methods consists of three steps. The first step is to "create the tree" based on the principle of creating subgroups that are as homogeneous as possible. The second step, selecting the subtree that best explains the dependent variable among the resulting subtrees, resembles pruning a tree. The process is completed with the "selection of the most appropriate tree" after the pruning step [45].

Findings
Student characteristics that affect their end-year success grades (EYSG) were analyzed using CHAID analysis. First, the dependent and independent variables involved in the study to create the model are indicated, and then the findings obtained from the CHAID analysis are shown.

Findings on the Study Model
Table 2 shows the dependent and independent variables involved in the study to create the model, as well as the findings obtained.

Findings on CHAID Analysis
In this study, the end-year success grades (EYSG) of the students were categorized as being granted the certificate of merit, being granted the certificate of appreciation, and not being granted any certificate of achievement. The students' characteristics that affect their EYSGs were analyzed using CHAID analysis. Figure 1 shows the CHAID analysis results.
Figure 1 shows that the analysis was completed with 10 branches and 18 nodes. Father's education level, going to any training center, study center or course, mother's employment, father's employment, mother's education level, and monthly income were found to explain the students' EYSGs. Among the family characteristics, fathers' education level had the most significant effect on the students' EYSGs (χ² = 350.324, p = .000, df = 4). This variable generated three nodes (node 1, node 2, and node 3): illiterate/primary school/middle school, high school, and university.
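As a reminder of how such a statistic is formed, the Pearson chi-square for a contingency table with r predictor categories and c outcome categories can be written as below. The notation (O for observed counts, E for expected counts, N for the number of students in the node) is introduced here for illustration and is not taken from the study.

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{O_{i\cdot}\, O_{\cdot j}}{N},
\qquad
df = (r - 1)(c - 1)
```

With three father's-education nodes and three certificate categories, this gives df = (3 - 1)(3 - 1) = 4, which agrees with the degrees of freedom reported for the first split.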
Among the students whose fathers had an educational status of middle school or below, the rate of being granted the certificate of appreciation was 37% and the rate of being granted the certificate of merit was 51.7%. The rate of those who were not granted any certificate of achievement was 11.3%. Among the students whose fathers had a high school education, the rate of being granted the certificate of appreciation fell to 26.7% and the rate of being granted no certificate of achievement fell to 4.8%, while the rate of being granted the certificate of merit rose to 68.5%. The lowest rates of being granted the certificate of appreciation (9.9%) and being granted no certificate of achievement (1.9%) were found when students' fathers were university graduates. These students were granted the certificate of merit at the rate of 88.1%. The branches and the nodes on these branches are indicated separately and in detail. Figure 2 shows the nodes 1, 4, 5, 11 and 12 on the first, fourth and seventh branches and the nodes 2, 6, 7, 13, 15 and 16 on the second, fifth and eighth branches that emerged as a result of the CHAID analysis.
The first node in Figure 2 shows that going to any training center, study center or course was the variable that affected the EYSGs of the students whose fathers had a middle school education or below (χ² = 80.775, p = .000, df = 2). While the rate of being granted the certificate of merit was 46.5% among the students who were not going to a training center, study center or course, this rate rose to 65.9% among those who were. For the students who were not going to a training center, study center or course, the rate of being granted the certificate of appreciation was 39.4% and the rate of being granted no certificate of achievement was 14.1%. For the students who were going to a training center, study center or course, the rate of being granted the certificate of appreciation fell to 30.4% and the rate of being granted no certificate of achievement fell to 3.6%. Figure 2 indicates that the CHAID analysis continued for the students who were not going to any training center, study center or course. This analysis implies that the EYSGs of the students who were not going to any training center, study center or course and whose fathers had an education level of middle school or below were affected by their mother's education level (χ² = 14.767, p = 0.019, df = 2). This variable was divided into two nodes: middle school graduates at most (illiterate/primary school/middle school) and high school graduates at least (high school/university/other). Node 11 indicates that among the students who were not going to any training center, study center or course, the rate of being granted the certificate of merit was 62.4% if their mothers had an education level of high school or above and 45.1% if their mothers had an education level of middle school or below.
The second node in Figure 2 shows that fathers' employment had the most significant effect on the EYSGs of the students whose fathers were high school graduates (χ² = 23.84, p = 0.00, df = 2). The variable of fathers' employment was completed with two nodes: tradesman/self-employed/retired/civil servant and worker/unemployed/other. Among the students whose fathers were tradesmen, self-employed, retired or civil servants, the rate of being granted the certificate of merit was 74.2%, the rate of being granted the certificate of appreciation was 20.7%, and the rate of being granted no certificate of achievement was 5.1%. Among the students whose fathers were workers or unemployed, the rate of being granted the certificate of merit fell to 60.4%, the rate of being granted the certificate of appreciation rose to 35.3%, and the rate of being granted no certificate of achievement was 4.3%. Families' monthly income had the most significant effect on the EYSGs of the students whose fathers were tradesmen, civil servants, retired or self-employed and had a high school education (χ² = 16.076, p = 0.001, df = 2). Among these students, the rate of being granted the certificate of merit was 80.8% if their families' monthly income was 2001 TL or higher and 67.8% if their families' monthly income was 2000 TL or lower. Node 10 shows that going to any training center, study center or course affected the EYSGs of the students whose fathers were workers, unemployed or "other" (χ² = 23.84, p = 0.00, df = 2). While the rate of being granted the certificate of merit was 56.3% among the students who were not going to these kinds of institutions, it was 69.2% among the students who were.
Figure 3 shows the nodes 3, 8, 9, 10, 17, and 18 on the third, sixth and tenth branches that emerged as a result of the CHAID analysis.
Node 3 in Figure 3 shows that mothers' employment had the most significant effect on the EYSGs of the students whose fathers were university graduates (χ² = 35.924, p = 0.00, df = 4). This variable created three nodes: unemployed, worker/cleaning lady/retired, and civil servant/other. Among the students whose fathers were university graduates, the rate of being granted the certificate of merit was 85.9% if their mothers were unemployed, and 67.9% if their mothers were retired, workers or cleaning ladies. This rate was 94.2% among the students whose mothers were civil servants or had other jobs.
Node 8 shows that the EYSGs of the students whose fathers were university graduates and whose mothers were not working depended on the family's monthly income (χ² = 10.731, p = 0.014, df = 2). The rate of being granted the certificate of merit was 88.4% among the students whose families' monthly income was 2001 TL or higher. This rate fell to 75.3% among the students whose families' monthly income was 2000 TL or lower.
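A decision tree of this kind can be read as a set of nested rules. The sketch below encodes only the branch for students whose fathers are university graduates, using the certificate-of-merit rates reported in the text. The nested dictionary layout, the key names and the `merit_rate` helper are hypothetical and serve only to show how the nodes are read.

```python
# Certificate-of-merit rates (%) reported in the text for students whose
# fathers are university graduates. The nested layout is illustrative only.
UNIVERSITY_BRANCH = {
    "merit_rate": 88.1,
    "split_on": "mother_employment",
    "children": {
        "unemployed": {
            "merit_rate": 85.9,
            "split_on": "monthly_income_tl",
            "children": {
                "2001 or higher": {"merit_rate": 88.4},
                "2000 or lower": {"merit_rate": 75.3},
            },
        },
        "worker/cleaning lady/retired": {"merit_rate": 67.9},
        "civil servant/other": {"merit_rate": 94.2},
    },
}

def merit_rate(mother_employment, monthly_income_tl=None):
    """Return the merit rate of the deepest node matching the given profile."""
    node = UNIVERSITY_BRANCH["children"][mother_employment]
    # Only the "unemployed" node is split further, by monthly income.
    if "children" in node and monthly_income_tl is not None:
        key = "2001 or higher" if monthly_income_tl >= 2001 else "2000 or lower"
        node = node["children"][key]
    return node["merit_rate"]

rate_high_income = merit_rate("unemployed", 2500)
rate_low_income = merit_rate("unemployed", 1500)
```

Reading the tree this way makes the interaction visible: monthly income matters only within the subgroup whose mothers are not working, because that is the only node of this branch that CHAID split further.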

Results and Discussion
The study results showed that students' end-year success grades are affected by fathers' education level, mothers' education level, going to any training center or attending a course, employment of the mother or father, and the family's monthly income. However, the most important parental characteristic for the end-year success grade of the students was found to be the father's education level.
According to the research, the father's education level had the most significant effect on the students' end-year grades. Fathers of the students who had the highest end-year success grades were at least university graduates, and fathers of those who had the lowest end-year success grades were middle school graduates at most. Some studies consider family educational status the most significant socioeconomic indicator of students' academic achievement [2,18,50,51,52,53,54,55,56]. While some of these studies suggest that the father's educational status is more significant, others consider the mother's educational status more important. In countries with high income levels, maternal educational status is deemed more significant [21]. Like many studies conducted in Turkey, this study indicated that the father's educational status is more important for academic success [4,23,32,46,47]. Turkey's social structure is considered to influence this result. In Turkey, fathers' educational status is almost always higher than that of mothers [4]. In addition, patriarchal family structures dominate many regions of Turkey. Fathers are the heads of families and make many decisions. The income of the family and how it is spent are determined by the father. It is fair to say that there is a correlation between paternal educational status and family income, since the employment rate for women is only 26.7% in Turkey [49]. Well-educated men generally have better jobs, earn more and marry women who are also well educated. Thus, we can assume that the social status of Turkish families is determined by the characteristics of fathers. The environments where students study (at home or elsewhere) differ depending on the social status of their families. In addition, studies suggest that the higher the educational status of the head of a family, the higher the educational expenses. Heads of families who are university graduates spend three times as much money as those who are high school graduates. In turn, high school graduates spend twice as much as those who only completed middle school [50]. Considering these factors, the father's educational status is considered to be the primary factor in the academic success of Turkish students.
The CHAID analysis showed that going to any training center or attending a course that supports the student's education was the factor that explained the end-year grades of the students whose fathers were middle school graduates at most. Students who were attending a course had higher end-year grades than those who were not. The same variable also explained the end-year grades of the students whose fathers were high school graduates and were unemployed or working at the minimum wage. In both groups, students who were going to such institutions had higher end-year grades than those who were not. Students whose fathers and mothers have low educational status receive limited academic support from their parents. It is fair to say that this support decreases or becomes poorer in quality as students advance to higher levels of schooling. Institutions such as training centers or courses are opened to support schools and to help students improve in the subjects in which they are struggling. It can be concluded that these institutions provided the students with academic support that could not be provided by their parents and thereby helped them increase their success. The literature includes some studies that support the present study, indicating that students who go to training centers are more successful than those who do not. For example, Tomul and Savaşçı [36] stated that the most important variable in explaining academic achievement is attending a course and/or taking private lessons. Some studies argue that these institutions allow students to learn much more [58,59].
The students who were not going to any training center or attending a course, whose fathers were middle school graduates at most and whose mothers had a higher education level were observed to have higher end-year grades. Likewise, the PISA results show that the students whose mothers were at least high school graduates had more success than other students in all OECD countries [22]. According to some studies, mothers' education level is an indicator of students' success or failure [60][61][62][63] and plays a more decisive role than fathers' education level [62]. It is argued that mothers with a higher education level are better able to provide their children with the academic and social support that is important for their educational success [18]. They can guide their children in their lessons if need be [5]. Therefore, the students who were not going to any training centers or courses and whose fathers had a lower education level might have received from their mothers the academic support that they could not receive from their fathers or from institutions such as training centers. Turkish traditions attribute the most important role to mothers in meeting all needs of children and preparing them for life from babyhood. Mothers spend more time with their children than fathers. This may be a reflection of cultural life and traditional structure.
The CHAID analysis indicated that the socioeconomic factor that explained the end-year grades of the students whose fathers were high school graduates was the father's employment status. For those whose fathers were university graduates, the end-year grades were explained by the mother's employment status. The students whose fathers were high school graduates and civil servants, retired or self-employed had higher end-year grades than those whose fathers were unemployed or workers employed for the minimum wage. The students whose fathers were university graduates and whose mothers were civil servants had higher end-year grades than those whose mothers were unemployed, cleaning ladies or working for the minimum wage. These findings are in parallel with the study results of Hoffman and Youngblade [64], who reported that poorly paid, stressful jobs with long hours can jeopardize the quality of parenting by their demands on parents' time, energy, and attention [18]. Similarly, Memon et al. [65] suggested that a significant relationship is present between parents' occupational status and the academic achievement of students. Dinçer and Kolaşin [4] identified parents' employment as a determinant of academic success. Students whose fathers were employed obtained 11 to 14 more points, and those whose mothers were employed obtained 13 to 18 more points, on all tests. Ainley et al. [66] and Zabulionis [62] also reported that students whose parents were employed obtained higher grades. These findings imply that if parents' occupation is based on unqualified labor or yields a low income, this reduces the possibility for the students to gain academic success. In other words, even if the parents are employed, insufficient monthly income might have prevented the parents from organizing qualified educational environments for their children. It can be concluded that parents' employment alone is not sufficient for students' success; parents should work in jobs that generate regular and permanent income. In line with this, another finding of the research indicates that the socioeconomic factor that explains the end-year grades of students whose fathers were high school graduates and civil servants, retired or self-employed, and of students whose fathers were university graduates and whose mothers were unemployed, is the family's income. The students whose families' average monthly income was higher than 2000 TL had higher end-year grades than those whose families' average monthly income was lower than 2000 TL. High family income increases the possibility of making more investments in students' education. This factor allows parents to organize their children's educational environments and to provide them with different opportunities. A similar study conducted by Lacour and Tissington [67] found that poverty directly affects academic achievement due to the lack of resources available for students' success. In addition, financial problems increase conflicts within the family and prevent parents from attaching the required importance to their children's education [2]. Considering these findings, it is fair to say that students' success increases as family income increases. The literature also indicates that students whose families have higher monthly income are more successful [5,31,48,49,65,69]. Some studies reported that a permanent and high income is important. This income enables both female and male students to have higher academic success at all levels of education [21]. However, Chevalier and Lanot [2] stated that there was no clear finding indicating that family income affects students' educational outcomes. In some OECD countries, financial status does not pose an obstacle to success. For example, students whose families have a very poor financial status may show high success in Finland or Japan [22].
In conclusion, the present study revealed that parents have a significant role in the determination of children's educational identity, and this role has a broad area of influence. However, family background such as income, social class and residence varies depending on the family's education level. Therefore, parents' education level can be regarded as the long-term determinant of academic success. Policies that affect these factors should be developed to increase academic success in the long term.

Figure 2. Variables Explaining the EYSG of the Students Whose Fathers' Education Level is below High School

Figure 3. Variables Affecting the EYSG of the Students Whose Fathers are University Graduates

Table 1. Students According to Type of Locality

Table 2. Dependent and Independent Variables Involved in the Study to Create the Model
*Illiterate: someone who has no school experience and is thus unable to read and write.