Investigating the Predictability of Academic Participation and Performance on First-Year Retention of College Freshmen

Abstract
First-year retention of freshmen is one of the most important indicators of university education quality. This study investigates the predictive capacity of academic integration for college students' first-year retention, with the aim of quantifying how academic participation and performance affect the probability of retention. We analyzed data from a cohort of 738 full-time freshmen at a Midwest state university. Academic factors, namely students' participation and performance, were combined with student demographics and financial aid as predictor variables for first-year retention in a logistic regression model. The results show that academic participation, indicated by credit hours completed, and academic performance, measured by cumulative GPA, significantly predict first-year retention. As a student's earned credit hours per semester increase by one, the probability of returning to the university increases by 6.6 percentage points. An increase of 0.1 in GPA increases the odds of returning by 14.5% and the probability of returning by 2.9 percentage points. These two predictor variables explain 41.4% of the outcome variation in first-year retention. The specific probability predictions revealed by this study offer university administrators and faculty more measurable and accurate information for decisions aimed at improving freshman retention. Other implications are also discussed, including the suggestion that broad efforts be enhanced to create a more supportive environment that focuses primarily on student academic success.

Introduction
Because the retention rate is an essential indicator of university performance, retaining freshmen is vital for many institutions. It is even more important for institutions facing declines in the projected number of high school graduates who matriculate. Among public four-year institutions, attrition remains highest between the freshman and sophomore years. First-year retention of freshmen is one of the most important indicators of university education quality. It is therefore imperative to track entering freshmen, understand the causes of attrition, and help students persist in college. For decades, theories and research have addressed the factors that affect first-year retention (e.g., Bean & Eaton, 2000; Ishler & Upcraft, 2005; Tinto, 1996, 2017). Academic and social integration has become the mainstream theoretical framework for studying student retention, and academic factors have been shown to be essential and significant predictors of retention (e.g., Nishimoto & Hagedorn, 2003; Tinto, 1996; Wolf-Wendel, Ward, & Kinzie, 2009). However, most research on academic factors has relied on correlational approaches that yield general or even vague conclusions about the relationship between student academic integration and retention. Moreover, the academic and social integration model can be insufficient for gaining further clarity on student retention (Davidson & Wilson, 2013).
This study investigates the specific predictive capacity of academic integration for college students' first-year retention. Its purpose is to quantify the probability impact and predictability of academic participation and performance by applying statistical models to freshman data. Specifically, the study develops a logistic regression model to determine the proportion of variance in freshmen's first-year retention that can be accounted for by their first-year academic participation and performance. It also examines the quantitative predictability of freshmen's academic success for their retention at a public university located in a small city in the Midwest. The results not only add to the body of research on the impact of academic integration on retention but, more importantly, provide data-driven evidence for university retention administrators as they gauge the quantified contribution of academic success to retention and balance resources and efforts to promote student persistence between academic areas and other factors.
Tinto's integration model (1996, 2017) provides theoretical insights into the academic factors related to the design of this study. The model explains the aspects and processes that influence an individual's decision to leave the university and how these processes interact to ultimately produce attrition. The types of leaving behavior identified include academic failure, permanent dropout, temporary dropout, and transfer. Tinto asserts that dropout occurs because the student is insufficiently integrated into different aspects of university academic participation and performance.
Several studies have examined how academic participation and performance influence students' retention and degree completion (e.g., Friedman & Mandel, 2011; Saele, Dahl, Sorlie, & Friborg, 2017). When college freshmen focus their attention on social activities, their lack of interest in academic participation and study reduces academic engagement, which is problematic for retention and graduation (as cited in Dirk, 2010). Credit hours enrolled and completed remain a reliable indication of academic participation and engagement. Completing fewer than 12 credit hours per semester decreases the likelihood that students will stay in college and earn their degree (Complete College America, 2013). Full-time enrollment and completion of 30 or more college-level credits explained more variance in student retention than the factors of autonomy, competence, relatedness, external pull, and external support combined (Hafer, Gibson, York, Fiester, & Tsemunhu, 2018).
In addition to credit hours completed, researchers have investigated how the quality dimension of academic performance affects student retention. Students with a higher grade point average (GPA) were more likely to be retained and to graduate (Fryer & Greenstone, 2010; Tinto, 1993). A higher college GPA, coupled with rigorous high school coursework and fewer remedial courses, was related to higher student retention (Warburton, Bugarin, & Nunez, 2001). Hafer et al. (2018) specifically found that GPA has a direct, moderate effect on student retention. For transfer students, Zhai and Newcomb (2000) reported that a student's GPA is the best indicator of expected academic performance after transfer, while the student's post-transfer GPA is the single most important measure associated with retention.
A couple of recent studies investigated the predictability of the academic and social factors on college student retention. Williams, Smiley, Davis, and Lamb (2018) analyzed the predictability of cognitive and non-cognitive factors on the retention rate among freshmen and concluded that high school GPA, ACT/SAT scores, college first-year GPA, academic status, gender, age, residence status, and financial status predict retention. In another study, the interrelations and predictability among socioeconomic status, psychosocial, and student success variables were tested in student retention models. It was found that psychosocial variables, which include problem solving, academic efficacy, connectedness to professors and college, were often good mediating variables when predicting student GPA that leads to eventual retention (Sass, Castro-Villarreal, Wilkerson, Guerra, & Sullivan, 2018).
Consistently, the research demonstrates that academic performance has a strong influence on college freshmen's first-year retention. Student retention seems to be largely an academic issue: to persist, a student requires better academic preparation, participation, and performance (Buckley & Lafleur, 1991). However, academic factors are not the only construct that affects student retention. Social, psychosocial, demographic, and environmental factors also contribute (Astin, 1993; Sass et al., 2018; Tinto, 2017). It is important to understand the capacity, or more specifically the proportion, of the measurable academic factors in predicting student retention compared with the other constructs. This study was designed to examine the predictability of academic factors for college freshmen's first-year retention. Specifically, it investigates the quantified predictability of students' academic participation, indicated by credit hours completed, and their performance, measured by college GPA, for first-year retention.

Subjects
The subjects of this study were a cohort of 799 full-time freshmen enrolled in the fall semester at a teaching-oriented Midwest state university in the United States. Of the 799 students, all had defined values for age and gender; 752 (94.1%) had valid ACT scores; 767 (96.0%) had high school GPA values; 790 (98.9%) had an indicator of returning or non-returning status for the following fall semester; 787 (98.5%) had valid data on credit hours completed; and 786 (98.4%) had a valid cumulative GPA. Table 1 presents the categorized information on the cohort's demographics, financial aid, college cumulative GPA, and retention. After the pre-analysis data screening, a total of 738 cases remained for model building and validation. These cases were randomly split into two approximately equal groups, the model analysis sample and the holdout sample, so that about 50% of the students with valid values on the relevant variables were used for model development. Of the 349 cases in the analysis sample selected for model development, 310 cases were included in the analysis. The remaining 36 cases were not included because of missing data on one or more independent variables. There were 389 cases in the holdout sample for model validation.

Data Collection and Process
Data on the cohort's demographics, financial aid, college credit hours completed, college cumulative GPA, and return status in the following fall semester were collected from the university database. Financial aid data included the amount a student received in each of the fall and spring semesters in three categories: grants, loans, and scholarships. The two-semester average for each of the three types of financial aid was also computed for analysis. For students who did not return for the second (spring) semester, the first-semester financial aid amount was used instead of the average.
Credit hours completed in this study refers to the average credit hours each student earned across the two semesters. For students who completed credits in only the first semester, the number of credits in that semester was used instead of the two-semester average. Students who enrolled but took no classes, or fewer than 12 credit hours, in either of the two semesters were eliminated from the analysis. For most students, the cumulative GPA was the one earned by the end of the second (spring) semester. The first (fall) semester GPA was used for those without a GPA record in the second (spring) semester. Students without any GPA record were eliminated from the study. The cumulative GPA was multiplied by 10 to widen its range for easier interpretation, so that one unit corresponds to 0.1 grade point.
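The derivation of the two academic predictors described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' actual procedure (the study drew on the university database); the function names and the use of `None` to mark a missing semester are hypothetical.

```python
from typing import Optional


def average_credits(fall: Optional[float], spring: Optional[float]) -> Optional[float]:
    """Semester-average earned credit hours.

    Assumption: None marks a semester with no completed credits on record.
    """
    if fall is None:
        return None  # no first-semester record: the case cannot be used
    if spring is None:
        return fall  # only the first semester completed: use it as-is
    return (fall + spring) / 2


def scaled_gpa(fall_gpa: Optional[float],
               spring_cumulative_gpa: Optional[float]) -> Optional[float]:
    """Cumulative GPA multiplied by 10, so one unit equals 0.1 grade point."""
    # Prefer the spring cumulative GPA; fall back to the fall GPA.
    gpa = spring_cumulative_gpa if spring_cumulative_gpa is not None else fall_gpa
    if gpa is None:
        return None  # no GPA record at all: the case is dropped
    return gpa * 10
```

Returning `None` rather than raising keeps the missing-data decision with the caller, which matches the later listwise deletion step.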
For ethical reasons, all data used in this study are secondary data that were collected and approved by the university. No individual student's record is published, because the focus of this study is to develop a logistic regression model that predicts student retention at a university, not to investigate specific students.

Research Approach and Variables
This study used the quantitative approach of predictive design to investigate how well students' academic participation and performance coupled with demographics and financial aid predict first-year retention. A complex correlational design was applied to gauge the predictability of the factors of student demographics, financial aid, and academic success on their first-year retention at the university. The tested prediction generates the quantified information on the degree of predictability of the predictor variables on students' first-year retention with similar future cohorts of college freshmen.
The outcome variable, first-year retention, was indicated by whether a student returned in the following fall semester, coded as 0 = not returning, 1 = returning. The predictor variables fell into four categories: 1) Demographics: gender (0 = female, 1 = male) and first-generation status (0 = non-first-generation, 1 = first generation); 2) Pre-college academic data: ACT composite and high school GPA; 3) Financial aid: semester average grant (0 = no grant, 1 = $2,000 and less, 2 = more than $2,000), semester average loan (0 = no loan, 1 = $2,000 and less, 2 = more than $2,000), and semester average scholarship (0 = no scholarship, 1 = $400 and less, 2 = more than $400); 4) College academics: academic participation measured by semester average earned credit hours, academic performance measured by cumulative GPA, and the total number of developmental courses a student took (0 = none, 1 = one course, 2 = two or more courses).
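The three-level coding of the financial aid and developmental-course variables can be illustrated with a short Python sketch. The function names are hypothetical and the thresholds are taken from the coding scheme described above; treating an amount of zero as "no aid" is an assumption.

```python
def code_aid(amount: float, threshold: float) -> int:
    """Code a semester-average aid amount as 0, 1, or 2.

    0 = no aid, 1 = threshold and less, 2 = more than threshold.
    """
    if amount <= 0:
        return 0
    return 1 if amount <= threshold else 2


def code_developmental(courses: int) -> int:
    """Code the number of developmental courses as 0, 1, or 2 (two or more)."""
    return min(max(courses, 0), 2)


# Thresholds from the coding scheme: $2,000 for grants and loans, $400 for scholarships.
grant_code = code_aid(1500.0, threshold=2000.0)       # falls in category 1
loan_code = code_aid(3200.0, threshold=2000.0)        # falls in category 2
scholarship_code = code_aid(0.0, threshold=400.0)     # falls in category 0
```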

Statistical Modeling Techniques
Multivariate models were developed using binary logistic regression analysis. Binary logistic regression is an appropriate tool for predicting a dichotomous outcome variable from continuous or categorical predictors (Field, 2017). In this study, the outcome variable of first-year retention has two categories, while the predictor variables include both continuous and categorical measurement scales. Academic participation, measured by semester average earned credit hours, and academic performance, measured by cumulative GPA, are continuous, while the other predictor variables are categorical.
The SPSS logistic regression program automatically generates dummy variables for each value of a categorical variable. Dummy variables were created for the developmental-course variable and the three financial aid variables: semester average grant, semester average loan, and semester average scholarship. In the model development, students receiving no financial aid (grant, loan, or scholarship) and taking no developmental courses served as the reference category.
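As an illustration of dummy coding with a reference category, the following Python sketch expands a three-level variable into two indicator variables, with level 0 (no aid or no developmental course) as the reference. It is a generic sketch, not a reproduction of SPSS's internal procedure; the function name and defaults are hypothetical.

```python
from typing import List, Sequence


def dummy_code(value: int,
               levels: Sequence[int] = (0, 1, 2),
               reference: int = 0) -> List[int]:
    """Return one 0/1 indicator per non-reference level.

    The reference level is represented by all indicators being 0.
    """
    if value not in levels:
        raise ValueError(f"unexpected level: {value}")
    return [int(value == level) for level in levels if level != reference]
```

With this coding, each coefficient for an indicator is interpreted relative to the reference group, which is how a term such as the lower grant category is read against students with no grant.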
A sample of 50% of the cases from the valid data of the cohort group was randomly selected for use in the model building phase of the analysis. The remaining cases were held out for use in validating the model. Listwise deletion of missing cases (i.e., a case is removed if information on any one of the predictor or criterion variables is unavailable) was employed, leaving 313 cases for the initial phase. The models were evaluated with goodness-of-fit tests to determine the overall fit of the model to the observed frequencies and to examine how successive iterations of the models improved their ability to predict the dependent variable.
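The split-sample and listwise deletion steps can be sketched as follows. This is a simplified stand-in, assuming cases are stored as dictionaries with `None` for missing values; the helper names and the seed are hypothetical. Note that this sketch produces two exactly equal halves, whereas the study's random selection produced approximately equal groups.

```python
import random
from typing import Dict, Iterable, List, Sequence, Tuple


def split_half(cases: Iterable, seed: int = 0) -> Tuple[List, List]:
    """Randomly split cases into an analysis sample and a holdout sample."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(cases)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def listwise_delete(cases: Iterable[Dict], required: Sequence[str]) -> List[Dict]:
    """Keep only cases with a non-missing value on every required variable."""
    return [c for c in cases if all(c.get(name) is not None for name in required)]
```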

Assessment of the Goodness-of-Fit
All the goodness-of-fit tests and statistics indicated that the model fits the data well. A test of the direct model against a constant-only model was statistically reliable (Wald statistic = 71.23, p < .001), indicating that the predictors, as a set, reliably distinguished between returning and non-returning students. Omnibus tests indicated that the model was significant (χ² = 101.38, p < .001). The overall measure of how well the model fits is given by the likelihood value (-2 log likelihood, or -2 LL); a smaller -2 LL value means that the model fits the data better. The direct model produced a -2 LL value of 246.17 with Nagelkerke's R² = .414. The Hosmer and Lemeshow goodness-of-fit test was also applied; its non-significant result indicated that the model was a good fit (χ² = 6.46, p = .596).
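The relationship between the reported -2 LL, the omnibus χ², and Nagelkerke's R² can be checked with the standard formulas. The sketch below assumes that the omnibus χ² equals the drop in -2 LL from the constant-only model to the direct model, and that n = 310 cases entered the analysis (the count given in the Subjects section); both are assumptions for illustration. Under them, the result lands close to the reported .414.

```python
import math


def nagelkerke_r2(neg2ll_null: float, neg2ll_model: float, n: int) -> float:
    """Nagelkerke's R-squared from the -2 log likelihoods of two nested models."""
    # Cox and Snell R^2: 1 - (L_null / L_model)^(2/n)
    cox_snell = 1 - math.exp((neg2ll_model - neg2ll_null) / n)
    # Upper bound of Cox and Snell R^2: 1 - L_null^(2/n)
    max_cox_snell = 1 - math.exp(-neg2ll_null / n)
    return cox_snell / max_cox_snell


neg2ll_model = 246.17                   # reported for the direct model
neg2ll_null = neg2ll_model + 101.38     # assumption: omnibus chi-square is the difference
r2 = nagelkerke_r2(neg2ll_null, neg2ll_model, n=310)  # assumption: n = 310
print(round(r2, 3))
```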
Using the cutoff value of .55, prediction success for the cases utilized in the development of the model was high, with an overall prediction success rate of 80.6% and correct prediction rates of 91.4% for the returning students and 48.1% for the non-returning students. When the model was validated with the holdout group of cases, the overall prediction success rate was even higher (82.9%), with correct prediction rates of 90.4% for the returning students and 57.7% for the non-returning students. The differences for the classification results between the two groups of model analysis and holdout verification were small, which reflected stability in the model when applied to another sample.
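The classification rates reported above are simple proportions of correctly predicted cases at a given cutoff. A minimal Python sketch follows; the function name is hypothetical, and classifying a case as returning when its predicted probability is at or above the cutoff is an assumption about how ties are handled.

```python
from typing import Dict, Sequence


def classification_rates(probabilities: Sequence[float],
                         returned: Sequence[int],
                         cutoff: float = 0.55) -> Dict[str, float]:
    """Overall and per-group correct prediction rates at a probability cutoff."""
    predicted = [1 if p >= cutoff else 0 for p in probabilities]
    n_returning = sum(1 for y in returned if y == 1)
    n_non_returning = len(returned) - n_returning
    if n_returning == 0 or n_non_returning == 0:
        raise ValueError("both outcome groups must be present")
    hits_returning = sum(1 for y, p in zip(returned, predicted) if y == 1 and p == 1)
    hits_non_returning = sum(1 for y, p in zip(returned, predicted) if y == 0 and p == 0)
    return {
        "overall": (hits_returning + hits_non_returning) / len(returned),
        "returning": hits_returning / n_returning,
        "non_returning": hits_non_returning / n_non_returning,
    }
```

Applying the same function to the analysis sample and to the holdout sample gives the two sets of rates that the stability comparison relies on.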

Significant Predictors and Their Predictability in the Model
The direct model statistics show that the non-significant predictor variables include gender (p = .201), first generation (p = .342), semester average loan (p = .180), semester average scholarship (p = .345), and developmental courses a student took (p = .101). The predictor variables in the direct regression model that produced significant differences between the groups of returning and non-returning students are semester average earned credit hours (p = .011) and cumulative GPA (p < .001). Semester average grant of $2,000 and less was a negative marginal predictor (p = .074).
The logistic regression statistics, including the significant predictors' equation coefficients, Wald statistics, p-values, and odds ratios, are presented in Table 2. The equation for this logistic model with the significant predictors is: ln(odds of retention) = -5.972 + 0.13 × GPA + 0.325 × AVERAGECR, where GPA is the cumulative GPA multiplied by 10 and AVERAGECR is the semester average earned credit hours. In this equation, odds refers to the probability of returning divided by the probability of not returning, and ln(odds) is the natural logarithm (ln) of the odds. The overall returning probability of the full-time freshmen cohort was 68.0% and the non-returning probability was 32.0%, so the odds of returning were 2.125. The grant category (1) refers to the group of freshmen who received a semester average grant ranging from $200 to $2,000.
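To show how the equation translates into a predicted probability, the following Python sketch applies the inverse logit to the reported coefficients. It assumes that the GPA term uses the cumulative GPA multiplied by 10, as described in the data processing section, and it includes only the two significant predictors, so it ignores the other variables in the full model. The function names are hypothetical.

```python
import math


def retention_log_odds(gpa: float, avg_credits: float) -> float:
    """Log-odds of returning from the reported two-predictor equation.

    Assumption: gpa is on the usual 0-4 scale and is multiplied by 10 here.
    """
    return -5.972 + 0.13 * (gpa * 10) + 0.325 * avg_credits


def retention_probability(gpa: float, avg_credits: float) -> float:
    """Convert the log-odds to a probability with the inverse logit."""
    return 1 / (1 + math.exp(-retention_log_odds(gpa, avg_credits)))


# Example: a hypothetical student with a 3.0 GPA averaging 15 earned credits.
p = retention_probability(3.0, 15)
```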
Because the equation is expressed in log-odds, the effect of each variable on the probability of the outcome depends on the starting value. Effects in terms of log-odds are hard to interpret, so they are generally exponentiated to give odds ratios (see the column headed Exp(B) in Table 2). Students' return to the university the following fall semester was highly related to cumulative GPA: students with a higher GPA were more likely to return. Specifically, each additional 0.1 of cumulative GPA increases the odds of a student's returning by 14.5%, with the other variables in the model controlled. Given that the retention rate for this freshman cohort was 68.0% (odds of 2.125), an increase of 0.1 in cumulative GPA raises the odds to about 2.433 and the probability of returning to 70.9%, an increase of 2.9 percentage points. Semester average earned credit hours also had a significantly positive relation with returning. Each additional credit hour per semester increases the odds of a student's returning by 38.4%, controlling for other variables in the model. One additional earned credit per semester thus raises the odds to about 2.941 and the probability of returning to 74.6%, an increase of 6.6 percentage points.
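The conversion from an odds ratio to a change in probability can be reproduced with a few lines of arithmetic. The sketch below starts from the cohort's 68.0% retention rate and applies the reported odds ratios of 1.145 (per 0.1 GPA) and 1.384 (per credit hour); the function name is hypothetical.

```python
def shifted_probability(base_probability: float, odds_ratio: float) -> float:
    """Probability after multiplying the baseline odds by an odds ratio."""
    base_odds = base_probability / (1 - base_probability)  # 0.68 / 0.32 = 2.125
    new_odds = base_odds * odds_ratio
    return new_odds / (1 + new_odds)


p_gpa = shifted_probability(0.68, 1.145)     # effect of +0.1 cumulative GPA
p_credit = shifted_probability(0.68, 1.384)  # effect of +1 earned credit per semester
print(round(p_gpa, 3), round(p_credit, 3))
```

Because the same odds ratio produces a different probability change at a different baseline, the 2.9 and 6.6 percentage-point figures apply specifically to a starting retention rate of 68.0%.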
Nagelkerke's R² provides a gauge of the substantive significance of the model, measuring how much of the variability in the outcome is accounted for by the predictors (Field, 2017). Nagelkerke's R² = .414 in the logistic regression model reveals that 41.4% of the outcome variation in student retention could be accounted for by the two significant predictor variables: academic participation, indicated by credit hours completed, and academic performance, measured by cumulative GPA. These two predictor variables explain a fairly large amount of the variation in first-year retention.

Discussion and Conclusions
Findings of the binary logistic regression in this study indicate that academic participation, measured by credit hours completed, and academic performance, measured by cumulative GPA, significantly predict first-year retention. College freshmen's first-year retention is significantly related to their academic success, which comprises academic participation and performance and thus reflects both academic quantity and quality. Achieving and integrating first-year academic quantity and quality constitutes the ideal foundation for student retention at the university. Students' academic success and integration motivate their learning and enhance their commitment to the university. This finding reflects the well-documented literature showing that retention of college freshmen greatly improves when the critical need for better academic and support services is met. A primary ingredient that greatly enhances freshmen's college survival is developing positive habits of academic engagement, such as going to class (Raab & Adam, 2005).
This study suggests that broad efforts should be enhanced to create a more supportive environment that focuses primarily on student academic success, regardless of the individual student's entry level of preparedness. It is essential to provide a holistic, academically focused, and student-centered environment that optimizes freshmen's learning and academic success. This supportive learning environment should foster measurable improvements in both academic participation and performance. Considering the nontraditional backgrounds of college freshmen, universities need to develop and implement innovative and effective strategies focused on high-impact learning that promotes personalized learning, adaptive learning, and blended learning (Fishermen, Ludgate, & Tutad, 2017). Engaging students through integrated, problem-based courses is also important for retaining them.
This study demonstrates the specific predictability of academic participation and performance for first-year student retention at the university. These two academic factors explain 41.4% of the outcome variation in student retention, which is a fairly large amount. Analysis of the logistic regression model shows that as a student's earned credit hours per semester increase by one, the probability of returning to the university increases by 6.6 percentage points. An increase of 0.1 in GPA increases the odds of returning by 14.5% and the probability of returning by 2.9 percentage points. These findings reveal the specific predictive relationship between academic success and retention and offer meaningful information for university administrators and faculty seeking to better understand and improve student retention.
The above conclusion regarding the specific predictability of academic participation and performance for retention has potentially important implications for practical retention prediction, where most attention has traditionally focused on understanding the degree of correlation. The specific probability predictions revealed by this study offer university administrators and faculty more measurable and accurate information for planning and making decisions on student retention. Students who take more credit hours and earn a higher GPA are more likely to persist. Students can be advised to enroll in 15 hours each semester for better persistence, using appropriate interventions, proven best practices, and evidence-based models suited to the institutional context (Complete College America, 2018; Klempin, 2014). Academic support programs targeting challenging college courses and peer-led review sessions designed to develop academic skills and improve grades can improve student retention, particularly for students who may not be well prepared academically (Skoglund, Wall, & Kiene, 2018).
Although the predictability of academic participation and performance is significant, the predictive model also shows that about 58% of the retention variation cannot be explained by academic participation and performance. This points to one key limitation of the study: it did not have the capacity to include other important factors that account for this proportion of the retention outcome in the logistic regression model. These factors can include students' backgrounds, the role of the institution, campus facilities and cultures, social integration, students' intentions, and psychological processes (Bean & Eaton, 2000; Braxton, 2000). Furthermore, the traditional framework of academic and social integration (Tinto, 1996) is not sufficient for gaining further clarity on student retention (Davidson & Wilson, 2013), since retention has become a more complex educational phenomenon. While focusing on enhancing student academic success in both the quantity of credit hours completed and the quality of accomplishment measured by college GPA, university retention policies and procedures, allocation of resources, and implementation efforts should also comprehensively consider these other factors.