Predictive Analysis of One-Year Retention

University of Houston

Jorge Martinez, Caroline Neary, Jenna Tucker

2020-10-26

Introduction

In Fall 2019, Provost Paula Short charged a task force with developing evidence-based recommendations for increasing undergraduate student retention and timely degree completion in support of the University of Houston’s (UH) student success goals. Chaired by Dr. Teri Longacre (Vice Provost and Dean for Undergraduate Student Success), members of the data subcommittee (Jorge Martinez, Senior Research Analyst; Caroline Neary, Senior Research Analyst; Jenna Tucker, Director of Research and Reporting, Enrollment Services; contributing members: Mary Dawson, Frank Kelley, Jonathan Williamson) were tasked with formulating a predictive model for one-year retention. What factors predict student retention, and how can the task force translate the predictive model into actionable policies and practices for increasing retention rates? In this research report, we present findings from our predictive models to achieve these ends.

In this study, we compiled a host of predictors that experience and the literature tell us are likely to predict retention. Using data available to us, we explored bivariate and conceptual regression models to identify the most important predictors of retention. We ran various models and selected the most parsimonious full-year model. We split our analyses into fall, spring, and full-year retention models to evaluate the best predictive value at three different time points. Finally, we calculated predicted probabilities from our models to score students for intervention. Ultimately, we can use these models to identify students in future cohorts so that our interventions maximize the probability of retention one year later.

Data and Descriptive Statistics

We focus our analysis on three full-time, degree-seeking, first-time in college (FTIC) cohorts from Fall 2016 to Fall 2018. Our total study population includes 13,927 students.

| Cohort | N | Retention |
|---|---|---|
| Fall 2016 | 4,263 | 84.9% |
| Fall 2017 | 4,745 | 84.9% |
| Fall 2018 | 4,919 | 84.8% |

One-Year Retention Rates, FTIC Cohorts 2007 - 2018.

The average one-year retention rate for our study population is 85%. Since the 2007 FTIC cohort, one-year retention rates have increased from 79% to a peak of 86%. The retention rate has been steady around 85% for the past four years.
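The pooled figure can be checked against the cohort table with a short calculation. This is a minimal sketch: the cohort sizes and rates are the ones reported above, while the retained counts are approximations because the published rates are rounded.

```python
# Cohort sizes and one-year retention rates as reported in the cohort table.
cohorts = {
    "Fall 2016": (4263, 0.849),
    "Fall 2017": (4745, 0.849),
    "Fall 2018": (4919, 0.848),
}

total_students = sum(n for n, _ in cohorts.values())

# Approximate retained counts, since the published rates are rounded.
approx_retained = sum(n * rate for n, rate in cohorts.values())
pooled_rate = approx_retained / total_students

print(total_students)               # matches the study population of 13,927
print(round(pooled_rate * 100, 1))  # pooled rate in percent, close to 85
```

Weighting each cohort by its size gives a pooled rate that rounds to the 85% quoted in the text.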

We tracked each student in our sample and collected demographic, admissions, academic, and financial indicators during their first academic year at UH. We considered all of the following characteristics in our analysis:

| Admissions | Demographic | Financial | Academic |
|---|---|---|---|
| High school rank | Gender | Estimated Family Contribution | Hours taken |
| Transfer credits | Race/Ethnicity | Unmet financial need | Hours passed |
| SAT/ACT Score | Age | Scholarship recipient | Term GPA |
| Application date | First generation status | Lost scholarship | Cumulative GPA |
| Orientation date | Commute distance | Financial aid award amount (excluding loans) | Academic standing |
| First choice college | Residency status | Total loans | D/F/W grades |
| | Region of residence | FAFSA verification selection | College |
| | | Filed FAFSA as independent | Full-time/Part-time |
| | | Pell eligibility | STEM major |
| | | Financial delinquency (at end of term) | UHin4 |
| | | Payment deferment plan | Honors |
| | | | College change spring term |
| | | | Math Core credit |
| | | | English Core credit |
| | | | First year math level |
| | | | Term withdrawal |
| | | | CORE 1101 enrollment |

We began our analysis by calculating one-year retention rates by student characteristics (see full descriptive statistics). Demographically, we found women, Asian/Pacific Islanders, non-first-generation, residents adjacent to Harris County, and international students had higher retention rates compared to their counterparts.

Admissions characteristics show students who applied early in the fall (August-December) had higher retention rates than students who applied later in the admissions process. Retention rates also rose with high school class rank: the higher a student ranked in their graduating class, the higher the retention rate. SAT score differences between students retained and those who did not return one year later were minimal: those who were not retained had an average SAT score of 1179.23, versus an average of 1205.95 for students who returned one year later.

Once admitted, students who attended the earliest orientation in April (reserved for students admitted to the Honors College) had the highest retention rate (96%), compared to 87% for May-June, 81% for July, and 74% for August orientation attendance. Students admitted to their first choice college had a higher retention rate (87%) relative to those who were not admitted to their first choice college (82%).

Financial aid measures show some key differences between students that are retained and not retained. The table below shows students retained one year later had an estimated family contribution that is $3,772 more than the average for students not retained one year later. Retained students received $1,036 more aid (excluding loans) compared to students not retained one year later. Retained students also took out about $688 less in loans on average. Finally, non-retained students had an average of $2,176 more unmet need compared to retained students.

| Variable | Min | Max | Average | Std. Dev. | Retained Average | Not Retained Average |
|---|---|---|---|---|---|---|
| Estimated Family Contribution (EFC) | $0 | $999,999 | $16,512 | $39,889 | $17,082 | $13,310 |
| Total financial aid (excluding loans) | $0 | $53,324 | $7,479 | $6,991 | $7,636 | $6,600 |
| Total amount in loans (all sources) | $0 | $42,550 | $2,367 | $4,657 | $2,263 | $2,951 |
| Unmet need | $0 | $44,728 | $7,976 | $7,453 | $7,651 | $9,827 |

We also observed a large difference in retention rates depending on student scholarship status in the first year. Students who lost a scholarship had a one-year retention rate of 72%, compared to 97% for students who kept their scholarship. Students who did not receive a scholarship in the first place were still more likely to be retained than students who lost a scholarship.

Academic measures track student performance at UH in their first academic year. These include hours taken, grade point average (GPA), courses from which a student either withdrew or earned a D or F grade (DWF) and core math and English course completion. Other important academic-related characteristics include attendance (full-time, part-time) and UHin4 participation. We highlight a few important academic metrics here and the rest can be found in this link.

The table below summarizes credit hours, GPA, and DWF grades. Retained and non-retained students averaged about the same credit hours taken their first term (FTIC students must enroll full-time in their first term), but retained students averaged 1.3 more credit hours the second term compared to non-retained students. Retained students averaged a 3.1 GPA their first and second term, which is 1.1 and 1.4 points higher than non-retained students, respectively. Non-retained students had nearly three times as many DWF grades in their first year compared to retained students.

| Variable | Min | Max | Average | Std. Dev. | Retained Average | Not Retained Average |
|---|---|---|---|---|---|---|
| First Term Hours Taken | 0 | 27 | 14.9 | 1.6 | 15.0 | 14.5 |
| Second Term Hours Taken | 0 | 23 | 14.7 | 2.2 | 14.9 | 13.6 |
| First Term GPA | 0 | 4 | 3.0 | 1.0 | 3.1 | 2.0 |
| Second Term GPA | 0 | 4 | 2.9 | 1.1 | 3.1 | 1.7 |
| First Term DWF | 0 | 8 | 0.7 | 1.3 | 0.5 | 1.9 |
| Total DWF | 0 | 12 | 1.5 | 2.1 | 1.2 | 3.5 |

Nearly three-quarters of our study population were UHin4 participants (n=10,011). UHin4 is a program that provides a comprehensive plan to support timely graduation in 4 years. Eighty-seven percent of UHin4 students returned one year later compared to 81% of non-UHin4 students. We also measured whether students changed their major to a new college during their first year. Students who changed their major to another college had a one-year retention rate ten percentage points higher than those who did not change their major college.

Finally, we investigated retention rates among students who complete math and English core courses in their first year. We found students who earned credit for both math and English core courses in their first year were more likely to return the following fall semester compared to students who did not earn credit or did not attempt these courses. The retention rate gap is largest in English courses: 88% of students that earned English core credit their first year were retained compared to 37% of those students that did not earn English core credit. Differences were smaller in math, but students who did not attempt math were nine percentage points less likely to be retained than students who earned math credit their first year. Next, we explore how retention is mediated by the combined effects of each characteristic in predictive statistical models.

Predictive Models

As we see descriptively above, retention rates vary when we slice the data by various student characteristics. In this section, we develop statistical models to understand how these characteristics interact with each other. We evaluate the predictive power of each model using various statistical procedures to develop models that help us best understand one-year retention. Using our final models, we calculate predictive probabilities to identify which students are at risk of not being retained and may benefit from intervention. These models can be applied to future FTIC cohorts for intervention at different stages in the first academic year.

Logistic Regression

Regression modeling is a statistical procedure to investigate the relationship between response (\(Y\)) and predictor (\(X\)) variables. The predictors here are demographic, admissions, academic, and financial characteristics used to predict one-year retention. In our analysis, the response variable measures whether a student enrolls the following fall semester after their first academic year (Retained = Yes/No). Because this response variable is binary, we use logistic regression models to estimate the probability that a student is retained one year later based on their characteristics.

Our models take the generalized formula:

\[\ln\left(\frac{P}{1-P}\right)_{retained} = \beta_{0} + \beta_{1}X_{1} + \beta_{2}X_{2} + \cdots + \beta_{n}X_{n} \] where

\[ \ln\left(\frac{P}{1-P}\right)_{retained} \text{ is the log-odds of being retained,} \]

\[\beta_{0} \text{ is the intercept, and}\]

\[\beta_{1}, \beta_{2}, \ldots, \beta_{n} \text{ are the coefficients of the predictors } X_{1}, X_{2}, \ldots, X_{n}.\]

Each coefficient (\(\beta\)) is the change in the log-odds of retention associated with a one-unit increase in its predictor (\(X\)), holding the other predictors constant. Next, we explore how student characteristics relate to the probability of being retained one year later at three distinct time points: at the start of fall term, at the start of spring term, and at the end of the full academic year.
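To make the general formula concrete, the sketch below converts a linear predictor into a probability of retention by inverting the logit. It is an illustration only: the function name, intercept, coefficients, and predictor values are hypothetical and are not estimates from the models in this report.

```python
import math

def retention_probability(intercept, coefficients, values):
    """Return P(retained) implied by a logistic regression's linear predictor."""
    # Linear predictor: beta_0 + beta_1 * x_1 + ... + beta_n * x_n
    linear_predictor = intercept + sum(b * x for b, x in zip(coefficients, values))
    # Inverse logit maps the linear predictor onto the 0-1 probability scale.
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical numbers for illustration only; not the report's estimates.
example_intercept = 1.0
example_coefficients = [0.5, -0.3]
example_values = [1.0, 2.0]

p = retention_probability(example_intercept, example_coefficients, example_values)
print(round(p, 3))
```

A linear predictor of zero corresponds to a probability of exactly 0.5; positive values push the probability toward 1 and negative values toward 0.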

Fall Model

We constructed the conceptual framework for our models chronologically to coincide with student progression through the academic year, data availability, and points of intervention. This Fall Model contains all immutable student characteristics (e.g. race/ethnicity) and other descriptors (e.g., fall credit hours taken) that we have data for in September of the fall semester. This model acts as a baseline for our understanding of attrition risk at the beginning of the academic career. It helps us flag students who should be monitored as early as possible.

The ladder plot below outlines our findings for the Fall Model. Each circle represents the estimated coefficient for each predictor while controlling for all other predictors in the model (see appendix for tabular model coefficients). Each line represents the 95% confidence interval, the range within which we expect the true value of the population coefficient to fall. The smaller the interval, the more precise our estimate is. (“All models are wrong, but some are useful.” This popular quote by British statistician George E. P. Box, published in the Journal of the American Statistical Association, asserts that statistical models cannot capture all the complexities of social realities, but our careful estimates can still be useful.) All estimates that fall to the right of the dashed zero-line indicate a positive relationship with the probability of being retained and all estimates that fall to the left indicate a negative relationship. Confidence intervals that touch or cross the dashed zero-line indicate statistically insignificant relationships. All others that do not touch or cross the line are statistically significant at the p < 0.05 level.
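The rule for reading the ladder plot can be sketched in a few lines. This is a minimal illustration, assuming Wald-type intervals (estimate plus or minus 1.96 standard errors); the report does not state how its intervals were computed, and the estimates and standard errors below are hypothetical.

```python
def wald_interval(estimate, std_error, z=1.96):
    """Approximate 95% confidence interval, assuming a Wald interval."""
    margin = z * std_error
    return estimate - margin, estimate + margin

def is_significant(estimate, std_error):
    """True when the interval neither touches nor crosses zero."""
    lower, upper = wald_interval(estimate, std_error)
    return lower > 0.0 or upper < 0.0

# Hypothetical values for illustration only; not estimates from the report.
print(is_significant(0.50, 0.10))  # interval lies entirely above zero
print(is_significant(0.10, 0.10))  # interval includes zero
```

An interval that lies entirely on one side of zero corresponds to a circle and line that sit fully to the left or right of the dashed zero-line.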

Fall Model Logistic Regression. (Estimates are log-odds)

This model depicts a number of statistically significant relationships. When controlling for all other variables in the model, we found African-American, Asian/Pacific Islander, and Hispanic students were all more likely to be retained relative to white students. (Probabilities for categorical variables like race and gender are expressed relative to a reference category indicated in parentheses, e.g., ref. white. Probabilities for numeric variables such as test credits are expressed for every unit change, e.g., a 1 credit hour increase.) Male students were also less likely to be retained relative to female students. Relative to students from Harris County, students from other, non-adjacent Texas counties and students from out-of-state were less likely to be retained one year later. There is no statistically significant difference in retention between students from Harris and adjacent counties.

When it comes to high school preparation, all students ranked below the top 10% in their high school class were incrementally less likely to be retained one year later than students in the top 10%. Students who did not report a high school class rank also had a lower likelihood of retention than those students in the top 10%, roughly equal to the estimated likelihood of students in the 60-79th percentile range (or the top 21-40%). The SAT score estimate was not statistically different from zero, which indicates SAT scores are not a good predictor of one-year retention. Test credits and transfer credits at entry were both significant in increasing the probability of retention. For example, for every unit increase in test credits, we saw a 2.5% increase in the odds of returning to UH one year later. Similarly, for every unit increase in transfer credit we saw a 1.1% increase in the odds of being retained one year later. (We calculate odds ratios by exponentiating the model estimate, which is expressed in log-odds. For example, the odds ratio for test credit is \(\exp(0.0243) \approx 1.025\), and \(1.025 - 1 = 0.025\), or a 2.5% increase in the odds. See appendix for model estimates and odds ratios. Visit this link for details on interpreting logistic regression coefficients and odds ratios.)
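The odds-ratio conversion described above can be reproduced directly. The sketch below uses the test credit estimate quoted in the text (0.0243); the function name is our own and is not taken from the report's code.

```python
import math

def percent_change_in_odds(estimate):
    """Convert a logistic regression estimate into a percent change in the odds."""
    odds_ratio = math.exp(estimate)
    return (odds_ratio - 1.0) * 100.0

# Estimate for test credits as quoted in the text.
test_credit_estimate = 0.0243
pct = percent_change_in_odds(test_credit_estimate)
print(round(pct, 1))  # about 2.5 percent per additional test credit
```

An estimate of zero gives an odds ratio of 1 and therefore no change in the odds; negative estimates give odds ratios below 1 and a percent decrease.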

Compared to students who attended orientation in May/June, students who attended orientation in April were more likely to be retained one year later and students who attended orientation in July/August were less likely to be retained one year later. This suggests early orientations have a positive impact on retention, all else equal, although it is possible this relationship is confounded by more “motivated” students completing orientation early on.

We found students who participated in UHin4 were 16% more likely to be retained one year later compared to non-participants. Taking on additional credit hours in fall term also had a positive impact on retention: for every unit increase in credit hours taken at UH, there was a 15.5% increase in the odds of returning the following fall term.

Finally, we examine student financial characteristics in our fall model. We found students who took out more loans, students who had greater unmet need, and students who deferred their payments in the fall term were less likely to be retained one year later. For example, students who deferred their payments were 1.7 times less likely to return one year later compared to students who did not defer their payments.

Our fall model represents a baseline for predicting one-year retention at the start of the first term. We leverage theoretically important predictors that are commonly used to model one-year retention and that are available in the first term. However, our fall model captures the very early stages of a year-long process. As such, our fall model is not as strong as our spring and full-year models (see Appendix: Model Assessment for more details on the predictive power of our models). As illustrated in the next section, our models gain predictive power once we capture fall academic performance measures.

Spring Model

Our spring model takes the analysis further with new data collection in February of the spring term. We carry over fall model predictors and add spring attendance, risk of losing a scholarship at the end of fall term, fall GPA, and spring payment deferment. The figure below shows how our model estimates modulate with these additional predictors.

Spring Model Logistic Regression.


Adding spring predictors to our retention model reveals a few important observations. First, when we controlled for fall academic performance as measured by GPA, spring attendance, and financial characteristics in the spring, we found previous high school academic performance was no longer statistically significant in predicting one-year retention. This suggests that while high school preparation was useful as one of the only academic preparedness indicators available in the first term, it was no longer relevant once we have data about academic performance at UH. Standardized test scores continued to be statistically insignificant suggesting they are a poor predictor of retention. Similarly, test credits at entry were no longer statistically significant. Transfer credits at entry continued to be significant in the spring model.

Second, expanding our model to include academic performance at UH slightly changes the strength of our estimates. For example, when we controlled for spring attendance, fall GPA, scholarship risk, and spring payment deferments, we saw the positive relationship between retention and race/ethnicity become stronger for African-American students (estimate 0.55 to 0.67) and slightly weaker for Asian/Pacific Islanders (estimate 0.76 to 0.63). Furthermore, the relationship between retention and gender was no longer statistically different from zero when we controlled for spring model characteristics. This suggests there is no difference in retention between male and female students after controlling for first-term academic performance. The difference between attending a July orientation and a May/June orientation was no longer statistically significant. UHin4 participation and the effect of total loans were also no longer statistically significant.

We found students who were enrolled part-time in the spring were 3 times less likely to be retained compared to students who were enrolled full-time. Similarly, students not enrolled in spring were 40 times less likely to be retained compared to students enrolled full-time, all else equal. These findings corroborate on-going benefits of full-time enrollment for one-year retention and show to what extent these relationships are important.

We included whether a student was at risk of losing their scholarship after the fall term. In order for first-year students to retain their university merit scholarship, they must maintain a cumulative GPA of at least 3.0 and complete a total of 30 credit hours. Transfer, AP, and dual credit hours count toward this requirement. Students have until the end of the summer following their first year at UH to meet these requirements, at which point a decision is made whether to renew their scholarship for the second year. For this analysis, FTICs were considered to be “at risk” of losing their university merit scholarship if they did not have at least 15 total credit hours of progress (including transfer hours) and/or if they did not have a cumulative GPA of at least 3.0 after their first fall term at UH. We found students who were in good standing for keeping their scholarships in the fall term were more likely to be retained compared to students at risk of losing their scholarship or students who did not have a scholarship. These relationships are in the direction we expect, but they are not statistically significant.

Fall GPA has a positive impact on the probability of returning one year later. For every whole unit increase in GPA, we expect to see about a 108% increase in the odds of being retained one year later. This is a substantial impact on the probability of being retained and supports our expectation that academic success positively affects retention.

Finally, collecting more predictors in the spring term increased the predictive power of our model. The classification rate (the rate at which our model accurately predicts retention) increased by 13.5 percentage points, from 67.8% in the fall model to 81.3% in the spring model (see Appendix: Model Assessment section on ROC and AUC values for more on classification rates). The Full-Year Model takes our analysis a step further by modeling retention as a function of fall, spring, and summer term cumulative data.
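As a rough illustration of what a classification rate measures, the sketch below scores predicted probabilities against observed outcomes. The 0.5 cutoff is an assumption made for this example; the report does not state the cutoff it used, and the toy probabilities and outcomes are hypothetical.

```python
def classification_rate(probabilities, outcomes, threshold=0.5):
    """Share of students whose predicted class matches the observed outcome.

    The threshold is an assumed cutoff, not one taken from the report.
    """
    correct = 0
    for p, retained in zip(probabilities, outcomes):
        predicted_retained = p >= threshold
        if predicted_retained == bool(retained):
            correct += 1
    return correct / len(outcomes)

# Hypothetical toy data: predicted probabilities and observed retention (1 = retained).
toy_probabilities = [0.92, 0.81, 0.40, 0.65, 0.30]
toy_outcomes = [1, 1, 0, 0, 1]

rate = classification_rate(toy_probabilities, toy_outcomes)
print(rate)  # 3 of the 5 toy students are classified correctly
```

Because the classification rate depends on the chosen cutoff, threshold-free summaries such as ROC curves and AUC values are a useful complement.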

Full-Year Model

Our full-year model incorporates predictors from an entire academic year for each student in our study. The predictors in this final model were carefully considered after many model iterations, statistical testing, and discussions of their theoretical and practical importance (visit this link for bivariate logits; see Appendix: Model Exploration and Assessment for conceptual models and feasible solutions algorithms using rFSA). This final model adds cumulative credit hours, cumulative GPA, college change, and scholarship retention after a full academic year.

Full-Year Model Logistic Regression.


In the full year model, we found part-time enrollment in the spring term was no longer statistically significant as it was in the spring model. This is due to controlling for cumulative hours taken for the entire academic year. However, students that were not enrolled in the spring were six times less likely to be retained relative to students enrolled full-time in the spring. This effect was smaller than in the spring model.

This model does not show any statistically significant relationships between orientation month and the probability of being retained. Total loans, unmet need, and payment deferment in the fall and spring terms were also not significant. However, we found students who lost their scholarship in their first year were two times less likely to be retained relative to students who kept their scholarship. The effect was smaller for students who had no scholarship: they were 1.3 times less likely to be retained relative to students who kept their scholarship.

Hours taken and GPA were consistently positively associated with retention. For every hour increase in cumulative hours completed, we found the odds of being retained increased by 13.7%. For every whole unit increase in GPA, we found the odds of being retained increased by 142%. As expected, student academic performance was highly related to retention.

We continued to observe increased probabilities for retention for African-American, Asian/Pacific Islander, and Hispanic students as compared to white students after controlling for differences in academic and financial aid characteristics. For example, we found African-American students had 98.9% greater odds of retention relative to white students. For Asian and Hispanic students, the odds were 70% and 33% greater than white students for being retained one year later. Similarly, students from non-adjacent Texas counties and students from out of state were 1.9 and 2.3 times less likely to be retained relative to Harris County residents, respectively.

Our full-year model has an increased classification rate of 85.1%, an improvement of 3.8 percentage points over the spring model. The fit of the model according to McFadden’s pseudo-R² is 0.348, which suggests a relatively strong model. (Unlike R² in linear regression, models rarely achieve a high McFadden pseudo-R². Models with a pseudo-R² close to or greater than 0.4 indicate a very good fit (pg 35).) Where do we go from here knowing we have a relatively useful model for predicting retention? In the next section, we explore how to use our models to identify students at risk for attrition.
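For readers unfamiliar with the statistic, McFadden's pseudo-R² compares the log-likelihood of the fitted model with that of an intercept-only model. The sketch below is a minimal illustration using the standard definition, 1 minus the ratio of the two log-likelihoods; the function names and toy data are hypothetical and do not reproduce the 0.348 reported above.

```python
import math

def log_likelihood(probabilities, outcomes):
    """Bernoulli log-likelihood of observed outcomes (1 = retained)."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for p, y in zip(probabilities, outcomes)
    )

def mcfadden_pseudo_r2(probabilities, outcomes):
    """1 - LL(model) / LL(null), where the null model predicts the base rate."""
    base_rate = sum(outcomes) / len(outcomes)
    null_ll = log_likelihood([base_rate] * len(outcomes), outcomes)
    model_ll = log_likelihood(probabilities, outcomes)
    return 1.0 - model_ll / null_ll

# Hypothetical toy data; probabilities must lie strictly between 0 and 1.
toy_probabilities = [0.9, 0.8, 0.7, 0.2]
toy_outcomes = [1, 1, 1, 0]

r2 = mcfadden_pseudo_r2(toy_probabilities, toy_outcomes)
print(round(r2, 3))
```

A model that does no better than predicting the overall retention rate for everyone scores 0, and the value approaches 1 as predicted probabilities move toward the observed outcomes.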

Calculating Predicted Probability Scores

In addition to using the models to better understand how student characteristics interact with each other to impact retention, we can also operationalize the models to identify at-risk students for intervention. Each model results in a formula that can be used to predict the likelihood of one-year retention as a function of our predictor variables (see section on Predictive Models). We can use these formulas to calculate predicted probabilities for each individual student in a first-year cohort, both at the start of the fall term (with the Fall Model) and again at the start of spring term (with the Spring Model). As discussed previously, the Spring Model is stronger than the Fall Model; however, the Fall Model allows for earlier identification and intervention.

Once each student has an individual probability of retention, we stratify students into three groups based on risk of attrition: high risk, medium risk, and low risk. We sort the individual probabilities of retention from smallest to greatest and rank them on a percentile scale of 0-100. From here, we set the cut point for high risk at the 5th percentile (i.e., probability of retention is lower than 95% of the population), and the cut point for medium risk at the 30th percentile (i.e., probability of retention is lower than 70% of the entire population). These thresholds may be adjusted as needed to suit an institution’s population and capacity; we propose the 5th and 30th percentiles based on recommendations from EAB’s population health management model (Venit, Ed et al. 2020. What Can Health Care Teach Us About Student Success? EAB White Paper; Leininger, Lindsey and Thomas DeLeire. 2017. Predictive Modeling for Population Health Management: A Practical Guide. Mathematica Policy Research: 1-6). These thresholds focus efforts on the 30% of students at most acute risk, including medium risk students, where the impact of intervention efforts tends to be greatest.
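The stratification step can be sketched as a percentile ranking followed by two cut points. This is an illustration under stated assumptions: the function name is our own, ties are broken by input order, and the toy probabilities are hypothetical; only the 5th and 30th percentile cut points come from the text.

```python
def assign_risk_groups(probabilities, high_cut=5, medium_cut=30):
    """Label each student high, medium, or low risk by percentile rank."""
    n = len(probabilities)
    # Student indices ordered from lowest to highest probability of retention.
    order = sorted(range(n), key=lambda i: probabilities[i])
    groups = [None] * n
    for rank, student in enumerate(order):
        percentile = 100.0 * rank / n  # percent of students ranked below this one
        if percentile < high_cut:
            groups[student] = "high"
        elif percentile < medium_cut:
            groups[student] = "medium"
        else:
            groups[student] = "low"
    return groups

# Hypothetical probabilities for 20 students, listed from highest to lowest.
toy_probabilities = [0.05 * (20 - i) - 0.01 for i in range(20)]
toy_groups = assign_risk_groups(toy_probabilities)

print(toy_groups.count("high"), toy_groups.count("medium"), toy_groups.count("low"))
```

With 20 students, the cut points place one student (5%) in the high risk group, five (25%) in the medium risk group, and fourteen (70%) in the low risk group. Passing different values for `high_cut` and `medium_cut` adjusts the groups to an institution's capacity.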

The figure below shows the distribution of calculated probability scores from 0-100% in each of our models. They are colored by risk level as defined above. The dashed line represents the mean probability of retention, 85%.

Distribution of One-Year Retention Probabilities by Model (counts, n=12,406).


Count of Students by Risk Group (n=12,406).

The figure Count of Students by Risk Group shows how many students fall into each risk group. The low risk group represents 70% of the population, the medium risk group represents 25%, and the high risk group represents 5%. These proportions are the same in all three of our models. However, the scores at these cut points vary by model:

| Model | Probability of Retention at 5th Percentile | Probability of Retention at 30th Percentile |
|---|---|---|
| Fall | 66.8% | 82.2% |
| Spring | 37.5% | 87.9% |
| Full-Year | 30.4% | 88.4% |

In our Fall model, the lowest probability of retention is 20%. This means every student has at least a 1 in 5 chance of being retained one year later according to what we can measure in the first term. When we improve our model in the spring term, the smallest probability is 0.4% (0.1% in the full-year model). Our model calibration is improved when we add important predictors to our spring and full-year models (see appendix for model assessment).

Discussion and Conclusions

In this report, we examined many factors that contribute to our understanding of one-year retention. Our goal was to better understand which variables predict one-year retention by developing a predictive model to identify students at greatest risk of attrition. Ultimately, these models can be used to identify students for intervention to maximize the probability they will enroll the following fall semester.

We collected demographic, academic, and financial indicators for students from the Fall 2016 to Fall 2018 FTIC cohorts. We tracked these students through their first academic year and collected data at three different time points: the start of the fall semester, the start of the spring semester, and the end of the academic year. We tested many models and identified a two-stage model that assesses the likelihood of one-year retention when students start their first term, and again when they complete their first term. The first assessment allows for earlier intervention, while the second assessment gives us stronger insight.

When controlling for the variables in our Fall Model, we learned African-American, Asian/PI, and Hispanic students were all more likely to be retained one year later compared to white students. We also learned female students and students from Harris County were more likely to be retained one year later compared to male students and students from non-adjacent Texas counties or from out of state. We found students who rank highly in their high school class and who come to UH with test and transfer credits were more likely to be retained. UHin4 participants and students who took more credit hours were also more likely to be retained. Students who took greater total loans, had greater unmet need, or deferred payment in the fall were less likely to be retained. Finally, students who attended orientation earlier were more likely to be retained. (The earliest orientation in April is generally reserved for honors students; however, students who attended orientation in May/June were more likely to be retained than those who attended in July or August.)

In our Spring Model, we gained more insight by controlling for newly available academic measures like fall GPA, spring attendance, and scholarship status. Fall GPA was a strong predictor of retention. In addition, full-time enrollment in the spring had a strong positive effect for retention compared to part-time enrollment and especially for students not enrolled in spring.

Our Spring Model also showed us how UH academic performance mediated previous relationships. We learned that once first term academic outcomes were in the model, previous high school academic performance was no longer statistically significant in predicting retention. (Controlling for high school rank can be problematic without controlling for high school academic quality. For example, one study finds that students who graduate from high achieving high schools may be less likely to gain admission to elite colleges; see Espenshade et al., 2005. High grades and relative class rank vary greatly depending on the competitive context of certain high schools. A student who graduates in the top 10% of an under-performing high school may not have the same college-preparedness level as a student in the top 10% of a gifted & talented magnet high school.) Additionally, the relationship between gender and retention was no longer statistically significant in the spring model compared to the fall model. The same was true for test credits at entry, total loans, and UHin4 participation. We also discovered Asian/PI and African-American students were about equally more likely to be retained compared to white students in our spring model. The fall model showed Asian/PI students led all groups in being more likely to be retained compared to white students. (Findings from the Dropped Student Survey showed Asian students were more likely to cite academic performance as their reason for stopping enrollment.)

Although waiting until the end of the first year doesn’t allow time for early intervention, our Full-Year Model allows us to better understand one-year retention with data from the full academic year (fall, spring, summer). In this model, we controlled for cumulative credit hours, cumulative GPA, college change, and scholarship retention after a full year. We learned cumulative GPA, cumulative hours taken, and spring attendance were strong positive predictors of retention. Although spring attendance was not as strong in this model as it was in the spring model, not being enrolled in the spring term was still among the top predictors of one-year retention. The negative effect of student residency from other Texas counties and students from out of state became even stronger in this model when compared to students from Harris County. Losing a scholarship in the first year also lowered the odds of retention one year later.

In summary, our findings suggest several key indicators of students at risk of attrition. In academic performance, a low GPA in the fall term and less than full-time enrollment in the spring term are both red flags. In terms of financial status, unmet need and deferred payment plans are both predictors of attrition. The academic and financial factors coalesce with scholarships, as students must maintain satisfactory academic performance to keep their scholarships. In the full-year model, students who lost a scholarship were more likely to stop out one year later than students who kept their scholarship. In fact, students with no scholarship were more likely to be retained one year later than students who lost their scholarship.

We recommend applying our statistical models to first year FTIC students in the fall and spring terms to identify those students at greatest risk of attrition. UH’s one-year retention rate is relatively high at 85%, which means that we cannot focus on one or two silver bullet indicators to effectively move the needle. Instead, we should take a probabilistic approach to boost an already high retention rate even higher by selecting students at greatest risk relative to their peers. We can calculate predicted probabilities for retention using our model predictors on future cohorts so that the appropriate offices may intervene as necessary. Furthermore, if we focus on students in the medium risk level, we may be able to maximize our intervention efforts.
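
As a rough illustration of the scoring step described above, the sketch below converts logistic regression coefficients into a predicted retention probability. It borrows a handful of estimates from the Fall Model table in the appendix and, as a simplifying assumption, holds every omitted predictor at its reference category or at zero, so the output is illustrative and not a real score. The dictionary keys, function name, and example students are hypothetical.

```python
import math

# Subset of Fall Model estimates (log-odds) taken from the appendix table.
# Assumption: every predictor not listed here is at its reference level or zero.
FALL_INTERCEPT = -0.1709
FALL_COEFS = {
    "male": -0.2022,
    "fall_hours_taken": 0.1444,
    "unmet_need_thousands": -0.0161,
    "deferment_fall": -0.5128,
}


def retention_probability(student):
    """Apply the inverse-logit to the linear predictor for one student."""
    logit = FALL_INTERCEPT + sum(
        coef * student.get(name, 0.0) for name, coef in FALL_COEFS.items()
    )
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical student records, for illustration only.
students = {
    "A": {"male": 1, "fall_hours_taken": 15, "unmet_need_thousands": 5, "deferment_fall": 1},
    "B": {"male": 0, "fall_hours_taken": 15, "unmet_need_thousands": 0, "deferment_fall": 0},
}

# Score each student, then rank from lowest to highest predicted retention.
scores = {sid: retention_probability(rec) for sid, rec in students.items()}
ranked = sorted(scores, key=scores.get)
for sid in ranked:
    print(sid, round(scores[sid], 3))
```

Ranking by predicted probability, instead of applying a single cutoff, matches the relative-risk framing above: an office could work down the list as far as its capacity allows.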

Finally, we recommend investigating best practices at our peer institutions.18 What can we learn from institutions closely comparable to UH that have higher freshman retention rates? What policies or practices do our aspirational peers follow that push their freshman retention rates past the 90% threshold? The table below outlines the average proportion of freshmen entering in Fall 2015 through Fall 2018 who returned the following fall, as calculated by U.S. News & World Report.

Note 18: See University of Houston Peer Institutions: Identifying Peer Groups Through Cluster Analysis for more information on identifying comparable and aspirational peers.

Four-year average first-year student retention rate, Fall 2015-Fall 2018 cohorts (Issue Year 2021).

| Comparable Peers | Retention Rate | Aspirational Peers | Retention Rate |
|---|---|---|---|
| Florida International University | 89% | University of California Davis | 93% |
| Georgia State University | 82% | University of California Irvine | 93% |
| University of Central Florida | 90% | University of California Riverside | 90% |
| University of Illinois at Chicago | 80% | University of California San Diego | 95% |
| University of Nevada Las Vegas | 77% | University of California Santa Barbara | 93% |
| University of North Texas | 79% | University of Texas at Dallas | 88% |
| University of South Florida | 91% | Stony Brook University | 90% |
| University of Texas at Arlington | 72% | | |
| University of Texas at El Paso | 74% | | |

There are areas beyond demographic, academic, and financial indicators that we could not evaluate in this study but that we know affect retention.19 In the future, we would like to incorporate student engagement and sense of belonging data into the models. For example, how does student engagement with learning communities, extracurricular activities, and facilities like the Campus Recreation and Wellness Center mediate one-year retention?

Note 19: This study does not control for the impacts of Hurricane Harvey, which hit the Houston Metropolitan Area August 17, 2017 - September 3, 2017.

The analysis raises many questions, which could guide future research, about why certain indicators affect retention. Why do retained students start with more transfer or test credits? How can we account for students who had strong academic performance and transferred to another institution after the first year? We can speculate, but it is essential to investigate these indicators further, along with the ways in which they interact with each other.

Appendix

Model Exploration and Assessment

Conceptual Models

We started our exploratory modeling with conceptual models constructed around three themes: high school preparedness, UH academics, and financial characteristics. We nested the models by adding one predictor to each successive model to compare how each predictor contributes to our understanding of retention. We compared nested models using the Akaike Information Criterion (AIC). If the model with the additional predictor has a lower AIC value than the comparison model, we can conclude that adding the predictor results in a better fit. In the high school preparedness table below, adding SAT to a model with high school rank reduces the AIC value by about 364 points (from 11,734.83 to 11,371.29), which indicates a better-fitting model.20

Note 20: Obviously, adding an additional predictor to a model with only one predictor is likely to result in a better-fitting model. More information will usually result in greater knowledge, assuming the predictor has an observed relationship with the dependent variable.
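
To make the AIC comparison concrete, the sketch below recomputes AIC from the log-likelihoods reported for high school preparedness models (1) and (2) in the table that follows, using the standard definition AIC = 2k - 2 ln(L). The parameter counts (7 and 8) are our reading of that table (six rank categories plus a constant, then SAT score added) and should be treated as an assumption. Note also that the two models were fit on slightly different numbers of observations (13,927 and 13,615), and AIC comparisons are cleanest when both models use the same sample.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 * ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Log-likelihoods as reported in the high school preparedness table.
# Parameter counts are inferred from the rows shown (assumption).
rank_only = aic(-5860.412, 7)       # six rank dummies + constant
rank_plus_sat = aic(-5677.645, 8)   # previous model + SAT score

print(round(rank_only, 2), round(rank_plus_sat, 2))
print("AIC reduction:", round(rank_only - rank_plus_sat, 2))
```

The penalty term 2k is what keeps AIC from rewarding extra predictors automatically: a new predictor lowers AIC only if it raises the log-likelihood by more than one point.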

Demographic nested models
Dependent variable:
retained
| Variable | (1) | (2) | (3) | (4) | (5) |
|---|---|---|---|---|---|
| African American (ref. White) | 0.010 (0.084) | -0.016 (0.084) | 0.026 (0.085) | 0.106 (0.086) | 0.137 (0.089) |
| Asian/PI | 0.891*** (0.076) | 0.883*** (0.076) | 0.932*** (0.077) | 0.861*** (0.078) | 0.877*** (0.078) |
| Hispanic | -0.011 (0.061) | -0.025 (0.061) | 0.080 (0.065) | 0.041 (0.066) | 0.058 (0.067) |
| Other | 0.218* (0.098) | 0.207* (0.098) | 0.287** (0.102) | 0.086 (0.117) | 0.094 (0.117) |
| Male (ref. female) | | -0.312*** (0.048) | -0.320*** (0.048) | -0.342*** (0.048) | -0.342*** (0.049) |
| First Generation (ref. Not first gen.) | | | -0.238*** (0.053) | -0.256*** (0.054) | -0.222*** (0.057) |
| Generation unknown | | | -0.289** (0.095) | -0.409*** (0.100) | -0.364*** (0.102) |
| Adjacent counties (ref. Harris County) | | | | 0.023 (0.061) | 0.011 (0.062) |
| Other Texas counties | | | | -0.424*** (0.067) | -0.434*** (0.067) |
| Out-of-state | | | | -0.720*** (0.135) | -0.733*** (0.135) |
| International | | | | 0.454* (0.191) | 0.461* (0.199) |
| Pell eligible (ref. Not eligible) | | | | | -0.140* (0.059) |
| Pell unknown | | | | | -0.105 (0.072) |
| Constant | 1.516*** (0.048) | 1.694*** (0.056) | 1.760*** (0.058) | 1.899*** (0.067) | 1.955*** (0.073) |
| Observations | 13,927 | 13,927 | 13,927 | 13,927 | 13,927 |
| Log Likelihood | -5,811.283 | -5,790.077 | -5,778.233 | -5,740.986 | -5,737.990 |
| Akaike Inf. Crit. | 11,632.570 | 11,592.150 | 11,572.470 | 11,505.970 | 11,503.980 |

Standard errors in parentheses.
Note: @ p<0.1; * p<0.05; ** p<0.01; *** p<0.001


High school preparedness nested models
Dependent variable:
retained
| Variable | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| High school rank 0-19 (ref. Top 10%) | -1.166*** | -1.258*** | -1.129*** | -1.054*** | -0.977*** | -0.898*** |
| 20-39% | -0.833*** | -0.870*** | -0.722*** | -0.655*** | -0.598*** | -0.512*** |
| 40-59% | -0.620*** | -0.665*** | -0.534*** | -0.475*** | -0.444*** | -0.368*** |
| 60-79% | -0.460*** | -0.500*** | -0.413*** | -0.374*** | -0.364*** | -0.319*** |
| 80-89% | -0.319*** | -0.316*** | -0.259*** | -0.241** | -0.237** | -0.206** |
| Not ranked | -0.514*** | -0.578*** | -0.463*** | -0.406*** | -0.371*** | -0.262** |
| SAT score | | 0.019*** | 0.010*** | 0.010*** | 0.009*** | 0.005* |
| Test credits at entry | | | 0.029*** | 0.032*** | 0.030*** | 0.028*** |
| Transfer credits at entry | | | | 0.009*** | 0.008*** | 0.008*** |
| January-March applicant (ref. August-December) | | | | | -0.362*** | -0.253*** |
| April applicant | | | | | -0.490*** | -0.265@ |
| May-July applicant | | | | | -0.791*** | -0.387** |
| April orientation (ref. May-June) | | | | | | 0.784*** |
| July orientation | | | | | | -0.369*** |
| August orientation | | | | | | -0.684*** |
| Constant | 2.097*** | -0.123 | 0.657** | 0.643** | 0.765** | 1.360*** |
| Observations | 13,927 | 13,615 | 13,615 | 13,615 | 13,615 | 13,569 |
| Log Likelihood | -5,860.412 | -5,677.645 | -5,636.873 | -5,622.382 | -5,589.476 | -5,510.871 |
| Akaike Inf. Crit. | 11,734.830 | 11,371.290 | 11,291.750 | 11,264.760 | 11,204.950 | 11,053.740 |
Note: @ p<0.1; * p<0.05; ** p<0.01; *** p<0.001


University academic nested models
Dependent variable:
retained
| Variable | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) | (11) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Full-time last term (ref. Part-time) | 0.976*** | 0.916*** | 0.924*** | 0.860*** | 0.783*** | 0.217* | 0.202@ | 0.189@ | 0.185@ | 0.183@ | 0.180@ |
| Honors (ref. Not Honors) | | 0.980*** | 0.939*** | 0.883*** | 0.763*** | 0.286** | 0.241* | 0.281* | 0.331** | 0.321** | 0.334** |
| STEM (ref. Not STEM) | | | 0.151** | 0.142** | 0.145** | 0.205*** | 0.241*** | 0.241*** | 0.194*** | 0.185** | 0.195*** |
| UHin4 (ref. Not UHin4) | | | | 0.331*** | 0.202*** | 0.133* | 0.158** | 0.148* | 0.129* | 0.129* | 0.129* |
| Fall hours | | | | | 0.127*** | 0.092*** | 0.115*** | 0.112*** | 0.110*** | 0.109*** | 0.107*** |
| Fall GPA | | | | | | 0.939*** | 0.620*** | 0.560*** | 0.442*** | 0.441*** | 0.431*** |
| Total DWF | | | | | | | -0.179*** | -0.167*** | -0.158*** | -0.158*** | -0.157*** |
| English core earned credit (ref. Did not earn credit) | | | | | | | | 0.787*** | 0.684*** | 0.686*** | 0.693*** |
| English core not attempted | | | | | | | | 0.624*** | 0.576*** | 0.575*** | 0.574*** |
| Math core earned credit (ref. Did not earn credit) | | | | | | | | | 0.805*** | 0.807*** | 0.811*** |
| Math core not attempted | | | | | | | | | 0.211@ | 0.208@ | 0.222@ |
| Enrolled in major college (ref. Not enrolled in major college) | | | | | | | | | | 0.052 | 0.069 |
| Changed major college (ref. Did not change) | | | | | | | | | | | 0.512** |
| Constant | 0.804*** | 0.775*** | 0.717*** | 0.557*** | -1.136*** | -2.535*** | -1.641*** | -2.127*** | -2.357*** | -2.371*** | -2.347*** |
| Observations | 13,927 | 13,927 | 13,927 | 13,927 | 13,927 | 13,927 | 13,927 | 13,927 | 13,927 | 13,922 | 13,922 |
| Log Likelihood | -5,871.363 | -5,811.287 | -5,806.844 | -5,786.106 | -5,753.781 | -4,914.214 | -4,855.030 | -4,827.362 | -4,774.423 | -4,773.121 | -4,767.645 |
| Akaike Inf. Crit. | 11,746.730 | 11,628.570 | 11,621.690 | 11,582.210 | 11,519.560 | 9,842.429 | 9,726.060 | 9,674.724 | 9,572.845 | 9,572.242 | 9,563.291 |
Note: @ p<0.1; * p<0.05; ** p<0.01; *** p<0.001


Feasible Solutions Algorithm

We also used a machine learning algorithm to identify variables that optimize models of one-year retention. We used rFSA, an R package for finding the best subsets of predictors. The “algorithm searches a data space for models of user-specified form that are statistically optimal under a measure of model quality” (p. 295).21 In other words, the researcher specifies the number of predictors desired for each model and the algorithm runs thousands of predictor combinations to find the best-fitting model. In our study, the algorithm adjudicates candidate models using AIC as the measure of model quality. The table below outlines the optimal feasible solutions of candidate models with specifications for 10, 15, and 20 predictors. We used these candidate models to inform our model selection process. The algorithm does not incorporate any a priori knowledge of the process of retention; it runs purely on mathematics. In addition to what the algorithm suggested, our final models also include important predictors that prior research and theory tell us are influential in predicting retention.

Note 21: Lambert, J., et al. 2018. “rFSA: An R Package for Finding Best Subsets and Interactions.” The R Journal Vol. 10.
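
The general idea of a swap-based subset search can be sketched as follows. This is a simplified Python illustration of the technique, not the rFSA implementation: it starts from random subsets of a fixed size and keeps exchanging one predictor at a time while the criterion improves. The criterion here is a toy stand-in for AIC built from made-up "gain" values, which are not estimates from this study; in practice each evaluation would refit a logistic regression.

```python
import random


def feasible_solution_search(candidates, m, criterion, n_starts=5, seed=0):
    """Swap-based search for a size-m subset minimizing `criterion`."""
    rng = random.Random(seed)
    best_subset, best_score = None, float("inf")
    for _ in range(n_starts):
        current = set(rng.sample(candidates, m))  # random starting subset
        score = criterion(current)
        while True:
            # Evaluate every single swap (one predictor out, one in).
            best_swap, best_swap_score = None, score
            for out_var in sorted(current):
                for in_var in candidates:
                    if in_var in current:
                        continue
                    trial = (current - {out_var}) | {in_var}
                    trial_score = criterion(trial)
                    if trial_score < best_swap_score:
                        best_swap, best_swap_score = trial, trial_score
            if best_swap is None:
                break  # no swap improves the criterion: a feasible solution
            current, score = best_swap, best_swap_score
        if score < best_score:
            best_subset, best_score = current, score
    return best_subset, best_score


# Toy criterion: hypothetical fit gains per predictor (NOT study results).
TOY_GAIN = {
    "fall_gpa": 400.0, "spring_enrollment": 350.0, "residency": 80.0,
    "unmet_need": 60.0, "orientation_month": 40.0, "sat_score": 5.0,
}


def toy_aic(subset):
    return 10000.0 - sum(TOY_GAIN[v] for v in subset) + 2 * len(subset)


subset, score = feasible_solution_search(list(TOY_GAIN), 3, toy_aic)
print(sorted(subset), score)
```

Multiple random starts matter because, with a real criterion, a single start can stall at a subset that no one-for-one swap improves even though a better subset exists elsewhere.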

Feasible Solutions Algorithm Models (10, 15, 20 predictors)
Dependent variable:
retained
(1) (2) (3)
African American (ref. White) 0.674*** 0.656*** 0.678***
Asian/PI 0.531*** 0.534*** 0.522***
Hispanic 0.260** 0.230* 0.240*
Other 0.259 0.259 0.259
Adjacent counties (ref. Harris County) -0.173* -0.150@ -0.157@
Other Texas counties -0.651*** -0.668*** -0.663***
Out-of-state -0.909*** -0.847*** -0.769***
International -0.346 -0.330 -0.178
Fall GPA -0.657*** -0.768*** -0.625***
Fall hours passed -0.060***
First term good academic standing (ref. Warning) 0.311* 0.317*
Transfer credits at entry 0.009*** 0.008**
High school rank 0-19 (ref. Top 10%) -0.210
20-39% 0.128
40-59% 0.342**
60-79% 0.162
80-89% 0.034
Not ranked 0.105
SAT Score 0.001 0.004 -0.0004
Total DWF -0.408*** -0.397*** -0.420***
English core earned credit (ref. Did not earn credit) 0.172
English core not attempted 0.202
Math core earned credit (ref. Did not earn credit) 0.093 0.208@
Math core not attempted -0.021 0.101
April orientation (ref. May-June) 0.386@
July orientation -0.015
August orientation -0.204@
Unmet need (thousands) -0.003 -0.006 -0.007
Total aid (excluding loans) 1-4,999 (ref. 0) -0.268*
5,000-9,999 0.044
10,000-14,999 0.001
15,000-19,999 0.128
20,000+ -0.355@
Severance of Service Balance Spring (thousands) -0.178*** -0.178*** -0.173***
Lost scholarship (ref. Scholarship retained) -0.687*** -0.643***
No scholarship -0.211@ -0.260@
Not enrolled spring (ref. No major change) -1.862*** -1.388*** -1.276***
Changed major college spring 0.079 0.050 0.024
Cumulative GPA 0.595*** 0.537*** 0.478***
Total hours taken for progress 0.163*** 0.157*** 0.171***
Part-time spring (ref. Full-time) -0.129 -0.074
Not enrolled spring -0.647* -0.648*
Constant -1.824*** -1.675** -1.276*
Observations 12,445 12,444 12,405
Log Likelihood -3,274.270 -3,245.585 -3,216.313
Akaike Inf. Crit. 6,584.541 6,543.170 6,518.626
Note: @ p<0.1; * p<0.05; ** p<0.01; *** p<0.001

Final Model Assessments

We used various statistical assessments to develop our final models. We tested each model for multicollinearity, defined as a correlation among three or more variables in a model even if no pair of variables is highly correlated. Multicollinearity indicates redundancy in the model. We calculated variance inflation factors (VIF), which measure how much the variance of a predictor's coefficient estimate is inflated due to multicollinearity in the model. For example, we found high VIF values for cumulative GPA and total DWF, indicating a redundancy in the model. We opted to keep cumulative GPA in our model as the more comprehensive predictor of academic achievement.
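
For reference, the VIF for a predictor is conventionally computed as 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors in the model. The sketch below shows only this final arithmetic step; the R² values passed in are illustrative and are not figures from this study.

```python
def variance_inflation_factor(r_squared):
    """VIF = 1 / (1 - R^2), with R^2 from regressing one predictor on the rest."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


# Illustrative values only: an uncorrelated predictor and a redundant one.
vif_independent = variance_inflation_factor(0.0)
vif_redundant = variance_inflation_factor(0.8)
print(vif_independent, round(vif_redundant, 2))
```

A common rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic multicollinearity, although the cutoff applied in this analysis is not stated.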

We also opted for the most parsimonious models: the simplest models with the greatest predictive power. That means we removed certain predictors that were not statistically significant and did not add any significant predictive power in the model. We did this by comparing model AIC values to evaluate the impact of these predictors on candidate models. However, we did retain some predictors that are theoretically important to an analysis of retention despite not being statistically significant. These include SAT score and high school rank. SAT scores were not statistically significant, but are one of the few measures of high school preparedness that are available. High school rank was statistically significant in the fall model, but not significant in subsequent models. This illustrates how measures of high school preparedness become insignificant once students progress in their post-secondary education.

We calculated goodness-of-fit using McFadden’s Pseudo R2. Unlike linear regression, logistic regression has no directly comparable R2 statistic that indicates the proportion of variance in retention explained by our predictors. McFadden’s Pseudo R2 compares the log-likelihood of a fitted model with the log-likelihood of a null model that has only the intercept as a predictor. Pseudo R2 values range from 0 to 1, and higher values indicate greater predictive power. A pseudo-R2 close to or greater than 0.4 indicates a very good fit (pg 35). Our fits improve from the Fall Model to the Spring Model and the Full-Year Model (see table following next section).
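
The calculation can be sketched as follows, using the standard definition of McFadden's pseudo-R2 as 1 minus the ratio of the fitted model's log-likelihood to the null model's log-likelihood. The null log-likelihood of an intercept-only logistic model depends only on the overall retention rate, so it can be computed from counts. The counts and the fitted log-likelihood in the example are hypothetical, not values from this study.

```python
import math


def null_log_likelihood(n_retained, n_total):
    """Log-likelihood of an intercept-only model (needs 0 < n_retained < n_total)."""
    p = n_retained / n_total
    return n_retained * math.log(p) + (n_total - n_retained) * math.log(1 - p)


def mcfadden_pseudo_r2(ll_model, ll_null):
    """McFadden's pseudo-R2: 1 - ln(L_model) / ln(L_null)."""
    return 1.0 - ll_model / ll_null


# Hypothetical example: 850 of 1,000 students retained, fitted model LL of -300.
ll_null = null_log_likelihood(850, 1000)
pseudo_r2 = mcfadden_pseudo_r2(-300.0, ll_null)
print(round(ll_null, 3), round(pseudo_r2, 3))
```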

We also evaluated how well our models predicted retention on out-of-sample observations by calculating model classification rates. To do this, we partitioned our data set into training and test sets, randomly selecting 60% (n=8,434) of our observations for the training set and 40% (n=5,493) for the test set. We estimated our models on the training set and applied them to predict values in the test set. The table below shows how well our training models predict retention in the test set:

| Model | Pseudo-R2 | True Positives | True Negatives | False Positives | False Negatives | AUC |
|---|---|---|---|---|---|---|
| Fall | 0.086 | 85.0% | 0.4% | 13.9% | 0.7% | 67.8% |
| Spring | 0.294 | 84.2% | 5.3% | 9.0% | 1.5% | 81.3% |
| Full-Year | 0.348 | 84.0% | 6.3% | 8.0% | 1.7% | 85.1% |

Here is how these values are defined:

  • True positives are students we predicted would be retained and were actually retained.
  • True negatives are students we predicted would not be retained and were actually not retained.
  • False positives are students we predicted would be retained, but they were not actually retained (Type I error).
  • False negatives are students we predicted would not be retained, but they actually were retained (Type II error).
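
The four categories defined above can be tallied as shares of all test-set students, which is how the percentages in the table sum to 100% in each row. The sketch below assumes a 0.5 probability cutoff for predicting retention; the cutoff actually used is not stated, and the example data are made up.

```python
def classification_shares(actual, predicted_prob, threshold=0.5):
    """Share of observations in each confusion-matrix cell (1 = retained)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for y, p in zip(actual, predicted_prob):
        predicted = 1 if p >= threshold else 0  # assumed cutoff
        if predicted == 1 and y == 1:
            counts["TP"] += 1
        elif predicted == 0 and y == 0:
            counts["TN"] += 1
        elif predicted == 1 and y == 0:
            counts["FP"] += 1  # predicted retained, actually not retained
        else:
            counts["FN"] += 1  # predicted not retained, actually retained
    n = len(actual)
    return {cell: count / n for cell, count in counts.items()}


# Made-up test set of five students.
actual = [1, 1, 1, 0, 0]
probs = [0.9, 0.8, 0.3, 0.7, 0.2]
shares = classification_shares(actual, probs)
print(shares)
```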

The results show that true positives account for roughly 84% to 85% of test-set students in all three models. However, our Fall model has the largest share of false positives, at 13.9%. We can visually display the rate at which we correctly predict retention and the rate at which we incorrectly predict retention using the receiver operating characteristic (ROC) curve below. The area under the ROC curve, or AUC, represents how well our models classify students as being retained or not retained. The greater the area under the curve, the better our models are at correctly predicting retention. The figure below shows that classifier performance improves from the fall model to the spring model to the full-year model.
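
One way to read AUC is as the probability that a randomly chosen retained student receives a higher predicted score than a randomly chosen non-retained student. The sketch below computes AUC directly from that pairwise definition, counting ties as half; it is a small illustration on made-up data, not the routine used to produce the figures in this report.

```python
def auc(actual, scores):
    """AUC as the share of (retained, not retained) pairs ranked correctly."""
    pos = [s for y, s in zip(actual, scores) if y == 1]
    neg = [s for y, s in zip(actual, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one retained and one non-retained case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up example: three of the four retained/non-retained pairs are ordered correctly.
example_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2])
print(example_auc)
```

An AUC of 0.5 corresponds to ranking no better than chance, and 1.0 to a perfect ranking, which is why the values of 0.678, 0.813, and 0.851 reported in the model tables indicate progressively better discrimination.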

Model Tables

Fall Model Logistic Regression Table.


Observations: 12,406
Log-Likelihood: -4,775.782
AIC: 9,609.563
Pseudo-R2: 0.086
AUC: 0.678

* p<0.05; ** p<0.01; *** p<0.001

Variable Estimate Odds Ratio Std. Error P-Value Sig. level
Intercept -0.1709 0.8429 0.4057 0.6736
African-American (ref. White) 0.5503 1.7338 0.0980 0.0000 ***
Asian/PI 0.7633 2.1453 0.0864 0.0000 ***
Hispanic 0.1568 1.1698 0.0746 0.0356 *
Other 0.1482 1.1597 0.1295 0.2523
Male (ref. Female) -0.2022 0.8169 0.0546 0.0002 ***
First-Generation (ref. Not First Gen.) -0.1050 0.9003 0.0612 0.0863
Generation unknown -0.1986 0.8199 0.1120 0.0761
Adjacent counties (ref. Harris County) 0.0657 1.0679 0.0688 0.3396
Other Texas counties -0.4637 0.6290 0.0728 0.0000 ***
Out-of-state -0.4759 0.6213 0.1535 0.0019 **
International 0.4977 1.6449 0.2707 0.0660
High school rank 0-19% (ref. Top 10%) -0.7992 0.4497 0.1937 0.0000 ***
20-39% -0.5914 0.5536 0.1318 0.0000 ***
40-59% -0.3959 0.6731 0.0998 0.0001 ***
60-79% -0.3017 0.7396 0.0794 0.0001 ***
80-89% -0.1830 0.8328 0.0799 0.0220 *
Not ranked -0.3030 0.7386 0.0957 0.0016 **
SAT score 0.0001 1.0001 0.0026 0.9675
Test credits at entry 0.0244 1.0247 0.0036 0.0000 ***
Transfer credits at entry 0.0106 1.0107 0.0019 0.0000 ***
April orientation (ref. May-June) 0.5861 1.7970 0.1869 0.0017 **
July orientation -0.2745 0.7600 0.0639 0.0000 ***
August orientation -0.6772 0.5080 0.0832 0.0000 ***
UHin4 (ref. Not UHin4) 0.1482 1.1597 0.0612 0.0154 *
Fall hours taken 0.1444 1.1553 0.0189 0.0000 ***
Total loans (thousands) -0.0115 0.9886 0.0051 0.0245 *
Unmet need (thousands) -0.0161 0.9840 0.0037 0.0000 ***
Deferment fall (ref. No deferment) -0.5128 0.5988 0.0673 0.0000 ***

Spring Model Logistic Regression Table.


Observations: 12,406
Log-Likelihood: -3,689.392
AIC: 7,448.784
Pseudo-R2: 0.294
AUC: 0.813

* p<0.05; ** p<0.01; *** p<0.001

Variable Estimate Odds Ratio Std. Error P-Value Sig. level
Intercept -0.9202 0.3984 0.5424 0.0898
African-American (ref. White) 0.6694 1.9531 0.1150 0.0000 ***
Asian/PI 0.6316 1.8806 0.0988 0.0000 ***
Hispanic 0.2170 1.2423 0.0881 0.0138 *
Other 0.2673 1.3064 0.1556 0.0858
Male (ref. Female) -0.0577 0.9439 0.0640 0.3671
First-Generation (ref. Not First Gen.) 0.0046 1.0046 0.0720 0.9492
Generation unknown -0.1514 0.8595 0.1314 0.2493
Adjacent counties (ref. Harris County) -0.0868 0.9169 0.0804 0.2801
Other Texas counties -0.6176 0.5392 0.0845 0.0000 ***
Out-of-state -0.6305 0.5323 0.1820 0.0005 ***
International 0.0615 1.0634 0.3239 0.8493
High school rank 0-19% (ref. Top 10%) -0.2268 0.7971 0.2289 0.3218
20-39% -0.0518 0.9495 0.1600 0.7460
40-59% 0.1595 1.1729 0.1222 0.1917
60-79% 0.0468 1.0479 0.0945 0.6204
80-89% -0.0102 0.9899 0.0923 0.9122
Not ranked 0.0285 1.0289 0.1139 0.8028
SAT score -0.0023 0.9977 0.0033 0.4855
Test credits at entry 0.0059 1.0059 0.0040 0.1448
Transfer credits at entry 0.0092 1.0092 0.0022 0.0000 ***
April orientation (ref. May-June) 0.6404 1.8972 0.2064 0.0019 **
July orientation -0.1174 0.8892 0.0757 0.1210
August orientation -0.3220 0.7247 0.1020 0.0016 **
UHin4 (ref. Not UHin4) 0.0711 1.0737 0.0722 0.3247
Fall hours taken 0.0838 1.0874 0.0226 0.0002 ***
Total loans (thousands) -0.0079 0.9921 0.0060 0.1868
Unmet need (thousands) -0.0106 0.9895 0.0045 0.0194 *
Deferment fall (ref. No deferment) -0.2325 0.7925 0.0871 0.0076 **
Part-time spring (ref. Full-time) -1.2425 0.2887 0.1057 0.0000 ***
Not enrolled spring -3.7121 0.0244 0.1556 0.0000 ***
Scholarship at risk (ref. Good standing) -0.1800 0.8353 0.1193 0.1314
No scholarship -0.0019 0.9981 0.1077 0.9861
Fall GPA 0.7300 2.0751 0.0351 0.0000 ***
Deferment spring (ref. No deferment) -0.1341 0.8745 0.1118 0.2306

Full-Year Model Logistic Regression Table.


Observations: 12,406
Log-Likelihood: -3,404.767
AIC: 6,881.534
Pseudo-R2: 0.348
AUC: 0.851

* p<0.05; ** p<0.01; *** p<0.001

Variable Estimate Odds Ratio Std. Error P-Value Sig. level
Intercept -3.6432 0.0262 0.5389 0.0000 ***
African-American (ref. White) 0.6877 1.9891 0.1200 0.0000 ***
Asian/PI 0.5323 1.7028 0.1031 0.0000 ***
Hispanic 0.2870 1.3324 0.0925 0.0019 **
Other 0.2984 1.3477 0.1633 0.0676
Male (ref. Female) 0.0226 1.0229 0.0669 0.7361
First-Generation (ref. Not First Gen.) 0.0685 1.0709 0.0752 0.3621
Generation unknown -0.0625 0.9394 0.1369 0.6479
Adjacent counties (ref. Harris County) -0.1427 0.8670 0.0837 0.0880
Other Texas counties -0.6590 0.5174 0.0879 0.0000 ***
Out-of-state -0.8366 0.4332 0.1947 0.0000 ***
International -0.2108 0.8099 0.3325 0.5260
High school rank 0-19% (ref. Top 10%) -0.1151 0.8913 0.2422 0.6345
20-39% 0.1039 1.1095 0.1671 0.5341
40-59% 0.3376 1.4016 0.1276 0.0081 **
60-79% 0.1718 1.1874 0.0984 0.0807
80-89% 0.0539 1.0554 0.0960 0.5745
Not ranked 0.1455 1.1566 0.1190 0.2214
SAT score -0.0034 0.9966 0.0035 0.3260
Test credits at entry 0.0000 1.0000 0.0042 0.9909
Transfer credits at entry 0.0082 1.0082 0.0023 0.0003 ***
April orientation (ref. May-June) 0.3995 1.4911 0.2179 0.0668
July orientation -0.0230 0.9773 0.0788 0.7706
August orientation -0.1784 0.8366 0.1054 0.0906
UHin4 (ref. Not UHin4) -0.0866 0.9170 0.0723 0.2312
Total loans (thousands) -0.0022 0.9978 0.0063 0.7241
Unmet need (thousands) -0.0065 0.9935 0.0047 0.1715
Deferment fall (ref. No deferment) -0.1323 0.8761 0.0903 0.1427
Part-time spring (ref. Full-time) -0.2255 0.7981 0.1296 0.0817
Not enrolled spring -1.7203 0.1790 0.2136 0.0000 ***
Deferment spring (ref. No deferment) -0.1589 0.8531 0.1157 0.1697
Cumulative hours taken (full year) 0.1286 1.1372 0.0097 0.0000 ***
Cumulative GPA (full year) 0.8866 2.4269 0.0387 0.0000 ***
Change college major (ref. No change) 0.0066 1.0066 0.1795 0.9705
Lost scholarship (ref. Good standing) -0.7361 0.4790 0.1290 0.0000 ***
No scholarship -0.2801 0.7557 0.1257 0.0258 *


All Models (Fall, Spring, Full Year)
Dependent variable:
retained
| Variable | Fall (1) | Spring (2) | Full-Year (3) |
|---|---|---|---|
| African American (ref. White) | 0.550*** | 0.669*** | 0.688*** |
| Asian/PI | 0.763*** | 0.632*** | 0.532*** |
| Hispanic | 0.157* | 0.217* | 0.287** |
| Other | 0.148 | 0.267@ | 0.298@ |
| Male (ref. female) | -0.202*** | -0.058 | 0.023 |
| First Generation (ref. Not first gen.) | -0.105@ | 0.005 | 0.068 |
| Generation unknown | -0.199@ | -0.151 | -0.063 |
| Adjacent counties (ref. Harris County) | 0.066 | -0.087 | -0.143@ |
| Other Texas counties | -0.464*** | -0.618*** | -0.659*** |
| Out-of-state | -0.476** | -0.631*** | -0.837*** |
| International | 0.498@ | 0.062 | -0.211 |
| High school rank 0-19 (ref. Top 10%) | -0.799*** | -0.227 | -0.115 |
| 20-39% | -0.591*** | -0.052 | 0.104 |
| 40-59% | -0.396*** | 0.160 | 0.338** |
| 60-79% | -0.302*** | 0.047 | 0.172@ |
| 80-89% | -0.183* | -0.010 | 0.054 |
| Not ranked | -0.303** | 0.028 | 0.146 |
| SAT score | 0.0001 | -0.002 | -0.003 |
| Test credits at entry | 0.024*** | 0.006 | -0.00005 |
| Transfer credits at entry | 0.011*** | 0.009*** | 0.008*** |
| April orientation (ref. May-June) | 0.586** | 0.640** | 0.400@ |
| July orientation | -0.275*** | -0.117 | -0.023 |
| August orientation | -0.677*** | -0.322** | -0.178@ |
| UHin4 (ref. Not UHin4) | 0.148* | 0.071 | -0.087 |
| Fall credit hours | 0.144*** | 0.084*** | |
| Total loans (thousands) | -0.012* | -0.008 | -0.002 |
| Unmet need (thousands) | -0.016*** | -0.011* | -0.006 |
| Fall deferment (ref. No deferment) | -0.513*** | -0.232** | -0.132 |
| Part-time spring (ref. Full-time) | | -1.242*** | -0.226@ |
| Not enrolled spring | | -3.712*** | -1.720*** |
| Scholarship at risk (ref. Good standing) | | -0.180 | |
| No scholarship | | -0.002 | |
| Fall GPA | | 0.730*** | |
| Deferment spring (ref. No deferment) | | -0.134 | -0.159 |
| Cumulative hours (full year) | | | 0.129*** |
| Cumulative GPA (full year) | | | 0.887*** |
| College major change | | | 0.007 |
| Lost scholarship (ref. Scholarship retained) | | | -0.736*** |
| No scholarship | | | -0.280* |
| Constant | -0.171 | -0.920@ | -3.643*** |
| Observations | 12,406 | 12,406 | 12,406 |
| Log Likelihood | -4,775.782 | -3,689.392 | -3,404.767 |
| Akaike Inf. Crit. | 9,609.563 | 7,448.784 | 6,881.534 |
Note: @ p<0.1; * p<0.05; ** p<0.01; *** p<0.001

Variable Definitions


| Category | Variable | Definition | Categories |
|---|---|---|---|
| ADMISSIONS | High School Rank | High school rank percentile | 0-19; 20-39; 40-59; 60-79; 80-89; Top 10%; Not ranked |
| | Test Credit | Total test credits at entry | Continuous (test credits) |
| | Transfer Credit | Total transfer credits at entry | Continuous (transfer credits) |
| | SAT/ACT Score | SAT score or concordant ACT score | Continuous (score) |
| | Application Date | Month in which application was submitted | August-December, January-March, April, May-July |
| | Orientation date | Month in which orientation was attended | April, May-June, July, August |
| | First Choice College | Enrolled in first choice college on application | Yes, No |
| ACADEMIC | Hours Taken | Credit hours taken for progress | Continuous (credit hours) |
| | Hours Passed | Credit hours passed for progress | Continuous (credit hours) |
| | Term GPA | Current term GPA | Continuous (grade points) |
| | Cumulative GPA | Cumulative GPA | Continuous (grade points) |
| | Academic Standing | Academic Standing | Academic Probation, Academic Warning, Good Standing |
| | D/W/F Grades | Number of courses earning D/W/F grades | Continuous (number of courses) |
| | Full-time/Part-time | Enrollment level for spring term | Full-time, Part-time, Not Enrolled |
| | STEM Major | Enrolled in STEM major | STEM major, Non-STEM major |
| | College Change | Change of major to new college | Change, No change |
| | Math Core Credit | Completion of Math core course | Earned credit, Did not earn credit, Did not attempt |
| | English Core Credit | Completion of English core course | Earned credit, Did not earn credit, Did not attempt |
| | First Year Math Level | Level of the first math course taken at UH | Developmental (MATH 1100-1300), Introductory-Intermediate (MATH 1310-1330), Advanced-Upper (MATH 1431+) |
| | Core 1101 | Enrolled in Core 1101 course | Core 1101, Non-Core 1101 |
| DEMOGRAPHIC | Gender | Gender of student | Male, Female |
| | Race/Ethnicity | Race/ethnicity of student | African American, Asian/Pacific Islander, Hispanic, White, Other/Unknown |
| | First Generation | First generation college student | First generation, Not first generation, Unknown |
| | UHin4 | Participant in UHin4 | UHin4, Not UHin4 |
| | County of Residence | | Harris County, Adjacent Counties, Other Texas Counties, Out-of-State, International |
| FINANCIAL | Estimated Family Contribution (EFC) | Estimated family contribution | Continuous (dollar amount) |
| | Unmet Financial Need | Total amount of unmet need | $0; $1-2,499; $2,500-4,999; $5,000-7,499; $7,500-9,999; $10,000-12,499; $12,500-14,999; $15,000-17,499; $17,500-19,999; $20,000-29,999; $30,000+; Unknown COA |
| | Scholarship Size | Dollar value of scholarship | Small scholarship, Large scholarship, No Scholarship |
| | Scholarship Lost | Scholarship lost or at risk of being lost | Lost scholarship, Retained scholarship, No scholarship |
| | Financial Aid Amount | Total amount of financial aid package (excludes loans) | Continuous (dollar amount) |
| | Loan Amount | Total amount of loans from any source | Continuous (dollar amount) |
| | FAFSA Verification | Selection for FAFSA verification | Not selected, Selected and completed, Selected and not completed |
| | Independent FAFSA | Student filed FAFSA as an independent | Yes, No |
| | Pell Eligibility | Eligible for Pell grant per FAFSA | Pell eligible, Not Pell eligible, Unknown |
| | Financial Delinquency | Amount of past due balance at severance of service | $0; $1-999; $1,000-1,999; $2,000-2,999; $3,000-3,999; $4,000-4,999; $5,000-5,999; $6,000-9,999; $10,000+ |
| | Payment Plan | Student selected a deferred payment plan | Yes, No |