March 2, 2020


Introduction

  • Strategic plan for the UH Centennial 2027: UH100 Dare to Dream
  • How are we performing and how can we improve?
“Our University is too important on so many levels to be left without a strong vision for the next eight years. We are light years ahead of where we were eight years ago, and in another eight years, we will be equally advanced from where we are today.”
- Provost Paula Myrick Short

Why is benchmarking important?

  • Benchmarking is the process by which policymakers compare the performance, practices, and policies of institutions or groups of institutions to gain insight (Betsinger et al. 2013)
“An institution that does not routinely evaluate all aspects of the organization and make the changes necessary to address its shortcomings, from the curriculum to the physical plant, is jeopardizing its future.”

Barbara Bender
Rutgers University

Why is benchmarking important?

  • Answers questions like “What makes an institution highly ranked?”
“Through analyzing the best practices of peer institutions, then adapting and developing programs for their own campuses, higher education leaders can improve the quality of programs and services that they provide” (Bender 2002: p. 119).

Why is benchmarking important?

  • A tool to overcome resistance to change
“Benchmarking can help overcome resistance to change that can be very strong in conservative organizations, such as colleges and universities, that have had little operational change in many years” (Alstete 1995: p. 25).
“Using the appropriate institutions and programs as models, especially schools with higher reputational ratings, can generate an enthusiasm for change that can transform an institution” (Bender 2002: p. 116).

Benchmarking as an evidence-based tool

  • Two aims of evidence-based tools such as benchmarking:
    1. to encourage policymakers to learn to think like scientists, and
    2. to learn how to solve problems primarily with reference to the evidence generated by professional, scientific, and technical methods of inquiry (Witting 2017: p. 2)

How to identify peers

  • Texas accountability peers (Emerging research)
  • Nationally recognized peers (USNWR)
  • Public/Private
  • Financial peers (faculty salary, tuition)

Types of peers

  • Comparable: Similar institutional level (two- or four-year), control (public, private), and enrollment characteristics (size, race)
  • Aspirational: Similar yet significantly different in several key performance indicators (graduation rate, research expenditures)
  • Competitors: Similar institutions that prospective students often choose to attend instead
  • Consortium: Belonging to a specific organization (AAU, Power 5, College Completion Summit)

Data

  • 2018 Integrated Postsecondary Education Data System (IPEDS)
  • Filter variables
    • Sector: Public, 4-year or above
    • Carnegie Classification 2018 Basic: Doctoral Universities - Very High Research Activity
    • Size Category: 20,000 and above
  • Final sample: 85 institutions (each row is an institution/observation, each column a variable)

Data

Cluster Variables

  • Percent admitted
  • Admissions yield
  • SAT/ACT 25th & 75th percentiles
  • Undergraduate & graduate enrollment
  • Percent Race/Ethnicity
  • Percent Women
  • FTE for last academic year, 2017-18
  • Percent full-time, first-time awarded Pell
  • Student-to-Faculty Ratio
  • All academic rank faculty
  • Six-year graduation rate
  • Total degrees awarded
  • Core revenues and expenses
  • Instruction, research, and student service expenses as percent of core
  • Endowment assets
  • In-state, out-of-state tuition
  • Total price of attendance in-state on campus

Cluster analysis

  • Data Preparation
  • Calculate difference/distance between institutions by their characteristics
  • Cluster so that difference is minimized within clusters and maximized between clusters
  • No response variable = unsupervised method
  • Tool: R

Step 1: Prepare Data

1A - Remove or estimate missing data

  • Handle missing data by removing or estimating them
  • 9 of the 85 institutions contain at least one missing value (11%)
  • Of the 2,890 data points, 50 data points were imputed (1.7%)
  • Use knnImputation from the DMwR package, which identifies the k closest observations by Euclidean distance and imputes a weighted average
  • Or simply drop incomplete rows: peer_data <- na.omit(peer_data)
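A minimal sketch of both approaches (assuming peer_data is the IPEDS data frame described above; the two options are alternatives, not sequential steps):

library(DMwR)  # provides knnImputation()

# Option A: impute each missing value from the k nearest institutions
# (weighted average of their values, weights based on Euclidean distance)
peer_data_imputed <- knnImputation(peer_data, k = 10)

# Option B: drop any institution with at least one missing value
peer_data_complete <- na.omit(peer_data)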

Step 1: Prepare Data

1B - Standardize data

  • Remove influence of different scales: $, %, N
  • Transform data so that each variable has \(\mu\)=0 and \(\sigma\)=1
  • Use scaled <- scale(peer_data)

Step 1: Prepare Data

1B - Standardize data

head(peer_data)
## # A tibble: 6 x 45
##   tuition_in_state tuition_out_sta… pct_native pct_black pct_hispanic pct_white
##              <dbl>            <dbl>      <dbl>     <dbl>        <dbl>     <dbl>
## 1             8568            19704          0        22            3        59
## 2            10780            29230          0        11            5        76
## 3             9624            28872          0         6            3        76
## 4            10104            27618          1         3           19        48
## 5            10467            31688          1         4           24        49
## 6             7384            23422          1         4            8        74
## # … with 39 more variables: pct_two_more <dbl>, pct_nonresident <dbl>,
## #   pct_asian_pi <dbl>, pct_women <dbl>, stu_fac_ratio <dbl>,
## #   pct_admitted <dbl>, yield <dbl>, price_instate_oncampus <dbl>,
## #   enrollment_undg <dbl>, enrollment_grad <dbl>, degree_ba <dbl>,
## #   degree_ma <dbl>, degree_phd_research <dbl>, degree_phd_prof <dbl>,
## #   degree_phd_other <dbl>, six_year_grad <dbl>, pct_pell <dbl>,
## #   revenue_core <dbl>, revenue_pct_tuition <dbl>, expense_core <dbl>,
## #   expense_instructional <dbl>, expense_research <dbl>,
## #   expense_student_service <dbl>, endowment <dbl>, sat_read25 <dbl>,
## #   sat_read75 <dbl>, sat_math25 <dbl>, sat_math75 <dbl>, act_comp25 <dbl>,
## #   act_comp75 <dbl>, fte1718 <dbl>, faculty_total <dbl>,
## #   degree_phd_total <dbl>, degree_total <dbl>, cluster2 <dbl>, cluster5 <dbl>,
## #   us_rank2020 <dbl>, aau_member <dbl>, aau_year <dbl>

Step 1: Prepare Data

1B - Standardize data

scaled <- scale(peer_data)
head(as_tibble(scaled))
## # A tibble: 6 x 45
##   tuition_in_state tuition_out_sta… pct_native pct_black pct_hispanic pct_white
##              <dbl>            <dbl>      <dbl>     <dbl>        <dbl>     <dbl>
## 1          -0.398           -1.20       -0.280     2.85        -0.809     0.291
## 2           0.339            0.0767     -0.280     0.803       -0.651     1.26 
## 3          -0.0461           0.0288     -0.280    -0.127       -0.809     1.26 
## 4           0.114           -0.139       0.910    -0.685        0.459    -0.334
## 5           0.235            0.406       0.910    -0.499        0.855    -0.277
## 6          -0.793           -0.701       0.910    -0.499       -0.413     1.14 
## # … with 39 more variables: pct_two_more <dbl>, pct_nonresident <dbl>,
## #   pct_asian_pi <dbl>, pct_women <dbl>, stu_fac_ratio <dbl>,
## #   pct_admitted <dbl>, yield <dbl>, price_instate_oncampus <dbl>,
## #   enrollment_undg <dbl>, enrollment_grad <dbl>, degree_ba <dbl>,
## #   degree_ma <dbl>, degree_phd_research <dbl>, degree_phd_prof <dbl>,
## #   degree_phd_other <dbl>, six_year_grad <dbl>, pct_pell <dbl>,
## #   revenue_core <dbl>, revenue_pct_tuition <dbl>, expense_core <dbl>,
## #   expense_instructional <dbl>, expense_research <dbl>,
## #   expense_student_service <dbl>, endowment <dbl>, sat_read25 <dbl>,
## #   sat_read75 <dbl>, sat_math25 <dbl>, sat_math75 <dbl>, act_comp25 <dbl>,
## #   act_comp75 <dbl>, fte1718 <dbl>, faculty_total <dbl>,
## #   degree_phd_total <dbl>, degree_total <dbl>, cluster2 <dbl>, cluster5 <dbl>,
## #   us_rank2020 <dbl>, aau_member <dbl>, aau_year <dbl>
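A quick sanity check (a sketch): after scaling, every column should have mean 0 and standard deviation 1.

round(colMeans(scaled), 2)      # all approximately 0
round(apply(scaled, 2, sd), 2)  # all 1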

Step 2: Calculate distance

Select a distance measure

  • Many measures: Euclidean, Manhattan, and Pearson, Spearman, or Kendall correlation distances
  • Euclidean Distance (n-dimensions):

    \(d_{euc}(x,y)=\sqrt{\sum_{i=1}^n (x_i-y_i)^2}\tag{1}\)
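As a small illustration of Equation 1 (a sketch; the first two rows of the standardized data serve only as an example), the distance between two institutions can be computed by hand or with base R's dist():

x <- scaled[1, ]
y <- scaled[2, ]
sqrt(sum((x - y)^2))  # Equation 1 written out
dist(rbind(x, y))     # same value from base R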

Step 2: Calculate distance

Euclidean distance 1-dimension

\(d_{euc}(x)=|x_1-x_2|\tag{1a}\)

Step 2: Calculate distance

Euclidean distance 2-dimensions (n=2)

\(d_{euc}(x,y)=\sqrt{(x_2-x_1)^2+(y_2-y_1)^2}\tag{1b}\)

Step 2: Calculate distance

Euclidean distance 3-dimensions (n=3 )

\(d_{euc}(x,y,z)=\sqrt{(x_2-x_1)^2+(y_2-y_1)^2+(z_2-z_1)^2}\tag{1c}\)

Step 2: Calculate distance

Euclidean distance n-dimensions

\(d_{euc}(x,y)=\sqrt{\sum_{i=1}^n (x_i-y_i)^2}\tag{1}\)

Step 2: Calculate distance

Euclidean distance matrix

library(factoextra)
distance <- get_dist(sample25) # computes distance matrix
fviz_dist(distance, gradient=list(low="#00AFBB", mid="white", high="#FC4E07"))
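Here sample25 is presumably a 25-institution subset of the standardized data, kept small so the heatmap stays readable; a sketch of how such a subset could be drawn (get_dist() defaults to Euclidean distance):

set.seed(123)
sample25 <- scaled[sample(nrow(scaled), 25), ]  # illustrative random 25-institution subset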

Step 2: Calculate distance

Euclidean distance matrix

Step 3: Cluster Institutions

K-Means Clustering

  • Partition data set into k-clusters
  • High intra-cluster similarity and low inter-cluster similarity
  • Each cluster has a centroid (the mean of the points assigned to it)
  • Hartigan-Wong algorithm (1979):
    \[W(C_k)=\sum_{x_i \in C_k}(x_i- \mu_k)^2\tag{2}\] where:
    • \(x_i\) is an institution belonging to Cluster \(C_k\)
    • \(\mu_k\) is the mean value of the points in cluster \(C_k\)

Step 3: Cluster Institutions

K-Means Clustering: Total within-cluster sum of squares

\[tot.withinss=\sum^k_{k=1}{W(C_k)}=\sum^k_{k=1}\sum_{x_i \in C_k}(x_i- \mu_k)^2\tag{3}\]
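Both quantities are returned directly by kmeans() in R; a sketch, using the k2 object fitted on a later slide:

k2$withinss      # W(C_k) for each cluster (Eq. 2)
k2$tot.withinss  # total within-cluster sum of squares (Eq. 3)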

Step 3: Cluster Institutions

K-Means Clustering

  • Specify number of clusters k
  • Randomly select k institutions from data to initialize centers
  • Assign each institution to its closest centroid
  • Update each cluster centroid by recalculating the mean of the data assigned to it
  • Iterate the assignment and update steps until cluster assignments stop changing (i.e., the total within-cluster sum of squares, Eq. 3, is minimized); a sketch of this loop follows
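The loop below is a minimal sketch of those assignment and update steps (illustrative only; the analysis itself relies on kmeans(), which implements the Hartigan-Wong algorithm and handles edge cases such as empty clusters):

set.seed(123)
k <- 3
centers <- scaled[sample(nrow(scaled), k), ]  # random institutions as starting centroids

for (iter in 1:25) {
  # assignment step: label each institution with its nearest centroid
  d <- as.matrix(dist(rbind(centers, scaled)))[1:k, -(1:k)]
  labels <- apply(d, 2, which.min)
  # update step: move each centroid to the mean of its assigned institutions
  new_centers <- t(sapply(1:k, function(j) colMeans(scaled[labels == j, , drop = FALSE])))
  if (isTRUE(all.equal(unname(new_centers), unname(centers)))) break  # centroids stopped moving
  centers <- new_centers
}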

Step 3: Cluster Institutions

K-Means Clustering: k=3

[Figure slides: k-means iterations illustrated for k=3; solution reached after 4 iterations]

K-means clustering

k2 <- kmeans(scaled, centers=2, nstart = 25) # performs clustering on matrix
fviz_cluster(k2, geom="point", data=scaled)
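The same call works for any number of clusters; a minimal sketch for the five-cluster solution selected later:

set.seed(123)
k5 <- kmeans(scaled, centers=5, nstart = 25)  # five-cluster solution
table(k5$cluster)                             # institutions per cluster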

K-means clustering

[Figure slides: fviz_cluster output]

Selecting optimal clusters

1. Elbow Method

  • Define clusters such that total intra-cluster variation is minimized: \[minimize\Bigg(\sum^k_{k=1}{W(C_k)}\Bigg)\tag{3a}\]
  • Total within-cluster sum of squares (wss) measures compactness of clustering

Selecting optimal clusters

1. Elbow Method

  • Steps:
    • Run clustering algorithm for different values of k
    • For each k, calculate total within-cluster sum of squares (wss)
    • Plot curve for each value of k
    • Bend of plot indicates optimal number of k
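Those steps can also be run by hand (a sketch, equivalent to the fviz_nbclust() call on the next slide):

set.seed(123)
wss <- sapply(1:10, function(k) kmeans(scaled, centers=k, nstart = 25)$tot.withinss)
plot(1:10, wss, type="b",
     xlab="Number of clusters k",
     ylab="Total within-cluster sum of squares (wss)")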

Selecting optimal clusters

1. Elbow Method

set.seed(123)
fviz_nbclust(scaled, kmeans, method="wss")

Selecting optimal clusters

2. Average Silhouette Method

  • Measures the quality of a clustering
  • How close each point is to points in the neighboring clusters, and how well separated the clusters are
  • Ranges from -1 to +1, where:
    • +1 = far from the neighboring clusters
    • 0 = on the boundary between clusters
    • -1 = likely assigned to the wrong cluster
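Average silhouette widths can also be computed directly with the cluster package (a sketch, reusing the k5 object from the earlier sketch):

library(cluster)  # provides silhouette()
sil <- silhouette(k5$cluster, dist(scaled))
mean(sil[, "sil_width"])  # average silhouette width for k=5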

Selecting optimal clusters

2. Average Silhouette Method

fviz_nbclust(scaled, kmeans, method="silhouette")







Selected K=5 Cluster

  • Mixed support from Elbow Method (k=5 or 6)
  • Second best option by Silhouette Method (k=2 is best)
  • k=5 shows clear separation of UH cluster (2) from neighbors
  • WSS for the UH cluster is smallest under the k=5 solution:
    • k=2 (1,349.7)
    • k=3 (565.3)
    • k=4 (468.7)
    • k=5 (276.1)
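Those figures come straight out of the fitted objects; a sketch, assuming (hypothetically) that institution names were kept as row names of scaled:

uh_cluster <- k5$cluster["University of Houston"]  # hypothetical row name
k5$withinss[uh_cluster]                            # WSS for the UH cluster under k=5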

Cluster 2 Summary


Aspirational Peers

Similar yet significantly different in several key performance indicators

  • Similar demographic and input characteristics
    • Percent admitted
    • Admissions yield
    • SAT Reading and Math 25th percentile scores
    • Undergraduate enrollment
    • Race/ethnicity
    • Percent women
    • Percent awarded Pell
    • FTE 2017-18
    • All academic rank faculty
    • In-state and out-of-state tuition

  • Exceptional output
    • Six-year graduation rate
    • Endowment
    • % Expenses as Research
    • Graduate enrollment
  • Which cluster institutions out-perform UH by at least one standard deviation (+1 s.d.)? A sketch of this filter follows
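A sketch of that filter, reusing the hypothetical k5 object and row names from the earlier sketches; because the data are standardized, out-performing UH by one standard deviation simply means a scaled value more than 1 unit above UH's:

cluster2 <- scaled[k5$cluster == uh_cluster, ]  # institutions in UH's cluster
uh <- cluster2["University of Houston", ]       # hypothetical row name
outputs <- c("six_year_grad", "endowment", "expense_research", "enrollment_grad")
exceeds <- sweep(cluster2[, outputs], 2, uh[outputs] + 1, FUN = ">")  # more than 1 s.d. above UH?
rownames(cluster2)[rowSums(exceeds) > 0]        # peers exceeding UH on at least one output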

Aspirational Peers

University of Houston Aspirational Peers

Conclusion

  • Why benchmarking is important
  • Identified and transformed data set for analysis
  • Unbiased, replicable way of identifying institutional peers using K-means clustering
  • Comparable Peers
    • Florida International University
    • Georgia State University
    • University of Central Florida
    • University of Illinois at Chicago
    • University of Nevada - Las Vegas
    • University of North Texas
    • University of South Florida
    • University of Texas at Arlington
    • University of Texas at El Paso

  • Aspirational Peers
    • University of California - Davis
    • University of California - Irvine
    • University of California - Riverside
    • University of California - San Diego
    • University of California - Santa Barbara
    • University of Texas at Dallas
    • Stony Brook University

Website: jxmartinez.com

Resources

Alstete, Jeffrey W. 1995. “Benchmarking in Higher Education: Adapting Best Practices to Improve Quality.” ASHE-ERIC Higher Education Report No. 5. Washington, D.C.: The George Washington University Graduate School of Education and Human Development.

Luna, Andrew. 2018. “Selecting Peer Institutions Using Cluster Analysis.” Austin Peay State University.

Bender, Barbara E. 2002. “Chapter 8: Benchmarking as an Administrative Tool for Institutional Leaders.” New Directions for Higher Education 188: 113-120.

Betsinger, Alicia et al. 2013. “Peer Selection: Methodology and Models.” Texas Association for Institutional Research 35th Annual Conference. February, 2013.

Boehmke, Bradley. 2017. “UC Business Analytics R Programming Guide: k-Means Cluster Analysis.” University of Cincinnati.

Lang, Daniel W. and Qiang Zha. 2004. “Comparing Universities: A Case Study between Canada and China.” Higher Education Policy 17(4).

Shueler, Brian. 2016. “University of Wyoming Peer Institutions.” University of Wyoming.

Witting, Antje. 2017. “Insights from ‘Policy Learning’ on How to Enhance the Use of Evidence by Policymakers.” Palgrave Communications 3(49): 1-9.