March 2, 2020


Introduction

  • Strategic plan for the UH Centennial 2027: UH100 Dare to Dream
  • How are we performing and how can we improve?
“Our University is too important on so many levels to be left without a strong vision for the next eight years. We are light years ahead of where we were eight years ago, and in another eight years, we will be equally advanced from where we are today.”
- Provost Paula Myrick Short

Why is benchmarking important?

  • Benchmarking is the process by which policymakers compare the performance, practices, and policies of institutions or groups of institutions to gain insight (Betsinger et al., 2013)
“An institution that does not routinely evaluate all aspects of the organization and make the changes necessary to address its shortcomings, from the curriculum to the physical plant, is jeopardizing its future.”

Barbara Bender
Rutgers University

Why is benchmarking important?

  • Answers questions like “What makes an institution highly ranked?”
“Through analyzing the best practices of peer institutions, then adapting and developing programs for their own campuses, higher education leaders can improve the quality of programs and services that they provide,” (Benchmarking 2002: P 119).

Why is benchmarking important?

  • A tool to overcome resistance to change
“Benchmarking can help overcome resistance to change that can be very strong in conservative organizations, such as colleges and universities, that have had little operational change in many years,” (Alstete 1995: P 25).
“Using the appropriate institutions and programs as models, especially schools with higher reputational ratings, can generate an enthusiasm for change that can transform an institution,” (Benchmarking 2002: P 116).

Benchmarking as an evidence-based tool

  1. to encourage policymakers to learn to think like scientists, and
  2. to learn how to solve problems primarily with reference to the evidence generated by professional, scientific, and technical methods of inquiry (Witting 2017: P 2)

How to identify peers

  • Texas accountability peers (Emerging research)
  • Nationally recognized peers (USNWR)
  • Public/Private
  • Financial peers (faculty salary, tuition)

Types of peers

  • Comparable: Similar institutional level (two- or four-year), control (public, private), and enrollment characteristics (size, race)
  • Aspirational: Similar yet significantly different in several key performance indicators (graduation rate, research expenditures)
  • Competitors: Similar yet more students choose to attend a competing institution
  • Consortium: Belonging to a specific organization (AAU, Power 5, College Completion Summit)

Data

  • 2018 Integrated Postsecondary Education Data System (IPEDS)
  • Filter variables
    • Sector: Public, 4-year or above
    • Carnegie Classification 2018 Basic: Doctoral Universities - Very High Research Activity
    • Size Category: 20,000 and above
  • Final sample: 85 institutions (each row = one observation, each column = one variable)
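
The three filters above could be applied as in the sketch below. The data frame `ipeds` and its column names (`sector`, `carnegie`, `size`) are hypothetical stand-ins, not the actual IPEDS variable names, and the four rows are made up for illustration.

```r
# Toy stand-in for an IPEDS extract; column names and rows are hypothetical.
ipeds <- data.frame(
  name     = c("A", "B", "C", "D"),
  sector   = c("Public, 4-year or above",
               "Private not-for-profit, 4-year or above",
               "Public, 4-year or above",
               "Public, 4-year or above"),
  carnegie = c("Doctoral Universities - Very High Research Activity",
               "Doctoral Universities - Very High Research Activity",
               "Doctoral Universities - High Research Activity",
               "Doctoral Universities - Very High Research Activity"),
  size     = c("20,000 and above", "20,000 and above",
               "20,000 and above", "10,000 - 19,999"),
  stringsAsFactors = FALSE
)

# Keep only rows that pass all three filters.
peers <- subset(
  ipeds,
  sector == "Public, 4-year or above" &
    carnegie == "Doctoral Universities - Very High Research Activity" &
    size == "20,000 and above"
)
print(peers$name)
```

In this toy extract only institution A passes every filter.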

Data

Cluster Variables

  • Percent admitted
  • Admissions yield
  • SAT/ACT 25th & 75th percentiles
  • Undergraduate & graduate enrollment
  • Percent Race/Ethnicity
  • Percent Women
  • FTE for last academic year, 2017-18
  • Percent full-time, first-time awarded Pell
  • Student-to-Faculty Ratio
  • All academic rank faculty
  • Six-year graduation rate
  • Total degrees awarded
  • Core revenues and expenses
  • Instruction, research, and student service expenses as percent of core
  • Endowment assets
  • In-state, out-of-state tuition
  • Total price of attendance in-state on campus

Cluster analysis

  • Data Preparation
  • Calculate difference/distance between institutions by their characteristics
  • Cluster so that difference is minimized within clusters and maximized between clusters
  • No response variable = unsupervised method
  • Tool: R

Step 1: Prepare Data

1A - Remove or estimate missing data

  • Handle missing data by removing or estimating them
  • 9 of the 85 institutions contain at least one missing value (11%)
  • Of the 2,890 data points, 50 data points were imputed (1.7%)
  • Estimate: knnImputation() from the DMwR package finds the k closest observations by Euclidean distance and fills each missing value with their weighted average
  • Or remove incomplete rows: peer_data <- na.omit(peer_data)
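
A minimal base-R sketch of how the missing-data counts on this slide could be produced, and of the two options (estimate or remove). The data frame `toy` and its values are made up; column-mean imputation is used here only as a simple stand-in and is not the kNN method named above.

```r
# Toy data with a few missing values (values are made up for illustration).
toy <- data.frame(
  pct_admitted  = c(60, NA, 75, 81, 45),
  six_year_grad = c(70, 82, NA, 65, 90),
  endowment     = c(1.2, 3.4, 0.8, NA, 5.1)
)

n_missing_cells <- sum(is.na(toy))            # missing data points
n_incomplete    <- sum(!complete.cases(toy))  # rows with at least one NA
pct_cells       <- 100 * n_missing_cells / (nrow(toy) * ncol(toy))

# Option 1, estimate: replace each NA with its column mean
# (a simple stand-in, not kNN imputation).
imputed <- toy
for (col in names(imputed)) {
  imputed[[col]][is.na(imputed[[col]])] <- mean(imputed[[col]], na.rm = TRUE)
}

# Option 2, remove: drop every row that has a missing value.
dropped <- na.omit(toy)
```

Dropping rows shrinks the sample (here from 5 rows to 2), which is one reason to prefer estimation when only a small share of data points is missing.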

Step 1: Prepare Data

1B - Standardize data

  • Remove influence of different scales: $, %, N
  • Transform data so that each variable has \(\mu\)=0 and \(\sigma\)=1
  • Use scaled <- scale(peer_data)
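
A small check of what scale() does, using two columns with the values shown for the first four institutions in the head(peer_data) output. The object names `toy`, `scaled_toy`, and `manual` are illustrative.

```r
# Two variables on very different scales (dollars vs. percent).
toy <- data.frame(
  tuition_in_state = c(8568, 10780, 9624, 10104),
  pct_black        = c(22, 11, 6, 3)
)

scaled_toy <- scale(toy)  # subtract each column mean, divide by each column sd

# The same transformation written out for one column.
manual <- (toy$tuition_in_state - mean(toy$tuition_in_state)) /
  sd(toy$tuition_in_state)

print(round(colMeans(scaled_toy), 10))  # column means after scaling
print(apply(scaled_toy, 2, sd))         # column standard deviations after scaling
```

Because these four rows are standardized on their own, the z-scores differ from those in the full 85-institution output.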

Step 1: Prepare Data

1B - Standardize data

head(peer_data)
## # A tibble: 6 x 45
##   tuition_in_state tuition_out_sta… pct_native pct_black pct_hispanic pct_white
##              <dbl>            <dbl>      <dbl>     <dbl>        <dbl>     <dbl>
## 1             8568            19704          0        22            3        59
## 2            10780            29230          0        11            5        76
## 3             9624            28872          0         6            3        76
## 4            10104            27618          1         3           19        48
## 5            10467            31688          1         4           24        49
## 6             7384            23422          1         4            8        74
## # … with 39 more variables: pct_two_more <dbl>, pct_nonresident <dbl>,
## #   pct_asian_pi <dbl>, pct_women <dbl>, stu_fac_ratio <dbl>,
## #   pct_admitted <dbl>, yield <dbl>, price_instate_oncampus <dbl>,
## #   enrollment_undg <dbl>, enrollment_grad <dbl>, degree_ba <dbl>,
## #   degree_ma <dbl>, degree_phd_research <dbl>, degree_phd_prof <dbl>,
## #   degree_phd_other <dbl>, six_year_grad <dbl>, pct_pell <dbl>,
## #   revenue_core <dbl>, revenue_pct_tuition <dbl>, expense_core <dbl>,
## #   expense_instructional <dbl>, expense_research <dbl>,
## #   expense_student_service <dbl>, endowment <dbl>, sat_read25 <dbl>,
## #   sat_read75 <dbl>, sat_math25 <dbl>, sat_math75 <dbl>, act_comp25 <dbl>,
## #   act_comp75 <dbl>, fte1718 <dbl>, faculty_total <dbl>,
## #   degree_phd_total <dbl>, degree_total <dbl>, cluster2 <dbl>, cluster5 <dbl>,
## #   us_rank2020 <dbl>, aau_member <dbl>, aau_year <dbl>

Step 1: Prepare Data

1B - Standardize data

library(tibble) # provides as_tibble()
scaled <- scale(peer_data) # center each column to mean 0, scale to sd 1
head(as_tibble(scaled))
## # A tibble: 6 x 45
##   tuition_in_state tuition_out_sta… pct_native pct_black pct_hispanic pct_white
##              <dbl>            <dbl>      <dbl>     <dbl>        <dbl>     <dbl>
## 1          -0.398           -1.20       -0.280     2.85        -0.809     0.291
## 2           0.339            0.0767     -0.280     0.803       -0.651     1.26 
## 3          -0.0461           0.0288     -0.280    -0.127       -0.809     1.26 
## 4           0.114           -0.139       0.910    -0.685        0.459    -0.334
## 5           0.235            0.406       0.910    -0.499        0.855    -0.277
## 6          -0.793           -0.701       0.910    -0.499       -0.413     1.14 
## # … with 39 more variables: pct_two_more <dbl>, pct_nonresident <dbl>,
## #   pct_asian_pi <dbl>, pct_women <dbl>, stu_fac_ratio <dbl>,
## #   pct_admitted <dbl>, yield <dbl>, price_instate_oncampus <dbl>,
## #   enrollment_undg <dbl>, enrollment_grad <dbl>, degree_ba <dbl>,
## #   degree_ma <dbl>, degree_phd_research <dbl>, degree_phd_prof <dbl>,
## #   degree_phd_other <dbl>, six_year_grad <dbl>, pct_pell <dbl>,
## #   revenue_core <dbl>, revenue_pct_tuition <dbl>, expense_core <dbl>,
## #   expense_instructional <dbl>, expense_research <dbl>,
## #   expense_student_service <dbl>, endowment <dbl>, sat_read25 <dbl>,
## #   sat_read75 <dbl>, sat_math25 <dbl>, sat_math75 <dbl>, act_comp25 <dbl>,
## #   act_comp75 <dbl>, fte1718 <dbl>, faculty_total <dbl>,
## #   degree_phd_total <dbl>, degree_total <dbl>, cluster2 <dbl>, cluster5 <dbl>,
## #   us_rank2020 <dbl>, aau_member <dbl>, aau_year <dbl>

Step 2: Calculate distance

Select a distance measure

  • Many measures: Euclidean and Manhattan distances, or Pearson, Spearman, and Kendall correlation distances
  • Euclidean Distance (n-dimensions):

    \(d_{euc}(x,y)=\sqrt{\sum_{i=1}^n (x_i-y_i)^2}\tag{1}\)

Step 2: Calculate distance

Euclidean distance 1-dimension

\(d_{euc}(x)=\sqrt{(x_2-x_1)^2}=|x_2-x_1|\tag{1a}\)

Step 2: Calculate distance

Euclidean distance 2-dimensions (n=2)

\(d_{euc}(x,y)=\sqrt{(x_2-x_1)^2+(y_2-y_1)^2}\tag{1b}\)

Step 2: Calculate distance

Euclidean distance 3-dimensions (n=3)

\(d_{euc}(x,y,z)=\sqrt{(x_2-x_1)^2+(y_2-y_1)^2+(z_2-z_1)^2}\tag{1c}\)

Step 2: Calculate distance

Euclidean distance n-dimensions

\(d_{euc}(x,y)=\sqrt{\sum_{i=1}^n (x_i-y_i)^2}\tag{1}\)
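
Eq. 1 can be written directly in R and checked against base R's dist(). The helper `euclid` and the points `a` and `b` are illustrative, not part of the original analysis.

```r
# Eq. 1: square root of the summed squared coordinate differences.
euclid <- function(x, y) sqrt(sum((x - y)^2))

a <- c(0, 0, 0)
b <- c(1, 2, 2)

d_manual <- euclid(a, b)  # sqrt(1 + 4 + 4) = 3
d_base   <- as.numeric(dist(rbind(a, b), method = "euclidean"))

print(c(manual = d_manual, base = d_base))
```

The same function works for any number of dimensions, as long as both vectors have the same length.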

Step 2: Calculate distance

Euclidean distance matrix

library(factoextra)
distance <- get_dist(sample25) # computes distance matrix
fviz_dist(distance, gradient=list(low="#00AFBB", mid="white", high="#FC4E07"))
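
The object `sample25` is not defined on these slides, so the sketch below builds a full pairwise distance matrix on a made-up three-institution matrix using base R's dist() instead. The names `toy` and `d` are illustrative.

```r
# Three toy institutions described by two variables (values are made up).
toy <- rbind(
  inst_a = c(0, 0),
  inst_b = c(3, 4),
  inst_c = c(6, 8)
)

# Pairwise Euclidean distances, expanded to a square matrix.
d <- as.matrix(dist(toy, method = "euclidean"))
print(d)
```

The matrix is symmetric with zeros on the diagonal; small off-diagonal values mark institutions that are similar to each other.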

Step 2: Calculate distance

Euclidean distance matrix

Step 3: Cluster Institutions

K-Means Clustering

  • Partition data set into k-clusters
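  • High intra-cluster similarity and low inter-cluster similarity
  • Each cluster is represented by its centroid (the mean of the points assigned to the cluster)
  • Hartigan-Wong algorithm (1979) defines within-cluster variation as the sum of squared distances to the centroid: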
    \[W(C_k)=\sum_{x_i \in C_k}(x_i- \mu_k)^2\tag{2}\] where:
    • \(x_i\) is an institution belonging to Cluster \(C_k\)
    • \(\mu_k\) is the mean value of the points in cluster \(C_k\)

Step 3: Cluster Institutions

K-Means Clustering: Total within-cluster sum of squares

\[tot.withinss=\sum^k_{k=1}{W(C_k)}=\sum^k_{k=1}\sum_{x_i \in C_k}(x_i- \mu_k)^2\tag{3}\]
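
A short check of Eq. 2 and Eq. 3 against the output of base R's kmeans(), which reports these quantities as `withinss` and `tot.withinss`. The simulated matrix `x` and the names `W` and `tot` are illustrative.

```r
set.seed(1)
# Two simulated groups of 10 points in 2 dimensions (made-up data).
x <- rbind(
  matrix(rnorm(20, mean = 0), ncol = 2),
  matrix(rnorm(20, mean = 5), ncol = 2)
)

fit <- kmeans(x, centers = 2, nstart = 10)

# Eq. 2: for each cluster, sum the squared deviations from its centroid.
W <- sapply(1:2, function(k) {
  pts <- x[fit$cluster == k, , drop = FALSE]
  mu  <- colMeans(pts)
  sum(sweep(pts, 2, mu)^2)
})

# Eq. 3: add the within-cluster sums across clusters.
tot <- sum(W)

print(rbind(by_hand = W, kmeans = fit$withinss))
print(c(by_hand = tot, kmeans = fit$tot.withinss))
```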

Step 3: Cluster Institutions

K-Means Clustering

  • Specify number of clusters k
  • Randomly select k institutions from data to initialize centers
  • Assign each institution to its closest centroid
  • Update cluster centroid by calculating new mean values of the data in cluster
  • Iterate assignment and updates until clusters stop changing (that is, until the total within-cluster sum of squares, Eq. 3, stops decreasing)
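
The steps above can be sketched as a plain assign-and-update loop (often called Lloyd's algorithm). This is a teaching sketch only: the function `simple_kmeans` is hypothetical, and it is not the Hartigan-Wong procedure that R's kmeans() uses by default.

```r
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Pick k distinct rows at random as the starting centers.
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  cluster <- rep(0L, nrow(x))
  for (iter in seq_len(max_iter)) {
    # Squared Euclidean distance from every row to every center (n x k matrix).
    d2 <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    # Assign each row to its closest center.
    new_cluster <- max.col(-d2, ties.method = "first")
    # Stop once assignments no longer change.
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
    # Move each center to the mean of its current members.
    for (j in seq_len(k)) {
      members <- x[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- colMeans(members)
    }
  }
  list(cluster = cluster, centers = centers)
}

set.seed(42)
# Two well-separated simulated groups of 10 points each (made-up data).
x <- rbind(
  matrix(rnorm(20, mean = 0,  sd = 0.5), ncol = 2),
  matrix(rnorm(20, mean = 10, sd = 0.5), ncol = 2)
)
fit <- simple_kmeans(x, k = 2)
print(table(fit$cluster))
```

Results depend on the random starting centers, which is why kmeans() is usually run with several starts (its nstart argument) and the solution with the smallest total within-cluster sum of squares is kept.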

Step 3: Cluster Institutions

K-Means Clustering
