Using FAFSA Data to Define Competitor Density

[This article was first published on Data Twirling » R, and kindly contributed to R-bloggers.]

I have been thinking a lot about how to define and discuss competition at the undergraduate level. I will save the discussion of which dataset is better (ASQ, Student Clearinghouse, social media, etc.) for another day.

One common question I get as an analyst in Enrollment Management is how to “define” competition. While it’s never an easy question, from a marketing perspective we often have to subset competition into a few levels: core, secondary, aspirant, regional, etc. Even before this, though, I believe it is critically important to understand “Competitor Density.”

Through a statistical lens, Competitor Density is rather straightforward: it is the cumulative share of students covered by the top “N” schools. For illustration, refer to the chart below, which is filtered to domestic, admitted students over the last 3 applicant pools.
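As a rough illustration of the calculation behind such a curve, here is a minimal Python sketch. It is not the code used for this post. It assumes the FAFSA data has been reshaped into (student, school) pairs, and it reads "density" as the cumulative share of school listings held by the top N competitors, which is only one possible interpretation. The function name `competitor_density`, the toy records, and the "Host U" label are all hypothetical.

```python
from collections import Counter
from itertools import accumulate


def competitor_density(listings, host=None):
    """Return [(rank, school, cumulative_share)], schools ranked by listing count.

    `listings` is an iterable of (student_id, school) pairs (an assumed layout).
    `host` is the home institution, which is dropped before counting.
    """
    # Drop the host school and de-duplicate repeated (student, school) pairs.
    pairs = {(student, school) for student, school in listings if school != host}
    counts = Counter(school for _, school in pairs)
    # Rank by descending count; ties are broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    total = sum(counts.values())
    running = accumulate(n for _, n in ranked)
    return [
        (rank, school, cum / total)
        for rank, ((school, _), cum) in enumerate(zip(ranked, running), start=1)
    ]


# Toy data only: four students, three competitor schools plus the host.
toy_listings = [
    (1, "A"), (1, "B"),
    (2, "A"), (2, "Host U"),
    (3, "A"), (3, "C"),
    (4, "B"), (4, "A"),
]
curve = competitor_density(toy_listings, host="Host U")
for rank, school, share in curve:
    print(rank, school, round(share, 3))
```

Plotting rank against cumulative share gives the density curve. If "covered" is instead read as the share of students who listed at least one of the top N schools, the counting step would need to change accordingly.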

The plot above reveals two very interesting facts:

  1. A small set of competitors represents a large share of the “core” competition. The plot above assumes that a student was admitted at every institution they listed on the form; this simplifying assumption allows us to broadly define the consideration set for an applicant.
  2. After appending other (aggregated) information from our student information systems, we can start to answer some fairly complex questions about how students finalize the list of schools to which they eventually apply.

In a future post, I intend to highlight how analysts in higher ed can manipulate FAFSA data using association rules and network theory.


In the interim, I will leave you with some basic stats on the plot above. If you stumble across this post and you work in higher ed, feel free to comment and post comparable stats. I would love to see how these data vary across different institutions.

Please remember that the “host” institution was removed.  Only competitor schools were included in the plot.

  • 652 distinct schools were included over 3 applicant terms (fall only)
  • Top 2 schools = 10.6% of all admitted students
  • Top 10 schools = 34.3%
  • Top 25 schools = 52.1%
  • Top 50 schools = 67%
  • Top 75 schools = 76%
  • Top 100 schools = 82%
  • Top 228 schools = 95%

Stepping back, 72 schools account for 75% of the competition. That’s a pretty “easy” way to define a set of schools, considering that there are over 3,000 higher ed institutions listed in IPEDS.
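A cutoff like this can be found by walking down the ranked list until the running total reaches the target share. The sketch below is a hypothetical illustration in Python, not the code behind the figures above. It assumes you already have a count of listings per competitor school; the function name `schools_needed` and the toy counts are made up for the example.

```python
from itertools import accumulate


def schools_needed(counts_by_school, target):
    """Smallest N such that the top-N schools reach `target` share of listings.

    `counts_by_school` maps school name -> number of listings (assumed input).
    `target` is a fraction in (0, 1], e.g. 0.75 for 75%.
    """
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    ranked = sorted(counts_by_school.values(), reverse=True)
    total = sum(ranked)
    for n, running in enumerate(accumulate(ranked), start=1):
        if running >= target * total:
            return n
    return len(ranked)  # only reached when there are no schools at all


# Toy counts only; these are not the figures from this post.
toy_counts = {"A": 40, "B": 25, "C": 15, "D": 10, "E": 5, "F": 3, "G": 2}
print(schools_needed(toy_counts, 0.75))
```

Running the same function over several targets (50%, 75%, 95%) gives a quick table to compare across institutions.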


