Web Scraping the NFL Draft

Posted on February 21, 2017 by Trenton Jerde in R bloggers | 0 Comments

[This article was first published on R – NYC Data Science Academy Blog, and kindly contributed to R-bloggers].

Introduction

The National Football League (NFL) is big business. How big? The average value of each of the 32 teams is $3.2 billion.

Clearly, Americans love pro football. Indeed, the author of this blog is a football fanatic. Don’t believe me? See below.

Figure 1. The author preparing to attend a Minnesota Vikings game years ago. The cane was for style.

NFL teams are based in cities and regions across the country. Examples include the Miami Dolphins, Carolina Panthers, and Pittsburgh Steelers.

NFL players come in different sizes and mentalities. Let’s take a look at the different positions, which will set up our discussion of the draft below.

Anatomy of an NFL Team

Each NFL team has 53 players, most of whom play offensive or defensive positions. The main positions are given below.

Offense: Quarterback (QB), running back (FB, TB), tight end (TE), wide receiver (WR), guard (OG), tackle (OT), center (C).
- Goal: move the ball and score points against the other team’s defense.

Defense: End (DE), tackle (DL), linebacker (LB), cornerback (CB), safety (S).
- Goal: stop the other team’s offense from moving the ball and scoring points.

The figure below shows how the offensive and defensive players line up against each other.

Figure 2: The basic football positions on offense (red) and defense (blue).

The players at each position have specific physical and mental attributes. For example, some players are huge:

Offense: Linemen (OT, OG, C) block the defensive players and protect the QB.
Defense: Linemen (DE, DL) crash into the offensive players and smash the QB and running backs.

Figure 3. Green Bay Packers offensive lineman Daryn Colledge (#73) is a big dude.

Some players are fast:

Offense: Wide receivers (WR) sprint down the field and catch the ball thrown by the QB.
Defense: Cornerbacks (CB) guard the wide receivers and try to prevent them from catching the ball.

Figure 4. Arizona Cardinals wide receiver Larry Fitzgerald (#11) is tall, fast, and has tremendous eye-hand coordination.

Some players are strong, athletic, and fast:

Offense: Running backs (TB, FB) take the ball from the QB and run with it.
Defense: Linebackers (LB) attack the QB or drop back to defend passes, depending on the given play.

Figure 5. Denver Broncos linebacker Von Miller (#58) is a defensive stud who is strong and fast.

Some players are smart and poised:

Offense: Quarterback (QB) and Center (C) call out signals in real time based on the defensive team’s alignment on a given play.
Defense: Middle linebacker (LB) and Safety (S) change the formations of the defense according to how the offensive team lines up on a given play.

Figure 6. New England Patriots QB Tom Brady (#12) is unflappable under pressure and can read defensive alignments and make adjustments on the fly.

Where Do the Players Come From?

The players are drafted from college teams, which are organized into conferences that are largely regional.

For example, the Big Ten conference consists mainly of teams from the Midwest, with a couple of east coast teams thrown in.

University of Iowa, University of Minnesota, Ohio State University, University of Michigan, Michigan State University, University of Wisconsin, Northwestern University, Purdue University, Indiana University, University of Illinois, University of Nebraska, Penn State University, Rutgers University, University of Maryland

Other prominent conferences include the Southeastern Conference (SEC), Atlantic Coast Conference (ACC), Big 12, Pac 10, Pac 12, Big East, and so on.

The NFL Draft

Every spring, the NFL holds its draft. Teams select college players to supplement their rosters for the upcoming season.

There are 7 rounds, with 32 picks per round, usually one pick per team.
The order of selection is based on a team’s record of wins and losses from the previous season. The worst team selects first in each round, the best team selects last in each round, and the other teams select according to their rankings.
Generally, each team drafts one player per round. Extra compensatory picks and trades bring the typical haul to 8-10 college players, who join the NFL teams that draft them.
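
The round-and-pick arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not part of the original analysis. It assumes exactly 32 picks in every round and ignores trades, compensatory picks, and the NFL's real tiebreakers. The function names and the team labels in the usage note are made up for the example.

```python
ROUNDS = 7
PICKS_PER_ROUND = 32  # assumption: one pick per team, no compensatory picks


def total_picks(rounds=ROUNDS, picks_per_round=PICKS_PER_ROUND):
    """Number of selections in a draft with a fixed number of picks per round."""
    return rounds * picks_per_round


def round_of_pick(overall_pick, picks_per_round=PICKS_PER_ROUND):
    """Map a 1-based overall pick number to its 1-based round number."""
    if overall_pick < 1:
        raise ValueError("pick numbers start at 1")
    return (overall_pick - 1) // picks_per_round + 1


def draft_order(records):
    """Order teams from worst to best winning percentage.

    records maps a team label to a (wins, losses) tuple. Teams with equal
    percentages keep their input order; real tiebreakers are not modeled.
    """
    def win_pct(item):
        wins, losses = item[1]
        games = wins + losses
        return wins / games if games else 0.0

    return [team for team, _ in sorted(records.items(), key=win_pct)]
```

For instance, `draft_order({"Team A": (12, 4), "Team B": (3, 13)})` puts the hypothetical "Team B" first, and `round_of_pick(33)` places the 33rd overall pick in round 2 under the 32-picks-per-round assumption.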

Figure 7. The NFL draft is a spectacle of sports beauty.

The Draft Decision Making Process

NFL teams invest a lot of money in trying to select players who will make their teams better.

NFL teams have personnel called scouts who assess college players.
Additionally, prior to the draft, college players perform physical and mental drills at the yearly NFL combine.
The overall player selection process is therefore highly multivariate in nature, taking many variables into account:
- Physical attributes: size, strength, speed, eye-hand coordination, quickness.
- Mental attributes: football intelligence, ability to memorize a playbook, ability to be coached, attitude, personality.

Even given these measurables, it is incredibly difficult to forecast how well a college player will perform in the NFL.

Figure 8. The typical NFL team has a “war room” during the draft. Scouts, coaches, the general manager and others discuss the players selected by other teams and prepare to make their picks. Note the big board in the background, which contains the team’s ranking of college players.

Enter Data Science

What can data science tell us about the NFL draft?

It can tell us a lot, but I want to start with one fundamental question: What positions and conferences are selected preferentially within the 7 rounds?

That is, from the first player selected in the first round until the last pick (around pick #224) in the seventh round, are there patterns in the selections of players at different positions and from different conferences?
Intuitively, an informed football fan holds certain opinions, for example, that:
- The Big Ten Conference produces a bountiful supply of excellent offensive linemen.
- The Southeastern Conference produces outstanding players at the speed positions, such as wide receiver and cornerback.

Can data science help us to evaluate such intuitions? And can data science show us new patterns in the selection of players in the NFL draft?

Data Science Methods

Using Scrapy, a web scraping package in the Python language, I acquired data from the past ten years of NFL drafts from Wikipedia (e.g., https://en.wikipedia.org/wiki/2016_NFL_Draft ). This data allowed me to construct a spreadsheet that included the following information about every player selected in the drafts over ten years: round number, pick number, NFL team, player name, position, college team, and conference.
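
To give a feel for the extraction step, here is a minimal sketch of turning a draft table into rows with the fields listed above. It is not the original Scrapy spider. It swaps in the `html.parser` and `csv` modules from Python's standard library so that it runs without extra packages or network access. The HTML fragment, the column order, and every team, player, and college label in it are invented placeholders, so a real page would likely need different handling.

```python
import csv
import io
from html.parser import HTMLParser

FIELDS = ["round", "pick", "team", "player", "position", "college", "conference"]

# Tiny made-up fragment standing in for a downloaded draft page.
SAMPLE_HTML = """
<table>
  <tr><th>Rnd.</th><th>Pick</th><th>NFL team</th><th>Player</th>
      <th>Pos.</th><th>College</th><th>Conf.</th></tr>
  <tr><td>1</td><td>1</td><td>Team A</td><td><a href="#">Player One</a></td>
      <td>QB</td><td>College X</td><td>Pac-12</td></tr>
  <tr><td>1</td><td>2</td><td>Team B</td><td>Player Two</td>
      <td>OT</td><td>College Y</td><td>Big Ten</td></tr>
</table>
"""


class DraftTableParser(HTMLParser):
    """Collect the text of every <td> cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # The header row has only <th> cells, so it is skipped here.
            if len(self._row) == len(FIELDS):
                self.rows.append(self._row)
            self._row = None


def parse_draft_table(html):
    """Return one dict per drafted player, with numeric round and pick."""
    parser = DraftTableParser()
    parser.feed(html)
    players = []
    for cells in parser.rows:
        record = dict(zip(FIELDS, cells))
        record["round"] = int(record["round"])
        record["pick"] = int(record["pick"])
        players.append(record)
    return players


def to_csv(players):
    """Serialize the records as CSV text, ready to save as a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(players)
    return buffer.getvalue()


players = parse_draft_table(SAMPLE_HTML)
print(to_csv(players))
```

Keeping the parsing separate from the CSV export means the same records can feed a spreadsheet or go straight into an analysis step.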

Given this detailed spreadsheet, I used the R programming language for analysis and visualization of the data. The figure below illustrates the draft selection process by position and conference.

Figure 9. Violin plot, color coded by conference and arranged by position. The circles represent individual observations, i.e., individual players selected at that pick across the past decade. The top of the Y axis (0) reflects the first player selected, and proceeds downwards to the last player selected in the 7th round. Therefore, players selected in the topmost parts of each plot are the highest picks and considered the most valuable.
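
The grouping behind a plot like this can also be summarized numerically. The sketch below is an assumption-laden illustration in Python rather than the R code used for the figure. It groups pick numbers by position and conference, then reports the count, the earliest pick, and the median pick for each group. The rows are invented placeholders, not real draft results.

```python
from collections import defaultdict
from statistics import median

# Invented placeholder rows: (position, conference, overall pick number).
SAMPLE_PICKS = [
    ("OT", "Big Ten", 5),
    ("OT", "Big Ten", 40),
    ("OT", "SEC", 12),
    ("CB", "SEC", 8),
    ("CB", "ACC", 20),
    ("CB", "ACC", 150),
    ("QB", "Big 12", 1),
]


def summarize_picks(picks):
    """Group pick numbers by (position, conference) and summarize each group."""
    groups = defaultdict(list)
    for position, conference, pick in picks:
        groups[(position, conference)].append(pick)

    summary = {}
    for key, values in groups.items():
        summary[key] = {
            "count": len(values),
            "earliest": min(values),  # lower pick number = selected earlier
            "median": median(values),
        }
    return summary


summary = summarize_picks(SAMPLE_PICKS)
for (position, conference), stats in sorted(summary.items()):
    print(position, conference, stats)
```

A low median pick with a high count would correspond to a violin that is wide near the top of the plot.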

Observations

From this plot, we see that some intuitions are borne out. For example, the Big Ten conference has produced a relatively large number of offensive tackles who were selected in the early rounds. We also see a strong representation of the SEC at this position.

At cornerback (CB), a skill position in which speedy players excel, the ACC and SEC dominate. This finding again matches our intuition, since an informed fan believes that many excellent CBs come from colleges in these conferences, such as Alabama and Florida.

Interestingly, and less intuitively, other trends can be discerned in this plot. Consider quarterback (QB), the most valuable position on any team. Here, the conference with the most QBs taken in the early rounds is the Big 12. Also note that there is a cluster of QB picks in the middle rounds for this conference, but hardly any selections in the later rounds.

This application of data science suggests other questions. For example, suppose my favorite team, the Minnesota Vikings, is considering drafting a QB from the Pac 12 conference. I might look at the figure (Pac 12, shown in gold in Figure 9) and note that not a single QB from this conference has been selected in the first round over the past ten years.
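
A question like this one can also be asked of the scraped table directly. The following Python sketch counts first-round selections at a given position by conference. It assumes the first round is picks 1 through 32, and its rows are invented placeholders, so the counts it prints say nothing about real drafts.

```python
from collections import Counter

PICKS_PER_ROUND = 32  # assumption: round 1 is picks 1-32

# Invented placeholder rows, not real draft results.
SAMPLE_DRAFT = [
    {"player": "Player A", "position": "QB", "conference": "Big 12", "pick": 1},
    {"player": "Player B", "position": "QB", "conference": "Pac 12", "pick": 45},
    {"player": "Player C", "position": "QB", "conference": "SEC", "pick": 30},
    {"player": "Player D", "position": "WR", "conference": "Pac 12", "pick": 10},
]


def first_round_counts(draft, position, picks_per_round=PICKS_PER_ROUND):
    """Count first-round selections at one position, keyed by conference."""
    return Counter(
        row["conference"]
        for row in draft
        if row["position"] == position and row["pick"] <= picks_per_round
    )


qb_counts = first_round_counts(SAMPLE_DRAFT, "QB")
# A Counter reports 0 for a conference with no matching rows.
print(qb_counts["Pac 12"])
```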

Of course, this does not mean that the QB in question will not be good. Indeed, this figure tells us nothing about how specific college players actually performed in the NFL. Rather, it shows us trends in the data, which may help us to better understand how players have been valued and selected in the past. Of course, we can create other figures that include metrics on player performance once they play in the NFL; this is something I am working on.

Overall, a project like this one may help an NFL team build a better model of how player positions and college conferences are represented in the draft over time. As we noted above, there is big money in the NFL, and data science is a goldmine.

The post Web Scraping the NFL Draft appeared first on NYC Data Science Academy Blog.
