# An Analysis of Texas High School Academic Competition Results, Part 4 – Schools

[This article was first published on **r on Tony ElHabr**, and kindly contributed to R-bloggers.]


Having investigated individuals elsewhere, let’s now take a look at the schools.

**NOTE:** *Although I began the examinations of competitions and individuals by
looking at volume of participation (to provide context), I’ll skip an
analogous discussion here because the participation of schools is shown
indirectly through those analyses.*

## School Scores

Let’s begin by looking at some of the same metrics shown for individual
students, but aggregated across all students for each school. In order
to give the reader some insight into school performance, I’ll rank and
show schools by a singular metric of performance. To be consistent, I’ll
use the same metric used for ranking the individuals: the summed percentile
rank of scores (`prnk_sum`).
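As a rough illustration of how a summed percentile rank could be computed, here is a minimal base R sketch. The toy `scores` data frame, the `comp_id` column, and the `pct_rank()` helper are assumptions made for this example, not the data or code behind the table in this post.

```r
# Hypothetical toy data: one row per school per competition, with a team score.
scores <- data.frame(
  comp_id = c(1, 1, 1, 1, 2, 2, 2, 2),
  school  = c("A", "B", "C", "D", "A", "B", "C", "D"),
  score   = c(90, 70, 80, 60, 85, 50, 65, 75)
)

# Percent rank within a group: 0 for the lowest score, 1 for the highest,
# using (rank - 1) / (n - 1) with tied scores sharing the minimum rank.
pct_rank <- function(x) {
  (rank(x, ties.method = "min") - 1) / (length(x) - 1)
}

# Rank each score against the other schools in the same competition.
scores$prnk <- ave(scores$score, scores$comp_id, FUN = pct_rank)

# Aggregate to the school level (tapply orders groups alphabetically).
by_school <- data.frame(
  school    = sort(unique(scores$school)),
  n         = as.vector(tapply(scores$prnk, scores$school, length)),
  prnk_sum  = as.vector(tapply(scores$prnk, scores$school, sum)),
  prnk_mean = as.vector(tapply(scores$prnk, scores$school, mean))
)

# Rank schools by the summed percentile rank, best first.
by_school <- by_school[order(-by_school$prnk_sum), ]
by_school$rnk <- seq_len(nrow(by_school))
print(by_school)
```

Because `prnk_sum` is a sum, it rewards both doing well and competing often, which is why the table also reports `prnk_mean` alongside it.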

**NOTE:** *For the same reason stated before for showing my own
scores among the individuals, I’ll include the numbers for my high
school (“CLEMENS”) in applicable contexts.*

| rnk | school | city | n | prnk_sum | prnk_mean | n_defeat_sum | n_defeat_mean | n_advanced_sum | n_state_sum |
|---|---|---|---|---|---|---|---|---|---|
| 1 | ARGYLE | ARGYLE | 168 | 159.01 | 0.95 | 867 | 5.16 | 109 | 53 |
| 2 | CLEMENTS | SUGAR LAND | 174 | 149.88 | 0.86 | 936 | 5.38 | 109 | 47 |
| 3 | LINDSAY | LINDSAY | 154 | 134.39 | 0.87 | 791 | 5.14 | 93 | 40 |
| 4 | KLEIN | KLEIN | 152 | 131.13 | 0.86 | 783 | 5.15 | 87 | 30 |
| 5 | DULLES | SUGAR LAND | 155 | 129.02 | 0.83 | 825 | 5.32 | 90 | 37 |
| 6 | WYLIE | ABILENE | 156 | 124.70 | 0.80 | 636 | 4.08 | 91 | 31 |
| 7 | GARDEN CITY | GARDEN CITY | 144 | 122.77 | 0.85 | 823 | 5.72 | 85 | 33 |
| 8 | HIGHLAND PARK | DALLAS | 149 | 121.71 | 0.82 | 655 | 4.40 | 85 | 25 |
| 9 | SALADO | SALADO | 127 | 103.31 | 0.81 | 605 | 4.76 | 73 | 30 |
| 10 | WESTWOOD | AUSTIN | 130 | 102.67 | 0.79 | 546 | 4.20 | 67 | 9 |
| 231 | CLEMENS | SCHERTZ | 77 | 43.35 | 0.56 | 233 | 3.03 | 17 | 0 |

**Note:** Total number of rows: 1,436.

Admittedly, there’s not a lot of insight to extract from this summary regarding individual schools. Nonetheless, it provides some useful context regarding the magnitude of performance metric values aggregated at the school level.

To better understand this list of top-performing schools, let’s break down school performance by year.

Also, let’s combine the performance metric values with coordinate data to visualize where the best schools are located.

Now, let’s visualize school dominance across years.

We saw elsewhere that there is no significant temporal trend for
competition types or competition level, but is there some kind of
temporal trend for schools? My intuition says that there should **not**
be any kind of significant relationship between year and performance.
Rather, I would guess that–going with the theory that certain schools
tend to do well all of the time–the school itself should have some
non-trivial relationship with performance. (If this is true, this would
imply that the top-performing schools have students that are better
suited for these academic competitions, perhaps due to a strong support
group of teachers, demographics, household income, or some other factor not
quantified directly here.) Also, I hypothesize that recent performance
is probably the strongest indicator of current performance, as it is in
many different contexts. I should note that I think these things may
only be shown to be true when also factoring in competition type–it
seems more likely that schools are “elite” for certain competition
types, as opposed to all competitions in aggregate.

To put these ideas together more plainly, I am curious to know if the
success of a school in any given year can be predicted as a function of
the school itself, the year, and the school’s performance in the
previous year. ^{1} As before, my preference for quantifying performance
is percent rank sum (`prnk_sum`) of team score (relative to other
schools at a given competition level). Also, I think it’s a good idea to
“re-scale” the year value to have a first value of 1 (corresponding to
the first year in the scraped data, 2004), with subsequent years taking
on subsequent integer values. (This variable is named `year_idx`.)
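The two predictors described above could be derived along the following lines. This is a minimal base R sketch on made-up data: the `yearly` data frame, the `comp_type` and `prnk_sum_prev` column names, and the `lag1()` helper are all assumptions for illustration. The lag takes the most recent earlier observation for each school and competition type, which is not necessarily the calendar year before.

```r
# Hypothetical yearly summary: one row per school, competition type, and year.
yearly <- data.frame(
  school    = c("A", "A", "A", "B", "B", "B"),
  comp_type = "Math",
  year      = c(2004, 2005, 2007, 2004, 2005, 2006),
  prnk_sum  = c(3.1, 3.4, 2.9, 1.2, 1.5, 1.1)
)

# Re-scale year so that the first year in the data maps to 1.
yearly$year_idx <- yearly$year - min(yearly$year) + 1

# Sort so that each school/competition-type series is in chronological order.
yearly <- yearly[order(yearly$school, yearly$comp_type, yearly$year), ]

# Shift a vector down by one position; the first element has no predecessor.
lag1 <- function(x) c(NA, head(x, -1))

# Lag within each school/competition-type group. School "A" skips 2006 here,
# so its 2007 row picks up the 2005 value.
yearly$prnk_sum_prev <- ave(
  yearly$prnk_sum, yearly$school, yearly$comp_type, FUN = lag1
)
print(yearly)
```

The first observation in each group has no prior value, so it is left as `NA` and would drop out of any model fit on these columns.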

So, to be explicit, a linear regression model of the following form is calculated for each unique school and competition type. (Accounting for competition type allows us to properly model the reality that a given school may excel in some competition types but not others.)

$$
\text{prnk\_sum} = \text{intercept} + \text{prnk\_sum}_{\text{year}-1} \cdot \beta_{1} + \text{year\_idx} \cdot \beta_{2}
$$

Note that, because this formula is applied to each school-competition type pair, the intercept term corresponds to the school entity itself.
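To make the per-pair fitting concrete, here is a minimal base R sketch that fits one `lm()` per school and competition type and collects the p-value of each term. The simulated `model_data`, the column name `prnk_sum_prev`, and the helpers `lag1()` and `fit_one()` are assumptions for this example; the models in this post may well have been fit with different tooling.

```r
set.seed(42)

# Hypothetical model-ready data: 3 schools x 2 competition types x 12 years,
# with random noise standing in for the real percent rank sums.
model_data <- expand.grid(
  school    = c("A", "B", "C"),
  comp_type = c("Math", "Science"),
  year_idx  = 1:12,
  stringsAsFactors = FALSE
)
model_data$prnk_sum <- 2 + rnorm(nrow(model_data))

# Add the lagged response within each school/competition-type series.
model_data <- model_data[order(model_data$school, model_data$comp_type,
                               model_data$year_idx), ]
lag1 <- function(x) c(NA, head(x, -1))
model_data$prnk_sum_prev <- ave(
  model_data$prnk_sum, model_data$school, model_data$comp_type, FUN = lag1
)

# Fit the regression for one group and return one row per model term.
# lm() drops the rows where the lagged value is NA by default.
fit_one <- function(d) {
  fit   <- lm(prnk_sum ~ prnk_sum_prev + year_idx, data = d)
  coefs <- summary(fit)$coefficients
  data.frame(
    school    = d$school[1],
    comp_type = d$comp_type[1],
    term      = rownames(coefs),
    estimate  = coefs[, "Estimate"],
    p_value   = coefs[, "Pr(>|t|)"],
    row.names = NULL
  )
}

# One model per school/competition-type pair, stacked into a single table.
groups   <- split(model_data, list(model_data$school, model_data$comp_type),
                  drop = TRUE)
p_values <- do.call(rbind, lapply(groups, fit_one))
rownames(p_values) <- NULL
print(head(p_values))
```

From a table like `p_values`, `split(p_values$p_value, p_values$term)` gives one vector of p-values per term, each of which can be passed to `hist()` to inspect its distribution.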

The distributions of p-values for each term in the model provide some insight regarding the predictive power of the variables. Visually, it does seem like two of my hypotheses are valid:

- Recent performance does seem to be predictive of school performance in a given competition type in any given year.
- Year itself is not predictive (meaning that there is no temporal trend indicating that performance improves or worsens over time).

However, my other thought that school itself has some kind of predictive
value does **not** appear to be true. ^{2}

Perhaps the deduction that, in general, individual schools do **not**
tend to dominate the rest of the competition can be comprehended in
another way. The distribution of the percentage of possible opponent
schools defeated at each competition level for each school should
reinforce this inference.

Indeed, the histograms do **not** show any noticeable skew to the right,
which supports the notion that, in general, individual schools are not
dominating specific competition types. If schools did dominate, we would
see some non-trivial right-hand skew. The District level of competition
(i.e. the lowest level of competition) comes closest to showing this
pattern, albeit not that close. This is not all that surprising: if
schools dominate at any level of competition, it is most likely to be at
the lowest level.
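One way to put a number on this visual judgment is a moment-based skewness coefficient per competition level, alongside the binned counts a histogram would show. This is a swapped-in technique, not something computed in this post. The simulated `defeated` data, the level labels other than District, and the `skewness()` helper are assumptions for illustration only.

```r
set.seed(1)

# Hypothetical data: share of possible opponents defeated, one value per
# school and competition level, drawn from a symmetric distribution.
defeated <- data.frame(
  comp_level = rep(c("District", "Region", "State"), each = 200),
  pct_defeat = rbeta(600, 2, 2)
)

# Moment-based sample skewness: positive values indicate a longer right tail.
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Skewness of the distribution at each competition level.
skew_by_level <- tapply(defeated$pct_defeat, defeated$comp_level, skewness)
print(round(skew_by_level, 3))

# Histogram counts per level in ten equal-width bins, without drawing a plot.
counts_by_level <- lapply(
  split(defeated$pct_defeat, defeated$comp_level),
  function(x) hist(x, breaks = seq(0, 1, by = 0.1), plot = FALSE)$counts
)
print(counts_by_level)
```

Values near zero correspond to the roughly symmetric histograms described above, while a clearly positive value at one level would point to a handful of schools sitting far out in the right tail.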

## Wrap-Up

Certainly, the analysis of schools in these academic UIL competitions deserves more attention than I have given it here, but I think some of the biggest questions about school performance have been answered.

1. Actually, I don’t specifically enforce the criterion that the previous year is used. Rather, I use the most recent year’s value, which may or may not be the previous year if the school did not compete in the previous year.
2. For more information regarding interpretation of p-value distributions, I recommend reading David Robinson’s very helpful blog post on the topic.
