An Analysis of Texas High School Academic Competition Results, Part 4 – Schools
Having investigated individuals elsewhere, let’s now take a look at the
schools.
NOTE:
Although I began the examinations of competitions and individuals by
looking at volume of participation (to provide context), I’ll skip an
analogous discussion here because the participation of schools is shown
indirectly through those analyses.
School Scores
Let’s begin by looking at some of the same metrics shown for individual
students, but aggregated across all students for each school. To give
the reader some insight into school performance, I’ll rank and show
schools by a single metric of performance. To be consistent, I’ll use
the same metric used for ranking the individuals: the summed percentile
rank of scores (prnk_sum).
NOTE: For the same reason stated before for showing my own
scores among the individuals, I’ll include the numbers for my high
school (“CLEMENS”) in applicable contexts.
| rnk | school | city | n | prnk_sum | prnk_mean | n_defeat_sum | n_defeat_mean | n_advanced_sum | n_state_sum |
|---|---|---|---|---|---|---|---|---|---|
| 1 | ARGYLE | ARGYLE | 168 | 159.01 | 0.95 | 867 | 5.16 | 109 | 53 |
| 2 | CLEMENTS | SUGAR LAND | 174 | 149.88 | 0.86 | 936 | 5.38 | 109 | 47 |
| 3 | LINDSAY | LINDSAY | 154 | 134.39 | 0.87 | 791 | 5.14 | 93 | 40 |
| 4 | KLEIN | KLEIN | 152 | 131.13 | 0.86 | 783 | 5.15 | 87 | 30 |
| 5 | DULLES | SUGAR LAND | 155 | 129.02 | 0.83 | 825 | 5.32 | 90 | 37 |
| 6 | WYLIE | ABILENE | 156 | 124.70 | 0.80 | 636 | 4.08 | 91 | 31 |
| 7 | GARDEN CITY | GARDEN CITY | 144 | 122.77 | 0.85 | 823 | 5.72 | 85 | 33 |
| 8 | HIGHLAND PARK | DALLAS | 149 | 121.71 | 0.82 | 655 | 4.40 | 85 | 25 |
| 9 | SALADO | SALADO | 127 | 103.31 | 0.81 | 605 | 4.76 | 73 | 30 |
| 10 | WESTWOOD | AUSTIN | 130 | 102.67 | 0.79 | 546 | 4.20 | 67 | 9 |
| 231 | CLEMENS | SCHERTZ | 77 | 43.35 | 0.56 | 233 | 3.03 | 17 | 0 |

Note: The full table has 1,436 rows.
Admittedly, there’s not a lot of insight to extract from this summary
regarding individual schools. Nonetheless, it provides some useful
context regarding the magnitude of performance metric values aggregated
at the school level.
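To make the aggregation concrete, here is a minimal Python sketch of how a school-level summary like the one above could be built. It is not the code used for this analysis: the records, the school names, and the helper names (`percent_rank`, `summarize_schools`) are hypothetical, and it assumes the percentile rank is the share of other entries with a strictly lower score, scaled to the range 0 to 1.

```python
from collections import defaultdict


def percent_rank(scores):
    """Percentile rank in [0, 1]: share of the other scores strictly below each score."""
    n = len(scores)
    if n < 2:
        return [0.0] * n
    return [sum(other < s for other in scores) / (n - 1) for s in scores]


def summarize_schools(records):
    """Aggregate per-competition percentile ranks into per-school totals."""
    # Group the (school, score) entries by competition.
    by_comp = defaultdict(list)
    for comp, school, score in records:
        by_comp[comp].append((school, score))

    # Rank within each competition, then accumulate per school.
    totals = defaultdict(lambda: {"n": 0, "prnk_sum": 0.0})
    for entries in by_comp.values():
        ranks = percent_rank([score for _, score in entries])
        for (school, _), prnk in zip(entries, ranks):
            totals[school]["n"] += 1
            totals[school]["prnk_sum"] += prnk

    for stats in totals.values():
        stats["prnk_mean"] = stats["prnk_sum"] / stats["n"]

    # Highest prnk_sum first, mirroring the ranking in the table.
    return sorted(totals.items(), key=lambda kv: kv[1]["prnk_sum"], reverse=True)


# Hypothetical records: (competition id, school, score). Not real UIL data.
records = [
    ("comp_a", "SCHOOL_X", 300),
    ("comp_a", "SCHOOL_Y", 250),
    ("comp_a", "SCHOOL_Z", 200),
    ("comp_b", "SCHOOL_X", 120),
    ("comp_b", "SCHOOL_Y", 180),
]

summary = summarize_schools(records)
for rnk, (school, stats) in enumerate(summary, start=1):
    print(rnk, school, stats["n"], round(stats["prnk_sum"], 2), round(stats["prnk_mean"], 2))
```

Because prnk_sum is a sum, it rewards both strong placements and frequent participation, which is why n and prnk_mean are worth reading alongside it.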
To better understand this list of top-performing schools, let’s break
down school performance by year.
Also, let’s combine the performance metric values with coordinate data
to visualize where the best schools are located.
Now, let’s visualize school dominance across years.
We saw elsewhere that there is no significant temporal trend for
competition types or competition level, but is there some kind of
temporal trend for schools? My intuition says that there should not
be any kind of significant relationship between year and performance.
Rather, going with the theory that certain schools tend to do well all
of the time, I would guess that the school itself should have some
nontrivial relationship with performance. (If this is true, it would
imply that the top-performing schools have students who are better
suited for these academic competitions, perhaps due to a strong support
group of teachers, demographics, household income, or some other factor
not quantified directly here.) Also, I hypothesize that recent
performance is probably the strongest indicator of current performance,
as it is in many different contexts. I should note that I think these
things may only be shown to be true when also factoring in competition
type. It seems more likely that schools are “elite” for certain
competition types, as opposed to all competitions in aggregate.
To put these ideas together more plainly, I am curious to know if the
success of a school in any given year can be predicted as a function of
the school itself, the year, and the school’s performance in the
previous year. ^{1} As before, my preference for quantifying performance
is percent rank sum (prnk_sum) of team score (relative to other
schools at a given competition level). Also, I think it’s a good idea to
“rescale” the year value to have a first value of 1 (corresponding to
the first year in the scraped data, 2004), with subsequent years taking
on subsequent integer values. (This variable is named year_idx.)
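The two predictors described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the original code: the function name `add_year_idx_and_lag`, the field name `prnk_sum_prev`, and the sample values are all hypothetical. Following the first footnote, the lagged value is taken from the most recent year in which the school competed, which is not necessarily the calendar year before.

```python
FIRST_YEAR = 2004  # first year in the scraped data, per the text


def add_year_idx_and_lag(rows):
    """Build model inputs for one school and competition type.

    rows: list of (year, prnk_sum) tuples, in any order.
    """
    out = []
    prev = None  # no earlier observation exists for the first year
    for year, prnk_sum in sorted(rows):
        out.append(
            {
                "year": year,
                "year_idx": year - FIRST_YEAR + 1,  # 2004 -> 1, 2005 -> 2, ...
                "prnk_sum": prnk_sum,
                "prnk_sum_prev": prev,  # most recent earlier value, if any
            }
        )
        prev = prnk_sum
    return out


# Hypothetical values; note the school skipped 2006.
rows = [(2004, 3.2), (2005, 2.8), (2007, 3.5)]
features = add_year_idx_and_lag(rows)
for row in features:
    print(row)
```

The first observation has no lagged value, so it would have to be dropped before fitting a model that uses the lag.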
So, to be explicit, a linear regression model of the following form is
fit for each unique pairing of school and competition type.
(Accounting for competition type allows us to properly model the reality
that a given school may excel in some competition types but not others.)
$$
\text{prnk\_sum} = \text{intercept} + \text{prnk\_sum}_{\text{year} - 1} \cdot \beta_{1} + \text{year\_idx} \cdot \beta_{2}
$$
Note that, because this formula is applied to each school-competition
type pair, the intercept term corresponds to the school entity itself.
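For readers who want to see the mechanics, the sketch below fits a model of this form for a single school and competition type by ordinary least squares, solving the normal equations with plain Python. It is only an illustration: the function names (`solve_linear_system`, `fit_school_model`) are hypothetical, the data are synthetic and noise-free, and it does not compute the p-values discussed next (a statistics library would normally handle both steps).

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_school_model(prnk_sum, prnk_sum_prev, year_idx):
    """Least-squares fit of prnk_sum on an intercept, prnk_sum_prev, and year_idx."""
    design = [[1.0, p, y] for p, y in zip(prnk_sum_prev, year_idx)]
    k = 3  # intercept, beta_1, beta_2
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(design, prnk_sum)) for i in range(k)]
    intercept, beta_1, beta_2 = solve_linear_system(xtx, xty)
    return {"intercept": intercept, "beta_1": beta_1, "beta_2": beta_2}


# Synthetic, noise-free data for one hypothetical school-competition type pair.
prev = [2.0, 3.0, 2.5, 4.0, 3.5]
idx = [2, 3, 4, 5, 6]
observed = [0.5 + 0.6 * p + 0.1 * y for p, y in zip(prev, idx)]

fit = fit_school_model(observed, prev, idx)
print(fit)
```

In practice this fit would be repeated once per school-competition type pair, giving one set of coefficients (and one p-value per term) for each pair.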
The distribution of p-values for each term in the model provides some
insight regarding the predictive power of the variables. Visually, it
does seem like two of my hypotheses are valid:

- Recent performance does seem to be predictive of school performance
  in a given competition type in any given year.
- Year itself is not predictive (meaning that there is no temporal
  trend indicating that performance improves or worsens over time).
However, my other thought that school itself has some kind of predictive
value does not appear to be true. ^{2}
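As a rough reminder of where these p-values come from, and assuming the standard two-sided t-test that most regression software reports (the exact procedure is not stated here), the p-value for term j in a model with three estimated parameters and n observations is:

$$
t_{j} = \frac{\hat{\beta}_{j}}{SE(\hat{\beta}_{j})}, \qquad p_{j} = 2 \left( 1 - F_{n - 3}\left( |t_{j}| \right) \right)
$$

Here F with subscript n − 3 denotes the cumulative distribution function of Student’s t distribution with n − 3 degrees of freedom. If a term has no real effect, its p-values across the many fitted models should be roughly uniform between 0 and 1, whereas a term with real predictive value should produce a pile-up of p-values near 0.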
Perhaps the deduction that, in general, individual schools do not
tend to dominate the rest of the competition can be understood in
another way. The distribution of the percentage of possible opponent
schools defeated at each competition level for each school should
reinforce this inference.
Indeed, the histograms do not show any noticeable skew to the right,
which supports the notion that, in general, individual schools are not
dominating specific competition types. If schools did dominate, we
would see some nontrivial right-hand skew. This possibility is closest
to being true (albeit not that close) at the District level of
competition (i.e. the lowest level of competition). This observation is
not all that surprising: if schools do dominate at some level of
competition, it is most likely to be at the lowest level.
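The quantity behind these histograms can be sketched as follows. This is a hedged Python illustration with made-up numbers, not the original plotting code: the helper names (`pct_defeated`, `histogram`) and the choice of five equal-width bins are assumptions.

```python
def pct_defeated(n_defeated, n_opponents):
    """Percentage of possible opponent schools that a school defeated."""
    return 100.0 * n_defeated / n_opponents


def histogram(values, n_bins=5, lo=0.0, hi=100.0):
    """Count values in equal-width bins; the top edge falls in the last bin."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical (schools defeated, possible opponents) pairs at one competition level.
results = [(3, 8), (1, 7), (6, 6), (2, 8), (4, 8)]
percentages = [pct_defeated(d, o) for d, o in results]
counts = histogram(percentages)
print(counts)
```

If a handful of schools dominated, the counts in the highest bins would be noticeably inflated relative to the rest.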
Wrap-Up
Certainly analysis of schools in these academic UIL competitions
deserves some more attention than that given here, but I think some of
the biggest questions about school performance have been answered.
1. Actually, I don’t specifically enforce the criterion that the previous year is used. Rather, I use the most recent year’s value, which may or may not be the previous year if the school did not compete in the previous year.
2. For more information regarding interpretation of p-value distributions, I recommend reading David Robinson’s very helpful blog post on the topic.