**r on Tony ElHabr**, and kindly contributed to R-bloggers].

Let’s take a look at individual competitors in the academic UIL
competitions.

## Individual Participation

The first question that comes to mind is that of participation: which
individuals have competed the most?

**NOTE:** *To give some context to the values for individual
participants, I’ll include the numbers for myself (“Elhabr, Anthony”) in
applicable contexts.*

rnk | name | school | city | conf | n |
---|---|---|---|---|---|
1 | Jansa, Wade | GARDEN CITY | GARDEN CITY | 1 | 57 |
2 | Chen, Kevin | CLEMENTS | SUGAR LAND | 5 | 56 |
3 | Hanson, Dillon | LINDSAY | LINDSAY | 1 | 53 |
4 | Gee, John | CALHOUN | PORT LAVACA | 4 | 47 |
5 | Zhang, Mark | CLEMENTS | SUGAR LAND | 5 | 47 |
6 | Robertson, Nick | BRIDGE CITY | BRIDGE CITY | 3 | 46 |
7 | Ryan, Alex | KLEIN | KLEIN | 5 | 46 |
8 | Strelke, Nick | ARGYLE | ARGYLE | 3 | 45 |
9 | Niehues, Taylor | GARDEN CITY | GARDEN CITY | 1 | 44 |
10 | Bass, Michael | SPRING HILL | LONGVIEW | 3 | 43 |
1722 | Elhabr, Anthony | CLEMENS | SCHERTZ | 4 | 13 |

**Note:** ^{1} # of total rows: 123,409

Although the names here may not provide much insight, the counts provide
some context regarding the limits of individual participation.

Given that counts of overall participation may not be indicative of
anything directly, it may be a better idea to break it down by
conference.

It seems that there has not been as much individual participation in
the 6A conference (`conf 6`), which is the conference with the largest
high schools (according to student body size).

I hypothesize that this phenomenon can be attributed to “pre-filtering
of talent” by these large schools. In other words, conference 6A schools
may be more likely to designate their individual competitors to compete
in only specific competitions and prevent any student who may be
capable, yet not fully prepared, from entering a competition. High
standards and expectations of aptitude are relatively common at very
large schools, even if what may be deemed “unacceptable” at such a
school would be very satisfactory at a smaller school. By comparison,
schools in all other conferences may be more willing to let individual
students compete in as many competition types as they desire, even if
they have not prepared for them whatsoever.

Such a phenomenon might be evident in lower scores (in aggregate) for
conferences where participation is greater. In fact, this is exactly
what is observed. ^{1} On average, conference 6A has the highest scores,
while conference 1A has the lowest.

So what about people’s scores? Who did best according to score? In order
to simplify the data, let’s look at a couple of statistics based on
score, aggregating across all scores for each individual. In particular,
let’s look at the average and sum of placing percent rank (`prnk`) and
of individual competitors “defeated” (`n_defeat`). (Note that
competitors defeated is defined as the number of scores less than that
of a given individual for a unique competition, and a unique competition
is defined as a unique combination of year, competition level, and
competition type.)
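As a minimal sketch of the “defeated” count defined above, the Python snippet below groups scores by unique competition and counts the strictly lower scores for each entry. The record layout and the field names (`year`, `level`, `type`, `score`) are assumptions made for illustration, not the structure of the actual data set, and ties are treated as non-defeats because the definition says “less than”.

```python
from collections import defaultdict


def count_defeated(records):
    """Add an n_defeat field: scores strictly lower within the same competition."""
    # A unique competition is a unique (year, level, type) combination.
    scores_by_comp = defaultdict(list)
    for rec in records:
        scores_by_comp[(rec["year"], rec["level"], rec["type"])].append(rec["score"])

    result = []
    for rec in records:
        comp_scores = scores_by_comp[(rec["year"], rec["level"], rec["type"])]
        n_defeat = sum(1 for s in comp_scores if s < rec["score"])
        result.append({**rec, "n_defeat": n_defeat})
    return result


# Hypothetical records, not taken from the real data.
records = [
    {"name": "A", "year": 2015, "level": "District", "type": "Math", "score": 200},
    {"name": "B", "year": 2015, "level": "District", "type": "Math", "score": 150},
    {"name": "C", "year": 2015, "level": "District", "type": "Math", "score": 150},
    {"name": "D", "year": 2016, "level": "District", "type": "Math", "score": 90},
]
defeated = {rec["name"]: rec["n_defeat"] for rec in count_defeated(records)}
```

Competitor D is alone in a different competition (a different year), so D defeats no one regardless of score.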

rnk | name | school | city | n | prnk_mean | n_defeat_mean |
---|---|---|---|---|---|---|
1 | Hanson, Dillon | LINDSAY | LINDSAY | 53 | 0.97 | 29.17 |
2 | Chen, Kevin | CLEMENTS | SUGAR LAND | 56 | 0.91 | 29.80 |
3 | Jansa, Wade | GARDEN CITY | GARDEN CITY | 57 | 0.89 | 29.86 |
4 | Niehues, Taylor | GARDEN CITY | GARDEN CITY | 44 | 0.96 | 31.20 |
5 | Gee, John | CALHOUN | PORT LAVACA | 47 | 0.90 | 25.21 |
6 | Zhang, Mark | CLEMENTS | SUGAR LAND | 47 | 0.89 | 29.15 |
7 | Strelke, Nick | ARGYLE | ARGYLE | 45 | 0.93 | 26.56 |
8 | Robertson, Nick | BRIDGE CITY | BRIDGE CITY | 46 | 0.88 | 25.33 |
9 | Ryan, Alex | KLEIN | KLEIN | 46 | 0.86 | 26.20 |
10 | Xu, Steven | DAWSON | PEARLAND | 43 | 0.88 | 26.81 |
2608 | Elhabr, Anthony | CLEMENS | SCHERTZ | 13 | 0.60 | 16.62 |

**Note:** ^{1} # of total rows: 117,684

Also, I think it’s interesting to look at the distribution of counts for
competitors defeated, advancement, and state competition appearances.

The heavily right-skewed distribution of values gives an indication of
the difficulty of succeeding consistently.

For comparison’s sake, let’s visualize the same metrics aggregated at
the school level. Keep in mind that while the sample of students should
have larger counts for number of advancements and state competition
appearances (y-axis) for any given number of occurrences (x-axis) because
there are many more students than schools, schools are more likely to
have a wider range of occurrences (x-axis) because there are fewer
schools in each competition (compared to the number of individuals).

To understand why this is true, let’s take an example: Say there is a
District level competition where there are 8 schools and 40 individuals
competing. It is more likely that a given school advances to the next
level of competition (as a result of having a total score that is higher
than the scores of the other 7 schools) than any single individual, who,
if not from the school that advances, can only advance as a result of
having a top “n” (e.g. 3) score.

We see that the distributions are skewed towards the right here as well,
although not quite as “evenly”. This indicates that some schools tend to
perform well at a more consistent rate than individuals themselves.

Intuitively, this makes sense. It can be very difficult for individuals
alone to beat out the competition, especially if they have an “off” day.
On the other hand, schools, relying on teams of individuals, are placed
according to the sum of the top “n” (e.g. 3) of individual competitor
scores. Thus, because school scores are dependent on groups of
individuals (who will tend to perform more consistently in aggregate
than any one individual), school placings are more likely to be similar
across years, meaning that schools that are observed to do well in any
given year are more likely to do well in other years as well (relative
to individual competitors).
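To make the school scoring rule described above concrete, here is a small Python sketch that sums the top “n” individual scores per school. The school names and scores are invented for illustration, and `n=3` simply follows the “e.g. 3” in the text; the actual value may differ by competition.

```python
def team_score(individual_scores, n=3):
    """Sum of the top-n individual scores for one school."""
    return sum(sorted(individual_scores, reverse=True)[:n])


# Hypothetical schools: School A has the single best individual,
# but School B has the more consistent group.
schools = {
    "School A": [180, 120, 110, 40],
    "School B": [150, 140, 130],
}
totals = {school: team_score(scores) for school, scores in schools.items()}
winner = max(totals, key=totals.get)
```

In this made-up example the school with the best individual score does not place first, which is one way a group can smooth over any one competitor’s result.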

So it should be obvious that it is difficult to make it to the highest
level of competition, State. But exactly how difficult is it? Let’s
identify those people (and their scores) who have made the State
competition level four times for a given competition type, which is the
upper limit for a typical high school student. ^{2}

Clearly, these individuals represent a very small subset of the total
sample. They might be considered the “elite”. Of these individuals, who
has appeared in State competitions for more than one type of
competition?

name | school | city | conf | n |
---|---|---|---|---|
Chen, Kevin | CLEMENTS | SUGAR LAND | 5 | 4 |
Jansa, Wade | GARDEN CITY | GARDEN CITY | 1 | 4 |
Hanson, Dillon | LINDSAY | LINDSAY | 1 | 3 |
Strelke, Nick | ARGYLE | ARGYLE | 3 | 3 |
Bass, Michael | SPRING HILL | LONGVIEW | 3 | 2 |
Deaver, Matthew | SILSBEE | SILSBEE | 3 | 2 |
Liu, Jason | DAWSON | PEARLAND | 4 | 2 |
Ryan, Alex | KLEIN | KLEIN | 5 | 2 |
Williams, Tyler | POOLVILLE | POOLVILLE | 1 | 2 |
Xu, Steven | DAWSON | PEARLAND | 4 | 2 |

**Note:** ^{1} # of total rows: 11

I would consider those individuals appearing here to be the “elite of
the elite”.

## Individual Performance

Now, I want to try to answer a somewhat ambiguous question: Which
individuals were most “dominant”?

### Evaluating “Dominance”

Because the term “dominance” is fairly subjective, it must be defined
explicitly. Here is my definition/methodology, along with some
explanation.

First, I assign a percent rank to individual placings in all
competitions based on score relative to other scores in that
competition. I choose to use percent rank, which is always a value
between 0 and 1, because it inherently accounts for the wide range of
number of competitors across all competitions. (For this context, a
percent rank of 1 corresponds to the highest score in a given
competition ^{3}, and, conversely, a value of 0 corresponds to the lowest
score.)
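For illustration, one common way to compute a percent rank within a single competition is sketched below in Python: the number of strictly lower scores divided by the number of other competitors. This formula is an assumption on my part (it matches the 0-to-1 behavior described above, with tied scores sharing a value), and the scores are made up.

```python
def percent_rank(scores):
    """Percent rank of each score within one competition (0 = lowest, 1 = highest)."""
    n = len(scores)
    if n < 2:
        # With a single competitor there is nobody to rank against.
        raise ValueError("percent rank needs at least two scores")
    # Fraction of the other competitors that each score strictly beats.
    return [sum(1 for other in scores if other < s) / (n - 1) for s in scores]


# Hypothetical scores from one competition.
prnk = percent_rank([300, 250, 250, 100, 50])
```

Because the result is always scaled by the number of other competitors, a first place among 5 and a first place among 50 both map to 1.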

I should note that I evaluated some other metrics for gauging individual
success, including the total number of individuals defeated in
competitions. Percent rank based on score and number of defeats attempt
to quantify the same underlying variable, but I think percent rank is a
little more “natural” to interpret because it contextualizes number of
competitors with its unit range. By comparison, the interpretation of
number of defeats is less direct because the number of other competitors
is not accounted for directly.

Then, to come up with a final set of ranks, one for each unique
competitor, based on the percent ranks for individual competitions, I
simply sum up the percent ranks for each individual.

The sum is used instead of an average ^{4} because rankings based on
averages (and inferences made upon them) are sensitive to individuals
who do not compete in many competitions, yet place very well in them. A
final ranking based on a summed value does not suffer from this pitfall,
although it can be sensitive to the sample size of each participant.
(For example, an individual might participate in a high number of
competitions and under-perform relative to the average in all of them,
yet their final ranking, based on summed percent ranks, might indicate
that they are an above-average performer.)
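The trade-off between the two aggregations can be seen in a toy Python example. The three competitors and their percent ranks below are entirely hypothetical: X competes rarely but places very well, Y competes often and places well, and Z competes very often but places below the middle every time.

```python
from statistics import mean

# Hypothetical percent ranks per competitor (one value per competition).
prnks = {
    "X": [0.99, 0.97],   # few competitions, excellent placings
    "Y": [0.90] * 40,    # many competitions, strong placings
    "Z": [0.45] * 50,    # very many competitions, below-median placings
}

# Rank competitors from best to worst under each aggregation.
order_by_mean = sorted(prnks, key=lambda name: mean(prnks[name]), reverse=True)
order_by_sum = sorted(prnks, key=lambda name: sum(prnks[name]), reverse=True)
```

Under the mean, X comes out on top on the strength of two results; under the sum, Z overtakes X purely through volume. Neither aggregation is free of bias, which is the caveat raised in the paragraph above.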

name | school | conf | rnk_sum_prnk | rnk_mean_prnk | rnk_sum_n_defeat | rnk_mean_n_defeat |
---|---|---|---|---|---|---|
Hanson, Dillon | LINDSAY | 1 | 1 | 191 | 3 | 5205 |
Chen, Kevin | CLEMENTS | 5 | 2 | 809 | 2 | 4834 |
Jansa, Wade | GARDEN CITY | 1 | 3 | 1298 | 1 | 4827 |
Niehues, Taylor | GARDEN CITY | 1 | 4 | 200 | 4 | 3612 |
Gee, John | CALHOUN | 4 | 5 | 1114 | 14 | 10361 |
Zhang, Mark | CLEMENTS | 5 | 6 | 1272 | 5 | 5213 |
Strelke, Nick | ARGYLE | 3 | 7 | 562 | 12 | 8340 |
Robertson, Nick | BRIDGE CITY | 3 | 8 | 1598 | 15 | 10277 |
Ryan, Alex | KLEIN | 5 | 9 | 2561 | 8 | 8754 |
Xu, Steven | DAWSON | 4 | 10 | 1661 | 16 | 8160 |
Elhabr, Anthony | CLEMENS | 4 | 2330 | 30385 | 2934 | 34266 |

**Note:** ^{1} # of total rows: 123,337

Some of the same individuals from the participation-based ranking of
competitors also appear among the top of the ranks by my evaluation of
dominance. This is somewhat expected due to the nature of my
methodology (in particular, my choice to use a sum instead of an average
or some other statistic). Certainly some additional statistical analysis
could be done here to investigate other methods of quantifying dominance
(beyond my analysis). ^{5} Because there is no agreed-upon metric or
method anywhere for quantifying dominance for this particular topic,
it’s difficult to really judge the findings here.

In any case, the difference between the ranks based on the average and
the sum of percent ranks is not all too great. I found that the
correlation between the two is anywhere between ~0.75 and ~0.95 when
using either score percent rank or number of defeats as the metric for
ranking.

rowname | rnk_sum_prnk | rnk_mean_prnk | rnk_sum_n_defeat | rnk_mean_n_defeat |
---|---|---|---|---|
rnk_sum_prnk | NA | 0.8192 | 0.9488 | 0.7375 |
rnk_mean_prnk | 0.8192 | NA | 0.7840 | 0.8829 |
rnk_sum_n_defeat | 0.9488 | 0.7840 | NA | 0.8449 |
rnk_mean_n_defeat | 0.7375 | 0.8829 | 0.8449 | NA |
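As a rough sketch of how a correlation between two rank columns can be computed, the Python snippet below applies a plain Pearson correlation to rank vectors (which is equivalent to a Spearman correlation of the underlying values when there are no ties). The post does not state which correlation type was used, so that choice is an assumption. The inputs are only the ten top rows of `rnk_sum_prnk` and `rnk_sum_n_defeat` shown in the rankings table above, so the result is not expected to reproduce the full-sample figures in the correlation table.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Top-10 rows only (Hanson through Xu), taken from the rankings table.
rnk_sum_prnk = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
rnk_sum_n_defeat = [3, 2, 1, 4, 14, 5, 12, 15, 8, 16]

r_top10 = pearson(rnk_sum_prnk, rnk_sum_n_defeat)
```

Restricting to the top ten rows narrows the range of ranks considerably, so a lower value than the full-sample correlation is not surprising.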

### Evaluating “Carrying”

Another way of identifying superb ability is to compare the scores of
individuals with those of their teammates. Individuals with high scores
relative to their teammates might be said to have “carried” their
teammates. Although this kind of evaluation is dependent on the skill of
each team (independent of the competition setting), I think that it is
another interesting way of evaluating skill.

I was hoping that I might see myself appearing among the most dominant
by this measure of skill, but it does not necessarily say anything bad
about me that I don’t. I competed with other individuals whom I
considered to be very knowledgeable and who often scored better than me.
Also, from the opposite point of view, I don’t think I was a poor
performer who relied upon teammates to boost the team’s overall score.
This is why the data points corresponding to me show up in the middle of
the pack in the previous visual.

### “Improvement”

I think something else that would be interesting to look at is personal
“improvement” between years. Theoretically, if we assume that
individuals improve their academic ability and competition skills every
year, then we should see individual scores for a given competition type
and level increase from one year to the next. I would be very surprised
to find that this is not true.

To evaluate improvement, we can simply reduce the whole data set to just
those who have competed in the same competition type and level in more
than one year and check whether or not their scores increased or
decreased from one year to the next. ^{6} Actually, in order to account
for variance in competition difficulty across years, it’s better to use
the percent rank of the individual’s placing (based on score) rather
than score itself.
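The comparison described above might be sketched in Python as follows. The tuple layout `(name, type, level, year, prnk)` and the sample rows are hypothetical; successive appearances are compared in year order without requiring consecutive years, and an unchanged percent rank is counted as “not improved”, which is my own assumption about how ties would be handled.

```python
from collections import defaultdict


def improvement_flags(rows):
    """Return one True/False flag per pair of successive appearances."""
    # Group each individual's appearances by competition type and level.
    history = defaultdict(list)
    for name, comp_type, level, year, prnk in rows:
        history[(name, comp_type, level)].append((year, prnk))

    flags = []
    for appearances in history.values():
        appearances.sort()  # chronological order
        for (_, previous), (_, current) in zip(appearances, appearances[1:]):
            flags.append(current > previous)
    return flags


# Hypothetical rows: A appears three times, B only once (so B is dropped).
rows = [
    ("A", "Math", "District", 2014, 0.50),
    ("A", "Math", "District", 2016, 0.80),
    ("A", "Math", "District", 2017, 0.70),
    ("B", "Science", "District", 2015, 0.90),
]
flags = improvement_flags(rows)
```

Counting the `True` and `False` values in `flags` gives a two-row summary of the same shape as an improved/not-improved tally.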

improve | n |
---|---|
FALSE | 23,022 |
TRUE | 35,305 |

**Note:** ^{1} # of total rows: 2

So it is true that individual scores (actually, percent ranks of
placings) do tend to improve as the individual ages. But is this trend
statistically significant? That’s easy enough to answer: we can simply
perform a binomial test, where the null hypothesis is that the
distribution of “TRUE” and “FALSE” regarding improvement is truly a
50%/50% split. If the p-value of the test is below a threshold value
(let’s say 0.05), then we can reject the null hypothesis and say that
there is a non-trivial trend of individual improvement.

metric | value |
---|---|
estimate | 0.6053 |
p.value | 0.0000 |
conf.low | 0.6013 |
conf.high | 0.6093 |

In fact, this is exactly what is observed.
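For readers who want to check the arithmetic, here is a hedged Python sketch that uses the TRUE/FALSE counts from the improvement table. It swaps in a normal approximation to the binomial test rather than an exact test (the post does not show which function was used), which is reasonable only because the sample is large; the function name and the returned keys are my own.

```python
from math import erfc, sqrt
from statistics import NormalDist


def binomial_test_normal_approx(successes, trials, p0=0.5, conf_level=0.95):
    """Two-sided test of a proportion against p0 via the normal approximation."""
    estimate = successes / trials

    # Test statistic uses the standard error under the null hypothesis.
    z = (estimate - p0) / sqrt(p0 * (1 - p0) / trials)
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))

    # Wald interval uses the standard error at the observed proportion.
    z_crit = NormalDist().inv_cdf(0.5 + conf_level / 2)
    margin = z_crit * sqrt(estimate * (1 - estimate) / trials)
    return {
        "estimate": estimate,
        "p_value": p_value,
        "conf_low": estimate - margin,
        "conf_high": estimate + margin,
    }


# Counts from the improvement table: 35,305 TRUE and 23,022 FALSE.
result = binomial_test_normal_approx(35_305, 35_305 + 23_022)
```

With a sample this large the approximate estimate and interval should land very close to the tabled values, and the p-value is far below the 0.05 threshold.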

Now, let’s reduce the set to just those who have appeared in
competitions at a given competition type four times (for the sake of
visualization) and plot the scores across years for each individual.

It is evident visually that people do have a tendency to improve over
time.

## Wrap-up

I’ll leave the discussion of individuals at that, although there is much
more that could be explored.

1. See the previous post.
2. The assumption here is that each student takes four years to complete high school.
3. A unique competition is defined as one having a unique year, competition type, and competition level.
4. The average may also be considered a valid means of aggregating the values for each individual.
5. I’ll leave further analysis for another person and/or time.
6. I don’t think it is relevant to require that the scores be in consecutive years, so I don’t enforce that criterion.
