2011 Perth City to Surf Stats

September 6, 2011

(This article was first published on Matt's Stats n stuff » R, and kindly contributed to R-bloggers)

As happens every August, thousands took part in the Perth City to Surf, and with that comes the chance for some stats. Why? Curiosity more than anything, and to convince myself that my 12km run time of 1 hour, 9 minutes and 36 seconds wasn’t so bad, given I came down with the dreaded man flu just 36 hours before the race started.

Although the papers said around 42,000 competed, the official results site lists results for 34,215 people (excluding fewer than 10 listed as NA for age and sex): 15,424 males (45.1%) and 18,791 females (54.9%)*.

Individual results are available at Perth Now.

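| Event | Female | Male | Total |
| --- | --- | --- | --- |
| 4km Walk | 3118 (65.6%) | 1638 (34.4%) | 4756 |
| 4km Run | 1904 (55.9%) | 1503 (44.1%) | 3407 |
| 12km Walk | 6058 (74.4%) | 2087 (25.6%) | 8145 |
| 12km Run | 6507 (45.5%) | 7805 (54.5%) | 14312 |
| Half Marathon | 1005 (36.8%) | 1728 (63.2%) | 2733 |
| Marathon | 199 (23.1%) | 663 (76.9%) | 862 |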
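
As a quick cross-check of the percentages, here is a minimal sketch in Python (not the R code used for this post). The counts are copied from the table above; the names `counts` and `share` are just illustrative.

```python
# Finisher counts copied from the table above: event -> (female, male).
counts = {
    "4km Walk": (3118, 1638),
    "4km Run": (1904, 1503),
    "12km Walk": (6058, 2087),
    "12km Run": (6507, 7805),
    "Half Marathon": (1005, 1728),
    "Marathon": (199, 663),
}


def share(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Per-event split by sex.
for event, (female, male) in counts.items():
    total = female + male
    print(f"{event}: {share(female, total)}% female, {share(male, total)}% male, {total} total")

# Overall totals across all six events.
all_female = sum(f for f, _ in counts.values())
all_male = sum(m for _, m in counts.values())
print(all_female, all_male, all_female + all_male)
```

Summing the rows this way gives the overall female and male counts that the headline percentages are based on.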

Next of interest are the finishing times. Here we have both the median and mean for each sex for each event. Means are 2.5% trimmed, meaning the fastest and slowest 2.5% of people were removed before calculating the mean, because the mean is heavily influenced by outliers.

| Event | Female | Male | Total |
| --- | --- | --- | --- |
| 4km Walk | 47 : 46.6 | 47 : 46.4 | 46.8 : 46.5 |
| 4km Run | 29 : 30.6 | 27 : 28.6 | 28.4 : 29.7 |
| 12km Walk | 125 : 124.9 | 122 : 122.1 | 124.4 : 124.2 |
| 12km Run | 80 : 82.8 | 68 : 70.5 | 73.7 : 76.1 |
| Half Marathon | 125 : 126.2 | 111 : 113.1 | 116.3 : 117.9 |
| Marathon | 251 : 251.7 | 232 : 237.8 | 237.2 : 241 |

Format is “Median : Mean (trimmed)”, with times in minutes.
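
To make the trimming concrete, here is a minimal Python sketch (again, not the R code behind this post; in R the equivalent is `mean(x, trim = 0.025)`). The helper `trimmed_mean` and the `times` list are made up for illustration and are not race data.

```python
from statistics import mean, median


def trimmed_mean(values, trim=0.025):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)  # number of values dropped from each end
    return mean(ordered[k:len(ordered) - k])


# Made-up finishing times in minutes: 39 steady runners and one straggler.
times = [60 + i for i in range(39)] + [300]

print(mean(times))          # plain mean, pulled upwards by the straggler
print(trimmed_mean(times))  # fastest and slowest runner dropped first
print(median(times))        # middle of the field
```

With 40 times and a 2.5% trim, exactly one value is dropped from each end, so the single straggler no longer drags the mean away from the median.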

From here you might be interested in how you stacked up against your peers by age. Below is a series of graphs and small tables for each sex for each event. These graphs are called frequency polygons. They’re nothing to be afraid of: they are essentially histograms, but drawn with lines rather than bars. This way we can see multiple groups plotted on the same axis with hopefully less clutter. For some of these graphs I have cut the time axis off at an arbitrary upper limit to remove a few… ‘stragglers’.
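
For the curious, a frequency polygon is just the bin counts of a histogram plotted at the bin midpoints and joined with a line. The Python sketch below computes those points for two groups; the function `frequency_polygon`, the 10-minute bin width and the two sets of times are all assumptions for illustration, not the settings or data used for the graphs in this post.

```python
from collections import Counter


def frequency_polygon(values, bin_width, start=0.0):
    """Return (bin midpoint, count) pairs, i.e. histogram counts to join with a line."""
    counts = Counter(int((v - start) // bin_width) for v in values)
    first, last = min(counts), max(counts)
    return [
        (start + (i + 0.5) * bin_width, counts.get(i, 0))  # empty bins count as 0
        for i in range(first, last + 1)
    ]


# Made-up finishing times in minutes for two hypothetical age groups.
group_a = [62, 64, 66, 67, 68, 69, 71, 73, 78, 91]
group_b = [70, 74, 76, 77, 79, 81, 83, 84, 88, 97]

for name, times in (("A", group_a), ("B", group_b)):
    print(name, frequency_polygon(times, bin_width=10, start=60))
```

Plotting each group's points as its own line on shared axes gives the overlaid view described above, which is harder to read with overlapping bars.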

4km Walk

For the females:

| Age | People | Median : Mean (trimmed) |
| --- | --- | --- |
| 0-18 | 926 | 45.6 : 45.2 |
| 19-29 | 497 | 45.6 : 46 |
| 30-39 | 652 | 47.9 : 47.9 |
| 40-49 | 615 | 46.2 : 46.3 |
| 50-79 | 428 | 48.3 : 49 |
| Total | 3118 | 46.7 : 46.6 |

For the males:

| Age | People | Median : Mean (trimmed) |
| --- | --- | --- |
| 0-18 | 620 | 46.1 : 45 |
| 19-29 | 157 | 46.7 : 46.3 |
| 30-39 | 308 | 47.5 : 47.5 |
| 40-49 | 322 | 47.5 : 47.4 |
| 50-99 | 231 | 47.3 : 48.1 |
| Total | 1638 | 47 : 46.4 |

Not a lot of variation here, as you would expect for this event. Almost twice as many females as males took part, with a similar spread across the ages for each sex. It’s good to see that in the youngest group, especially among the males, a small group ran ahead (see the first peak of the pink line in the male graph).

4km Run

For the females:

| Age | People | Median : Mean (trimmed) |
| --- | --- | --- |
| 0-18 | 718 | 29.1 : 30.2 |
| 19-29 | 402 | 28.8 : 29.7 |
| 30-39 | 374 | 29.5 : 31.1 |
| 40-49 | 337 | 29.6 : 31.2 |
| 50-79 | 73 | 31.2 : 34.8 |
| Total | 1904 | 29.2 : 30.6 |

For the males:

| Age | People | Median : Mean (trimmed) |
| --- | --- | --- |
| 0-18 | 643 | 26.2 : 27.4 |
| 19-29 | 196 | 25.4 : 26.9 |
| 30-39 | 235 | 29.4 : 31.3 |
| 40-49 | 321 | 27.4 : 29.1 |
| 50-79 | 108 | 28.5 : 32.1 |
| Total | 1503 | 27.1 : 28.6 |

Here we see a bit more variation, but times are still very close across age groups. There is little difference among the females between the 30-39 and 40-49 groups, and among the males the 40-49 group sits slightly faster than the 30-39 group. I personally wouldn’t read too much into this given the nature of the 4km events, although the large numbers do make this representative of what is going on. Given the skewed distribution (tail to the right), the medians might tell a better story here, as they better represent where the peak lies.
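
The right skew shows up in the tables themselves: a trimmed mean sitting above the median hints at a tail of slower times. The Python sketch below (not part of the original R analysis) takes the 4km Run figures above and prints that gap per age group; `mean_minus_median` is just an illustrative helper name.

```python
# (median, trimmed mean) in minutes, copied from the 4km Run tables above.
female = {"0-18": (29.1, 30.2), "19-29": (28.8, 29.7), "30-39": (29.5, 31.1),
          "40-49": (29.6, 31.2), "50-79": (31.2, 34.8)}
male = {"0-18": (26.2, 27.4), "19-29": (25.4, 26.9), "30-39": (29.4, 31.3),
        "40-49": (27.4, 29.1), "50-79": (28.5, 32.1)}


def mean_minus_median(rows):
    """Gap between trimmed mean and median per age group; positive suggests a right tail."""
    return {age: round(mean - median, 1) for age, (median, mean) in rows.items()}


print(mean_minus_median(female))
print(mean_minus_median(male))
```

Every gap is positive, and it is widest in the 50-79 groups, which is consistent with a handful of slower finishers stretching the upper end.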

12km Walk

For the females:

| Age | People | Median : Mean (trimmed) |
| --- | --- | --- |
| 0-18 | 676 | 126.8 : 126.8 |
| 19-29 | 1952 | 123.9 : 123.5 |
| 30-39 | 1234 | 125.6 : 125 |
| 40-49 | 1170 | 124.5 : 124.8 |
| 50-79 | 1026 | 126 : 126.6 |
| Total | 6058 | 125.1 : 124.9 |

For the males:

| Age | People | Median : Mean (trimmed) |
| --- | --- | --- |
| 0-18 | 333 | 126.7 : 125.2 |
| 19-29 | 421 | 121 : 119.7 |
| 30-39 | 388 | 124.5 : 125.1 |
| 40-49 | 375 | 122 : 120.7 |
| 50-99 | 567 | 119.6 : 120.8 |
| Total | 2087 | 122.2 : 122.1 |

Again with the walkers, as you would expect, this is very tight. This is another event dominated by the females; there’s not much more to say other than that it looks like they all walked together.

12km Run

For the females:

| Age | People | Median : Mean (trimmed) |
| --- | --- | --- |
| 0-18 | 618 | 88.3 : 90.3 |
| 19-29 | 2585 | 80 : 82.4 |
| 30-39 | 1825 | 78.8 : 80.8 |
| 40-49 | 1087 | 79.6 : 81.9 |
| 50-79 | 392 | 81.8 : 85 |
| Total | 6507 | 80.5 : 82.8 |

For the males:

| Age | People | Median : Mean (trimmed) |
| --- | --- | --- |
| 0-18 | 895 | 70.2 : 73.2 |
| 19-29 | 2501 | 66.2 : 69 |
| 30-39 | 2171 | 66.4 : 68.8 |
| 40-49 | 1426 | 68.6 : 71 |
| 50-99 | 812 | 72.7 : 75.6 |
| Total | 7805 | 67.8 : 70.5 |

This was the big event, and I was really surprised, and impressed, to see that there was little variation with age. The 19-29 and 30-39 groups for the guys pulled up slightly faster, and the 40-49 group for the females held their own as well. I really expected to see more of a staggering to the right with increasing age.

Half Marathon

For the females:

| Age | People | Median : Mean (trimmed) |
| --- | --- | --- |
| 0-24 | 139 | 120.7 : 124.6 |
| 25-34 | 434 | 123.3 : 124.6 |
| 35-44 | 285 | 126.8 : 127.5 |
| 45-54 | 126 | 128.4 : 129.2 |
| 55-99 | 21 | 133.4 : 138.9 |
| Total | 1005 | 125.2 : 126.2 |

For the males:

| Age | People | Median : Mean (trimmed) |
| --- | --- | --- |
| 0-24 | 217 | 109.3 : 111.1 |
| 25-34 | 633 | 108.5 : 111.2 |
| 35-44 | 533 | 112.4 : 113.2 |
| 45-54 | 258 | 113.4 : 115.6 |
| 55-99 | 87 | 126 : 125.2 |
| Total | 1728 | 111.1 : 113.1 |

These graphs are a little more jagged due to the slightly lower numbers and wider spread of times. Again very consistent. The females have less of a sharp peak, suggesting they didn’t run in together in a big group like the males.

Marathon

Everyone

For the females:

| Age | People | Median : Mean (trimmed) |
| --- | --- | --- |
| 0-24 | 19 | 254.9 : 255.4 |
| 25-34 | 83 | 243.2 : 248.8 |
| 35-44 | 54 | 244.1 : 243.1 |
| 45-99 | 43 | 259.1 : 266.7 |
| Total | 199 | 251.1 : 251.7 |

For the males:

| Age | People | Median : Mean (trimmed) |
| --- | --- | --- |
| 0-24 | 72 | 229.4 : 236.3 |
| 25-34 | 206 | 230.6 : 237 |
| 35-44 | 210 | 228.2 : 234.5 |
| 45-99 | 175 | 239.9 : 243.8 |
| Total | 663 | 231.6 : 237.8 |

Going by the medians, the guys ran in about 20 minutes ahead of the gals. For context, 4 hours is 240 minutes, so the median male was roughly 8 minutes faster than that and the median female roughly 11 minutes slower.
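
For anyone who prefers hours and minutes, the arithmetic can be checked with a short Python sketch (not part of the original R analysis). The two medians come from the Total rows of the marathon tables above; `hours_minutes` is an illustrative helper.

```python
def hours_minutes(total_minutes):
    """Format a time in minutes as h:mm, rounded to the nearest minute."""
    hours, minutes = divmod(round(total_minutes), 60)
    return f"{hours}:{minutes:02d}"


# Marathon medians in minutes, from the Total rows of the tables above.
female_median, male_median = 251.1, 231.6

print(hours_minutes(female_median))  # female median as h:mm
print(hours_minutes(male_median))    # male median as h:mm
print(round(female_median - male_median, 1))  # gap between the medians, in minutes
print(round(female_median - 240, 1), round(240 - male_median, 1))  # distance from 4 hours
```

Note that the gap is narrower if you compare the trimmed means instead (251.7 against 237.8 minutes).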

Other stats

If there’s anything else you’d like to see, statistics-wise or graphed, just let me know in a comment below. The same goes if you, heaven forbid, spot an error.

Thanks to all those who participated, see you again next year!

Geek speak

These statistics and graphs were produced in R, with the graphs made using geom_freqpoly from the ggplot2 package. The code used for this is available here. It’s not pretty by any means. The data was manually scraped from the results site, given that it loads separate pages for each sex/age group.
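
The linked R code is the real thing; purely as an illustration of the grouping step behind the small tables, here is a Python sketch that bins made-up (age, minutes) records into age bands and reports the head count and median for each. The `BANDS` cut-offs mirror the male 12km tables, while `age_band`, `summarise` and the records are hypothetical.

```python
from collections import defaultdict
from statistics import median

# Age bands as used in the male 12km tables; other events use different cut-offs.
BANDS = [(0, 18), (19, 29), (30, 39), (40, 49), (50, 99)]


def age_band(age):
    """Return the label of the band that contains `age`."""
    for low, high in BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    raise ValueError(f"age out of range: {age}")


def summarise(records):
    """Group (age, minutes) records by band; return band -> (people, median minutes)."""
    groups = defaultdict(list)
    for age, minutes in records:
        groups[age_band(age)].append(minutes)
    return {band: (len(times), median(times)) for band, times in groups.items()}


# Made-up (age, minutes) records, not race data.
records = [(17, 72.0), (24, 65.5), (27, 68.0), (33, 66.0), (36, 70.5), (52, 75.0)]
print(summarise(records))
```

Bands with no finishers simply do not appear in the result, which is worth keeping in mind for the smaller events.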

* 28 wheelchair participants excluded as I couldn’t easily get their data from the site.

