# Wingspan Data Analysis

This article was first published on **R Archives - Dan Oehm | Gradient Descending** and kindly contributed to R-bloggers.

Wingspan is a great game, even though I’ve only played it a few times. The mechanics are great, there are lots of bird variations, and there are a bunch of different strategies to try. There are 170 birds, and I’ve probably only seen 30 of them. So, true to form, I’ve dabbled in a bit of data analysis to get a view of all the different types of cards in the game.

Open source wins again since the {wingspan} R package exists. It contains the details of each bird in the core, European, Oceania, and swift start sets. I’ll only be using the core set for this analysis since that’s the only one I’m semi familiar with.

## What’s the most common food type?

There are five food types: invertebrate (let’s be honest, it’s grub, and I’ll choose 1 syllable over 4 any day), seed, fruit, fish, and rat (it’s rat). Grubs are definitely more common as a food cost, but how much more?

Summed across all 170 cards, seeds and grubs are each 2.5 to 3 times more common as a food cost than any of the other three food types. If you’re not looking for specific food types, choosing grubs and seeds from the bird feeder will give you more options. If you’ve played it a few times, this becomes obvious pretty quickly.
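A tally like this can be sketched in a few lines of base R. The data frame below is a made-up stand-in for the card data (the bird names and counts are hypothetical); only the food column names are taken from the model later in the post.

```r
# Toy stand-in for the card data: one row per bird, one column per food type
# holding how many of that food the bird costs. All values are made up.
toy_birds <- data.frame(
  common_name  = c("Bird A", "Bird B", "Bird C", "Bird D"),
  invertebrate = c(1, 2, 0, 1),
  seed         = c(1, 0, 2, 0),
  fruit        = c(0, 0, 1, 0),
  fish         = c(0, 0, 0, 1),
  rodent       = c(0, 1, 0, 0),
  stringsAsFactors = FALSE
)

food_cols <- c("invertebrate", "seed", "fruit", "fish", "rodent")

# Total number of each food token required across all cards
food_totals <- colSums(toy_birds[food_cols])

# Each total relative to the rarest food type
food_ratio <- food_totals / min(food_totals)

print(sort(food_totals, decreasing = TRUE))
print(food_ratio)
```

Swapping `toy_birds` for the real card data should give the ratios quoted above, assuming the food columns hold token counts.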

## What is the average egg capacity?

The average egg capacity is 2.85, although the distribution of egg capacity and the relationship with victory points is more interesting.

- A bird with 4 victory points and an egg capacity of 2 is the most common combination.
- There are only 4 birds with an egg capacity of 6.
- Each egg is worth a victory point at the end, so let’s consider victory points + egg capacity:
  - The bird with the most is the Wild Turkey at 13.
  - 17 birds (10%) have a victory point capacity of 10 or more.
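The combined figure is just the printed victory points plus the egg capacity. A minimal sketch with made-up cards (the names and values are hypothetical, and `vp_capacity` is a name introduced here for illustration):

```r
# Toy cards with printed victory points and egg capacity (values are made up)
toy_birds <- data.frame(
  common_name  = c("Bird A", "Bird B", "Bird C", "Bird D", "Bird E"),
  vp           = c(8, 4, 5, 2, 9),
  egg_capacity = c(5, 2, 6, 4, 1),
  stringsAsFactors = FALSE
)

# Maximum points a card can hold: printed VPs plus one point per egg slot
toy_birds$vp_capacity <- toy_birds$vp + toy_birds$egg_capacity

# Share of cards at or above a threshold (10, as in the post)
share_high <- mean(toy_birds$vp_capacity >= 10)

# Card with the largest combined total
top_bird <- toy_birds$common_name[which.max(toy_birds$vp_capacity)]

# Cross-tabulation of VPs against egg capacity, to find the most common pairing
vp_by_eggs <- table(vp = toy_birds$vp, eggs = toy_birds$egg_capacity)
```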

This is useful to know in terms of the odds of picking up valuable cards from the tray or deck. Of course, some of the lower value cards will have great activations, but at the end game, you’ll be looking for the big ones.

## What is the habitat distribution?

There are almost equal numbers of birds across the habitats: 83 birds in the forest and grassland and 85 in the wetland. These counts overlap, since a bird that can be played in more than one habitat is counted in each.

The breakdown is mildly interesting:

- There are 45 solely wetland birds, which is the largest group
- There are 27 birds that can be played in any habitat
- There are only 2 birds that can be played in either the forest or wetland, but not the grassland
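Both views, the per-habitat totals and the exact habitat combinations, can be derived from three logical habitat flags. A rough sketch on made-up rows, assuming the flags are named `forest`, `grassland` and `wetland` as in the model later in the post:

```r
# Toy habitat flags, one row per bird (values are made up)
toy_birds <- data.frame(
  forest    = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
  grassland = c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE),
  wetland   = c(FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)
)

# Per-habitat totals: a bird playable in several habitats is counted in each
habitat_totals <- colSums(toy_birds)

# Label each bird with the exact combination of habitats it can occupy
habitats <- c("forest", "grassland", "wetland")
toy_birds$combo <- apply(toy_birds[habitats], 1, function(row) {
  paste(habitats[row], collapse = " + ")
})

# Number of birds in each combination
combo_counts <- table(toy_birds$combo)

print(habitat_totals)
print(combo_counts)
```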

## What is the most common power?

Flocking (tucking cards) is the most common power category apart from ‘Other’, which tends to include drawing more bonus cards or moving the bird to another habitat.

There are only 6 birds without powers, which are all high VP birds. I was surprised that cards with egg laying, card drawing, or food from the supply powers account for only 11% of the cards each.

## Predicting victory points

I expect victory points to correlate with egg capacity, food cost, activation power, and habitat. Fitting a model to predict the number of victory points allows us to see which cards have a good bang for buck.

Or, what I actually expect is that cards with fewer victory points than expected have strong activation powers to compensate. However, I am making an assumption here that there has been a lot of play testing and that the cards have been adjusted to be balanced.

### Data setup

A couple of things to note regarding the data setup:

- I filtered the birds to those with a single cost, i.e., excluding those where you can pay either a grub or a seed.
- The birds without a power category were encoded as ‘No power’ rather than left as NA, which would have dropped them from the model.

### Fitting the model

I’ve fit a linear model (a Gaussian GLM, fitted with `lm()`) with victory points as the response and the food cost, egg capacity, habitat, and power category as predictors. I’ve removed the intercept from the model formula because it makes interpreting the coefficients easier.

```r
library(wingspan)
library(tidyverse)

# Core-set birds with a single (non-"or") food cost
df <- birds |>
  rename(vp = victory_points) |>
  filter(
    set == "core",
    !food_cost_div
  ) |>
  # Keep birds without a power in the model under their own label
  mutate(power_category = replace_na(power_category, "No power")) |>
  # Habitat flags converted to numeric so they enter the model as 0/1 predictors
  mutate_at(c("forest", "grassland", "wetland"), as.numeric)

# Linear model for victory points; `- 1` drops the intercept
mod <- lm(
  vp ~ egg_capacity + invertebrate + seed + fruit + fish + rodent + any_food +
    forest + grassland + wetland + power_category - 1,
  data = df
)

summary(mod)
```

```
Call:
lm(formula = vp ~ egg_capacity + invertebrate + seed + fruit +
    fish + rodent + any_food + forest + grassland + wetland +
    power_category - 1, data = df)

Residuals:
     Min       1Q   Median       3Q      Max
-2.04909 -0.54122 -0.03641  0.43533  2.15785

Coefficients:
                                   Estimate Std. Error t value Pr(>|t|)
egg_capacity                       -0.42837    0.06754  -6.342 4.15e-09 ***
invertebrate                        1.57877    0.14491  10.895  < 2e-16 ***
seed                                1.53325    0.13819  11.096  < 2e-16 ***
fruit                               1.89461    0.17910  10.578  < 2e-16 ***
fish                                1.94935    0.19816   9.837  < 2e-16 ***
rodent                              1.88318    0.18670  10.086  < 2e-16 ***
any_food                            1.26317    0.17042   7.412 1.92e-11 ***
forest                             -0.56666    0.18740  -3.024  0.00305 **
grassland                          -0.39201    0.17402  -2.253  0.02610 *
wetland                             0.03936    0.18564   0.212  0.83243
power_categoryNo power              5.94229    0.54650  10.873  < 2e-16 ***
power_categoryCaching Food          2.09541    0.51713   4.052 9.05e-05 ***
power_categoryEgg-laying            1.64914    0.41463   3.977  0.00012 ***
power_categoryCard-drawing          2.80260    0.40526   6.915 2.43e-10 ***
power_categoryFlocking              1.83560    0.38535   4.763 5.38e-06 ***
power_categoryFood from Supply      2.96775    0.40531   7.322 3.05e-11 ***
power_categoryHunting/Fishing       3.10179    0.42053   7.376 2.31e-11 ***
power_categoryFood from Birdfeeder  3.09215    0.41705   7.414 1.90e-11 ***
power_categoryOther                 2.15066    0.40827   5.268 6.16e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8583 on 120 degrees of freedom
Multiple R-squared:  0.973,	Adjusted R-squared:  0.9687
F-statistic: 227.6 on 19 and 120 DF,  p-value: < 2.2e-16
```
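To see why dropping the intercept helps with interpretation, here is a small sketch on made-up data (the `toy` data frame and its values are hypothetical). With an intercept, R reports one factor level as the baseline and the others as differences from it. Without it, every level gets its own coefficient.

```r
# Toy data: victory points for two power categories (values are made up)
toy <- data.frame(
  vp    = c(2, 3, 4, 6, 7, 8),
  power = factor(c("Flocking", "Flocking", "Flocking",
                   "No power", "No power", "No power"))
)

# With an intercept: baseline level plus an offset for the other level
with_intercept <- lm(vp ~ power, data = toy)

# Without an intercept: one coefficient per level (here, the group means)
without_intercept <- lm(vp ~ power - 1, data = toy)

print(coef(with_intercept))
print(coef(without_intercept))
```

One side effect worth knowing: without an intercept, R computes R-squared against zero rather than against the mean of the response, so the 0.973 in the summary is not directly comparable to an R-squared from a model that includes an intercept.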

This is pretty neat; almost all are significant predictors of victory points. The takeaways:

- The higher the egg capacity, the fewer victory points. Makes sense since each egg on the card counts for a VP.
- The higher the cost, the more VPs. Makes sense.
- Fruit, fish, and rats account for more VPs than grubs and seeds. This makes sense since fewer birds require fruit, fish, or rats.
- Forest and grassland birds account for fewer VPs, but there’s no difference for wetland birds. Nice!
- Birds with no powers have far more VPs (5.9) than the other birds. Makes sense if they have no other VP generating potential.
- Birds with egg laying powers contribute the fewest VPs (1.6) since the power contributes high VP potential by laying an egg with each activation.
- Birds with flocking powers contribute the second fewest VPs (1.8) given their VP generating potential.
- Birds with food caching powers contribute 2.1 VPs.
- Birds with card drawing, food from the supply, hunting, or food from the birdfeeder powers contribute ~3 VPs. Their powers don’t directly generate VPs but allow you to play birds sooner.
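Because there is no intercept, an estimate is simply the sum of the relevant coefficients. As a worked example, take a hypothetical bird (not a real card) with an egg capacity of 2, a cost of one grub and one seed, forest habitat only, and a flocking power. The coefficients are copied from the summary output. The shortened name `flocking` is introduced here for illustration.

```r
# Coefficients copied from the model summary in the post
coefs <- c(
  egg_capacity = -0.42837,
  invertebrate =  1.57877,
  seed         =  1.53325,
  forest       = -0.56666,
  flocking     =  1.83560   # power_categoryFlocking in the summary
)

# Hypothetical bird: 2 egg slots, costs 1 grub + 1 seed, forest only, flocking
bird <- c(egg_capacity = 2, invertebrate = 1, seed = 1, forest = 1, flocking = 1)

# Estimated VPs = sum of coefficient * value over the non-zero predictors
est_vp <- sum(coefs * bird[names(coefs)])

print(round(est_vp, 2))  # 3.52
```

A card like this printed with 2 VPs would sit below the line, and one printed with 5 VPs would sit above it.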

I also fit the power colour into the model, but it wasn’t a significant predictor. That surprised me since I would expect brown and pink powers to have fewer VPs than white. You can see where they place in the residual plot below.

### Residual plot

By plotting the victory points against the residuals, we can see if the number of victory points is higher or lower than expected. Those above the line have more VPs than expected, and those below the line have fewer VPs than expected. For birds with an ‘or’ condition in their food cost, I used the grub option to score them.

By inspection, the birds with weaker powers, or powers that activate for all players, are typically above the line. They need more VPs to be worth playing. Those below the line typically have some pretty sweet powers given the cost.

“The Power 4”, as they are colloquially known, are:

- Common Raven
- Chihuahuan Raven
- Franklin’s Gull
- Killdeer

All four are well below the line, which is some evidence to support my theory that birds with fewer VPs than expected have strong activation powers. The Common Raven is a bit higher, suggesting it’s the pick of the four. This is cool because it potentially allows you to identify other strong cards you’ve overlooked.

This isn’t always the case, though. For example, the bird with 5 VPs at the bottom of the column, below the Common Raven, is the Indigo Bunting. It costs a grub, a seed, and a fruit. Its power is to gain a grub or a fruit from the birdfeeder. Not as good as discarding 1 egg to gain 2 whatevers, or even gaining a single grub from the supply. In this case, I’d say it either needs to be cheaper or have another VP, or both. Probably not worth paying the cost in my opinion.

The bird at the bottom of the 4 VP column is the Brown Pelican. It costs 2 fish; when played, you get 3 fish from the supply. That’s it. In my opinion, it needs more VPs or better activation.

The Northern Bobwhite is a great card: 5 victory points, estimated VPs of 3.3 (good bang for the buck), an egg capacity of 6, and an activation power to lay an egg on the card. It’s a great card at any stage of the game.

This analysis doesn’t dictate the clear best and worst cards, but I have found it useful to determine whether a card is a great bang for your buck or a bit expensive.

Every bird in the charts above is also in the look-up table below. ‘Est. VPs’ is the model estimated VPs, and ‘res’ is the residual (VPs – Est. VPs). It’s a great look-up table to compare birds.
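A table of this shape can be built straight from a fitted model. The sketch below uses made-up data and a deliberately tiny model (the `toy` data frame, its values, and the `est_vp` column name are all hypothetical); the real table would use the full model from the post.

```r
# Toy cards (values are made up)
toy <- data.frame(
  common_name  = c("Bird A", "Bird B", "Bird C", "Bird D", "Bird E", "Bird F"),
  vp           = c(2, 5, 3, 6, 4, 8),
  egg_capacity = c(5, 3, 4, 2, 4, 1),
  seed         = c(1, 2, 1, 3, 1, 3),
  stringsAsFactors = FALSE
)

# Small stand-in model; the post's model has many more predictors
mod <- lm(vp ~ egg_capacity + seed, data = toy)

# Look-up table: printed VPs, model estimate, and residual (VPs - Est. VPs)
lookup <- data.frame(
  common_name = toy$common_name,
  vp          = toy$vp,
  est_vp      = round(unname(fitted(mod)), 1),
  res         = round(unname(residuals(mod)), 1),
  stringsAsFactors = FALSE
)

# Most negative residuals first: fewer printed VPs than the model expects
lookup <- lookup[order(lookup$res), ]

print(lookup, row.names = FALSE)
```

Sorting by `res` puts the cards that are cheapest for their printed VPs at the top, which is where the post looks for strong powers.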

Follow the link to view the table in a new window.

## Final thoughts

A few interesting things have come out of the analysis, particularly from the model. The predicted VPs and the residual plot are useful for critically assessing each card and whether it’s worth the cost. I’ve already referred to the table way more than I expected.

It would be amazing to have data on game stats, such as the final boards, which birds were played, what turn each bird was played, the final VPs, who won, etc. That would uncover some pretty cool stuff, I reckon. If you know of such a dataset, let me know!

Anyway, happy bird watching!

The post Wingspan Data Analysis appeared first on Dan Oehm | Gradient Descending.
