Visualizing the Asian Cup with R!

January 10, 2019

(This article was first published on R by R(yo), and kindly contributed to R-bloggers)

Another year, another big soccer/football tournament! This time it’s the
top international competition in Asia, the Asian Cup hosted in the
U.A.E. In this blog post I’ll be covering (responsible) web-scraping, data wrangling
(tidyverse FTW!), and of course, data visualization with ggplot2.

Let’s get started!

Packages

pacman::p_load(tidyverse, scales, lubridate, ggrepel, stringi, magick, 
               glue, extrafont, rvest, ggtextures, cowplot, ggimage, polite)
# Roboto Condensed font (from hrbrthemes)
loadfonts()

Top Goalscorers of the Asian Cup

The first thing I looked at was, “Who are the top goalscorers in the
history of the Asian Cup?”

Here I use the polite package to
take a look at the robots.txt for the web page and see if it is OK to
web scrape from it. First you pass the URL to the bow() function, check that you are
indeed allowed to scrape, then use scrape() to retrieve data, and the
rest is the usual rvest web-scraping workflow.

topg_url <- "https://en.wikipedia.org/wiki/AFC_Asian_Cup_records_and_statistics"

session <- bow(topg_url)

ac_top_scorers <- scrape(session) %>%
  html_nodes("table.wikitable:nth-child(29)") %>% 
  html_table() %>% 
  flatten_df() %>% 
  select(-Ref.) %>% 
  set_names(c("total_goals", "player", "country"))

For brevity, let’s only take a look at the top 5 goal scorers. I’ll also
mutate() in a nice image of a soccer ball for the data points on the
plot.

ac_top_scorers <- ac_top_scorers %>% 
  head(5) %>% 
  mutate(image = "https://www.emoji.co.uk/files/microsoft-emojis/activity-windows10/8356-soccer-ball.png")

I made something slightly different to your standard bar graph as I
use the geom_isotype_col() function from ggtextures to create a bar
of soccer ball images. Compared to other functions in ggtextures,
geom_isotype_col() allows each image to correspond to the value of the
variable you are plotting, in this case 1 ball = 1 goal!

ac_top_graph <- ac_top_scorers %>% 
  ggplot(aes(x = reorder(player, total_goals), y = total_goals,
             image = image)) +
  geom_isotype_col(img_width = grid::unit(1, "native"), img_height = NULL,
    ncol = NA, nrow = 1, hjust = 0, vjust = 0.5) +
  coord_flip() +
  scale_y_continuous(breaks = c(0, 2, 4, 6, 8, 10, 12, 14),
                     expand = c(0, 0), 
                     limits = c(0, 15)) +
  ggthemes::theme_solarized() +
  labs(title = "Top Scorers of the Asian Cup",
       subtitle = "Most goals in a single tournament: 8 (Ali Daei, 1996)",
       y = "Number of Goals", x = NULL,
       caption = glue("
                      Source: Wikipedia
                      By @R_by_Ryo")) +
  theme(text = element_text(family = "Roboto Condensed"),
        plot.title = element_text(size = 22),
        plot.subtitle = element_text(size = 14),
        axis.text = element_text(size = 14),
        axis.title.x = element_text(size = 16),
        axis.line.y = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_blank(),
        axis.ticks.y = element_blank())

ac_top_graph

OK, not bad. However, wouldn’t it be nice to add a bit more context, specifically
which country each of these players came from? So let’s add some flags along the y-axis!

There are lots of different ways to do this (like geom_flag() from the
ggimage package) but I ended up doing it the cowplot way. I had to
tweak the scales a bit as the flags came in different sizes. When you
plot, you just insert the image strip into the bar plot with
axis_canvas() and combine all the parts together with ggdraw()!

axis_image <- axis_canvas(ac_top_graph, axis = 'y') + 
  draw_image("https://upload.wikimedia.org/wikipedia/commons/c/ca/Flag_of_Iran.svg", 
             y = 13, scale = 1.5) +
  draw_image("https://upload.wikimedia.org/wikipedia/commons/0/09/Flag_of_South_Korea.svg", 
             y = 10, scale = 1.7) +
  draw_image("https://upload.wikimedia.org/wikipedia/en/9/9e/Flag_of_Japan.svg", 
             y = 7, scale = 1.7) +
  draw_image("https://upload.wikimedia.org/wikipedia/commons/f/f6/Flag_of_Iraq.svg", 
             y = 4, scale = 1.6) +
  draw_image("https://upload.wikimedia.org/wikipedia/commons/a/aa/Flag_of_Kuwait.svg", 
             y = 1, scale = 1.2)

ggdraw(insert_yaxis_grob(ac_top_graph, axis_image, position = "left"))

Ideally I wanted the soccer balls to be the official balls from the
tournament that the player scored in. However, I couldn’t find a nice
emoji-fied/icon-ized version and there was also the “small” problem in
that there was no “official” Asian Cup ball until the 2004 tournament in
China! You can take a look at the official Asian Cup balls
here.

Winners of the Asian Cup

We saw that the top goal scorers came from Iran, South Korea, Japan,
Iraq, and Kuwait but did their goal scoring exploits lead their nations
to glory? Let’s find out!

When web-scraping I really like using flatten_df() after
html_table() as I don’t have to use the awkward looking .[[1]]
within my piped workflow.

acup_url <- "https://en.wikipedia.org/wiki/AFC_Asian_Cup"

session <- bow(acup_url)

acup_winners_raw <- scrape(session) %>% 
  html_nodes("table:nth-child(31)") %>% 
  html_table() %>% 
  flatten_df()
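
To see what flatten_df() saves me, here is a small sketch on an inline HTML table. The table is typed in by hand as a stand-in for a scraped page, so nothing here touches Wikipedia, and it only shows that both routes end up with the same data.

```r
library(rvest)
library(purrr)

# Hand-written inline table standing in for a scraped page (illustration only)
page <- read_html("
  <table class='wikitable'>
    <tr><th>Goals</th><th>Player</th></tr>
    <tr><td>14</td><td>Ali Daei</td></tr>
    <tr><td>10</td><td>Lee Dong-gook</td></tr>
  </table>")

# html_table() on a node set returns a list of tables, one per matched node
tables <- page %>% 
  html_nodes("table.wikitable") %>% 
  html_table()

# option 1: pull out the first element by position
by_index <- tables %>% .[[1]]

# option 2: flatten the one-element list into a single data frame
by_flatten <- tables %>% flatten_df()
```

Both by_index and by_flatten hold the same two rows. Note that flatten_df() only makes sense here because the selector matched exactly one table.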

Now I can use the clean_names() function to quickly clean up my column names
(mainly when I can’t be bothered to set_names() them myself…).

The next step is to split each cell into the number of times a team
finished in that position and the years in which it did so, using separate().
Then variants of mutate() tidy the string columns and convert them to numeric.
I use gather() so that each team has a row for each of the rank positions
(1st-3rd). Finally, I arrange the data so that the facets are ordered the way
I want.

acup_winners_clean <- acup_winners_raw %>% 
  janitor::clean_names() %>% 
  slice(1:8) %>% 
  select(-fourth_place, -semi_finalists, -total_top_four) %>% 
  separate(winners, into = c("Champions", "first_place_year"), 
           sep = " ", extra = "merge") %>% 
  separate(runners_up, into = c("Runners-up", "second_place_year"), 
           sep = " ", extra = "merge") %>% 
  separate(third_place, into = c("Third Place", "third_place_year"), 
           sep = " ", extra = "merge") %>% 
  mutate_all(funs(str_replace_all(., "–", "0"))) %>% 
  mutate_at(vars(contains("num")), funs(as.numeric)) %>% 
  mutate(team = if_else(team == "Israel1", "Israel", team)) %>% 
  gather(key = "key", value = "value", -team, 
         -first_place_year, -second_place_year, -third_place_year) %>% 
  mutate(key = key %>% 
           fct_relevel(c("Champions", "Runners-up", "Third Place"))) %>% 
  arrange(key, value) %>% 
  mutate(team = as_factor(team),
         order = row_number())
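
Here is the separate() and gather() idea in isolation, as a rough sketch. The two rows are typed in by hand to mimic the "count (years)" cells of the scraped table, and I use across() plus fill = "right" here, which are my own simplifications rather than what the code above does.

```r
library(dplyr)
library(tidyr)

# Toy rows mimicking the scraped "count (years)" cells (typed in by hand)
toy <- tibble(
  team       = c("Japan", "Saudi Arabia"),
  winners    = c("4 (1992, 2000, 2004, 2011)", "3 (1984, 1988, 1996)"),
  runners_up = c("–", "3 (1992, 2000, 2007)")
)

toy_long <- toy %>% 
  # split at the first space only; extra = "merge" keeps the years together
  separate(winners, into = c("Champions", "first_place_year"),
           sep = " ", extra = "merge") %>% 
  # fill = "right" quietly pads the year column with NA when there is only a dash
  separate(runners_up, into = c("Runners-up", "second_place_year"),
           sep = " ", extra = "merge", fill = "right") %>% 
  # the dash means "never", so turn it into a zero before converting
  mutate(across(c(Champions, `Runners-up`),
                ~ as.numeric(gsub("–", "0", .x, fixed = TRUE)))) %>% 
  # one row per team and finishing position
  gather(key = "key", value = "value", Champions, `Runners-up`)
```

toy_long ends up with four rows: one per team for "Champions" and one per team for "Runners-up", with the counts in the value column.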

I plot using facets on the “key” variable (containing the rank data) so
that we can see how many times each team placed as Champions to Third
Place. I also use the glue() function here to format the multi-line
captions and titles in a neat way.

acup_winners_clean %>% 
  ggplot(aes(value, team, color = key)) +
  geom_point(size = 5) +
  scale_color_manual(values = c("Champions" = "#FFCC33",
                                "Runners-up" = "#999999",
                                "Third Place" = "#CC6600"),
                     guide = FALSE) +
  labs(x = "Number of Occurrences",
       title = "Winners & Losers of the Asian Cup!",
       subtitle = glue("
                       Ordered by number of Asian Cup(s) won.
                       Four-time Champions, Japan, only won their first in 1992!"),
       caption = glue("
                      Note: Israel was expelled by the AFC in 1974 while Australia joined the AFC in 2006.
                      Source: Wikipedia
                      By @R_by_Ryo")) +
  facet_wrap(~key) +
  theme_minimal() +
  theme(text = element_text(family = "Roboto Condensed"),
        title = element_text(size = 18),
        plot.subtitle = element_text(size = 12),
        axis.title.y = element_blank(),
        axis.title.x = element_text(size = 12),
        axis.text.y = element_text(size = 14),
        axis.text.x = element_text(size = 12),
        plot.caption = element_text(hjust = 0, size = 10),
        panel.border = element_rect(fill = NA, colour = "grey20"),
        panel.grid.minor.x = element_blank(),
        strip.text = element_text(size = 16)) 

Goals per Game

One new thing I learned very recently, while working on this viz in
fact, was using magrittr aliases! In this workflow I always wind up having to use .[x] or
.[[x]] but now I can just use extract() or extract2() respectively
to do the same thing!

wiki_url <- "https://en.wikipedia.org"
session <- bow(wiki_url)
acup_url <- "https://en.wikipedia.org/wiki/AFC_Asian_Cup"
session_cup <- bow(acup_url)

cup_links <- scrape(session_cup) %>% 
  html_nodes("br+ i a") %>% 
  html_attr("href") %>% 
  magrittr::extract(-17:-18)

acup_df <- cup_links %>% 
  as_data_frame() %>% 
  mutate(cup = str_remove(value, "\\/wiki\\/") %>% str_replace_all("_", " ")) %>% 
  rename(link = value)
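
A tiny sketch of the two aliases on made-up objects, just to show which bracket each one stands for:

```r
library(magrittr)

# Made-up vector of links (illustration only)
links <- c("/wiki/1956_AFC_Asian_Cup", "/wiki/1960_AFC_Asian_Cup",
           "/wiki/2019_AFC_Asian_Cup")

# extract() is `[`: negative indices drop elements, just like links[-3]
played <- links %>% extract(-3)

# extract2() is `[[`: pull a single element out of a list
tables <- list(scorers = data.frame(goals = c(14, 10)), other = "ignored")
scorers <- tables %>% extract2("scorers")
```

One thing to watch out for: tidyr also exports a function called extract(), so with the tidyverse loaded it is safer to write magrittr::extract() explicitly, as I did above.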

Another cool thing I found while scraping this data was the jump_to()
function that allows you to navigate to a new URL. This makes
map()-ing over multiple URL links from a base URL very easy! Here, the
base URL is the AFC Asian Cup Wikipedia page and the function iterates
over each of the URL links of the respective tournament pages.
Another way that I could’ve done this was to map() over the different
years of the tournaments, as the Wikipedia URLs for each edition of the
Asian Cup differ only in the year at the start of the page name.
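
That year-based alternative could look roughly like the sketch below. It assumes every edition's page follows the same "<year>_AFC_Asian_Cup" naming, which I haven't checked for each page, and the names cup_years and year_links are just placeholders.

```r
# Tournament years up to 2015; the four-year cycle shifted by one year in 2007
cup_years <- c(seq(1956, 2004, by = 4), 2007, 2011, 2015)

# Same shape as the hrefs scraped into `cup_links`
year_links <- paste0("/wiki/", cup_years, "_AFC_Asian_Cup")

head(year_links, 3)
```

The functions below take the link-based route instead.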

goals_info <- function(x) {
  goal_info <- scrape(session) %>% 
    jump_to(x) %>% 
    html_nodes(".vcalendar") %>% 
    html_table(header = FALSE) %>% 
    flatten_df() %>% 
    spread(key = X1, value = X2) %>% 
    select(`Goals scored`) %>% 
    mutate(`Goals scored` = str_remove_all(`Goals scored`, pattern = ".*\\(") %>% 
             str_extract_all("\\d+\\.*\\d*") %>% as.numeric)
}

team_num_info <- function(x) {
  team_num_info <- scrape(session) %>% 
    jump_to(x) %>% 
    html_nodes(".vcalendar") %>% 
    html_table(header = FALSE) %>% 
    flatten_df() %>% 
    spread(key = X1, value = X2) %>% 
    select(`Teams`) %>% 
    mutate(`Teams` = as.numeric(`Teams`))
}

match_num_info <- function(x) {
  match_num_info <- scrape(session) %>% 
    jump_to(x) %>% 
    html_nodes(".vcalendar") %>% 
    html_table(header = FALSE) %>% 
    flatten_df() %>% 
    spread(key = X1, value = X2) %>% 
    janitor::clean_names() %>% 
    select(matches_played) %>% 
    mutate(matches_played = as.numeric(matches_played))
}

# all together:
goals_data <- acup_df %>% 
  mutate(goals_per_game = map(acup_df$link, goals_info) %>% unlist,
         team_num = map(acup_df$link, team_num_info) %>% unlist,
         match_num = map(acup_df$link, match_num_info) %>% unlist)
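
The regex inside goals_info() is easier to follow on a single string. This is only a sketch: the example cell is my guess at the shape of the infobox text, and I use str_extract() instead of str_extract_all() so the result is a plain vector.

```r
library(stringr)

# Assumed shape of the "Goals scored" infobox cell (made-up numbers)
goals_cell <- "130 (2.55 per match)"

goals_per_game <- goals_cell %>% 
  str_remove_all(pattern = ".*\\(") %>%   # drop everything up to the "("
  str_extract("\\d+\\.*\\d*") %>%         # first number that remains
  as.numeric()

goals_per_game
```

The greedy ".*\\(" removes the total goal count together with the bracket, so the first number left over is the per-match average.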

Next, I clean it up a bit and add in the number of teams that participated
in each tournament.

ac_goals_df <- goals_data %>% 
  mutate(label = cup %>% str_extract("[0-9]+") %>% str_replace("..", "'"),
         team_num = case_when(
           is.na(team_num) ~ 16,
           TRUE ~ team_num
         )) %>% 
  arrange(cup) %>% 
  mutate(label = factor(label, label),
         team_num = c(4, 4, 4, 5, 6, 6, 10, 10, 10, 8, 12, 12, 16, 16, 16, 16))

glimpse(ac_goals_df)
## Observations: 16
## Variables: 6
## $ link            <chr> "/wiki/1956_AFC_Asian_Cup", "/wiki/1960_AFC_Asi...
## $ cup             <chr> "1956 AFC Asian Cup", "1960 AFC Asian Cup", "19...
## $ goals_per_game  <dbl> 4.50, 3.17, 2.17, 3.20, 2.92, 2.50, 3.17, 1.83,...
## $ team_num        <dbl> 4, 4, 4, 5, 6, 6, 10, 10, 10, 8, 12, 12, 16, 16...
## $ match_num       <dbl> 6, 6, 6, 10, 13, 10, 24, 24, 24, 16, 26, 26, 32...
## $ label           <fct> '56, '60, '64, '68, '72, '76, '80, '84, '88, '9...

Now we make a line graph but with lots of annotate() code to add in
comments, labels, and segments for the labels. At the end I use
geom_emoji() to add a soccer ball to the plot for each of the data
points.

plot <- ac_goals_df %>% 
  ggplot(aes(x = label, y = goals_per_game, group = 1)) +
  geom_line() +
  scale_y_continuous(limits = c(NA, 5.35),
                     breaks = c(1.5, 2, 2.5, 3, 3.5, 4, 4.5)) +
  labs(x = "Tournament (Year)", y = "Goals per Game") +
  theme_minimal() +
  theme(text = element_text(family = "Roboto Condensed"),
        axis.title = element_text(size = 12),
        axis.text = element_text(size = 12)) +
  annotate(geom = "label", x = "'56", y = 5.15, family = "Roboto Condensed",
           color = "black", 
           label = "Total Number of Games Played:", hjust = 0) +
  annotate(geom = "text", x = "'60", y = 4.9, 
           label = "6", family = "Roboto Condensed") +
  annotate(geom = "segment", x = 1, xend = 3, y = 4.8, yend = 4.8) +
  annotate(geom = "text", x = "'68", y = 4.9, 
           label = "10", family = "Roboto Condensed") +
  annotate(geom = "segment", x = 3.8, xend = 4.2, y = 4.8, yend = 4.8) +
  annotate(geom = "text", x = "'72", y = 4.9, 
           label = "13", family = "Roboto Condensed") +
  annotate(geom = "segment", x = 4.8, xend = 5.2, y = 4.8, yend = 4.8) +
  annotate(geom = "text", x = "'76", y = 4.9, 
           label = "10", family = "Roboto Condensed") +
  annotate(geom = "segment", x = 5.8, xend = 6.2, y = 4.8, yend = 4.8) +
  annotate(geom = "text", x = "'84", y = 4.9, 
           label = "24", family = "Roboto Condensed") +
  annotate(geom = "segment", x = 7, xend = 9, y = 4.8, yend = 4.8) +
  annotate(geom = "text", x = "'92", y = 4.9, 
           label = "16", family = "Roboto Condensed") +
  annotate(geom = "segment", x = 9.8, xend = 10.2, y = 4.8, yend = 4.8) +
  annotate(geom = "text", x = 11.5, y = 4.9, 
           label = "26", family = "Roboto Condensed") +
  annotate(geom = "segment", x = 11, xend = 12, y = 4.8, yend = 4.8) +
  annotate(geom = "text", x = 14.5, y = 4.9, 
           label = "32", family = "Roboto Condensed") +
  annotate(geom = "segment", x = 13, xend = 16, y = 4.8, yend = 4.8) +
  annotate(geom = "text", x = 9, y = 4, family = "Roboto Condensed",
           label = glue("
                        Incredibly low amount of goals in Group B
                        (15 in 10 Games) and in Knock-Out Stages
                        (4 goals in 4, only 1 scored in normal time)")) +
  annotate(geom = "segment", x = 9, xend = 9, y = 1.65, yend = 3.75,
           color = "red") +
  ggimage::geom_emoji(aes(image = '26bd'), size = 0.03) 

plot

ggsave(filename = glue("{here::here('Asian Cup 2019')}/gpg_plot_final.png"), 
       width = 8, height = 7, dpi = 300)
plot <- image_read(glue("{here::here('Asian Cup 2019')}/gpg_plot_final.png"))

However, I’m not finished yet! I wanted to try to make this look a bit
more “official” so I attempted to add the Asian Cup logo in the top
right corner. There are probably other ways to do this,
especially by using grobs, but I was reminded of
a blog post by Daniel Hadley, who used the magick package
to add a footer with a logo onto a ggplot object. I’ve used magick
before for animations and this was a good chance to try it out for image
editing. Unlike Daniel Hadley’s example, I needed the logo
in the right corner, so I had to create a blank canvas with image_blank()
and then place everything on top of that with image_composite() and image_append().

logo_raw <- image_read("https://upload.wikimedia.org/wikipedia/en/a/ad/2019_afc_asian_cup_logo.png")

logo_proc <- logo_raw %>% image_scale("600")

# create blank canvas
a <- image_blank(width = 1000, height = 100, color = "white")
# combine with logo image and shift logo to the right
b <- image_composite(image_scale(a, "x100"), image_scale(logo_proc, "x75"), 
                     offset = "+880+25")
# add in the title text
logo_header <- b %>% 
  image_annotate(text = glue("Goals per Game Throughout the History of the Asian Cup"),
                 color = "black", size = 24, font = "Roboto Condensed",
                 location = "+63+50", gravity = "northwest")

# combine it all together! 
final2_plot <- image_append(image_scale(c(logo_header, plot), "1000"), stack = TRUE)

# image_write(final2_plot,
#             glue("{here::here('Asian Cup 2019')}/gpg_plot_final.png"))

final2_plot
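
If you want to play with the layout without downloading anything, here is a rough offline sketch of the same composite-then-append idea. The "plot" and "logo" are just coloured blanks I made up as stand-ins, and the sizes are arbitrary.

```r
library(magick)

# Stand-ins so the sketch runs offline: a grey "plot" and a red "logo"
plot_img <- image_blank(width = 1000, height = 600, color = "#E5E5E5")
logo_img <- image_blank(width = 120, height = 75, color = "red")

# white header strip the same width as the plot
canvas <- image_blank(width = 1000, height = 100, color = "white")

# offset = "+x+y" counts pixels from the top-left corner of the canvas
header <- image_composite(canvas, logo_img, offset = "+880+25")

# stack = TRUE puts the header above the plot instead of beside it
final_img <- image_append(c(header, plot_img), stack = TRUE)

image_info(final_img)[, c("width", "height")]
```

Because the header and the plot share the same width, stacking them gives one image whose height is the sum of the two.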

All in all it took a while to tweak the positions of the text and logo
image but for my first try it worked well. There is definitely room for
improvement with regard to sizing and scaling though.

Ultimately, I couldn’t find much information on why those tournaments in
the 80s in particular were such low scoring affairs. I wasn’t alive to
watch those games on TV nor could I find any illuminating articles or
blog posts on the style of Asian football back then… This was also
before Japan really got into soccer so there wasn’t anything I could
find in Japanese either.

Japan’s Record vs. Historical Rivals and Group F Opponents

Japan is the most successful team in the competition with 4
championships but who are their opponents in the group stages and how
have they fared against them in the past? While I’m at it I will also check Japan’s
records against long-time continental rivals such as Iran, South Korea,
Saudi Arabia and more recently, Australia.

The data I’m going to use comes from Kaggle,
which has all international football results from 1872 to the World Cup
final last year. To add in the federation affiliation (UEFA, AFC, etc.)
for each of the countries I slightly modified some code from one of the
kernels, “A Journey Through The History of Soccer” by PH Julien.

federation_files <- Sys.glob("../data/federation_affiliations/*")

df_federations = data.frame(country = NULL, federation = NULL)
for (f in federation_files) {
    federation = basename(f)
    content = read.csv(f, header=FALSE)
    content <- cbind(content,federation=rep(federation, dim(content)[1]))
    df_federations <- rbind(df_federations, content)
}

colnames(df_federations) <- c("country", "federation")

df_federations <- df_federations %>% 
  mutate(country = as.character(country) %>% str_trim(side = "both"))

Now to load the results data and then join it with the affiliations
data.

results_raw <- read_csv("../data/results.csv")

results_japan_raw <- results_raw %>% 
  filter(home_team == "Japan" | away_team == "Japan") %>% 
  rename(venue_country = country, 
         venue_city = city) %>% 
  mutate(match_num = row_number())

# combine with federation affiliations
results_japan_home <- results_japan_raw %>% 
  left_join(df_federations, 
            by = c("home_team" = "country")) %>% 
  mutate(federation = as.character(federation)) %>% 
  rename(home_federation = federation) 

results_japan_away <- results_japan_raw %>% 
  left_join(df_federations, 
            by = c("away_team" = "country")) %>% 
  mutate(federation = as.character(federation)) %>% 
  rename(away_federation = federation)

# combine home-away
results_japan_cleaned <- results_japan_home %>% 
  full_join(results_japan_away)

Next I need to fill in the federations for teams that didn’t have a
match in the federation affiliation data set, either because they are missing
there or because they are named differently. For example, the Kaggle data set
lists South Korea as “Korea Republic”.

results_japan_cleaned <- results_japan_cleaned %>% 
  mutate(
    home_federation = case_when(
      home_team %in% c(
        "China", "Manchukuo", "Burma", "Korea Republic", "Vietnam Republic",
        "Korea DPR", "Brunei") ~ "AFC",
      home_team == "USA" ~ "Concacaf",
      home_team == "Bosnia-Herzegovina" ~ "UEFA",
      TRUE ~ home_federation),
    away_federation = case_when(
      away_team %in% c(
        "China", "Manchukuo", "Burma", "Korea Republic", "Vietnam Republic",
        "Korea DPR", "Brunei", "Taiwan") ~ "AFC",
      away_team == "USA" ~ "Concacaf",
      away_team == "Bosnia-Herzegovina" ~ "UEFA",
      TRUE ~ away_federation
    ))
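
A quick way to find which team names fail to match in the first place is an anti_join(). The sketch below uses two made-up miniature tables rather than the real data, so the names in it are only illustrative.

```r
library(dplyr)

# Made-up miniature versions of the results and affiliation tables
matches <- tibble(home_team = c("Japan", "Korea Republic", "Iran", "USA"))
federations <- tibble(country    = c("Japan", "South Korea", "Iran"),
                      federation = c("AFC", "AFC", "AFC"))

# Teams in the results data with no counterpart in the affiliation data
unmatched <- matches %>% 
  anti_join(federations, by = c("home_team" = "country")) %>% 
  distinct(home_team)

unmatched
```

Whatever shows up in unmatched is a candidate for a manual case_when() fix like the one above.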

Now that it’s nice and cleaned up I can reshape it so that the data is
set from Japan’s perspective.

results_jp_asia <- results_japan_cleaned %>% 
  # filter only for Japan games and AFC opponents
  filter(home_team == "Japan" | away_team == "Japan",
         home_federation == "AFC" & away_federation == "AFC") %>% 
  select(-contains("federation"), -contains("venue"),
         -neutral, -match_num,
         date, home_team, home_score, away_team, away_score, tournament) %>% 
  # reshape columns to Japan vs. opponent
  mutate(
    opponent = case_when(
      away_team != "Japan" ~ away_team,
      home_team != "Japan" ~ home_team),
    home_away = case_when(
      home_team == "Japan" ~ "home",
      away_team == "Japan" ~ "away"),
    japan_goals = case_when(
      home_team == "Japan" ~ home_score,
      away_team == "Japan" ~ away_score),
    opp_goals = case_when(
      home_team != "Japan" ~ home_score,
      away_team != "Japan" ~ away_score)) %>% 
  # label results from Japan's perspective
  mutate(
    result = case_when(
      japan_goals > opp_goals ~ "Win",
      japan_goals < opp_goals ~ "Loss",
      japan_goals == opp_goals ~ "Draw"),
    result = result %>% as_factor() %>% fct_relevel(c("Win", "Draw", "Loss"))) %>% 
  select(-contains("score"), -contains("team"))

With all that done we can take a look at how Japan have done against
certain opponents by using filter().

results_jp_asia %>% 
  filter(opponent == "Jordan",
         tournament == "AFC Asian Cup")
## # A tibble: 3 x 7
##   date       tournament    opponent home_away japan_goals opp_goals result
##                                       
## 1 2004-07-31 AFC Asian Cup Jordan   home                1         1 Draw  
## 2 2011-01-09 AFC Asian Cup Jordan   home                1         1 Draw  
## 3 2015-01-20 AFC Asian Cup Jordan   home                2         0 Win

Unfortunately, this data set doesn’t record extra-time or penalty shoot-out
results: Japan’s Quarter-Final meeting with Jordan in 2004 shows up as a 1-1
draw even though Japan secured a route to the semis, 4-3 on penalties!

I can create a function that’ll filter for certain opponents and
tournaments and aggregate the results. With the second argument being
..., tidyeval allows me to input any kind of filter condition for an
opponent, tournament, etc. The if else statement protects against
cases where Japan never had that type of result against an opponent and
makes sure that a column populated by 0s is created.

japan_versus <- function(data, ...) {
  # filter 
  filter_vars <- enquos(...)
  
  jp_vs <- data %>% 
    filter(!!!filter_vars) %>% 
    # count results type per opponent
    group_by(result, opponent) %>% 
    mutate(n = n()) %>% 
    ungroup() %>% 
    # sum amount of goals by Japan and opponent
    group_by(result, opponent) %>% 
    summarize(j_g = sum(japan_goals),
              o_g = sum(opp_goals),
              n = n()) %>% 
    ungroup() %>% 
    # spread results over multiple columns
    spread(result, n) %>% 
    # 1. failsafe against no type of result against an opponent
    # 2. sum up counts per opponent
    group_by(opponent) %>% 
    mutate(Win = if("Win" %in% names(.)){return(Win)} else{return(0)},
         Draw = if("Draw" %in% names(.)){return(Draw)} else{return(0)},
         Loss = if("Loss" %in% names(.)){return(Loss)} else{return(0)}) %>% 
    summarize(Win = sum(Win, na.rm = TRUE),
              Draw = sum(Draw, na.rm = TRUE),
              Loss = sum(Loss, na.rm = TRUE),
              `Goals For` = sum(j_g),
              `Goals Against` = sum(o_g))
  
  return(jp_vs)
}
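
Stripped of the aggregation steps, the ... plus enquos() pattern looks something like the sketch below. filter_results() and the toy data are hypothetical and only there to show the mechanism.

```r
library(dplyr)

# Hypothetical helper, not part of the code above
filter_results <- function(data, ...) {
  conditions <- enquos(...)       # capture the unevaluated filter expressions
  data %>% filter(!!!conditions)  # splice them into filter() as separate arguments
}

# Made-up results (illustration only)
toy_results <- tibble(
  opponent   = c("China", "China", "Oman"),
  tournament = c("Friendly", "AFC Asian Cup", "AFC Asian Cup"),
  result     = c("Win", "Draw", "Win")
)

filter_results(toy_results, opponent == "China", tournament == "AFC Asian Cup")
```

Any number of conditions can be passed, including none at all, in which case the data comes back unfiltered.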

Now let’s try it out a bit.

japan_versus(data = results_jp_asia, 
             opponent == "China")
## # A tibble: 1 x 6
##   opponent   Win  Draw  Loss `Goals For` `Goals Against`
##                           
## 1 China       14     8    10          54              45

I can put in multiple filter conditions if needed as well.

japan_versus(data = results_jp_asia,
             home_away == "home",
             opponent %in% c("Palestine", "Vietnam", "India"))
## # A tibble: 3 x 6
##   opponent    Win  Draw  Loss `Goals For` `Goals Against`
##                            
## 1 India         2     0     0          13               0
## 2 Palestine     1     0     0           4               0
## 3 Vietnam       1     0     0           1               0

As you can see Japan has never lost or drawn against India, Palestine,
or Vietnam so in the data there wouldn’t have been any rows with “Loss”
in the results column. With the function I created I was able to impute
results that didn’t exist and fill them in with 0s!
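
For what it's worth, there is another way to get the same zero-filling that leans on result being a factor. This is a different technique from the if/else guard inside japan_versus(), sketched here on made-up data: complete() from tidyr expands a factor to all of its levels, whether they were observed or not.

```r
library(dplyr)
library(tidyr)

# Made-up results where only wins were ever observed
toy <- tibble(
  opponent = c("India", "India", "Vietnam"),
  result   = factor(c("Win", "Win", "Win"), levels = c("Win", "Draw", "Loss"))
)

record <- toy %>% 
  count(opponent, result) %>% 
  # add the missing opponent/result combinations and give them a count of 0
  complete(opponent, result, fill = list(n = 0L)) %>% 
  spread(result, n)

record
```

The Draw and Loss columns exist and are filled with zeros even though neither result appears in the toy data.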

Let’s check Japan’s performance against our main rivals in the Asian
Cup. Here I make the tables look a lot nicer with kable() from knitr and
the options in the kableExtra package.

results_jp_asia %>% 
  japan_versus(opponent %in% c("Iran", "Korea Republic", "Saudi Arabia"),
               tournament == "AFC Asian Cup") %>% 
  knitr::kable(format = "html",
               caption = "Japan vs. Historic Rivals in the Asian Cup") %>% 
  kableExtra::kable_styling(full_width = FALSE) %>% 
  kableExtra::add_header_above(c(" ", "Result" = 3, "Goals" = 2))

Japan vs. Historic Rivals in the Asian Cup

                   Result            Goals
opponent          Win  Draw  Loss   Goals For  Goals Against
Iran                1     2     0           1              0
Korea Republic      0     2     1           2              4
Saudi Arabia        4     0     1          13              4

Now let’s take a look at how Japan have historically played against the
other teams in Group F of this year’s Asian Cup (in all competitions).

results_jp_asia %>% 
  japan_versus(opponent %in% c("Oman", "Uzbekistan", "Turkmenistan")) %>% 
  knitr::kable(format = "html",
               caption = "Japan's Record vs. Group F Teams") %>% 
  kableExtra::kable_styling(full_width = FALSE) %>% 
  kableExtra::add_header_above(c(" ", "Result" = 3, "Goals" = 2))

Japan’s Record vs. Group F Teams

               Result            Goals
opponent      Win  Draw  Loss   Goals For  Goals Against
Oman            8     3     0          19              4
Uzbekistan      6     3     1          28              9

We see no rows here for Turkmenistan. This is due to the fact that until
just this past week Japan had never played against them in a
friendly or competitive game!

Conclusion

In this blog post I went through a few examples of visualizing some very
basic stats on the Asian Cup happening this month. I’ll devote this last
section to my views on this edition of the Asian Cup and Japan’s national
team.

Although Japan’s first game was quite horrible I’m hoping it’ll shake
the players and coaches out of their complacency so that they don’t
underestimate our opponents in the next two games. Thankfully, South Korea
should be on the other side of the bracket for the knock-out stages, and we
would only meet Iran in the semifinals
(provided both teams finish top of their respective groups). Japan could
meet Australia in the Quarters but without Aaron Mooy they’re a much
weaker side, as shown in their abject loss to Jordan in their opening
match.

Even with losing our new star, Shoya Nakajima, to injury the fact that we
can replace him with a player of the calibre of Takashi Inui and with
Hannover regular, Genki Haraguchi, stepping up from the bench shows how
much Japanese football has progressed these past 25 years.

It’s a changing of the guard for Japan after the retirement of captain Hasebe
and Keisuke Honda but with more Japanese players headed to Europe from
a young age these are exciting times to be a Japanese football fan. It’s been
quite awe-inspiring seeing how the number of Japanese players playing for
foreign clubs has been steadily increasing since the 1988 Asian Cup squad (Japan’s first
appearance at a major tournament, minus the Olympics).

This tournament is the first hurdle for this new generation of players as they
fight to become regulars for the national team and begin the journey to the next
World Cup in 2022. Here’s hoping for another great month of football!



(Image Source: Nikkan Sports)
