Street names

[This article was first published on r.iresmi.net, and kindly contributed to R-bloggers].

Lyon, seen from the Basilica of Notre-Dame de Fourvière

Lyon – CC-BY-NC-ND by Emmanuel Fromm

Day 2 of 30DayMapChallenge: « Lines » (previously).

We’ll map the gender of the people Lyon’s streets are named after. We need a database of French first names giving the gender associated with each name, and we’ll extract Lyon’s streets from OpenStreetMap.

library(arrow)
library(dplyr)
library(tidyr)
library(readr)
library(purrr)
library(ggplot2)
library(stringr)
library(sf)
library(osmdata)
library(ggspatial)
library(glue)
library(knitr)

set.seed(42)

First names

if (!file.exists("freq_prenoms.rds")) {
  freq_prenoms <- read_parquet("https://www.insee.fr/fr/statistiques/fichier/8205621/prenoms-2023-nat.parquet") |> 
    filter(preusuel != "_PRENOMS_RARES") |> 
    mutate(preusuel = iconv(preusuel, to = "ASCII//TRANSLIT")) |> 
    group_by(preusuel, sexe) |> 
    summarise(n = sum(nombre, na.rm = TRUE),
              .groups = "drop_last") |>
    mutate(total = sum(n)) |> 
    ungroup() |> 
    mutate(sexe = case_when(sexe == 1 ~ "M",
                            sexe == 2 ~ "F",
                            .default = NA_character_)) |> 
    pivot_wider(names_from = sexe, 
                values_from = n,
                values_fill = 0) |> 
    mutate(across(c(M, F), \(x) x / total)) |> 
    write_rds("freq_prenoms.rds")
} else {
  freq_prenoms <- read_rds("freq_prenoms.rds")
}

We have 34234 first names and their gender frequencies since 1900.

Sample of first names (total: births since 1900; M, F: share of male and female births)

preusuel   total  M  F
ZENABOU       48  0  1
EMILIENE      25  0  1
KINGSLEY     878  1  0
DOLOVAN       73  1  0
ERCOLE        67  1  0
YVA          178  0  1
ISSEY         79  1  0
SAWSSEN      121  0  1
MISBAH        24  0  1
GOHANN        20  1  0
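The M and F columns are each sex’s births divided by the name’s total. A minimal sketch of that step on a tiny tibble; the names and counts below are hypothetical, made up for illustration, not INSEE figures:

```r
library(dplyr)
library(tidyr)

# Hypothetical birth counts per first name and sex (1 = male, 2 = female),
# mimicking the shape of the INSEE data after summarising by name and sex
counts <- tibble(
  preusuel = c("NAME_A", "NAME_A", "NAME_B"),
  sexe     = c(1, 2, 2),
  n        = c(90, 10, 40)
)

shares <- counts |>
  group_by(preusuel) |>
  mutate(total = sum(n)) |>                 # all births for the name
  ungroup() |>
  mutate(sexe = if_else(sexe == 1, "M", "F")) |>
  pivot_wider(names_from = sexe,
              values_from = n,
              values_fill = 0) |>           # missing sex -> 0 births
  mutate(across(c(M, F), \(x) x / total))   # counts -> shares

shares
```

NAME_A ends up with M = 0.9 and F = 0.1, while NAME_B, never given to boys, gets M = 0 and F = 1, like most rows of the sample above.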

Map data

lyon_bbox <- getbb("Lyon, France", featuretype = "city")

if (!file.exists("osm.rds")) {
  lyon <- opq(lyon_bbox) |>
    add_osm_features(features = c(
      '"highway"="motorway"',
      '"highway"="trunk"',
      '"highway"="primary"',
      '"highway"="secondary"',
      '"highway"="tertiary"',
      '"highway"="motorway_link"',
      '"highway"="trunk_link"',
      '"highway"="primary_link"',
      '"highway"="secondary_link"',
      '"highway"="tertiary_link"',
      '"highway"="motorway_junction"',
      '"highway"="unclassified"',
      '"highway"="service"',
      '"highway"="pedestrian"',
      '"highway"="living_street"',
      '"highway"="residential"')) |> 
    osmdata_sf() |> 
    pluck("osm_lines") |> 
    select(osm_id, name) |> 
    drop_na(name) |> 
    group_by(name) |> 
    summarise() |> 
    write_rds("osm.rds")
} else {
  lyon <- read_rds("osm.rds")
}
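OpenStreetMap splits a street into several ways; `group_by(name) |> summarise()` merges all the segments sharing a name into one feature, because sf’s `summarise()` method unions the geometries of each group by default. That is why some streets appear as MULTILINESTRING in the classification sample. A toy sketch with hypothetical, disjoint segments in planar coordinates (no CRS):

```r
library(sf)
library(dplyr)

# Two hypothetical segments of the same street, plus another street
segments <- st_sf(
  name = c("RUE A", "RUE A", "RUE B"),
  geometry = st_sfc(
    st_linestring(matrix(c(0, 0, 1, 0), ncol = 2, byrow = TRUE)),
    st_linestring(matrix(c(2, 0, 3, 0), ncol = 2, byrow = TRUE)),
    st_linestring(matrix(c(0, 1, 1, 1), ncol = 2, byrow = TRUE))
  )
)

# One row per name; the geometries of each group are unioned
streets <- segments |>
  group_by(name) |>
  summarise()

st_geometry_type(streets)
```

The two disjoint pieces of RUE A become a single MULTILINESTRING, while RUE B stays a LINESTRING.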

Finding first names in street names

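We use a brute-force method: for each street, we check whether its label contains, as a whole word, any name from our lists of female or male first names. We keep only first names longer than one letter and given to one gender in more than 80% of births, and we drop LA from the female list since it is also the French article.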

female <- freq_prenoms |> 
  filter(F > .8,
         str_length(preusuel) > 1,
         preusuel != "LA") |> 
  pull(preusuel)

male <- freq_prenoms |> 
  filter(M > .8, 
         str_length(preusuel) > 1) |> 
  pull(preusuel)

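# One pattern per gender, with word boundaries around the whole
# alternation: \b(NAME1|NAME2|...)\b (names escaped in case of metacharacters)
male_regex <- str_c("\\b(", str_c(str_escape(male), collapse = "|"), ")\\b")
female_regex <- str_c("\\b(", str_c(str_escape(female), collapse = "|"), ")\\b")

street_gender <- lyon |> 
  mutate(name = str_to_upper(iconv(name, to = "ASCII//TRANSLIT")),
         # first names found as whole words in the street label
         m = str_extract_all(name, male_regex),
         f = str_extract_all(name, female_regex),
         # majority vote between female and male matches
         gender = map2_chr(f, m, \(f, m) case_when(
           length(f) == 0 & length(m) == 0 ~ "not concerned",
           length(m) > length(f) ~ "male",
           length(f) > length(m) ~ "female",
           .default = "undecidable")))
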
Sample of classification

name                       geometry                       m     f       gender
COURS DE VERDUN RECAMIER   LINESTRING (4.830426 45.748…                 not concerned
IMPASSE DES ANGLAIS        LINESTRING (4.795807 45.753…                 not concerned
RUE DES PROVENCES          LINESTRING (4.79335 45.7369…                 not concerned
CHEMIN DES PEUPLIERS       LINESTRING (4.866587 45.801…                 not concerned
ALLEE DU LEVANT            LINESTRING (4.878859 45.759…                 not concerned
RUE ROPOSTE                LINESTRING (4.866353 45.760…                 not concerned
ALLEE NELLIE BLY           LINESTRING (4.84882 45.7429…         NELLIE  female
QUAI JEAN MOULIN           MULTILINESTRING ((4.837853 …   JEAN          male
LA VIEILLE ROUTE           LINESTRING (4.769782 45.720…                 not concerned
AVENUE DE CHAMPAGNE        MULTILINESTRING ((4.796801 …                 not concerned
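The word boundaries and the majority vote are easiest to see on a toy example. A minimal sketch, assuming a pattern that wraps the whole alternation of names in `\b` anchors; the two name lists and the label RUE JEAN ET JEANNE are hypothetical, the other labels come from the sample above:

```r
library(stringr)
library(purrr)
library(dplyr)

# Hypothetical, tiny name lists (the real ones come from the INSEE file)
male   <- c("JEAN", "LOUIS")
female <- c("JEANNE", "NELLIE")

# Word boundaries on both sides, so JEAN does not match inside JEANNE
male_regex   <- str_c("\\b(", str_c(male, collapse = "|"), ")\\b")
female_regex <- str_c("\\b(", str_c(female, collapse = "|"), ")\\b")

streets <- c("QUAI JEAN MOULIN", "ALLEE NELLIE BLY",
             "RUE JEAN ET JEANNE", "RUE DES PROVENCES")

m <- str_extract_all(streets, male_regex)
f <- str_extract_all(streets, female_regex)

# Majority vote between female and male matches
gender <- map2_chr(f, m, \(f_hits, m_hits) case_when(
  length(f_hits) == 0 & length(m_hits) == 0 ~ "not concerned",
  length(m_hits) > length(f_hits) ~ "male",
  length(f_hits) > length(m_hits) ~ "female",
  .default = "undecidable"
))

gender
```

RUE JEAN ET JEANNE gets one match in each list (JEAN only once, since the boundary stops it from matching the start of JEANNE), so it ends up undecidable.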

Map

street_gender |> 
  mutate(gender = factor(gender, levels = c("female", "male", "undecidable", "not concerned"))) |> 
  st_set_crs("EPSG:4326") |> 
  ggplot() +
  geom_sf(aes(color = gender), 
          linewidth = .5,
          key_glyph = "timeseries") +
  scale_color_manual(values = c("female" = "lightpink1",
                                "male" = "lightskyblue",
                                "undecidable" = "lightyellow4",
                                "not concerned" = "seashell2")) +
  annotation_scale(bar_cols =  c("darkgrey", "white"),
                   line_col = "darkgrey",
                   text_col = "darkgrey",
                   height = unit(0.1, "cm")) +
  coord_sf(xlim = lyon_bbox[c(1, 3)],
           ylim = lyon_bbox[c(2, 4)]) +
  labs(title = "Gender in Lyon street names",
       color = "",
       caption = glue("Map data © OpenStreetMap contributors
                      using INSEE Fichier des prénoms 2023
                      r.iresmi.net - {Sys.Date()}")) +
  theme_void() +
  theme(plot.background = element_rect(color = NA, 
                                       fill = "white"),
        plot.caption = element_text(size = 5,
                                    color = "darkgrey"))

[Figure: A map of gender in Lyon street names]

Possible misclassifications

Many biases make this map unreliable; it would need manual editing…

epicene first names

  • some first names can be male or female (GWEN, CAMILLE, DOMINIQUE)

classified as not concerned, but should have a gender

  • streets named after people but without their first name (RUE VILLON)
  • a title instead of a first name (RUE DE L’AMIRAL COURBET)

classified with a gender, but shouldn’t be

  • common nouns also used as first names (CHEMIN DE LA POMME), mainly girls’ names…
  • unusual first names (AUTOROUTE DU SOLEIL: Soleil seems to be a girl’s name…)

correctly classified by accident

  • the last name is also a first name (COURS BAYARD)
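Manual editing could be as simple as a lookup table of corrections applied on top of the automatic classification. A minimal sketch with dplyr’s `rows_update()`; both tibbles are hypothetical: the labels come from the examples above, but the `auto` classes and the corrections are illustrative assumptions, not results from the map:

```r
library(dplyr)

# Hypothetical automatic classification, one row per street
auto <- tibble(
  name   = c("CHEMIN DE LA POMME", "AUTOROUTE DU SOLEIL",
             "RUE DE L'AMIRAL COURBET", "QUAI JEAN MOULIN"),
  gender = c("female", "female", "not concerned", "male")
)

# Hypothetical manual corrections, only for the streets to fix
manual <- tibble(
  name   = c("CHEMIN DE LA POMME", "AUTOROUTE DU SOLEIL",
             "RUE DE L'AMIRAL COURBET"),
  gender = c("not concerned", "not concerned", "male")
)

# Replace gender for every name found in `manual`, keep the other rows
corrected <- auto |>
  rows_update(manual, by = "name")

corrected
```

Keeping the corrections in a separate table keeps the automatic step reproducible while the manual fixes accumulate.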