
Dimension reduction

[This article was first published on r.iresmi.net, and kindly contributed to R-bloggers].

Floura – Light Bloom – The Art of Hybycozo – Desert Botanical Garden – CC-BY-NC by Alan English CPA

Day 6 of 30DayMapChallenge: « Dimensions » (previously).

According to Wikipedia, Uniform manifold approximation and projection (UMAP) is a nonlinear dimensionality reduction technique. It will allow us to project many dimensions (well, only 3 in this example) onto a 2D plane.
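
As a minimal sketch of the idea, assuming the umap package is installed, the four numeric columns of R's built-in iris dataset (not part of the original post) can be projected onto two dimensions. The seed value and the object names are arbitrary choices made for this illustration.

```r
library(umap)

# Built-in iris data: 150 flowers described by 4 numeric measurements
iris_mat <- as.matrix(iris[, 1:4])

# Start from the package defaults and fix the seed so the layout is reproducible
cfg <- umap.defaults
cfg$random_state <- 42  # arbitrary seed, chosen for this sketch

# Project the 4 dimensions onto 2 (the package default for n_components)
iris_umap <- umap(iris_mat, config = cfg)

# The layout is a matrix with one row per observation
# and one column per output dimension
dim(iris_umap$layout)
```

The same pattern, a numeric table in and a two-column layout out, is what the rest of this post applies to the communes.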

library(sf)
library(umap)
library(dplyr)
library(tidyr)
library(ggplot2)
library(ggrepel)
library(glue)

options(scipen = 100)

Data

We’ll use the French communes (get the data from this post).

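# Read the commune layer and reduce each polygon to its centroid,
# then store the centroid coordinates as plain columns
com <- read_sf("~/data/adminexpress/adminexpress_cog_simpl_000_2022.gpkg",
               layer = "commune") |>
  st_centroid() |>
  mutate(x = st_coordinates(geom)[, 1],
         y = st_coordinates(geom)[, 2])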

UMAP

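The dimensions taken into account are location (x, y) and population. UMAP works on distances between rows, so these variables should normally be scaled to comparable ranges first; here the result is prettier without scaling, so the scale() step is left commented out…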

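# Start from the package defaults and fix the seed for reproducibility
umaps_params <- umap.defaults
umaps_params$random_state <- 20251106

# Run UMAP on location and population (scaling deliberately left out)
com_umap <- com |>
  st_drop_geometry() |>
  select(x, y, population) |>
  # scale() |>
  umap(config = umaps_params)

# Attach the 2D layout to the commune attributes
res <- com_umap$layout |>
  as_tibble(.name_repair = "universal") |>
  bind_cols(com) |>
  rename(UMAP1 = 1,
         UMAP2 = 2)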
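
To see why scaling matters, here is a small base R sketch on three made-up points (hypothetical values, not taken from the Adminexpress data): two neighbouring places with very different populations, and one distant place with a small population. It compares Euclidean distances, the default metric in the umap package, before and after standardizing with scale().

```r
# Toy data, invented for this sketch: coordinates loosely in metres
toy <- data.frame(
  x          = c(650000, 652000, 900000),
  y          = c(6860000, 6861000, 6500000),
  population = c(2100000, 500, 800),
  row.names  = c("big_city", "small_neighbour", "far_village")
)

# Pairwise Euclidean distances on the raw values
raw_dist <- as.matrix(dist(toy))

# Standardize each column (mean 0, standard deviation 1), then recompute
scaled <- scale(toy)
scaled_dist <- as.matrix(dist(scaled))

# Distances from the small neighbour to the two other places
raw_dist["small_neighbour", c("big_city", "far_village")]
scaled_dist["small_neighbour", c("big_city", "far_village")]
```

On the raw values the population gap dominates, so the small place ends up closer to the distant village than to the big city next door; after scaling, the order flips. In the pipeline above, uncommenting the scale() line would apply the same standardization before umap().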

Map

res |>
  ggplot(aes(UMAP1, UMAP2, color = population)) +
  geom_point() +
  geom_text_repel(data = filter(res, 
                                statut %in% c("Préfecture", 
                                              "Préfecture de région",
                                              "Capitale d'état")),
                  aes(label = nom),
                  size = 3, force = .5, force_pull = 0.5, max.overlaps = 1e6,
                  bg.colour = "#ffffffaa", bg.r = .2, alpha = .6) +
  scale_color_viridis_c(trans = "log1p", option = "H",
                        breaks = c(1000, 50000, 500000, 2000000)) +
  coord_equal() +
  labs(title = "Uniform manifold approximation and projection of French communes",
       subtitle = "by location and population",
       caption = glue("https://r.iresmi.net/ - {Sys.Date()}
                      data from IGN Adminexpress 2022")) +
  theme_minimal() +
  theme(plot.caption = element_text(size = 6, 
                                    color = "darkgrey"))
Figure 1: A UMAP representation of the French communes