I was reading Phoenician colonization from its origin to the 7th century BC (Manzano-Agugliaro et al. 2025) and thought it was an interesting dataset, but alas: it is split into four tables, hidden behind a JavaScript redirect (wtf Taylor & Francis?) and uses DMS coordinates (with typos and special characters)… So it is not easily reusable.
Let’s go build an accessible dataset.
Config
library(readr)
library(purrr)
library(dplyr)
library(stringr)
library(ggplot2)
library(forcats)
library(janitor)
library(sf)
library(rnaturalearth)
library(glue)
library(parzer)
library(leaflet)

sf_use_s2(FALSE)
knitr::knit_hooks$set(crop = knitr::hook_pdfcrop)
Data
We need to manually download the CSVs (parts 1, 2, 3 and 4) because there is an anti-scraping mechanism… Then a little cleaning and coordinate parsing with the very nice {parzer} package lets us build a spatial object with {sf}.
sources <- list(
  c_10_bce = "data_raw/T0001-10.1080_17445647.2025.2528876.csv",
  c_09_bce = "data_raw/T0002-10.1080_17445647.2025.2528876.csv",
  c_08_bce = "data_raw/T0003-10.1080_17445647.2025.2528876.csv",
  c_07_bce = "data_raw/T0004-10.1080_17445647.2025.2528876.csv"
)

phoenician <- sources |>
  imap(\(f, c) {
    read_csv(f) |>
      mutate(century_start_bce = parse_number(c))
  }) |>
  list_rbind() |>
  clean_names() |>
  mutate(lon = parse_lon(str_replace(longitude_e, "−", "-")),
         lat = parse_lat(str_replace(latitude_n, ",", "."))) |>
  st_as_sf(coords = c("lon", "lat"), crs = "EPSG:4326")
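To make the cleaning step less of a black box, here is a minimal base R sketch of what converting a DMS string to decimal degrees involves. `dms_to_decimal()` is a hypothetical helper written for this illustration, not a {parzer} function, and the sample strings are made up to mimic the kind of typos mentioned above (Unicode minus sign, decimal comma). {parzer} copes with far more formats than this.

```r
# Hypothetical helper (not from {parzer}): convert one DMS string to decimal degrees.
# Assumes at most three numeric fields (degrees, minutes, seconds) and a sign given
# either by a leading minus or by a trailing S/W hemisphere letter.
dms_to_decimal <- function(x) {
  x <- gsub("\u2212", "-", x, fixed = TRUE)  # Unicode minus sign -> ASCII hyphen
  x <- gsub(",", ".", x, fixed = TRUE)       # decimal comma -> decimal point
  fields <- regmatches(x, gregexpr("[0-9]+\\.?[0-9]*", x))[[1]]
  nums <- c(as.numeric(fields), 0, 0)[1:3]   # pad missing minutes / seconds with 0
  value <- nums[1] + nums[2] / 60 + nums[3] / 3600
  negative <- grepl("^\\s*-", x) || grepl("[SW]\\s*$", x)
  if (negative) -value else value
}

# Made-up sample strings, for illustration only
dms_to_decimal("36\u00b0 31' 45\" N")     # 36 + 31/60 + 45/3600
dms_to_decimal("\u22126\u00b0 17' 24\"")  # leading Unicode minus: west of Greenwich
dms_to_decimal("12\u00b0 30,5' E")        # decimal comma in the minutes
```

The two `gsub()` calls mirror the `str_replace()` fixes applied before `parse_lon()` and `parse_lat()`: without them, a typographic minus or a decimal comma would not be read as part of a number.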
Maps
The resulting layer, mapped on a Natural Earth background, seems good.
world <- ne_countries() |>
  st_intersection(phoenician |>
                    st_bbox() |>
                    st_as_sfc() |>
                    st_buffer(4, joinStyle = "MITRE", mitreLimit = 10))

phoenician |>
  ggplot() +
  geom_sf(data = world) +
  geom_sf(aes(color = fct_rev(as_factor(century_start_bce)))) +
  theme_void() +
  labs(title = "Phoenician colonies",
       subtitle = "10th c. BCE - 7th c. BCE",
       color = "from\n(century BCE)",
       caption = glue("data doi:10.1080/17445647.2025.2528876
                      https://r.iresmi.net/ {Sys.Date()}")) +
  theme_minimal() +
  theme(plot.caption = element_text(size = 6),
        plot.background = element_rect(fill = "white"))
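The clipping frame used for the background can be looked at in isolation. The sketch below uses two made-up points instead of the settlements layer (an assumption for illustration) and shows that buffering a bounding box with a mitre join simply grows the rectangle by the buffer distance on every side.

```r
library(sf)

sf_use_s2(FALSE)  # planar geometry, as in the Config section

# Two made-up points standing in for the settlements layer
pts <- st_as_sf(
  data.frame(lon = c(0, 10), lat = c(0, 5)),
  coords = c("lon", "lat"),
  crs = "EPSG:4326"
)

# Bounding box as a polygon, then enlarged by 4 degrees on every side.
# The mitre join keeps square corners, so the result is still a rectangle.
# sf may warn that buffering lon/lat data is approximate; fine for a map frame.
frame <- pts |>
  st_bbox() |>
  st_as_sfc() |>
  st_buffer(4, joinStyle = "MITRE", mitreLimit = 10)

st_bbox(frame)
```

With the default round join the corners would be arcs instead; the bounding box would be the same, but the clipped background would have rounded corners.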
You want more interactivity?
Using {leaflet}…
phoenician |>
  leaflet() |>
  addTiles(attribution = r"(
    <a href="https://r.iresmi.net/">r.iresmi.net</a>.
    data: Manzano-Agugliaro et al. 2025. doi:10.1080/17445647.2025.2528876;
    map: <a href="https://www.openstreetmap.org/copyright/">OpenStreetMap</a>)") |>
  addCircleMarkers(popup = ~ glue("<b>{settlement}</b><br /><br />
                                  from {century_start_bce}th c. BCE \\
                                  {if_else(!is.na(centuries_of_subsequent_permanence),
                                    paste0('<br />to ', centuries_of_subsequent_permanence), '')}"),
                   clusterOptions = markerClusterOptions())
Export
We can build a clean GeoPackage (and a CSV just in case):
phoenician |>
  st_write(
    "data/phoenician_settlements.gpkg",
    layer = "phoenician_settlements",
    layer_options = c(
      "IDENTIFIER=Phoenician colonization from its origin to the 7th century BC",
      glue("DESCRIPTION=Data from:
           Manzano-Agugliaro, F., Marín-Buzón, C., Carpintero-Lozano, S., & López-Castro, J. L. (2025). \\
           Phoenician colonization from its origin to the 7th century BC. Journal of Maps, 21(1). \\
           https://doi.org/10.1080/17445647.2025.2528876
           Available on https://doi.org/10.5281/zenodo.17141060
           Extracted on {Sys.Date()} – https://r.iresmi.net/posts/2025/phoenician")),
    delete_layer = TRUE,
    quiet = TRUE)

phoenician |>
  select(-c(latitude_n, longitude_e)) |>
  bind_cols(st_coordinates(phoenician)) |>
  rename(lon_wgs84 = X,
         lat_wgs84 = Y) |>
  st_drop_geometry() |>
  write_csv("data/phoenician_settlements.csv")
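Before publishing, it can be worth checking that a layer survives the trip to disk and back. This is a minimal sketch of such a check, assuming a throwaway layer with two made-up placeholder rows and a temporary file instead of the real export path; the layer name `demo_settlements` is also invented for the example.

```r
library(sf)

# Made-up placeholder rows, only to exercise the round trip
demo <- st_as_sf(
  data.frame(settlement = c("site_a", "site_b"),
             lon = c(-6.29, 35.2),
             lat = c(36.53, 33.27)),
  coords = c("lon", "lat"),
  crs = "EPSG:4326"
)

gpkg <- tempfile(fileext = ".gpkg")  # temporary file, not the real export path
st_write(demo, gpkg, layer = "demo_settlements", quiet = TRUE)

back <- st_read(gpkg, layer = "demo_settlements", quiet = TRUE)

# Same number of features, same CRS, same coordinates
nrow(back) == nrow(demo)
st_crs(back) == st_crs(demo)
all.equal(st_coordinates(back), st_coordinates(demo), check.attributes = FALSE)
```

The same three comparisons could be pointed at the real file by swapping in its path and layer name.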
And lastly we store them in a public repository: they are now available on Zenodo and therefore even have a DOI: doi:10.5281/zenodo.17141060
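References
Manzano-Agugliaro, F., Marín-Buzón, C., Carpintero-Lozano, S., & López-Castro, J. L. (2025). Phoenician colonization from its origin to the 7th century BC. Journal of Maps, 21(1). https://doi.org/10.1080/17445647.2025.2528876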
