French dataset: population and GPS coordinates

May 24, 2012

(This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers)

A short post today based on recent work by @3wen (Ewen Galic, graduate Student in Rennes, spending a year in Montreal). Since we were working on a detailed French dataset (per commune), we needed a dataset containing a list all communes, with population and location. GPS coordinates were extracted from Google, using the following php file, inspired by on Google geocoding api with php webpage. Population was interpolated from INSEE’s datasets, i.e. (since data are over a 35 year period, from 1975 to 2010, changes have been taken into account as carefully are possible – e.g. merges and splits of cities – based on that description). A spline model has been used for all cities (with three degrees of freedom, and null and negative interpolation became one, since we’ll be using loglinear models afterwards). Names are from that dataset, still on INSEE’s website,

A zipped file can be downloaded here, but it is also possible to use the code below (it is a 24Mo dataset). Since it was hard to find such a dataset online (different files can be found, but we found none with population and location), we have decided to upload that dataset. Please let us know if there are problems with those data…

> base=read.csv(
"", + header=TRUE)

Using that code, it is possible to locate all the communes in France (metropolitan), for instance

> library(maps)
> map("france")
> points(base$long,base$lat,cex=.1,col="red",pch=19)
> points(base$long,base$lat,cex=2*base$pop_2010/
+ max(base$pop_2010),col="blue",pch=19)

Several additional lines of code on that dataset (and also others) will be uploaded, soon.

Cette oeuvre est mise à disposition sous licence Paternité – Partage à l’Identique 3.0 non transposé.
Pour voir une copie de cette licence, visitez Date : 24 mai 2012, par Ewen GALLIC. Sources : INSEE, API Google Maps v3 et GeoHack (coordonnées GPS), propres calculs (estimation de population à partir des données INSEE).

  • reg : code region INSEE (character)
  • dep : code departement INSEE (character, corse 201 et 202 au lieu de 2A et 2B)
  • com : code commune INSEE (character)
  • article : article du nom de la commune (character)
  • com_nom : nom de la commune (character)
  • long : longitude (numeric)
  • lat : latitude (numeric)
  • pop_i : estimation de la population à la date i (ramenée à 1 si <=0), i=1975,…,2010 (numeric)

To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics - Tag - R-english. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , , , , , , , , , , ,

Comments are closed.


Mango solutions

RStudio homepage

Zero Inflated Models and Generalized Linear Mixed Models with R

Dommino data lab

Quantide: statistical consulting and training



CRC R books series

Six Sigma Online Training

Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)