French dataset: population and GPS coordinates

[This article was first published on Freakonometrics - Tag - R-english, and kindly contributed to R-bloggers.]

A short post today, based on recent work by @3wen (Ewen Gallic, a graduate student in Rennes, spending a year in Montreal). Since we were working on a detailed French dataset (per commune), we needed a dataset containing a list of all communes, with population and location. GPS coordinates were extracted from Google, using a PHP file inspired by the "Google geocoding api with php" page at http://www.andrew-kirkpatrick.com/. Population was interpolated from INSEE's datasets, i.e. http://www.insee.fr/ (since the data cover a 35-year period, from 1975 to 2010, changes such as merges and splits of cities have been taken into account as carefully as possible, based on that description). A spline model has been used for all cities (with three degrees of freedom; zero and negative interpolated values were set to one, since we'll be using log-linear models afterwards). Names are from that dataset, still on INSEE's website, http://www.insee.fr/.
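
As a rough illustration of the interpolation step described above, here is a minimal sketch for a single commune. It is only one way to fit such a model and may well differ from what was actually done: the choice of `bs()` from the splines package, the observation years and the population counts are all assumptions made up for the example.

```r
library(splines)  # ships with R

# Hypothetical observations for one small commune (made-up values)
census <- data.frame(
  year = c(1975, 1982, 1990, 1999, 2006, 2010),
  pop  = c(120, 95, 60, 30, 12, 5)
)

# Regression spline with three degrees of freedom (assumed specification)
fit <- lm(pop ~ bs(year, df = 3), data = census)

# Interpolate for every year between 1975 and 2010
years <- 1975:2010
pred  <- predict(fit, newdata = data.frame(year = years))

# Zero or negative estimates are set to one, so that log(pop) is defined
pop_hat <- ifelse(pred <= 0, 1, pred)
names(pop_hat) <- paste0("pop_", years)
```

Setting non-positive values to one keeps `log(pop_hat)` finite, which is what matters for the log-linear models mentioned above.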

A zipped file, popfr19752010.zip, can be downloaded, but it is also possible to read the data directly with the code below (it is a 24 MB dataset). Since it was hard to find such a dataset online (different files can be found, but we found none with both population and location), we have decided to upload that dataset. Please let us know if there are problems with those data.

> base=read.csv(
+ "http://freakonometrics.free.fr/popfr19752010.csv",
+ header=TRUE)
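
One point that may be worth checking after the import: the INSEE codes are documented as character, while `read.csv` converts columns that look numeric into integers by default. The sketch below uses a tiny made-up CSV (not the real file) to show the difference, assuming the codes are stored with leading zeros and that the header uses the names `reg`, `dep` and `com`.

```r
# Toy CSV standing in for the real file (made-up row)
csv_text <- "reg,dep,com,long,lat,pop_2010\n01,007,001,5.0,46.0,100\n"

# Default import: codes become integers and lose their leading zeros
default <- read.csv(text = csv_text, header = TRUE)

# Forcing the code columns to character keeps them intact
typed <- read.csv(text = csv_text, header = TRUE,
                  colClasses = c(reg = "character",
                                 dep = "character",
                                 com = "character"))

default$dep  # 7
typed$dep    # "007"
```

The same `colClasses` argument can be passed when reading from the URL above, if the codes turn out to need it.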

With that dataset, it is possible to locate all the communes of metropolitan France, for instance

> library(maps)
> map("france")
> points(base$long,base$lat,cex=.1,col="red",pch=19)
> points(base$long,base$lat,cex=2*base$pop_2010/
+ max(base$pop_2010),col="blue",pch=19)

Several additional lines of code on that dataset (and on others) will be uploaded soon.

This work is made available under the Attribution-ShareAlike 3.0 Unported license. To view a copy of this license, visit http://creativecommons.org/. Date: 24 May 2012, by Ewen GALLIC. Sources: INSEE, Google Maps API v3 and GeoHack (GPS coordinates), own calculations (population estimates based on INSEE data).
  • reg: INSEE region code (character)
  • dep: INSEE department code (character; Corsica is coded 201 and 202 instead of 2A and 2B)
  • com: INSEE commune code (character)
  • article: article of the commune name (character)
  • com_nom: name of the commune (character)
  • long: longitude (numeric)
  • lat: latitude (numeric)
  • pop_i: estimated population at date i (set to 1 if <= 0), i = 1975,...,2010 (numeric)
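
Since the population estimates are stored as one column per year, it can be convenient to stack them into a long table before modelling. The sketch below does this on a made-up data frame, assuming the columns are named `pop_1975`, ..., `pop_2010` as `pop_2010` suggests; `toy`, `long_pop` and the values are purely illustrative.

```r
# Toy data with the documented layout (made-up communes and values)
toy <- data.frame(
  com_nom  = c("A", "B"),
  pop_1975 = c(100, 2000),
  pop_2010 = c(150, 1800)
)

# Find the population columns and extract the year from their names
pop_cols <- grep("^pop_[0-9]{4}$", names(toy), value = TRUE)
pop_year <- as.integer(sub("^pop_", "", pop_cols))

# Stack the columns: one row per commune and per year
long_pop <- data.frame(
  com_nom = rep(toy$com_nom, times = length(pop_cols)),
  year    = rep(pop_year, each = nrow(toy)),
  pop     = unlist(toy[pop_cols], use.names = FALSE)
)

# Populations are at least 1 by construction, so the log is always defined
long_pop$log_pop <- log(long_pop$pop)
```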
