Crowded Cities, Paris, Hong Kong and Montréal
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Over the past years, I’ve been living in different cities, all of them being completely different, compared with the others. I have been living in Paris, which is a big city in Europe, with a large neighborhood, too (la banlieue).

Then, I’ve been living in Hong Kong, which is a larger city, in Asia.

It was crowded. I mean, it was the feeling I had, while I was living there. And more recently, I’ve been living in Montréal, in North America. Montreal is a large city. Or to be more specific, and island,

The three cities are quite different. Paris, 2.211 million unhabitants, and 105,4 km² (density 21,057 unhabitants per km²). Montréal, 1.621 million unhabitants, and three times wider 365.1 km² (density 4,441 unhabitants per km²). Hong Kong, 7.234 million unhabitants, and again three times wider 1,104 km² (density 6,553 unhabitants per km²). In Hong Kong, there are several hill where it is not possible to build anything: on a large part of the island, the density is null.
Now, if we want compare the three cities, on the same scale, we need some shapefiles of those cities. For Paris, we can find one on http://cartelec.net/,
>  library(maptools)
>  library(PBSmapping)
>  library(rgdal)
>  library(classInt)
>  library(sp)
>  library(gpclib)
>  library(splancs)
> paris=readShapeSpatial("paris-cartelec.shp")
or (more convenient) http://actualitix.com/
> paris=readShapePoly("75-Paris.shp")
For Montréal, we use http://geocoder.ca/,
> montreal=readShapePoly( "montreal_borough_borders.shp")
and finally, for Hong Kong, http://seisman.info/
> hongkong=readShapePoly("HKG_adm0.shp")
In Montreal and Hong Kong, it is slightly more complicated, since we have islands, and in Paris, a of (small) administrative subregions,
> plot(montreal,border="light green",col="light green",xlim=c(-74.1,-73.5))
> lp=hongkong@polygons[[1]]
> for(i in 1:length(slot(lp,"Polygons"))){
+ hk=slot(slot(lp,"Polygons")[[i]],"coords"  )
+ polygon(hk[,1]-114.2-73.7,hk[,2]-22.25+45.48,col="light blue",border="black",density=25)
+ }
> pr=slot(slot(paris@polygons[[1]],"Polygons")[[1]],"coords"  )
> polygon(pr[,1]-2.34-73.7,pr[,2]-48.86+45.48,col="orange",density=25)

The sizes of the three cities are quite different. The architectures are very different, too. For instance, in Paris

We can extract the information from openstreetmap,
> library(osmar) > src=osmsource_api() > loc.montreal=c(45.5258,-73.5850) > loc.paris=c(48.831327, 2.321628) > loc.hongkong=c(22.27923,114.18508)
If we focus on Paris, to get everything that is in a square with 1.2km large, centered on a specific location,
> city=loc.paris > bb=center_bbox(city[2],city[1], 1200, 1200) > ua=get_osm(bb, source = src)
such as the buildings
> bg_ids=find(ua, way(tags(k=="building"))) > bg_ids=find_down(ua, way(bg_ids)) > bg=subset(ua, ids = bg_ids) > bg_poly=as_sp(bg, "polygons") > plot(bg_poly, col = gray.colors(12)[11],border="gray")
or water ways etc. (if any)
> nat_ids=find(ua, way(tags(k=="waterway"))) > nat_ids=find_down(ua, way(nat_ids)) > nat=subset(ua, ids = nat_ids) > nat_poly=as_sp(nat, "polygons") Error: any(has_data(obj)) is not TRUE
(here, there are no water ways), or parks,
> nat_ids=find(ua, way(tags(k=="leisure"))) > nat_ids=find_down(ua, way(nat_ids)) > nat=subset(ua, ids = nat_ids) > nat_poly=as_sp(nat, "polygons") > plot(nat_poly, col = "#99dd99",add=TRUE,border="#99dd99")
and streets,
> cw_ids=find(ua, way(tags(v %in% c("residential","pedestrian"))))
> cw_ids=find_down(ua, way(cw_ids))
> cw=subset(ua, ids = cw_ids)
> cw_line=as_sp(cw, "lines")
> plot(cw_line, add = TRUE, col = gray.colors(12)[5],lwd=3)

The problem with openstreetmap is that data can be quite heterogeneous. Paris was interesting, because all the buildings are there (almost, I guess). In Hong Kong, it is still fine,

> city=loc.hongkong

But in Montréal, a lot of buildings are missing
> city=loc.montreal


So it might not be useful to use those data to visualize how dense cities can be.
To get a better understanding of densities, we need to get information about the population. The problem, when studying the density of the population, is that it is very sensitive to the granularity. For instance, in Hong Kong, it is difficult to have a small granularity. All I can find, is based on the standard districts
> hongkong=readShapePoly("HKG_adm1.shp")
> plot(hongkong)
> densityhk=read.table("hkdensite.txt")
If we plot it on a map, we get (we need to make sure that the oder of the district in the shapefile matches with the order in the wikipedia dataset)
> library(RColorBrewer) > region=c(1,10,2,11,3,4,5,12,17,6,7,8,18,15,16,13,14,9) > plotclr=brewer.pal(7,"RdYlBu")[7:1] > class=classIntervals(D$V1[region], nclr, style="fixed",dataPrecision=1, + fixedBreaks=c(0,1e4,2e4,3e4,4e4,5e4,6e4,8e4)) > colcode=findColours(class, plotclr) > par(mar=c(1,1,1,1)) > plot(hongkong,col=colcode,border=colcode) > legend(113.85,22.6,legend= names(attr(colcode,"table")), fill=attr(colcode, "palette"), cex=1, bty="n",title="Density (hab/km2)")

Unfortunately, it is rather poor. It is possible to work on much small areas, such as contituency areas : one can find the population in all those regions, but I could not find the appropriate shapefile. I guess that some parts of the new territories have a quite high density.
In Paris, on the other hand, there is a much smaller granularity. To compute the areas, we can use
>  paris=readShapePoly("paris-cartelec.shp")
>  area=lapply(paris@polygons, function(x) sapply(x@Polygons, function(y) y@area)) 
>  va=rep(NA,length(area))
>  for(i in 1:length(va)){va[i]=sum(area[[i]])}
>  basearea=data.frame(A=va,N=paris$BUREAU)
The population per are is available in a file mentioned in some old post,
>  parisp=read.csv("/home/arthur/Dropbox/paris-bv-insee-07.csv",header=TRUE,sep=";")
>  basepop=data.frame(P=parisp$POP,N=parisp$BV)
>  base=merge(basepop,basearea)
>  base$D=base$P/base$A*1e6
Now, we simply have to plot it,
> library(RColorBrewer) > plotclr <- brewer.pal(7,"RdYlBu")[7:1] > class <- classIntervals(base$D, nclr, style="fixed",dataPrecision=1, + fixedBreaks=c(0,1e4,2e4,3e4,4e4,5e4,6e4,8e4)) > colcode <- findColours(class, plotclr) > par(mar=c(1,1,1,1)) > plot(paris,col=colcode,border=colcode) > legend(656274.9, 6867308, legend=names(attr(colcode,"table")), fill=attr(colcode, "palette"), cex=1, bty="n",title="Density (hab/km2)"))

Some areas in Paris have a very high density. But it is not realistic to compare it with Hong Kong, with such a different granularity. If anyone can help me to find the shapefile of contituency areas, that would probabily be much more informative (at least to take into account properly those regions where no one can live).
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
