# A brief script on Geographical data analysis in R

(This article was first published on jkunst.com: Entries for category R, and kindly contributed to R-bloggers)

I saw this post and I decided to replicated that good example but with data closer to me, particulary data of my country. So, I've got the shape data of the capital of my country (You can download the data from here). The data comes from the 2002 CENSO… some old but very useful for this propose. The objective of this post is learn how to read, treat and plot this type of data.

The packages what we'll need are:

library(maptools)
library(ggplot2)
library(plyr)


We need the data! Note that some shape file have data.

download.file('http://jkunst.com/media/data/SCL_Shape.zip',
'SCL_Shape.zip') unzip('SCL_Shape.zip')
mapdata <- map@data


Now we prepare the data obtaing the longitudes and latitudes from the data. We use the fortify (ggplot's) function. The result after apply this funtcion we don't now what group corresponds to a specific area (in this example a 'comuna'). Then we obtain the centers of the areas to put the names in a plot (for example).

cordsdata <- fortify(map)
cordsdata <- join(cordsdata,
data.frame(group = unique(cordsdata$group), COMUNA = unique(mapdata$COMUNA)),
type = "left", by = "group")

mapdata <- join(mapdata,
ddply(cordsdata, .(COMUNA),
summarize, CENTER.LONG = mean(long),
CENTER.LAT = mean(lat)))


I this post we study only the comunas in the Santiago province. So, we select this cases. And plot the comunas for a first view to the data.

data_cords <- subset(join(cordsdata, mapdata, type = "left"), PROVINCIA == "SANTIAGO")
data_map <- subset(mapdata, PROVINCIA == "SANTIAGO")

g <- ggplot(data_cords) + theme(legend.position = "none", axis.ticks = element_blank(),
axis.text = element_blank(), axis.title = element_blank())

# Showing the comunas of Santiago
g + geom_polygon(aes(x = long, y = lat, group = group), color = "gray", alpha = 0.7) +
geom_text(aes(x = CENTER.LONG, y = CENTER.LAT, label = COMUNA), size = 2,
color = "white") + ggtitle("The Comunas in Santiago")


Now we plot the previous map but showing the comunas with more density (population/area). The firts of this comunas are Lo Prado o San Ramón, this comunas are places were exists a lot of suburbs.

head(as.character(data_map[order(data_map$DENSIDAD, decreasing = T), ]$COMUNA),
5)

## [1] "LO PRADO"            "SAN RAMON"           "CERRO NAVIA"
## [4] "LO ESPEJO"           "PEDRO AGUIRRE CERDA"

g + geom_polygon(aes(x = long, y = lat, group = group, alpha = POBLACION/AREA),
fill = "darkred", color = "white") + ggtitle("Density of population in each Comuna")


If we plot now the comunas according the numbers of persons. In this case the comunas with more people are Maipú and La Florida, in my country this comunas have the nickname Bedroom Comunas (Comunas dormitorio) becuase this comunas have a large area and them have a lot of housing. So, the most of people in this areas don't work there, instead they have their housgins.

head(as.character(data_map[order(data_map$POBLACION, decreasing = T), ]$COMUNA,
5))

## [1] "MAIPU"      "LA FLORIDA" "LAS CONDES" "PENALOLEN"  "SANTIAGO"
## [6] "PUDAHUEL"

g + geom_polygon(aes(x = long, y = lat, group = group, alpha = POBLACION), fill = "blue",
color = "white") + ggtitle("Population in each Comuna")


Well, this is a first approach to analysis this type of data. I hope make other interesting things soon.