Geographic data to service the needs of a remote employee – part2

July 6, 2016

(This article was first published on Mango Solutions » R Blog, and kindly contributed to R-bloggers)

Ava Yang, Mango Solutions


In part 1 of this post I set out to find a flat to rent based on three simple criteria:

  • Café density
  • Tube station density
  • Monthly rent

So far I have made use of the baidumap and REmap packages to create a nice visualisation of available flats and coffee shops in Shanghai.

Calculation and scoring

Now let’s do some basic math and programming. Three measures ( derived from original variables to quantify my preferences.

For density of café and tube station, the closer the better; the more the better. Geographic distances were calculated by function distm from package geosphere.


load('data/ziroom.rds') # raw data

# 1. Generate names to represent flats
# 2. Extract longitude and lattitude
sh_ziroom <- ziroom %>%
mutate(name=paste("Room", rownames(ziroom), sep="_")) %>%
mutate(lon=getCoordinate(flat, city="上海", formatted = T)[, 'longtitude']) %>%
mutate(lat=getCoordinate(flat, city="上海", formatted = T)[, 'latitude']) %>%
na.omit() %>%
select(c(lon, lat, name, price_promotion, flat))

# distance matrices: between cafe and flat, between station and flat

dist_cafe_flat <- distm(sh_cafe[,c("lon", "lat")], sh_ziroom[,c("lon", "lat")]) %>%

dist_station_flat <- distm(sh_station[,c("lon", "lat")], sh_ziroom[,c("lon", "lat")]) %>%

As an upper limit I’m willing to walk as far as 750 metres (about 0.5 mile) from a café. Thus, cafeidx and stationidx were then given by

For this job I wrote a small custom function called calIdx.

# Function to calculate cafe_idx and station_idx

calIdx <- function(tmpcol) {

tmpcol <- tmpcol[which(tmpcol < 750)]



Rent is a negative indicator, and so rentidx could be obtained from

The weighted score was calculated by

# 1. cafeIdx = 1/log(dis1) + 1/log(dis2) +...+ 1/log(disN)

# 2. stationIdx = 1/log(dis1) + 1/log(dis2) +...+ 1/log(disN)

# 3. rentIdx = 1/log(price_promotion)

# 4. score = 0.3*cafeIdx + 0.2*stationIdx + 0.5*rentIdx

sh_ziroom_top10 <- sh_ziroom %>%

mutate(cafeIdx = sapply(dist_cafe_flat, calIdx)) %>%

mutate(stationIdx = sapply(dist_station_flat, calIdx)) %>%

filter(price_promotion <= 4000) %>%

mutate(rentIdx = 1/log(as.numeric(price_promotion))) %>%

mutate(score = 0.4*cafeIdx + 0.2*stationIdx + 0.4*rentIdx) %>%

arrange(desc(score)) %>%



kable(sh_ziroom_top10[, c("name", "score", "cafeIdx", "stationIdx", "rentIdx")], align="c")

name score cafeIdx stationIdx rentIdx
Room_34 0.6480966 1.3262957 0.3380904 0.1249006
Room_35 0.6470510 1.3262957 0.3380904 0.1222865
Room_80 0.6054141 1.2216344 0.3378458 0.1229781
Room_79 0.6048128 1.2216344 0.3378458 0.1214746
Room_22 0.5729428 1.1430015 0.3349634 0.1218737
Room_24 0.5729428 1.1430015 0.3349634 0.1218737
Room_45 0.5292036 0.9617378 0.4709076 0.1258173
Room_46 0.5284566 0.9617378 0.4709076 0.1239499
Room_59 0.4334636 0.8012006 0.3237803 0.1205684
Room_57 0.3836545 0.6721137 0.3302977 0.1218737

Done! See above for the top 10 room candidates. The mechanism I used is not difficult and makes my life so much easier. Moving to a new area which fulfils all my social needs is no longer such a big challenge!


To leave a comment for the author, please follow the link and comment on their blog: Mango Solutions » R Blog. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)