**analytics for fun**, and kindly contributed to R-bloggers)

After searching for a few hours on the web, I’ve been able to get my R code working and plot breast cancer data on a world map. It might not the best looking map possible (R graphics is incredible!), but I am happy with that for now.

To produce the map I used the “maps” package available through CRAN repository. And of course I needed longitude and latitude coordinates for each country, which I searched on the web and added to my original data set. Here are the steps I followed:

**1) Load a .csv file containing lat/long coordinates for all countries**

> countryCoord<- read.csv (“~/Rworkdir/data/countryCoord.csv”)

**2) Add lat/long coordinates to my original breast cancer data set** (dataset is called “gapCleaned”). To do this I used the function “merge”, specifying to merge the two data sets by the variable “country” (both the datasets have this variable in common), and used left outer join (here is a good explanation of merge command)

> mergedCleaned<- merge(gapCleaned, countryCoord, by=”country”, all.x=TRUE)

All right, now I have two new columns in my data set, indicating lat and long coordinates for each country 🙂 Cool, next step is finally drawing a map with the data.

**3) Draw a world map and tell R where to plot breast cancer data**

> library(maps)

> map(“world”,col=”gray90”, fill=TRUE)

I size the breast cancer symbol according to breast cancer value for each country in my data set

> radius <- 3^sqrt (mergedCleaned$breastcancer)

Finally, I give R instructions to plot my breast cancer data over the world map

> symbols(mergedCleaned$lon, mergedCleaned$lat, bg = “blue”, fg = “red”, lwd = 1, circles = radius, inches = 0.175, add = TRUE)

** New cases of breast cancer in the world, 2002**

I am sure we can do prettier plots with R (I know there are other interesting packages suitable for this, such as ggplot), but I am happy for now. I’ve learned something new and been able to visualize and communicate data in a more effective way than just a scatterplot.

**Conclusions:**

Looking at the map, we can quickly identify the countries/areas with the highest number of breast cancer cases and hypothesize patterns. As reported on my last post, these are United States, New Zealand, Israel, Central/Northern Europe and in general highly developed economies rather than developing countries.

**leave a comment**for the author, please follow the link and comment on their blog:

**analytics for fun**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...