Creating a network using R

February 3, 2017
(This article was first published on Pachá (Batteries Included), and kindly contributed to R-bloggers)

On January 10, 2016, David Bowie left this earthly realm. Last month I decided to create a network of his collaborations, and here is how to do that.

Required packages

You need jsonlite, igraph, network, data.table, plyr and base R.

Other tools

D3Plus by Alex Simoes and Dave Landry. Also Google Sheets.

My data is here.

Loading packages

# 1: define the libraries to use
libraries <- c("jsonlite","igraph","network", "data.table", "plyr")

# 2: this is the function to download and or load libraries on the fly
download_and_or_load <- function(pkg){
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg))
    install.packages(new.pkg, dependencies = TRUE)
  sapply(pkg, require, character.only = TRUE)
}

# 3: use the function from step 2
download_and_or_load(libraries)
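If you would rather check what is missing before anything gets installed, a minimal sketch along these lines may help. The helper name `missing_packages` is hypothetical and not part of the original workflow; it only relies on base R's `requireNamespace()`.

```r
# Hypothetical helper: list the packages that are not available,
# without installing or attaching anything.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# "stats" and "utils" ship with R, so nothing should be reported as missing
base_check <- missing_packages(c("stats", "utils"))

# A made-up package name (an assumption for illustration) should be reported
fake_check <- missing_packages("thisPackageShouldNotExist12345")
```

Calling `missing_packages(libraries)` first lets you review the list before running the installer function.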

Building the network

D3Plus needs three files to visualize a network: data, edges and nodes.

Data

This is the easy part. I downloaded the sheet named “data” from my spreadsheet in CSV format. Then I converted the CSV to JSON with these lines:

data <- read.csv("data.csv")
data <- toJSON(data, pretty = TRUE)
write(data, file = "bowie_data.json")
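To see what this conversion produces, here is a small sketch with a toy table. The column names and values are assumptions for illustration only; they are not taken from the actual spreadsheet.

```r
library(jsonlite)

# Toy stand-in for data.csv; column names and values are made up
toy_data <- data.frame(
  Artist = c("Artist A", "Artist B"),
  Collaborations = c(3, 1),
  stringsAsFactors = FALSE
)

# Each row of the data frame becomes one JSON object in an array
toy_json <- toJSON(toy_data, pretty = TRUE)
print(toy_json)

# Reading the JSON back should recover the same columns
round_trip <- fromJSON(toy_json)
```

The row-per-object layout is jsonlite's default for data frames, so no extra arguments should be needed for this step.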

Edges

This part is a bit trickier.

I downloaded the sheet named “collaborations” from my spreadsheet in CSV format. The entries of this matrix \(M\) have the following meaning:

$$
m_{ij} =
\begin{cases}
1 & \text{if artist } i \text{ and artist } j \text{ collaborated with each other} \\
0 & \text{otherwise}
\end{cases}
$$

Then I rearrange the matrix to set its row names and column names:

bowie_collaborations <- read.csv("collaborations.csv")
rownames(bowie_collaborations) <- bowie_collaborations[,1]
bowie_collaborations <- bowie_collaborations[,-1]
colnames(bowie_collaborations) <- rownames(bowie_collaborations)
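The same steps can be tried on a toy table first. This is only a sketch: the three artists are made up, and the zero-diagonal check is my assumption about how such a sheet would usually be filled in.

```r
# Toy stand-in for collaborations.csv: the first column holds the artist names
toy <- data.frame(
  X = c("A", "B", "C"),
  A = c(0, 1, 1),
  B = c(1, 0, 0),
  C = c(1, 0, 0),
  stringsAsFactors = FALSE
)

# Move the names into the row names, drop the name column,
# and reuse the row names as column names
rownames(toy) <- toy[, 1]
toy <- toy[, -1]
colnames(toy) <- rownames(toy)

m <- as.matrix(toy)

# Collaboration is mutual, so the matrix should satisfy m_ij = m_ji
is_symmetric <- isSymmetric(m)

# Assumption: no artist is marked as collaborating with themselves
zero_diagonal <- all(diag(m) == 0)
```

If `is_symmetric` comes out `FALSE` on real data, the sheet probably has a collaboration entered in only one direction.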

With the matrix ready I can create the network. You can try the different layouts explained in the igraph documentation. This is the code to create the network and display a static version of it:

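# Rebuild the data frame as a plain matrix with the same row and column names
bowie_gr <- matrix(unlist(bowie_collaborations), ncol = nrow(bowie_collaborations), byrow = TRUE)
rownames(bowie_gr) <- rownames(bowie_collaborations)
colnames(bowie_gr) <- colnames(bowie_collaborations)

# Edge list: (row, col) index pairs of the non-zero entries
bowie_gr <- which(bowie_gr > 0, arr.ind = TRUE)
# Build an undirected graph and keep only its minimum spanning tree
bowie_gr.graph <- minimum.spanning.tree(graph.data.frame(bowie_gr, directed = FALSE))
# Map the numeric vertex ids back to artist names
bowie_gr.names <- colnames(bowie_collaborations)[as.numeric(V(bowie_gr.graph)$name)]
# Drop duplicated edges and self-loops
bowie_gr.graph <- simplify(bowie_gr.graph, remove.multiple = TRUE, remove.loops = TRUE)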

set.seed(1234)
bowie_gr.layout <- layout_with_fr(bowie_gr.graph)
plot(bowie_gr.graph, edge.arrow.size=.3, vertex.label=bowie_gr.names, layout=bowie_gr.layout)
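The `which(..., arr.ind = TRUE)` step is the core of the conversion from matrix to edge list. Here is a base R sketch on a toy matrix. Unlike the code above, which keeps both (i, j) and (j, i) and leaves the clean-up to `simplify()`, this variant zeroes the lower triangle first so that each pair appears once; that is my own variation, not the method used in this post.

```r
# Toy symmetric collaboration matrix with made-up artists
m <- matrix(
  c(0, 1, 1,
    1, 0, 0,
    1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C"), c("A", "B", "C"))
)

# Keep only the upper triangle so each undirected pair is listed once
m[lower.tri(m, diag = TRUE)] <- 0

# Row and column indices of the remaining non-zero entries
idx <- which(m > 0, arr.ind = TRUE)

# Translate the indices into artist names
edges <- data.frame(
  source = rownames(m)[idx[, "row"]],
  target = colnames(m)[idx[, "col"]],
  stringsAsFactors = FALSE
)
```

An edge list in this shape can be passed to igraph's `graph_from_data_frame()` with `directed = FALSE`.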



Now I save the edges (as numeric ids), the names and the network layout:

write.graph(bowie_gr.graph, "exported_edges_bowie.csv", format=c("pajek"))
write.csv(bowie_gr.names, "exported_names_bowie.csv")
write.csv(bowie_gr.layout, "exported_coordinates_bowie.csv")

Finally I rearrange the edges to display names instead of numeric ids and save the result in JSON format:

network_names <- read.csv("exported_names_bowie.csv")
setnames(network_names, colnames(network_names), c("source_num","source"))
network_names$target_num <- network_names$source_num
network_names$target <- network_names$source

network_edges <- read.csv("exported_edges_bowie.csv", sep = " ")
network_edges <- network_edges[-1,]
setnames(network_edges, colnames(network_edges), c("source_num","target_num"))
network_edges <- join(network_edges, network_names[,c("source","source_num")], by = "source_num")
network_edges <- join(network_edges, network_names[,c("target","target_num")], by = "target_num")
network_edges <- network_edges[,c("source","target")]
source <- as.data.frame(network_edges$source)
colnames(source) <- "source"
target <- as.data.frame(network_edges$target)
colnames(target) <- "target"

network_edges <- data.frame(matrix(ncol = 1, nrow = nrow(network_edges)))
network_edges$source <- source
network_edges$target <- target
colnames(network_edges$source) <- "Artist"
colnames(network_edges$target) <- "Artist"

network_edges_json <- toJSON(network_edges, pretty = TRUE)
write(network_edges_json, "bowie_edges.json")
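The nested data frames above end up as JSON objects of the form `{"source": {"Artist": ...}, "target": {"Artist": ...}}`. As a sketch of an alternative way to reach the same shape, the records can be built as plain lists. The artist names are toy values, and the `records` construction is my own illustration rather than the code used for the final network.

```r
library(jsonlite)

# Toy edge list with made-up artists
toy_edges <- data.frame(
  source = c("Artist A", "Artist A"),
  target = c("Artist B", "Artist C"),
  stringsAsFactors = FALSE
)

# One list per edge, with source and target each wrapped in an object
# keyed by "Artist"
records <- lapply(seq_len(nrow(toy_edges)), function(i) {
  list(
    source = list(Artist = toy_edges$source[i]),
    target = list(Artist = toy_edges$target[i])
  )
})

# auto_unbox = TRUE writes length-one vectors as scalars instead of arrays
toy_edges_json <- toJSON(records, auto_unbox = TRUE, pretty = TRUE)

# Parse it back as nested lists to inspect the structure
parsed <- fromJSON(toy_edges_json, simplifyVector = FALSE)
```

Building the records explicitly avoids the placeholder column created by `data.frame(matrix(...))`, at the cost of a loop over the edges.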

Nodes

This is easier than the edges part. The code to save the nodes in JSON format with names instead of numeric ids is:

network_nodes <- read.csv("exported_coordinates_bowie.csv")
setnames(network_nodes, colnames(network_nodes), c("target_num","x","y"))
network_nodes <- join(network_nodes, network_names[,c("target","target_num")], by = "target_num")
network_nodes <- network_nodes[,c("target","x","y")]
setnames(network_nodes, colnames(network_nodes), c("Artist","x","y"))

network_nodes_json <- toJSON(network_nodes, pretty=TRUE)
write(network_nodes_json, "bowie_nodes.json")
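If the layout matrix and the names vector are still in memory, the node table can also be assembled directly, without going through the exported CSV files. A minimal sketch with toy coordinates follows; the values are made up, and the order of the names is assumed to match the order of the layout rows.

```r
# Toy layout: one row per node, columns are x and y coordinates
toy_layout <- matrix(
  c( 0, 0,
     1, 0.5,
    -1, 0.5),
  ncol = 2, byrow = TRUE
)

# Toy names, assumed to be in the same order as the layout rows
toy_names <- c("Artist A", "Artist B", "Artist C")

# One row per node: name plus coordinates
toy_nodes <- data.frame(
  Artist = toy_names,
  x = toy_layout[, 1],
  y = toy_layout[, 2],
  stringsAsFactors = FALSE
)
```

Passing `toy_nodes` to `toJSON()` would then give one object per node with the keys `Artist`, `x` and `y`.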

Put your files in a D3Plus network template

In my case I decided to use bl.ocks.org to show my network. Use this template and edit the links to data, edges and nodes to make it work.


<meta charset="utf-8">
<script src="https://d3plus.org/js/d3.js"></script>
<script src="https://d3plus.org/js/d3plus.js"></script>
<div id="network"></div>
<link href="https://fonts.googleapis.com/css?family=Lato:400,700" rel="stylesheet" type="text/css">

You can also use Roboto, another Google Font or any typeface you want.

Final result

After some editing of the edges in Atom (just aesthetic changes to put some edges closer to similar artists) the result is here.
