Building a pokemon graph database

December 3, 2016
By

(This article was first published on --Jean Arreola--, and kindly contributed to R-bloggers)

What happens when you combine Pokemon with Neo4j?

I’m a huge Pokemon fan. So, when I found out about this awesome post from Joshua Kunst, I just couldn’t wait to throw all that data into Neo4j.

It also happens to be a great way to learn how to build a graph database from scratch. The objective of this exercise is to build a graph database where the nodes are the pokemon and the types, and the relationships are the effectiveness between the pokemon based only on their types.

Getting the data

First of all, be sure to check Joshua’s post to learn how to import all that pokemon data. We will assume that the data is in a data frame called df.

Then, we need to get the relationships between types. The easiest way to accomplish that is to scrape the table from pokemondb.net.

library(RNeo4j)
library(rvest)
library(methods)
library(dplyr)
library(tidyr)  #needed later for gather()

link <- "http://pokemondb.net/type"

link_html <- read_html(link)

types <- link_html %>%
  html_nodes("table") %>%
  .[[1]] %>%
  html_table()

#Format the table: type names become the labels, blank cells mean a multiplier of 1

names(types)[1] <- "Type"
types$Type <- tolower(types$Type)
names(types)[2:ncol(types)] <- types$Type
types[is.na(types)] <- 1
types[types == ""] <- 1
types[types == "½"] <- 0.5

knitr::kable(types, format = "html")
Type normal fire water electric grass ice fighting poison ground flying psychic bug rock ghost dragon dark steel fairy
normal 1 1 1 1 1 1 1 1 1 1 1 1 0.5 0 1 1 0.5 1
fire 1 0.5 0.5 1 2 2 1 1 1 1 1 2 0.5 1 0.5 1 2 1
water 1 2 0.5 1 0.5 1 1 1 2 1 1 1 2 1 0.5 1 1 1
electric 1 1 2 0.5 0.5 1 1 1 0 2 1 1 1 1 0.5 1 1 1
grass 1 0.5 2 1 0.5 1 1 0.5 2 0.5 1 0.5 2 1 0.5 1 0.5 1
ice 1 0.5 0.5 1 2 0.5 1 1 2 2 1 1 1 1 2 1 0.5 1
fighting 2 1 1 1 1 2 1 0.5 1 0.5 0.5 0.5 2 0 1 2 2 0.5
poison 1 1 1 1 2 1 1 0.5 0.5 1 1 1 0.5 0.5 1 1 0 2
ground 1 2 1 2 0.5 1 1 2 1 0 1 0.5 2 1 1 1 2 1
flying 1 1 1 0.5 2 1 2 1 1 1 1 2 0.5 1 1 1 0.5 1
psychic 1 1 1 1 1 1 2 2 1 1 0.5 1 1 1 1 0 0.5 1
bug 1 0.5 1 1 2 1 0.5 0.5 1 0.5 2 1 1 0.5 1 2 0.5 0.5
rock 1 2 1 1 1 2 0.5 1 0.5 2 1 2 1 1 1 1 0.5 1
ghost 0 1 1 1 1 1 1 1 1 1 2 1 1 2 1 0.5 1 1
dragon 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 0.5 0
dark 1 1 1 1 1 1 0.5 1 1 1 2 1 1 2 1 0.5 1 0.5
steel 1 0.5 0.5 0.5 1 2 1 1 1 1 1 1 2 1 1 1 0.5 2
fairy 1 0.5 1 1 1 1 2 0.5 1 1 1 1 1 1 2 2 0.5 1
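
To see what the formatting step does without hitting the website, here is a minimal sketch on a made-up 3x3 excerpt. The `toy` data frame is an assumption that mimics what `html_table()` tends to return (a first column of type names, then character cells with blanks and "½"). The final conversion to a numeric matrix named `eff` is an extra step that is not in the original code.

```r
# Toy stand-in for the scraped table (assumed shape, not real scraped output).
# Rows attack, columns defend.
toy <- data.frame(
  V1  = c("Normal", "Fire", "Ghost"),
  Nor = c("", "", "0"),
  Fir = c(NA, "½", ""),
  Gho = c("0", "", "2"),
  stringsAsFactors = FALSE
)

# Same formatting steps as in the post
names(toy)[1] <- "Type"
toy$Type <- tolower(toy$Type)
names(toy)[2:ncol(toy)] <- toy$Type
toy[is.na(toy)] <- 1      # must run before the == comparisons, which cannot handle NA
toy[toy == ""] <- 1
toy[toy == "½"] <- 0.5

# Extra step (not in the post): a numeric matrix indexed by type name
eff <- as.matrix(toy[, -1])
storage.mode(eff) <- "numeric"
rownames(eff) <- toy$Type

eff["fire", "fire"]     # multiplier for fire attacking fire
eff["normal", "ghost"]  # multiplier for normal attacking ghost
```

Note that after the replacements the cells are still character strings, which is why the post converts `value` with `as.numeric()` later on.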

Then we need to separate the types of each pokemon, so that a pokemon with two types gets one row per type.

df %>% select(id, type =  type_1) -> t1
df %>% select(id, type =  type_2) -> t2

rbind(t1,t2) -> tf

poke_df <- df %>% select(-type_1, -type_2) %>% 
  left_join(tf, by = "id") %>% 
  filter(!is.na(type))
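
As a quick check of the reshaping above, here is the same pipeline on two made-up rows. `toy_df` and `toy_poke_df` are illustrative names, and only the columns needed for the example are included.

```r
library(dplyr)

# Two illustrative rows: one dual-typed pokemon, one single-typed
toy_df <- data.frame(
  id      = c(6, 7),
  pokemon = c("charizard", "squirtle"),
  type_1  = c("fire", "water"),
  type_2  = c("flying", NA),
  stringsAsFactors = FALSE
)

# Stack both type columns into one long (id, type) table
t1 <- toy_df %>% select(id, type = type_1)
t2 <- toy_df %>% select(id, type = type_2)
tf <- rbind(t1, t2)

# Join back and drop the empty second type of single-typed pokemon
toy_poke_df <- toy_df %>%
  select(-type_1, -type_2) %>%
  left_join(tf, by = "id") %>%
  filter(!is.na(type))

toy_poke_df
```

The dual-typed pokemon ends up with two rows and the single-typed one with a single row, which is what lets each row become one `TYPE` relationship later.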

We are ready to import to Neo4j, so we need to set the connection.
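
The connection code refers to `url`, `username` and `password` without defining them. One possible way to supply them, sketched here, is to read them from environment variables. The variable names `NEO4J_URL`, `NEO4J_USERNAME` and `NEO4J_PASSWORD` are my own assumption, and the fallback URL is the usual local REST endpoint for the Neo4j versions RNeo4j was written for, so adjust both to your setup.

```r
# Hypothetical environment variable names; change them to whatever you use
url      <- Sys.getenv("NEO4J_URL", unset = "http://localhost:7474/db/data/")
username <- Sys.getenv("NEO4J_USERNAME", unset = "neo4j")
password <- Sys.getenv("NEO4J_PASSWORD", unset = "")

# Warn early instead of failing later inside startGraph()
if (!nzchar(password)) {
  message("NEO4J_PASSWORD is not set; the connection will probably be rejected.")
}
```

Keeping the credentials out of the script also makes it safer to share the code.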

Then, we create the pokenodes and the type nodes. We set a relationship for the typing.

#Connect to Graph
#url, username and password must hold your own Neo4j connection details

graph = startGraph(url = url,
                   username = username,
                   password = password)

#Constraints

addConstraint(graph, "Pokemon", "id")
addConstraint(graph, "Type", "type")


#Create nodes and relationships within the same function

pokenodes <- function(x) {
  pokemon <- getOrCreateNode(graph, "Pokemon", id = x["id"], name = x["pokemon"],
                             height = x["height"], weight = x["weight"],
                             attack = x["attack"], defense = x["defense"],
                             hp = x["hp"], special_attack = x["special_attack"],
                             special_defense = x["special_defense"], speed = x["speed"],
                             url_image = x["url_image"], url_icon = x["url_icon"])
  
  type <- getOrCreateNode(graph, "Type", type = x["type"])
  
  createRel(pokemon,"TYPE",type)
}

#Apply to every row

apply(poke_df, 1, pokenodes)
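
One thing to keep in mind with `apply()` on a data frame: the rows are first converted to a matrix, so with mixed columns every value reaches the function as a character string, and numeric stats would likely be stored as strings on the nodes. The sketch below demonstrates this on a small example and shows one possible way to iterate while keeping column types. `row_as_list` is a hypothetical helper, not part of RNeo4j.

```r
toy <- data.frame(
  id      = c(1, 4),
  pokemon = c("bulbasaur", "charmander"),
  attack  = c(49, 52),
  stringsAsFactors = FALSE
)

# apply() coerces each row to a character vector
classes_apply <- apply(toy, 1, function(x) class(x[["attack"]]))

# Hypothetical helper: returns row i as a list, preserving column types
row_as_list <- function(df, i) as.list(df[i, , drop = FALSE])

classes_list <- vapply(
  seq_len(nrow(toy)),
  function(i) class(row_as_list(toy, i)$attack),
  character(1)
)

classes_apply  # every entry is "character"
classes_list   # every entry is "numeric"
```

If numeric properties matter for later queries, iterating over row indices this way is one option; converting inside the query result, as the post does with `as.numeric()`, is another.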

We define the desired relationship (effectiveness) using the scraped table:

#Reshape to long format: one row per (attacking type, defending type) pair
types <- types %>% tidyr::gather(key = "Type_Rel", value = "value", -Type)

effectiveness <- types %>% filter(value != 1)
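
To make the reshaping concrete, here is a minimal sketch on a made-up two-type excerpt of the table. `toy_types` and the other `toy_` names are illustrative. The row type (`Type`) is the attacker and the former column name (`Type_Rel`) is the defender.

```r
library(dplyr)
library(tidyr)

# Excerpt in the same wide layout as the cleaned table (values still character)
toy_types <- data.frame(
  Type     = c("normal", "fighting"),
  normal   = c("1", "2"),
  fighting = c("1", "1"),
  stringsAsFactors = FALSE
)

# Wide to long: every column except Type becomes a (Type_Rel, value) pair
toy_long <- toy_types %>%
  gather(key = "Type_Rel", value = "value", -Type)

# Keep only the pairs that deviate from the neutral multiplier
toy_effectiveness <- toy_long %>% filter(value != 1)

toy_effectiveness
```

Only the fighting-against-normal pair survives the filter, so only that pair would become a relationship in the graph. Dropping the neutral pairs keeps the number of edges small.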

And we are ready to upload the effectiveness, this time using a transaction. Thanks to Nicole White for this useful post.

#Query for creating relationships for the pokenodes

query = "
MERGE (n:Type {type:{type_1}})
MERGE (m:Type {type:{type_2}})
CREATE (n)-[r:EFECTIVENESS]->(m)
SET r.value = {value}
"

#Transaction endpoint
t = newTransaction(graph)

for (i in seq_len(nrow(effectiveness))) {
  type_1 = effectiveness[i, ]$Type
  type_2 = effectiveness[i, ]$Type_Rel
  value = effectiveness[i, ]$value
  
  appendCypher(t, 
               query, 
               type_1 = type_1, 
               type_2 = type_2, 
               value  = value)
}

commit(t)
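
Because the query uses `CREATE` for the relationship, running the loop a second time would add a duplicate `EFECTIVENESS` edge for every pair. A possible variant, sketched below, uses `MERGE` on the relationship so that re-runs leave a single edge. The name `query_merge` is my own. Note also that the `{param}` placeholder style is the older Cypher syntax; newer Neo4j versions expect `$param`, so which one you need depends on your server version.

```r
# Sketch of an idempotent variant of the query (legacy {param} syntax kept)
query_merge <- "
MERGE (n:Type {type:{type_1}})
MERGE (m:Type {type:{type_2}})
MERGE (n)-[r:EFECTIVENESS]->(m)
SET r.value = {value}
"

# It would be passed to appendCypher() exactly like the original query, e.g.
# appendCypher(t, query_merge, type_1 = "fire", type_2 = "grass", value = 2)
cat(query_merge)
```

The relationship name keeps the spelling used in the rest of the post so that the later queries still match.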

It’s time to query our database! Let’s check all the pokemon that Salamence is doubly effective against:

library(visNetwork)

#Query to check for effectiveness for Salamence
final_query <- "
match (n:Pokemon)-[t:TYPE]->(l:Type)-[e:EFECTIVENESS]->(s:Type)<-[j:TYPE]-(z:Pokemon) 
where n.name = 'salamence' 
return n.name as poke1, e.value as value, z.name as poke2, n.url_icon as icon1,
z.url_icon as icon2, n.url_image as image1, z.url_image as image2"

#Execute the query
poke_cypher = cypher(graph, final_query)

#Get data for VisNetwork
poke_cypher <- poke_cypher %>%
  mutate(value = as.numeric(value)) %>%
  group_by(poke1, poke2, image1, image2, icon1, icon2) %>%
  summarise(value = prod(value)) %>%
  ungroup()

#Filter by double effective
poke_sp_eft <- poke_cypher %>%
  filter(value == 2)

#More data for VisNetwork
poke <- unique(c(poke_sp_eft$poke1, poke_sp_eft$poke2))
img  <- unique(c(poke_sp_eft$icon1, poke_sp_eft$icon2))

nodes <- data.frame(id = poke, label = poke, image = img, shape = "image")

edges <- poke_sp_eft %>%
  select(from = poke1, to = poke2)

#The VISUALIZATION
visNetwork(nodes, edges, width = "100%")

[Figure: visNetwork graph of the pokemon that Salamence is doubly effective against]
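
The `group_by()` and `summarise(value = prod(value))` step matters for dual-typed pokemon: the query returns one row per pair of types connected by an `EFECTIVENESS` edge, and the rows for the same pair of pokemon are multiplied together. The sketch below runs that step on made-up query output (the `toy_result` rows and the icon file names are invented for illustration). It also shows one way to build the node table from name/icon pairs so that labels and images stay aligned, which is an alternative to the two separate `unique()` calls, not the original method.

```r
library(dplyr)

# Invented stand-in for the cypher() result: one row per type-to-type edge
toy_result <- data.frame(
  poke1 = "salamence",
  poke2 = c("machop", "heracross", "heracross", "skarmory", "skarmory"),
  value = c("2", "2", "2", "0.5", "0.5"),
  icon1 = "salamence.png",
  icon2 = c("machop.png", "heracross.png", "heracross.png",
            "skarmory.png", "skarmory.png"),
  stringsAsFactors = FALSE
)

# Collapse to one row per pair of pokemon by multiplying the edge values
toy_summary <- toy_result %>%
  mutate(value = as.numeric(value)) %>%
  group_by(poke1, poke2, icon1, icon2) %>%
  summarise(value = prod(value)) %>%
  ungroup()

# Exactly double: pairs with a combined value of 4 or 0.25 drop out
toy_eff <- toy_summary %>% filter(value == 2)

# Build nodes from (name, icon) pairs so each image stays with its label
toy_nodes <- bind_rows(
  toy_eff %>% select(id = poke1, image = icon1),
  toy_eff %>% select(id = poke2, image = icon2)
) %>%
  distinct() %>%
  mutate(label = id, shape = "image")

toy_nodes
```

In this toy data only one opponent has a combined value of exactly 2, so `filter(value == 2)` keeps that pair and excludes the one with a value of 4. Using `value >= 2` instead would include the quadruple case, if that is what you want to see.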

And that’s how you do it! With RNeo4j it’s so easy to set up a graph. Maybe in the future it could be expanded into a recommender system or something like that.

Check out a shiny app for the pokemon database!
