Playing with #rstatsnyc, Neo4J and R

This article was first published on Colin Fay, and kindly contributed to R-bloggers.

A search on Twitter, some R, and just enough Neo4J.

Disclaimer: of course everything here could be done in pure R. But
hey, where’s the fun in that?

Second disclaimer: this blog post relies on {neo4r}, a package still under
active development.

Get the tweets

<span class="n">library</span><span class="p">(</span><span class="n">rtweet</span><span class="p">)</span><span class="w">
</span><span class="n">ny</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">search_tweets</span><span class="p">(</span><span class="s2">"#rstatsnyc"</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">3000</span><span class="p">)</span><span class="w">
</span>

Tweets collected at Sys.time() == “2018-04-25 12:41:49 CEST”

<span class="n">nrow</span><span class="p">(</span><span class="n">ny</span><span class="p">)</span><span class="w">
</span>
## [1] 3000

I might not have everything here (as I’ve reached the limit of 3000
tweets), but let’s dive into this anyway.

Prepare for Neo4J

Let’s get some info:

<span class="nf">range</span><span class="p">(</span><span class="n">ny</span><span class="o">$</span><span class="n">created_at</span><span class="p">)</span><span class="w">
</span>
## [1] "2018-04-20 18:52:27 UTC" "2018-04-25 09:57:44 UTC"

Here, every tweet was sent in the same month of the same year, so we can
keep only the day.

<span class="n">library</span><span class="p">(</span><span class="n">dplyr</span><span class="p">)</span><span class="w">
</span><span class="n">ny</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ny</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">mutate</span><span class="p">(</span><span class="n">day</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lubridate</span><span class="o">::</span><span class="n">day</span><span class="p">(</span><span class="n">created_at</span><span class="p">))</span><span class="w">
</span>
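
Dropping the month and year is only safe while the whole sample sits in one calendar month. Here is a minimal base R sketch of that check, using the two timestamps printed by range() above as stand-in data (the `created_at` vector below is a toy variable, not the full `ny$created_at` column):

```r
# The two bounds printed by range(ny$created_at) above, used as toy data
created_at <- as.POSIXct(
  c("2018-04-20 18:52:27", "2018-04-25 09:57:44"),
  tz = "UTC"
)

# A single distinct "year-month" value means the day alone identifies the date
year_month <- unique(format(created_at, "%Y-%m"))
same_month <- length(year_month) == 1

# Base R counterpart of lubridate::day(): day of the month as an integer
days <- as.integer(format(created_at, "%d"))
```

If `same_month` were FALSE, a day number such as 1 could refer to two different dates, and the Day nodes would silently merge them.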

Also, as the status_id column is made of long strings of 18 digits,
let’s recode it into shorter anonymous identifiers:

<span class="n">library</span><span class="p">(</span><span class="n">forcats</span><span class="p">)</span><span class="w">
</span><span class="n">ny</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">ny</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">mutate</span><span class="p">(</span><span class="n">status_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fct_anon</span><span class="p">(</span><span class="n">as_factor</span><span class="p">(</span><span class="n">status_id</span><span class="p">)),</span><span class="w"> 
                    </span><span class="n">status_id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">as.character</span><span class="p">(</span><span class="n">status_id</span><span class="p">))</span><span class="w">
</span><span class="c1"># Be sure we still have 3000 observations</span><span class="w">
</span><span class="nf">length</span><span class="p">(</span><span class="n">unique</span><span class="p">(</span><span class="n">ny</span><span class="o">$</span><span class="n">status_id</span><span class="p">))</span><span class="w">
</span>
## [1] 3000
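
fct_anon() replaces each factor level with an arbitrary numeric label. To make the idea concrete, here is a deterministic base R stand-in (not the code used in this post): a lookup table with one short label per distinct id. The ids are made up for illustration.

```r
# Toy ids standing in for 18-digit status ids (made-up values)
ids <- c("987654321098765432", "123456789012345678",
         "555555555555555555", "123456789012345678")

# Lookup table: one short, zero-padded label per distinct id
distinct_ids <- unique(ids)
lookup <- setNames(sprintf("%04d", seq_along(distinct_ids)), distinct_ids)

# Recode by name; the same original id always gets the same short label
short_ids <- unname(lookup[ids])
```

Keeping the lookup table around matters if another column refers to the same ids: a column such as reply_to_status_id would need to go through the same table to keep pointing at the recoded tweets.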

The model

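Here’s a model of the graph we want to create in Neo4J, made with
http://www.apcjones.com/arrows. In short: a User SENT a Tweet, and a
Tweet WAS_SENT on a Day, CONTAINS Hashtags, MENTIONS Users and
REPLIES_TO other Tweets.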

Connect to Neo4J

<span class="n">library</span><span class="p">(</span><span class="n">neo4r</span><span class="p">)</span><span class="w">
</span><span class="n">con</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">neo4j_api</span><span class="o">$</span><span class="n">new</span><span class="p">(</span><span class="n">url</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"http://localhost:7474/"</span><span class="p">,</span><span class="w"> 
                     </span><span class="n">user</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"neo4j"</span><span class="p">,</span><span class="w"> </span><span class="n">password</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"pouetpouet"</span><span class="p">)</span><span class="w">
</span><span class="c1"># Is the connection working?</span><span class="w">
</span><span class="n">con</span><span class="o">$</span><span class="n">ping</span><span class="p">()</span><span class="w">
</span>
## [1] 200

Create the CSV

Let’s create the CSV that will be sent to Neo4J. To do this, we need to:

  • Select the info
  • Write the CSV files to the Neo4J import folder
  • Send queries to Neo4J to load these CSV files and model them

Note: we’re working on a way to natively send csv with {neo4r}, so you
won’t have to write in Neo4J home.

<span class="n">library</span><span class="p">(</span><span class="n">readr</span><span class="p">)</span><span class="w">
</span><span class="c1"># CSV of tweets </span><span class="w">
</span><span class="n">ny</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">select</span><span class="p">(</span><span class="n">status_id</span><span class="p">,</span><span class="w"> </span><span class="n">day</span><span class="p">,</span><span class="w"> </span><span class="n">text</span><span class="p">,</span><span class="w"> </span><span class="n">source</span><span class="p">,</span><span class="w"> 
         </span><span class="n">lang</span><span class="p">,</span><span class="w"> </span><span class="n">favorite_count</span><span class="p">,</span><span class="w"> </span><span class="n">retweet_count</span><span class="p">,</span><span class="w"> 
         </span><span class="n">is_quote</span><span class="p">,</span><span class="w"> </span><span class="n">is_retweet</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">write_csv</span><span class="p">(</span><span class="s2">"~/neo4j/import/ny_tweets.csv"</span><span class="p">)</span><span class="w">

</span><span class="c1"># CSV of users</span><span class="w">
</span><span class="n">ny</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">select</span><span class="p">(</span><span class="n">status_id</span><span class="p">,</span><span class="w"> </span><span class="n">screen_name</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">write_csv</span><span class="p">(</span><span class="s2">"~/neo4j/import/ny_users.csv"</span><span class="p">)</span><span class="w">

</span><span class="c1"># CSV of hashtags</span><span class="w">
</span><span class="n">ny</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">select</span><span class="p">(</span><span class="n">status_id</span><span class="p">,</span><span class="w"> </span><span class="n">hashtags</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">tidyr</span><span class="o">::</span><span class="n">unnest</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">na.omit</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">write_csv</span><span class="p">(</span><span class="s2">"~/neo4j/import/ny_hastags.csv"</span><span class="p">)</span><span class="w">

</span><span class="c1"># CSV of mentions</span><span class="w">
</span><span class="n">ny</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">select</span><span class="p">(</span><span class="n">status_id</span><span class="p">,</span><span class="w"> </span><span class="n">mentions_screen_name</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">tidyr</span><span class="o">::</span><span class="n">unnest</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">na.omit</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">write_csv</span><span class="p">(</span><span class="s2">"~/neo4j/import/ny_mentions.csv"</span><span class="p">)</span><span class="w">

</span><span class="c1"># CSV of replies</span><span class="w">
</span><span class="n">ny</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">select</span><span class="p">(</span><span class="n">status_id</span><span class="p">,</span><span class="w"> </span><span class="n">reply_to_status_id</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">na.omit</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">write_csv</span><span class="p">(</span><span class="s2">"~/neo4j/import/ny_replies.csv"</span><span class="p">)</span><span class="w">
</span>
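
The hashtags and mentions columns are list columns, which is why they go through unnest() before being written. As a rough illustration of what that step does, here is a base R sketch on toy data, assuming (as the na.omit() call suggests) that a tweet without hashtags holds NA:

```r
# Toy stand-in for ny: hashtags is a list column, NA when a tweet has none
toy <- data.frame(status_id = c("0001", "0002", "0003"),
                  stringsAsFactors = FALSE)
toy$hashtags <- list(c("rstats", "rstatsnyc"), NA_character_, "rladies")

# One row per (tweet, hashtag) pair: repeat each id once per list element
long <- data.frame(
  status_id = rep(toy$status_id, lengths(toy$hashtags)),
  hashtags  = unlist(toy$hashtags),
  stringsAsFactors = FALSE
)

# Drop tweets that had no hashtag at all
long <- na.omit(long)
```

With recent versions of tidyr (1.0.0 and later), unnest() expects the columns to be named explicitly, for instance unnest(cols = hashtags); the bare call used here dates from an earlier release.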

Before reading the files in Neo4J, we should add some constraints to
ensure the nodes are unique. If you’re not familiar with this
terminology, a uniqueness constraint ensures that a given property
holds a distinct value for every node carrying a given label: for
example, here, every Tweet node must have a unique name (its
status_id).

Hence, if we try to create a node with a status_id that already
exists, the write will fail (and that’s the reason why we are using
MERGE, which reuses an existing node instead of creating a duplicate,
for writing the nodes with constraints).

<span class="s1">'CREATE CONSTRAINT ON (t:Tweet) ASSERT t.name IS UNIQUE'</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">call_api</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w">
</span>
## No data returned.
<span class="s1">'CREATE CONSTRAINT ON (d:Day) ASSERT d.name IS UNIQUE'</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">call_api</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w">
</span>
## No data returned.
<span class="s1">'CREATE CONSTRAINT ON (u:User) ASSERT u.name IS UNIQUE'</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">call_api</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w">
</span>
## No data returned.
<span class="s1">'CREATE CONSTRAINT ON (h:Hashtag) ASSERT h.name IS UNIQUE'</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">call_api</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w">
</span>
## No data returned.

Note: the ## No data returned. messages appear because these calls
retrieve nothing from the DB: neither data nor stats about the call
(which can be requested with the include_stats argument).
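
The four constraint queries differ only by their label, so they can also be generated rather than typed. This is a small sketch, not something {neo4r} requires; the vectors `labels` and `vars` are names introduced here for illustration.

```r
# Labels used in this post, and a one-letter Cypher variable for each
labels <- c("Tweet", "Day", "User", "Hashtag")
vars   <- tolower(substr(labels, 1, 1))

# sprintf() is vectorised, so this yields one query string per label
constraint_queries <- sprintf(
  "CREATE CONSTRAINT ON (%s:%s) ASSERT %s.name IS UNIQUE",
  vars, labels, vars
)

# Each string could then be sent the same way as above, e.g.
# for (q in constraint_queries) call_api(q, con)
```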

Importing the csv to the DB:

  • With include_stats = TRUE:

<span class="c1"># Tweets but no day</span><span class="w">
</span><span class="s1">'USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///ny_tweets.csv" AS csvLine
MERGE (t:Tweet { name: csvLine.status_id, text: csvLine.text, source: csvLine.source, lang: csvLine.lang, favorite_count: toInteger(csvLine.favorite_count), retweet_count: toInteger(csvLine.retweet_count), is_quote: toBoolean(csvLine.is_quote), is_retweet: toBoolean(csvLine.is_retweet)});'</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">call_api</span><span class="p">(</span><span class="n">con</span><span class="p">,</span><span class="w"> </span><span class="n">include_stats</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span>
## No data returned.

## # A tibble: 12 x 2
##    type                   value
##    <chr>                  <dbl>
##  1 contains_updates          1.
##  2 nodes_created          3000.
##  3 nodes_deleted             0.
##  4 properties_set        24000.
##  5 relationships_created     0.
##  6 relationship_deleted      0.
##  7 labels_added           3000.
##  8 labels_removed            0.
##  9 indexes_added             0.
## 10 indexes_removed           0.
## 11 constraints_added         0.
## 12 constraints_removed       0.
  • Without:

<span class="c1"># Days </span><span class="w">
</span><span class="s1">'USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///ny_tweets.csv" AS csvLine
MERGE (d:Day {name : csvLine.day} )
WITH csvLine
MATCH (t:Tweet {name: csvLine.status_id})
MATCH (d:Day {name : csvLine.day} )
MERGE (t) -[:WAS_SENT]->(d);'</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">call_api</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w">
</span>
## No data returned.
<span class="c1"># Users</span><span class="w">
</span><span class="s1">'USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///ny_users.csv" AS csvLine
MERGE (u:User { name: csvLine.screen_name})
WITH csvLine
MATCH (u:User { name: csvLine.screen_name})
MATCH (t:Tweet {name : csvLine.status_id})
MERGE (u) -[:SENT]-> (t);'</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">call_api</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w">
</span>
## No data returned.
<span class="c1"># Hashtags</span><span class="w">
</span><span class="s1">'USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///ny_hastags.csv" AS csvLine
MERGE (h:Hashtag { name: csvLine.hashtags})
WITH csvLine
MATCH (t:Tweet {name : csvLine.status_id})
MATCH (h:Hashtag { name: csvLine.hashtags})
MERGE (t) -[:CONTAINS]-> (h);'</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">call_api</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w">
</span>
## No data returned.
<span class="c1"># Mentions</span><span class="w">
</span><span class="s1">'USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///ny_mentions.csv" AS csvLine
MERGE (m:User { name: csvLine.mentions_screen_name})
WITH csvLine
MATCH (t:Tweet {name : csvLine.status_id})
MATCH (m:User { name: csvLine.mentions_screen_name})
MERGE (t) -[:MENTIONS]-> (m);'</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">call_api</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w">
</span>
## No data returned.
<span class="c1"># Replies</span><span class="w">
</span><span class="s1">'USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:///ny_replies.csv" AS csvLine
MERGE (t:Tweet { name: csvLine.reply_to_status_id})
WITH csvLine
MATCH (t:Tweet {name : csvLine.status_id})
MATCH (r:Tweet {name: csvLine.reply_to_status_id})
MERGE (t) -[:REPLIES_TO]-> (r);'</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">call_api</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w">
</span>
## No data returned.
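
The Days, Hashtags and Mentions imports all follow the same pattern: MERGE the node, then MATCH the tweet and the node, then MERGE the relationship going from the tweet. A hedged sketch of a helper that writes this query text is shown below; `build_load_query()` is a hypothetical function, not part of {neo4r}, and it does not cover the Users import, where the relationship points towards the tweet.

```r
# Hypothetical helper: builds the Cypher text for a "tweet -> node" import
build_load_query <- function(file, label, column, rel) {
  paste0(
    'USING PERIODIC COMMIT 500\n',
    'LOAD CSV WITH HEADERS FROM "file:///', file, '" AS csvLine\n',
    'MERGE (n:', label, ' {name: csvLine.', column, '})\n',
    'WITH csvLine\n',
    'MATCH (t:Tweet {name: csvLine.status_id})\n',
    'MATCH (n:', label, ' {name: csvLine.', column, '})\n',
    'MERGE (t) -[:', rel, ']-> (n);'
  )
}

# Same text as the Mentions query above, up to the variable name
mentions_query <- build_load_query(
  file = "ny_mentions.csv", label = "User",
  column = "mentions_screen_name", rel = "MENTIONS"
)
cat(mentions_query)
```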

Let’s see what we’ve got:

<span class="n">con</span><span class="o">$</span><span class="n">get_constraints</span><span class="p">()</span><span class="w">
</span>
## # A tibble: 4 x 3
##   label   type       property_keys
##   <chr>   <chr>      <chr>        
## 1 User    UNIQUENESS name         
## 2 Hashtag UNIQUENESS name         
## 3 Tweet   UNIQUENESS name         
## 4 Day     UNIQUENESS name
<span class="n">con</span><span class="o">$</span><span class="n">get_labels</span><span class="p">()</span><span class="w">
</span>
## # A tibble: 4 x 1
##   labels 
##   <chr>  
## 1 Tweet  
## 2 User   
## 3 Hashtag
## 4 Day
<span class="n">con</span><span class="o">$</span><span class="n">get_relationships</span><span class="p">()</span><span class="w">
</span>
## # A tibble: 5 x 1
##   relationships
##   <chr>        
## 1 WAS_SENT     
## 2 SENT         
## 3 CONTAINS     
## 4 MENTIONS     
## 5 REPLIES_TO

Let’s explore

Check check

Let’s start with a check to see if we have everything:

<span class="c1"># Have we got all the tweets?</span><span class="w">
</span><span class="nf">length</span><span class="p">(</span><span class="n">unique</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="n">ny</span><span class="o">$</span><span class="n">status_id</span><span class="p">,</span><span class="w"> </span><span class="n">ny</span><span class="o">$</span><span class="n">reply_to_status_id</span><span class="p">)))</span><span class="w">
</span>
## [1] 3041
<span class="s1">'MATCH (t:Tweet) RETURN COUNT(t) AS tweets_count'</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">call_api</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w">
</span>
## $tweets_count
## # A tibble: 1 x 1
##   value
##   <int>
## 1  3041
<span class="c1"># Have we got all the Days?</span><span class="w">
</span><span class="nf">length</span><span class="p">(</span><span class="n">unique</span><span class="p">(</span><span class="n">ny</span><span class="o">$</span><span class="n">day</span><span class="p">))</span><span class="w">
</span>
## [1] 6
<span class="s1">'MATCH (d:Day) RETURN COUNT(d) AS days_count'</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">call_api</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w">
</span>
## $days_count
## # A tibble: 1 x 1
##   value
##   <int>
## 1     6
<span class="c1"># Do we have all the users ? </span><span class="w">
</span><span class="nf">length</span><span class="p">(</span><span class="n">unique</span><span class="p">(</span><span class="n">ny</span><span class="o">$</span><span class="n">screen_name</span><span class="p">))</span><span class="w">
</span>
## [1] 1021
<span class="s1">'MATCH (u:User) RETURN COUNT(u) AS Users_count'</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">call_api</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w">
</span>
## $Users_count
## # A tibble: 1 x 1
##   value
##   <int>
## 1  1088
<span class="c1"># All the hashtags? </span><span class="w">
</span><span class="n">ny</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">select</span><span class="p">(</span><span class="n">status_id</span><span class="p">,</span><span class="w"> </span><span class="n">hashtags</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">tidyr</span><span class="o">::</span><span class="n">unnest</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">na.omit</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">distinct</span><span class="p">(</span><span class="n">hashtags</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">nrow</span><span class="p">()</span><span class="w">
</span>
## [1] 177
<span class="s1">'MATCH (h:Hashtag) RETURN COUNT(h) AS Hash_count'</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">call_api</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w">
</span>
## $Hash_count
## # A tibble: 1 x 1
##   value
##   <int>
## 1   177
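
The user counts differ (1021 in R, 1088 in Neo4J) because the Mentions import also MERGEs a User node for every mentioned account, including accounts that never tweeted with the hashtag themselves. A toy base R sketch of a comparison that accounts for this (the vectors below are made-up stand-ins for `ny$screen_name` and `ny$mentions_screen_name`):

```r
# Toy stand-ins: three tweets, their authors and the accounts they mention
authors  <- c("drob", "robinson_es", "drob")
mentions <- list("jaredlander", c("drob", "RLadiesNYC"), NA_character_)

# What the R-side check above counts: distinct authors only
n_authors <- length(unique(authors))

# What the graph holds: distinct authors plus distinct mentioned accounts
all_users <- unique(c(authors, unlist(mentions)))
all_users <- all_users[!is.na(all_users)]
n_nodes   <- length(all_users)
```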

Ok, so now that we have our data ready, let’s explore a little bit.

Who tweeted the most?

<span class="n">library</span><span class="p">(</span><span class="n">purrr</span><span class="p">)</span><span class="w">
</span><span class="s1">'MATCH (t:Tweet) <- [:SENT] - (u:User) 
RETURN u.name AS name, COUNT(u) AS count
ORDER BY COUNT(u) DESC
LIMIT 10'</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">call_api</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">bind_cols</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">set_names</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"user"</span><span class="p">,</span><span class="w"> </span><span class="s2">"n"</span><span class="p">))</span><span class="w">
</span>
## # A tibble: 10 x 2
##    user               n
##    <chr>          <int>
##  1 anushasharma9x   275
##  2 SK_convergence   172
##  3 robinson_es      110
##  4 christinezhang   105
##  5 rstatsbot1234     95
##  6 NoorDinTech       94
##  7 drob              71
##  8 b3njana           55
##  9 jaredlander       48
## 10 LaurusT001        43

You might see something surprising here: why do we have to bind_cols?
By design, {neo4r} does not bind columns for you, for the simple
reason that you can retrieve information that might not fit into a
single tidy data.frame.
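
To see what bind_cols() is doing here, the sketch below mimics the shape of a result by hand. It assumes, based on the outputs printed earlier in this post, that a call returns a named list holding one single-column tibble (column `value`) per returned element; the `res` object is a hypothetical stand-in, not real {neo4r} output.

```r
library(dplyr)

# Hypothetical stand-in for a call_api() result with two returned values
res <- list(
  name  = tibble(value = c("drob", "jaredlander")),
  count = tibble(value = c(71L, 48L))
)

# Both tibbles have a column called "value", so the names are repaired
# by bind_cols() and then overwritten with meaningful ones
out <- suppressMessages(bind_cols(res))
names(out) <- c("user", "n")
out
```

This only works because both elements have the same number of rows; a query returning elements of different lengths could not be bound this way, which is the case the design leaves open.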

Let’s put it straight into a dataviz:

<span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w">
</span><span class="s1">'MATCH (t:Tweet) <- [:SENT] - (u:User) 
RETURN u.name AS name, COUNT(u) AS count
ORDER BY COUNT(u) DESC
LIMIT 10'</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">call_api</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">bind_cols</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">set_names</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"user"</span><span class="p">,</span><span class="w"> </span><span class="s2">"n"</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">ggplot</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">aes</span><span class="p">(</span><span class="n">reorder</span><span class="p">(</span><span class="n">user</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="p">),</span><span class="w"> </span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">geom_col</span><span class="p">(</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">viridis</span><span class="o">::</span><span class="n">plasma</span><span class="p">(</span><span class="m">1</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">coord_flip</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"user"</span><span class="p">,</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"tweets"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme_minimal</span><span class="p">()</span><span class="w">
</span>

Daily tweets

How many tweets by day?

<span class="s1">'MATCH (t:Tweet) - [:WAS_SENT] -> (d:Day) 
RETURN d.name AS day, COUNT(d) AS count'</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">call_api</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">bind_cols</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">set_names</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"day"</span><span class="p">,</span><span class="w"> </span><span class="s2">"n"</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">mutate</span><span class="p">(</span><span class="n">day</span><span class="w">  </span><span class="o">=</span><span class="w"> </span><span class="nf">as.numeric</span><span class="p">(</span><span class="n">day</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">ggplot</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">aes</span><span class="p">(</span><span class="n">day</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">geom_col</span><span class="p">(</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">viridis</span><span class="o">::</span><span class="n">cividis</span><span class="p">(</span><span class="m">1</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">labs</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"day"</span><span class="p">,</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"tweets"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme_minimal</span><span class="p">()</span><span class="w">
</span>

What are the most used hashtags?

(excluding rstatsnyc, of course)

<span class="s1">'MATCH (t:Tweet) -[r:CONTAINS]->(h:Hashtag) 
WHERE NOT h.name = "rstatsnyc"
RETURN h.name as Hash, COUNT(h) AS count
ORDER BY COUNT(h) DESC
LIMIT 10'</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">call_api</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">bind_cols</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">set_names</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"hashtags"</span><span class="p">,</span><span class="w"> </span><span class="s2">"n"</span><span class="p">))</span><span class="w">
</span>
## # A tibble: 10 x 2
##    hashtags         n
##    <chr>        <int>
##  1 rstats         582
##  2 nycdatamafia   102
##  3 RStatsNYC       73
##  4 rladies         72
##  5 python          42
##  6 tidyverse       36
##  7 Rladies         19
##  8 datascience     17
##  9 rstatsNYC       14
## 10 RforEveryone    10
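
String comparison in Cypher is case-sensitive, which is why RStatsNYC and rstatsNYC survive the filter, and why rladies and Rladies are counted separately. One possible way to merge such variants on the R side is sketched below, using the ten rows printed above as data (so the totals only cover this top 10, not the whole graph):

```r
# The top 10 printed above, typed in by hand
top <- data.frame(
  hashtags = c("rstats", "nycdatamafia", "RStatsNYC", "rladies", "python",
               "tidyverse", "Rladies", "datascience", "rstatsNYC",
               "RforEveryone"),
  n = c(582L, 102L, 73L, 72L, 42L, 36L, 19L, 17L, 14L, 10L),
  stringsAsFactors = FALSE
)

# Sum the counts of hashtags that only differ by letter case
merged <- tapply(top$n, tolower(top$hashtags), sum)
merged <- sort(merged, decreasing = TRUE)
merged
```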

How many @drob or @robinson_es tweets?

(because apparently they were fighting 🙂):

Get the number of tweets:

<span class="s1">'MATCH (t:Tweet) <- [:SENT] - (u:User) 
WHERE u.name = "drob" OR u.name = "robinson_es" 
RETURN COUNT(u) AS count, u.name'</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">call_api</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">bind_cols</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">set_names</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"n"</span><span class="p">,</span><span class="w"> </span><span class="s2">"user"</span><span class="p">))</span><span class="w">
</span>
## # A tibble: 2 x 2
##       n user       
##   <int> <chr>      
## 1    71 drob       
## 2   110 robinson_es

Get the average number of retweets:

<span class="s1">'MATCH (t:Tweet) <- [:SENT] - (u:User) 
WHERE u.name = "drob" OR u.name = "robinson_es" 
RETURN u.name AS user, COUNT(u) AS count, AVG(t.retweet_count) as mean'</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">call_api</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">bind_cols</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">set_names</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"user"</span><span class="p">,</span><span class="w"> </span><span class="s2">"n"</span><span class="p">,</span><span class="w"> </span><span class="s2">"mean_RT"</span><span class="p">))</span><span class="w">
</span>
## # A tibble: 2 x 3
##   user            n mean_RT
##   <chr>       <int>   <dbl>
## 1 drob           71    12.8
## 2 robinson_es   110    10.1

Create a function…

… to get the number of tweets by a user

<span class="n">get_tweet_count</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">who</span><span class="p">){</span><span class="w">
  </span><span class="n">paste0</span><span class="p">(</span><span class="s1">'MATCH (t:Tweet) <- [:SENT] - (u:User {name: "'</span><span class="p">,</span><span class="w"> </span><span class="n">who</span><span class="p">,</span><span class="w"> </span><span class="s1">'"}) 
  RETURN COUNT(t) AS '</span><span class="p">,</span><span class="w"> </span><span class="n">who</span><span class="p">,</span><span class="w"> </span><span class="s2">";"</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">call_api</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="n">get_tweet_count</span><span class="p">(</span><span class="s2">"RLadiesNYC"</span><span class="p">)</span><span class="w">
</span>
## $RLadiesNYC
## # A tibble: 1 x 1
##   value
##   <int>
## 1    12
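
Since the function pastes its argument straight into the query text, it can be worth checking the input first. The sketch below only builds the query string and leaves the call to the database out; `build_count_query()` is a hypothetical variant, and the check relies on Twitter screen names being made of letters, digits and underscores.

```r
# Hypothetical variant of get_tweet_count() that only builds the query text
build_count_query <- function(who) {
  # Reject anything that is not a plain screen name before it reaches Cypher
  if (!grepl("^[A-Za-z0-9_]+$", who)) {
    stop("Unexpected characters in screen name: ", who)
  }
  sprintf(
    'MATCH (t:Tweet) <- [:SENT] - (u:User {name: "%s"})\nRETURN COUNT(t) AS %s;',
    who, who
  )
}

query <- build_count_query("RLadiesNYC")

# A string that tries to close the pattern early is refused
rejected <- inherits(
  try(build_count_query('x"}) DETACH DELETE u //'), silent = TRUE),
  "try-error"
)
```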

Who are the users who…

… are mentioned in a tweet containing the hashtag #Rladies?

<span class="s1">'MATCH (u:User) <- [m:MENTIONS] - (t:Tweet) - [:CONTAINS]-> (:Hashtag {name : "Rladies"})
RETURN u AS Name, COUNT(u) AS n'</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">call_api</span><span class="p">(</span><span class="n">con</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">bind_cols</span><span class="p">()</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">set_names</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="s2">"user"</span><span class="p">,</span><span class="w"> </span><span class="s2">"n"</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">arrange</span><span class="p">(</span><span class="n">desc</span><span class="p">(</span><span class="n">n</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">top_n</span><span class="p">(</span><span class="m">5</span><span class="p">)</span><span class="w">
</span>
## Selecting by n

## # A tibble: 6 x 2
##   user               n
##   <chr>          <int>
## 1 robinson_es       11
## 2 AnushkaSharma      6
## 3 SK_convergence     6
## 4 RLadiesNYC         6
## 5 drob               5
## 6 jtrnyc             5
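
Note that `top_n(5)` returned six rows: it keeps ties, and two users share the fifth-highest count. The ranking behind this can be sketched in base R. The counts are copied from the output above, and the last row (`hypothetical_user`) is made up so that something actually gets filtered out.

```r
# Counts copied from the output above, plus one made-up row
# ("hypothetical_user") so that the filter has something to drop.
mentions <- data.frame(
  user = c("robinson_es", "AnushkaSharma", "SK_convergence",
           "RLadiesNYC", "drob", "jtrnyc", "hypothetical_user"),
  n = c(11L, 6L, 6L, 6L, 5L, 5L, 1L),
  stringsAsFactors = FALSE
)

# Minimum rank on descending counts: tied values share the same rank.
rk <- rank(-mentions$n, ties.method = "min")

# Keeping rank <= 5 keeps both users tied on rank 5, hence six rows.
with_ties <- mentions[rk <= 5, ]

# For exactly five rows, sort and take the head; ties are broken by row order.
exactly_five <- head(mentions[order(-mentions$n), ], 5)

nrow(with_ties)
nrow(exactly_five)
```

In recent versions of {dplyr}, `slice_max(n, n = 5, with_ties = FALSE)` should give the strict five-row version.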

… were mentioned in a tweet containing the hashtag #rdogs:

<span class="n">library</span><span class="p">(</span><span class="n">ggraph</span><span class="p">)</span><span class="w">
</span><span class="s1">'MATCH (u:User) <- [m:MENTIONS] - (t:Tweet) - [:CONTAINS]-> (:Hashtag {name : "rdogs"})
RETURN u, m, t'</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">call_api</span><span class="p">(</span><span class="n">con</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"graph"</span><span class="p">)</span><span class="w">  </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">convert_to</span><span class="p">(</span><span class="s2">"igraph"</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">ggraph</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">geom_edge_link</span><span class="p">()</span><span class="o">+</span><span class="w">
  </span><span class="n">geom_node_label</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> 
                      </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">group</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"#rdogs and #rstatsnyc"</span><span class="p">,</span><span class="w">
       </span><span class="n">subtitle</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"data from Twitter"</span><span class="p">,</span><span class="w">
       </span><span class="n">caption</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"@_colinfay"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">theme_graph</span><span class="p">()</span><span class="w"> 
</span>

… mention @RLadiesNYC

<span class="n">library</span><span class="p">(</span><span class="n">ggraph</span><span class="p">)</span><span class="w">
</span><span class="s1">'MATCH (u:User) - [s:SENT] -> (t:Tweet) -[m:MENTIONS]-> (r:User {name:"RLadiesNYC"}) 
RETURN u, s, m, r'</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">call_api</span><span class="p">(</span><span class="n">con</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"graph"</span><span class="p">)</span><span class="w">  </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">convert_to</span><span class="p">(</span><span class="s2">"igraph"</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">ggraph</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">geom_edge_link</span><span class="p">()</span><span class="o">+</span><span class="w">
  </span><span class="n">geom_node_label</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> 
                      </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">group</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Mentions of RLadiesNYC"</span><span class="p">,</span><span class="w">
       </span><span class="n">subtitle</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"data from Twitter"</span><span class="p">,</span><span class="w">
       </span><span class="n">caption</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"@_colinfay"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> 
  </span><span class="n">theme_graph</span><span class="p">()</span><span class="w"> 
</span>
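
Both plots map `name` to the label and `group` to the colour, so they rely on the converted graph carrying these two vertex attributes. As a rough sketch of that shape (an assumption about the result of `convert_to("igraph")`, not a description of its internals), here is a small graph built by hand with {igraph}. All node names except `RLadiesNYC` are made up.

```r
library(igraph)

# Made-up nodes and relationships mimicking the User/Tweet model used above.
nodes <- data.frame(
  name  = c("user_a", "user_b", "tweet_1", "tweet_2", "RLadiesNYC"),
  group = c("User", "User", "Tweet", "Tweet", "User"),
  stringsAsFactors = FALSE
)
relationships <- data.frame(
  from = c("user_a", "user_b", "tweet_1", "tweet_2"),
  to   = c("tweet_1", "tweet_2", "RLadiesNYC", "RLadiesNYC"),
  type = c("SENT", "SENT", "MENTIONS", "MENTIONS"),
  stringsAsFactors = FALSE
)

# First two columns of `d` are the edge endpoints; extra columns become
# edge attributes, and extra columns of `vertices` become vertex attributes.
g <- graph_from_data_frame(d = relationships, directed = TRUE, vertices = nodes)

vcount(g)   # number of nodes
ecount(g)   # number of relationships
V(g)$group  # the attribute mapped to colour in the plots
```

An object like `g` can be passed to `ggraph()` with the same `geom_edge_link()` and `geom_node_label()` layers as above.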

{neo4r} on GitHub

To leave a comment for the author, please follow the link and comment on their blog: Colin Fay.
