
Querying Neo4j Aura from R with neo2R

[This article was first published on Patrice Godard, and kindly contributed to R-bloggers].


    1 Introduction

    Graph databases excel at storing and traversing highly connected data, such as the data behind recommendation engines, fraud detection, knowledge graphs, and social networks. Neo4j is one of the most widely used graph databases, and its managed cloud service, Neo4j Aura, makes it easy to spin up a production-grade instance without any infrastructure overhead.

    On the R side, the neo2R package has long been available for querying self-hosted Neo4j instances from R. Version 3.0.0 brings two important changes:

    1. Unified connection model: a single startGraph() call handles both a self-hosted Neo4j instance (http://localhost:7474) and a cloud Neo4j Aura instance (https://<id>.databases.neo4j.io).
    2. httr2 backend: the internal HTTP layer has migrated from the superseded httr package to httr2.

    In this post, we’ll connect to the free Neo4j Aura demo database preloaded with the classic Movie Recommendations dataset, explore the graph with Cypher queries, and finish with an interactive network visualization built with visNetwork.



    2 Prerequisites

    install.packages(c("neo2R", "dplyr", "visNetwork"))
    library(neo2R)
    library(dplyr)
    library(visNetwork)
    Note

    neo2R 3.0.0 requires R ≥ 4.1 and httr2 ≥ 1.0.0. Check your versions with packageVersion("neo2R") and packageVersion("httr2").
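If you want to check both requirements in one go, a small helper can compare the running versions against the minimums stated in the note. This is a minimal sketch; meets_neo2r_requirements() is a hypothetical name and is not part of neo2R.

```r
## Hypothetical helper (not part of neo2R): report whether this session
## meets the minimum versions stated in the note (R >= 4.1, httr2 >= 1.0.0).
meets_neo2r_requirements <- function() {
  r_ok <- getRversion() >= "4.1.0"
  ## requireNamespace() returns FALSE (quietly) when httr2 is not installed,
  ## in which case packageVersion() is never called
  httr2_ok <- requireNamespace("httr2", quietly = TRUE) &&
    utils::packageVersion("httr2") >= "1.0.0"
  c(R = r_ok, httr2 = httr2_ok)
}

meets_neo2r_requirements()
```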



    3 Connecting to Neo4j Aura


    3.1 Create and Connect to an Aura Instance

    Neo4j offers a free tier, Aura Free (up to 200k nodes and 400k relationships).

    Create a free instance at https://console.neo4j.io and note its connection details.

    Then connect to your instance with startGraph():

    my_aura <- startGraph(
      url = "https://<INSTANCEID>.databases.neo4j.io",
      database = "INSTANCEID",
      username = "INSTANCEID",
      password = "INSTANCEPASSWORD"
      ## api = "v2" is set automatically for *.databases.neo4j.io URLs
    )
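Hard-coding a password in a script makes it easy to leak by accident. One common alternative is to keep the credentials in environment variables (for example in ~/.Renviron) and build the startGraph() arguments from them. The sketch below assumes the hypothetical variable names AURA_INSTANCEID and AURA_PASSWORD, and it mirrors the placeholders of the call above, where the instance ID is also used as database and user name.

```r
## Hypothetical helper: build the startGraph() arguments for an Aura
## instance from environment variables. The variable names are assumptions;
## set them in ~/.Renviron or in your shell before starting R.
aura_connection_args <- function(id_var = "AURA_INSTANCEID",
                                 password_var = "AURA_PASSWORD") {
  instance_id <- Sys.getenv(id_var)
  password <- Sys.getenv(password_var)
  if (!nzchar(instance_id) || !nzchar(password)) {
    stop("Please set the ", id_var, " and ", password_var,
         " environment variables.", call. = FALSE)
  }
  list(
    url = sprintf("https://%s.databases.neo4j.io", instance_id),
    ## same placeholders as in the startGraph() call above
    database = instance_id,
    username = instance_id,
    password = password
  )
}

## Usage (requires neo2R and a running Aura instance):
## my_aura <- do.call(startGraph, aura_connection_args())
```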

    3.2 The Movie Recommendations Dataset

    Neo4j provides example datasets, and most of them are available as one-click templates in the Neo4j Aura console.

    The Movie Recommendations dataset is an example graph built around user movie ratings, designed for generating personalized, real-time recommendations. It is also available on a public demo server that can be accessed as follows:

    graph <- startGraph(
      url = "https://demo.neo4jlabs.com:7473",
      database = "recommendations",
      username = "recommendations",
      password = "recommendations"
    )

    4 Exploring the schema

    The Movie Recommendations database contains the following node labels and relationship types:

    Node label   Key properties
    ----------   -----------------------
    Movie        title, released, imdbId
    Genre        name
    Actor        name, born, imdbId
    Director     name, born, imdbId
    User         name

    Relationship type   Key properties
    -----------------   -----------------
    IN_GENRE
    ACTED_IN            role
    DIRECTED
    RATED               rating, timestamp


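    Actor and Director nodes also carry a shared Person label, as the counts below show. Let’s count the nodes per label combination and the relationships per type: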

    ## Node types and counts
    cypher(
      graph,
      "
      MATCH (n)
      RETURN labels(n) AS label, count(n) AS n
      ORDER BY n DESC
    "
    ) |>
      as_tibble() |>
      ## filtering out technical nodes
      filter(!label %in% c("_Bloom_Perspective_", "_Bloom_Scene_", ""))
    # A tibble: 6 × 2
      label                           n
      <chr>                       <int>
    1 Actor || Person             14956
    2 Movie                        9125
    3 Director || Person           3604
    4 User                          671
    5 Actor || Director || Person   487
    6 Genre                          20
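In this output, nodes with several labels are listed as a single string with the labels joined by " || " (for example "Actor || Person"), so the Person nodes are spread over three rows. A base-R sketch that splits these strings and totals the counts per individual label is shown below; the data frame simply copies the printed counts, and in practice you would pass the tibble returned by cypher() instead. Counting server-side with UNWIND labels(n) AS label in Cypher is an alternative.

```r
## Counts copied from the printed tibble above
label_counts <- data.frame(
  label = c("Actor || Person", "Movie", "Director || Person", "User",
            "Actor || Director || Person", "Genre"),
  n = c(14956L, 9125L, 3604L, 671L, 487L, 20L)
)

## Split each multi-label string on the literal " || " separator
parts <- strsplit(label_counts$label, " || ", fixed = TRUE)

## Repeat each count once per label it carries, then sum per label
per_label <- tapply(
  rep(label_counts$n, lengths(parts)),
  unlist(parts),
  sum
)
sort(per_label, decreasing = TRUE)
```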
    ## Relationship types and counts
    cypher(
      graph,
      "
      MATCH ()-[r]->()
      RETURN type(r) AS type, count(r) AS n
      ORDER BY n DESC
    "
    ) |>
      as_tibble() |>
      ## filtering out technical relationships
      filter(!type %in% c("_Bloom_HAS_SCENE_"))
    # A tibble: 4 × 2
      type          n
      <chr>     <int>
    1 RATED    100004
    2 ACTED_IN  35910
    3 IN_GENRE  20340
    4 DIRECTED  10007

    5 Querying with Cypher


    5.1 Most prolific actors

    cypher(
      graph,
      "
      MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
      RETURN p.name AS actor, count(m) AS movies
      ORDER BY movies DESC
      LIMIT 10
    "
    ) |>
      as_tibble()
    # A tibble: 10 × 2
       actor             movies
       <chr>              <int>
     1 Robert De Niro        56
     2 Bruce Willis          49
     3 Samuel L. Jackson     45
     4 Nicolas Cage          45
     5 Michael Caine         40
     6 Clint Eastwood        40
     7 Tom Hanks             38
     8 John Cusack           38
     9 Morgan Freeman        38
    10 Gene Hackman          38
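The last four rows tie at 38 movies, so if other actors also have 38 movies, LIMIT 10 keeps an arbitrary subset of them. One option is to drop LIMIT from the query and keep every row tied with the 10th value in R. The helper below is a hypothetical base-R sketch; dplyr's slice_max(movies, n = 10) behaves the same way, since it keeps ties by default.

```r
## Hypothetical helper: keep the top `n` rows by `col`, plus every row
## tied with the n-th value, sorted in decreasing order of `col`.
top_n_with_ties <- function(df, col, n = 10) {
  values <- df[[col]]
  ## value of the n-th largest entry (or of the last one if fewer rows)
  cutoff <- sort(values, decreasing = TRUE)[min(n, length(values))]
  top <- df[values >= cutoff, , drop = FALSE]
  top[order(-top[[col]]), , drop = FALSE]
}

## Usage, assuming the query above is run without its LIMIT clause:
## actor_counts <- cypher(graph, "...") |> as_tibble()
## top_n_with_ties(actor_counts, "movies", n = 10)
```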

    5.2 Movies and their directors

    cypher(
      graph,
      "
      MATCH (d:Person)-[:DIRECTED]->(m:Movie)
      RETURN m.title AS movie, m.released as released, d.name AS director
      ORDER BY m.released IS NOT NULL DESC, m.released DESC
      LIMIT 10
    "
    ) |>
      as_tibble()
    # A tibble: 10 × 3
       movie         released   director            
       <chr>         <chr>      <chr>               
     1 Solace        2016-09-02 "Afonso Poyart"     
     2 Ben-hur       2016-08-12 "Timur Bekmambetov" 
     3 Rustom        2016-08-12 "Tinu Suresh Desai" 
     4 Mohenjo Daro  2016-08-12 "Ashutosh Gowariker"
     5 Suicide Squad 2016-08-05 "David Ayer"        
     6 Shin Godzilla 2016-07-29 "Hideaki Anno"      
     7 Shin Godzilla 2016-07-29 " Shinji Higuchi"   
     8 Jason Bourne  2016-07-29 "Paul Greengrass"   
     9 Star Trek 3   2016-07-22 "Justin Lin"        
    10 Ghostbusters  2016-07-15 "Paul Feig"         
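This output shows two quirks of the data: a movie with two directors appears on two rows (Shin Godzilla), and some names carry stray whitespace (" Shinji Higuchi"). A base-R sketch that trims the names and collapses co-directors into one row per movie is shown below, using a few rows copied from the output above. Aggregating with collect(d.name) in the Cypher query itself would be an alternative.

```r
## A few rows copied from the printed tibble above
directed <- data.frame(
  movie = c("Shin Godzilla", "Shin Godzilla", "Jason Bourne"),
  released = c("2016-07-29", "2016-07-29", "2016-07-29"),
  director = c("Hideaki Anno", " Shinji Higuchi", "Paul Greengrass")
)

## Remove leading and trailing whitespace from director names
directed$director <- trimws(directed$director)

## One row per movie, with co-directors joined by ", "
by_movie <- aggregate(
  director ~ movie + released,
  data = directed,
  FUN = function(x) paste(x, collapse = ", ")
)
by_movie
```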

    5.3 Parameterised queries

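    neo2R supports named parameters: values are passed separately from the query text through the parameters argument and referenced as $name in Cypher, which keeps queries safe from injection and easy to reuse: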

    ## Find all co-stars of a given actor
    cypher(
      graph,
      "
      MATCH (a:Person {name: $actor})-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(co:Person)
      RETURN DISTINCT co.name AS co_star, m.title AS shared_movie
      ORDER BY co_star
      ",
      parameters = list(actor = "Tom Hanks")
    ) |>
      as_tibble()
    # A tibble: 114 × 2
       co_star            shared_movie        
       <chr>              <chr>               
     1 Adrian Zmed        Bachelor Party      
     2 Alexander Godunov  Money Pit, The      
     3 Amy Adams          Charlie Wilson's War
     4 Annie Rose Buckley Saving Mr. Banks    
     5 Audrey Tautou      Da Vinci Code, The  
     6 Ayelet Zurer       Angels & Demons     
     7 Barkhad Abdi       Captain Phillips    
     8 Barkhad Abdirahman Captain Phillips    
     9 Barry Pepper       Saving Private Ryan 
    10 Bill Paxton        Apollo 13           
    # ℹ 104 more rows
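Because the actor name travels in parameters rather than in the query string, the query can be wrapped into a reusable function. The sketch below assumes neo2R is attached (for cypher()) and that graph is the connection created earlier; co_stars() is a hypothetical name. The last lines show, for contrast, why pasting values into the query text is fragile: an apostrophe in a name closes the Cypher string literal early.

```r
## Hypothetical wrapper around the parameterised query above.
## Assumes neo2R is attached (for cypher()) and `graph` is a connection.
co_stars_query <- "
  MATCH (a:Person {name: $actor})-[:ACTED_IN]->(m:Movie)<-[:ACTED_IN]-(co:Person)
  RETURN DISTINCT co.name AS co_star, m.title AS shared_movie
  ORDER BY co_star
"

co_stars <- function(graph, actor) {
  stopifnot(is.character(actor), length(actor) == 1)
  cypher(graph, co_stars_query, parameters = list(actor = actor))
}

## Usage: co_stars(graph, "Tom Hanks") |> as_tibble()

## For contrast: pasting the value into the query text breaks on the
## apostrophe, which ends the Cypher string literal after "O".
pasted <- sprintf("MATCH (a:Person {name: '%s'}) RETURN a", "Peter O'Toole")
pasted
```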


    6 Network visualisation with visNetwork

    A graph database really shows its strengths when you draw the graph. Let’s pull Tom Hanks’s ego network (his movies and everyone who acted in them alongside him) and render it with visNetwork.


    6.1 Step 1 — Fetch nodes and edges

    ## Tom Hanks, his movies, and his co-stars
    hub <- "Tom Hanks"
    nodes_raw <- cypher(
      graph,
      "
      MATCH (hub:Person {name: $hub})-[hr:ACTED_IN]->(m:Movie)
      <-[cr:ACTED_IN]-(co:Person)
      RETURN hub.name AS hub, hr.role AS hub_role,
      m.title AS movie, m.year AS year,
      co.name AS co, cr.role AS co_role
      ",
      parameters = list(hub = hub)
    ) |>
      as_tibble()
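Each row of nodes_raw is one (movie, co-star) pair, so counting the distinct movies per co-star measures how strong each tie of the ego network is; that count could, for example, drive node size. The base-R sketch below runs on a small hypothetical data frame with the same column names as nodes_raw (the titles and roles are placeholders, not real data); dplyr::count(distinct(nodes_raw, co, movie), co) would give the same result.

```r
## Hypothetical rows with the same columns as nodes_raw (placeholders only)
nodes_raw_toy <- data.frame(
  hub = "Tom Hanks",
  hub_role = c("Role 1", "Role 1", "Role 2", "Role 3"),
  movie = c("Movie A", "Movie A", "Movie B", "Movie C"),
  year = c(1990L, 1990L, 1995L, 2000L),
  co = c("Co-star X", "Co-star Y", "Co-star X", "Co-star X"),
  co_role = c("Role 4", "Role 5", "Role 6", "Role 7")
)

## Keep distinct (co-star, movie) pairs, then count movies per co-star
pairs <- unique(nodes_raw_toy[c("co", "movie")])
shared <- as.data.frame(table(co = pairs$co), responseName = "shared_movies")
shared[order(-shared$shared_movies), ]
```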

    6.2 Step 2 — Shape data for visNetwork

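    visNetwork expects two data frames: nodes, with one row per node and a required id column (plus optional columns such as label, group, title, …), and edges, with required from and to columns that refer to node ids (plus optional columns such as title, arrows, …).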

    nodes <- bind_rows(
      nodes_raw |>
        distinct(
          id = hub,
          group = "Hub"
        ),
      nodes_raw |>
        distinct(
          id = co,
          group = "Co-star"
        ),
      nodes_raw |>
        distinct(
          id = movie,
          group = "Movie",
          year
        )
    ) |>
      distinct() |>
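      mutate(
        ## HTML tooltip shown on hover; movies also show their year
        title = sprintf(
          '<b>%s</b>: %s%s',
          group,
          id,
          ifelse(!is.na(year), sprintf(" (%s)", year), "")
        ),
        ## movies are drawn as dots, people as stars; the hub is larger
        shape = ifelse(group == "Movie", "dot", "star"),
        size = ifelse(group == "Hub", 30, 18)
      ) |>
      arrange(id)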
    
    edges <- bind_rows(
      nodes_raw |>
        distinct(
          from = hub,
          to = movie,
          role = hub_role
        ),
      nodes_raw |>
        distinct(
          from = co,
          to = movie,
          role = co_role
        )
    ) |>
      mutate(
        title = sprintf('<b>Role</b>: %s', role),
        arrows = "to"
      )
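Because the nodes use names as ids, it can be worth checking that the two data frames are consistent before drawing: node ids should be unique, and every edge endpoint should exist in nodes$id. A minimal base-R sketch; check_vis_data() is a hypothetical helper, not a visNetwork function.

```r
## Hypothetical helper: sanity-check visNetwork inputs before drawing.
## Returns a character vector describing problems (empty if none found).
check_vis_data <- function(nodes, edges) {
  problems <- character(0)
  if (anyDuplicated(nodes$id) > 0) {
    problems <- c(problems, "duplicated node ids")
  }
  ## endpoints referenced by edges but absent from the nodes table
  missing <- setdiff(unique(c(edges$from, edges$to)), nodes$id)
  if (length(missing) > 0) {
    problems <- c(problems, paste("edge endpoints without a node:",
                                  paste(missing, collapse = ", ")))
  }
  problems
}

## Usage: check_vis_data(nodes, edges) should return character(0)
```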

    6.3 Step 3 — Draw the network

    visNetwork(nodes, edges) |>
      ## groupname must match the values of nodes$group exactly
      visGroups(
        groupname = "Hub",
        shape = "star",
        color = list(
          background = "#3B82F6",
          border = "#1D4ED8",
          highlight = "#93C5FD"
        )
      ) |>
      visGroups(
        groupname = "Movie",
        ## same shape as the node-level shape column, so the legend matches
        shape = "dot",
        color = list(
          background = "#F97316",
          border = "#C2410C",
          highlight = "#FED7AA"
        )
      ) |>
      visGroups(
        groupname = "Co-star",
        shape = "star",
        color = list(
          background = "#6B7280",
          border = "#374151",
          highlight = "#D1D5DB"
        )
      ) |>
      visEdges(
        color = list(color = "#CBD5E1", highlight = "#3B82F6"),
        width = 1.5
      ) |>
      visOptions(
        highlightNearest = list(enabled = TRUE, degree = 1, hover = TRUE),
        nodesIdSelection = TRUE
      ) |>
      visLayout(randomSeed = 42) |>
      visPhysics(
        solver = "forceAtlas2Based",
        forceAtlas2Based = list(
          gravitationalConstant = -60,
          springLength = 120,
          springConstant = 0.04
        )
      ) |>
      visLegend(position = "right", main = "Node type")

    Hover over any node to see its tooltip. Select a node, either by clicking it or with the “Select by id” dropdown, to highlight its direct neighbors: for a co-star, these are the movies shared with Tom Hanks.
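Group styles set with visGroups() only apply to nodes whose group value matches groupname exactly, so "Movie" and "movie" count as different groups, and a mismatch simply leaves those nodes with default styling. The hypothetical helper below lists node groups that have no matching style; it is a base-R sketch, not part of visNetwork.

```r
## Hypothetical helper: list node groups with no matching visGroups() style.
## Matching is exact and case-sensitive.
unstyled_groups <- function(nodes, styled_groups) {
  setdiff(unique(nodes$group), styled_groups)
}

## Usage with the nodes data frame from Step 2:
## unstyled_groups(nodes, c("Hub", "Movie", "Co-star"))
```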


    7 Conclusion

    neo2R 3.0.0 makes it straightforward for R users to work with Neo4j Aura: a single startGraph() call handles cloud and local instances uniformly, the HTTP layer now relies on the actively maintained httr2 package, and the Cypher query interface remains exactly as it was.


    7.1 Further reading
