forked from davidlrosenblum/neo4r
-
Notifications
You must be signed in to change notification settings - Fork 1
/
README.Rmd
446 lines (317 loc) · 12.7 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
---
output: github_document
always_allow_html: yes
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
[![Travis-CI Build Status](https://travis-ci.org/statnmap/neo4r.svg?branch=master)](https://travis-ci.org/statnmap/neo4r)
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
system("docker run --name neo4j --env NEO4J_AUTH=neo4j/password --publish=7474:7474 --publish=7687:7687 -d neo4j")
library(neo4r)
con <- neo4j_api$new(
url = "http://localhost:7474",
user = "neo4j",
password = "password"
)
while (try(con$ping()) != 200){
Sys.sleep(3)
}
```
[![lifecycle](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)
> Disclaimer: this package is still under active development. Read the [NEWS.md](NEWS.md) to be informed of the last changes.
Read complementary documentation at [https://neo4j-rstats.github.io/user-guide/](https://neo4j-rstats.github.io/user-guide/)
# neo4r
The goal of {neo4r} is to provide a modern and flexible Neo4J driver for R.
It's modern in the sense that the results are returned as tibbles whenever possible, it relies on modern tools, and it is designed to work with pipes. Our goal is to provide a driver that can be easily integrated in a data analysis workflow, especially by providing an API working smoothly with other data analysis (`{dplyr}` or `{purrr}`) and graph packages (`{igraph}`, `{ggraph}`, `{visNetwork}`...).
It's flexible in the sense that it is rather unopinionated regarding the way it returns the results, by trying to stay as close as possible to the way Neo4J returns data. That way, you have the control over the way you will compute the results. At the same time, the result is not too complex, so that the "heavy lifting" of data wrangling is not left to the user.
The connexion object is also an easy to control R6 method, allowing you to update and query information from the API.
## Server Connection
Please note that __for now, the connection is only possible through http / https__.
## Installation
You can install {neo4r} from GitHub with:
```{r gh-installation, eval = FALSE}
# install.packages("remotes")
remotes::install_github("neo4j-rstats/neo4r")
```
or from CRAN :
```{r, eval = FALSE}
install.packages("neo4r")
```
## Create a connexion object
Start by creating a new connexion object with `neo4j_api$new`
```{r}
library(neo4r)
con <- neo4j_api$new(
url = "http://localhost:7474",
user = "neo4j",
password = "plop"
)
```
This connexion object is designed to interact with the Neo4J API.
It comes with some methods to retrieve information from it. `ping()`, for example, tests if the endpoint is available.
```{r}
# Test the endpoint, that will not work :
con$ping()
```
Being an R6 object, `con` is flexible in the sense that you can change `url`, `user` and `password` at any time:
```{r}
con$reset_user("neo4j")
con$reset_password("password")
con$ping()
```
Other methods:
```{r}
# Get Neo4J Version
con$get_version()
# List constaints (if any)
con$get_constraints()
# Get a vector of labels (if any)
con$get_labels()
# Get a vector of relationships (if any)
con$get_relationships()
# Get index
con$get_index()
```
## Call the API
You can either create a separate query or insert it inside the `call_neo4j` function.
The `call_neo4j()` function takes several arguments :
+ `query` : the cypher query
+ `con` : the connexion object
+ `type` : "rows" or "graph": whether to return the results as a list of results in tibble, or as a graph object (with `$nodes` and `$relationships`)
+ `output` : the output format (R or json)
+ `include_stats` : whether or not to include the stats about the call
+ `meta` : whether or not to include the meta arguments of the nodes when calling with "rows"
### The movie graph
Starting at version 0.1.3, the `play_movie()` function returns the full cypher query to create the movie graph example from the Neo4J examples.
```{r}
play_movies() %>%
call_neo4j(con)
```
### "rows" format
The user chooses whether or not to return a list of tibbles when calling the API. You get as many objects as specified in the RETURN cypher statement.
```{r}
library(magrittr)
'MATCH (tom {name: "Tom Hanks"}) RETURN tom;' %>%
call_neo4j(con)
'MATCH (cloudAtlas {title: "Cloud Atlas"}) RETURN cloudAtlas;' %>%
call_neo4j(con)
"MATCH (people:Person)-[relatedTo]-(:Movie {title: 'Cloud Atlas'}) RETURN people.name, Type(relatedTo), relatedTo" %>%
call_neo4j(con, type = 'row')
```
By default, results are returned as an R list of tibbles. For example here, `RETURN tom` will return a one element list, with object named `tom`. We think this is the more "truthful" way to implement the outputs regarding Neo4J calls.
When you want to return two nodes types, you'll get two results, in the form of two tibbles - the result is a two elements list with each element being labelled the way it has been specified in the Cypher query.
```{r}
'MATCH (tom:Person {name: "Tom Hanks"})-[:ACTED_IN]->(tomHanksMovies) RETURN tom,tomHanksMovies' %>%
call_neo4j(con)
```
Results can also be returned in JSON, for example for writing to a file:
```{r}
tmp <- tempfile(fileext = ".json")
'MATCH (people:Person) RETURN people.name LIMIT 1' %>%
call_neo4j(con, output = "json") %>%
write(tmp)
jsonlite::read_json(tmp)
```
If you turn the `type` argument to `"graph"`, you'll get a graph result:
```{r}
'MATCH (tom:Person {name: "Tom Hanks"})-[act:ACTED_IN]->(tomHanksMovies) RETURN act,tom,tomHanksMovies' %>%
call_neo4j(con, type = "graph")
```
The result is returned as one node or relationship by row.
Due to the specific data format of Neo4J, there can be more than one label and property by node and relationship. That's why the results is returned, by design, as a list-dataframe.
We have designed several functions to unnest the output :
+`unnest_nodes()`, that can unnest a node dataframe :
```{r}
res <- 'MATCH (tom:Person {name:"Tom Hanks"})-[a:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors) RETURN m AS acted,coActors.name' %>%
call_neo4j(con, type = "graph")
unnest_nodes(res$nodes)
```
Please, note that this function will return `NA` for the properties that aren't in a node.
Also, it is possible to unnest either the properties or the labels :
```{r}
res %>%
extract_nodes() %>%
unnest_nodes(what = "properties")
```
```{r}
res %>%
extract_nodes() %>%
unnest_nodes(what = "label")
```
+ `unnest_relationships()`
There is only one nested column in the relationship table, thus the function is quite straightforward :
```{r}
'MATCH (people:Person)-[relatedTo]-(:Movie {title: "Cloud Atlas"}) RETURN people.name, Type(relatedTo), relatedTo' %>%
call_neo4j(con, type = "graph") %>%
extract_relationships() %>%
unnest_relationships()
```
Note that `unnest_relationships()` only does one level of unnesting.
+ `unnest_graph`
This function takes a graph results, and does `unnest_nodes` and `unnest_relationships`.
```{r}
'MATCH (people:Person)-[relatedTo]-(:Movie {title: "Cloud Atlas"}) RETURN people.name, Type(relatedTo), relatedTo' %>%
call_neo4j(con, type = "graph") %>%
unnest_graph()
```
### Extraction
There are two convenient functions to extract nodes and relationships:
```{r}
'MATCH (bacon:Person {name:"Kevin Bacon"})-[*1..4]-(hollywood) RETURN DISTINCT hollywood' %>%
call_neo4j(con, type = "graph") %>%
extract_nodes()
```
```{r}
'MATCH p=shortestPath(
(bacon:Person {name:"Kevin Bacon"})-[*]-(meg:Person {name:"Meg Ryan"})
)
RETURN p' %>%
call_neo4j(con, type = "graph") %>%
extract_relationships()
```
## Convert for common graph packages
### {igraph}
In order to be converted into a graph object:
+ The nodes should be a dataframe with the first column being a series of unique ID, understood as "names" by igraph - these are the ID columns from Neo4J. Other columns are considered attributes.
+ relationships need a start and an end, *i.e.* startNode and endNode in the Neo4J results.
Here how to create a graph object from a `{neo4r}` result:
```{r}
G <- "MATCH a=(p:Person {name: 'Tom Hanks'})-[r:ACTED_IN]->(m:Movie) RETURN a;" %>%
call_neo4j(con, type = "graph")
library(dplyr)
library(purrr)
# Create a dataframe with col 1 being the ID,
# And columns 2 being the names
G$nodes <- G$nodes %>%
unnest_nodes(what = "properties") %>%
# We're extracting the first label of each node, but
# this column can also be removed if not needed
mutate(label = map_chr(label, 1))
head(G$nodes)
```
We then reorder the relationnship table:
```{r}
G$relationships <- G$relationships %>%
unnest_relationships() %>%
select(startNode, endNode, type, everything()) %>%
mutate(roles = unlist(roles))
head(G$relationships)
```
```{r}
graph_object <- igraph::graph_from_data_frame(
d = G$relationships,
directed = TRUE,
vertices = G$nodes
)
plot(graph_object)
```
This can also be used with `{ggraph}` :
```{r}
library(ggraph)
graph_object %>%
ggraph() +
geom_node_label(aes(label = label)) +
geom_edge_link() +
theme_graph()
```
### {visNetwork}
`{visNetwork}` expects the following format :
#### nodes
- "id" : id of the node, needed in edges information
- "label" : label of the node
- "group" : group of the node. Groups can be configure with visGroups
- "value" : size of the node
- "title" : tooltip of the node
#### edges
- "from" : node id of begin of the edge
- "to" : node id of end of the edge
- "label" : label of the edge
- "value" : size of the node
- "title" : tooltip of the node
(from `?visNetwork::visNetwork`).
`visNetwork` is smart enough to transform a list column into several label, so we don't have to worry too much about this one.
Here's how to convert our `{neo4r}` result:
```{r eval = FALSE}
G <-"MATCH a=(p:Person {name: 'Tom Hanks'})-[r:ACTED_IN]->(m:Movie) RETURN a;" %>%
call_neo4j(con, type = "graph")
# We'll just unnest the properties
G$nodes <- G$nodes %>%
unnest_nodes(what = "properties")
head(G$nodes)
# Turn the relationships :
G$relationships <- G$relationships %>%
unnest_relationships() %>%
select(from = startNode, to = endNode, label = type)
head(G$relationships)
visNetwork::visNetwork(G$nodes, G$relationships)
```
## Sending data to the API
You can simply send queries has we have just seen, by writing the cypher query and call the api.
### Transform elements to cypher queries
+ `vec_to_cypher()` creates a list :
```{r}
vec_to_cypher(iris[1, 1:3], "Species")
```
+ and `vec_to_cypher_with_var()` creates a cypher call starting with a variable :
```{r}
vec_to_cypher_with_var(iris[1, 1:3], "Species", a)
```
This can be combined inside a cypher call:
```{r}
paste("MERGE", vec_to_cypher(iris[1, 1:3], "Species"))
```
### Reading and sending a cypher file :
+ `read_cypher` reads a cypher file and returns a tibble of all the calls:
```{r}
read_cypher("data-raw/create.cypher")
```
+ `send_cypher` reads a cypher file, and send it the the API. By default, the stats are returned.
```{r}
send_cypher("data-raw/constraints.cypher", con)
```
### Sending csv dataframe to Neo4J
The `load_csv` sends an csv from an url to the Neo4J browser.
The args are :
+ `on_load` : the code to execute on load
+ `con` : the connexion object
+ `url` : the url of the csv to send
+ `header` : whether or not the csv has a header
+ `periodic_commit` : the volume for PERIODIC COMMIT
+ `as` : the AS argument for LOAD CSV
+ `format` : the format of the result
+ `include_stats` : whether or not to include the stats
+ `meta` : whether or not to return the meta information
Let's use Neo4J `northwind-graph` example for that.
```{r}
# Create the query that will create the nodes and relationships
on_load_query <- 'CREATE (n:Product)
SET n = row,
n.unitPrice = toFloat(row.unitPrice),
n.unitsInStock = toInteger(row.unitsInStock), n.unitsOnOrder = toInteger(row.unitsOnOrder),
n.reorderLevel = toInteger(row.reorderLevel), n.discontinued = (row.discontinued <> "0");'
# Send the csv
load_csv(url = "http://data.neo4j.com/northwind/products.csv",
con = con, header = TRUE, periodic_commit = 50,
as = "row", on_load = on_load_query)
```
### Using the Connection Pane
`{neo4r}` comes with a Connection Pane interface for RStudio.
Once installed, you can go to the "Connections", and use the widget to connect to the Neo4J server:
![](readmefigs/connectionpane.png)
## Sandboxing in Docker
You can get an RStudio / Neo4J sandbox with Docker :
```
docker pull colinfay/neo4r-docker
docker run -e PASSWORD=plop -e ROOT=TRUE -d -p 8787:8787 neo4r
```
## CoC
Please note that this project is released with a [Contributor Code of Conduct](CODE_OF_CONDUCT.md).
By participating in this project you agree to abide by its terms.
```{r include = FALSE}
system("docker stop neo4j && sleep 2 && docker rm neo4j")
```