Code examples for the Spark+AI Summit Europe 2019 talk "Maps and Meaning: Graph-based Entity Resolution". Dive into a toy example via the notebooks (ER-Graphframes & ER-GraphX).
For further details, or if you'd like to try this on a specific use case, please do get in touch.
Easy ways to get started:
- Get the pyspark Docker container (with GraphFrames preinstalled). It is available on Docker Hub.
- The `Dockerfile` explains the extra layer on top of the `jupyter/pyspark-notebook` base container (install of GraphFrames).
- The container can be launched via `docker-compose up`.
- Jupyter notebooks are running on `localhost:8888` (see `docker-compose.yml`).
- You can also run the test of the `gfresolver` from within the container to make sure everything works well.
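
For a feel of the idea before opening the notebooks, here is a minimal pure-Python sketch of graph-based entity resolution. It is not the code from the notebooks or from `gfresolver`. It assumes that two records belong to the same entity whenever they share at least one attribute value, so that entities are the connected components of the resulting graph. The function name `resolve_entities`, the record ids, and the `email`/`phone` fields are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def resolve_entities(records):
    """Group record ids into entities (hypothetical helper, not part of gfresolver).

    Assumption: two records are linked when they share any (field, value) pair;
    an entity is a connected component of that link graph.
    """
    # Union-find forest: every record starts as its own component.
    parent = {rid: rid for rid in records}

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller id as root so the result is deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    # Link each record to the first record seen with the same (field, value).
    first_seen = {}
    for rid, attrs in records.items():
        for pair in attrs.items():
            if pair[1] is None:
                continue  # missing values never link records
            if pair in first_seen:
                union(rid, first_seen[pair])
            else:
                first_seen[pair] = rid

    # Collect the members of each component.
    clusters = defaultdict(set)
    for rid in records:
        clusters[find(rid)].add(rid)
    return sorted(sorted(members) for members in clusters.values())


# Toy data (made up): r1 and r2 share an email, r2 and r3 share a phone number.
records = {
    "r1": {"email": "ann@example.com", "phone": "555-0100"},
    "r2": {"email": "ann@example.com", "phone": "555-0199"},
    "r3": {"email": "a.smith@example.com", "phone": "555-0199"},
    "r4": {"email": "bob@example.com", "phone": "555-0142"},
}
entities = resolve_entities(records)
print(entities)
```

Here `r1` and `r3` never share a value directly, yet they end up in the same entity through `r2`; that transitive linking is what the graph view adds over pairwise matching. On Spark, the same grouping step would typically be delegated to a distributed connected-components routine such as the one GraphFrames provides, rather than an in-memory union-find.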