Presently, Mazerunner can perform graph analysis on data already stored in Neo4j Server. However, one important missing feature is the ability to stream data out of Spark into Neo4j in real time, and then operate on that data.
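For illustration, here is a minimal sketch of how streamed Spark output might be grouped into Neo4j write batches. It uses an in-process `queue.Queue` as a stand-in for a real message broker, and the Cypher template (the `Node` label and `id`/`rank` properties) is an assumption, not Mazerunner's actual format:

```python
from queue import Queue

# Hypothetical Cypher template; the label and property names are assumptions.
CYPHER = ("UNWIND $rows AS row "
          "MERGE (n:Node {id: row.id}) "
          "SET n.rank = row.rank")

def drain_to_batches(q, batch_size=1000):
    """Drain queued Spark output records and group them into
    (cypher, params) pairs, sized for Neo4j's transactional endpoint."""
    statements, batch = [], []
    while not q.empty():
        batch.append(q.get())
        if len(batch) == batch_size:
            statements.append((CYPHER, {"rows": batch}))
            batch = []
    if batch:  # flush the final partial batch
        statements.append((CYPHER, {"rows": batch}))
    return statements
```

A consumer would then send each `(cypher, params)` pair to Neo4j in its own transaction; batching with `UNWIND` keeps the number of round trips small.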
Could you please provide an example of how this integration might work? What is your input to Spark, and what is the output? What are the acceptance criteria for this feature?
Hello, an example might be: I have terabytes of data in HDFS, comprising Ad Impressions, Ad Clicks, and ROI events driven by interactions with those impressions and clicks. There are concepts of a Browser, Ad, Impression, Click, and ROI event, with timestamps and IDs for everything. Using a Spark job, I would like to create a Neo4j graph at scale. I've tried to investigate how to scale the creation and insertion of the Neo4j data. It seems Mazerunner can take the output of a GraphX job and resubmit it via some queue, and that it could also build a graph from a basic Spark/GraphX query. Finally, I looked into the batch-import project, which seems very fast at creating the necessary Neo4j store files. It would also be great to batch in new data afterward.
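To make the data model above concrete, here is a hedged sketch of turning one such event into a parameterized Cypher statement. The event schema (`browser_id`, `ad_id`, `event_id`, `kind`, `ts`) and the relationship types are assumptions about the HDFS data, not a confirmed format; a Spark job could map a function like this over its records:

```python
def event_to_cypher(event):
    """Convert one ad event into a (cypher, params) pair.
    The schema and relationship names here are illustrative assumptions."""
    # Map the event kind onto a hypothetical node label.
    label = {"impression": "Impression", "click": "Click", "roi": "ROI"}[event["kind"]]
    cypher = (
        "MERGE (b:Browser {id: $browser_id}) "
        "MERGE (a:Ad {id: $ad_id}) "
        f"CREATE (e:{label} {{id: $event_id, ts: $ts}}) "
        "CREATE (b)-[:GENERATED]->(e)-[:ON]->(a)"
    )
    params = {k: event[k] for k in ("browser_id", "ad_id", "event_id", "ts")}
    return cypher, params
```

Using `MERGE` for the Browser and Ad nodes keeps them unique across events, while each Impression/Click/ROI event gets its own node; for the terabyte scale described, the resulting statements would still need to be grouped into large batched transactions (or fed to a bulk loader such as batch-import) rather than executed one at a time.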
An example of one such use case: http://stackoverflow.com/questions/28896898/using-neo4j-with-apache-spark