Skip to content

DATAVIEW is a big data workflow management system. It uses Dropbox as the data cloud and Amazon EC2 as the compute cloud. Current research focuses on the security and privacy aspects of DATAVIEW as well as performance and cost optimization for running workflows in clouds.

Notifications You must be signed in to change notification settings

shiyonglu/DATAVIEW

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DATAVIEW

DATAVIEW (www.dataview.org) is a big data workflow management system. It uses Dropbox as the data cloud and Amazon EC2 as the compute cloud. It also provides a workflow_LocalExecutor for users to run their local machine off the cloud. Current research focuses on the 1) performance and cost optimization for running workflows in clouds and 2) infrastructual-level support on GPU-enabled deep learning workflows. For deep learning workflows, it currently supports GPU infrastructures including 1) the Local NVIDIA GPU of a PC, 2) GPU Xavier and Nano SoMs (System-on-Module) and 3) the Heterogeneous GPU Cluster.

DATAVIEW supports two programing interfaces to develop and run workflows:

  1. JAVA API: A programmer can develop various workflow tasks and workflows based on the DATAVIEW libraries. /DATAVIEW/src/test.java shows the six steps to create a customized workflow and execute it in Amazon EC2 or Local PC environment.
  • The external dependecies libraries must be added to the Eclipse project from /DATAVIEW/WebContent/WEB-INF/lib
  • To utilize the Amazon EC2, the accessKey and secretKey should be updated in config.properties under /DATAVIEW/WebContent/workflowLibDir/
  • After finishing the workflow, please terminate all the EC2 instances from your AWS account manually (in the case of running worklfow in Amazon EC2).
  1. Visual Programming: DATAVIEW is deployed as a Web site in Tomcat and a user can drag and drop tasks and link them into a workflow in a visual workflow design and execution environment called Webbench.
  • A dropbox accout is necessary to store all the input data, workflow tasks, the final output files produced by the workflow execution. The user needs to create Three default folders Dropbox/DATAVIEW/Tasks, which stores the task file (class file or jar file); Dropbox/DATAVIEW/Workflows, which stores the mxgraph file for the generated workflow; Dropbox/DATAVIEW-INPUT, which stores the input files for a workflow. Four relational algebra tasks (jar files) and input files are already stored under the DATAVIEW/WebContent/workflowTaskDir folder.
  • A local account needs to be registered to show a visualized workflow.
  • A dropbox token should be provided in the main interface when you login in, which can be generated based on this tutorial:https://blogs.dropbox.com/developers/2014/05/generate-an-access-token-for-your-own-account/

Download and configure DATAVIEW as JAVA API

Check out tutorial: https://youtu.be/xJikeWptYSw or follow the instructions below:
  1. Download the DATAVIEW package from https://github.com/shiyonglu/DATAVIEW by clicking the "Clone or Download" button.
  2. Unzip the DATAVIEW-master.zip file and import the DATAVIEW project into Eclipse as an "Existing Projects into Workspace" by selecting "Projects from Folder or Archive".
  3. The external dependecies libraries must be added to the Eclipse project from /DATAVIEW/WebContent/WEB-INF/lib
  4. /DATAVIEW/src/test.java shows the six steps to create a new workflow and execute it with local executor.

Download, configure, and deploy DATAVIEW as a Website

Check out tutorial: https://youtu.be/7Sz4PSD_6Cs or follow the instructions below:
  1. Follow the first three steps from Download and configure DATAVIEW as JAVA API
  2. Create three default folders Dropbox/DATAVIEW/Tasks, which stores the task file (class file or jar file); Dropbox/DATAVIEW/Workflows, which stores the mxgraph file for the generated workflow; Dropbox/DATAVIEW-INPUT, which stores the input files for a workflow in your dropbox.
  3. Get a dropbox token.

Run Deep Learning workflow (NNWorkflow) in DATAVIEW on Local NVIDIA GPU

Check out The introduction of DlaaW (Deep-learning-as-a-workflow) in DATAVIEW : https://www.youtube.com/watch?v=3KDq5CTcrGE.

Below are some extra tips aside from instructions in Download, configure, and deploy DATAVIEW as a Website and Download, configure, and deploy DATAVIEW as a Website:

  1. JAVA API: A programmer can utilize various workflow NNasks and NNWorkflows based on the DATAVIEW libraries. /DATAVIEW/src/NNTest.java shows the 4 steps to create a customized NNWorkflow and execute it in one of NNTrainers (each corresponding to one specific execution plan and GPU infrastructure).
  • There is no need to install extra libraries or driver (e.g. CUDA toolkit) as long as you have a local NVIDIA GPU on your PC.
  • In order to run NNWorkflow Java API version, need tomcat version lower than or equal to tomcat 9 (Our recommendation is tomcat 9).
  1. Visual Programming: DATAVIEW is deployed as a Web site in Tomcat and a user can drag and drop tasks and link them into a NNWorkflow in a visual workflow design and execution environment called Webbench.
  • In order to run NNWorkflow Website version on your Local PC, need java jdk version less than or equal to 15 (Our recommendation is JAVA JDK 15).
  • To run NNWorkflow in web GUIs, you should copy following files from your local DATAVIEW TrainerDLLs and ExecutorDLLs folders from /DATAVIEW/WebContent/workflowTaskDir repository to the DATAVIEW-INPUT folder in your dropbox, files including jsoncpp.dll, maintest.dll, nnExecutor.dll

DATAVIEW Tutorials

  1. Chapter 1: A gentle introduction to DATAVIEW https://youtu.be/7S4iGKXpaAc)
  2. How to download, import DATAVIEW into Eclipse as Java API and run a workflow with local executor (https://youtu.be/xJikeWptYSw)
  3. How to create a relational algebra workflow in DATAVIEW through the interface (https://youtu.be/AQw0S_QO8zg)

About

DATAVIEW is a big data workflow management system. It uses Dropbox as the data cloud and Amazon EC2 as the compute cloud. Current research focuses on the security and privacy aspects of DATAVIEW as well as performance and cost optimization for running workflows in clouds.

Resources

Stars

Watchers

Forks

Packages

No packages published