Updates to CLI and deployment docs. Closes #1517 #1613

Merged: 18 commits merged on Apr 9, 2020
45 changes: 0 additions & 45 deletions docs/deployment/dockerized.rst

This file was deleted.

28 changes: 1 addition & 27 deletions docs/deployment/native.rst
@@ -55,17 +55,7 @@ By default, DeepForge will start on `http://localhost:8888`. However, the port c

Worker
~~~~~~
The DeepForge worker can be started with

.. code-block:: bash

deepforge start --worker

To connect to a remote deepforge instance, add the url of the DeepForge server:

.. code-block:: bash

deepforge start --worker http://myaddress.com:1234
The DeepForge worker (used with WebGME compute) enables users to connect their own machines for any required computation. It can be installed from `https://github.com/deepforge-dev/worker`. It is recommended to install `Conda <https://conda.io/en/latest/>`_ on the worker machine so any dependencies can be installed automatically.

Updating
~~~~~~~~
@@ -109,22 +99,6 @@ and navigate to `http://localhost:8888` to start using DeepForge!

Alternatively, if jobs are going to be executed on an external worker, run `./bin/deepforge start -s` locally and navigate to `http://localhost:8888`.

DeepForge Worker
~~~~~~~~~~~~~~~~
If you are using `./bin/deepforge start -s` you will need to set up a DeepForge worker (`./bin/deepforge start` starts a local worker for you!). DeepForge workers are slave machines connected to DeepForge which execute the provided jobs. This allows the jobs to access the GPU, etc., and provides a number of benefits over trying to perform deep learning tasks in the browser.

Once DeepForge is installed on the worker, start it with

.. code-block:: bash

./bin/deepforge start -w

Note: If you are running the worker on a different machine, put the address of the DeepForge server as an argument to the command. For example:

.. code-block:: bash

./bin/deepforge start -w http://myaddress.com:1234

Updating
~~~~~~~~
Updating can be done the same as any other git project; that is, by running `git pull` from the project root. Sometimes, the dependencies need to be updated so it is recommended to run `npm install` following `git pull`.
11 changes: 4 additions & 7 deletions docs/deployment/overview.rst
@@ -5,20 +5,17 @@ DeepForge Component Overview
----------------------------
DeepForge is composed of four main elements:

- *Server*: Main component hosting all the project information and is connected to by the clients.
- *Database*: MongoDB database containing DeepForge, job queue for the workers, etc.
- *Worker*: Slave machine performing the actual machine learning computation.
- *Client*: The connected browsers working on DeepForge projects.

Of course, only the *Server*, *Database* (MongoDB) and *Worker* need to be installed. If you are not going to execute any machine learning pipelines, installing the *Worker* can be skipped.
- *Server*: Main component hosting all the project information and is connected to by the clients.
- *Compute*: Connected computational resources used for executing pipelines.
- *Storage*: Connected storage resources used for storing project data artifacts such as datasets or trained model weights.

Component Dependencies
----------------------
The following dependencies are required for each component:

- *Server* (NodeJS v8.11.3)
- *Server* (NodeJS LTS)
- *Database* (MongoDB v3.0.7)
- *Worker*: NodeJS v8.11.3 (used for job management logic) and Python 3. If you are using the deepforge-keras extension, you will also need Keras and `TensorFlow <https://tensorflow.org>`_ installed.
- *Client*: We recommend using Google Chrome and do not support other browsers (for now). In other words, other browsers can be used at your own risk.

Configuration
19 changes: 19 additions & 0 deletions docs/deployment/quick_start.rst
@@ -0,0 +1,19 @@
Quick Start
===========
The recommended (and easiest) way to get started with DeepForge is using docker-compose. First, install `docker <https://docs.docker.com/engine/installation/>`_ and `docker-compose <https://docs.docker.com/compose/install/>`_.

Next, download the docker-compose file for DeepForge:

.. code-block:: bash

wget https://raw.githubusercontent.com/deepforge-dev/deepforge/master/docker/docker-compose.yml

Then start DeepForge using docker-compose:

.. code-block:: bash

docker-compose up

and now DeepForge can be used by opening a browser to `http://localhost:8888 <http://localhost:8888>`_!

For detailed instructions about deployment installations, check out our `deployment installation instructions <../getting_started/configuration.rst>`_. An example of customizing a deployment using docker-compose can be found `here <https://github.com/deepforge-dev/deepforge/tree/master/.deployment>`_.
133 changes: 96 additions & 37 deletions docs/fundamentals/custom_operations.rst
@@ -9,68 +9,127 @@ Operations are used in pipelines and have named inputs and outputs. When creatin

.. figure:: operation_editor.png
:align: center
:scale: 45 %

Editing the "Train" operation from the "CIFAR10" example
Editing the "TrainValidate" operation from the "redshift" example

The interface editor is provided on the left and presents the interface as a diagram showing the input data and output data as objects flowing into or out of the given operation. Selecting the operation node in the operation interface editor will expand the node and allow the user to add or edit attributes for the given operation. These attributes are exposed when using this operation in a pipeline and can be set at design time - that is, these are set when creating the given pipeline. The interface diagram may also contain light blue nodes flowing into the operation. These nodes represent "references" that the operation accepts as input before running. When using the operation, references will appear alongside the attributes but will allow the user to select from a list of all possible targets when clicked.
The interface editor is provided on the right and presents the interface as a diagram showing the input data and output data as objects flowing into or out of the given operation. Selecting the operation node in the operation interface editor will expand the node and allow the user to add or edit attributes for the given operation. These attributes are exposed when using this operation in a pipeline and can be set at design time - that is, these are set when creating the given pipeline. The interface diagram may also contain light blue nodes flowing into the operation. These nodes represent "references" that the operation accepts as input before running. When using the operation, references will appear alongside the attributes but will allow the user to select from a list of all possible targets when clicked.

.. figure:: operation_interface.png
:align: center
:scale: 85 %

The train operation accepts training data, a model and attributes for shuffling data, setting the batch size, and the number of epochs.
The TrainValidate operation accepts training data, a model and attributes for setting the batch size, and the number of epochs.

On the right of the operation editor is the implementation editor. The implementation editor is a code editor specially tailored for programming the implementations of operations in DeepForge. It also is synchronized with the interface editor. A section of the implementation is shown below:
The operation editor also provides an interface to specify operation Python dependencies. DeepForge uses
:code:`conda` to manage Python dependencies for an operation. This pairs well with the integration of the various compute platforms available to the user; the only requirement is to have Conda installed on the compute platform. You can specify operation dependencies using a conda environment `file <https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#create-env-file-manually>`_ as shown in the diagram below:


.. figure:: operation_environment.png
:align: center

The operation environment contains python dependencies for the given operation.

To the left of the operation editor is the implementation editor. The implementation editor is a code editor specially tailored for programming the implementations of operations in DeepForge. It also is synchronized with the interface editor. A section of the implementation is shown below:

.. code:: python

    import numpy as np
    from sklearn.model_selection import train_test_split
    import keras
    import time
    from matplotlib import pyplot as plt
    import tensorflow as tf

    class Train():
        def __init__(self, model, shuffle=True, epochs=100, batch_size=32):
            self.model = model
            self.epochs = epochs
            self.shuffle = shuffle
            self.batch_size = batch_size
            return

        def execute(self, training_data):
            (x_train, y_train) = training_data
            opt = keras.optimizers.rmsprop(lr=0.0001, decay=1e-6)
            self.model.compile(loss='categorical_crossentropy',
                               optimizer=opt,
                               metrics=['accuracy'])
            plot_losses = PlotLosses()
            self.model.fit(x_train, y_train,
                           self.batch_size,
                           epochs=self.epochs,
                           callbacks=[plot_losses],
                           shuffle=self.shuffle)
            model = self.model
            return model

    config = tf.compat.v1.ConfigProto()
    config.gpu_options.allow_growth = True
    sess = tf.compat.v1.Session(config=config)

The "Train" operation uses capabilities from the :code:`keras` package to train the neural network. This operation sets all the parameters using values provided to the operation as either attributes or references. In the implementation, attributes are provided as arguments to the constructor making the user defined attributes accessible from within the implementation. References are treated similarly to operation inputs and are also arguments to the constructor. This can be seen with the :code:`model` constructor argument. Finally, operations return their outputs in the :code:`execute` method; in this example, it returns a single output named :code:`model`, that is, the trained neural network.
    class TrainValidate():
        def __init__(self, model, epochs=10, batch_size=32):
            self.model = model
            self.batch_size = batch_size
            self.epochs = epochs
            np.random.seed(32)
            return

After defining the interface and implementation, we can now use the "Train" operation in our pipelines! An example is shown below.
        def execute(self, dataset):
            model = self.model
            model.summary()
            model.compile(optimizer='adam',
                          loss='sparse_categorical_crossentropy',
                          metrics=['sparse_categorical_accuracy'])
            X = dataset['X']
            y = dataset['y']
            y_cats = self.to_categorical(y)
            model.fit(X, y_cats,
                      epochs=self.epochs,
                      batch_size=self.batch_size,
                      validation_split=0.15,
                      callbacks=[PlotLosses()])
            return model.get_weights()

        def to_categorical(self, y, max_y=0.4, num_possible_classes=32):
            one_step = max_y / num_possible_classes
            y_cats = []
            for values in y:
                y_cats.append(int(values[0] / one_step))
            return y_cats

        def datagen(self, X, y):
            # Generates a batch of data
            X1, y1 = list(), list()
            n = 0
            while 1:
                for sample, label in zip(X, y):
                    n += 1
                    X1.append(sample)
                    y1.append(label)
                    if n == self.batch_size:
                        yield [[np.array(X1)], y1]
                        n = 0
                        X1, y1 = list(), list()


    class PlotLosses(keras.callbacks.Callback):
        def on_train_begin(self, logs={}):
            self.i = 0
            self.x = []
            self.losses = []

        def on_epoch_end(self, epoch, logs={}):
            self.x.append(self.i)
            self.losses.append(logs.get('loss'))
            self.i += 1
            self.update()

        def update(self):
            plt.clf()
            plt.title("Training Loss")
            plt.ylabel("CrossEntropy Loss")
            plt.xlabel("Epochs")
            plt.plot(self.x, self.losses, label="loss")
            plt.legend()
            plt.show()

The "TrainValidate" operation uses capabilities from the :code:`keras` package to train the neural network. This operation sets all the parameters using values provided to the operation as either attributes or references. In the implementation, attributes are provided as arguments to the constructor making the user defined attributes accessible from within the implementation. References are treated similarly to operation inputs and are also arguments to the constructor. This can be seen with the :code:`model` constructor argument. Finally, operations return their outputs in the :code:`execute` method; in this example, it returns a single output named :code:`model`, that is, the trained neural network.

After defining the interface and implementation, we can now use the "TrainValidate" operation in our pipelines! An example is shown below.

.. figure:: train_operation.png
:align: center
:scale: 85 %

Using the "Train" operation in a pipeline
Using the "TrainValidate" operation in a pipeline

Operation feedback
Operation Feedback
------------------
Operations in DeepForge can generate metadata about its execution. This metadata is generated during the execution and provided back to the user in real-time. An example of this includes providing real-time plotting feedback. When implementing an operation in DeepForge, this metadata can be created using the :code:`matplotlib` plotting capabilities.
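
As an example of this idea, the following minimal sketch (a hypothetical operation, not taken from the example above) produces a matplotlib figure from within :code:`execute`; much like the :code:`PlotLosses` callback shown earlier, the generated plot is surfaced to the user as operation feedback.

.. code:: python

    from matplotlib import pyplot as plt

    class ReportLoss():
        def execute(self, losses):
            # plotting inside an operation produces metadata that is shown to the user in real time
            plt.plot(range(len(losses)), losses, label='loss')
            plt.xlabel('Epoch')
            plt.ylabel('Loss')
            plt.legend()
            plt.show()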

.. figure:: graph_example.png
.. figure:: plotloss.png
:align: center
:scale: 75 %

An example graph of the loss function while training a neural network

Detailed information about the available operation metadata types can be found in the `reference <reference/feedback_mechanisms.rst>`_.
An example graph of the loss function while training a neural network.
25 changes: 25 additions & 0 deletions docs/fundamentals/integration.rst
@@ -0,0 +1,25 @@
Storage and Compute Adapters
============================
DeepForge is designed to integrate with existing computational and storage resources and is not intended to be a competitor to existing HPC or object storage frameworks.
This integration is made possible through the use of compute and storage adapters. This section provides a brief description of these adapters as well as currently supported integrations.

Storage Adapters
----------------
Projects in DeepForge may contain artifacts which reference datasets, trained model weights, or other associated binary data. Although the project code, pipelines, and models are stored in MongoDB, this associated data is stored using a storage adapter. Storage adapters enable DeepForge to store this associated data using an appropriate storage resource, such as an object store with an S3-compatible API.
This also enables users to "bring their own storage" as they can connect their existing cyberinfrastructure to a public deployment of DeepForge.
Currently, DeepForge supports 3 different storage adapters:

1. S3 Storage: Object storage with an S3-compatible API such as `minio <https://play.min.io>`_ or `AWS S3 <https://aws.amazon.com/s3/>`_
2. SciServer Files Service: Files service from `SciServer <https://sciserver.org>`_
3. WebGME Blob Server: Blob storage provided by `WebGME <https://webgme.org/>`_
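
As a purely conceptual sketch (the class and method names below are hypothetical and not DeepForge's actual adapter API), a storage adapter can be thought of as a small object that knows how to read and write artifact data against one particular backend:

.. code:: python

    from abc import ABC, abstractmethod

    class StorageBackend(ABC):
        """Illustrative only: each backend (S3, SciServer Files, WebGME blob) provides its own implementation."""

        @abstractmethod
        def put_file(self, path, data):
            """Store binary artifact data under the given path."""

        @abstractmethod
        def get_file(self, path):
            """Retrieve the binary artifact data stored under the given path."""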

Compute Adapters
----------------
Similar to storage adapters, compute adapters enable DeepForge to integrate with existing cyberinfrastructure used for executing some computation or workflow. This is designed to allow users to leverage their existing HPC or other computational resources with DeepForge. Compute adapters provide an interface through which DeepForge is able to execute workflows (e.g., training a neural network) on external machines.

Currently, the following compute adapters are available:

1. WebGME Worker: A worker machine which polls for jobs via the `WebGME Executor Framework <https://github.com/webgme/webgme/wiki/GME-Executor-Framework>`_. Registered users can connect their own compute machines enabling them to use their personal desktops with DeepForge.
2. SciServer-Compute: Compute service offered by `SciServer <https://sciserver.org>`_
3. Server Compute: Execute the job on the server machine. This is similar to the execution model used by Jupyter notebook servers.

Binary file modified docs/fundamentals/operation_editor.png
Binary file added docs/fundamentals/operation_environment.png
Binary file modified docs/fundamentals/operation_interface.png
Binary file added docs/fundamentals/plotloss.png
Binary file modified docs/fundamentals/train_operation.png