
Add details to inference and serialization documentation #144

Merged 9 commits on Feb 20, 2019
34 changes: 33 additions & 1 deletion docs/design_documents/inference.md
@@ -2,7 +2,7 @@

## Overview

Inference in MXFusion is broken down into a few logical pieces that can be combined together as necessary. MXFusion relies on MXNet's Gluon as the underlying computational engine.

The highest-level object you'll deal with will be derived from the ```mxfusion.inference.Inference``` class. This is the outer loop that drives the inference algorithm, holds the relevant parameters and models for training, and handles serialization after training. At a minimum, ```Inference``` objects take as input the ```InferenceAlgorithm``` to run. On creation, an ```InferenceParameters``` object is created and attached to the ```Inference``` method, which stores and manages the (MXNet) parameters during inference.
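
As a minimal, hedged sketch of that relationship (assuming ```alg``` is an already-constructed ```InferenceAlgorithm```, and assuming the attached ```InferenceParameters``` object is exposed as ```infr.params```):

```python
from mxfusion.inference import GradBasedInference

# The outer loop object; creating it also creates and attaches an
# InferenceParameters object for managing MXNet parameters.
infr = GradBasedInference(inference_algorithm=alg)
params = infr.params  # assumed accessor for the attached InferenceParameters
```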

@@ -126,3 +126,35 @@

## Inference Internals

This section goes into more detail about the steps that happen under the hood when an inference method is actually run.

Pseudocode for this process, for reference:
```python

# Build the model and mark which variables are observed.
m = make_model()                      # user-defined model construction
observed = [m.y, m.x]

# Create a posterior copy of the model and an SVI algorithm over both.
q = Posterior(model=m)
alg = StochasticVariationalInference(model=m, observed=observed, posterior=q)

# Drive the algorithm with a gradient-based outer loop and run it.
infr = GradBasedInference(inference_algorithm=alg, grad_loop=BatchInferenceLoop())
infr.initialize(y=y, x=x)             # y, x: MXNet NDArrays of observed data
infr.run(max_iter=1, learning_rate=1e-2, y=y, x=x)

```


As discussed above, the first thing that happens for a variational inference method is to create a ```Posterior``` from the ```Model```. This makes a copy of the model that can then be changed without altering the structure of the original model while allowing the user to logically reference the same variable in the model and posterior.
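
A hedged sketch of what that means in practice, reusing ```m``` from the pseudocode above and assuming it contains a latent variable named ```z``` (attribute access on the posterior copy is an assumption here):

```python
# The posterior is a structural copy of the model, so the same logical
# variable can be referenced through either graph.
q = Posterior(model=m)
latent_in_model = m.z      # the variable as it appears in the model
latent_in_posterior = q.z  # the corresponding variable in the posterior copy
# Modifying q (e.g. attaching an approximating factor) leaves m's structure intact.
```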

When the ```InferenceAlgorithm``` object is created (```StochasticVariationalInference``` above), references to the ```Model``` and ```Posterior``` objects are kept but no additional MXNet memory or parameters are allocated at this time.

When the ```Inference``` (```GradBasedInference``` above) object is created, again, references to the graph objects are kept but no MXNet memory is allocated yet. An ```InferenceParameters``` object is also created but parameters are not created in it yet.

Some ```Inference``` classes need their ```initialize(...)``` method to be called explicitly before ```run(...)```, but for most it is enough to call ```run(...)``` with the appropriate arguments; it will call initialize itself before proceeding with the run step.

When ```run(**kwargs)``` is called, three primary steps happen:
1. ```Inference.initialize()``` is called if not already initialized. This derives the correct shapes of everything from the data passed in via ```kwargs``` and initializes all of the MXNet Parameter objects needed for the computation.
2. ```Inference.create_executor()``` is called (which calls its ```InferenceAlgorithm.create_executor()``` method) to create an ```ObjectiveBlock```. This is an MXNet Gluon ```HybridBlock```, and it is the primary computational graph object that gets executed to perform inference in MXFusion.
    * If desired, this block can be hybridized and saved down into a symbolic graph for reloading by passing in ```hybridize=True``` when initializing your ```Inference``` object, as sketched after this list. See the MXNet Gluon documentation on [hybrid mode](https://mxnet.incubator.apache.org/tutorials/gluon/hybrid.html) for more details.
3. The ```ObjectiveBlock```, or ```executor```, created in the last step is then run, passing the data through the MXNet compute graph that was constructed.
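
As a hedged sketch of the hybrid-mode option mentioned in step 2, reusing ```alg```, ```y```, and ```x``` from the pseudocode above:

```python
# Ask the Inference object to hybridize its ObjectiveBlock, letting MXNet
# compile the computation into a symbolic graph.
infr = GradBasedInference(inference_algorithm=alg,
                          grad_loop=BatchInferenceLoop(),
                          hybridize=True)
infr.run(max_iter=1, learning_rate=1e-2, y=y, x=x)
```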
40 changes: 40 additions & 0 deletions docs/design_documents/serialization.md
@@ -0,0 +1,40 @@
# Serialization

## Saving

Saving your work in MXFusion is straightforward.
Saving an inference method will save the model, any additional graphs used in the inference method, the state of the parameters at the end of inference, and any relevant configuration and constants used for the inference. Simply call ```.save``` on the inference method you want to save, after running it.

```python
infr.save(prefix=PREFIX)
```


The model and other graphs are all saved into a single JSON file using NetworkX's [JSON graph format](https://networkx.github.io/documentation/latest/reference/readwrite/json_graph.html). MXFusion ModelComponents are serialized into JSON objects (see ```mxfusion.util.graph_serialization```) and Modules are stored recursively as sub-graphs inside the same JSON structure. The most important information attached to a ModelComponent when it is saved is its place in the graph topology, its UUID, and its 'name' attribute, as these are what allow us to reload the graph and parameters successfully later. It is important to note that only a skeleton of each graph is actually saved, and that the model creation code must be re-run at load time.
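
As a hedged illustration of the underlying mechanism, here is plain NetworkX round-tripping a skeleton graph through its JSON node-link format (this is not MXFusion's actual serialization code, and the UUID keys are made up):

```python
import networkx as nx
from networkx.readwrite import json_graph

# A skeleton graph: nodes keyed by UUID, edges carrying the topology.
g = nx.DiGraph()
g.add_edge('uuid-aaaa', 'uuid-bbbb')

data = json_graph.node_link_data(g)    # plain, JSON-serializable dict
g2 = json_graph.node_link_graph(data)  # reload into an equivalent graph
```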


> *If you're curious why we can't save all of it: we can't serialize things containing arbitrary code, like the MXNet Blocks that users can use within MXFusion as Functions in their FactorGraphs.*


The parameters and constants are saved using MXNet Gluon's serialization functionality, as a map of UUID to numerical value. Any other inference configuration is saved into a simple JSON file.
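
As a hedged sketch of the idea using plain MXNet (the UUID key and file name here are made up):

```python
import mxnet as mx

# MXNet can serialize a dict of str -> NDArray in one call, which matches
# the UUID -> value map described above.
params = {'some-uuid': mx.nd.array([0.1, 0.2, 0.3])}
mx.nd.save('params_file', params)   # write the map to disk
loaded = mx.nd.load('params_file')  # returns the dict
```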

## Loading back to MXFusion

Loading inference results back into MXFusion is also straightforward. Before loading, re-run the model/posterior and inference creation code that you ran when you trained the Inference method. Then call ```.load``` on the newly created inference method, passing in the relevant file names from the save step.

```python
infr2.load(graphs_file=PREFIX+'_graphs.json',
           parameters_file=PREFIX+'_params.json')
```

The loading process has three major steps. The first is to reload the graphs and parameters from files into memory. The second is to reconcile those loaded graphs and parameters with the current model and inference method. The third is to load the rest of the configuration.

The first step uses NetworkX to load the graphs back in as skeleton FactorGraphs (not full Models or Posteriors: the nodes are basic ModelComponents carrying connections, rather than Variables and Factors with information such as which type of distribution a Factor is), because only minimal topology and naming information is saved during serialization. It uses MXNet to load the parameters back into Gluon Parameters.

The second step traverses the loaded skeleton FactorGraphs and attempts to match the variables in those graphs to the corresponding variables in the current model that you ran before loading. When it finds a match, it loads the corresponding parameter into the current inference's parameters and makes a note of the match. It then performs this process recursively for all variables in all of the graphs. The reconciliation process uses the UUIDs and names of the variables and the topology of the graphs, but it isn't perfect and may sometimes fail due to ambiguities in the graph. If this happens, try naming more variables explicitly by attaching them to the graph directly, i.e. ```m.node = Variable()``` as sketched below (or file an issue!).
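
A minimal sketch of that naming advice (assuming ```Model``` and ```Variable``` are importable from the top-level package):

```python
from mxfusion import Model, Variable

m = Model()
# Attaching the variable to the model gives it a stable name ('node'),
# which helps the reconciliation step match it up again at load time.
m.node = Variable()
```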

The third step simply loads the remaining configuration from the JSON configuration file into the inference method.

## Hybridizing and loading from native MXNet

TBD. See issue #109.