v0.9.0 #143

Merged: 62 commits (Dec 30, 2020)
Commits
73376a1
adding functionality for DAG
pzivich Jul 2, 2020
db05662
adding direct input from networkx object
pzivich Jul 5, 2020
7f64ac4
adding functionality to load and optional networkx dependency
pzivich Jul 5, 2020
18bc89a
adding tests for causal graphs
pzivich Jul 5, 2020
3841f3e
adding better error handling
pzivich Jul 5, 2020
af43b80
adding documentation for DirectedAcyclicGraph
pzivich Jul 5, 2020
7b38350
travis updates
pzivich Jul 5, 2020
fc145d7
versioning numpy on travis for python 3.5
pzivich Jul 5, 2020
f2acde7
init 0.9.0 and new travis setup
pzivich Jul 12, 2020
d918231
optional import in init
pzivich Jul 12, 2020
1e8635b
Merge branch 'dag' into v0.9.0
pzivich Jul 12, 2020
deae6ee
Cross-fit AIPTW implementations
pzivich Jul 19, 2020
a429d99
adding tests for causal graphs
pzivich Jul 19, 2020
ac548ef
aipw_calculator as function in utils
pzivich Jul 19, 2020
db7b8f5
fixing returning IDs via #126
pzivich Aug 3, 2020
e715ba2
a working zipper plot implementation
pzivich Aug 3, 2020
900dce4
Adoption of SuperLearner into zEpid. Adds EmpiricalMeanSL and Stepwis…
pzivich Dec 12, 2020
037eb1c
Finished both forward and backward step-wise selection for the models
pzivich Dec 12, 2020
1944271
SuperLearner implementation added. Still need to compare to R, write …
pzivich Dec 13, 2020
2c77db3
Added examples of use to documentation
pzivich Dec 13, 2020
686dc31
Adding test to SuperLearner and the candidate estimators
pzivich Dec 13, 2020
92f9a30
Finished tests
pzivich Dec 14, 2020
f45da47
Having SuperLearner `return self` so it works with the current setup …
pzivich Dec 14, 2020
ef71e67
adding custom_model to AIPW
pzivich Dec 15, 2020
6569882
replacing bounding in TMLE with new format
pzivich Dec 15, 2020
4f3a1e6
further tweaks to superlearner
pzivich Dec 15, 2020
744e1a9
adding better tests for TMLE with custom_model
pzivich Dec 15, 2020
5d9e409
finishing replacement of _bounding_ with zepid.calc.utils.probability…
pzivich Dec 15, 2020
fc68556
updating readthedocs to have Super Learner documentation
pzivich Dec 16, 2020
fac0e07
Adding risk-ratio support for AIPTW and CrossfitAIPTW
pzivich Dec 16, 2020
297b9ba
Adding seeds for partitioning procedure for reproducibility
pzivich Dec 16, 2020
8aee128
Adding example data set for the crossfit estimators
pzivich Dec 16, 2020
1b6f653
AIPTW cross-fitting fit tests. Just check that current operationaliza…
pzivich Dec 16, 2020
8fecc0f
adding CrossfitTMLE to cross-fitting estimators
pzivich Dec 17, 2020
69151dd
fixing random_state flow
pzivich Dec 27, 2020
065f1a3
adding AIPTW OOB detector
pzivich Dec 27, 2020
983428f
adding additional superlearner probability bound check for predict_proba
pzivich Dec 27, 2020
a24ce59
adding diagnostics to cross-fit estimators
pzivich Dec 27, 2020
386487c
updating setup.py to drop python 3.5 support
pzivich Dec 27, 2020
8a7e7f8
fixing AIPW warning to avoid misdetection for continuous outcomes
pzivich Dec 27, 2020
7596e8a
adding more tests for the cross-fit estimators
pzivich Dec 27, 2020
f5ac31a
some test fixes. I think it will still fail due to seeds though
pzivich Dec 27, 2020
061f0b1
should fix travisci issue...
pzivich Dec 28, 2020
095aa13
Alternative way to specify RandomState for travisci compat?
pzivich Dec 28, 2020
44a1e27
checking if on numpy or pandas side for differences
pzivich Dec 28, 2020
076458d
well it isn't numpy, so maybe it is pandas...
pzivich Dec 28, 2020
984cf10
the issue is reproducibility is not a function of the sample splittin…
pzivich Dec 28, 2020
a0bc459
fix travis-requirements typo...
pzivich Dec 28, 2020
cf9837b
adding GLM SuperLearner functionality (since sci-kit learn has extra …
pzivich Dec 28, 2020
c04ae71
Updating crossfit and other DR estimators to use GLMSL rather than sk…
pzivich Dec 28, 2020
e819b47
fixing equal to all_close in GMLSL tests
pzivich Dec 28, 2020
ae5cbee
adding function to load formatted data (and checked it with IPTW)
pzivich Dec 29, 2020
839d267
added binary exposure check to check_input_data as this is in GEstima…
pzivich Dec 29, 2020
d5211de
Replaced check_input_data in most estimators! (generalize will eventu…
pzivich Dec 29, 2020
51352b7
Better looking print_results returned
pzivich Dec 29, 2020
948ea74
adding variance warning to IPTW confidence intervals for the ATT and …
pzivich Dec 29, 2020
877bbbe
adding manual test for zipper_plot
pzivich Dec 29, 2020
e3759b7
adding zipper_plot to ReadTheDocs
pzivich Dec 29, 2020
5bb0dbc
Adding GLMSL docs to ReadTheDocs
pzivich Dec 29, 2020
eccf829
Updating ReadTheDocs (and making it easier to manage going forward)
pzivich Dec 30, 2020
07f46de
Update DAG ReadTheDocs
pzivich Dec 30, 2020
1451729
Updating Change-log for all v0.9.0 additions
pzivich Dec 30, 2020
2 changes: 2 additions & 0 deletions .gitignore
@@ -4,3 +4,5 @@ build
dist
*.egg-info
.idea
builddir/*
venv/*
4 changes: 3 additions & 1 deletion .travis.yml
@@ -1,9 +1,11 @@
language: python

dist: focal

python:
- "3.5"
- "3.6"
- "3.7"
- "3.8"

notifications:
email: false
54 changes: 45 additions & 9 deletions CHANGELOG.md
@@ -1,5 +1,32 @@
## Change logs

### v0.9.0
The 0.9.x series drops support for Python 3.5; only Python 3.6+ is now supported. Support has also been added for
Python 3.8.

Cross-fit estimators have been implemented for better causal inference with machine learning. Cross-fit estimators
include `SingleCrossfitAIPTW`, `DoubleCrossfitAIPTW`, `SingleCrossfitTMLE`, and `DoubleCrossfitTMLE`. Currently,
functionality is limited to treatment and outcome nuisance models only (i.e., no model for missing data). These
estimators also do not accept weighted data (since most of `sklearn` does not support weights).
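
A rough sketch of the intended usage (the `estimator` keyword and the `fit` arguments shown below are assumptions
based on the rest of the zEpid API, not verified signatures):

```python
# Sketch only: estimator=, n_splits=, and random_state= are assumed argument names
from sklearn.linear_model import LogisticRegression
from zepid import load_sample_data
from zepid.causal.doublyrobust import SingleCrossfitAIPTW

df = load_sample_data(False).dropna()

scaipw = SingleCrossfitAIPTW(df, exposure='art', outcome='dead')
scaipw.exposure_model('male + age0 + cd40 + dvl0', estimator=LogisticRegression())
scaipw.outcome_model('art + male + age0 + cd40 + dvl0', estimator=LogisticRegression())
scaipw.fit(n_splits=3, random_state=201)
scaipw.summary()
```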

Super-learner functionality has been added via `SuperLearner`. Additions also include the empirical mean
(`EmpiricalMeanSL`), generalized linear model (`GLMSL`), and step-wise backward/forward selection via AIC
(`StepwiseSL`). These new estimators are wrappers that are compatible with `SuperLearner` and mimic some of the R
SuperLearner functionality.
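
A hypothetical sketch of stacking the new candidate estimators (the constructor arguments, labels, and fit/predict
signatures below are assumptions, not the verified API):

```python
# Hypothetical sketch: constructor arguments and label handling are assumptions
import numpy as np
import statsmodels.api as sm
from zepid.superlearner import EmpiricalMeanSL, GLMSL, StepwiseSL, SuperLearner

X = np.random.normal(size=(200, 3))        # stand-in design matrix
y = np.random.binomial(1, 0.3, size=200)   # stand-in binary outcome

candidates = [EmpiricalMeanSL(),
              GLMSL(sm.families.Binomial()),
              StepwiseSL(sm.families.Binomial(), selection="backward")]
sl = SuperLearner(candidates, ["Mean", "GLM", "Stepwise"], folds=10).fit(X, y)
predictions = sl.predict(X)
```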

Directed Acyclic Graphs have been added via `DirectedAcyclicGraph`. This class analyzes the graph for sufficient
adjustment sets and can also be used to display the graph. It relies on an optional NetworkX dependency.

`AIPTW` now supports the optional `custom_model` argument for user-input models, matching the existing `TMLE`
interface.
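
For example, a machine-learning nuisance model can now be passed to `AIPTW` (a sketch; the columns come from the
packaged sample data, and placing `custom_model` on the nuisance-model calls is assumed to mirror `TMLE`):

```python
# Sketch: custom_model placement assumed to mirror the TMLE interface
from sklearn.ensemble import RandomForestClassifier
from zepid import load_sample_data
from zepid.causal.doublyrobust import AIPTW

df = load_sample_data(False).dropna()

aipw = AIPTW(df, exposure='art', outcome='dead')
aipw.exposure_model('male + age0 + cd40', custom_model=RandomForestClassifier(n_estimators=500))
aipw.outcome_model('art + male + age0 + cd40', custom_model=RandomForestClassifier(n_estimators=500))
aipw.fit()
aipw.summary()
```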

`zipper_plot` function for creating zipper plots has been added.

Housekeeping: `bound` has been updated to the new procedure, `print_results` output has been made uniform across
estimators, a function was created to check missingness of input data in the causal estimators, a warning was added
regarding the ATT and ATU variance for IPTW, and observation IDs were added back for `MonteCarloGFormula`.

Future plans: `TimeFixedGFormula` will be deprecated in favor of two estimators with different labels. This will more
clearly delineate ATE versus stochastic effects. The replacement estimators are to be added

### v0.8.2
`IPSW` and `AIPSW` now natively support adjusting for confounding. Both now have the `treatment_model()` function,
which calculates the inverse probability of treatment weights. How weights are handled in `AIPSW` is updated. They
@@ -256,9 +283,9 @@ specified

``TMLE`` now allows estimation of risk ratios and odds ratios. Estimation procedure is based on ``tmle.R``

``TMLE`` variance formula has been modified to match ``tmle.R`` rather than other resources. This is beneficial for
future implementation of missing data adjustment. Also would allow for mediation analysis with TMLE (not a priority
for me at this time).

``TMLE`` now includes an option to place bounds on predicted probabilities using the ``bound`` option. Default is to use
all predicted probabilities. Either symmetrical or asymmetrical truncation can be specified.
@@ -303,9 +330,14 @@ edition pg340.
### v0.3.0
**BIG CHANGES**:

To conform with PEP and for clarity, all association/effect measures on a pandas dataframe are now class statements.
This makes them distinct from the summary data calculators. Additionally, it allows users to access any part of the
results now, unlike the previous implementation. The SD can be pulled from the corresponding results dataframe. Please
see the updated website for how to use the class statements.

Name changes within the calculator branch. With the shift of the dataframe calculations to classes, now these
functions are given more descriptive names. Additionally, all functions now return a list of the point estimate, SD,
lower CL, upper CL. Please see the website for all the new function names

Addition of Targeted Maximum Likelihood Estimator as zepid.causal.doublyrobust.TMLE

@@ -372,18 +404,21 @@ Addition of IPW for Interference settings. No current timeline but hopefully bef
Further conforming to PEP guidelines (my bad)

#### 0.1.6
Removed histogram option from IPTW in favor of kernel density. Since histograms are easy to generate with matplotlib,
just dropped the entire option.

Created causal branch. IPW functions moved inside this branch

Added deprecation warning to the IPW branch, since this will be removed in 0.2 in favor of the causal branch for
organization of future implemented methods

Added time-fixed g-formula

Added simple double-robust estimator (based on Funk et al 2011)

#### 0.1.5
Fix to 0.1.4; since PyPI does not allow reuse of library versions, I had to create a new one. Fixes an issue with
ipcw_prep() that was a pandas error (tried to drop NoneType from columns)

#### 0.1.4
Updates: Added dynamic risk plot
@@ -394,4 +429,5 @@ Fixes: Added user option to allow late entries for ipcw_prep()
Updates: added ROC curve generator to graphics, allows user-specification of censoring indicator to ipcw,

#### 0.1.2
Original release. Previous versions (0.1.0, 0.1.1) had errors I found when trying to install via PyPI. I forgot to
include the `package` statement in `setup`
87 changes: 87 additions & 0 deletions docs/Causal Graphs.rst
@@ -0,0 +1,87 @@
.. image:: images/zepid_logo_small.png

-------------------------------------

Causal Graphs
'''''''''''''

This page demonstrates analyses of causal diagrams (graphs). These diagrams are meant to help identify the sufficient
adjustment set needed to identify the causal effect. Currently, only directed acyclic graphs are supported, but
single-world intervention graphs will be added.

Note that this branch requires installation of ``NetworkX``, since that library is used to analyze the graph objects.

Directed Acyclic Graphs
==========================
Directed acyclic graphs (DAGs) provide an easy graphical tool to determine sufficient adjustment sets to control for all
confounding and identify the causal effect of an exposure on an outcome. DAGs rely on the assumption of d-separation of
the exposure and outcome. Currently the ``DirectedAcyclicGraph`` class only allows for assessing the d-separation
of the exposure and outcome. Additional support for checking d-separation between missingness, censoring, mediators,
and time-varying exposures will be added in future versions.

Remember that DAGs should preferably be constructed prior to data collection. Also, the major assumptions that a DAG
makes are the *lack* of arrows and *lack* of nodes. The assumptions are the items not present within the diagram.

Let's look at some classical examples of DAGs.

M-Bias
^^^^^^^^^^^

First we will create the "M-bias" DAG. This DAG is named after its distinct shape

.. code:: python

from zepid.causal.causalgraph import DirectedAcyclicGraph
import matplotlib.pyplot as plt

dag = DirectedAcyclicGraph(exposure='X', outcome="Y")
dag.add_arrows((('X', 'Y'),
('U1', 'X'), ('U1', 'B'),
('U2', 'B'), ('U2', 'Y')
))
pos = {"X": [0, 0], "Y": [1, 0], "B": [0.5, 0.5],
"U1": [0, 1], "U2": [1, 1]}

dag.draw_dag(positions=pos)
plt.tight_layout()
plt.show()

.. image:: images/zepid_dag_mbias.png

After creating the DAG, we can determine the sufficient adjustment set

.. code:: python

dag.calculate_adjustment_sets()
print(dag.adjustment_sets)

Since B is a collider, the minimally sufficient adjustment set is the empty set

Butterfly-Bias
^^^^^^^^^^^^^^
Butterfly-bias is an extension of the previous M-bias DAG where we need to adjust for B, but adjusting for B also
opens a backdoor path (specifically the path on which it is a collider).

.. code:: python

dag.add_arrows((('X', 'Y'),
('U1', 'X'), ('U1', 'B'),
('U2', 'B'), ('U2', 'Y'),
('B', 'X'), ('B', 'Y')
))

dag.draw_dag(positions=pos)
plt.tight_layout()
plt.show()

.. image:: images/zepid_dag_bbias.png

In the case of Butterfly bias, there are 3 possible adjustment sets

.. code:: python

dag.calculate_adjustment_sets()
print(dag.adjustment_sets)

Remember that DAGs should preferably be constructed prior to data collection. Also, the major assumptions that a DAG
makes are the *lack* of arrows and *lack* of nodes. The assumptions are the items not present within the diagram.
26 changes: 26 additions & 0 deletions docs/Graphics.rst
@@ -385,3 +385,29 @@ one more example,
In this example, there is additive modification, but *no multiplicative modification*. These plots can also have the
number of displayed reference lines changed, and they support the keyword arguments of the `plt.plot()` function. See
the function documentation for further details.


Zipper Plot
===========
Zipper plots provide an easy way to visualize the performance of confidence intervals in simulations. Confidence
intervals across simulations are displayed in a single plot, with the option to color the confidence limits by whether
they include the true value. Below is an example of a zipper plot. For ease, I generated the confidence intervals using
some random numbers (you would pull the confidence intervals from the estimators in practice).

.. code:: python

import numpy as np
import matplotlib.pyplot as plt
from zepid.graphics import zipper_plot

lower = np.random.uniform(-0.1, 0.1, size=100)
upper = lower + np.random.uniform(0.1, 0.2, size=100)

zipper_plot(truth=0,
lcl=lower,
ucl=upper,
colors=('blue', 'green'))
plt.show()


.. image:: images/zipper_example.png

In this example, confidence interval coverage would be considered rather poor (if we are expecting the usual 95%
coverage).
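
Since the limits are plain arrays, the empirical coverage behind the plot can also be checked directly. A minimal
sketch, reusing the ``lower`` and ``upper`` arrays from the example above:

.. code:: python

    import numpy as np

    # proportion of simulated intervals that contain the true value (0 here)
    coverage = np.mean((lower <= 0) & (0 <= upper))
    print("Empirical coverage:", coverage)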
27 changes: 27 additions & 0 deletions docs/Reference/Causal.rst
@@ -2,6 +2,17 @@ Causal
======
Documentation for each of the causal inference methods implemented in zEpid

Causal Diagrams
---------------------------

.. currentmodule:: zepid.causal.causalgraph.dag

.. autosummary::
:toctree: generated/

DirectedAcyclicGraph


Inverse Probability Weights
---------------------------

@@ -60,6 +71,14 @@ Augmented Inverse Probability Weights

AIPTW

.. currentmodule:: zepid.causal.doublyrobust.crossfit

.. autosummary::
:toctree: generated/

SingleCrossfitAIPTW
DoubleCrossfitAIPTW

Targeted Maximum Likelihood Estimator
-------------------------------------

@@ -71,6 +90,14 @@ Targeted Maximum Likelihood Estimator
TMLE
StochasticTMLE

.. currentmodule:: zepid.causal.doublyrobust.crossfit

.. autosummary::
:toctree: generated/

SingleCrossfitTMLE
DoubleCrossfitTMLE

G-estimation of SNM
-------------------

1 change: 1 addition & 0 deletions docs/Reference/Graphics.rst
@@ -24,6 +24,7 @@ Displaying Results
pvalue_plot
dynamic_risk_plot
labbe_plot
zipper_plot


.. automodule:: zepid.graphics.graphics
24 changes: 24 additions & 0 deletions docs/Reference/Super Learner.rst
@@ -0,0 +1,24 @@
Super Learner
====================
Details for super learner and associated candidate estimators
within zEpid.

Super Learners
--------------

.. currentmodule:: zepid.superlearner.stackers

.. autosummary::

SuperLearner

Candidate Estimators
---------------------

.. currentmodule:: zepid.superlearner.estimators

.. autosummary::

EmpiricalMeanSL
GLMSL
StepwiseSL
18 changes: 14 additions & 4 deletions docs/Reference/generated/zepid.base.Diagnostics.rst
@@ -1,14 +1,24 @@
zepid.base.Diagnostics
======================

.. currentmodule:: zepid.base

.. autoclass:: Diagnostics
:members:


.. automethod:: __init__


.. rubric:: Methods

.. autosummary::


~Diagnostics.__init__
~Diagnostics.fit
~Diagnostics.summary






20 changes: 15 additions & 5 deletions docs/Reference/generated/zepid.base.IncidenceRateDifference.rst
@@ -1,15 +1,25 @@
zepid.base.IncidenceRateDifference
==================================

.. currentmodule:: zepid.base

.. autoclass:: IncidenceRateDifference
:members:


.. automethod:: __init__


.. rubric:: Methods

.. autosummary::


~IncidenceRateDifference.__init__
~IncidenceRateDifference.fit
~IncidenceRateDifference.plot
~IncidenceRateDifference.summary





