v0.9.0 #143

Merged: 62 commits (Dec 30, 2020)
Commits
73376a1
adding functionality for DAG
pzivich Jul 2, 2020
db05662
adding direct input from networkx object
pzivich Jul 5, 2020
7f64ac4
adding functionality to load and optional networkx dependency
pzivich Jul 5, 2020
18bc89a
adding tests for causal graphs
pzivich Jul 5, 2020
3841f3e
adding better error handling
pzivich Jul 5, 2020
af43b80
adding documentation for DirectedAcyclicGraph
pzivich Jul 5, 2020
7b38350
travis updates
pzivich Jul 5, 2020
fc145d7
versioning numpy on travis for python 3.5
pzivich Jul 5, 2020
f2acde7
init 0.9.0 and new travis setup
pzivich Jul 12, 2020
d918231
optional import in init
pzivich Jul 12, 2020
1e8635b
Merge branch 'dag' into v0.9.0
pzivich Jul 12, 2020
deae6ee
Cross-fit AIPTW implementations
pzivich Jul 19, 2020
a429d99
adding tests for causal graphs
pzivich Jul 19, 2020
ac548ef
aipw_calculator as function in utils
pzivich Jul 19, 2020
db7b8f5
fixing returning IDs via #126
pzivich Aug 3, 2020
e715ba2
a working zipper plot implementation
pzivich Aug 3, 2020
900dce4
Adoption of SuperLearner into zEpid. Adds EmpiricalMeanSL and Stepwis…
pzivich Dec 12, 2020
037eb1c
Finished both forward and backward step-wise selection for the models
pzivich Dec 12, 2020
1944271
SuperLearner implementation added. Still need to compare to R, write …
pzivich Dec 13, 2020
2c77db3
Added examples of use to documentation
pzivich Dec 13, 2020
686dc31
Adding test to SuperLearner and the candidate estimators
pzivich Dec 13, 2020
92f9a30
Finished tests
pzivich Dec 14, 2020
f45da47
Having SuperLearner `return self` so it works with the current setup …
pzivich Dec 14, 2020
ef71e67
adding custom_model to AIPW
pzivich Dec 15, 2020
6569882
replacing bounding in TMLE with new format
pzivich Dec 15, 2020
4f3a1e6
further tweaks to superlearner
pzivich Dec 15, 2020
744e1a9
adding better tests for TMLE with custom_model
pzivich Dec 15, 2020
5d9e409
finishing replacement of _bounding_ with zepid.calc.utils.probability…
pzivich Dec 15, 2020
fc68556
updating readthedocs to have Super Learner documentation
pzivich Dec 16, 2020
fac0e07
Adding risk-ratio support for AIPTW and CrossfitAIPTW
pzivich Dec 16, 2020
297b9ba
Adding seeds for partitioning procedure for reproducibility
pzivich Dec 16, 2020
8aee128
Adding example data set for the crossfit estimators
pzivich Dec 16, 2020
1b6f653
AIPTW cross-fitting fit tests. Just check that current operationaliza…
pzivich Dec 16, 2020
8fecc0f
adding CrossfitTMLE to cross-fitting estimators
pzivich Dec 17, 2020
69151dd
fixing random_state flow
pzivich Dec 27, 2020
065f1a3
adding AIPTW OOB detector
pzivich Dec 27, 2020
983428f
adding additional superlearner probability bound check for predict_proba
pzivich Dec 27, 2020
a24ce59
adding diagnostics to cross-fit estimators
pzivich Dec 27, 2020
386487c
updating setup.py to drop python 3.5 support
pzivich Dec 27, 2020
8a7e7f8
fixing AIPW warning to avoid misdetection for continuous outcomes
pzivich Dec 27, 2020
7596e8a
adding more tests for the cross-fit estimators
pzivich Dec 27, 2020
f5ac31a
some test fixes. I think it will still fail due to seeds though
pzivich Dec 27, 2020
061f0b1
should fix travisci issue...
pzivich Dec 28, 2020
095aa13
Alternative way to specify RandomState for travisci compat?
pzivich Dec 28, 2020
44a1e27
checking if on numpy or pandas side for differences
pzivich Dec 28, 2020
076458d
well it isn't numpy, so maybe it is pandas...
pzivich Dec 28, 2020
984cf10
the issue is reproducibility is not a function of the sample splittin…
pzivich Dec 28, 2020
a0bc459
fix travis-requirements typo...
pzivich Dec 28, 2020
cf9837b
adding GLM SuperLearner functionality (since sci-kit learn has extra …
pzivich Dec 28, 2020
c04ae71
Updating crossfit and other DR estimators to use GLMSL rather than sk…
pzivich Dec 28, 2020
e819b47
fixing equal to all_close in GMLSL tests
pzivich Dec 28, 2020
ae5cbee
adding function to load formatted data (and checked it with IPTW)
pzivich Dec 29, 2020
839d267
added binary exposure check to check_input_data as this is in GEstima…
pzivich Dec 29, 2020
d5211de
Replaced check_input_data in most estimators! (generalize will eventu…
pzivich Dec 29, 2020
51352b7
Better looking print_results returned
pzivich Dec 29, 2020
948ea74
adding variance warning to IPTW confidence intervals for the ATT and …
pzivich Dec 29, 2020
877bbbe
adding manual test for zipper_plot
pzivich Dec 29, 2020
e3759b7
adding zipper_plot to ReadTheDocs
pzivich Dec 29, 2020
5bb0dbc
Adding GLMSL docs to ReadTheDocs
pzivich Dec 29, 2020
eccf829
Updating ReadTheDocs (and making it easier to manage going forward)
pzivich Dec 30, 2020
07f46de
Update DAG ReadTheDocs
pzivich Dec 30, 2020
1451729
Updating Change-log for all v0.9.0 additions
pzivich Dec 30, 2020
2 changes: 2 additions & 0 deletions .gitignore
@@ -4,3 +4,5 @@ build
dist
*.egg-info
.idea
builddir/*
venv/*
4 changes: 3 additions & 1 deletion .travis.yml
@@ -1,9 +1,11 @@
language: python

dist: focal

python:
- "3.5"
- "3.6"
- "3.7"
- "3.8"

notifications:
email: false
54 changes: 45 additions & 9 deletions CHANGELOG.md
@@ -1,5 +1,32 @@
## Change logs

### v0.9.0
The 0.9.x series drops support for Python 3.5; only Python 3.6+ is now supported. Support has also been added for
Python 3.8.

Cross-fit estimators have been implemented for better causal inference with machine learning. Cross-fit estimators
include `SingleCrossfitAIPTW`, `DoubleCrossfitAIPTW`, `SingleCrossfitTMLE`, and `DoubleCrossfitTMLE`. Currently,
functionality is limited to treatment and outcome nuisance models only (i.e., no model for missing data). These
estimators also do not accept weighted data (since most of `sklearn` does not support weights).
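
A rough sketch of the intended usage (the `estimator` keyword and the `fit` arguments shown below are assumptions
based on the rest of the zEpid API, not verified signatures):

```python
# Sketch only: estimator=, n_splits=, and random_state= are assumed argument names
from sklearn.linear_model import LogisticRegression
from zepid import load_sample_data
from zepid.causal.doublyrobust import SingleCrossfitAIPTW

df = load_sample_data(False).dropna()

scaipw = SingleCrossfitAIPTW(df, exposure='art', outcome='dead')
scaipw.exposure_model('male + age0 + cd40 + dvl0', estimator=LogisticRegression())
scaipw.outcome_model('art + male + age0 + cd40 + dvl0', estimator=LogisticRegression())
scaipw.fit(n_splits=3, random_state=201)
scaipw.summary()
```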

Super-learner functionality has been added via `SuperLearner`. Additions also include the empirical mean
(`EmpiricalMeanSL`), generalized linear model (`GLMSL`), and step-wise backward/forward selection via AIC
(`StepwiseSL`). These new estimators are wrappers that are compatible with `SuperLearner` and mimic some of the R
SuperLearner functionality.
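
A hypothetical sketch of stacking the new candidate estimators (the constructor arguments, labels, and fit/predict
signatures below are assumptions, not the verified API):

```python
# Hypothetical sketch: constructor arguments and label handling are assumptions
import numpy as np
import statsmodels.api as sm
from zepid.superlearner import EmpiricalMeanSL, GLMSL, StepwiseSL, SuperLearner

X = np.random.normal(size=(200, 3))        # stand-in design matrix
y = np.random.binomial(1, 0.3, size=200)   # stand-in binary outcome

candidates = [EmpiricalMeanSL(),
              GLMSL(sm.families.Binomial()),
              StepwiseSL(sm.families.Binomial(), selection="backward")]
sl = SuperLearner(candidates, ["Mean", "GLM", "Stepwise"], folds=10).fit(X, y)
predictions = sl.predict(X)
```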

Directed Acyclic Graphs have been added via `DirectedAcyclicGraph`. This class analyzes the graph for sufficient
adjustment sets and can also be used to display the graph. It relies on an optional NetworkX dependency.

`AIPTW` now supports the optional `custom_model` argument for user-input models, matching the existing `TMLE`
interface.
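
For example, a machine-learning nuisance model can now be passed to `AIPTW` (a sketch; the columns come from the
packaged sample data, and placing `custom_model` on the nuisance-model calls is assumed to mirror `TMLE`):

```python
# Sketch: custom_model placement assumed to mirror the TMLE interface
from sklearn.ensemble import RandomForestClassifier
from zepid import load_sample_data
from zepid.causal.doublyrobust import AIPTW

df = load_sample_data(False).dropna()

aipw = AIPTW(df, exposure='art', outcome='dead')
aipw.exposure_model('male + age0 + cd40', custom_model=RandomForestClassifier(n_estimators=500))
aipw.outcome_model('art + male + age0 + cd40', custom_model=RandomForestClassifier(n_estimators=500))
aipw.fit()
aipw.summary()
```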

`zipper_plot` function for creating zipper plots has been added.

Housekeeping: `bound` has been updated to the new procedure, `print_results` output has been made uniform across
estimators, a function was created to check missingness of input data in the causal estimators, a warning was added
regarding the ATT and ATU variance for IPTW, and observation IDs were added back for `MonteCarloGFormula`.

Future plans: `TimeFixedGFormula` will be deprecated in favor of two estimators with different labels. This will more
clearly delineate ATE versus stochastic effects. The replacement estimators are to be added

### v0.8.2
`IPSW` and `AIPSW` now natively support adjusting for confounding. Both now have the `treatment_model()` function,
which calculates the inverse probability of treatment weights. How weights are handled in `AIPSW` is updated. They
@@ -256,9 +283,9 @@ specified

``TMLE`` now allows estimation of risk ratios and odds ratios. Estimation procedure is based on ``tmle.R``

``TMLE`` variance formula has been modified to match ``tmle.R`` rather than other resources. This is beneficial for
future implementation of missing data adjustment. Also would allow for mediation analysis with TMLE (not a priority
for me at this time).

``TMLE`` now includes an option to place bounds on predicted probabilities using the ``bound`` option. Default is to use
all predicted probabilities. Either symmetrical or asymmetrical truncation can be specified.
@@ -303,9 +330,14 @@ edition pg340.
### v0.3.0
**BIG CHANGES**:

To conform with PEP and for clarity, all association/effect measures on a pandas dataframe are now class statements.
This makes them distinct from the summary data calculators. Additionally, it allows users to access any part of the
results now, unlike the previous implementation. The SD can be pulled from the corresponding results dataframe. Please
see the updated website for how to use the class statements.

Name changes within the calculator branch. With the shift of the dataframe calculations to classes, now these
functions are given more descriptive names. Additionally, all functions now return a list of the point estimate, SD,
lower CL, upper CL. Please see the website for all the new function names

Addition of Targeted Maximum Likelihood Estimator as zepid.causal.doublyrobust.TMLE

@@ -372,18 +404,21 @@ Addition of IPW for Interference settings. No current timeline but hopefully bef
Further conforming to PEP guidelines (my bad)

#### 0.1.6
Removed histogram option from IPTW in favor of kernel density. Since histograms are easy to generate with matplotlib,
just dropped the entire option.

Created causal branch. IPW functions moved inside this branch

Added deprecation warning to the IPW branch, since this will be removed in 0.2 in favor of the causal branch for
organization of future implemented methods

Added time-fixed g-formula

Added simple double-robust estimator (based on Funk et al 2011)

#### 0.1.5
Fix to 0.1.4; since PyPI does not allow reuse of library versions, I had to create a new one. Fixes an issue with
ipcw_prep() that was a pandas error (tried to drop NoneType from columns)

#### 0.1.4
Updates: Added dynamic risk plot
@@ -394,4 +429,5 @@ Fixes: Added user option to allow late entries for ipcw_prep()
Updates: added ROC curve generator to graphics, allows user-specification of censoring indicator to ipcw,

#### 0.1.2
Original release. Previous versions (0.1.0, 0.1.1) had errors I found when trying to install via PyPI. I forgot to
include the `package` statement in `setup`
87 changes: 87 additions & 0 deletions docs/Causal Graphs.rst
@@ -0,0 +1,87 @@
.. image:: images/zepid_logo_small.png

-------------------------------------

Causal Graphs
'''''''''''''

This page demonstrates analyses of causal diagrams (graphs). These diagrams are meant to help identify the sufficient
adjustment set needed to identify the causal effect. Currently, only directed acyclic graphs are supported, but
single-world intervention graphs will be added.

Note that this branch requires installation of ``NetworkX``, since that library is used to analyze the graph objects.

Directed Acyclic Graphs
==========================
Directed acyclic graphs (DAGs) provide an easy graphical tool to determine sufficient adjustment sets to control for all
confounding and identify the causal effect of an exposure on an outcome. DAGs rely on the assumption of d-separation of
the exposure and outcome. Currently the ``DirectedAcyclicGraph`` class only allows for assessing the d-separation
of the exposure and outcome. Additional support for checking d-separation between missingness, censoring, mediators,
and time-varying exposures will be added in future versions.

Remember that DAGs should preferably be constructed prior to data collection. Also, the major assumptions that a DAG
makes are the *lack* of arrows and *lack* of nodes. The assumptions are the items not present within the diagram.

Let's look at some classical examples of DAGs.

M-Bias
^^^^^^^^^^^

First we will create the "M-bias" DAG. This DAG is named after its distinct shape

.. code:: python

from zepid.causal.causalgraph import DirectedAcyclicGraph
import matplotlib.pyplot as plt

dag = DirectedAcyclicGraph(exposure='X', outcome="Y")
dag.add_arrows((('X', 'Y'),
('U1', 'X'), ('U1', 'B'),
('U2', 'B'), ('U2', 'Y')
))
pos = {"X": [0, 0], "Y": [1, 0], "B": [0.5, 0.5],
"U1": [0, 1], "U2": [1, 1]}

dag.draw_dag(positions=pos)
plt.tight_layout()
plt.show()

.. image:: images/zepid_dag_mbias.png

After creating the DAG, we can determine the sufficient adjustment set

.. code:: python

dag.calculate_adjustment_sets()
print(dag.adjustment_sets)

Since B is a collider, the minimally sufficient adjustment set is the empty set

Butterfly-Bias
^^^^^^^^^^^^^^
Butterfly-bias is an extension of the previous M-bias DAG where we need to adjust for B, but adjusting for B also
opens a backdoor path (specifically the path on which it is a collider).

.. code:: python

dag.add_arrows((('X', 'Y'),
('U1', 'X'), ('U1', 'B'),
('U2', 'B'), ('U2', 'Y'),
('B', 'X'), ('B', 'Y')
))

dag.draw_dag(positions=pos)
plt.tight_layout()
plt.show()

.. image:: images/zepid_dag_bbias.png

In the case of Butterfly bias, there are 3 possible adjustment sets

.. code:: python

dag.calculate_adjustment_sets()
print(dag.adjustment_sets)

Remember that DAGs should preferably be constructed prior to data collection. Also, the major assumptions that a DAG
makes are the *lack* of arrows and *lack* of nodes. The assumptions are the items not present within the diagram.
26 changes: 26 additions & 0 deletions docs/Graphics.rst
@@ -385,3 +385,29 @@ one more example,
In this example, there is additive modification, but *no multiplicative modification*. These plots can also have the
number of displayed reference lines changed, and they support the keyword arguments of the `plt.plot()` function. See
the function documentation for further details.


Zipper Plot
===========
Zipper plots provide an easy way to visualize the performance of confidence intervals in simulations. Confidence
intervals across simulations are displayed in a single plot, with the option to color the confidence limits by whether
they include the true value. Below is an example of a zipper plot. For ease, I generated the confidence intervals using
some random numbers (you would pull the confidence intervals from the estimators in practice).

.. code:: python

import numpy as np
import matplotlib.pyplot as plt
from zepid.graphics import zipper_plot

lower = np.random.uniform(-0.1, 0.1, size=100)
upper = lower + np.random.uniform(0.1, 0.2, size=100)

zipper_plot(truth=0,
lcl=lower,
ucl=upper,
colors=('blue', 'green'))
plt.show()


.. image:: images/zipper_example.png

In this example, confidence interval coverage would be considered rather poor (if we are expecting the usual 95%
coverage).
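
Since the limits are plain arrays, the empirical coverage behind the plot can also be checked directly. A minimal
sketch, reusing the ``lower`` and ``upper`` arrays from the example above:

.. code:: python

    import numpy as np

    # proportion of simulated intervals that contain the true value (0 here)
    coverage = np.mean((lower <= 0) & (0 <= upper))
    print("Empirical coverage:", coverage)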
27 changes: 27 additions & 0 deletions docs/Reference/Causal.rst
@@ -2,6 +2,17 @@ Causal
======
Documentation for each of the causal inference methods implemented in zEpid

Causal Diagrams
---------------------------

.. currentmodule:: zepid.causal.causalgraph.dag

.. autosummary::
:toctree: generated/

DirectedAcyclicGraph


Inverse Probability Weights
---------------------------

@@ -60,6 +71,14 @@ Augmented Inverse Probability Weights

AIPTW

.. currentmodule:: zepid.causal.doublyrobust.crossfit

.. autosummary::
:toctree: generated/

SingleCrossfitAIPTW
DoubleCrossfitAIPTW

Targeted Maximum Likelihood Estimator
-------------------------------------

@@ -71,6 +90,14 @@ Targeted Maximum Likelihood Estimator
TMLE
StochasticTMLE

.. currentmodule:: zepid.causal.doublyrobust.crossfit

.. autosummary::
:toctree: generated/

SingleCrossfitTMLE
DoubleCrossfitTMLE

G-estimation of SNM
-------------------

1 change: 1 addition & 0 deletions docs/Reference/Graphics.rst
@@ -24,6 +24,7 @@ Displaying Results
pvalue_plot
dynamic_risk_plot
labbe_plot
zipper_plot


.. automodule:: zepid.graphics.graphics
24 changes: 24 additions & 0 deletions docs/Reference/Super Learner.rst
@@ -0,0 +1,24 @@
Super Learner
====================
Details for super learner and associated candidate estimators
within zEpid.

Super Learners
--------------

.. currentmodule:: zepid.superlearner.stackers

.. autosummary::

SuperLearner

Candidate Estimators
---------------------

.. currentmodule:: zepid.superlearner.estimators

.. autosummary::

EmpiricalMeanSL
GLMSL
StepwiseSL
18 changes: 14 additions & 4 deletions docs/Reference/generated/zepid.base.Diagnostics.rst
@@ -1,14 +1,24 @@
zepid.base.Diagnostics
======================

.. currentmodule:: zepid.base

.. autoclass:: Diagnostics
:members:


.. automethod:: __init__


.. rubric:: Methods

.. autosummary::


~Diagnostics.__init__
~Diagnostics.fit
~Diagnostics.summary






20 changes: 15 additions & 5 deletions docs/Reference/generated/zepid.base.IncidenceRateDifference.rst
@@ -1,15 +1,25 @@
zepid.base.IncidenceRateDifference
==================================

.. currentmodule:: zepid.base

.. autoclass:: IncidenceRateDifference
:members:


.. automethod:: __init__


.. rubric:: Methods

.. autosummary::


~IncidenceRateDifference.__init__
~IncidenceRateDifference.fit
~IncidenceRateDifference.plot
~IncidenceRateDifference.summary





