updated documentation
rmj3197 committed Sep 29, 2024
1 parent 7cc0022 commit f130e95
Showing 5 changed files with 11 additions and 8 deletions.
3 changes: 1 addition & 2 deletions MDDC/datasets/data/statin49_with_cluster_idx.rst
@@ -6,8 +6,7 @@
Description
-----------

-A 50 by 7 data matrix of a contingency table processed from FDA Adverse Event Reporting System (FAERS) database from the third quarter of 2014 to the fourth quarter of 2020.
-This dataset also contains the cluster index of the various adverse events.
+A 50 by 7 data matrix of a contingency table processed from FDA Adverse Event Reporting System (FAERS) database from the third quarter of 2014 to the fourth quarter of 2020. This dataset also contains the cluster index of the various adverse events.

Format
------
4 changes: 2 additions & 2 deletions MDDC/utils/_utils.py
@@ -701,8 +701,8 @@ def generate_contin_table_with_clustered_AE_with_tol(
RTD = \\frac{|n^{orig}_{\\cdot \\cdot} - n^{sim}_{\\cdot \\cdot}|}{n^{orig}_{\\cdot \\cdot}} \\times 100
This indicates the difference in the total number of reports in the simulated datasets and the original input
-total number of reports. A lower value of tolerance will mean that the generated tables will have total number
-of reports closer to the actual supplied value.
+total number of reports. Sufficiently low values of tolerance will return generated tables with total number
+of reports equal to the actual supplied value.
contin_table : numpy.ndarray, pandas.DataFrame, default=None
A data matrix representing an I x J contingency table with row (adverse event) and column (drug) names.
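
The relative total difference (RTD) formula in this docstring can be illustrated with a minimal sketch. The function name `relative_total_difference` and the two example tables are hypothetical and are not part of the MDDC package; the sketch only evaluates the formula as written above.

```python
import numpy as np


def relative_total_difference(orig_table, sim_table):
    """Hypothetical helper: RTD = |n_orig - n_sim| / n_orig * 100,
    where n_orig and n_sim are the grand totals of the two tables."""
    n_orig = float(np.sum(orig_table))
    n_sim = float(np.sum(sim_table))
    if n_orig == 0:
        raise ValueError("the original table must contain at least one report")
    return 100.0 * abs(n_orig - n_sim) / n_orig


# Illustrative tables only (not taken from any MDDC dataset).
orig = np.array([[10, 20], [30, 40]])  # grand total 100
sim = np.array([[11, 19], [32, 41]])   # grand total 103

rtd = relative_total_difference(orig, sim)
print(rtd)
```

A simulated table would be accepted when its RTD does not exceed the chosen tolerance, so a tolerance close to zero only admits tables whose total matches the original.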
2 changes: 1 addition & 1 deletion README.md
@@ -16,7 +16,7 @@
- We are interested in which (AE, drug) pairs are signals. The signals refer to potential adverse events that may be caused by a drug.
- In the contingency table setting, the signals refer to the cells with $n_{ij}$ abnormally higher than the expected values.
- [**Rousseeuw and Bossche (2018)**](https://wis.kuleuven.be/stat/robust/papers/publications-2018/rousseeuwvandenbossche-ddc-technometrics-2018.pdf) proposed the Detecting Deviating Cells (DDC) algorithm for outlier identification in a multivariate dataset.
-- The original DDC algorithm assumes multivariate normality of the data and selects cutoff values based on this assumption. The foundation of the DDC algorithm lies in detecting deviating data cells within a multivariate normally distributed dataset. Inspired by this work, we modify the DDC algorithm to better suit the discrete nature of adverse event data in pharmacovigilance that clearly do not follow a multivariate normal distribution.
+- The original DDC algorithm assumes multivariate normality of the data and selects cutoff values based on this assumption. We modify the DDC algorithm to better suit the discrete nature of adverse event data in pharmacovigilance that clearly do not follow a multivariate normal distribution.
- Our Modified Detecting Deviating Cells (MDDC) algorithm has the following characteristics:
1. It is easy to compute.
2. It considers AE relationships.
6 changes: 5 additions & 1 deletion docs/source/user_guide/mddc_algorithm.rst
@@ -7,6 +7,10 @@ The Modified Detecting Deviating Cells (MDDC) algorithm is described as follows:

1. **Standardized Pearson Residual Calculation**

+Let the row marginals of a contingency table be denoted as :math:`n_{i\bullet} = \sum_{j = 1}^{J} n_{ij}`, the column marginals as :math:`n_{\bullet j} = \sum_{i = 1}^{I} n_{ij}`,
+and the total number of reports as :math:`n_{\bullet \bullet} = \sum_{i = 1}^{I} \sum_{j = 1}^{J} n_{ij}`.
+
+
For each cell in the contingency table, compute the standardized Pearson residual:

.. math::
@@ -20,7 +24,7 @@ The Modified Detecting Deviating Cells (MDDC) algorithm is described as follows:
- :math:`\{e^+_{ij}\}` for cells with :math:`n_{ij} > 0`
- :math:`\{e^0_{ij}\}` for cells with :math:`n_{ij} = 0`

-The boxplot statistics are used as cutoff values for detecting the first set of outlying cells:
+To identify the cutoff value use either the boxplot statistic defined below or the Monte Carlo (MC) method applied to the standardized Pearson residuals.

.. math::
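
The marginals added in the `mddc_algorithm.rst` hunk can be made concrete with a short sketch. The residual formula itself is elided in this diff, so the code below assumes the textbook standardized (adjusted) Pearson residual, which may differ in detail from what the guide specifies. The function names and the example table are hypothetical.

```python
import numpy as np


def marginals(table):
    """Row marginals n_{i.}, column marginals n_{.j}, and grand total n_{..}."""
    n = np.asarray(table, dtype=float)
    return n.sum(axis=1), n.sum(axis=0), n.sum()


def standardized_pearson_residuals(table):
    """ASSUMPTION: textbook adjusted residual,
    (n_ij - E_ij) / sqrt(E_ij * (1 - n_i./n..) * (1 - n_.j/n..)),
    with E_ij = n_i. * n_.j / n.. ; the diff does not show the guide's formula."""
    n = np.asarray(table, dtype=float)
    row, col, total = marginals(n)
    expected = np.outer(row, col) / total
    variance = expected * np.outer(1 - row / total, 1 - col / total)
    return (n - expected) / np.sqrt(variance)


# Illustrative 2 x 2 table (not from any MDDC dataset).
table = np.array([[10, 20], [30, 40]])
row, col, total = marginals(table)
resid = standardized_pearson_residuals(table)

# Split the residuals as in the hunk: cells with n_ij > 0 and cells with n_ij = 0.
resid_positive = resid[table > 0]
resid_zero = resid[table == 0]
```

The sketch assumes every row and column total is positive and smaller than the grand total; otherwise the variance term is zero and the division is undefined.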
4 changes: 2 additions & 2 deletions docs/source/user_guide/optimal_c_algorithm.rst
@@ -7,10 +7,10 @@ This algorithm describes a method to determine the value of `c` in the cutoff fo

**Steps:**

-1. Generate a large number of :math:`I × J` tables under the assumption of independence (:math:`\lambda_{ij} = 1`).
+1. Generate a large number of :math:`I \times J` tables under the assumption of independence (:math:`\lambda_{ij} = 1`).

2. Compute the standardized Pearson residuals.

-3. Compute the upper limits with :math:`c = 1.5`, and calculate the FDR.
+3. Compute the upper limits of the boxplot statistic with :math:`c = 1.5`, and calculate the FDR.

4. If :math:`FDR < 0.05`, stop. Otherwise, if :math:`FDR > 0.05`, use a grid search to find the optimal `c` such that :math:`FDR \leq 0.05`.
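
The four steps above can be sketched roughly as follows. This is not the package's implementation, and several details are assumptions because the diff does not show them: null tables are drawn as independent Poisson counts around a fixed expected table, the upper limit is taken as the per-column value Q3 + c * IQR, the FDR is estimated as the share of null tables with at least one flagged cell, and the cutoff is applied to all cells rather than only those with positive counts. All function names are hypothetical.

```python
import numpy as np


def standardized_pearson_residuals(table):
    # ASSUMPTION: textbook adjusted residual; the guide's formula is not in this diff.
    n = np.asarray(table, dtype=float)
    row, col, total = n.sum(axis=1), n.sum(axis=0), n.sum()
    expected = np.outer(row, col) / total
    variance = expected * np.outer(1 - row / total, 1 - col / total)
    return (n - expected) / np.sqrt(variance)


def upper_limits(resid, c):
    # ASSUMPTION: per-column boxplot upper limit Q3 + c * (Q3 - Q1).
    q1, q3 = np.percentile(resid, [25, 75], axis=0)
    return q3 + c * (q3 - q1)


def estimate_fdr(residuals, c):
    # Under independence every flagged cell is a false discovery, so the FDR is
    # estimated here as the fraction of tables with at least one flagged cell.
    flagged = [bool(np.any(r > upper_limits(r, c))) for r in residuals]
    return float(np.mean(flagged))


def find_c(residuals, target=0.05, start=1.5, stop=6.0, step=0.1):
    # Steps 3 and 4: try c = 1.5 first, then walk a grid upward until FDR <= target.
    for c in np.arange(start, stop + step / 2, step):
        if estimate_fdr(residuals, c) <= target:
            return float(c)
    return None  # no grid value met the target


# Step 1: simulate null tables (lambda_ij = 1) around an illustrative expected table.
rng = np.random.default_rng(0)
expected = np.full((20, 5), 30.0)  # hypothetical I = 20, J = 5
tables = rng.poisson(lam=expected, size=(500, 20, 5))

# Step 2: residuals for every simulated table.
residuals = [standardized_pearson_residuals(t) for t in tables]

fdr_at_default = estimate_fdr(residuals, 1.5)
c_opt = find_c(residuals)
```

Because the upper limit never decreases as `c` grows, the estimated FDR is non-increasing in `c`, which is why a simple upward scan of the grid is enough in this sketch.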
