updated documentation

rmj3197 · Sep 29, 2024 · f130e95
1 parent 7cc0022
commit f130e95

Showing 5 changed files with 11 additions and 8 deletions.
diff --git a/MDDC/datasets/data/statin49_with_cluster_idx.rst b/MDDC/datasets/data/statin49_with_cluster_idx.rst
@@ -6,8 +6,7 @@
 Description
 -----------
 
-A 50 by 7 data matrix of a contingency table processed from FDA Adverse Event Reporting System (FAERS) database from the third quarter of 2014 to the fourth quarter of 2020.
-This dataset also contains the cluster index of the various adverse events. 
+A 50 by 7 data matrix of a contingency table processed from the FDA Adverse Event Reporting System (FAERS) database from the third quarter of 2014 to the fourth quarter of 2020. This dataset also contains the cluster index of the various adverse events.
 
 Format
 ------

diff --git a/MDDC/utils/_utils.py b/MDDC/utils/_utils.py
@@ -701,8 +701,8 @@ def generate_contin_table_with_clustered_AE_with_tol(
             RTD = \\frac{|n^{orig}_{\\cdot \\cdot} - n^{sim}_{\\cdot \\cdot}|}{n^{orig}_{\\cdot \\cdot}} \\times 100
 
         This indicates the difference in the total number of reports in the simulated datasets and the original input
-        total number of reports. A lower value of tolerance will mean that the generated tables will have total number
-        of reports closer to the actual supplied value.
+        total number of reports. Sufficiently low values of tolerance will return generated tables whose total number
+        of reports equals the actual supplied value.
 
     contin_table : numpy.ndarray, pandas.DataFrame, default=None
         A data matrix representing an I x J contingency table with row (adverse event) and column (drug) names.
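The tolerance in `generate_contin_table_with_clustered_AE_with_tol` is expressed through the relative total difference (RTD) defined in the docstring. A minimal sketch of that formula follows; `relative_total_difference` is a hypothetical helper written for illustration, not a function exported by MDDC.

```python
import numpy as np


def relative_total_difference(orig_table, sim_table):
    """RTD = |n_orig - n_sim| / n_orig * 100, with n the grand total of reports.

    Hypothetical helper for illustration; not part of the MDDC API.
    """
    n_orig = float(np.asarray(orig_table).sum())
    n_sim = float(np.asarray(sim_table).sum())
    return abs(n_orig - n_sim) / n_orig * 100


orig = np.array([[10, 5], [3, 2]])  # 20 reports in total
sim = np.array([[9, 6], [4, 2]])    # 21 reports in total
rtd = relative_total_difference(orig, sim)  # 1 / 20 * 100 = 5.0
within_tol = rtd <= 0.1                     # False: a tolerance of 0.1 rejects this table
```

Because report totals are integers, two different totals always give RTD of at least 100 / n_orig. Any tolerance below that bound therefore admits only tables whose total equals the original one, which is what the updated docstring means by a "sufficiently low" tolerance.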

diff --git a/README.md b/README.md
@@ -16,7 +16,7 @@
 - We are interested in which (AE, drug) pairs are signals. The signals refer to potential adverse events that may be caused by a drug.
 - In the contingency table setting, the signals refer to the cells with $n_{ij}$ abnormally higher than the expected values.
 - [**Rousseeuw and Bossche (2018)**](https://wis.kuleuven.be/stat/robust/papers/publications-2018/rousseeuwvandenbossche-ddc-technometrics-2018.pdf) proposed the Detecting Deviating Cells (DDC) algorithm for outlier identification in a multivariate dataset.
-- The original DDC algorithm assumes multivariate normality of the data and selects cutoff values based on this assumption. The foundation of the DDC algorithm lies in detecting deviating data cells within a multivariate normally distributed dataset. Inspired by this work, we modify the DDC algorithm to better suit the discrete nature of adverse event data in pharmacovigilance that clearly do not follow a multivariate normal distribution. 
+- The original DDC algorithm assumes multivariate normality of the data and selects cutoff values based on this assumption. We modify the DDC algorithm to better suit the discrete nature of adverse event data in pharmacovigilance that clearly do not follow a multivariate normal distribution. 
 - Our Modified Detecting Deviating Cells (MDDC) algorithm has the following characteristics:
   1. It is easy to compute.
   2. It considers AE relationships.
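The README describes signals as cells whose count is abnormally higher than its expected value but does not state how that value is computed in this excerpt. A minimal sketch, assuming the usual independence estimate E_ij = n_i. n_.j / n.. for an AE-by-drug table; `expected_counts` is a hypothetical helper, not part of the MDDC package.

```python
import numpy as np


def expected_counts(table):
    """Expected counts under independence: E_ij = n_i. * n_.j / n.. (assumed definition)."""
    n = np.asarray(table, dtype=float)
    row = n.sum(axis=1, keepdims=True)  # AE (row) marginals n_i.
    col = n.sum(axis=0, keepdims=True)  # drug (column) marginals n_.j
    return row * col / n.sum()          # outer product of marginals divided by n..


# Rows are AEs, columns are drugs (toy counts, not real FAERS data)
table = np.array([[30, 10], [10, 50]])
E = expected_counts(table)  # [[16, 24], [24, 36]]
ratio = table / E           # cell (0, 0): 30 / 16 = 1.875, well above 1
```

An observed-to-expected ratio well above 1 is the informal notion of a signal; MDDC replaces this raw comparison with standardized residuals and data-driven cutoffs.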

diff --git a/docs/source/user_guide/mddc_algorithm.rst b/docs/source/user_guide/mddc_algorithm.rst
@@ -7,6 +7,10 @@ The Modified Detecting Deviating Cells (MDDC) algorithm is described as follows:
 
 1. **Standardized Pearson Residual Calculation**
 
+   Let the row marginals of a contingency table be denoted as :math:`n_{i\bullet} = \sum_{j = 1}^{J} n_{ij}`, the column marginals as :math:`n_{\bullet j} = \sum_{i = 1}^{I} n_{ij}`, 
+   and the total number of reports as :math:`n_{\bullet \bullet} = \sum_{i = 1}^{I} \sum_{j = 1}^{J} n_{ij}`.
+
+
    For each cell in the contingency table, compute the standardized Pearson residual:
 
    .. math::
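The hunk above defines the marginals but truncates the residual formula itself. The sketch below assumes it is the standard (Haberman) standardized Pearson residual e_ij = (n_ij - E_ij) / sqrt(E_ij (1 - n_i./n..)(1 - n_.j/n..)) with E_ij = n_i. n_.j / n..; `standardized_pearson_residuals` is a hypothetical name, and the function assumes every row and column has a positive marginal and that no single row or column holds all reports (otherwise the variance term is zero).

```python
import numpy as np


def standardized_pearson_residuals(table):
    """e_ij = (n_ij - E_ij) / sqrt(E_ij * (1 - n_i./n..) * (1 - n_.j/n..)).

    Assumed form of the residual; hypothetical helper, not the MDDC API.
    """
    n = np.asarray(table, dtype=float)
    total = n.sum()                         # n..
    row = n.sum(axis=1, keepdims=True)      # n_i.
    col = n.sum(axis=0, keepdims=True)      # n_.j
    expected = row * col / total            # E_ij = n_i. n_.j / n..
    variance = expected * (1 - row / total) * (1 - col / total)
    return (n - expected) / np.sqrt(variance)


table = np.array([[30, 10], [10, 50]])
e = standardized_pearson_residuals(table)
# In a 2 x 2 table every residual has the same magnitude: 14 / 2.4 = 35 / 6
```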
@@ -20,7 +24,7 @@ The Modified Detecting Deviating Cells (MDDC) algorithm is described as follows:
       - :math:`\{e^+_{ij}\}` for cells with :math:`n_{ij} > 0`
       - :math:`\{e^0_{ij}\}` for cells with :math:`n_{ij} = 0`
 
-   The boxplot statistics are used as cutoff values for detecting the first set of outlying cells:
+   To identify the cutoff value, use either the boxplot statistic defined below or the Monte Carlo (MC) method applied to the standardized Pearson residuals.
 
    .. math::
    
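The diff splits residuals into {e+_ij} (cells with n_ij > 0) and {e0_ij} (cells with n_ij = 0) and offers a boxplot-based cutoff, whose formula is cut off in this view. A minimal sketch, assuming the cutoff is the usual upper fence Q3 + c × IQR computed separately for each drug column over the e+ residuals only, with c = 1.5 as in the optimal-c page; the helper names are hypothetical, the quartile convention (NumPy's default linear interpolation) is an assumption, and neither the MC alternative nor the treatment of e0 cells is shown.

```python
import numpy as np


def boxplot_upper_limits(residuals, counts, c=1.5):
    """Per-column upper fence Q3 + c * IQR over residuals of cells with n_ij > 0 (assumed rule)."""
    residuals = np.asarray(residuals, dtype=float)
    counts = np.asarray(counts)
    limits = np.full(residuals.shape[1], np.nan)
    for j in range(residuals.shape[1]):
        e_plus = residuals[counts[:, j] > 0, j]  # {e+_ij}: nonzero cells of column j
        if e_plus.size == 0:
            continue  # no nonzero cells: leave the limit undefined (NaN)
        q1, q3 = np.percentile(e_plus, [25, 75])
        limits[j] = q3 + c * (q3 - q1)
    return limits


def first_pass_signals(residuals, counts, c=1.5):
    """Flag nonzero cells whose residual exceeds their column's upper limit."""
    limits = boxplot_upper_limits(residuals, counts, c)
    with np.errstate(invalid="ignore"):  # comparisons against NaN limits are simply False
        return (np.asarray(counts) > 0) & (np.asarray(residuals) > limits)


# One drug column: five nonzero cells and one zero cell (toy residuals)
resid = np.array([[-1.0], [0.0], [0.5], [1.0], [8.0], [-0.2]])
counts = np.array([[3], [2], [4], [5], [20], [0]])
limits = boxplot_upper_limits(resid, counts)  # Q1 = 0, Q3 = 1, so 1 + 1.5 * 1 = 2.5
flags = first_pass_signals(resid, counts)     # only the cell with residual 8.0 is flagged
```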

diff --git a/docs/source/user_guide/optimal_c_algorithm.rst b/docs/source/user_guide/optimal_c_algorithm.rst
@@ -7,10 +7,10 @@ This algorithm describes a method to determine the value of `c` in the cutoff fo
 
 **Steps:**
 
-1. Generate a large number of :math:`I × J` tables under the assumption of independence (:math:`\lambda_{ij} = 1`).
+1. Generate a large number of :math:`I \times J` tables under the assumption of independence (:math:`\lambda_{ij} = 1`).
 
 2. Compute the standardized Pearson residuals.
 
-3. Compute the upper limits with :math:`c = 1.5`, and calculate the FDR.
+3. Compute the upper limits of the boxplot statistic with :math:`c = 1.5`, and calculate the FDR.
 
 4. If :math:`FDR < 0.05`, stop. Otherwise, if :math:`FDR > 0.05`, use a grid search to find the optimal `c` such that :math:`FDR \leq 0.05`.