Create error estimator for SUM #528

dvadym · 2024-10-16T13:19:49Z

ErrorEstimator uses ContributionHistograms to estimate the approximate RMSE for different L0/Linf bounds. This PR makes it possible to create an ErrorEstimator for SUM. The only difference from COUNT is that a different set of histograms is used.

RamSaw · 2024-10-16T15:35:29Z

pipeline_dp/dataset_histograms/histogram_error_estimator.py

            ratio_dropped_linf = self.get_ratio_dropped_linf(linf_bound)
        ratio_dropped = 1 - (1 - ratio_dropped_l0) * (1 - ratio_dropped_linf)
        stddev = self._get_stddev(l0_bound, linf_bound)
+        if to_print:


is it for debugging only? should we remove it? just double checking :)

Yes, thanks. Removed
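For readers following the diff context above, here is a minimal sketch of how the two drop ratios are combined. The expression is taken from the diff line `ratio_dropped = 1 - (1 - ratio_dropped_l0) * (1 - ratio_dropped_linf)`; the standalone function name `combine_ratio_dropped` and the sample inputs are hypothetical and are not part of pipeline_dp.

```python
def combine_ratio_dropped(ratio_dropped_l0: float,
                          ratio_dropped_linf: float) -> float:
    """Combines the L0 and Linf drop ratios as in the diff above.

    (1 - ratio) is the fraction of data kept by each bounding step, so the
    product is the fraction kept after both steps, and 1 minus that product
    is the fraction dropped overall.
    """
    return 1 - (1 - ratio_dropped_l0) * (1 - ratio_dropped_linf)


# Hypothetical inputs: L0 bounding drops 20%, Linf bounding drops 50%.
# Kept fraction is 0.8 * 0.5 = 0.4, so roughly 60% is dropped overall.
combined = combine_ratio_dropped(0.2, 0.5)
print(f"combined ratio dropped: {combined:.2f}")
```

Under this formula the combined ratio is never smaller than either input, and it equals the other input when one of the ratios is 0.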


RamSaw · 2024-10-16T15:40:01Z

tests/dataset_histograms/histogram_error_estimator_test.py

@@ -49,9 +49,18 @@ def _get_estimator(
        epsilon: float = 2**0.5 / 2,
        delta: Optional[float] = None,
    ):
-        return histogram_error_estimator.create_error_estimator(
+        return histogram_error_estimator.create_estimator_for_count_privacy_id_count(


I would rename _get_estimator to _get_estimator_for_count_and_privacy_id_count.

RamSaw · 2024-10-16T15:45:10Z

tests/dataset_histograms/histogram_error_estimator_test.py

@@ -121,6 +139,17 @@ def test_get_ratio_dropped_linf(self, linf_bound, expected):
        self.assertAlmostEqual(estimator.get_ratio_dropped_linf(linf_bound),
                               expected)

+    @parameterized.parameters((0, 1), (0.5, 0.89), (1, 0.78), (2, 0.76),
+                              (40, 0))
+    # there 1 is contribution 40 and 10 contribution 1.


there is 1 contribution of 40 and 10 contributions of 1.

RamSaw · 2024-10-16T15:50:46Z

tests/dataset_histograms/histogram_error_estimator_test.py

+                              (40, 0))
+    # there 1 is contribution 40 and 10 contribution 1.
+    # total contribution = 1*40+10*1 = 50
+    # when linf_bound = 0.5, left after contribution bounding 11*0.5=5.5, i.e.


How can linf be a double and not an integer?

linf_bound is the maximum contribution per partition, which means:

max_contributions_per_partition for COUNT

max_sum_per_partition for SUM (which can be a double)
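The arithmetic in the test comment can be checked with a small sketch. It works directly on a raw list of per-partition contributions rather than on histograms, so it is only an illustration of the numbers in the parameterized test, not the library's implementation. The helper name `ratio_dropped_linf` and the assumption of non-negative contributions are mine.

```python
def ratio_dropped_linf(contributions, linf_bound):
    """Fraction of the total sum lost when each contribution is capped.

    Assumes non-negative contributions with a positive total.
    """
    total = sum(contributions)
    kept = sum(min(c, linf_bound) for c in contributions)
    return 1 - kept / total


# Data from the test comment: 1 contribution of 40 and 10 contributions of 1,
# so the total contribution is 1*40 + 10*1 = 50.
contributions = [40] + [1] * 10

# With linf_bound = 0.5, each of the 11 contributions is capped at 0.5,
# so 11 * 0.5 = 5.5 is kept and (50 - 5.5) / 50 = 0.89 is dropped.
for bound in (0, 0.5, 1, 2, 40):
    ratio = ratio_dropped_linf(contributions, bound)
    print(f"linf_bound={bound}: ratio dropped ~ {ratio:.2f}")
```

The bounds used here match the test parameters (0, 1), (0.5, 0.89), (1, 0.78), (2, 0.76), (40, 0), including the non-integer bound 0.5 that is meaningful for SUM.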

dvadym

Thanks for the review!





Error estimator for SUM

95556eb

dvadym requested a review from RamSaw October 16, 2024 13:20

RamSaw approved these changes Oct 16, 2024


Addressed comments

381f880

dvadym commented Oct 17, 2024


dvadym merged commit 3a7a0ff into OpenMined:main Oct 17, 2024
6 of 14 checks passed
