Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create error estimator for SUM #528

Merged
merged 2 commits into from
Oct 17, 2024
Merged

Conversation

dvadym
Copy link
Collaborator

@dvadym dvadym commented Oct 16, 2024

ErrorEstimator uses ContributionHistograms to estimate approximately RMSE error for different L0/Linf bounds. This PR allows to create ErrorEstimator for SUM (the different with COUNT, only different set of histograms)

@dvadym dvadym requested a review from RamSaw October 16, 2024 13:20
ratio_dropped_linf = self.get_ratio_dropped_linf(linf_bound)
ratio_dropped = 1 - (1 - ratio_dropped_l0) * (1 - ratio_dropped_linf)
stddev = self._get_stddev(l0_bound, linf_bound)
if to_print:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it for debugging only? should we remove it? just double checking :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks. Removed

@@ -49,9 +49,18 @@ def _get_estimator(
epsilon: float = 2**0.5 / 2,
delta: Optional[float] = None,
):
return histogram_error_estimator.create_error_estimator(
return histogram_error_estimator.create_estimator_for_count_privacy_id_count(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename from _get_estimator to _get_estimator_for_count_and_privacy_id_count.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx, done

@@ -121,6 +139,17 @@ def test_get_ratio_dropped_linf(self, linf_bound, expected):
self.assertAlmostEqual(estimator.get_ratio_dropped_linf(linf_bound),
expected)

@parameterized.parameters((0, 1), (0.5, 0.89), (1, 0.78), (2, 0.76),
(40, 0))
# there 1 is contribution 40 and 10 contribution 1.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is 1 contribution of 40 and 10 contributions of 1.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx

(40, 0))
# there 1 is contribution 40 and 10 contribution 1.
# total contribution = 1*40+10*1 = 50
# when linf_bound = 0.5, left after contribution bounding 11*0.5=5.5, i.e.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how linf can be a double and not an integer?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

linf_bound is max contribution per partition, which means

max_contributions_per_partition for COUNT

max_sum_per_partition for SUM (which can be double)

Copy link
Collaborator Author

@dvadym dvadym left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for review!

ratio_dropped_linf = self.get_ratio_dropped_linf(linf_bound)
ratio_dropped = 1 - (1 - ratio_dropped_l0) * (1 - ratio_dropped_linf)
stddev = self._get_stddev(l0_bound, linf_bound)
if to_print:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks. Removed

@@ -49,9 +49,18 @@ def _get_estimator(
epsilon: float = 2**0.5 / 2,
delta: Optional[float] = None,
):
return histogram_error_estimator.create_error_estimator(
return histogram_error_estimator.create_estimator_for_count_privacy_id_count(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx, done

@@ -121,6 +139,17 @@ def test_get_ratio_dropped_linf(self, linf_bound, expected):
self.assertAlmostEqual(estimator.get_ratio_dropped_linf(linf_bound),
expected)

@parameterized.parameters((0, 1), (0.5, 0.89), (1, 0.78), (2, 0.76),
(40, 0))
# there 1 is contribution 40 and 10 contribution 1.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx

(40, 0))
# there 1 is contribution 40 and 10 contribution 1.
# total contribution = 1*40+10*1 = 50
# when linf_bound = 0.5, left after contribution bounding 11*0.5=5.5, i.e.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

linf_bound is max contribution per partition, which means

max_contributions_per_partition for COUNT

max_sum_per_partition for SUM (which can be double)

@dvadym dvadym merged commit 3a7a0ff into OpenMined:main Oct 17, 2024
6 of 14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants