DP Count combiner #144

Merged
merged 5 commits into from
Jan 17, 2022

Conversation

dvadym
Collaborator

@dvadym dvadym commented Jan 17, 2022

No description provided.

pipeline_dp/combiners.py (outdated)
    def merge_accumulators(self, accumulator1: int, accumulator2: int):
        return accumulator1 + accumulator2

    def compute_metrics(self, accumulator: int) -> float:
Collaborator

The parent declaration uses type 'Accumulator', but here we say 'int'. Should we remove 'Accumulator' as the type from the parent class, and just say that any class can be used here?

Collaborator Author

Thanks, I've removed the type declarations from the base class; they were incorrect.

        self._params = params

    def create_accumulator(self, values) -> int:
        return len(values)
Collaborator

Why does this return length of the array? Maybe we can add some docs to the parent method to document the expectation for the return type.

Collaborator Author

I've added more comments to the Combiner base class explaining how the Combiner framework works, and to the CountCombiner class. Please check whether it is clearer now.

Collaborator

Yes that's much clearer, thank you!
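
For readers following this thread, a minimal sketch of why `create_accumulator` returns `len(values)`: for a count aggregation the accumulator is simply the number of values seen so far, and merging two accumulators adds them. The three method names come from the diff above; `CombinerSketch` and `CountCombinerSketch` are hypothetical stand-ins, not the classes merged in this PR, and `compute_metrics` here returns the raw count with no noise added, so it is NOT differentially private.

```python
import abc


class CombinerSketch(abc.ABC):
    """Hypothetical stand-in for the Combiner base class (no type declarations)."""

    @abc.abstractmethod
    def create_accumulator(self, values):
        """Builds an accumulator from a chunk of values."""

    @abc.abstractmethod
    def merge_accumulators(self, accumulator1, accumulator2):
        """Merges two accumulators into one."""

    @abc.abstractmethod
    def compute_metrics(self, accumulator):
        """Computes the final result from the fully merged accumulator."""


class CountCombinerSketch(CombinerSketch):
    """Illustrative count combiner: the accumulator is a plain int."""

    def create_accumulator(self, values) -> int:
        # The count of a chunk is its length.
        return len(values)

    def merge_accumulators(self, accumulator1: int, accumulator2: int) -> int:
        # Counts of disjoint chunks add up.
        return accumulator1 + accumulator2

    def compute_metrics(self, accumulator: int) -> float:
        # Assumption: a real DP combiner would add calibrated noise here.
        # This sketch returns the raw count for illustration only.
        return float(accumulator)
```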

class CombinerParams:
    """Parameters for a combiner.

    Wraps epsilon and delta from the MechanismSpec which are lazily loaded.
Collaborator

nit: I'm likely missing some context. Why is it lazily loaded, and where does this loading happen?

nit2: Maybe we can stay less technical in the class doc and add more comments to the code. Maybe we can say something here like "wraps all the information needed by the combiner to do the differentially-private computation, e.g. privacy budget and mechanism".

WDYT?
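
One possible reading of "lazily loaded", sketched under assumptions: the thread does not say where the loading happens, so this example assumes epsilon and delta are assigned to the spec only after the parameters object has been constructed, and are therefore read at access time rather than copied in the constructor. `MechanismSpecSketch`, `CombinerParamsSketch`, and `set_budget` are hypothetical names for illustration, not the PipelineDP API.

```python
from typing import Optional


class MechanismSpecSketch:
    """Hypothetical spec whose budget is filled in after construction."""

    def __init__(self) -> None:
        self._eps: Optional[float] = None
        self._delta: Optional[float] = None

    def set_budget(self, eps: float, delta: float) -> None:
        # Assumption: some budget-accounting step calls this later.
        self._eps = eps
        self._delta = delta

    @property
    def eps(self) -> float:
        if self._eps is None:
            raise RuntimeError("Privacy budget has not been computed yet.")
        return self._eps

    @property
    def delta(self) -> float:
        if self._delta is None:
            raise RuntimeError("Privacy budget has not been computed yet.")
        return self._delta


class CombinerParamsSketch:
    """Holds a reference to the spec and reads eps/delta only on access."""

    def __init__(self, mechanism_spec: MechanismSpecSketch) -> None:
        self._mechanism_spec = mechanism_spec

    @property
    def eps(self) -> float:
        # Deferred lookup: valid only once the budget has been set.
        return self._mechanism_spec.eps

    @property
    def delta(self) -> float:
        return self._mechanism_spec.delta
```

With this shape, a combiner can be constructed before the budget is known, as long as it reads `eps` and `delta` only when it computes metrics.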

Collaborator Author

@dvadym dvadym left a comment

Thanks for comments! PTAL


@@ -9,6 +14,18 @@ class Combiner(abc.ABC):
aggregation state. Combiners contain logic, while accumulators contain data.
The API of combiners is inspired by Apache Beam CombineFn class.
https://beam.apache.org/documentation/transforms/python/aggregation/combineperkey/#example-5-combining-with-a-combinefn

Let we have some dataset X to aggregate. The workflow of running an
Collaborator

(nit) Here's how PipelineDP uses combiners to perform an aggregation on some dataset X:

  1. ....

Collaborator Author

Thanks!
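
The numbered steps are elided in this thread, so the following is only a sketch of the usual CombineFn-style flow that the docstring cites as its inspiration, and it is an assumption that the merged docstring describes the same steps: split the dataset into chunks, create one accumulator per chunk, merge the accumulators, then compute metrics once. The `aggregate` helper and the chunking are hypothetical; the three combiner functions mirror the count example from the diff.

```python
from functools import reduce
from typing import List, Sequence


def create_accumulator(values: Sequence) -> int:
    return len(values)


def merge_accumulators(accumulator1: int, accumulator2: int) -> int:
    return accumulator1 + accumulator2


def compute_metrics(accumulator: int) -> float:
    # Raw count; no DP noise is added in this sketch.
    return float(accumulator)


def aggregate(chunks: List[Sequence]) -> float:
    """Runs the assumed workflow over a dataset already split into non-empty chunks."""
    # Step 1: one accumulator per chunk (could happen on different workers).
    accumulators = [create_accumulator(chunk) for chunk in chunks]
    # Step 2: merge pairwise; the grouping is arbitrary, which is why
    # merge_accumulators needs to be associative.
    merged = reduce(merge_accumulators, accumulators)
    # Step 3: compute the final metric once, from the merged accumulator.
    return compute_metrics(merged)
```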


Assumption: merge_accumulators is an associative binary operation.

The type of the accumulator is specific for each concrete Combiner.
Collaborator

nit:

The type of accumulator depends on the aggregation performed by this Combiner. For example, this can be a primitive type (e.g. int for a simple "count" aggregation) or a more complex structure (e.g. for "mean").

Collaborator Author

Thanks!
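
To illustrate the "more complex structure" case, here is a sketch of what a mean accumulator could look like: a (count, sum) pair that is merged field by field, which keeps the merge associative. `MeanAccumulator` and the three functions are hypothetical names for illustration; this PR adds only the count combiner, and no noise is added here.

```python
from typing import NamedTuple, Sequence


class MeanAccumulator(NamedTuple):
    """Hypothetical accumulator for a mean: count and sum are enough."""
    count: int
    total: float


def create_mean_accumulator(values: Sequence[float]) -> MeanAccumulator:
    return MeanAccumulator(count=len(values), total=float(sum(values)))


def merge_mean_accumulators(a: MeanAccumulator,
                            b: MeanAccumulator) -> MeanAccumulator:
    # Field-wise addition; associative because addition is
    # (up to floating-point rounding for the total).
    return MeanAccumulator(a.count + b.count, a.total + b.total)


def compute_mean(accumulator: MeanAccumulator) -> float:
    # Non-private mean; returns 0.0 for an empty accumulator.
    if accumulator.count == 0:
        return 0.0
    return accumulator.total / accumulator.count
```

Averaging per-chunk means directly would not be associative for chunks of different sizes, which is why the accumulator carries the count and the sum separately.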


@dvadym dvadym merged commit 53e7a60 into main Jan 17, 2022
@delete-merged-branch delete-merged-branch bot deleted the concrete_combiners branch January 17, 2022 17:56
yuan2z pushed a commit to yuan2z/PipelineDP that referenced this pull request Jan 22, 2022
2 participants