-
-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DP Count combiner #144
DP Count combiner #144
Conversation
pipeline_dp/combiners.py
Outdated
def merge_accumulators(self, accumulator1: int, accumulator2: int): | ||
return accumulator1 + accumulator2 | ||
|
||
def compute_metrics(self, accumulator: int) -> float: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parent declaration uses type 'Accumulator', but here we say 'int'. Should we remove 'Accumulator' as the type from the parent class, and just say that any class can be used here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I've removed type declarations from base class, they are incorrect
self._params = params | ||
|
||
def create_accumulator(self, values) -> int: | ||
return len(values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this return length of the array? Maybe we can add some docs to the parent method to document the expectation for the return type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added more comments on Combiner based class on how the Combiner framework works and on CountCombiner class. Please check whether it becomes more clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's much clearer, thank you!
pipeline_dp/combiners.py
Outdated
class CombinerParams: | ||
"""Parameters for an combiner. | ||
|
||
Wraps epsilon and delta from the MechanismSpec which are lazily loaded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'm likely missing some context.. why is that lazily loaded, and where is this loading happens?
nit2: Maybe we can stay less technical in the class doc, and add more comments to the code. Maybe we can say here something like "wraps all the information needed by the combiner to do the differentially-private computation, e.g. privacy budget and mechanism.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for comments! PTAL
self._params = params | ||
|
||
def create_accumulator(self, values) -> int: | ||
return len(values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added more comments on Combiner based class on how the Combiner framework works and on CountCombiner class. Please check whether it becomes more clear
pipeline_dp/combiners.py
Outdated
def merge_accumulators(self, accumulator1: int, accumulator2: int): | ||
return accumulator1 + accumulator2 | ||
|
||
def compute_metrics(self, accumulator: int) -> float: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I've removed type declarations from base class, they are incorrect
pipeline_dp/combiners.py
Outdated
@@ -9,6 +14,18 @@ class Combiner(abc.ABC): | |||
aggregation state. Combiners contain logic, while accumulators contain data. | |||
The API of combiners are inspired by Apache Beam CombineFn class. | |||
https://beam.apache.org/documentation/transforms/python/aggregation/combineperkey/#example-5-combining-with-a-combinefn | |||
|
|||
Let we have some dataset X to aggregate. The workflow of running an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) Here's how PipelineDP uses combiners to performs an aggregation on some dataset X:
- ....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
pipeline_dp/combiners.py
Outdated
|
||
Assumption: merge_accumulators is associative binary operation. | ||
|
||
The type of the accumulator is specific for each concrete Combiner. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
The type of accumulator depends on the aggregation performed by this Combiner. For example, this can be a primitive type (e.g. int for a simple "count" aggregation) or a more complex structure (e.g. for "mean")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
self._params = params | ||
|
||
def create_accumulator(self, values) -> int: | ||
return len(values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's much clearer, thank you!
No description provided.