ML challenge inspired from the aggregate reporting API proposal #137
Comments
(I apologize for the repost) For the challenge, could you please give us an order of magnitude for the epsilon we should work with? If you also have an idea of the noise distribution (Gaussian? Laplacian?), that would be great! We understand very well that this number might change (and we hope the challenge will help inform the choice of epsilon!), but having an order of magnitude would really help us set up the challenge. Many thanks! |
Thanks Basile! I'll work on trying to get this information to you soon. |
Hello @BasileLeparmentier, Thank you for organizing this challenge. We will surely learn a lot and be better prepared for the future of click / sales optimization! |
Hi, Yes all of this will be described in the challenge! |
Dear @csharrison, Sorry to bother you, but it would really help if we could get the value of epsilon today. We are launching the challenge on Monday, and if we receive the epsilon any later than today it will be very hard for us to run our checks. Many thanks, |
Hey Basile, yep planning on sending you some information by today, I just need to get a few final sign-offs. Sorry for the delay. |
Thx! |
Hey Basile, here are some parameters for you to work with.
Aggregation function
There are a few choices of aggregation functions you can use. Any of these could be acceptable, but they offer different trade-offs.
I’m happy to go into more detail on any of this. Note that we are planning on publishing a doc with more details on how this relates to the attribution API, so stay tuned. In that doc we are planning on suggesting option (3) for its flexibility and compatibility with MPC, but for the purposes of this challenge I think any of these are fine. Note: Training models using aggregate data may be easier if you allow on-device model training / Federated Learning. We haven’t elaborated on how to do this but we should discuss it if you want to do any follow-up competitions in this space, since it may be something the Privacy Sandbox could support. |
Hi Charlie, Many thanks for your answer! Is it important for you to stick with Laplacian noise? Because the challenge involves a lot of queries (around 200, which we would also expect in a real-world setting, probably even more), we believe that using the Gaussian mechanism with a delta of 10e-6 or 10e-8 would give better learning results with the same privacy guarantee. Any thoughts? Basile |
The Laplace noise fits better with our current design (details pending) where we wouldn't expect so many queries / one report to influence so many different outputs. However, if you already have all these aggregation dimensions fixed in your design I think it should be OK to use Gaussian noise. We aren't opposed to Gaussian noise on any principled level. |
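To make the trade-off discussed above concrete, here is a minimal sketch comparing the per-query noise standard deviation of the two mechanisms when one report can influence many queries. It is not the challenge's or Chrome's actual calibration. The assumptions are: each report contributes at most 1 to each query, the Laplace mechanism is calibrated to the L1 sensitivity (which grows linearly with the number of queries), the Gaussian mechanism uses the classic analytic bound sigma = sqrt(2 ln(1.25/delta)) * L2 sensitivity / epsilon (valid only for epsilon < 1), and epsilon = 0.5 is a purely hypothetical value since none is stated in this thread. The function names are illustrative.

```python
import math


def laplace_std(num_queries, epsilon, per_query_sensitivity=1.0):
    """Std of Laplace noise per query when one report touches num_queries queries.

    Assumption: the L1 sensitivity is num_queries * per_query_sensitivity,
    so the Laplace scale is b = L1 / epsilon and the std is sqrt(2) * b.
    """
    l1_sensitivity = num_queries * per_query_sensitivity
    scale = l1_sensitivity / epsilon
    return math.sqrt(2.0) * scale


def gaussian_std(num_queries, epsilon, delta, per_query_sensitivity=1.0):
    """Std of Gaussian noise per query for (epsilon, delta)-DP.

    Assumption: the L2 sensitivity is sqrt(num_queries) * per_query_sensitivity,
    and the classic Gaussian-mechanism bound is used (requires 0 < epsilon < 1).
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("the classic Gaussian bound assumes 0 < epsilon < 1")
    l2_sensitivity = math.sqrt(num_queries) * per_query_sensitivity
    return math.sqrt(2.0 * math.log(1.25 / delta)) * l2_sensitivity / epsilon


# Hypothetical parameters: epsilon is a placeholder, delta is illustrative.
EPSILON = 0.5
DELTA = 1e-6

for k in (1, 10, 200):
    lap = laplace_std(k, EPSILON)
    gau = gaussian_std(k, EPSILON, DELTA)
    print(f"queries={k:4d}  laplace_std={lap:9.2f}  gaussian_std={gau:9.2f}")
```

Under these assumptions, Laplace noise is smaller when a report affects a single query, while Gaussian noise becomes smaller at around 200 queries because its scale grows with the square root of the number of queries rather than linearly. Note that the Gaussian guarantee is (epsilon, delta)-DP rather than pure epsilon-DP.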
FYI, published some more information about aggregate attribution reports here: |
Hi Charlie and others interested, The ML challenge is now live! For those who want to participate, please follow this link https://competitions.codalab.org/competitions/31485 |
Hi everyone,
(repost of issue csharrison/aggregate-reporting-api#21 as apparently it was not the right repo for it)
We are delighted to announce that we will be organising a challenge with adKDD inspired by the aggregate measurement API, tackling the optimisation use case.
Criteo will provide a dataset and some prize money, and will let researchers and data scientists from around the world compete on how to learn well-performing bidding models from differentially private reports. Link to the challenge here.
We will be happy to work with the Chrome team to set appropriate parameters for the differential privacy mechanism, so that the challenge is as close as possible to what real-life operation could look like (e.g. an epsilon level would be beneficial).
We hope to kickstart the challenge in early May, so if you are interested in solving the “optimisation” use case using the aggregate reporting API, please do participate!
Best,
Basile on behalf of the Criteo team