Agenda Request - what is the right scope for functionality that should be supported in the first iteration of the joint measurement effort? #56
Thanks for filing this, @marianapr. I am generally OK with scoping to an as-simple-as-possible MVP, but I would prefer we pick an overall architecture that allows the flexibility we think we need to support future use cases (e.g. ML training). All else equal, I do think it would be beneficial to take advantage of all the work on PPM / DAP / VDAFs happening in the IETF. But if that conflicts with the longer-term goals, I think it's OK to take a step back and re-evaluate the needs of the system rather than constrain ourselves to that work.
If you are interested in exploring federated learning (horizontal and vertical), multi-task learning, or split learning, we can help define the architecture for on-device training and inference.
By "overall architecture," do you mean the API? Or do you mean something like the server architecture? Regardless, it seems to me that we've heard two substantive questions:
I try to come at both of these from the perspective of what we know how to do. That is, the MVP should target what we best know how to build. Based on the discussion last time, I think that rules out optimization, which looks like a substantially open problem. If there are choices we know would rule it out and don't otherwise make things better, let's not make them. Otherwise, I think we should defer it. With respect to cross-device conversion, my sense is that it falls somewhere between nice-to-have and very important, depending on who you talk to. That raises the question of whether implementing it significantly complicates things. To which the answer is... maybe?
I mean the high-level architecture of the API, e.g. whether attribution happens on device or not, or whether API choices fix a particular query pattern or allow dynamic queries.
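To make the "attribution on device" option concrete, here is a minimal sketch, assuming a last-touch model with a fixed lookback window. The `Device` and `Impression` classes and the `attribute_conversion` method are hypothetical names, not part of any proposal in this thread. The point is only that the join between impressions and conversions happens locally, so what leaves the device is an already-attributed result.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Impression:
    campaign_id: int
    timestamp: float


@dataclass
class Device:
    """Hypothetical device-side store: impressions are kept locally and never uploaded raw."""
    impressions: list[Impression] = field(default_factory=list)

    def record_impression(self, campaign_id: int, timestamp: float) -> None:
        self.impressions.append(Impression(campaign_id, timestamp))

    def attribute_conversion(self, timestamp: float, window: float) -> Optional[int]:
        """Last-touch attribution (an assumed model): return the campaign of the most
        recent impression no older than `window` before the conversion, else None."""
        eligible = [
            imp for imp in self.impressions
            if 0 <= timestamp - imp.timestamp <= window
        ]
        if not eligible:
            return None
        return max(eligible, key=lambda imp: imp.timestamp).campaign_id


# Usage: only the attributed campaign id would be reported for aggregation.
device = Device()
device.record_impression(campaign_id=1, timestamp=10.0)
device.record_impression(campaign_id=2, timestamp=50.0)
result = device.attribute_conversion(60.0, window=100.0)  # result == 2
```

Attributing off device instead would require uploading impression and conversion events separately and joining them server-side, which is what enables cross-device attribution but also introduces the joining complexity mentioned in this thread.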
I think that is a little bit reductive. There are good results from the Criteo AdKDD challenge showing that a fairly simple aggregate system can perform logistic regression. The techniques needed to achieve that result seem generally applicable to other reporting use cases too (e.g. multi-query scenarios). I agree that general-purpose ML might be a bit out of scope, though.
@AramZS I saw that you scheduled the discussion of this topic for tonight. I know that today is a holiday for many companies in the US. I think this discussion only makes sense if we have a quorum of representatives.
I know it's pretty late notice, but given that only two technical topics are currently on the agenda, it might make sense to put both on one day to have a better chance of reaching quorum.
+1 to what Charlie said. Can we merge the two topics into one day? I am not sure the East Coast people can make two midnight meetings in the same week.
Others of us have managed to attend when the timing was bad for us personally. We all have to contend with the occasional bad hour or public-holiday conflict. I'm personally not very happy with how late the agenda for this particular meeting was set. Receiving confirmations on the day of the meeting has put me a little behind. I might be ready in time for the second session, but it will be a push to get it done with less than 7 hours' notice for the first. I understand if @marianapr has similar challenges; we've not had a lot of notice. But if the problem is a lack of preparation, I don't want that used to marginalize people who are disadvantaged by the timing of other meetings. If the goal is to share the burden of meeting at awkward hours, cancellation works directly against that goal. Ideally, we would have an agenda one week before a meeting, and the requests were submitted well in advance of that. Then lack of preparation time would not be an excuse to shorten sessions that happen to be inconvenient for certain geographies.
I agree that advance notice for preparation would be appreciated. Today is a holiday for me and I did not have much time to prepare, so I would prefer to put the two topics, which overlap quite a bit, on Wednesday.
My apologies. I was also coming back from time off and had an unexpected lack of internet access. This falls on me and shouldn't happen again. I do think we can move this to the second session, especially since there seems to be a feeling that the two topics strongly overlap. That said, I think we can spend some time in this first session setting up the discussion more effectively, especially since we intend to discuss scope.
Hi all, At the end of our day-1 discussion on this topic, I promised to file two issues to help structure the conversation in day-2. I have now done so.
I'm looking forward to knocking down these strawmen together on day 2! I'm posting here to give everyone at least 24 hours to read and reflect on them before our next discussion.
Here's a (somewhat ad hoc) list of use cases we can prioritize, and possibly narrow down if we can immediately reject some as infeasible.
I think many of these are likely achievable in an MVP, and hopefully more in any extensions we want to add.
I think breaking this down a bit might be nice. E.g.,
@jpfeiffe, I completely agree, especially given that some of the simple optimizations should be compatible with a wide range of proposals. Let me edit my list.
Presumably you want "offline <-> online" in both directions. And maybe add multi-valued (vector) outputs to the complicated queries.
I vote to NOT shoot for the following things in the MVP
Unless it is much better specified so that we can engage with specific use-cases I'd also suggest we jettison
Participants: do we want to expand this conversation in the upcoming meeting?
I think it would be great to hear from more participants what a useful MVP looks like from their point of view. The applications and functionality mentioned in the talks at the previous meeting and in the discussions on the issues can be a good starting point. We can also follow Charlie's suggestion and sort the applications into three categories: absolutely required for an MVP to be useful, nice to have, and advanced capabilities.
@marianapr I think that's reasonable. I'll add this to the agenda at the top of day 2.
I'm interested in hearing folks who would like to lead this discussion and potentially leaders around each suggested section to help guide the discussion:
I doubt we will be able to get a representative selection of API users for this next meeting. Rather than trying to host this particular discussion live, we could run some kind of survey, which would probably give us better data about which features are most critical.
@benjaminsavage I agree. I don't think this needs to be comprehensive, but we can continue the conversation and maybe talk through what we want in such a survey?
In the sessions dedicated to the measurement use-case deep dive and requirements, Charlie presented several applications and their importance. We started a discussion about which subset of these applications should be supported in the first design version of the measurement effort. Most of that discussion focused on whether optimization should be a requirement. I also want to raise the question of whether we should require support for attribution outside the user device. This would enable cross-device attribution but comes with complexities related to the joining. Is it worth starting with a solution based on the current instantiations of the PPM standardization effort in the IETF? One advantage is that such solutions have been deployed and are still running in practice for other aggregation applications, such as the Exposure Notifications Private Aggregation (https://github.com/google/exposure-notifications-android/blob/master/doc/enexpress-analytics-faq.md).
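For readers less familiar with the PPM work, the core idea behind DAP-style aggregation can be sketched with additive secret sharing. Each client splits its measurement into shares sent to different aggregators, each aggregator sums only the shares it holds, and the collector combines the aggregate shares to learn only the total. This is a simplified illustration under stated assumptions: the `MODULUS` value and the `split`, `aggregate`, and `unshard` helpers are hypothetical, and the sketch omits the validity proofs, field choices, and message formats that the actual VDAF specifications (e.g. Prio3) define.

```python
import secrets

# Illustrative prime modulus (2**61 - 1); real VDAFs define their own fields.
MODULUS = 2**61 - 1
NUM_AGGREGATORS = 2


def split(value: int, n_shares: int = NUM_AGGREGATORS) -> list[int]:
    """Split a measurement into additive shares mod MODULUS.
    Any n_shares - 1 of the shares are uniformly random and reveal nothing."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_shares - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def aggregate(shares: list[int]) -> int:
    """An aggregator sums the shares it received without seeing any individual value."""
    return sum(shares) % MODULUS


def unshard(aggregate_shares: list[int]) -> int:
    """The collector combines one aggregate share per aggregator to recover the total."""
    return sum(aggregate_shares) % MODULUS


# Usage: three clients each report a conversion count.
measurements = [1, 0, 3]
client_shares = [split(m) for m in measurements]
# Aggregator i receives share i from every client.
aggregate_shares = [
    aggregate([shares[i] for shares in client_shares])
    for i in range(NUM_AGGREGATORS)
]
total = unshard(aggregate_shares)  # total == 4
```

Because the protocol computes only predefined aggregates over client-provided shares, it fits naturally with on-device attribution. Supporting attribution outside the device would additionally require a private join of impression and conversion events, which this sketch does not address.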
It would be great to continue this discussion here and get opinions from more people about the minimum functionality that would make the outcome of the first iteration of the measurement design useful for them.