2017 09 12 Meeting Minutes
Attendees:
- Sebastian Benthall
- Nick Doty
- Harsh Gupta
Harsh has graduated and is working in data science.
Seb and Nick are still finishing their dissertations.
The Indian Supreme Court has ruled that privacy is a fundamental right. CIS India was involved in the related parliamentary committee.
Nick has been trying to understand demographics across a large number of mailing lists. We may be able to learn something, starting with gender and then generalizing from there.
HG: If you do research using automated tools, shouldn't we measure how accurate they are, so that people have a bound on how well they perform?
ND: Yes. For now I'm working on getting started. There's a large need for manual or crowdsourced identification, but automation may help speed it up.
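HG's question about bounds could be answered by hand-labeling a random sample of senders and comparing those labels with the tool's output. The sketch below is a minimal illustration. It assumes that hand labels are plain strings such as "female"/"male" and that the tool returns "unknown" when it declines to guess. Both labels are assumptions and not part of any existing tool. The function reports coverage, accuracy among the senders the tool labeled, and a 95% Wilson score interval for that accuracy.

```python
import math
from typing import Sequence


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if trials == 0:
        return (0.0, 1.0)  # no evidence at all: the whole range is possible
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (max(0.0, center - half), min(1.0, center + half))


def evaluate(predicted: Sequence[str], truth: Sequence[str], unknown: str = "unknown") -> dict:
    """Compare automated labels with hand labels for the same senders.

    Coverage is the share of senders the tool labels at all; accuracy and its
    interval are measured only over those senders.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must be the same length")
    covered = [(p, t) for p, t in zip(predicted, truth) if p != unknown]
    correct = sum(1 for p, t in covered if p == t)
    return {
        "coverage": len(covered) / len(truth) if truth else 0.0,
        "accuracy": correct / len(covered) if covered else float("nan"),
        "accuracy_95ci": wilson_interval(correct, len(covered)),
    }
```

With small hand-labeled samples the interval is wide. This is the kind of bound HG asked for, and it shrinks as more senders are labeled manually or through crowdsourcing.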
HG: Can you explain in a line or two how it works?
ND: I'm using a library that uses birth records from different countries to connect names with gender. Its drawback is that it is rather specific to a few countries, while these groups are more global. It also has a very high confidence threshold, so I am developing a workflow for importing new names.
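The sketch below shows roughly how the workflow ND describes might fit together. The class `NameGenderTable`, the CSV layout `name,gender,count`, and the 0.95 threshold are all hypothetical illustrations and do not reflect the API of the library in use. Counts from several countries' birth records are merged into one table. A name is labeled only when one gender accounts for at least the threshold share of its records, and every other name is returned as "unknown".

```python
import csv
import io
from collections import defaultdict


class NameGenderTable:
    """Hypothetical name -> gender lookup built from birth-record counts."""

    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold
        # lowercased first name -> {"female": count, "male": count}
        self.counts = defaultdict(lambda: {"female": 0, "male": 0})

    def import_csv(self, handle) -> int:
        """Merge rows of name,gender,count (e.g. one country's records).

        Returns the number of rows merged; rows with other gender values are skipped.
        """
        merged = 0
        for row in csv.DictReader(handle):
            gender = row["gender"].strip().lower()
            if gender not in ("female", "male"):
                continue
            self.counts[row["name"].strip().lower()][gender] += int(row["count"])
            merged += 1
        return merged

    def guess(self, display_name: str) -> str:
        """Guess from the first token of a sender's display name."""
        tokens = display_name.split()
        if not tokens:
            return "unknown"
        key = tokens[0].lower()
        if key not in self.counts:  # membership test does not create an entry
            return "unknown"
        c = self.counts[key]
        total = c["female"] + c["male"]
        if total == 0:
            return "unknown"
        for gender in ("female", "male"):
            if c[gender] / total >= self.threshold:
                return gender
        return "unknown"  # too ambiguous at this threshold


# Usage with made-up counts:
records = "name,gender,count\nMaria,female,980\nMaria,male,20\nAndrea,female,500\nAndrea,male,450\n"
table = NameGenderTable(threshold=0.95)
table.import_csv(io.StringIO(records))
```

Lowering the threshold raises coverage at the cost of accuracy, so the threshold is best chosen together with a hand-labeled accuracy check.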
HG: We could also use WikiData for this. You can query WikiData for particular people and the attributes associated with them. That might improve accuracy.
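The sketch below illustrates the WikiData idea by counting humans on WikiData who have a given first name, grouped by their sex-or-gender value. It assumes the public Wikidata Query Service endpoint and the identifiers P31 (instance of), Q5 (human), P735 (given name), and P21 (sex or gender). The helper names are hypothetical. For very common names, a query like this may time out on the public service.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Item IDs for the two most common P21 values (other values are kept as raw IDs).
GENDER_ITEMS = {"Q6581072": "female", "Q6581097": "male"}

# Doubled braces are literal braces after str.format().
QUERY_TEMPLATE = """
SELECT ?gender (COUNT(DISTINCT ?person) AS ?n) WHERE {{
  ?given rdfs:label "{name}"@{lang} .
  ?person wdt:P735 ?given ;
          wdt:P31 wd:Q5 ;
          wdt:P21 ?gender .
}}
GROUP BY ?gender
"""


def build_url(name: str, lang: str = "en") -> str:
    """Build a GET URL for the query, escaping characters special in SPARQL strings."""
    safe = name.replace("\\", "\\\\").replace('"', '\\"')
    query = QUERY_TEMPLATE.format(name=safe, lang=lang)
    return WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})


def parse_counts(results: dict) -> Counter:
    """Turn a SPARQL JSON result into {item ID: number of people}."""
    counts = Counter()
    for binding in results["results"]["bindings"]:
        qid = binding["gender"]["value"].rsplit("/", 1)[-1]
        counts[qid] += int(binding["n"]["value"])
    return counts


def fetch_counts(name: str) -> Counter:
    """Live request (not run here); the service asks clients to send a User-Agent."""
    req = urllib.request.Request(
        build_url(name), headers={"User-Agent": "bigbang-gender-sketch/0.1 (hypothetical)"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return parse_counts(json.load(resp))
```

`parse_counts` is kept separate from the network call so that it can be checked against saved responses. Its output can also be merged into a name table as one more source of counts.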
ND: There's also some work using GitHub commits and Google+, where gender is marked. I'm not sure whether it would make a big difference.
How do we reuse useful code?
Let's put the more general features in the library and then illustrate them in the examples.
HG: That puts more work on the developer, because the developer doesn't always have an incentive to generalize a function. Rather than keeping most of the logic in the example itself, we should break the functionality out into the library code.
ND: Jupyter notebooks are good for documenting research and its logic. Every project will have different needs. A lot of the benefit of BigBang is showing how people are investigating these questions.
ND: How do we manage the growth of both our personal scientific code and general-use software? It would be good if we agreed that there's a distinction between the two and agreed to communicate early about the general code.
ND: We discussed having some kind of chat channel. Should we have one?
HG: I recommend Gitter.
SB: I will set it up.
ND: What about documentation?
SB: Great. Let's say that's something we need for the next release and discuss it in our next meeting, as it's a larger discussion.