Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ModelCard infrastructure (including implementation and testing) #243

Merged
merged 32 commits into from
Jul 12, 2022
Merged

Add ModelCard infrastructure (including implementation and testing) #243

merged 32 commits into from
Jul 12, 2022

Conversation

rmahinpei
Copy link
Contributor

Signed-off-by: Romina Mahinpei mahinpei@student.ubc.ca

Description & Motivation

  • Implementing & testing of ModelCard
  • A new feature enabling users to build an object that contains info about their trained model (info obtained via the model's built-in provenance & user-input)
  • Aim of feature is to allow more transparent model reporting

Paper reference

Model Cards for Model Reporting

Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Jun 19, 2022
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Copy link
Member

@Craigacp Craigacp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks good, but there's a number of things we want to sort out.

First, all the new files need a license header, similar to the ones in other Tribuo source files.

Second, the output-json folder shouldn't be checked in, as far as I can tell it isn't used to validate output.

Third any data files that are necessary should be in src/test/resources rather than src/test/input-data. That way Maven will know to put them in the classpath of the test classes. We also can't distribute real data files, so let's come up with alternatives where you've used them.

Fourth, the formatting should be consistent, so that's using curly braces on single statements and no one line ifs or fors.

I've made more specific comments on other issues, so we can discuss them in those threads.

Interop/ModelCard/pom.xml Outdated Show resolved Hide resolved
Interop/ModelCard/pom.xml Outdated Show resolved Hide resolved
Interop/ModelCard/src/main/java/ModelCard.java Outdated Show resolved Hide resolved
Interop/ModelCard/src/main/java/ModelCard.java Outdated Show resolved Hide resolved
Interop/ModelCard/src/main/java/ModelDetails.java Outdated Show resolved Hide resolved
Interop/ModelCard/src/test/java/NativeModelsTest.java Outdated Show resolved Hide resolved
Interop/ModelCard/src/test/java/NativeModelsTest.java Outdated Show resolved Hide resolved
Interop/ModelCard/src/test/java/NativeModelsTest.java Outdated Show resolved Hide resolved
Interop/pom.xml Outdated Show resolved Hide resolved
Interop/ModelCard/src/test/java/EnsembleModelsTest.java Outdated Show resolved Hide resolved
rmahinpei added 20 commits June 24, 2022 21:44
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
@rmahinpei
Copy link
Contributor Author

I made edits for most of the suggested changes - I just had a few specific questions about some of the comments (which I've still left as unresolved above).

I was also curious about the PR failures. I'm still not entirely sure what's causing them (when I run mvn clean package locally, the build is always successful). Is there anything I can look into to get that fixed?

Another thing that I was recently thinking about is the precondition I currently have for UsageDetails component of ModelCard. In my head, I always assumed that a user would first build their model & the ModelCard, then save the ModelCard to file, and then afterwards run the CLI to update the UsageDetails component. As a result, my implementation for saveUsageDetails(CommandInterpreter ci, File destinationFile) assumes that destinationFile corresponds to a Json representation of a ModelCard and attempts to look for the other components that a ModelCard has before appending UsageDetails. I was wondering whether I should also support the scenario where a user first saves the UsageDetails (i.e. to an empty file) before making their model and ModelCard? I'd have to change parts of the code for when a user saves the ModelCard to file for the first time, but it's definitely doable.

Thanks for all the help!

@Craigacp
Copy link
Member

Another thing that I was recently thinking about is the precondition I currently have for UsageDetails component of ModelCard. In my head, I always assumed that a user would first build their model & the ModelCard, then save the ModelCard to file, and then afterwards run the CLI to update the UsageDetails component. As a result, my implementation for saveUsageDetails(CommandInterpreter ci, File destinationFile) assumes that destinationFile corresponds to a Json representation of a ModelCard and attempts to look for the other components that a ModelCard has before appending UsageDetails. I was wondering whether I should also support the scenario where a user first saves the UsageDetails (i.e. to an empty file) before making their model and ModelCard? I'd have to change parts of the code for when a user saves the ModelCard to file for the first time, but it's definitely doable.

I thought I'd mentioned this in the review but I must have forgotten, I think the UsageDetails should be split into two classes, one something record-like (like the other details classes), and one called ModelCardCLI. Combining the two of them into one class makes it a little counter intuitive for users who want to operate on model cards programmatically as it's got a bunch of weird methods that they can't call which are designed for CLI usage.

As for the save method, I think it'll become clearer once the've been split in two. At that point we can have the ModelCardCLI have a load method in which loads in the partial model card, and then the save method just writes the full card out to the supplied destination. I think eventually it would be nice to have the ModelCardCLI load a serialized model in and then auto-generate the model card, but that does make it tricky to include model evaluations as those don't currently have a serialized format and so can't be loaded in.

@rmahinpei
Copy link
Contributor Author

Yes, that makes sense - I have right now combined too many things in UsageDetails and splitting would be better.

I can have UsageDetails be more like the other detail classes where the user can programatically add some usage-related fields that they would roughly know even before training their model (e.g. intended-use, intended-users, primary-contact) and have ModelCardCLI for including fields that would be better known at the very end of the process (e.g. out-of-scope-uses, considerations-list, relevant-factors-list, resources-list, model-citation, model-license).

If I am to follow this design, would it make more sense for ModelCardCLI to just append the extra details to the UsageDetails section of a given ModelCard file or would it better to append them in a completely separate section? I personally think that having a separate section might be better (one section would more align with pre-build usage detail and the other with post-build usage details), but I can also see benefits to the other approach as well. I'd appreciate any input on this, and once we're confirmed on the design, I can go ahead and make the changes.

@Craigacp
Copy link
Member

I think it might be easier to have the ModelCardCLI write into some kind of UsageDetailsBuilder (using the Builder design pattern) which then gets converted into a UsageDetails as the model card is saved out. That way the UsageDetails can be immutable like the rest of the details classes.

Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
@rmahinpei
Copy link
Contributor Author

Got it - that makes sense. I'll start separating UsageDetails and ModelCardCLI (my plan is to have it done by the end of Fri, Jul 1). Then, I'll work around addMetric(...) in TestingDetails so that the class can be immutable (shouldn't take me too long either). Other than that, I know you also mentioned turning all the classes within ModelCard into records, and I was wondering whether this is something I should go ahead and do?

@Craigacp
Copy link
Member

Craigacp commented Jul 1, 2022

Moving things to records is lower down the priority list than the other items as I'm still thinking about if we want to depend on the model card code elsewhere in Tribuo (e.g. in ONNX export) in which case we might have to move it down to Java 8. Or decide to punt on adding model cards to ONNX exported models till the next major version when we bump everything to Java 17. The current set of Java 17 features this PR uses are easy to translate into Java 8, but records would be a bigger change.

rmahinpei added 3 commits July 1, 2022 18:33
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
…pecified

Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
rmahinpei added 5 commits July 6, 2022 17:43
… files

Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
…ent model cards

Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Signed-off-by: Romina Mahinpei <mahinpei@student.ubc.ca>
Copy link
Member

@Craigacp Craigacp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Romina, this looks great.

@Craigacp Craigacp merged commit 1a4f6e8 into oracle:main Jul 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants