Feature: Allow adding additional information to metadata on model upload #184

Closed
sourcehawk opened this issue Jun 11, 2022 · 4 comments
Labels
enhancement New feature or request

Comments

@sourcehawk

There is, to my knowledge, no straightforward way of retrieving additional data sent on model upload other than downloading the entire artifact and knowing the exact name of the file it was stored in. It would be nice to be able to add additional information to the model metadata when uploading a new model, so that any important information needed for further processing of models is directly accessible.

This could be an optional parameter to the upload method that provides an easy way to add something to the metadata. It could accept a Python dictionary, which would then be placed in the metadata under a specific key such as "extra".
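A minimal sketch of the proposed placement, assuming metadata is a plain nested dict with a top-level "model" key as in the output shown in the use case. `add_extra_metadata` is a hypothetical helper for illustration, not part of modelstore's API.

```python
import copy
from typing import Optional


def add_extra_metadata(metadata: dict, extra_metadata: Optional[dict] = None) -> dict:
    """Return a copy of `metadata` with user-supplied info under model["extra"].

    Hypothetical helper for illustration; not part of modelstore's API.
    """
    result = copy.deepcopy(metadata)
    # Deep-copy the caller's dict so later mutations do not leak into the
    # stored metadata; fall back to an empty dict when nothing is given.
    result.setdefault("model", {})["extra"] = copy.deepcopy(extra_metadata or {})
    return result


base = {"model": {"domain": {"name": "my-domain"}}}
info = {"required_columns": ["yay", "nay"]}
meta = add_extra_metadata(base, info)
print(meta["model"]["extra"]["required_columns"])  # ['yay', 'nay']
```

Copying rather than storing a reference means the uploaded metadata reflects the dict as it was at upload time, even if the caller keeps modifying it afterwards.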

Use case

# Custom information that a user wants to have available as metadata when calling `get_model_info`
important_info = {
    'required_columns': ["yay", "nay"],
    'data_transforms': ["std", "mean"],
    'training_data_marker': {
        'index_column': 'some_id',
        'index_value': 'some_value',
    },
    'replication_storage_information': {
        "actual_creation_date": "2021-11-23T10:10:23",
        "archived_date": "2022-1-14T12:14:23",
    }
}

metadata = model_store.upload(
    domain="my-domain",
    state_name="archived",
    model=lr_model,
    extra_metadata=important_info,
)

print(metadata)
>> 
{
    'model': {
        'domain': {...}, 
        'data': {...}, 
        'storage': {...},
        'code': {...}, 
        'git': {...}, 
        'extra': {
            'required_columns': ["yay", "nay"],
            'data_transforms': ["std", "mean"],
            'training_data_marker': {
                'index_column': 'some_id',
                'index_value': 'some_value',
            },
            'replication_storage_information': {
                "actual_creation_date": "2021-11-23T10:10:23",
                "archived_date": "2022-1-14T12:14:23",
            }
        }
    }
}

The extra parameter would have to be validated, which could be done by checking whether the object is JSON serializable in the upload method:

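import json

if extra_metadata:
    try:
        json.dumps(extra_metadata)
    except (TypeError, ValueError) as exc:
        # TypeError: unsupported value type; ValueError: e.g. circular reference
        raise ValueError("extra_metadata field must be JSON serializable") from exc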
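The same check can be pulled out into a standalone function that is easy to test on its own. This is a sketch: the name `validate_extra_metadata` is hypothetical, and the extra `isinstance` check is an assumption based on the proposal that the parameter accepts a Python dictionary.

```python
import json
from datetime import datetime
from typing import Optional


def validate_extra_metadata(extra_metadata: Optional[dict]) -> None:
    """Raise ValueError unless extra_metadata can be stored as JSON.

    Hypothetical standalone version of the check proposed above.
    """
    if not extra_metadata:
        return  # None or {} means "no extra metadata"
    if not isinstance(extra_metadata, dict):
        raise ValueError("extra_metadata must be a dict")
    try:
        json.dumps(extra_metadata)
    except (TypeError, ValueError) as exc:
        raise ValueError("extra_metadata field must be JSON serializable") from exc


validate_extra_metadata({"required_columns": ["yay", "nay"]})  # passes

try:
    # datetime objects are not JSON serializable; store ISO strings instead.
    validate_extra_metadata({"archived_date": datetime(2022, 1, 14, 12, 14, 23)})
except ValueError as err:
    print(err)  # extra_metadata field must be JSON serializable
```

Catching only `TypeError` and `ValueError` (what `json.dumps` raises for unsupported types and circular references) avoids masking unrelated bugs, and `from exc` keeps the original cause in the traceback.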

The field could default to an empty dict, i.e. 'extra': {}, which should not break any existing functionality.
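On the reading side, metadata written before the field existed will have no "extra" key at all. A minimal sketch of a backward-compatible accessor, assuming the same nested-dict layout as the example output; `get_extra_metadata` is a hypothetical name:

```python
def get_extra_metadata(metadata: dict) -> dict:
    """Return the user-supplied extra info, or {} for older metadata.

    Hypothetical helper: metadata written before the "extra" field existed
    has no such key, so callers get an empty dict instead of a KeyError.
    """
    return metadata.get("model", {}).get("extra") or {}


old = {"model": {"domain": {"name": "my-domain"}}}  # written before "extra" existed
new = {"model": {"domain": {"name": "my-domain"}, "extra": {"required_columns": ["yay"]}}}

print(get_extra_metadata(old))  # {}
print(get_extra_metadata(new)["required_columns"])  # ['yay']
```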

Any opinions on this?

@nlathia
Contributor

nlathia commented Jun 12, 2022

Great idea! I've wanted to do this for some time, and you suggesting it might just be the motivation I needed 😄

I'm currently in the middle of moving the metadata implementation to use dataclasses:

Once that is done, I can definitely add this in and bundle it all together for the next release 🙌
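The thread does not show how modelstore's dataclasses are structured, so the following is only a sketch of how an optional "extra" field with an empty-dict default could look in a dataclass. `ModelMetadata` and its fields are hypothetical.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class ModelMetadata:
    """Hypothetical, simplified metadata record; not modelstore's actual class."""

    domain: str
    # default_factory gives every instance its own empty dict. A plain `= {}`
    # default is rejected by dataclasses (ValueError) because it would be shared.
    extra: Dict[str, Any] = field(default_factory=dict)


a = ModelMetadata(domain="my-domain")
c = ModelMetadata(domain="my-domain")
a.extra["note"] = "only on a"
print(c.extra)  # {}

b = ModelMetadata(domain="my-domain", extra={"required_columns": ["yay", "nay"]})
# asdict recursively converts the record into plain dicts, ready for json.dumps.
print(asdict(b))  # {'domain': 'my-domain', 'extra': {'required_columns': ['yay', 'nay']}}
```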

@nlathia nlathia added the enhancement New feature or request label Jun 18, 2022
@nlathia
Contributor

nlathia commented Jun 18, 2022

👋 @hauks96 I've now added this in, and so it will go out with the next release. Thank you for the suggestion, and if you have any more ideas feel free to open more issues or reach out to me directly!

@sourcehawk
Author

@nlathia Brilliant! Thank you so much, looking forward to using it 😄

@nlathia
Contributor

nlathia commented Sep 8, 2022

✅ This was released as part of modelstore==0.0.75

Let me know if you see any other issues!
