Revisiting the ML Model extension #13

Closed
rbavery opened this issue Sep 5, 2023 · 32 comments · May be fixed by #16

Comments

@rbavery
Collaborator

rbavery commented Sep 5, 2023

I'm opening this ticket to gauge interest in the ML Model extension and see if others find value in updating this extension given many recent developments in ML, namely:

  • a greater diversity of tasks, learning approaches, and asset types associated with ML models
  • a greater diversity of compute architectures used to train and run inference (AMD, TPU, NVIDIA GPU, Apple Metal)
  • the question of whether "OS" is still a relevant field, given that this extension uses Docker
  • the proliferation of ML models in the earth observation industry, while we still lack a popular metadata standard for indexing and searching geospatial ML models

Are there plans to revamp and maintain something like this in the future as a part of the transition from Radiant Earth MLHub > Source Cooperative? @kbgg

@kbgg
Member

kbgg commented Sep 5, 2023

Hey @rbavery, we don't have any plans to continue development of this extension but are more than happy to hand this over to you as a maintainer if you'd like!

@weiji14
Collaborator

weiji14 commented Sep 6, 2023

Hi @kbgg, I'd be keen to help with maintenance of this STAC extension on behalf of DevSeed. We've discussed internally that there is a lot of potential in this ml-model extension, and we would love to take it to the next stage. If you could add my username @weiji14 to the repo, that would be great.

Also happy to share maintenance rights with other folks who are interested. I can mention this in the Pangeo ML Working Group meeting next month to see who else may be interested.

@rbavery
Collaborator Author

rbavery commented Sep 6, 2023

I'm interested in helping with maintenance as well, my handle is @rbavery. I have some initial work on a pystac ml-model extension that I'd like to propose in conjunction with updates to this spec when it is ready.

@kbgg
Member

kbgg commented Sep 6, 2023

I don't think I have permissions to add anyone to the repository, @m-mohr?

@HamedAlemo
Collaborator

It's great to see growing interest in this!
It turns out I'm still the admin on this repo. I added you all, and @kbgg I also made you an admin.

@rbavery @weiji14 we will be happy to collaborate on this, but more on the user side to test new versions of the metadata for models.

@PondiB

PondiB commented Nov 2, 2023

@rbavery and @weiji14, are there any ongoing behind-the-scenes conversations? @HamedAlemo, please include me too, as I would also like to help with maintenance. My handle is @PondiB.

@rbavery
Collaborator Author

rbavery commented Nov 6, 2023

Hi @PondiB ! I've recently moved jobs and am focusing on all things geo ML at wherobots.ai . I may contribute back to this, but I'm not a part of any behind the scenes conversations right now.

To get conversations going in the open, I created a public slack channel in the Cloud Native Geo Foundation Slack that anyone is welcome to join: https://join.slack.com/t/cloudnativegeo/shared_invite/zt-235w8flfo-TW5Tpi1sPqQFWeMy~7ROHA

@PondiB

PondiB commented Nov 6, 2023

@rbavery, well noted, and thanks for the Slack channel link.

@rbavery
Collaborator Author

rbavery commented Dec 6, 2023

Hi all, I spoke with @fmigneault-crim about another ml extension project he and others built: https://github.com/crim-ca/dlm-extension

It's further along and more up to date than this current repo. I suggest we archive this repo in favor of the https://github.com/crim-ca/dlm-extension repo. If https://github.com/crim-ca/dlm-extension gains some more adoption, a next step would be to move it to the stac-extensions org. I think the next maturity threshold is 3 organizations using it. I'm planning to build a validation library with Pydantic v2 for the DLM extension and use it to track models at Wherobots.ai.

@m-mohr
Contributor

m-mohr commented Dec 6, 2023

Can the other repo enable the issue tracker? I have a couple of comments ;)

@PondiB

PondiB commented Dec 6, 2023

@rbavery, I appreciate the reference to the repo currently under development by @fmigneault-crim and the team. I will look into it in detail. However, I have a question about its primary focus on "Deep Learning". Is the plan to extend its scope to cover pixel-based machine learning models? If so, I think it should be renamed.

@fmigneault
Collaborator

@m-mohr done

@fmigneault
Collaborator

@PondiB
The definitions are designed to allow pixel-based ML models. I would like further validation against actual use cases, however, if some can be provided, so that any missing or relevant fields can be added. The definitions are generic enough to allow other model variations as well, such as ROI classification/detection. The "Deep Learning" name was chosen to avoid confusion with ML-model (https://github.com/stac-extensions/ml-model), which was already taken.

@m-mohr
Contributor

m-mohr commented Dec 6, 2023

@fmigneault Would it make sense to copy over the DLM extension content into the ml-model repository and release it as a 2.0.0 eventually?

@fmigneault
Collaborator

That could make sense.
Do any other projects plan to keep using ml-model?
I have not evaluated whether everything in the current ml-model can be ported entirely to the dlm definitions.

@HamedAlemo
Collaborator

I agree with replacing the existing version with the DLM extension and maybe using the same name ml-model so it's generic and inclusive. DLM as it is now is more up to date for sure. I don't think any organization uses ml-model actively now. We generated this for models that were hosted on Radiant MLHub but with the transition to Source Cooperative no models are cataloged anymore.
cc @kbgg .

@devisperessutti

We are very interested in this discussion and hope we can contribute to take any extension further.

We reviewed the currently available STAC-ML specifications to compare them and find main strengths/limitations. Report available here. We also have found the DLM to be more up-to-date and complete, but had doubts about its generalization to any machine learning method, rather than deep learning.

We have created two STAC items, one for a DL and one for a LightGBM model (as we couldn't find complete examples). For the pixel-based GBM, the final_layer_size required field in dlm:outputs doesn't apply, while most of the other operators could be applied (although the dlm:architecture and dlm:inputs required operators might be too strict).
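To make the mismatch concrete, here is a minimal sketch of a required-field check applied to a pixel-based GBM item. It is an illustration only: the lists of required fields are assumptions drawn from the comment above rather than copied from the DLM JSON schema, `missing_required` is a hypothetical helper, and the property values are invented for the example.

```python
# Illustrative only: these required-field lists are assumptions based on the
# comment above, not copied from the actual DLM JSON schema.
REQUIRED_PROPERTIES = ["dlm:architecture", "dlm:inputs", "dlm:outputs"]
REQUIRED_OUTPUT_FIELDS = ["final_layer_size"]


def missing_required(properties: dict) -> list[str]:
    """Return the required DLM fields absent from a STAC Item's properties."""
    missing = [key for key in REQUIRED_PROPERTIES if key not in properties]
    outputs = properties.get("dlm:outputs", {})
    missing += [
        f"dlm:outputs.{field}"
        for field in REQUIRED_OUTPUT_FIELDS
        if field not in outputs
    ]
    return missing


# Hypothetical properties for a pixel-based LightGBM model. There is no
# neural-network output layer, so `final_layer_size` has nothing to describe.
gbm_properties = {
    "dlm:architecture": {"type": "gradient-boosted trees"},
    "dlm:inputs": {"bands": ["B02", "B03", "B04", "B08"]},
    "dlm:outputs": {"task": "classification"},
}

print(missing_required(gbm_properties))
```

Under these assumptions the GBM item is rejected only because of a field that is meaningful for deep learning models, which is the kind of requirement that would need to become optional for the extension to generalize.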

@rbavery
Collaborator Author

rbavery commented Jan 17, 2024

That's great feedback @devisperessutti thanks! I've worked on revamping the DLM to be more general and not bake in required fields that are particular to deep learning or a specific ML framework.

It still has fields, marked optional, that cater to the deep learning/computer vision community, since I expect they will need fields similar to final_layer_size.

Any feedback on this PR or the associated hackmd doc would be super valuable for pushing this extension forward: crim-ca/dlm-extension#2
https://hackmd.io/DBRF1sQCS1WmSqygJNKQJQ?view

Right now we're seeking comments and looking to resolve open questions, such as how this extension should be referenced and how deeply nested extension objects should be.

Once a critical mass of folks are aligned on the extension, we can bring it into this repository and highlight many community examples.

@fmariv

fmariv commented Feb 1, 2024

At @earthpulse we are also very interested in this discussion, and we would be more than happy to get involved in the project and contribute!

To give context: we are in the consortium that develops and maintains the EOTDL, where anyone can create, share and use training datasets for EO ML applications. We have adopted STAC as our core specification and already have worked with it (example, ml-dataset extension), and were thinking about developing a new STAC extension for ML-models. Nonetheless, it has seemed better and more viable to help and contribute to the extensions that already exist such as DLM. We have found it is up-to-date and quite complete, but we also have doubts about its generalization to any machine learning method, rather than deep learning, which is what we really need. There are some missing features and elements we'd need to align the extension with our approach, and we'll be glad to discuss it further.

So, we are aligned with the development of DLM and the substitution of ml-model (perhaps changing the name, to avoid confusion?), and will be glad to contribute. @rbavery, please reach out to us so we can start contributing!

cc. @juansensio (CTO EarthPulse)

@fmigneault
Collaborator

@fmariv
Good to hear more users are interested in the project.
If you could provide feedback directly on the work @rbavery already started in crim-ca/dlm-extension#2, that would help us understand the current irritants about DLM that should be adjusted to help generalization with other ML algorithms.

@rbavery
Collaborator Author

rbavery commented Feb 1, 2024

Ditto what @fmigneault said! You can comment directly in this markdown document if you prefer, or a code review on the PR would also be great.

I'm also down to discuss this extension on a video meet if it helps incorporate feedback quicker and advance this as a community standard. Feel free to book a 30-min meet on my calendar here: https://calendly.com/ryan-at-wherobots/30min

> substitution of ml-model (perhaps changing the name, to avoid confusion?)

I'm open to changing the name. Currently we have named the extension the Machine Learning Model Extension in crim-ca/dlm-extension#2 and we were thinking we would release it as version 2 in this repository once we have some examples ready and incorporate more community feedback.

@rbavery
Collaborator Author

rbavery commented Feb 6, 2024

I met with @fmariv, and put together a roadmap for v2: crim-ca/dlm-extension#7

I think once these items are complete this is good or close to good to be merged to this repo! Feel free to comment or add issues at https://github.com/crim-ca/dlm-extension/issues

As a reminder, this markdown document is the up-to-date description of the schema, and it is open for comment. You can ping me on GitHub here or in the Cloud Native Geo Slack channel ml-stac, and I'll review and respond to comments.

@rbavery
Collaborator Author

rbavery commented Feb 29, 2024

Hi all, we're close to wrapping up version 2 of this schema and an accompanying library to generate STAC metadata. Could I be made owner of this repo so that I can add @fmigneault and update other repo settings? Not sure who has the power to grant this!

Also, we will be giving a short presentation at the next STAC Community Meeting to introduce the new ML Model Extension if folks are interested in learning how to document their models and associate them with other STAC objects.

Join info is below. Everyone is welcome to come ask questions and give feedback. We might hold a more focused session on the ML Model related extensions in the future if there's interest:

STAC Community Meetup
Monday, March 11 · 8:00 – 9:00am
Time zone: America/Los_Angeles
Google Meet joining info
Video call link: https://meet.google.com/gma-vujm-sbi
Or dial: (US) +1 252-986-3093 PIN: 785 785 181#
More phone numbers: https://tel.meet/gma-vujm-sbi?pin=7110281050917

@m-mohr
Contributor

m-mohr commented Feb 29, 2024

Upgraded you to Admin @rbavery

@PondiB The updates might be of interest to you.

@PondiB

PondiB commented Mar 1, 2024

> @PondiB The updates might be of interest to you.

Thanks for the tag. I am on vacation till April. I will try to attend the Meeting on 11th March.

@fmigneault
Collaborator

I invite anyone working with ML and annotations in STAC to show interest in this: stac-utils/pystac#1313

@fmigneault
Collaborator

Hi everyone.
The long-running PR (crim-ca/dlm-extension#2) for the new Machine Learning Model (MLM) extension is now merged!

Multiple STAC Item examples (https://github.com/crim-ca/dlm-extension/tree/main/examples) are provided with validation against the MLM schema (https://github.com/crim-ca/dlm-extension/blob/main/json-schema/schema.json) while making use of other STAC extensions at the same time.

A pydantic+pystac compatible tool is available here: https://github.com/crim-ca/dlm-extension/tree/main/stac_model
(see repo root for pyproject.toml, etc. for installation)

If you are interested in providing more examples (or getting clarification about the provided examples), let me know through issues.

@rbavery
Collaborator Author

rbavery commented Apr 18, 2024

I'll publish a release for https://pypi.org/project/stac-model and add you as a co-maintainer @fmigneault if that sounds good!

I can move everything from dlm-extension, with a slightly updated README to reflect the new home of the extension. Is now a good time to do that?

@fmigneault
Collaborator

@rbavery
Yes, you can release a version for stac-model and add me to the co-maintainers.
For the move, I think a fork under stac-extensions would work? I think it is something to discuss during the next community meeting since we'll have to propose deprecating ml-model at the same time.

@rbavery
Collaborator Author

rbavery commented Apr 18, 2024

Down to discuss!

We (at Wherobots) would rather have the stac-extensions org host the canonical repo for the extension. This way no single org is seen as owning the maintenance of the extension and, as with other stac-extensions repos, all the issues and discussions happen within the org's version of the repo. This might make it clearer to potential contributors and users than a fork would that this extension follows the same open maintenance and contribution practices as other stac-extensions repos.

@fmigneault
Collaborator

Relevant PR: #16

@fmigneault
Collaborator

I think most items discussed here have been addressed by https://github.com/stac-extensions/mlm.
If any remain, a specific issue can be opened to discuss them in more detail.

9 participants