point to example MLM v1.2.0 model metadata and artifacts on huggingface #29

Merged 6 commits on Jul 19, 2024
Changes from 5 commits
35 changes: 19 additions & 16 deletions README.md
@@ -3,7 +3,7 @@
[![hackmd-github-sync-badge](https://hackmd.io/N1cWyDM2S9eaAQtSvS0J_A/badge)](https://hackmd.io/N1cWyDM2S9eaAQtSvS0J_A?both)

- **Title:** Machine Learning Model Extension
- **Identifier:** [https://crim-ca.github.io/mlm-extension/v1.2.0/schema.json](https://crim-ca.github.io/mlm-extension/v1.2.0/schema.json)
- **Identifier:** <https://crim-ca.github.io/mlm-extension/v1.2.0/schema.json>
fmigneault marked this conversation as resolved.
- **Field Name Prefix:** mlm
- **Scope:** Collection, Item, Asset, Links
- **Extension Maturity Classification:** Pilot
@@ -74,17 +74,20 @@ learning approaches, regardless of whether the approach constitutes a deep neura
It also combines multiple definitions from the predecessor [ML-Model](https://github.com/stac-extensions/ml-model)
extension to synthesize common use cases into a single reference for Machine Learning Models.

For more details about the [`stac-model`](stac_model) Python package, which provides definitions of the MLM extension
For more details about the [`stac-model`](./stac_model) Python package, which provides definitions of the MLM extension
using both [`Pydantic`](https://docs.pydantic.dev/latest/) and [`PySTAC`](https://pystac.readthedocs.io/en/stable/)
connectors, please refer to the [STAC Model](README_STAC_MODEL.md) document.
connectors, please refer to the [STAC Model](./README_STAC_MODEL.md) document.

> :warning: <br>
> FIXME: update examples
> FIXME: update examples

- Examples:
- **Example with a ??? trained with torchgeo** :warning: TODO update example
- [Item examples](https://huggingface.co/wherobots/mlm-stac) for scene-classification,
object detection, and semantic segmentation: Shows real world use of the
extension for describing models run on
[WherobotsAI Raster Inference](https://wherobots.com/wherobotsai-for-raster-inference/)
- [Collection example](examples/collection.json): Shows the basic usage of the extension in a STAC Collection
- [JSON Schema](json-schema/schema.json) TODO update
- [JSON Schema](https://crim-ca.github.io/mlm-extension/)
- [Changelog](./CHANGELOG.md)

## Item Properties and Collection Fields
@@ -102,7 +105,7 @@ The fields in the table below can be used in these parts of STAC documents:
| Field Name | Type | Description |
|-----------------------------|---------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| mlm:name | string | **REQUIRED** A name for the model. This can include, but must be distinct from, simply naming the model architecture. If there is a publication or other published work related to the model, use the official name of the model. |
| mlm:architecture | [Model Architecture](#model-architecture) string | **REQUIRED** A generic and well established architecture name of the model. |
| mlm:architecture | [Model Architecture](#model-architecture) string | **REQUIRED** A generic and well established architecture name of the model. |
| mlm:tasks | \[[Task Enum](#task-enum)] | **REQUIRED** Specifies the Machine Learning tasks for which the model can be used. If multi-task outputs are provided by distinct model heads, specify all available tasks under the main properties and specify respective tasks in each [Model Output Object](#model-output-object). |
| mlm:framework | string | Framework used to train the model (ex: PyTorch, TensorFlow). |
| mlm:framework_version | string | The `framework` library version. Some models require a specific version of the machine learning `framework` to run. |
@@ -114,7 +117,7 @@ The fields in the table below can be used in these parts of STAC documents:
| mlm:accelerator | [Accelerator Type Enum](#accelerator-type-enum) \| null | The intended computational hardware that runs inference. If undefined or set to `null` explicitly, the model does not require any specific accelerator. |
| mlm:accelerator_constrained | boolean | Indicates if the intended `accelerator` is the only `accelerator` that can run inference. If undefined, it should be assumed `false`. |
| mlm:accelerator_summary | string | A high level description of the `accelerator`, such as its specific generation, or other relevant inference details. |
| mlm:accelerator_count | integer | The minimum number of `accelerator` instances required to run the model. |
| mlm:accelerator_count | integer | The minimum number of `accelerator` instances required to run the model. |
| mlm:input | \[[Model Input Object](#model-input-object)] | **REQUIRED** Describes the transformation between the EO data and the model input. |
| mlm:output | \[[Model Output Object](#model-output-object)] | **REQUIRED** Describes each model output and how to interpret it. |
| mlm:hyperparameters | [Model Hyperparameters Object](#model-hyperparameters-object) | Additional hyperparameters relevant for the model. |
@@ -226,7 +229,7 @@ It is recommended to define `accelerator` with one of the following values:

> :warning: <br>
> If `mlm:accelerator = amd64`, this explicitly indicates that the model does not (and will not try to) use any
> accelerator, even if some are available from the runtime environment. This is to be distinguished from
> accelerator, even if some are available from the runtime environment. This is to be distinguished from
> the value `mlm:accelerator = null`, which means that the model *could* make use of some accelerators if provided,
> but is not constrained by any specific one. To improve comprehension by users, it is recommended that any model
> using `mlm:accelerator = amd64` also set explicitly `mlm:accelerator_constrained = true` to illustrate that the
@@ -258,7 +261,7 @@ what is the expected behavior of the model. It is therefore recommended to provi
#### Bands and Statistics

Depending on the supported `stac_version` and other `stac_extensions` employed by the STAC Item using MLM,
the [STAC 1.1 - Band Object][stac-1.1-band],
the [STAC 1.1 - Band Object][stac-1.1-band],
the [STAC Raster - Band Object][stac-raster-band] or
the [STAC EO - Band Object][stac-eo-band] can be used for
representing bands information, including notably the `nodata` value,
@@ -366,7 +369,7 @@ Below are some notable common names recommended for use, but others can be emplo
- `score`
- `confidence`

For example, a tensor of multiple RGB images represented as $B \times C \times H \times W$ should
For example, a tensor of multiple RGB images represented as $B \times C \times H \times W$ should
indicate `dim_order = ["batch", "channel", "height", "width"]`.
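
As a minimal sketch of how `dim_order` pairs with `shape`, the following Python snippet maps each dimension name to its size. The helper name `describe_dims` and the 3 x 224 x 224 sizes are assumptions made for illustration only; they are not part of the extension or of the `stac-model` package.

```python
def describe_dims(shape, dim_order):
    """Map each dimension name to its size (hypothetical helper)."""
    if len(shape) != len(dim_order):
        raise ValueError("shape and dim_order must have the same length")
    return dict(zip(dim_order, shape))


# Illustrative input: a batch of RGB images laid out as B x C x H x W.
# The batch size is left variable (-1); 3, 224 and 224 are assumed values.
input_struct = {
    "shape": [-1, 3, 224, 224],
    "dim_order": ["batch", "channel", "height", "width"],
}

dims = describe_dims(input_struct["shape"], input_struct["dim_order"])
print(dims)
```

Keeping the two lists the same length is what lets a consumer look up a dimension by name instead of relying on its position.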

#### Normalize Enum
@@ -464,7 +467,7 @@ the following formats are recommended as alternative scripts and function refere

While only `tasks` is a required field, all fields are recommended for tasks that produce a fixed
shape tensor and have output classes. Outputs that have variable dimensions can define the `result` with the
appropriate dimension value `-1` in the `shape` field. When the model does not produce specific classes, such
appropriate dimension value `-1` in the `shape` field. When the model does not produce specific classes, such
as for `regression`, `image-captioning`, `super-resolution` and some `generative` tasks, to name a few, the
`classification:classes` can be omitted.
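
The `-1` convention can be sketched with a small Python check that treats `-1` as matching any size. The helper `shape_matches` and the example values are hypothetical, and the dictionary only mirrors the field names mentioned above (`tasks`, `result`, `shape`); it is not a complete or validated Model Output Object.

```python
def shape_matches(declared, actual):
    """Return True if `actual` fits `declared`, where -1 matches any size."""
    if len(declared) != len(actual):
        return False
    return all(d == -1 or d == a for d, a in zip(declared, actual))


# Illustrative output definition for a regression task: no classes are
# declared, and the first dimension is variable. All values are assumed.
output_def = {
    "tasks": ["regression"],
    "result": {"shape": [-1, 1]},
}

declared = output_def["result"]["shape"]
fits_batch_of_8 = shape_matches(declared, (8, 1))    # variable first dimension
fits_wrong_width = shape_matches(declared, (8, 2))   # second dimension differs
print(fits_batch_of_8, fits_wrong_width)
```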

@@ -513,7 +516,7 @@ Following is an example of what the hyperparameters definition could look like:
| mlm:training | [Asset Object][stac-asset] | **RECOMMENDED** Information to run the training pipeline of the model being described. |
| mlm:inference | [Asset Object][stac-asset] | **RECOMMENDED** Information to run the inference pipeline of the model being described. |

It is recommended that the [Assets][stac-asset] defined in a STAC Item using MLM extension use the above field property
It is recommended that the [Assets][stac-asset] defined in a STAC Item using MLM extension use the above field property
names for nesting the Assets in order to improve their quick identification, although the specific names employed are
left up to user preference. However, the MLM Asset definitions **MUST** include the
appropriate [MLM Asset Roles](#mlm-asset-roles) to ensure their discovery.
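
To illustrate why roles, rather than asset keys, drive discovery, here is a minimal Python sketch that selects assets carrying an `mlm:`-prefixed role. The function name `find_mlm_assets`, the asset keys, and the `example.com` URLs are hypothetical; only the `mlm:model` role name comes from this document.

```python
def find_mlm_assets(item):
    """Return the keys of assets that carry at least one `mlm:`-prefixed role."""
    found = []
    for key, asset in item.get("assets", {}).items():
        roles = asset.get("roles", [])
        if any(role.startswith("mlm:") for role in roles):
            found.append(key)
    return found


# Illustrative STAC Item fragment; keys and hrefs are placeholders.
item = {
    "assets": {
        "model": {"href": "https://example.com/model.pt", "roles": ["mlm:model"]},
        "thumbnail": {"href": "https://example.com/thumb.png", "roles": ["thumbnail"]},
    }
}

mlm_asset_keys = find_mlm_assets(item)
print(mlm_asset_keys)
```

Because the lookup relies only on `roles`, it keeps working when a user picks different asset key names.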
@@ -522,7 +525,7 @@ appropriate [MLM Asset Roles](#mlm-asset-roles) to ensure their discovery.

### MLM Asset Roles

Asset `roles` should include relevant names that describe them. This does not only include
Asset `roles` should include relevant names that describe them. This does not only include
the [Recommended Asset Roles](https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md#asset-roles)
from the core specification, such as `data` or `metadata`, but also descriptors such as `mlm:model`, `mlm:weights` and
so on, as applicable for the relevant MLM Assets being described. Please refer to the following sections
@@ -713,8 +716,8 @@ for running tests are copied here for convenience.

### Running tests

The same checks that run as checks on PRs are part of the repository and can be run locally to verify that changes
are valid. To run tests locally, you'll need `npm`, which is a standard part of
The same checks that run as checks on PRs are part of the repository and can be run locally to verify that changes
are valid. To run tests locally, you'll need `npm`, which is a standard part of
any [node.js](https://nodejs.org/en/download/) installation.

First, install everything with npm once. Navigate to the root of this repository and on your command line run: