docs: add an introduction readme #6

Merged 1 commit on Oct 9, 2024

31 changes: 28 additions & 3 deletions README.md
@@ -1,12 +1,37 @@
# CNAI Model Format Specification
# CNAI Model Specification Proposal

[![LICENSE](https://img.shields.io/github/license/CloudNativeAI/model-spec.svg?style=flat-square)](https://github.com/CloudNativeAI/model-spec/blob/main/LICENSE)
[![GoDoc](https://godoc.org/github.com/CloudNativeAI/model-spec?status.svg)](https://godoc.org/github.com/CloudNativeAI/model-spec)

The Cloud Native Artificial Intelligence (CNAI) Model Format Specification is a specification for a model format that is designed to be used in cloud native environments.
The Cloud Native Artificial Intelligence (CNAI) Model Specification aims to provide a standard way to package, distribute, and run AI models in a cloud native environment.

For details, see the [specification](docs/v1/spec.md).
## Rationale

Looking back, infrastructure has evolved through clear stages. The first was the machine-centric age, in which GNU/Linux was born and Linux distributions boomed. The second was the virtual-machine-centric age, which saw the rise of cloud computing and the development of virtualization technologies. The third is the container-centric age, marked by the rise of container technologies such as Docker and Kubernetes. The fourth, which has just begun, is the AI-model-centric age, in which we expect a burst of technologies and projects around AI model development and deployment.

![img](docs/img/infra-trends.png)

Each new age has brought new technologies and new ways of thinking. The container-centric age brought us the OCI image specification, which has become the standard for packaging and distributing software. The AI-model-centric age will bring new ways of packaging and distributing AI models. This model specification is an attempt to define a standard for packaging, distributing, and running AI models in a cloud native environment.

## Current Work

Two versions of the specification have been proposed, both of which are under development:

* v1: The first version of the specification provides a compatible way to package and distribute models based on the current [OCI image specification](https://github.com/opencontainers/image-spec/) and [the artifacts guidelines](https://github.com/opencontainers/image-spec/blob/main/manifest.md#guidelines-for-artifact-usage). For compatibility reasons, it contains only part of the model metadata and handles model artifacts as opaque binaries. However, it provides a convenient way to package AI models in the container image format, and the resulting images can be used as [OCI volume sources](https://github.com/kubernetes/enhancements/issues/4639) in Kubernetes environments.
* v2: The second version of the specification, still at an early stage, includes a model image specification and a model runtime specification. The model image specification packages models with details like model artifacts, metadata, configuration, and runtime environment. The model runtime specification defines how to run the packaged models in a cloud native environment. It builds a foundation for promoting AI models as first-class citizens in the cloud native ecosystem, and lets users build once and run anywhere.
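
To make the v1 "OCI volume source" idea concrete, here is a minimal sketch that builds a Pod manifest mounting a model image as a read-only volume. It assumes the `image` volume source shape (`reference`, `pullPolicy`) from the linked Kubernetes enhancement, which is gated behind a feature flag on the cluster; the registry names, image references, and the `/models` mount path are hypothetical and not part of the specification.

```python
import json


def pod_with_model_volume(model_ref: str, server_image: str) -> dict:
    """Return a Pod manifest that mounts an OCI model image as a volume.

    Assumes the cluster supports the `image` volume source from the
    Kubernetes enhancement linked above; all names are illustrative.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "model-server"},
        "spec": {
            "containers": [
                {
                    "name": "server",
                    "image": server_image,
                    # The model files appear under /models, read-only.
                    "volumeMounts": [
                        {"name": "model", "mountPath": "/models", "readOnly": True}
                    ],
                }
            ],
            "volumes": [
                {
                    "name": "model",
                    # The volume content comes from an OCI image or artifact.
                    "image": {"reference": model_ref, "pullPolicy": "IfNotPresent"},
                }
            ],
        },
    }


# Hypothetical references, for illustration only.
pod = pod_with_model_volume(
    "registry.example.com/models/demo-model:v1",
    "registry.example.com/serving/engine:latest",
)
print(json.dumps(pod, indent=2))
```

The printed JSON can be applied like any other Pod manifest. Keeping the model in its own image means the serving image and the model can be versioned and pulled independently.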

We consider the two versions incremental steps toward a standard model specification. The v1 specification is a simple and compatible way to package AI models in the container image format, while the v2 specification is a more comprehensive and cloud native way to package, distribute, and run AI models.

For details, please see [the v1 specification](docs/v1/spec.md) and [the v2 specification introduction](docs/v2/intro.md).

## LICENSE

Apache 2.0 License. Please see [LICENSE](LICENSE) for more information.

## Contributing

Any feedback, suggestions, and contributions are welcome. Please feel free to open an issue or pull request.

In particular, we look forward to integrating the model specification with different model registry implementations (like [Harbor](https://goharbor.io/) and [Kubeflow model registry](https://www.kubeflow.org/docs/components/model-registry/overview/)), as well as existing model-centric infrastructure projects like [Kubeflow](https://www.kubeflow.org/), [ollama](https://github.com/ollama/ollama), [Hugging Face](https://huggingface.co/), [Lepton](https://www.lepton.ai/), and others.

Enjoy!
Binary file added docs/img/infra-trends.png
File renamed without changes
File renamed without changes
File renamed without changes
10 changes: 5 additions & 5 deletions docs/v1/spec.md
@@ -1,6 +1,6 @@
# Model Specification
# Model Specification Version 1

This specification defines an open standard Artificial Intelligence model, which is based on the [Image Format Specification](https://github.com/opencontainers/image-spec/blob/main/spec.md#image-format-specification).
The specification defines an open standard Artificial Intelligence model. It is defined through the artifact extension based on [the OCI image specification](https://github.com/opencontainers/image-spec/blob/main/spec.md#image-format-specification), and extends model features through `artifactType` and `annotations`. Model storage and distribution can be optimized based on the artifact extension.

The goal of this specification is to package models in an OCI artifact to take advantage of OCI distribution and ensure efficient model deployment.

@@ -19,7 +19,7 @@ Therefore, the model specification must be defined through the artifact extensio

The model specification is defined through the artifact extension based on the OCI image specification, and extends model features through `artifactType` and `annotations`. Model storage and distribution can be optimized based on the artifact extension.
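
As a rough illustration of how `artifactType` and `annotations` fit into an OCI manifest, the sketch below assembles a manifest for a single opaque model blob. The `artifactType` value and the annotation key are hypothetical placeholders, not values defined by this specification; the empty config descriptor follows the OCI artifact usage guidelines for artifacts that need no config.

```python
import hashlib
import json


def descriptor(media_type: str, blob: bytes) -> dict:
    """Build an OCI content descriptor for an in-memory blob."""
    return {
        "mediaType": media_type,
        "digest": "sha256:" + hashlib.sha256(blob).hexdigest(),
        "size": len(blob),
    }


def build_model_manifest(weights: bytes, annotations: dict) -> dict:
    """Assemble an OCI image manifest that carries a model as an artifact."""
    empty_config = b"{}"  # the OCI "empty" config payload
    return {
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        # Hypothetical artifact type; the real value is defined by the spec.
        "artifactType": "application/vnd.example.model.v1+json",
        "config": descriptor("application/vnd.oci.empty.v1+json", empty_config),
        # v1 treats model files as opaque binaries, one layer per blob.
        "layers": [descriptor("application/octet-stream", weights)],
        "annotations": annotations,
    }


# Hypothetical annotation key, for illustration only.
manifest = build_model_manifest(
    b"fake-weights", {"org.example.model.name": "demo-model"}
)
print(json.dumps(manifest, indent=2))
```

Because the result is still a standard OCI manifest, existing registries can store and serve it without knowing anything about models.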

![manifest](../../img/v1/manifest.svg)
![manifest](../img/v1/manifest.svg)

## Workflow

@@ -31,13 +31,13 @@ Use tools(ORAS, Ollama, etc.) to build required resources in the model repositor

Next, push the artifact to an OCI registry (Harbor, Docker Hub, etc.) and use the registry's functionality to manage the model artifact.

![build-push](../../img/v1/build-and-push.png)
![build-push](../img/v1/build-and-push.png)

### PULL & SERVE

The container runtime (containerd, CRI-O, etc.) pulls the model artifact from the OCI registry and mounts it as a read-only volume. Because the model is distributed as an OCI artifact, P2P technology (Dragonfly, Kraken, etc.) can be used to reduce pressure on the registry and to preheat the model artifact onto each node. If the model artifact is already present on a node, the container runtime can reuse it and mount it into different containers on the same node.

![pull-serve](../../img/v1/pull-and-serve.png)
![pull-serve](../img/v1/pull-and-serve.png)
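
The reuse behaviour in the PULL & SERVE step can be illustrated with a toy, in-memory model of a node-local store keyed by artifact digest. This is only a sketch of the idea, not how containerd or CRI-O are implemented; the class name, the `/var/lib/models` path, and the container names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NodeModelCache:
    """Toy model of a node-local, digest-keyed model artifact store."""

    pulls: int = 0
    store: dict = field(default_factory=dict)   # digest -> local path
    mounts: list = field(default_factory=list)  # (container, path, mode)

    def ensure(self, digest: str) -> str:
        """Fetch the artifact only if this node does not already hold it."""
        if digest not in self.store:
            self.pulls += 1  # stands in for a registry or P2P fetch
            self.store[digest] = "/var/lib/models/" + digest.replace(":", "-")
        return self.store[digest]

    def mount(self, container: str, digest: str) -> tuple:
        """Record a read-only mount of the artifact into a container."""
        entry = (container, self.ensure(digest), "ro")
        self.mounts.append(entry)
        return entry


cache = NodeModelCache()
digest = "sha256:" + "ab" * 32  # placeholder digest
first = cache.mount("server-a", digest)
second = cache.mount("server-b", digest)
print(cache.pulls, first[1] == second[1])
```

Both containers end up with the same local path mounted read-only, and the artifact is fetched once per node rather than once per container.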

## Understanding the Specification

40 changes: 40 additions & 0 deletions docs/v2/intro.md
@@ -0,0 +1,40 @@
# Model Specification Version 2

## Overview

The core of the v2 model specification is the definition of the model artifact, metadata and runtime environment.

The model artifact is a collection of files that represent the AI model. It consists of the model configuration, model weights, model tokenizer, and other model resources.

The model metadata is general information about the model, such as the model name, version, model family, description, author, license, and architecture. A model registry can parse the model metadata to display the model information.

The model runtime environment is the environment in which the model runs. It includes the inference engine information, such as version, configuration, dependencies, and environment variables.

The model artifact, metadata and runtime environment are organized in a model manifest, which is a JSON file that describes the model. The model manifest is used to package and distribute the model, and can be stored in a model registry and downloaded by a model runtime.
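
Since the v2 manifest format is not yet defined, the following is only a speculative sketch of how the three parts described above might be grouped and serialized to JSON. Every class name, field name, and sample value here is an assumption made for illustration; only the broad categories (artifact, metadata, runtime environment) come from the text.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Metadata:
    """General information a model registry could display."""
    name: str
    version: str
    family: str
    architecture: str
    license: str
    author: str = ""
    description: str = ""


@dataclass
class Artifact:
    """Files that make up the model itself."""
    config: str
    weights: list
    tokenizer: str


@dataclass
class Runtime:
    """Inference engine information needed to run the model."""
    engine: str
    version: str
    configuration: dict = field(default_factory=dict)
    dependencies: list = field(default_factory=list)
    env: dict = field(default_factory=dict)


@dataclass
class ModelManifest:
    """Hypothetical top-level manifest tying the three parts together."""
    metadata: Metadata
    artifact: Artifact
    runtime: Runtime

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Illustrative values only.
model_manifest = ModelManifest(
    metadata=Metadata("demo-model", "0.1.0", "demo", "transformer", "Apache-2.0"),
    artifact=Artifact("config.json", ["weights-00001.bin"], "tokenizer.json"),
    runtime=Runtime("example-engine", "1.0", env={"NUM_THREADS": "4"}),
)
print(model_manifest.to_json())
```

A registry would only need to read the `metadata` section to display the model, while a runtime would consume the `artifact` and `runtime` sections.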

With a properly defined model specification, we can package the AI models in a model repository into a model image and push the model image to a model registry. The model image can then be pulled and run by the model runtime, either as a standalone package or as a read-only volume source in a container.

## Goals

The goals of developing the model specification are:

* To provide a way for developers to package and distribute AI models in a cloud native environment.
* To promote AI models as a first-class citizen and pave the way for the infrastructure to be organized around AI models.
* To define a general model artifact, metadata, and runtime environment, so that the model can be easily understood and managed by any component of the infrastructure.
* To define a general model format description to allow easy integration of models with model runtimes.

## Non-Goals

* To build standard interfaces for model management tools to build, distribute, manage, and run AI models.

The model specification is designed to be a foundation for building standard interfaces to build, distribute, manage, and run AI models. However, the model specification itself does not define such interfaces.

## Plans

The model specification is still pretty rough. It is a living document and will evolve over time. Future work includes:

* Figure out the details of the AI model artifact, metadata, and runtime environment.
* Define a general transformer architecture abstraction to support build-once, run-everywhere for LLMs.
* Develop tools to build and save AI models in a model registry.
* Develop tools to pull and run AI models in a model runtime.
* Modify [vllm](https://github.com/vllm-project/vllm) to support the model specification and run any transformer-architecture LLM without modification.