Commit 56c44da

docs: add an introduction readme (#6)

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
bergwolf authored Oct 9, 2024
1 parent 1b35930 commit 56c44da
Showing 7 changed files with 73 additions and 8 deletions.
31 changes: 28 additions & 3 deletions README.md
@@ -1,12 +1,37 @@
# CNAI Model Format Specification
# CNAI Model Specification Proposal

[![LICENSE](https://img.shields.io/github/license/CloudNativeAI/model-spec.svg?style=flat-square)](https://github.com/CloudNativeAI/model-spec/blob/main/LICENSE)
[![GoDoc](https://godoc.org/github.com/CloudNativeAI/model-spec?status.svg)](https://godoc.org/github.com/CloudNativeAI/model-spec)

The Cloud Native Artificial Intelligence (CNAI) Model Format Specification is a specification for a model format designed for use in cloud native environments.
The Cloud Native Artificial Intelligence (CNAI) Model Specification aims to provide a standard way to package, distribute, and run AI models in a cloud native environment.

For details, see the [specification](docs/v1/spec.md).
## Rationale

Looking back, there are clear trends in the evolution of infrastructure. First came the machine centric infrastructure age, in which GNU/Linux was born and Linux distributions boomed. Then came the virtual machine centric infrastructure age, which saw the rise of cloud computing and the development of virtualization technologies. The third age is container centric infrastructure, with the rise of container technologies like Docker and Kubernetes. The fourth age, which has just begun, is the AI model centric infrastructure age, where we will see a burst of technologies and projects around AI model development and deployment.

![img](docs/img/infra-trends.png)

Each new age has brought new technologies and new ways of thinking. The container centric infrastructure brought us the OCI image specification, which has become the standard for packaging and distributing software. The AI model centric infrastructure will bring new ways of packaging and distributing AI models. This model specification is an attempt to define a standard that helps package, distribute, and run AI models in a cloud native environment.

## Current Work

Two versions of the specification have been proposed, both of which are under development:

* v1: The first version of the specification provides a compatible way to package and distribute models based on the current [OCI image specification](https://github.com/opencontainers/image-spec/) and [the artifact usage guidelines](https://github.com/opencontainers/image-spec/blob/main/manifest.md#guidelines-for-artifact-usage). For compatibility reasons, it contains only part of the model metadata and handles model artifacts as opaque binaries. However, it provides a convenient way to package AI models in the container image format, and such images can be used as [OCI volume sources](https://github.com/kubernetes/enhancements/issues/4639) in Kubernetes environments.
* v2: The second version of the specification, still at an early stage, includes a model image specification and a model runtime specification. The model image specification packages models with details such as model artifacts, metadata, configuration, and runtime environment. The model runtime specification defines how to run the packaged models in a cloud native environment. Together they lay a foundation for promoting AI models to first-class citizens in the cloud native ecosystem and let users build once and run anywhere.

We consider the two versions incremental steps toward a standard model specification. The v1 specification is a simple and compatible way to package AI models in the container image format, while the v2 specification is a more comprehensive and cloud native way to package, distribute, and run AI models.

For details, please see [the v1 specification](docs/v1/spec.md) and [the v2 specification introduction](docs/v2/intro.md).

## LICENSE

Apache 2.0 License. Please see [LICENSE](LICENSE) for more information.

## Contributing

Any feedback, suggestions, and contributions are welcome. Please feel free to open an issue or pull request.

In particular, we look forward to integrating the model specification with different model registry implementations (such as [Harbor](https://goharbor.io/) and [Kubeflow model registry](https://www.kubeflow.org/docs/components/model-registry/overview/)), as well as with existing model centric infrastructure projects like [Kubeflow](https://www.kubeflow.org/), [ollama](https://github.com/ollama/ollama), [Huggingface](https://huggingface.co/), [Lepton](https://www.lepton.ai/), and others.

Enjoy!
Binary file added docs/img/infra-trends.png
File renamed without changes
File renamed without changes
File renamed without changes
10 changes: 5 additions & 5 deletions docs/v1/spec.md
@@ -1,6 +1,6 @@
# Model Specification
# Model Specification Version 1

This specification defines an open standard for Artificial Intelligence models, based on the [Image Format Specification](https://github.com/opencontainers/image-spec/blob/main/spec.md#image-format-specification).
The specification defines an open standard for Artificial Intelligence models. It is defined through the artifact extension of [the OCI image specification](https://github.com/opencontainers/image-spec/blob/main/spec.md#image-format-specification), and extends model features through `artifactType` and `annotations`. Model storage and distribution can be optimized based on the artifact extension.

The goal of this specification is to package models in an OCI artifact to take advantage of OCI distribution and ensure efficient model deployment.

@@ -19,7 +19,7 @@ Therefore, the model specification must be defined through the artifact extension

The model specification is defined through the artifact extension based on the OCI image specification, and extends model features through `artifactType` and `annotations`. Model storage and distribution can be optimized based on the artifact extension.

![manifest](../../img/v1/manifest.svg)
![manifest](../img/v1/manifest.svg)
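As an illustrative sketch (not taken verbatim from the spec), an artifact-extended manifest for a model might look like the following. The `artifactType`, layer media type, and annotation keys here are hypothetical placeholders; only the OCI manifest structure and the well-known empty config descriptor are from the OCI image specification.

```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/vnd.cnai.model.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.empty.v1+json",
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
    "size": 2
  },
  "layers": [
    {
      "mediaType": "application/vnd.cnai.model.weights.v1.tar",
      "digest": "sha256:<weights-digest>",
      "size": 123456789,
      "annotations": {
        "org.cnai.model.filepath": "model.safetensors"
      }
    }
  ],
  "annotations": {
    "org.cnai.model.name": "example-model"
  }
}
```

The empty config plus a distinct `artifactType` follow the OCI artifact usage guidelines, while annotations carry model-specific metadata that a registry can surface without downloading the layers.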

## Workflow

@@ -31,13 +31,13 @@ Use tools (ORAS, Ollama, etc.) to build the required resources in the model repository

Next, push the artifact to an OCI registry (Harbor, Docker Hub, etc.), and use the registry's functionality to manage the model artifact.

![build-push](../../img/v1/build-and-push.png)
![build-push](../img/v1/build-and-push.png)

### PULL & SERVE

The container runtime (containerd, CRI-O, etc.) pulls the model artifact from the OCI registry and mounts it as a read-only volume. Model distribution can use P2P technologies (Dragonfly, Kraken, etc.) to reduce the pressure on the registry and preheat the model artifact onto each node. If the model artifact is already present on a node, the container runtime can reuse it to mount into different containers on the same node.

![pull-serve](../../img/v1/pull-and-serve.png)
![pull-serve](../img/v1/pull-and-serve.png)
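As a sketch of the serve step, a Kubernetes Pod could consume a model artifact through the image volume source proposed in KEP-4639. This assumes a cluster with the alpha `ImageVolume` feature enabled (Kubernetes 1.31+); the image references below are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: model-server
spec:
  containers:
  - name: inference
    image: registry.example.com/inference-engine:latest  # placeholder serving image
    volumeMounts:
    - name: model
      mountPath: /models
      readOnly: true
  volumes:
  - name: model
    # Image volume source (KEP-4639): the runtime pulls the OCI artifact
    # and exposes its content as a read-only volume.
    image:
      reference: registry.example.com/ai/example-model:v1  # placeholder model artifact
      pullPolicy: IfNotPresent
```

Because the artifact is mounted read-only, multiple Pods on the same node can share one pulled copy of the model.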

## Understanding the Specification

40 changes: 40 additions & 0 deletions docs/v2/intro.md
@@ -0,0 +1,40 @@
# Model Specification Version 2

## Overview

The core of the v2 model specification is the definition of the model artifact, metadata and runtime environment.

The model artifact is a collection of files that represent the AI model. It consists of the model configuration, model weights, model tokenizer, and other model resources.

The model metadata is general information about the model, such as the model name, version, model family, description, author, license, and architecture. A model registry can parse the model metadata to display the model information.

The model runtime environment is the environment in which the model runs. It includes inference engine information such as version, configuration, dependencies, and environment variables.

The model artifact, metadata, and runtime environment are organized in a model manifest, a JSON file that describes the model. The manifest is used to package and distribute the model, and can be stored in a model registry and downloaded by a model runtime.
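A minimal sketch of such a manifest, assuming hypothetical field names and media types (the v2 spec has not finalized these), could be built and serialized like this:

```python
import json

# Illustrative sketch only: every field name and media type below is an
# assumption about what a v2 model manifest could contain, not the
# finalized specification.
manifest = {
    "schemaVersion": 2,
    "metadata": {
        "name": "example-llm",
        "version": "1.0.0",
        "family": "llama",
        "license": "Apache-2.0",
        "architecture": "transformer",
    },
    "artifacts": [
        # Each entry points at one file of the model artifact.
        {"mediaType": "application/vnd.cnai.model.weights.v2", "path": "model.safetensors"},
        {"mediaType": "application/vnd.cnai.model.tokenizer.v2", "path": "tokenizer.json"},
    ],
    "runtime": {
        "engine": "vllm",           # inference engine name
        "version": ">=0.5.0",       # engine version constraint
        "env": {"MODEL_MAX_LEN": "8192"},
    },
}

# The manifest is plain JSON, so a registry can parse it to display model
# information and a runtime can download it to set up serving.
print(json.dumps(manifest, indent=2))
```

A registry would read only `metadata` to render a model page, while a runtime would consult `artifacts` and `runtime` to fetch files and launch the engine.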

With a properly defined model specification, we can package the AI models of a model repository into a model image and push the model image to a model registry. The model image can then be pulled and run by a model runtime, either as a standalone package or as a read-only volume source in a container.

## Goals

The goals of developing the model specification are:

* To provide a way for developers to package and distribute AI models in a cloud native environment.
* To promote AI models as a first-class citizen and pave the way for the infrastructure to be organized around AI models.
* To define general model artifact, metadata, and runtime environment, so that the model can be easily understood and managed by any components of the infrastructure.
* To define a general model format description to allow easy integration of models with model runtimes.

## Non-Goals

* To build standard interfaces for model management tools to build, distribute, manage, and run AI models.

The model specification is designed to be a foundation for building standard interfaces to build, distribute, manage, and run AI models, but the specification itself does not define such interfaces.

## Plans

The model specification is still pretty rough. It is a living document and will evolve over time. Future work includes:

* Figure out the details of AI model artifact, metadata, and runtime environment.
* Define a general transformer architecture abstraction so that LLMs can be built once and run everywhere.
* Develop tools to build and save AI models in a model registry.
* Develop tools to pull and run AI models in a model runtime.
* Modify [vllm](https://github.com/vllm-project/vllm) to support the model specification and run LLMs of any transformer architecture without modification.
