Commit: bump version to v0.6.0 (#2445)
* bump version to 0.6.0

* update readme

* update supported models

* update get_started on ascend platform
lvhan028 authored Sep 13, 2024
1 parent 64fe4c5 commit e2aa4bd
Showing 10 changed files with 97 additions and 75 deletions.
6 changes: 4 additions & 2 deletions README.md
@@ -26,8 +26,10 @@
 <details open>
 <summary><b>2024</b></summary>

-- \[2024/08\] 🔥🔥 LMDeploy is integrated into [modelscope/swift](https://github.com/modelscope/swift) as the default accelerator for VLMs inference
-- \[2024/07\] 🎉🎉 Support Llama3.1 8B, 70B and its TOOLS CALLING
+- \[2024/09\] LMDeploy PyTorchEngine adds support for [Huawei Ascend](./docs/en/get_started/ascend/get_started.md). See the supported models [here](docs/en/supported_models/supported_models.md)
+- \[2024/09\] LMDeploy PyTorchEngine achieves 1.3x faster Llama3-8B inference by introducing CUDA graph
+- \[2024/08\] LMDeploy is integrated into [modelscope/swift](https://github.com/modelscope/swift) as the default accelerator for VLM inference
+- \[2024/07\] Support Llama3.1 8B, 70B and its TOOLS CALLING
 - \[2024/07\] Support [InternVL2](docs/en/multi_modal/internvl.md) full-series models, [InternLM-XComposer2.5](docs/en/multi_modal/xcomposer2d5.md) and [function call](docs/en/llm/api_server_tools.md) of InternLM2.5
 - \[2024/06\] The PyTorch engine supports DeepSeek-V2 and several VLMs, such as CogVLM2, Mini-InternVL, and LLaVA-Next
 - \[2024/05\] Balance the vision model across GPUs when deploying VLMs with multiple GPUs
6 changes: 4 additions & 2 deletions README_zh-CN.md
@@ -26,8 +26,10 @@
 <details open>
 <summary><b>2024</b></summary>

-- \[2024/08\] 🔥🔥 LMDeploy is integrated into [modelscope/swift](https://github.com/modelscope/swift) as the default acceleration engine for VLM inference
-- \[2024/07\] 🎉🎉 Support for the Llama3.1 8B and 70B models, as well as tool calling
+- \[2024/09\] LMDeploy PyTorchEngine adds support for [Huawei Ascend](docs/zh_cn/get_started/ascend/get_started.md). See the supported models [here](docs/zh_cn/supported_models/supported_models.md)
+- \[2024/09\] By introducing CUDA Graph, LMDeploy PyTorchEngine achieves a 1.3x speedup on Llama3-8B inference
+- \[2024/08\] LMDeploy is integrated into [modelscope/swift](https://github.com/modelscope/swift) as the default acceleration engine for VLM inference
+- \[2024/07\] Support for the Llama3.1 8B and 70B models, as well as tool calling
 - \[2024/07\] Support for the full [InternVL2](docs/zh_cn/multi_modal/internvl.md) model series, the [InternLM-XComposer2.5](docs/zh_cn/multi_modal/xcomposer2d5.md) model, and the [function call feature](docs/zh_cn/llm/api_server_tools.md) of InternLM2.5
 - \[2024/06\] The PyTorch engine supports inference for DeepSeek-V2 and several VLMs, such as CogVLM2, Mini-InternVL, and LLaVA-Next
 - \[2024/05\] When deploying VLMs on multiple GPUs, the vision part of the model can be evenly distributed across the cards
2 changes: 1 addition & 1 deletion docker/Dockerfile_aarch64_ascend
@@ -106,7 +106,7 @@ RUN echo "source /usr/local/Ascend/ascend-toolkit/set_env.sh" >> ~/.bashrc && \
 # timm is required for internvl2 model
 RUN --mount=type=cache,target=/root/.cache/pip \
     pip3 install transformers>=4.41.0 timm && \
-    pip3 install dlinfer-ascend==0.1.0
+    pip3 install dlinfer-ascend==0.1.0.post1

 # lmdeploy
 FROM build_temp as copy_temp
52 changes: 22 additions & 30 deletions docs/en/get_started/ascend/get_started.md
@@ -1,61 +1,53 @@
-# Get Started with Huawei Ascend (Atlas 800T A2
+# Get Started with Huawei Ascend (Atlas 800T A2)

 The usage of lmdeploy on a Huawei Ascend device is almost the same as its usage on CUDA with PytorchEngine in lmdeploy.
 Please read the original [Get Started](../get_started.md) guide before reading this tutorial.

 ## Installation

-### Environment Preparation
+We highly recommend that users build a Docker image for streamlined environment setup.

-#### Drivers and Firmware
+Clone the source code of lmdeploy; the Dockerfile is located in the `docker` directory:

-The host machine needs to install the Huawei driver and firmware version 23.0.3, refer to
-[CANN Driver and Firmware Installation](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC1alpha003/softwareinst/instg/instg_0019.html)
-and [download resources](https://www.hiascend.com/hardware/firmware-drivers/community?product=4&model=26&cann=8.0.RC3.alpha001&driver=1.0.0.2.alpha).
+```shell
+git clone https://github.com/InternLM/lmdeploy.git
+cd lmdeploy
+```

-#### CANN
+### Environment Preparation

+The Docker version should be no less than `18.03`, and `Ascend Docker Runtime` should be installed by following [the official guide](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc2/clusterscheduling/clusterschedulingig/clusterschedulingig/dlug_installation_012.html).

-File `docker/Dockerfile_aarch64_ascend` does not provide Ascend CANN installation package, users need to download the CANN (version 8.0.RC3.alpha001) software packages from [Ascend Resource Download Center](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.RC3.alpha001) themselves. And place the Ascend-cann-kernels-910b\*.run and Ascend-cann-toolkit\*-aarch64.run under the directory where the docker build command is executed.
+#### Ascend Drivers, Firmware and CANN

-#### Docker
+The target machine needs the Huawei driver and firmware version 23.0.3; refer to
+[CANN Driver and Firmware Installation](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC1alpha003/softwareinst/instg/instg_0019.html)
+and the [download resources](https://www.hiascend.com/hardware/firmware-drivers/community?product=4&model=26&cann=8.0.RC3.alpha001&driver=1.0.0.2.alpha).

-Building the aarch64_ascend image requires Docker >= 18.03
+The CANN (version 8.0.RC3.alpha001) software packages should also be downloaded from the [Ascend Resource Download Center](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.RC3.alpha001). Make sure to place `Ascend-cann-kernels-910b*.run` and `Ascend-cann-toolkit*-aarch64.run` under the root directory of the lmdeploy source code.

-#### Reference Command for Building the Image
+#### Build Docker Image

-The following reference command for building the image is based on the lmdeploy source code root directory as the current directory, and the CANN-related installation packages are also placed under this directory.
+Run the following command in the root directory of lmdeploy to build the image:

 ```bash
-DOCKER_BUILDKIT=1 docker build -t lmdeploy-aarch64-ascend:v0.1 \
+DOCKER_BUILDKIT=1 docker build -t lmdeploy-aarch64-ascend:latest \
     -f docker/Dockerfile_aarch64_ascend .
 ```

+This image installs lmdeploy into the `/workspace/lmdeploy` directory with the `pip install --no-build-isolation -e .` command.

-#### Using the Image
-
-You can refer to the [documentation](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc1/clusterscheduling/dockerruntimeug/dlruntime_ug_013.html)
-for usage. It is recommended to install Ascend Docker Runtime.
-Here is an example of starting a container for a Huawei Ascend device with Ascend Docker Runtime installed:
+If the following command executes without any errors, the environment setup is successful.

 ```bash
-docker run -e ASCEND_VISIBLE_DEVICES=0 --net host -td --entry-point bash --name lmdeploy_ascend_demo \
-    lmdeploy-aarch64-ascend:v0.1  # docker_image_sha_or_name
+docker run -e ASCEND_VISIBLE_DEVICES=0 --rm --name lmdeploy -t lmdeploy-aarch64-ascend:latest lmdeploy check_env
 ```

-#### Pip install
-
-If you have lmdeploy installed and all Huawei environments are ready, you can run the following command to enable lmdeploy to run on Huawei Ascend devices. (Not necessary if you use the Docker image.)
-
-```bash
-pip install dlinfer-ascend
-```
+For more information about running the Docker client on Ascend devices, please refer to the [guide](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc1/clusterscheduling/dockerruntimeug/dlruntime_ug_013.html).

 ## Offline batch inference

 ### LLM inference

-Set `device_type="ascend"`  in the `PytorchEngineConfig`:
+Set `device_type="ascend"` in the `PytorchEngineConfig`:

 ```python
 from lmdeploy import pipeline
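The Python snippet above is cut off by the collapsed diff. For reference, a complete offline-inference example in the same spirit might look like the sketch below; the model name and prompts are illustrative, not taken from this commit:

```python
from lmdeploy import pipeline, PytorchEngineConfig

if __name__ == "__main__":
    # device_type="ascend" routes PytorchEngine to the Huawei Ascend backend
    pipe = pipeline("internlm/internlm2_5-7b-chat",
                    backend_config=PytorchEngineConfig(tp=1, device_type="ascend"))
    questions = ["Shanghai is", "Please introduce China", "How are you?"]
    responses = pipe(questions)
    print(responses)
```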
2 changes: 1 addition & 1 deletion docs/en/get_started/installation.md
@@ -23,7 +23,7 @@ pip install lmdeploy
 The default prebuilt package is compiled on **CUDA 12**. If CUDA 11+ (>=11.3) is required, you can install lmdeploy by:

 ```shell
-export LMDEPLOY_VERSION=0.6.0a0
+export LMDEPLOY_VERSION=0.6.0
 export PYTHON_VERSION=38
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
 ```
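Once the wheel is installed, the version bump can be verified from Python; a minimal check, assuming a standard install:

```python
import lmdeploy

# Expected to print 0.6.0 after this release
print(lmdeploy.__version__)
```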
22 changes: 20 additions & 2 deletions docs/en/supported_models/supported_models.md
@@ -1,6 +1,8 @@
 # Supported Models

-## Models supported by TurboMind
+The following tables detail the models supported by LMDeploy's TurboMind engine and PyTorch engine across different platforms.
+
+## TurboMind on CUDA Platform

 | Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W4A16 |
 | :-------------------: | :---------: | :--: | :-------: | :-----: | :-----: | :---: |
@@ -38,7 +40,7 @@
 The TurboMind engine doesn't support window attention. Therefore, for models that use window attention and have the corresponding switch "use_sliding_window" enabled, such as Mistral, Qwen1.5, etc., please choose the PyTorch engine for inference.
 ```
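For such models, the PyTorch engine is selected by passing a `PytorchEngineConfig` as the backend config; a minimal sketch, with an illustrative model name:

```python
from lmdeploy import pipeline, PytorchEngineConfig

# Mistral uses sliding-window attention, which TurboMind does not support,
# so route inference through the PyTorch engine instead.
pipe = pipeline("mistralai/Mistral-7B-Instruct-v0.2",
                backend_config=PytorchEngineConfig())
print(pipe(["What is sliding window attention?"]))
```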

-## Models supported by PyTorch
+## PyTorchEngine on CUDA Platform

 | Model | Size | Type | FP16/BF16 | KV INT8 | W8A8 | W4A16 |
 | :------------: | :---------: | :--: | :-------: | :-----: | :--: | :---: |
@@ -79,3 +81,19 @@
 | Phi-3.5-mini | 3.8B | LLM | Yes | No | No | - |
 | Phi-3.5-MoE | 16x3.8B | LLM | Yes | No | No | - |
 | Phi-3.5-vision | 4.2B | MLLM | Yes | No | No | - |
+
+## PyTorchEngine on Huawei Ascend Platform
+
+| Model | Size | Type | FP16/BF16 |
+| :------------: | :------: | :--: | :-------: |
+| Llama2 | 7B - 70B | LLM | Yes |
+| Llama3 | 8B | LLM | Yes |
+| Llama3.1 | 8B | LLM | Yes |
+| InternLM2 | 7B - 20B | LLM | Yes |
+| InternLM2.5 | 7B - 20B | LLM | Yes |
+| Mixtral | 8x7B | LLM | Yes |
+| QWen1.5-MoE | A2.7B | LLM | Yes |
+| QWen2 | 7B | LLM | Yes |
+| QWen2-MoE | A14.57B | LLM | Yes |
+| InternVL(v1.5) | 2B-26B | MLLM | Yes |
+| InternVL2 | 1B-40B | MLLM | Yes |
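Since InternVL2 appears above as a supported MLLM on Ascend, a vision-language call might look like the following sketch; the model path and image URL are illustrative, not taken from this commit:

```python
from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image

# device_type="ascend" selects the Huawei Ascend backend for the MLLM as well
pipe = pipeline("OpenGVLab/InternVL2-8B",
                backend_config=PytorchEngineConfig(device_type="ascend"))
image = load_image("https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg")
print(pipe(("describe this image", image)))
```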
56 changes: 23 additions & 33 deletions docs/zh_cn/get_started/ascend/get_started.md
@@ -1,57 +1,47 @@
-# Huawei Ascend(Atlas 800T A2)
+# Huawei Ascend (Atlas 800T A2)

-We use the PytorchEngine backend of LMDeploy to support Huawei Ascend devices,
-so using lmdeploy on Huawei Ascend is almost the same as using the PytorchEngine backend on NVIDIA GPUs.
-Please read the original [Get Started](../get_started.md) guide before reading this tutorial.
+We added support for Huawei Ascend devices on top of LMDeploy's PytorchEngine. Therefore, using LMDeploy on Huawei Ascend is almost the same as using the PytorchEngine backend on NVIDIA GPUs. Please read the original [Get Started](../get_started.md) guide before reading this tutorial.

 ## Installation

-### Environment Preparation
+We highly recommend that users build a Docker image to simplify environment setup.

-#### Drivers and Firmware
+Clone the source code of lmdeploy; the Dockerfile is located in the `docker` directory.

-The host needs the Huawei driver and firmware version 23.0.3 installed; refer to
-[CANN Driver and Firmware Installation](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC1alpha003/softwareinst/instg/instg_0019.html)
-and [download resources](https://www.hiascend.com/hardware/firmware-drivers/community?product=4&model=26&cann=8.0.RC3.alpha001&driver=1.0.0.2.alpha).
+```shell
+git clone https://github.com/InternLM/lmdeploy.git
+cd lmdeploy
+```

-#### CANN
+### Environment Preparation

+The Docker version should be no less than 18.03, and Ascend Docker Runtime should be installed by following [the official guide](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc2/clusterscheduling/clusterschedulingig/clusterschedulingig/dlug_installation_012.html).

-`docker/Dockerfile_aarch64_ascend` does not provide the CANN installation packages; users need to download the CANN (8.0.RC3.alpha001) packages from the [Ascend Resource Download Center](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.RC3.alpha001) themselves,
-and place Ascend-cann-kernels-910b\*.run and Ascend-cann-toolkit\*-aarch64.run in the directory where the `docker build` command is executed.
+#### Drivers, Firmware and CANN

-#### Docker
+The target machine needs the Huawei driver and firmware version 23.0.3; refer to
+[CANN Driver and Firmware Installation](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/80RC1alpha003/softwareinst/instg/instg_0019.html)
+and [download resources](https://www.hiascend.com/hardware/firmware-drivers/community?product=4&model=26&cann=8.0.RC3.alpha001&driver=1.0.0.2.alpha).

-Building the aarch64_ascend image requires Docker >= 18.03.
+In addition, `docker/Dockerfile_aarch64_ascend` does not provide the CANN installation packages; users need to download the CANN (8.0.RC3.alpha001) packages from the [Ascend Resource Download Center](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.RC3.alpha001) themselves,
+and place `Ascend-cann-kernels-910b*.run` and `Ascend-cann-toolkit*-aarch64.run` under the root directory of the lmdeploy source code.

-#### Command for Building the Image
+#### Build the Image

-Please run the following image-build command in the root directory of the lmdeploy source code; the CANN installation packages should also be placed in this directory.
+Please run the following image-build command in the root directory of the lmdeploy source code; the CANN installation packages should also be placed in this directory.

 ```bash
-DOCKER_BUILDKIT=1 docker build -t lmdeploy-aarch64-ascend:v0.1 \
+DOCKER_BUILDKIT=1 docker build -t lmdeploy-aarch64-ascend:latest \
     -f docker/Dockerfile_aarch64_ascend .
 ```

+This image installs lmdeploy into the /workspace/lmdeploy directory with the `pip install --no-build-isolation -e .` command.

-#### Using the Image

-For how to use the image, please refer to this [document](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc1/clusterscheduling/dockerruntimeug/dlruntime_ug_013.html),
-and install Ascend Docker Runtime before using the image.
-Here is an example of starting a container for a Huawei Ascend device with Ascend Docker Runtime installed:
+If the following command executes without any errors, the environment setup is successful.

 ```bash
-docker run -e ASCEND_VISIBLE_DEVICES=0 --net host -td --entry-point bash --name lmdeploy_ascend_demo \
-    lmdeploy-aarch64-ascend:v0.1  # docker_image_sha_or_name
+docker run -e ASCEND_VISIBLE_DEVICES=0 --rm --name lmdeploy -t lmdeploy-aarch64-ascend:latest lmdeploy check_env
 ```

-#### Install with pip
-
-If you have lmdeploy installed and the Huawei environment is ready, you can run the following command to enable lmdeploy on Huawei Ascend devices. (Not necessary if you use the Docker image.)
-
-```bash
-pip install dlinfer-ascend
-```
+For details on running the `docker run` command on Ascend devices, please refer to this [document](https://www.hiascend.com/document/detail/zh/mindx-dl/60rc1/clusterscheduling/dockerruntimeug/dlruntime_ug_013.html).

 ## Offline batch inference

2 changes: 1 addition & 1 deletion docs/zh_cn/get_started/installation.md
@@ -23,7 +23,7 @@ pip install lmdeploy
 The default prebuilt package is compiled on **CUDA 12**. If CUDA 11+ (>=11.3) is required, you can install lmdeploy with the following commands:

 ```shell
-export LMDEPLOY_VERSION=0.6.0a0
+export LMDEPLOY_VERSION=0.6.0
 export PYTHON_VERSION=38
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
 ```
22 changes: 20 additions & 2 deletions docs/zh_cn/supported_models/supported_models.md
@@ -1,6 +1,8 @@
 # Supported Models

-## Models supported by TurboMind
+The following lists detail the models supported by the LMDeploy TurboMind engine and the PyTorch engine on different software and hardware platforms.
+
+## TurboMind on CUDA Platform

 | Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W4A16 |
 | :-------------------: | :---------: | :--: | :-------: | :-----: | :-----: | :---: |
@@ -38,7 +40,7 @@
 The turbomind engine does not support window attention. Therefore, for models that use window attention and have the corresponding "use_sliding_window" switch enabled, such as Mistral and Qwen1.5, please choose the pytorch engine for inference.
 ```

-### Models supported by PyTorch
+## PyTorchEngine on CUDA Platform

 | Model | Size | Type | FP16/BF16 | KV INT8 | W8A8 | W4A16 |
 | :------------: | :---------: | :--: | :-------: | :-----: | :--: | :---: |
@@ -79,3 +81,19 @@
 | Phi-3.5-mini | 3.8B | LLM | Yes | No | No | - |
 | Phi-3.5-MoE | 16x3.8B | LLM | Yes | No | No | - |
 | Phi-3.5-vision | 4.2B | MLLM | Yes | No | No | - |
+
+## PyTorchEngine on Huawei Ascend Platform
+
+| Model | Size | Type | FP16/BF16 |
+| :------------: | :------: | :--: | :-------: |
+| Llama2 | 7B - 70B | LLM | Yes |
+| Llama3 | 8B | LLM | Yes |
+| Llama3.1 | 8B | LLM | Yes |
+| InternLM2 | 7B - 20B | LLM | Yes |
+| InternLM2.5 | 7B - 20B | LLM | Yes |
+| Mixtral | 8x7B | LLM | Yes |
+| QWen1.5-MoE | A2.7B | LLM | Yes |
+| QWen2 | 7B | LLM | Yes |
+| QWen2-MoE | A14.57B | LLM | Yes |
+| InternVL(v1.5) | 2B-26B | MLLM | Yes |
+| InternVL2 | 1B-40B | MLLM | Yes |
2 changes: 1 addition & 1 deletion lmdeploy/version.py
@@ -1,7 +1,7 @@
 # Copyright (c) OpenMMLab. All rights reserved.
 from typing import Tuple

-__version__ = '0.6.0a0'
+__version__ = '0.6.0'
 short_version = __version__

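The collapsed diff hides the rest of the file; given the `Tuple` import kept above, the remainder presumably parses the version string into a tuple. A hypothetical sketch of such a helper, not taken from this commit:

```python
from typing import Tuple

def parse_version_info(version_str: str) -> Tuple:
    """Split '0.6.0' into (0, 6, 0); non-numeric pieces such as '0a0'
    are kept as strings so pre-release tags survive the parse."""
    return tuple(int(p) if p.isdigit() else p for p in version_str.split('.'))

version_info = parse_version_info('0.6.0')  # -> (0, 6, 0)
```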
