Docs: Performance observations for M2 CPUs (#56)
---------

Co-authored-by: Ash Vardanian <1983160+ashvardanian@users.noreply.github.com>
blackforestboi and ashvardanian authored Dec 31, 2023
1 parent fdc8587 commit 8374ef6
Showing 1 changed file with 15 additions and 2 deletions.
17 changes: 15 additions & 2 deletions README.md
@@ -262,7 +262,7 @@ Results for VQAv2 evaluation.
## Speed

On RTX 3090, the following performance is expected on text encoding.
On Nvidia RTX 3090, the following performance is expected on text encoding.

| Model | Multilingual | Speed | Speedup |
| :---------------------------------------- | -----------: | ---------------------: | ---------: |
@@ -271,14 +271,27 @@ On RTX 3090, the following performance is expected on text encoding.
| `sentence-transformers/all-MiniLM-L12-v2` | __Yes__ | 3'604 sequences/second | x 2.24 |
| `unum-cloud/uform-vl-multilingual-v2` | __Yes__ | 6'809 sequences/second | __x 4.22__ |

On RTX 3090, the following performance is expected on text token generation using `float16`, equivalent PyTorch settings, and greedy decoding.
On Nvidia RTX 3090, the following performance is expected on text token generation using `float16`, equivalent PyTorch settings, and greedy decoding.

| Model | Size | Speed | Speedup |
| :---------------------------------- | ---: | ------------------: | --------: |
| `llava-hf/llava-1.5-7b-hf` | 7B | ~ 40 tokens/second | |
| `Salesforce/instructblip-vicuna-7b` | 7B | ~ 40 tokens/second | |
| `unum-cloud/uform-gen` | 1.5B | ~ 140 tokens/second | __x 3.5__ |

Given the small size of the model, it also works well on mobile devices.
On Apple M2 Arm chips, the energy efficiency of inference can exceed that of the Nvidia RTX 3090 GPU and other Ampere-generation cards.

| Device | Speed | Device TDP | Efficiency |
| :--------------------- | ------------------: | ---------: | ----------------: |
| Nvidia RTX 3090 | ~ 140 tokens/second | < 350W | 0.40 tokens/joule |
| Apple M2 Pro unplugged | ~ 19 tokens/second | < 20W | 0.95 tokens/joule |
| Apple M2 Max unplugged | ~ 38 tokens/second | < 36W | 1.06 tokens/joule |
| Apple M2 Max plugged | ~ 56 tokens/second | < 89W | 0.63 tokens/joule |
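
The Efficiency column appears to be the speed divided by the device TDP, since one watt is one joule per second. Below is a minimal Python sketch that reproduces the column from the table's values. It assumes the TDP upper bounds stand in for actual power draw, so the results are rough lower-bound estimates rather than measurements. The names `measurements` and `tokens_per_joule` are illustrative and not part of UForm.

```python
# Values copied from the table above: (tokens/second, TDP upper bound in watts).
# Assumption: TDP approximates real power draw; actual draw is likely lower.
measurements = {
    "Nvidia RTX 3090": (140, 350),
    "Apple M2 Pro unplugged": (19, 20),
    "Apple M2 Max unplugged": (38, 36),
    "Apple M2 Max plugged": (56, 89),
}


def tokens_per_joule(tokens_per_second: float, watts: float) -> float:
    """Throughput divided by power: (tokens/s) / (J/s) = tokens/J."""
    return tokens_per_second / watts


# Round to two decimals to match the precision used in the table.
efficiency = {
    device: round(tokens_per_joule(speed, tdp), 2)
    for device, (speed, tdp) in measurements.items()
}

for device, value in efficiency.items():
    print(f"{device}: {value:.2f} tokens/joule")
```

Because each TDP is listed as an upper bound (`<`), dividing by it tends to understate efficiency; measuring real power draw during inference would give a more reliable comparison.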

> [!WARNING]
> The above numbers are for reference only and are not guaranteed to be accurate.

## License

All models come under the same license as the code - Apache 2.0.
