Add SimLayerKVPress #28

Merged
merged 12 commits into main from simon/simlayerkv on Dec 11, 2024

Conversation

@SimJeg (Collaborator) commented Dec 9, 2024

Add SimLayerKVPress (paper, official repository) following issue #19 and PR #22.

SimLayerKV uses a layer-wise approach to compression:
- layers identified as lazy use the Streaming LLM approach (only initial and recent KV pairs are kept)
- other layers use the full KV cache

To identify lazy layers, the attention weights of the last tokens are used: if the sum of their attention weights over the initial and recent tokens is above the lazy_threshold, the layer is considered lazy.
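
A minimal sketch of this criterion (names like n_sink, n_recent, n_last and their defaults are illustrative, not the actual kvpress API):

import torch

def layer_is_lazy(attn_weights: torch.Tensor, lazy_threshold: float,
                  n_sink: int = 4, n_recent: int = 1020, n_last: int = 32) -> bool:
    # attn_weights: (batch, heads, q_len, k_len) attention of a single layer
    last = attn_weights[..., -n_last:, :]
    # attention mass of the last queries over the initial (sink) + recent keys
    lazy_mass = last[..., :n_sink].sum(dim=-1) + last[..., -n_recent:].sum(dim=-1)
    # the layer is lazy if most of its attention mass already falls on the
    # tokens that the Streaming LLM approach would keep anyway
    return lazy_mass.mean().item() > lazy_threshold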

As with the wrapper for layer-wise compression ratios, this press only works with flash attention (why remains to be investigated). A notable difference from other presses is that the input of SimLayerKVPress is not a compression ratio but the lazy_threshold defined in the paper. However, I implemented a compression_ratio property that is computed dynamically.
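
A rough sketch of how such a dynamic property could look (hypothetical attribute names, assuming the number of kept KV pairs per layer is recorded during the forward pass):

from dataclasses import dataclass, field

@dataclass
class SimLayerKVPressSketch:
    lazy_threshold: float = 0.9
    kept_tokens: list = field(default_factory=list)  # KV pairs kept per layer
    seq_len: int = 0  # full sequence length seen during prefilling

    @property
    def compression_ratio(self) -> float:
        # lazy layers keep only sink + recent pairs while the others keep all
        # seq_len pairs, so the ratio is only known after the forward pass
        if not self.kept_tokens or self.seq_len == 0:
            return 0.0
        return 1.0 - sum(self.kept_tokens) / (len(self.kept_tokens) * self.seq_len)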

@dame-cell, after implementing it, I found it was not necessary after all to implement a new cache class. Could you please review and comment on this implementation? I will also share it on the official repository to get feedback from the authors.

@dame-cell commented Dec 9, 2024

Thanks for adding this, I was banging my head just trying to implement it.

Right now I'm actually busy, I'll try reviewing tomorrow if that's OK 😔

@SimJeg (Collaborator, Author) commented Dec 9, 2024

@dame-cell no problem, I also asked for a review on the official repository (see sail-sg/SimLayerKV#4).

@maxjeblick self-assigned this on Dec 9, 2024
@SimJeg changed the title from "add simlayerkvpress" to "Add SimLayerKVPress" on Dec 9, 2024
@maxjeblick (Collaborator) left a comment

Thanks a lot for the PR! In general, the PR is in good shape. I left some comments; they should be fast to fix.
Regarding the press itself, it looks good to me. I haven't studied the original work in detail, so it may make sense to also wait for feedback from the authors.

@SimJeg mentioned this pull request on Dec 10, 2024
@maxjeblick (Collaborator) left a comment

Code + implementation LGTM, thanks for adding the press!

I haven't checked in detail whether the press is equivalent to the original one; if there's no feedback in the next day(s), I will have a look at this as well.

@dame-cell commented Dec 10, 2024

@SimJeg I have added a comment, please check it out and tell me whether this will work or not.
Other than that, I have tested it myself and it seems to be working pretty well 💯

@SimJeg (Collaborator, Author) commented Dec 10, 2024

@dame-cell I don't see your comment, can you provide a link to it?

@dame-cell commented Dec 10, 2024

@SimJeg forgive me, here is the comment. In the original implementation they included different thresholds for different models:

        if 'llama3' in out_path:
            threshold = 0.9
        elif 'llama2' in out_path:
            threshold = 0.65
        elif 'mistral' in out_path:
            threshold = 0.8
        elif 'qwen' in out_path:
            threshold = 0.85

Adding something similar could make this even more versatile and model-aware. Just a thought—curious to hear your perspective! 😊

maybe something like this:

def get_lazy_threshold(model_name: str) -> float:
    if 'llama3' in model_name:
        return 0.9
    elif 'llama2' in model_name:
        return 0.65
    elif 'mistral' in model_name:
        return 0.8
    elif 'qwen' in model_name:
        return 0.85
    else:
        return 0.7  # Default threshold

# Example (assuming a Hugging Face config exposing name_or_path)
model_name = getattr(module.config, "name_or_path", "")
lazy_threshold = get_lazy_threshold(model_name)

@SimJeg (Collaborator, Author) commented Dec 10, 2024

I preferred to let the user specify the lazy_threshold argument to keep the press model-agnostic, but I will update the docstring to help the user set this value.
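
For instance, the docstring could carry the model-specific values reported in the official repository (a sketch of the wording, assuming the press is a dataclass):

from dataclasses import dataclass

@dataclass
class SimLayerKVPress:
    """
    ...
    lazy_threshold: threshold on the attention mass of the last tokens over
    the initial and recent tokens. The official repository uses
    model-specific values: 0.9 for llama3, 0.65 for llama2, 0.8 for mistral
    and 0.85 for qwen.
    """
    lazy_threshold: float = 0.9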

@SimJeg linked an issue on Dec 10, 2024 that may be closed by this pull request
@maxjeblick (Collaborator) left a comment

LGTM, thanks!
I left two small comments.

@maxjeblick (Collaborator) left a comment

LGTM, thanks!

@maxjeblick (Collaborator) left a comment

Reapproving

@SimJeg merged commit e36615c into main on Dec 11, 2024
2 checks passed
@SimJeg deleted the simon/simlayerkv branch on December 11, 2024
@jadeCurl commented Dec 23, 2024

Hi,

Thanks for incorporating our SimLayerKV!

We are currently working on version 2 of our project, with a major update being its integration with flash attention to enhance efficiency. For your reference, here is the source code:

# flash attention returning the output and the log-sum-exp (lse) of the logits
attn_out, lse = flash_attn(q, k, v, causal=True, return_lse=True)
# lazy-layer identification
w_last, w_sink, w_recent = 32, 4, 1020
# attention mass of the last w_last queries over the sink + recent keys,
# computed in log space; lse is sliced to the same w_last queries so the
# shapes match q_last
q_last = q[:, -w_last:].permute(0, 2, 1, 3)
k_comb = torch.cat([k[:, 0:w_sink], k[:, -w_recent:]], dim=1).permute(0, 2, 3, 1)
log_lazy_weight = torch.matmul(q_last, k_comb).logsumexp(dim=-1) - lse[:, :, -w_last:]
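
Presumably (my assumption, this part is not shown in the snippet), exponentiating this log-space quantity recovers the attention fraction used for the laziness test:

# exp(log_lazy_weight) = (attention mass on sink + recent keys) / (total mass)
lazy_fraction = log_lazy_weight.exp().mean()
layer_is_lazy = lazy_fraction > lazy_threshold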
