llama : rename n_ctx to kv_size #5568

ggerganov · 2024-02-18T20:15:27Z

The n_ctx name is causing some confusion, since its actual meaning is the size of the KV cache, while n_ctx_train is the training context of the model.

This change fixes that, but since it is a big one and touches a lot of stuff, I'm not sure if it's worth merging. Maybe sometime in the future, when the time is right.

Original PR: #5546

ggml-ci

bobqianic · 2024-02-19T02:31:31Z

Does n_ctx in whisper.cpp also refer to the size of the KV cache?

ggerganov · 2024-02-19T07:41:33Z

In the decoder - yes

Green-Sky · 2024-02-19T12:11:22Z

tests/test-backend-ops.cpp

@@ -1545,7 +1545,7 @@ struct llama_hparams {
    int32_t n_tokens;

    // llm_build_context
-    static constexpr int32_t n_kv    = 32; // size of KV cache to consider (n_kv <= n_ctx
+    static constexpr int32_t n_kv    = 32; // size of KV cache to consider (n_kv <= kv_size


Looks like this comment was missing a closing parenthesis.

compilade · 2024-02-21T23:29:59Z

I do not agree with this change (but I like the underlying intention of making llama.cpp less confusing).

As I'm working on supporting Mamba in llama.cpp (see #5328), I'd like to warn that renaming n_ctx to kv_size would make it harder to support non-Transformer architectures in a straightforward way.

With Mamba, the KV cache size is tied to the maximum number of distinct sequences processed at the same time, not to the "context size". n_ctx is still used to limit the maximum number of processed tokens. That is fine, because some examples need a fixed size for the buffer of input tokens (e.g. in server, lookahead, lookup, parallel, and perplexity when calling llama_batch_init).

What I propose instead (and this is what I've started doing in #5328) is to keep n_ctx, but use kv_self.size instead of n_ctx in the places where it's really the KV cache size that is meant, because Mamba breaks the equivalence of n_ctx with kv_self.size.

TL;DR: renaming n_ctx to kv_size makes it harder to decouple the context size from the KV cache size.

ggerganov added the "breaking change" (Changes that break ABIs, APIs, file formats, or other forms of backwards compatibility) and "refactoring" labels Feb 18, 2024

ggerganov mentioned this pull request Feb 18, 2024

server: rename legacy --ctx-size to --kv-size option #5546

Closed

phymbert added 4 commits February 18, 2024 22:40

server: rename legacy --ctx-size to --kv-size

9a06956

server: document the --ctx-size deprecation in server README.md

ef96e8b

rename n_ctx to kv_size

6068734

fix some spaces added by IDE in math op

47c662b

ggml-ci

ggerganov force-pushed the gg/rename-n_ctx branch from 985fd62 to 47c662b Compare February 18, 2024 20:40

Green-Sky reviewed Feb 19, 2024


martindevans mentioned this pull request Feb 21, 2024

Getting exception: "llama_decode failed: 'NoKvSlot'" when LLM analyze text (news) SciSharp/LLamaSharp#528

Closed

ggerganov closed this Jul 26, 2024
