server: #5655 - continue to update other slots on embedding concurrent request.

server: tests: add multi users embeddings as fixed
phymbert committed Feb 24, 2024
1 parent 525213d commit 09b77b4
Showing 3 changed files with 25 additions and 34 deletions.
2 changes: 1 addition & 1 deletion examples/server/server.cpp
@@ -1836,7 +1836,7 @@ struct llama_server_context
                     send_embedding(slot);
                     slot.release();
                     slot.i_batch = -1;
-                    return true;
+                    continue;
                 }
 
                 completion_token_output result;
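The one-line change above is the entire fix. In the server's slot-update loop, finishing an embedding slot previously ended the whole function with return true, so every other slot whose tokens were decoded in the same batch was skipped for that pass; under concurrent embedding requests the remaining slots were never advanced. Below is a minimal, self-contained sketch of the loop shape, with hypothetical simplified types; only send_embedding, slot.release() and slot.i_batch come from the diff itself.

#include <vector>

// Sketch only (hypothetical types): the real update_slots() in
// examples/server/server.cpp is far larger.
struct server_slot {
    int  i_batch   = -1;    // index of this slot's token in the decoded batch
    bool embedding = false; // true if this slot serves an embedding request
    void release() {}       // free the slot for the next queued request
};

static void send_embedding(server_slot &) { /* write the result to the client */ }

static bool update_slots(std::vector<server_slot> & slots) {
    // ... one shared batch for all active slots has just been decoded ...
    for (auto & slot : slots) {
        if (slot.i_batch < 0) {
            continue; // this slot had no tokens in the decoded batch
        }
        if (slot.embedding) {
            send_embedding(slot);
            slot.release();
            slot.i_batch = -1;
            // Before the fix this was `return true;`, which exited the
            // function and starved every remaining slot in the same batch.
            // `continue` only skips token sampling for this one slot.
            continue;
        }
        // ... sample and emit the next completion token for this slot ...
    }
    return true;
}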
34 changes: 1 addition & 33 deletions examples/server/tests/features/issues.feature
@@ -1,36 +1,4 @@
 # List of ongoing issues
 @bug
 Feature: Issues
-  # Issue #5655
-  Scenario: Multi users embeddings
-    Given a server listening on localhost:8080
-    And a model file stories260K.gguf
-    And a model alias tinyllama-2
-    And 42 as server seed
-    And 64 KV cache size
-    And 2 slots
-    And continuous batching
-    And embeddings extraction
-    Then the server is starting
-    Then the server is healthy
-
-    Given a prompt:
-      """
-      Write a very long story about AI.
-      """
-    And a prompt:
-      """
-      Write another very long music lyrics.
-      """
-    And a prompt:
-      """
-      Write a very long poem.
-      """
-    And a prompt:
-      """
-      Write a very long joke.
-      """
-    Given concurrent embedding requests
-    Then the server is busy
-    Then the server is idle
-    Then all embeddings are generated
+  # No confirmed issue at the moment
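The scenario background removed above (and shared by the parallel.feature scenario below) maps onto the server binary's startup flags. As a rough sketch, assuming the flag spellings from the server's usage text at the time (-a for the model alias, -c for the KV cache size, -np for slots, -cb for continuous batching, --embedding for embeddings extraction; how the test framework sets the server seed is not shown in this diff):

./server -m stories260K.gguf -a tinyllama-2 -c 64 -np 2 -cb --embedding --host localhost --port 8080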
23 changes: 23 additions & 0 deletions examples/server/tests/features/parallel.feature
@@ -8,6 +8,7 @@ Feature: Parallel
     And 42 as server seed
     And 64 KV cache size
     And 2 slots
+    And embeddings extraction
     And continuous batching
     Then the server is starting
     Then the server is healthy
@@ -75,3 +76,25 @@ Feature: Parallel
     Then the server is busy
     Then the server is idle
     Then all prompts are predicted
+
+  Scenario: Multi users embeddings
+    Given a prompt:
+      """
+      Write a very long story about AI.
+      """
+    And a prompt:
+      """
+      Write another very long music lyrics.
+      """
+    And a prompt:
+      """
+      Write a very long poem.
+      """
+    And a prompt:
+      """
+      Write a very long joke.
+      """
+    Given concurrent embedding requests
+    Then the server is busy
+    Then the server is idle
+    Then all embeddings are generated
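To make the step "Given concurrent embedding requests" concrete at the HTTP level, here is a hypothetical standalone reproducer, not part of this commit: it posts the four prompts above to the server's /embedding endpoint from four threads at once, putting more requests in flight than there are slots (2). The endpoint path and JSON body shape are assumptions about the server's API rather than something shown in this diff; libcurl handles the HTTP.

// Hypothetical reproducer for issue #5655 -- not part of the repository.
// Build: g++ -std=c++17 repro.cpp -lcurl -lpthread
#include <curl/curl.h>

#include <cstdio>
#include <string>
#include <thread>
#include <vector>

// Swallow the response body; we only care whether each request completes.
static size_t discard_body(char *, size_t size, size_t nmemb, void *) {
    return size * nmemb;
}

// POST one prompt to the (assumed) /embedding endpoint.
static void request_embedding(const std::string prompt) {
    CURL * curl = curl_easy_init();
    if (curl == nullptr) {
        return;
    }
    const std::string body = "{\"content\": \"" + prompt + "\"}";
    curl_slist * headers = curl_slist_append(nullptr, "Content-Type: application/json");
    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:8080/embedding");
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, discard_body);
    const CURLcode res = curl_easy_perform(curl);
    std::printf("%-40s -> %s\n", prompt.c_str(), curl_easy_strerror(res));
    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    const std::vector<std::string> prompts = {
        "Write a very long story about AI.",
        "Write another very long music lyrics.",
        "Write a very long poem.",
        "Write a very long joke.",
    };
    // Four requests against two slots: before the fix, serving the first
    // embedding aborted the update loop and the other requests could hang.
    std::vector<std::thread> workers;
    for (const auto & p : prompts) {
        workers.emplace_back(request_embedding, p);
    }
    for (auto & t : workers) {
        t.join();
    }
    curl_global_cleanup();
    return 0;
}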
