Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding -tb, --threads-batch to server.cpp #3584

Merged
merged 2 commits into from
Oct 11, 2023
Merged

Adding -tb, --threads-batch to server.cpp #3584

merged 2 commits into from
Oct 11, 2023

Conversation

m18coppola
Copy link
Contributor

resolves #3473

@ggerganov ggerganov merged commit a8bdd65 into ggerganov:master Oct 11, 2023
27 of 37 checks passed
joelkuiper added a commit to vortext/llama.cpp that referenced this pull request Oct 12, 2023
…example

* 'master' of github.com:ggerganov/llama.cpp: (34 commits)
  examples: support LLaVA v1.5 (multimodal model) (ggerganov#3436)
  docs : fix typo GOMP_CPU_AFFINITY (ggerganov#3597)
  cmake : fix add_compile_options on macOS
  typo : it is `--n-gpu-layers` not `--gpu-layers` (ggerganov#3592)
  ci : check if there is enough VRAM (ggerganov#3596)
  server : add completion mode (no chat) (ggerganov#3582)
  prompts : add mnemonics.txt
  server : fix kv cache management (ggerganov#3588)
  main : fix session loading bug (ggerganov#3400)
  server : add parameter -tb N, --threads-batch N (ggerganov#3584)
  common : fix mirostat state when using multiple sequences (ggerganov#3543)
  batched : add bench tool (ggerganov#3545)
  examples : add batched.swift + improve CI for swift (ggerganov#3562)
  Add MPT model to supported models in README.md (ggerganov#3574)
  Minor improvements in GPT2 tokenizer (ggerganov#3567)
  readme : add bloom (ggerganov#3570)
  llm : add bloom models (ggerganov#3553)
  swift : improvements and fixes (ggerganov#3564)
  llm : add MPT support (ggerganov#3417)
  infill. : fix tokenization (ggerganov#3508)
  ...
cebtenzzre pushed a commit to cebtenzzre/llama.cpp that referenced this pull request Oct 24, 2023
Co-authored-by: Michael Coppola <info@michaeljcoppola.com>
ggerganov pushed a commit that referenced this pull request Oct 24, 2023
Co-authored-by: Michael Coppola <m18coppola@gmail.com>
Co-authored-by: Michael Coppola <info@michaeljcoppola.com>
mattgauf added a commit to mattgauf/llama.cpp that referenced this pull request Oct 27, 2023
* master: (350 commits)
  speculative : ensure draft and target model vocab matches (ggerganov#3812)
  llama : correctly report GGUFv3 format (ggerganov#3818)
  simple : fix batch handling (ggerganov#3803)
  cuda : improve text-generation and batched decoding performance (ggerganov#3776)
  server : do not release slot on image input (ggerganov#3798)
  batched-bench : print params at start
  log : disable pid in log filenames
  server : add parameter -tb N, --threads-batch N (ggerganov#3584) (ggerganov#3768)
  server : do not block system prompt update (ggerganov#3767)
  sync : ggml (conv ops + cuda MSVC fixes) (ggerganov#3765)
  cmake : add missed dependencies (ggerganov#3763)
  cuda : add batched cuBLAS GEMM for faster attention (ggerganov#3749)
  Add more tokenizer tests (ggerganov#3742)
  metal : handle ggml_scale for n%4 != 0 (close ggerganov#3754)
  Revert "make : add optional CUDA_NATIVE_ARCH (ggerganov#2482)"
  issues : separate bug and enhancement template + no default title (ggerganov#3748)
  Update special token handling in conversion scripts for gpt2 derived tokenizers (ggerganov#3746)
  llama : remove token functions with `context` args in favor of `model` (ggerganov#3720)
  Fix baichuan convert script not detecing model (ggerganov#3739)
  make : add optional CUDA_NATIVE_ARCH (ggerganov#2482)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

server.cpp is not accepting parameter -tb N, --threads-batch N
3 participants