Added support for . (any character) token in grammar engine. #6467

Merged Jun 6, 2024 (2 commits)

Collaborator

@HanClinto HanClinto commented Apr 4, 2024

Low priority feature. Consider this more of a suggestion than a request. :)

As came up in the discussion of #6441, I wanted a way to create a grammar that ensured a minimum response length. It seemed prudent to add support for a "." character that would match any generated character -- without it, I used the string [^\x00], which matches any non-null character (which is very nearly the same thing).
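A rough way to see why [^\x00] is "very nearly the same thing" as ".": the sketch below models both as single-character predicates in Python. The function names are hypothetical and this is not llama.cpp's grammar engine; it only assumes that "." accepts every character and that [^\x00] accepts every character except NUL.

```python
def matches_any(ch: str) -> bool:
    """Toy model of the proposed `.` element: accepts any single code point."""
    return len(ch) == 1


def matches_non_null(ch: str) -> bool:
    """Toy model of the `[^\\x00]` workaround: any single code point except NUL."""
    return len(ch) == 1 and ch != "\x00"


# A few sample characters, including a newline, an emoji, and NUL.
samples = ["a", "\n", "🍌", "\x00"]

# Collect the samples on which the two predicates disagree.
differences = [ch for ch in samples if matches_any(ch) != matches_non_null(ch)]
print(differences)
```

Under these assumptions the two predicates disagree only on the NUL character, which a model would rarely if ever be expected to generate as text.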

I don't know whether this character would be useful in many other situations, so feel free to leave this one out if you don't think it's worthy.

I didn't add this token to the grammar tests, because frankly I haven't really been able to wrap my head around them. I would still like to eventually get around to writing some end-to-end / integration tests for the grammar engine that are a bit easier to grok and extend, but unless otherwise requested, I'll leave that exercise for another PR.

Example usage:

./main -m ./models/llama-2-7b.Q4_0.gguf -e -r "\n" --grammar "root ::= [^\n].+" -p "My favorite flavor is "

Example output:

 My favorite flavor is 🍌. surely you know that I love to eat meat, I'm a real carnivore, I like to eat pork, chicken and beef. I can not imagine my life without meat, I also like to eat seafood.
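As a sketch of the minimum-length use case mentioned in the description, a grammar can require N leading "." elements followed by ".*". The helper name min_length_grammar is hypothetical, and the snippet assumes that the "." element from this PR composes with sequencing and the "*" operator the same way other grammar elements do. It only builds the grammar string and does not call llama.cpp.

```python
def min_length_grammar(min_chars: int) -> str:
    """Build a grammar string that requires at least `min_chars` characters."""
    if min_chars < 1:
        raise ValueError("min_chars must be at least 1")
    # One `.` per required character, then `.*` to allow any further output.
    required = " ".join(["."] * min_chars)
    return f"root ::= {required} .*"


grammar = min_length_grammar(5)
print(grammar)
```

The resulting string could be passed through --grammar in the same way as the example usage. Note that the minimum here is counted in characters, not model tokens.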

Contributor

github-actions bot commented Apr 4, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3: 489 iterations 🚀

  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=9604.19ms p(90)=26264.61ms fails=0, finish reason: stop=489 truncated=0
  • Prompt processing (pp): avg=245.14tk/s p(90)=742.6tk/s total=195.52tk/s
  • Token generation (tg): avg=101.83tk/s p(90)=292.3tk/s total=130.03tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=feature_grammar_char_any commit=9a3acbba9afa314a57acd546943fe91565a65d19
Time series (charts omitted; each plots the named metric over the 10m run, 489 iterations, llama.cpp bench-server-baseline on Standard_NC4as_T4_v3):

  • prompt_tokens_seconds
  • predicted_tokens_seconds
  • kv_cache_usage_ratio
  • requests_processing

@HanClinto HanClinto force-pushed the feature_grammar_char_any branch from da3dc77 to 9a3acbb Compare April 4, 2024 06:40
@HanClinto HanClinto changed the title from "Added support for . (any characer) token in grammar engine." to "Added support for . (any character) token in grammar engine." Apr 4, 2024
@ggerganov
Owner

I didn't add this token to the grammar tests, because frankly I haven't really been able to wrap my head around them. I would still like to eventually get around to writing some end-to-end / integration tests for the grammar engine that are a bit easier to grok and extend, but unless otherwise requested, I'll leave that exercise for another PR.

Matching any character seems like it could be a useful addition, but I agree it would be better to first focus on improving grammar tests (and potentially performance). We can revisit this addition at a somewhat later point.

@HanClinto
Collaborator Author

HanClinto commented Apr 4, 2024

but I agree it would be better to first focus on improving grammar tests (and potentially performance). We can revisit this addition at a bit later point

Sounds great! That's where I'll turn my attention next -- I've done some profiling, and have some ideas in the works for how to improve the grammar sampler. Next step will be to add profiling to the grammar engine (I need to investigate the current state of benchmarks that include grammars). Meanwhile, this PR can stay here and we can return to it whenever we feel like it.

Thank you!

@ggerganov
Owner

Awesome! The grammar functionality is a great feature and would be nice to get some extra attention. I sent you a collaborator invite, if you feel like helping out (no pressure if you don't have time / resources, this is mainly a token of appreciation at this point)

@HanClinto
Collaborator Author

Awesome! The grammar functionality is a great feature and would be nice to get some extra attention. I sent you a collaborator invite, if you feel like helping out (no pressure if you don't have time / resources, this is mainly a token of appreciation at this point)

Wow, I am honored -- thank you very much!! I will do my best to not abuse the privilege.

I agree about the grammar functionality being very powerful. I think that verifiable correctness is going to be one of the big ways that local LLMs can really gain some usefulness -- I started doing some experiments a couple of months ago with text-to-SQL generation, and I think that using grammars to ensure syntactically correct SQL queries (even tuned for a person's specific database schema) offers a lot of potential to expand the usefulness of LLMs. That's what started me down this whole road of digging into grammars on llama.cpp, and I'm excited to see what the future holds for it.

Thanks for everything!

@mofosyne mofosyne added the Review Complexity : Low and enhancement labels May 10, 2024
Owner

@ggerganov ggerganov left a comment

@HanClinto Feel free to merge this if it is ready

@HanClinto
Collaborator Author

@HanClinto Feel free to merge this if it is ready

Thank you! I'm much more familiar with the grammar engine now than I was when I first wrote this, so I'd like to try to look it all over again with fresh eyes.

Overall I'm feeling much better about making changes like this to the grammar engine now that we have integration test coverage.

@ochafik not sure what your availability is these days, but wouldn't mind your critique at some point as well.

@HanClinto
Collaborator Author

HanClinto commented May 10, 2024

TODO before merge:

  • Add "." symbol to integration tests
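To illustrate the kind of pass/fail table such a test might contain, here is a toy sketch in Python. It is not the project's test harness (the real tests are C++ in tests/test-grammar-integration.cpp). The grammar `root ::= "a" . "c"` and the test strings are hypothetical, and a Python regular expression with re.DOTALL stands in for the grammar engine, on the assumption that "." accepts any single character, including a newline.

```python
import re

# Hypothetical stand-in for the grammar `root ::= "a" . "c"`.
# re.DOTALL makes `.` accept newlines too, mirroring "any character".
PATTERN = re.compile(r"a.c", re.DOTALL)

# (input string, should the whole string be accepted?)
CASES = [
    ("abc", True),     # ordinary character in the middle
    ("a\nc", True),    # newline counts as "any character"
    ("a🍌c", True),    # a single non-ASCII code point
    ("ac", False),     # `.` must consume exactly one character
    ("abbc", False),   # two characters where one is expected
]


def run_cases():
    """Return the cases whose actual result differs from the expectation."""
    mismatches = []
    for text, expected in CASES:
        actual = PATTERN.fullmatch(text) is not None
        if actual != expected:
            mismatches.append((text, expected, actual))
    return mismatches


mismatches = run_cases()
print("all cases behave as expected" if not mismatches else mismatches)
```

Listing both accepted and rejected strings keeps the test honest: a "." that matched zero characters or several characters would be caught by the failing cases.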

Contributor

github-actions bot commented May 11, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 545 iterations 🚀

  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8604.06ms p(95)=21214.22ms fails=, finish reason: stop=488 truncated=57
  • Prompt processing (pp): avg=99.82tk/s p(95)=411.45tk/s
  • Token generation (tg): avg=32.22tk/s p(95)=46.44tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=feature_grammar_char_any commit=c1b89b83815248a986b9ec906a8d30dfc013b3e6

Time series (charts omitted; each plots the named metric over the 10m run, 545 iterations, llama.cpp bench-server-baseline on Standard_NC4as_T4_v3):

  • prompt_tokens_seconds
  • predicted_tokens_seconds
  • kv_cache_usage_ratio
  • requests_processing

@mofosyne mofosyne requested a review from ochafik May 11, 2024 01:57
@mofosyne mofosyne added the help wanted label May 22, 2024
@mofosyne
Collaborator

mofosyne commented May 22, 2024

Noting that there appears to be general agreement to merge this PR; it is just waiting on someone to add the "." symbol to the integration tests. (Monitoring via this filter.)

@HanClinto HanClinto force-pushed the feature_grammar_char_any branch from e56761d to 774e9f5 Compare June 5, 2024 22:43
@github-actions github-actions bot added the testing label Jun 5, 2024
@HanClinto
Collaborator Author

Rebased on master, integration tests added in 9e30513 -- ready for final review and merge!

Collaborator

@ochafik ochafik left a comment

Looks good!

tests/test-grammar-integration.cpp (review thread resolved)
Labels
  • enhancement - New feature or request
  • help wanted - Extra attention is needed
  • Review Complexity : Low - Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix
  • testing - Everything test related
4 participants