Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Performance] Varnish ban regex has very poor performance #495

Closed
bastnic opened this issue Jun 2, 2021 · 12 comments · Fixed by #498
Closed

[Performance] Varnish ban regex has very poor performance #495

bastnic opened this issue Jun 2, 2021 · 12 comments · Fixed by #498

Comments

@bastnic
Copy link

bastnic commented Jun 2, 2021

The regexp used in Varnish ban has very poor performance.

This is a backport of a fix already merged in API Platform, where it's a little easier to fix because we know in advance the pattern of the tags to ban. More specifically, on APIP, we know that a tag always began with a / and as they are iri, they are unique and not nested. We can then strip the first group (^|,).

This is not the case here where tags could be anything.
This is something I wanted to talk since a long time but didn't took the time to share because I don't have the correct generic solution. Maybe you will have one.


The current ban regexp is not optimize and killed my varnish server to 100% cpu.

Several reasons, but let's explain how that work. This is a question of

rq/s * ban/s * size of the cache tags header * complexity of the regexp

I've a high number of requests and bans per seconds. The response has a lot of tags.

But I only ban one resource at a time, so the regexp was not that complex at first look.

Varnish stores all ban in a (deduplicated) list. Each cached resource has a pointer to the last ban checked.
If you add bans, when you call a cached resources, it will check against all the new bans and store the last ones.

The number of regexp analysis was high, but I can change my traffic, so I can only adjust the header itself or the regexp.

I looked at the regexp first

((^|,)/pim/v1/variants($|,))|((^|,)/pim/v1/variants/e3615f0c-a957-4a85-88b9-8db30d5c5c02($|,))|((^|,)/pim/v1/models/bebdee7b-fb53-448b-b79e-1bcf0dbcf050($|,))

Step 1: checking the begining of the resources is not useful if tags are only handled by apip.

(/pim/v1/variants($|,))|(/pim/v1/variants/e3615f0c-a957-4a85-88b9-8db30d5c5c02($|,))|(/pim/v1/models/bebdee7b-fb53-448b-b79e-1bcf0dbcf050($|,))

Stepe 2: regrouping the end tag at the end is better.

(/pim/v1/variants|/pim/v1/variants/e3615f0c-a957-4a85-88b9-8db30d5c5c02|/pim/v1/models/bebdee7b-fb53-448b-b79e-1bcf0dbcf050)($|,)

Step 3: remove the prefix from the regexp

(/variants|/variants/e3615f0c-a957-4a85-88b9-8db30d5c5c02|/models/bebdee7b-fb53-448b-b79e-1bcf0dbcf050)($|,)

Step 4: remove the prefix from the source header

Bench:

subject mean diff
benchCurrent 1.612μs 7.91x
benchStep1 0.575μs 2.82x
benchStep2 0.504μs 2.47x
benchStep3 0.492μs 2.41x
benchStep4 0.204μs 1.00x

On my production, I published Step3 today at 12:15:

image

On another project, I will let you guess when the patch was released:
image

@dbu
Copy link
Contributor

dbu commented Jun 3, 2021

very interesting, thank you for the post and the explanations!

i am surprised that removing the ^ has a positive effect - i would have assumed it makes it much easier for varnish when it only needs to check the start and not everywhere.

a while ago, we introduced support for varnish xkey. when you do cache tagging, i recommend to look at https://foshttpcache.readthedocs.io/en/latest/varnish-configuration.html#tag-invalidation-using-xkey as this lets varnish semantically understand that we are tagging, and gets rid of all regex processing - i would expect that to be a lot more efficient.

still, i'd be happy for a pull request that improves the regex. as you say, in the general case we do not know what tags could look like. but at the very least when building the regex string we could put (^|,) at the start and ($|,) at the end, instead of wrap each tag with it. if i understand correctly, that should speed up things some. and we could make https://github.com/FriendsOfSymfony/FOSHttpCache/blob/master/src/ProxyClient/Varnish.php#L191-L195 an extension point where users like you can introduce custom logic when the tags are built with a known schema and allow optimizations like yours. (how are you currently doing it, did you replace/extend the varnish proxy client?)

@bastnic
Copy link
Author

bastnic commented Jun 3, 2021

Ykey is the newer and current version and should be used as Xkey is deprecated. Both VMODS are available in the varnish-plus package.

Xkey is a very bad implementation and i'm not that sure that's more efficient than an efficient regexp. And it's deprecated, so careful about pushing for it. At least regexp are generic and allows correct ban.

The replacement Ykey seems to not be available for free.

@bastnic
Copy link
Author

bastnic commented Jun 3, 2021

how are you currently doing it, did you replace/extend the varnish proxy client?

I have the problem on APIP projects that do not use FosHttpCache but a copied code, easier to replace and fix. Here it's a backport to let you know of the issue.

@dbu
Copy link
Contributor

dbu commented Jun 3, 2021

afaik xkey is still available for the OS version https://github.com/varnish/varnish-modules - i was not aware that it is deprecated, the varnish-modules readme does not mention that. but indeed https://docs.varnish-software.com/varnish-cache-plus/vmods/ykey/ lists several issues with xkey. i guess that means i want to move to ykey in our project - though it seems we have not been bitten by the xkey shortcomings.

I have the problem on APIP projects that do not use FosHttpCache but a copied code, easier to replace and fix.

could we improve something in FOSHttpCache to allow you to use it directly, or is it a policy that you fork dependencies? (for the vcl, the files are made so that they can be included from the repository directly, but i would expect people to need to touch them up and use their own copy. often varnish won't have file system access to where the php code is anyways)

@bastnic
Copy link
Author

bastnic commented Jun 3, 2021

could we improve something in FOSHttpCache to allow you to use it directly, or is it a policy that you fork dependencies?

I don't know, @dunglas?

But the point of this issue is of course to provide the best performance out of FOSHttpCache by default.

@dunglas
Copy link
Contributor

dunglas commented Jun 3, 2021

AFAIR I didn't copied the code of FOSHttpCache initially. But it may have changed since then! I'll check.

@dbu
Copy link
Contributor

dbu commented Jun 3, 2021

AFAIR I didn't copied the code of FOSHttpCache initially. But it may have changed since then! I'll check.

thanks. just to be clear: it is totally within the license rules that parts of that code can be copied into APIP code, i have no issue with that. but imho FOSHttpCache does a decent job of documenting and testing the code so it could be benefitial for all to reuse the code, if APIP uses a significant amount of the features that the library, or also the FOSHttpCacheBundle offer.

if there are a few extension points missing or such, i would be happy to help enable using it directly.

but if you only use a small part of the library funcationality, then copying a few classes is probably better for maintainability than pulling in all the other functionality.

@dunglas
Copy link
Contributor

dunglas commented Jun 3, 2021

I just checked, and we indeed now use a method copied from FosHttpCache (it wasn't the case in the beginning): https://github.com/api-platform/core/blame/main/src/HttpCache/VarnishPurger.php#L47

I was planning to remove the dependency to Guzzle in API Platform to switch to Symfony HttpClient, as we try to keep the number of vendors we depend on as low as possible, and to add support for xkey and maybe Cloudflare directly in API Platform. So maybe should we switch to FosHttpCache directly instead of reinventing the wheel. To be honest, the only thing I don't like with FosHttpCache is its dependency on php-http. In my opinion it introduces unneeded complexity and is hard to install.

@stof
Copy link
Member

stof commented Jun 16, 2021

Well, a new version of the bundle might use Symfony HttpClient. The reason why php-http is used is because the bundle predates symfony/http-client

@dbu
Copy link
Contributor

dbu commented Jun 18, 2021

@bastnic i looked at

$tagExpression = sprintf('(^|,)(%s)(,|$)', implode('|', $tagchunk));
again. i think either APIP copied a very old version from FOSHttpCache or changed the regex to be worse. unless the heat is making me confused, the FOSHttpCache code should put the start/end code only once at the very beginning/end of the regex and not repeat it for each tag. or am i missing something?

i don't think we can do the other optimization steps in the generic case.

whats left is the xkey vs ykey discussion. i am updating the documentation in #498. ok if we close this issue?

@bastnic
Copy link
Author

bastnic commented Jun 18, 2021

I think you're right, I got confused.

@bastnic bastnic closed this as completed Jun 18, 2021
@dbu
Copy link
Contributor

dbu commented Jun 18, 2021

no worries. thanks for the report and the discussions - made me realize we should mention ykey (and also that i should talk to our customer about switching to ykey)

rbayet added a commit to Elastic-Suite/gally that referenced this issue Feb 13, 2022
Includes now defunct API Platform Varnish configuration.
From https://github.com/api-platform/api-platform/blob/v2.5.7/api/docker/varnish/conf/default.vcl

Please note that is a BAN based purge (which is kinda meh ?)
Please see also
FriendsOfSymfony/FOSHttpCache#495
api-platform/core#1856
api-platform/api-platform#1947
rbayet added a commit to Elastic-Suite/gally that referenced this issue Feb 14, 2022
Includes now defunct API Platform Varnish configuration.
From https://github.com/api-platform/api-platform/blob/v2.5.7/api/docker/varnish/conf/default.vcl

Please note that is a BAN based purge (which is kinda meh ?)
Please see also
FriendsOfSymfony/FOSHttpCache#495
api-platform/core#1856
api-platform/api-platform#1947
PierreGauthier pushed a commit to Elastic-Suite/gally that referenced this issue Jan 12, 2023
Includes now defunct API Platform Varnish configuration.
From https://github.com/api-platform/api-platform/blob/v2.5.7/api/docker/varnish/conf/default.vcl

Please note that is a BAN based purge (which is kinda meh ?)
Please see also
FriendsOfSymfony/FOSHttpCache#495
api-platform/core#1856
api-platform/api-platform#1947
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants