Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid using lookaheads to support PCRE in Caddy/Go #36

Open
1 task done
oprypkhantc opened this issue Dec 17, 2024 · 9 comments · May be fixed by #37
Open
1 task done

Avoid using lookaheads to support PCRE in Caddy/Go #36

oprypkhantc opened this issue Dec 17, 2024 · 9 comments · May be fixed by #37
Labels
enhancement New feature or request

Comments

@oprypkhantc
Copy link

Would you like to work on this feature?

  • Check this if you would like to implement a PR, we are more than happy to help you go through the process.

What problem are you trying to solve?

Hey.

Caddy (a web server, quite popular) is written with Go, which does not support lookaheads (and some other features) due to performance reasons. Unfortunately, there's no easy way to work this around.

The reason I care about this is because we're using Caddy's header_regexp() feature to match the User-Agent and redirect users to an "outdated browser" page, and this package uses a negative lookahead in two of it's regexp's:

https://github.com/TrigenSoftware/ua-regexes-lite/blob/main/index.js#L72
https://github.com/TrigenSoftware/ua-regexes-lite/blob/main/index.js#L186

I would like to remake those to not use lookaheads, but I don't get the point of them and tests for them do not use real user agents, so they are completely useless:

https://github.com/TrigenSoftware/ua-regexes-lite/blob/main/test/useragents.js#L95

Notice the Edg not Edge at the end of the user agent string; if I actually use the correct user agent with Edge, tests start to fail. So it seems like this regex has never worked properly anyway?

Describe the solution you'd like

Not use lookaheads, as that doesn't seem to be necessary.

Describe alternatives you've considered

Currently I'm doing this: sed -e 's#\.\*Safari\\/(?!\[\\d\.\]+ Edge\\/\[\\d\.\]+$)##g', which replaces the negative lookahead part. But of course I'd love to avoid these kind of hacks that can break at any moment.

Documentation, Adoption, Migration Strategy

No response

@oprypkhantc oprypkhantc added the enhancement New feature or request label Dec 17, 2024
@dangreen
Copy link
Member

@oprypkhantc Hi. Edge on Chromium has "Edg" in user agent

Снимок экрана 2024-12-17 в 20 14 40

@dangreen
Copy link
Member

@oprypkhantc Explore user agent strings examples you can here: https://explore.whatismybrowser.com/useragents/explore/software_name/edge/

@dangreen
Copy link
Member

dangreen commented Dec 17, 2024

@oprypkhantc

https://github.com/TrigenSoftware/ua-regexes-lite/blob/main/index.js#L72
https://github.com/TrigenSoftware/ua-regexes-lite/blob/main/index.js#L186

Early versions of Edge browser were based on own EdgeHTML engine, but user agent of these versions include "Chrome" substring. It was made to "support" websites what requires Chrome browser (or something like that), but in fact these engines are not compatible. So we need to differentiate user agents like this.

Same thing for Mobile IE (i guess from windows phone) and Android

I guess you are receiving this regexp from browserslist-useragent-regexp. If your browserslist query will not request Chrome <= 70 you will not receive regexp with lookahead

@oprypkhantc
Copy link
Author

Somehow I looked at modern user agents of Edge and saw "Edge" at the end of it, when obviously there was none. Sorry 🤦🏻

And yes, you're correct, I'm getting the regex from browserslist-useragent-regexp. Unfortunately I can't drop the chrome < 70 and never will, because it comes from a separate .browserslistrc.outdated we use to show "outdated browser" page to users, but only for browsers that are explicitly unsupported. E.g.:

chrome < 85
chromeandroid < 85
edge < 85
opera < 85
operamobile < 85
firefox < 76
firefoxandroid < 76
safari < 14.1
ios < 14.1
dead

What's also important is that edge < 85 is also there, so I guess I don't care about the EdgeHTML edge case? More insights here: browserslist/browserslist-useragent-regexp#1558

Maybe you have an insights on how to properly solve this?

@dangreen
Copy link
Member

dangreen commented Dec 17, 2024

@oprypkhantc We can try something like that:

/Chrom(ium|e)\/(\d+)\.(\d+)(\.(\d+)|)([\d.]+$|.*Safari\/[\d.]+($| [^E][^d][^g][^e]))/

but it risky because maybe there is some Chromium based browser that have some of this chars on that places

You can research Chromium <= 70 based browsers and their user agents to fit this workaround.

For example Brave doesn't add own name in ua, so it purely Chrome - this regexp should be ok
Opera adds, but no any chars in "OPR" matches "Edge"
But for example browser name "doge" will be not matched by this regexp

@oprypkhantc
Copy link
Author

Sorry, I'm still a bit confused by it all. Am I correct that old EdgeHTML browsers should NOT match chrome constraint? E.g. chrome < 70 should not match Edge 18? Because currently, chrome < 85 query does match old Edge:

image

What's even more strange is that edge < 85 does not match Edge 18:

image

Is this the expected behaviour? To me it seems rather strange and the opposite of what I'd expect 🤔

@dangreen
Copy link
Member

Am I correct that old EdgeHTML browsers should NOT match chrome constraint?

Yes.

Regexp on first screen is matched because you don't have end of line, you have new line char.

On second screen regex is not matched because of minor version mismatch. By default exact version from browserslist are embedded to regex. You can try --ignoreMinor option.

@oprypkhantc
Copy link
Author

Thanks. With multiline flag and --ignoreMinor it works as expected.

Chrom(ium|e)\/(([4-9]|[1-7]\d|8[01])\.0|8[34]\.0)(\.\d+|)([\d.]+$|.*(([^e]|[^g]e|[^d]ge|[^E]dge)\/[\d.]+$))

This is extremely ugly, but should work the same way the negative lookahead works. It matches regular Chrome, Opera, new Edg, but not Edge. Would you accept a PR changing this (and likely other regexes too) to not use lookaheads/lookbehinds, and a test to make sure they're not added in the future?

I understand that Go's limitations should not affect this project, but Caddy is quite nice other than this, and is even shipped by default in some cases (https://frankenphp.dev/), so it'd be nice to have this.

@dangreen
Copy link
Member

@oprypkhantc If it will work correctly - sure, i will accept

@oprypkhantc oprypkhantc linked a pull request Dec 18, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants