-
Notifications
You must be signed in to change notification settings - Fork 197
No support for lookbehind #571
Comments
I agree, lookaround is crucial! Using JS's RegExp is also weird considering Oniguruma is already used within Atom to parse grammar files as mentioned in this exchange. |
I as already mentioned on Discuss, using oniguruma for grammar made sense as it allows to reuse TextMate and SublimeText grammars out of the box, but AFAIK that's the only context where oniguruma is used. |
I don't know much about the subject, apart from lookbehinds not being part of JS regex. However, as a PHP dev who regular-ily (pun intended) regexes via PHP regex engine with lookbehinds, it sucks not being able to lookbehind in Atom's find and replace. Curious about this Oniguruma? |
This comment has been minimized.
This comment has been minimized.
@alexchandel I'm unfamiliar with Oniguruma. I'm all for more features, but is the syntax standard and testable on a site like regex101.com? |
@jesseleite It's what Atom uses for languages, and I've never had a problem testing any of those regexes on regex101.com. |
This comment has been minimized.
This comment has been minimized.
@bb-jakespirek It would if #698 happened |
Since #698 is closed as a duplicate, is this issue now for using Oniguruma for regexes? |
relevant: #570 Oniguruma would fix this too. |
@jesseleite https://regex101.com/r/dO0yG2/1 It depends on the regex engine, but yes it's almost de facto standard |
Would this also allow better handling of indentation errors (e.g. atom/language-python#22)? Which really shouldn't need special packages to solve (e.g. https://atom.io/packages/python-indent)... |
@lzkelley No, as find-and-replace has nothing to do with language grammars. Those are already using Oniguruma. |
Wait.. Atom already uses Oniguruma, but not for Find? Surely it should be a simple fix then? ;) |
Latest V8 releases have lookbehind available. It's behind the experimental JavaScript flag. |
@alexchandel your comment was deleted as a violation of the Atom Code of Conduct as it is insulting or derogatory. You may consider this an official warning. |
For those interested, @alexchandel was saying that while V8 has added experimental lookbehind support, it is still inferior to many alternative regex libraries in his opinion:
He then went on to suggest using an alternative like Oniguruma (and rather rudely implied that the atom team should be prioritising this issue higher — hence the deleted comment). |
I was trying to find the documentation of the Atom regex syntax (given that they are all different), and was pleased to find that it used Oniguruma (with the lookbehind I wanted), and then was gutted to discover this bug. It does seem rather extraordinary that Atom uses Oniguruma, but not for actual search-and-replace! |
ripgrep is great for a "find in files" solution, but it achieves that speed (at least in part) by not supporting certain functions (the other part is achieved by not looking in .gitignore'd files - which is also only relevant to "find in files". In particular, I think it doesn't support look-around (which is what I was after). I don't think raw speed is nearly so important for a complex search-and-replace (which is what I wanted this feature for). |
As a regex nerd looking for the ultimate editor, I was drawn to this conversation. Question- does Atom now support lookbehinds? Thanks in advance! |
@ai-danno : No it doesn't. The lack of lookbehind is what drew me to this thread in the first place. My comment was because switching to ripgrep would give us speed, but not lookaround (and I think that would be a poor tradeoff). |
@MartinBonner 👍 Thanks for the fast reply. It seems like it might bubble to the surface here in not too long, so I'll keep checking back. |
@MattSturgeon The obvious reason not to use Oniguruma for find and replace is that it involves work to stop using the Javascript regex and use Oniguruma instead. In particular, if the Javascript regex is going to acquire the requested features anyway, it may be better to just wait for that to happen. I'm not convinced that is a good enough reason, but I want to remind people it exists. |
@MattSturgeon Here's what I see happening if we just accept Oniguruma:
We're not going to play Whack-a-Mole. I want to see a list of features and performance targets. I want to get broad-based community consensus that the list of features and performance targets is acceptable. I want us to find a regex engine that meets those features and targets. Then when someone comes along saying that isn't good enough I can point to all the work we did and the rationales everyone had for the choices made. "We couldn't think of a reason not to at the time," isn't a convincing rationale. |
What is wrong with ripgrep then? grep is a pretty accepted standard usually? |
I have been monitoring this issue for a while and I'm sad of the state of the progress on this issue. I'm trying to make an honest feedback and I am trying to be constructive. I am myself a developer of an opensource framework, I know there is features wanted by some users that are not in the priority list for various reasons. While priorities are important let's not ignore features that are useful for the users. As a library developer several time I never imagined some usage that users had, and limitations in my library hit them, so I am enclined to listen about usage and help them if it is possible, sometime it can take a long time but we do it. Regarding the information this issue has, it is very limited it only have the enhancement tag, whatever the Atom team chose it should at least indicate the state of this issue be it delayed, stopped, re-scoped, low-priority, never, to be redefined... Feedback : I am a strong Regex users, and I am usually disappointed by tools that don't support powerful enough edition. Especialy when there is complicated pattern to find (wiht big data it matter). Regarding regex I usually expects the features found in perlRE. Now I usually code with JetBrains tools that are very very good as IDEs. This lack of search support drives me and my colleague to use JetBrains IDEs and as such away from Atom. So we have a solution that works for us, but it is disrupting, and instead of starting with Atom we avoid it. Some ideas :
|
Can you share real life example where javascript engine fail your need ? IMO big data is a separate problem. With large enough data, it become more practical to write your own small tool (or use specialized big data tools) than try to fit within the limitation of a general purpose text editor.
In general grep like tool lack the ability to do multiline matches. This is because speed-usability compromise is different. See for example BurntSushi/ripgrep#176 or atom/scandal#5. Ag in particular is free of this issue, but seems to be less compatible with window (?). |
Previously the Boyer-Moore-Horspool optimization gave up in the presence of a submatch. A submatch is where we record the current position so that we can go back to it, which is an essential part of the semantics of lookarounds (lookaheads and lookbehinds). This has been the case since Boyer-Moore-Horspool was implemented, but it was overly cautious. * For positive lookahead it is OK to use the patterns inside the lookahead to guide the BMS optimization. * For positive lookbehind we harmlessly fail to optimize when the patterns inside the lookbehind go backwards because TextNode::EatsAtLeast returns 0. * For negative lookarounds, the NegativeLookaroundChoiceNode::FillInBMInfo method (in jsregexp.h) knows to only look at the following pattern. This is in response to disappointing lookbehind performance in Atom. See atom/find-and-replace#571 R=yangguo@chromium.org BUG= Review-Url: https://codereview.chromium.org/2777583003 Cr-Commit-Position: refs/heads/master@{#44139}
My performance fix for lookbehinds (and lookaheads) in V8 has landed. Of course it will take some time to percolate through to Atom. |
@jeancroy Sorry for delayed answer.
When dealing with a lot of heterogenous data, it is useful to inspect data. There are powerful tool on the comand line. But when you have to discover the data, and something is wrong or you want to just look at the structure of the data, I use quite often regular expressions, and more than rarely I use (positive / negative) look-ahead or look-behind expressions to identify specific parts of these files. Ah yes your are correct it seems complicated to have ag on windows. There's yet another alternative to ag, it's written in go and called the platinium searcher (pt) : https://github.com/monochromegane/the_platinum_searcher |
I tried the More on topic, does anyone know why lookbehind didn't land in ES6? That seems like the root problem. Tried googling but didn't find much discussion. |
If it spins forever you may have written a regexp that takes forever. This is quite easy to do. Or you may have hit a bug. Can you share the regexp in question? There are people in this very thread who have tried the flag and it worked for them. As for why it was not part of ES6, this is because there was not agreement on what form it should take. Perl-style fixed width lookbehinds had been proposed and I was not very interested in that happening. In the end I helped push the current full lookbehind proposal through so that we would not end up stuck with the fixed width version which has caused much gnashing of teeth in perl. To get something on the TC39 standard you need a champion who wants to write the spec and argue for it in committee, and these days you also need to actually implement it. Sometimes this takes time. How do people feel about inline flags in regexps? eg |
I'm not sure I'd have much use for it. But if a single syntax is adopted I like the modifier on non-capturing group syntax. |
Is there any update on this? I found, that the following regular expression also doesn't work in Find in Projects: It says "Invalid regular expression |
In the interest of producing a concrete requirement, I (like many) require lookbehind, and in particular I want them to work for searches of the whole project. I would also point out that this was the requirement expressed in the original title of this issue: "positive and negative lookbehind (lookaround)." It only became a vague "better" due to a later edit. Using Atom 1.25.0, with Over two years have passed since the announcement that lookbehind support was coming to V8, and we still don't have a way to reliably use lookbehind to search a whole project. Therefore, I do not think waiting for V8 to solve the problem is an adequate solution. |
Sorry for the subjective title in the first place! Still waiting on lookaround in Atom... |
Is this really the regexp you used to test? If searching 13000 lines takes more than an hour then there's no way that's a V8 regexp performance issue. You probably need to file a new bug. Any way to work out which version of V8 this is? I just tested this on V8 version 6.2 and the two regexps /(?<!})foobar\b/ and /foobar\b/ are exactly the same speed, about 14 million lines per second. There is no reason for Atom to delay switching this on, that I can see. The proposal is finished: https://github.com/tc39/proposals/blob/master/finished-proposals.md |
Atom has had lookahead since forever, so the new title is not very accurate. |
I actually used a very similar regex with another word in place of foobar. However, testing As for the version of V8, I see that the Atom 1.25.0 release notes report that Electron was upgraded to 1.7.11, and the Electron 1.7.0 release notes report that V8 was upgraded to 5.8.283.38. |
V8 version 5.8 is now over a year old, from March 20 2017: https://v8project.blogspot.dk/2017/03/v8-release-58.html Unfortunately the performance fix was done on March 28 2017, so it has not got through to Atom yet. Nevertheless there must be a different issue here, since the improvement was only by a factor of 2-3, and you are seeing a factor of at least 1 million slowdown. |
In the meantime I still get invalid regular expression on lookbehind syntax... Were there any plans to fix this issue? |
Right, is there any traction on this in the release version? I skimmed the thread, but I'd prefer to have negative lookbehind with poor performance than not at all. |
Implementing JavaScript's native RegExp engine for a find/replace tool in a text editor seems like a dramatic oversight.
As find-and-replace leans on JS's RegExp, of course the following find expressions is invalid
Errors
This module is unusable without such features
The text was updated successfully, but these errors were encountered: