More improvements to the DFA's inner loop. #205
Absolutely insane improvements, considering the change mostly amounts to removing a single pointer dereference:
A PR for this is coming soon, but I have RE2 hooked up to the benchmark harness. Not too shabby if I say so myself:
And this is cool. Even though we still run more instructions than
We of course aren't always this much better than
Running
There were two important changes:

1. `self.at` is used sparingly in favor of a local `let at` binding. This seems to convince the compiler to use a register.
2. Switch the transition table from a `Vec<Box<[StatePtr]>>` to a row-major `Vec<StatePtr>`.

(2) is the juicier of the two. It makes more efficient use of the cache. In particular, a critical aspect is that a `StatePtr` points to the start of a row in the table, which enables indexing in the inner loop with a single ADD instruction (i.e., `si + byte` instead of `si * #classes + byte`).
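To make both changes concrete, here is a minimal sketch in Rust, assuming a hypothetical three-class DFA that recognizes any input containing `ab`. The names `Transitions`, `add_state`, `next`, `byte_classes`, `build_ab_dfa`, and `is_match` are illustrative only and are not the regex crate's API or its actual implementation. The sketch shows a flat row-major table in which a `StatePtr` is the offset of its row's first entry (so the lookup is `si + class`), plus an inner loop that keeps the input position in a local `at`.

```rust
/// Sketch only: a hypothetical, simplified DFA, not the regex crate's code.
/// A `StatePtr` here is the index of the first entry of a state's row,
/// i.e. the state number already multiplied by the number of byte classes.
type StatePtr = usize;

/// Row-major transition table stored in one flat `Vec<StatePtr>`,
/// instead of a `Vec<Box<[StatePtr]>>` where each row is a separate heap
/// allocation and every lookup follows an extra pointer.
struct Transitions {
    table: Vec<StatePtr>,
    num_classes: usize,
}

impl Transitions {
    fn new(num_classes: usize) -> Self {
        Transitions { table: Vec::new(), num_classes }
    }

    /// Appends a row whose transitions all point at `default` and returns
    /// the new state's pointer: the offset where its row starts.
    fn add_state(&mut self, default: StatePtr) -> StatePtr {
        let si = self.table.len();
        self.table.resize(si + self.num_classes, default);
        si
    }

    fn set(&mut self, si: StatePtr, class: usize, next: StatePtr) {
        self.table[si + class] = next;
    }

    /// Next state: a single addition, no `si * num_classes + class`.
    #[inline]
    fn next(&self, si: StatePtr, class: usize) -> StatePtr {
        self.table[si + class]
    }
}

/// Hypothetical byte classes for the pattern `ab`: 0 = other, 1 = `a`, 2 = `b`.
fn byte_classes() -> [u8; 256] {
    let mut classes = [0u8; 256];
    classes[b'a' as usize] = 1;
    classes[b'b' as usize] = 2;
    classes
}

/// Builds a DFA accepting any input that contains `ab`.
/// Returns (transitions, start state, match state).
fn build_ab_dfa() -> (Transitions, StatePtr, StatePtr) {
    let mut t = Transitions::new(3);
    let start = t.add_state(0); // row at offset 0
    let saw_a = t.add_state(start); // row at offset 3; defaults back to start
    let matched = t.add_state(start); // row at offset 6
    t.set(start, 1, saw_a);
    t.set(saw_a, 1, saw_a);
    t.set(saw_a, 2, matched);
    for class in 0..3 {
        t.set(matched, class, matched); // the match state is sticky
    }
    (t, start, matched)
}

/// Inner loop: the input position lives in a local `at` rather than being
/// read and written through a struct field on every iteration.
fn is_match(t: &Transitions, classes: &[u8; 256], start: StatePtr,
            matched: StatePtr, text: &[u8]) -> bool {
    let mut si = start;
    let mut at = 0;
    while at < text.len() {
        let class = classes[text[at] as usize] as usize;
        si = t.next(si, class);
        if si == matched {
            return true;
        }
        at += 1;
    }
    false
}
```

With the values returned by `build_ab_dfa()`, calling `is_match(&t, &byte_classes(), start, matched, b"xxab")` returns `true`, while `b"ba"` returns `false`. Because state pointers are premultiplied, the multiplication by the number of classes happens once, when a state is added, rather than once per input byte.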
Looks like the failure is spurious.