-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement contract deduplication optimization #1631
Conversation
/// Push a pending contract to a vector of contracts if the contract to add isn't already | ||
/// present in the vector, according to the notion of contract equality defined in | ||
/// [crate::typecheck::eq]. | ||
pub fn push_dedup( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is dwarfed by the cost of evaluating the contracts, but will we ever need to worry about the fact that this will be quadratic if we dedup a lot of (distinct) contracts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've discussed before that it might be sufficient to just check the last contract applied, since most of the time duplicates come together. Is that indeed how it panned out in the real-world case we tested?
If it is a duplicate, we're likely to see that immediately since we do check the last applied contract first, but if it's not a duplicate we have to iterate through all of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Come to think of it, isn't it problematic to dedup in this way?
foo | PosNumber | Number | PosNumber
is different than
foo | PosNumber | Number
e.g. the latter can't get passed to a function that takes a PosNumber
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is discussed in the PR description:
Add push_dedup method on RuntimeContract, which conditionally pushes a contract to an existing list of pending contracts, given that it's not already in the list. Theoretically, this makes the complexity of merging two pending contract lists size(left)*size(right). In practice, we expect pending contracts list to be very small (< 10), and in fact the deduplication optimization typically participate in keeping the size small.
At least, that's the assumption. It could be interesting to add a bit of infrastructure to enable different level of optimizations, such that we can continue to check that the assumption is verified (at least it is on the codebase I'm experimenting on).
@Radvendii : deduplication currently only happens at runtime, for things that are merged together. That is, it doesn't impact typechecking at all (there's no dedup at this pointà), and if you write explicitly a duplicated contract annotation, such as {foo | Number | Number}
, it will not be deduplicated (after all, that's your intent). Deduplication would happen for example when doing {foo | Number} & {foo | Number} & {foo = 1}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've discussed before that it might be sufficient to just check the last contract applied, since most of the time duplicates come together. Is that indeed how it panned out in the real-world case we tested?
I think in the real world case we have, most contracts are in fact singleton, so it probably doesn't change anything. In practice I'm not sure about this assumption of order though, because merge order is left unspecified (who knows, we might decide to reorder merges at some point for other optimizations - it's assumed to be commutative and idempotent), so I'm a bit wary of introducing any effect that depends on the order, unless we have practical clues that doing what we're doing now is actually an issue in practice (in that we exhibit quadratic behavior AND it has a noticeable impact: in practice, I'm yet to see a value with more than 3 distinct contracts applied, linear in 3 elements is not really an issue I suspect)
/// Push a pending contract to a vector of contracts if the contract to add isn't already | ||
/// present in the vector, according to the notion of contract equality defined in | ||
/// [crate::typecheck::eq]. | ||
pub fn push_dedup( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've discussed before that it might be sufficient to just check the last contract applied, since most of the time duplicates come together. Is that indeed how it panned out in the real-world case we tested?
If it is a duplicate, we're likely to see that immediately since we do check the last applied contract first, but if it's not a duplicate we have to iterate through all of them.
/// **Warning**: the contract equality check used in this function behaves like syntactic | ||
/// equality, and doesn't take the environment into account. It's unsound for execution (it | ||
/// could equate contracts that are actually totally distinct), but we use it only to trim | ||
/// accumulated contracts before pretty-printing. Do not use prior to any form of evaluation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be sound because all the contracts are being applied in the same environment? There's no way for foo | Foo | Foo
to sneak in a shadow of for Foo
in the second contract application without being different syntactically.
If it's not sound, it seems problematic for pretty-printing as well. We've run into issues when the pretty-printer produces output that doesn't get parsed back the same way, right?
/// Push a pending contract to a vector of contracts if the contract to add isn't already | ||
/// present in the vector, according to the notion of contract equality defined in | ||
/// [crate::typecheck::eq]. | ||
pub fn push_dedup( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Come to think of it, isn't it problematic to dedup in this way?
foo | PosNumber | Number | PosNumber
is different than
foo | PosNumber | Number
e.g. the latter can't get passed to a function that takes a PosNumber
#[derive(Clone, Copy)] | ||
pub struct EvalEnvsRef<'a> { | ||
pub eval_env: &'a eval::Environment, | ||
pub initial_env: &'a eval::Environment, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the issue here is that we can't construct a &EvalEnvsOwned
from two &Environment
s. Do I have that right? Is this a common pattern in rust?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly: what I want is not a reference to a tuple but a tuple of references instead. I don't know if it's pattern to define a companion reference type and call it Ref
like that, but at least it seemed natural.
The previous attempt wasn't fruitful, because we don't pass the initial environment to the contract equality checker. This commit thread the initial environment through the virtual machine and various functions to make it available to the contract deduplicator
Use a special contract deduplication to avoid pretty printing many many identical type annotations when evaluating. The actual contract applications were already elided as far as evaluation is concerned, but contracts annotations were still accumulated without deduplication.
Co-authored-by: Taeer Bar-Yam <Radvendii@users.noreply.github.com>
Co-authored-by: jneem <joeneeman@gmail.com>
Co-authored-by: Viktor Kleen <viktor.kleen@tweag.io>
Fixes #1646. Partially fixes #1622. Related #1484.
Implement the contract deduplication optimization, which avoid applying many times the same contract to a field value. This optimization requires that contracts are idempotent, or it could change the meaning of programs.
Content
push_dedup
method onRuntimeContract
, which conditionally pushes a contract to an existing list of pending contracts, given that it's not already in the list. Theoretically, this makes the complexity of merging two pending contract listssize(left)*size(right)
. In practice, we expect pending contracts list to be very small (< 10), and in fact the deduplication optimization typically participate in keeping the size small.combine_dedup
method to combine contract annotation and deduplicate them based on an unsound, "syntax equality" like check. This is to avoid accumulating and printing many copies of the same initial contract when showing the result of evaluation.TermEnvironment
trait.query
subcommand as well.Results
On the codebase mentioned in #1622, this got the runtime down by roughly 50%, equivalently across
nickel export
or the normal evaluation.Future work