Use a persistent vector instead of an Rc<[..]> #2057

Merged
merged 18 commits into from
Oct 16, 2024

Conversation

jneem
Member

@jneem jneem commented Sep 27, 2024

This is an experiment regarding array performance. Our current array representation (as essentially a Rc<[RichTerm]>) is problematic because it makes many common operations unnecessarily quadratic. This PR replaces it with a rpds::Vector<RichTerm> in reverse order, and the initial benchmarks look promising.

In more detail, the current arrays have a few performance characteristics that we might like to keep:

  • constant-time random access
  • constant-time slicing
  • some amount of memory sharing

The main problem is that there's no constant-time "cons" operation, and the concatenation operator xs @ ys is O(xs.len() + ys.len()). This makes many functional-style list functions (like the stdlib implementations of reverse and filter) quadratic in the length of their input.
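
To make the quadratic cost concrete, here is a minimal sketch (not Nickel's actual code) that models an array as an Rc<[T]>. The helper names concat and reverse are hypothetical, and the copy counter exists only to illustrate the cost of building a result by repeated prepending.

```rust
use std::rc::Rc;

/// Concatenates two shared slices by copying every element of both,
/// i.e. O(xs.len() + ys.len()) work. (Hypothetical helper.)
fn concat<T: Clone>(xs: &Rc<[T]>, ys: &Rc<[T]>) -> Rc<[T]> {
    xs.iter().chain(ys.iter()).cloned().collect()
}

/// Functional-style reverse: prepend each element to an accumulator.
/// Also returns how many element copies were made along the way.
fn reverse<T: Clone>(xs: &Rc<[T]>) -> (Rc<[T]>, usize) {
    let mut acc: Rc<[T]> = Rc::from(Vec::<T>::new());
    let mut copies = 0;
    for x in xs.iter() {
        let single: Rc<[T]> = Rc::from(vec![x.clone()]);
        // Each "cons" copies the new element plus the whole accumulator.
        copies += single.len() + acc.len();
        acc = concat(&single, &acc);
    }
    (acc, copies)
}
```

For an input of length n, the counter ends at n(n+1)/2, which is where the quadratic behavior comes from.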

The Vector implementation in the rpds crate is a "persistent vector" aka "bitmapped vector trie", which offers persistence/sharing, fast random access, and fast appends. We can do the same slicing trick that we're currently using for Rc<[RichTerm]> to also add fast slicing. Thanks to fast appends, we can do concatenation xs @ ys in O(ys.len()) time (provided that there are no contracts that need to be applied to xs; I'm also ignoring logarithmic terms). This is backwards from the more common concatenation pattern in functional languages, so we store arrays backwards in order to get time O(xs.len()) instead. (We could achieve the minimum of the two by storing an array as two Vectors, a backwards one followed by a forwards one.)
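
The reverse-order trick can be sketched as follows. This is an illustration under loud assumptions: RevArray is a hypothetical type, and a plain Vec stands in for the persistent vector, so the clone below is linear here whereas a persistent vector would share structure instead.

```rust
/// Hypothetical array type that stores its elements in reverse order.
/// A plain `Vec` stands in for the persistent vector: it has fast appends
/// but, unlike the real structure, no sharing between clones.
#[derive(Clone, Debug)]
struct RevArray<T> {
    // Logical element `i` lives at `rev[rev.len() - 1 - i]`.
    rev: Vec<T>,
}

impl<T: Clone> RevArray<T> {
    fn from_slice(items: &[T]) -> Self {
        RevArray { rev: items.iter().rev().cloned().collect() }
    }

    /// Random access by logical index.
    fn get(&self, i: usize) -> Option<&T> {
        let idx = self.rev.len().checked_sub(i + 1)?;
        self.rev.get(idx)
    }

    /// `xs @ ys`: start from the storage of `ys` and append the reversed
    /// storage of `xs`, which takes `xs.len()` appends.
    fn concat(xs: &Self, ys: &Self) -> Self {
        let mut out = ys.clone(); // cheap for a persistent vector, O(n) for this stand-in
        out.rev.extend(xs.rev.iter().cloned());
        out
    }

    /// Elements in logical (front-to-back) order.
    fn to_vec(&self) -> Vec<T> {
        self.rev.iter().rev().cloned().collect()
    }
}
```

With this layout, prepending a single element to ys is one append onto the storage of ys, which is the "cons" pattern that functional-style list code relies on.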

There are a few ways in which rpds::Vector isn't a perfect fit:

  • it insists on wrapping the vector elements in Arc or Rc, which we don't want because RichTerm already has a shared pointer
  • it doesn't support many of the optimizations we had for arrays with a single reference
  • there's no fast iteration over slices (it's linear in the number of elements you skip at the beginning)

Despite these, this PR gives a 35% improvement in random normal, and a few other improvements between 10 and 20%. I'd like to also try benchmarking im and/or imbl.

Contributor

github-actions bot commented Sep 27, 2024

🐰 Bencher Report

Branch: 2057/merge
Testbed: ubuntu-latest

⚠️ WARNING: The following Measure does not have a Threshold. Without a Threshold, no Alerts will ever be generated!

Benchmark                         Latency, nanoseconds (ns)
fibonacci 10                      492,610.00
foldl arrays 50                   1,742,300.00
foldl arrays 500                  6,623,400.00
foldr strings 50                  7,128,100.00
foldr strings 500                 61,214,000.00
generate normal 250               45,508,000.00
generate normal 50                2,025,400.00
generate normal unchecked 1000    3,432,500.00
generate normal unchecked 200     759,960.00
pidigits 100                      3,170,700.00
pipe normal 20                    1,514,300.00
pipe normal 200                   9,980,000.00
product 30                        834,630.00
scalar 10                         1,545,100.00
sum 30                            826,770.00

@yannham
Member

yannham commented Sep 27, 2024

I'll see what it gives on the private benchmark

@jneem
Member Author

jneem commented Sep 30, 2024

I tried out imbl, but the performance is worse than rpds.

@jneem
Member Author

jneem commented Oct 10, 2024

The current version uses a custom re-implementation of persistent vectors, and it seems to be a performance win across the board.

@jneem
Member Author

jneem commented Oct 15, 2024

GitHub CI agrees with my local benchmarking: this gives modest gains in general, and big gains whenever quadratic array behavior is the bottleneck.

I don't see a nice UI for comparing results to master, but here is the report for this PR and here is the report for master.

@jneem jneem marked this pull request as ready for review October 15, 2024 03:00
@jneem jneem requested a review from yannham October 15, 2024 03:00
@jneem jneem changed the title experiment with rpds vectors Use a persistent vector instead of an Rc<[..]> Oct 15, 2024
@yannham
Member

yannham commented Oct 15, 2024

@jneem Have you tried on the private bench?

@jneem
Member Author

jneem commented Oct 15, 2024

Yes, I forgot to mention that. Performance is basically identical on all three sizes.

Member

@yannham yannham left a comment

Could be nice to add a description to the vector crate.

I'm not too intimate with bitmapped vector tries, so I can't say I'm 100% sure that the whole implementation is flawless, but the general approach looks sane, the testing is also solid, and it's been thoroughly benchmarked.

core/src/eval/operation.rs Show resolved Hide resolved
Comment on lines 178 to 186
let new_ts = ts
    .iter()
    .cloned()
    .map(|mut t| {
        t.collect_free_vars(free_vars);
        t
    })
    .collect();
*ts = new_ts;
Member

This is my pet peeve, but I feel like this should be an imperative for loop, as we're just walking a structure and applying a mutation. Or is it that you just don't want to bother implementing iter_mut()?

Member Author

Ok, I went ahead and did mutable iteration. It has a bit more copy-paste from the other iterators, unfortunately, but maybe that will be more motivation to figure out a generic version...
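
The two styles discussed in this thread can be contrasted in a small sketch. Everything here is a stand-in: Term is a hypothetical type with a simplified collect_free_vars, and a Vec replaces the PR's vector type, whose iter_mut() is not shown in this conversation.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a term; the real type is not shown in this thread.
#[derive(Clone, Debug, PartialEq)]
struct Term {
    name: String,
    visited: bool,
}

impl Term {
    /// Simplified stand-in: records the term's name and marks it as visited.
    fn collect_free_vars(&mut self, free_vars: &mut HashSet<String>) {
        free_vars.insert(self.name.clone());
        self.visited = true;
    }
}

/// Rebuild style: clone every element, mutate the clone, collect a new container.
fn collect_by_rebuilding(ts: &mut Vec<Term>, free_vars: &mut HashSet<String>) {
    let new_ts = ts
        .iter()
        .cloned()
        .map(|mut t| {
            t.collect_free_vars(free_vars);
            t
        })
        .collect();
    *ts = new_ts;
}

/// Imperative style: walk the container and mutate each element in place.
fn collect_in_place(ts: &mut [Term], free_vars: &mut HashSet<String>) {
    for t in ts.iter_mut() {
        t.collect_free_vars(free_vars);
    }
}
```

Both functions leave the container in the same state; the in-place version avoids building a second container and makes the mutation explicit.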

//! [`Vector`] is a persistent vector (also known as a "bitmapped vector trie")
//! with cheap clones and efficient copy-on-write modifications. [`Slice`]
//! backs the implementation of arrays in Nickel. It's basically a [`Vector`]
//! with support for slicing.
Member

Have you followed a particular paper or source to implement them? Or taken inspiration from another crate? If yes, it could be good to link it there.

Member Author

Mostly just https://hypirion.com/musings/understanding-persistent-vector-pt-1, which I linked a little later. I looked at rpds to see what choices they were making, but I didn't really imitate their implementation otherwise.

vector/src/lib.rs Outdated Show resolved Hide resolved
vector/src/slice.rs Show resolved Hide resolved
vector/src/vector.rs Outdated Show resolved Hide resolved
vector/src/vector.rs Outdated Show resolved Hide resolved
vector/src/vector.rs Show resolved Hide resolved
}
}

/// [`Vector`] is a persistent vector (also known as a "bitmapped vector trie").
Member

Nitpick: I feel like this should be the module's documentation, and not Vector.

@yannham
Member

yannham commented Oct 16, 2024

By the way, I had another side question: could this representation take advantage of the in-place modification when a value is 1-RC ? I think the answer is yes, from what I remember reading Clojure's persistent array blog post, but just to make sure.

@jneem
Member Author

jneem commented Oct 16, 2024

could this representation take advantage of the in-place modification when a value is 1-RC ?

Yep, that should be the case already. We use Rc::make_mut for all the modifications, so it should have the in-place behavior whenever possible (including for subtrees -- if the root tree is shared but some subtree is uniquely owned, the root block will be copied but the uniquely owned part will be mutated). I'll add it to the module docs.
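
As a rough illustration of the Rc::make_mut behavior, consider a two-level tree. Node and set_left are hypothetical names, and the real vector's node layout is not shown in this thread; only the copy-on-write rule of Rc::make_mut itself is standard library behavior.

```rust
use std::rc::Rc;

/// Hypothetical two-level tree: a root block pointing at two leaf blocks.
#[derive(Clone)]
struct Node {
    left: Rc<Vec<i32>>,
    right: Rc<Vec<i32>>,
}

/// Writes `value` at `idx` in the left leaf, copying only what is shared.
fn set_left(root: &mut Rc<Node>, idx: usize, value: i32) {
    // Clones the root block only if another handle also points to it.
    let node = Rc::make_mut(root);
    // Clones the left leaf only if it is shared; otherwise mutates in place.
    Rc::make_mut(&mut node.left)[idx] = value;
}
```

When the root handle is the only reference, nothing is copied and the leaf is updated in place. When the root is shared, the root block and the touched leaf are copied, while the untouched right leaf stays shared between both versions.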

@jneem jneem enabled auto-merge October 16, 2024 09:03
@jneem jneem added this pull request to the merge queue Oct 16, 2024
Merged via the queue into master with commit 3506821 Oct 16, 2024
4 of 5 checks passed
@jneem jneem deleted the array-perf branch October 16, 2024 09:12
@yannham
Member

yannham commented Oct 16, 2024

The last commit fails a test, seemingly on Windows only (but I think there is property-based testing, so the Windows part might be a red herring, and it's just that some random path leads to the panic):

thread 'array_mutations' panicked at vector\tests\arbtest.rs:85:27:
attempt to calculate the remainder with a divisor of zero
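
The code at arbtest.rs:85 is not shown in this thread, so the following is only a guess at the general pattern: a property test that reduces a random number modulo a length panics with this message when the length is zero. A hypothetical helper such as pick_index could guard against that case.

```rust
/// Hypothetical helper: maps an arbitrary number to a valid index,
/// or returns `None` when the collection is empty. A bare `raw % len`
/// panics for `len == 0`; `checked_rem` returns `None` instead.
fn pick_index(raw: usize, len: usize) -> Option<usize> {
    raw.checked_rem(len)
}
```

A test using this would skip index-based mutations whenever the result is None.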
