Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce the number of calls to operator< made by lower_bound and upper… #2882

Merged
merged 9 commits into from
May 4, 2020

Conversation

sears
Copy link
Contributor

@sears sears commented Mar 30, 2020

…_bound #2877

@sears sears added in progress Use this for tasks you are actively working on and removed in progress Use this for tasks you are actively working on labels Mar 30, 2020
}

template<class T, class X>
void upper_bound(const Reference<PTree<T>>& p, Version at, const X& x, std::vector<const PTree<T>*>& f){
void lower_bound(const Reference<PTree<T>>& p, Version at, const X& x, std::vector<const PTree<T>*>& f) {
std::vector<bool> lessThan;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I follow, there's an implicit invariant that lessThan.size() == f.size(). Can we assert that f.size() == 0 here? We might also consider changing f to std::vector<std::pair<const PTree<T>*, bool>> then. Not sure which would perform better, but I think the vector of pairs closer matches the intent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add the assert. f is actually the thing this method returns, and I didn't want to change the signature.

@@ -114,30 +114,51 @@ namespace PTreeImpl {
return contains(p->child(!less, at), at, x);
}

// TODO: Remove the number of invocations of operator<, and replace with something closer to memcmp.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bool operator<(const StringRef&, const StringRef&) is implemented using memcmp already, is that not close enough? Is the goal to not lose the extra information memcmp provides? Then we wouldn't need to call operator< twice to confirm x == p->data below.

Btw #2875 seems related.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is to avoid losing the extra information. I'm working on a second PR that adds a compare operator to everything that's stored by these data structures (and remove the call to confirm ==), but that's a more invasive change.

if (!p) {
while (f.size() && !(x < f.back()->data))
while (f.size() && !(x < f.back()->data)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not use lessThan.back() instead of x < f.back()->data ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Without that, this change doesn't actually help anything.

@sears
Copy link
Contributor Author

sears commented Apr 1, 2020

The force push added the asserts and the lessThan.back() call.

@sears
Copy link
Contributor Author

sears commented Apr 3, 2020

@fdb-build test this please

return iterator(t);
t = t->child[d];
if (cmp == 0) return iterator(t);
t = t->child[cmp > 0];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this a functional change? I think the equivalent code would be t = t->child[cmp >= 0]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just returned if cmp==0 on the previous line, so cmp > 0 is the same as cmp >= 0.

@@ -871,8 +874,6 @@ void IndexedSet<T,Metric>::erase( typename IndexedSet<T,Metric>::iterator begin,
// Removes all nodes in the set between first and last, inclusive.
// toFree is extended with the roots of completely removed subtrees.

ASSERT(!end.i || (begin.i && *begin <= *end));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this removed because it's expensive? Can we still call the assert in simulation at least?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. How do I add a simulation-only assert?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can guard it with g_network->isSimulated()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not available here (it's defined in flow.h, which includes this file). I put the assert back in place.

@etschannen etschannen marked this pull request as draft April 15, 2020 17:34
@sears sears force-pushed the lb_ub_perf branch 3 times, most recently from fe6054c to fd8f7a6 Compare April 16, 2020 22:25
@sears sears force-pushed the lb_ub_perf branch 2 times, most recently from 675eb75 to a88a5ba Compare April 23, 2020 22:11
@sears sears marked this pull request as ready for review April 23, 2020 22:11
@sears
Copy link
Contributor Author

sears commented Apr 23, 2020

This should be ready to get more eyeballs on it. I benchmarked it yesterday, and got these results.

18 of 42 experiments showed throughput differences at 99% confidence. Here they are (operators is the name of the experiment run; everything else should be self-explanatory).

+ IndexedSet-StringRef-operators-erase.ministat
	9.84384% +/- 2.67275%
+ IndexedSet-StringRef-operators-find.ministat
	9.71526% +/- 2.90086%
+ IndexedSet-StringRef-operators-find_sorted.ministat
	16.8519% +/- 1.78925%
+ IndexedSet-StringRef-operators-insert.ministat
	8.72576% +/- 3.0701%
+ IndexedSet-int-operators-find.ministat
	-36.0976% +/- 3.23827%
+ IndexedSet-int-operators-find_sorted.ministat
	-21.758% +/- 1.50071%
+ StdMap-StringRef-operators-scan.ministat
	-5.76112% +/- 3.91458%
+ StdMap-int-operators-find_sorted.ministat
	3.09072% +/- 2.28104%
+ StdMap-int-operators-lower_bound.ministat
	-3.33598% +/- 2.16448%
+ VersionedMap-StringRef-operators-erase.ministat
	-6.77906% +/- 1.96514%
+ VersionedMap-StringRef-operators-find.ministat
	14.6379% +/- 2.59791%
+ VersionedMap-StringRef-operators-find_sorted.ministat
	26.4875% +/- 1.7277%
+ VersionedMap-StringRef-operators-insert.ministat
	4.2692% +/- 2.35077%
+ VersionedMap-StringRef-operators-lower_bound.ministat
	18.2241% +/- 2.91559%

A quick check suggests I fixed the VersionedMap<StringRef>::erase regression, but our testbed is busy running something else.

Changing to the comparison struct only made a measurable improvement for IndexedSet<int>::find. The rest of the changes had broader impact.

flow/Arena.h Outdated
size_t minSize = std::min(size(), other.size());
if (minSize != 0) {
int c = memcmp(begin(), other.begin(), minSize);
if (c != 0) return { c < 0, false, c > 0 };
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not convinced this is faster than doing if (c != 0) return c < 0 ? { -1 } : { 1 };

@satherton
Copy link
Contributor

The changes in Redwood unfortunately will conflict with a lot of changes in
#2996
so I'd prefer it if you omit them from this PR and rebase them on the incoming branch in that PR.

@xumengpanda
Copy link
Contributor

@sears Is there an updated performance results?
Does positive number in the results under your previous comment mean performance improvement?
The results seems to have a mixed conclusion on improvement and regression.

@sears
Copy link
Contributor Author

sears commented May 1, 2020

I don't have updated numbers, but I was testing as I went along. Moving from int to comparison as the return type of compare had a measurable improvement on find for one of the two trees (I forget which one), but only for trees with int keys, which aren't a big fraction of the time we spend at runtime.

I couldn't get circus to run anything yesterday, so I don't have end-to-end numbers. The idea is to merge this, and see if we see any improvement in the nightlies.

inline bool operator > ( const ExtStringRef& lhs, const ExtStringRef& rhs ) { return lhs.cmp(rhs)>0; }
inline bool operator <= ( const ExtStringRef& lhs, const ExtStringRef& rhs ) { return lhs.cmp(rhs)<=0; }
inline bool operator >= ( const ExtStringRef& lhs, const ExtStringRef& rhs ) { return lhs.cmp(rhs)>=0; }
inline bool operator<(const ExtStringRef& lhs, const ExtStringRef& rhs) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is possible to replace these functions with global templates taking const T & for both arguments such that the template is ignored if T does not have a .compare() member.

// This is as good a place as any, I guess.

template <typename T>
typename std::enable_if<std::is_integral<T>::value, int>::type compare(T l, T r) {
Copy link
Contributor

@satherton satherton May 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type should be a signed integer of the same precision, I think something of the form
typename std::make_signed<T>::type
will get you that, and then the implementation can just be l - r.

EDIT: Nevermind, the result can be wrong for unsigned types, the return type would actually need to be a larger precision signed number to catch all cases.

Copy link
Contributor

@satherton satherton May 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that this file is the best place for these definitions, but then I also don't have a better suggestion so I guess it's fine.

…eneration for that seems a bit more stable, and it is probably more commonly used than "(l > r) - (l < r)"
@satherton satherton merged commit 4ae356b into apple:master May 4, 2020
@alexmiller-apple
Copy link
Contributor

Apparently Rusty isn't in our CI as someone to automatically run for, and this broke the build.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants