reduce the number of calls to operator< made by lower_bound and upper… #2882

sears · 2020-03-30T23:19:49Z

…_bound #2877

atn34 · 2020-03-31T17:33:56Z

fdbclient/VersionedMap.h

 	}

 	template<class T, class X>
-	void upper_bound(const Reference<PTree<T>>& p, Version at, const X& x, std::vector<const PTree<T>*>& f){
+	void lower_bound(const Reference<PTree<T>>& p, Version at, const X& x, std::vector<const PTree<T>*>& f) {
+		std::vector<bool> lessThan;


If I follow, there's an implicit invariant that lessThan.size() == f.size(). Can we assert that f.size() == 0 here? We might also consider changing f to std::vector<std::pair<const PTree<T>*, bool>> then. Not sure which would perform better, but I think the vector of pairs closer matches the intent.

I'll add the assert. f is actually the thing this method returns, and I didn't want to change the signature.

atn34 · 2020-03-31T17:37:25Z

fdbclient/VersionedMap.h

@@ -114,30 +114,51 @@ namespace PTreeImpl {
 		return contains(p->child(!less, at), at, x);
 	}

+	// TODO: Remove the number of invocations of operator<, and replace with something closer to memcmp.


bool operator<(const StringRef&, const StringRef&) is implemented using memcmp already, is that not close enough? Is the goal to not lose the extra information memcmp provides? Then we wouldn't need to call operator< twice to confirm x == p->data below.

Btw #2875 seems related.

The goal is to avoid losing the extra information. I'm working on a second PR that adds a compare operator to everything that's stored by these data structures (and remove the call to confirm ==), but that's a more invasive change.

atn34 · 2020-03-31T17:42:07Z

fdbclient/VersionedMap.h

 		if (!p) {
-			while (f.size() && !(x < f.back()->data))
+			while (f.size() && !(x < f.back()->data)) {


Can we not use lessThan.back() instead of x < f.back()->data ?

Good catch! Without that, this change doesn't actually help anything.

sears · 2020-04-01T18:58:11Z

The force push added the asserts and the lessThan.back() call.

sears · 2020-04-03T19:54:47Z

@fdb-build test this please

atn34 · 2020-04-14T20:54:15Z

flow/IndexedSet.h

-			return iterator(t);
-		t = t->child[d];
+		if (cmp == 0) return iterator(t);
+		t = t->child[cmp > 0];


Isn't this a functional change? I think the equivalent code would be t = t->child[cmp >= 0]

We just returned if cmp==0 on the previous line, so cmp > 0 is the same as cmp >= 0.

atn34 · 2020-04-14T20:56:07Z

flow/IndexedSet.h

@@ -871,8 +874,6 @@ void IndexedSet<T,Metric>::erase( typename IndexedSet<T,Metric>::iterator begin,
 	// Removes all nodes in the set between first and last, inclusive.
 	// toFree is extended with the roots of completely removed subtrees.

-	ASSERT(!end.i || (begin.i && *begin <= *end));


Is this removed because it's expensive? Can we still call the assert in simulation at least?

Yes. How do I add a simulation-only assert?

You can guard it with g_network->isSimulated()

That's not available here (it's defined in flow.h, which includes this file). I put the assert back in place.

sears · 2020-04-23T22:39:13Z

This should be ready to get more eyeballs on it. I benchmarked it yesterday, and got these results.

18 of 42 experiments showed throughput differences at 99% confidence. Here they are (operators is the name of the experiment run; everything else should be self-explanatory).

+ IndexedSet-StringRef-operators-erase.ministat
	9.84384% +/- 2.67275%
+ IndexedSet-StringRef-operators-find.ministat
	9.71526% +/- 2.90086%
+ IndexedSet-StringRef-operators-find_sorted.ministat
	16.8519% +/- 1.78925%
+ IndexedSet-StringRef-operators-insert.ministat
	8.72576% +/- 3.0701%
+ IndexedSet-int-operators-find.ministat
	-36.0976% +/- 3.23827%
+ IndexedSet-int-operators-find_sorted.ministat
	-21.758% +/- 1.50071%
+ StdMap-StringRef-operators-scan.ministat
	-5.76112% +/- 3.91458%
+ StdMap-int-operators-find_sorted.ministat
	3.09072% +/- 2.28104%
+ StdMap-int-operators-lower_bound.ministat
	-3.33598% +/- 2.16448%
+ VersionedMap-StringRef-operators-erase.ministat
	-6.77906% +/- 1.96514%
+ VersionedMap-StringRef-operators-find.ministat
	14.6379% +/- 2.59791%
+ VersionedMap-StringRef-operators-find_sorted.ministat
	26.4875% +/- 1.7277%
+ VersionedMap-StringRef-operators-insert.ministat
	4.2692% +/- 2.35077%
+ VersionedMap-StringRef-operators-lower_bound.ministat
	18.2241% +/- 2.91559%

A quick check suggests I fixed the VersionedMap<StringRef>::erase regression, but our testbed is busy running something else.

Changing to the comparison struct only made a measurable improvement for IndexedSet<int>::find. The rest of the changes had broader impact.

sears · 2020-04-23T23:56:07Z

flow/Arena.h

+		size_t minSize = std::min(size(), other.size());
+		if (minSize != 0) {
+			int c = memcmp(begin(), other.begin(), minSize);
+			if (c != 0) return { c < 0, false, c > 0 };


Not convinced this is faster than doing if (c != 0) return c < 0 ? { -1 } : { 1 };

satherton · 2020-04-24T21:20:50Z

The changes in Redwood unfortunately will conflict with a lot of changes in
#2996
so I'd prefer it if you omit them from this PR and rebase them on the incoming branch in that PR.

…_bound apple#2877

…e to 96 based on calculation in check()

xumengpanda · 2020-04-29T17:40:21Z

@sears Is there an updated performance results?
Does positive number in the results under your previous comment mean performance improvement?
The results seems to have a mixed conclusion on improvement and regression.

sears · 2020-05-01T17:54:56Z

I don't have updated numbers, but I was testing as I went along. Moving from int to comparison as the return type of compare had a measurable improvement on find for one of the two trees (I forget which one), but only for trees with int keys, which aren't a big fraction of the time we spend at runtime.

I couldn't get circus to run anything yesterday, so I don't have end-to-end numbers. The idea is to merge this, and see if we see any improvement in the nightlies.

satherton · 2020-05-04T20:38:07Z

fdbclient/SnapshotCache.h

-inline bool operator > ( const ExtStringRef& lhs, const ExtStringRef& rhs ) { return lhs.cmp(rhs)>0; }
-inline bool operator <= ( const ExtStringRef& lhs, const ExtStringRef& rhs ) { return lhs.cmp(rhs)<=0; }
-inline bool operator >= ( const ExtStringRef& lhs, const ExtStringRef& rhs ) { return lhs.cmp(rhs)>=0; }
+inline bool operator<(const ExtStringRef& lhs, const ExtStringRef& rhs) {


I think it is possible to replace these functions with global templates taking const T & for both arguments such that the template is ignored if T does not have a .compare() member.

satherton · 2020-05-04T21:22:07Z

flow/IRandom.h

+// This is as good a place as any, I guess.
+
+template <typename T>
+typename std::enable_if<std::is_integral<T>::value, int>::type compare(T l, T r) {


The return type should be a signed integer of the same precision, I think something of the form
typename std::make_signed<T>::type
will get you that, and then the implementation can just be l - r.

EDIT: Nevermind, the result can be wrong for unsigned types, the return type would actually need to be a larger precision signed number to catch all cases.

I'm not sure that this file is the best place for these definitions, but then I also don't have a better suggestion so I guess it's fine.

…eneration for that seems a bit more stable, and it is probably more commonly used than "(l > r) - (l < r)"

alexmiller-apple · 2020-05-05T10:29:03Z

Apparently Rusty isn't in our CI as someone to automatically run for, and this broke the build.

sears added in progress Use this for tasks you are actively working on and removed in progress Use this for tasks you are actively working on labels Mar 30, 2020

atn34 reviewed Mar 31, 2020

View reviewed changes

sears force-pushed the lb_ub_perf branch from f0ed137 to f2b3d6e Compare April 1, 2020 18:55

sears force-pushed the lb_ub_perf branch from e4bcec9 to cb98f1c Compare April 6, 2020 19:45

atn34 reviewed Apr 14, 2020

View reviewed changes

sears force-pushed the lb_ub_perf branch from e437b93 to cfb2a9d Compare April 14, 2020 23:39

etschannen marked this pull request as draft April 15, 2020 17:34

sears force-pushed the lb_ub_perf branch 3 times, most recently from fe6054c to fd8f7a6 Compare April 16, 2020 22:25

sears force-pushed the lb_ub_perf branch 2 times, most recently from 675eb75 to a88a5ba Compare April 23, 2020 22:11

sears marked this pull request as ready for review April 23, 2020 22:11

sears commented Apr 23, 2020

View reviewed changes

sears force-pushed the lb_ub_perf branch from 897d895 to b0eefef Compare April 24, 2020 18:38

sears added 8 commits April 29, 2020 10:10

simple benchmarks of IndexedSet, VersionedMap and std::map

1483661

reduce the number of calls to operator< made by lower_bound and upper…

0115d94

…_bound apple#2877

replace double invocations of operator<() with three way compare().

dcdda88

stack allocate fingers

5a8ec48

optimize PTreeFinger copy constructors

bfc6d63

remove the finger trimming loops from PTree::lower_bound and upper_bound

fa5855c

remove obsolete less field from PTreeFingerEntry, increase finger siz…

6ab3ad4

…e to 96 based on calculation in check()

pr comment: restore assert

7459fad

etschannen assigned satherton Apr 29, 2020

sears force-pushed the lb_ub_perf branch from 2b38b4a to 7459fad Compare April 29, 2020 17:35

satherton reviewed May 4, 2020

View reviewed changes

switch to the more obvious "l < r ? -1 : l == r ? 0 : 1" since code g…

17c0ad5

…eneration for that seems a bit more stable, and it is probably more commonly used than "(l > r) - (l < r)"

satherton approved these changes May 4, 2020

View reviewed changes

satherton merged commit 4ae356b into apple:master May 4, 2020

ajbeamon mentioned this pull request May 5, 2020

Revert "reduce the number of calls to operator< made by lower_bound and upper…" #3087

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reduce the number of calls to operator< made by lower_bound and upper… #2882

reduce the number of calls to operator< made by lower_bound and upper… #2882

sears commented Mar 30, 2020

atn34 Mar 31, 2020

sears Mar 31, 2020

atn34 Mar 31, 2020

sears Mar 31, 2020

atn34 Mar 31, 2020

sears Mar 31, 2020

sears commented Apr 1, 2020

sears commented Apr 3, 2020

atn34 Apr 14, 2020

sears Apr 21, 2020

atn34 Apr 14, 2020

sears Apr 21, 2020

atn34 Apr 21, 2020

sears Apr 23, 2020

sears commented Apr 23, 2020

sears Apr 23, 2020

satherton commented Apr 24, 2020

xumengpanda commented Apr 29, 2020

sears commented May 1, 2020

satherton May 4, 2020

satherton May 4, 2020 •

edited

Loading

satherton May 4, 2020 •

edited

Loading

alexmiller-apple commented May 5, 2020

reduce the number of calls to operator< made by lower_bound and upper… #2882

reduce the number of calls to operator< made by lower_bound and upper… #2882

Conversation

sears commented Mar 30, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sears commented Apr 1, 2020

sears commented Apr 3, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sears commented Apr 23, 2020

Choose a reason for hiding this comment

satherton commented Apr 24, 2020

xumengpanda commented Apr 29, 2020

sears commented May 1, 2020

Choose a reason for hiding this comment

satherton May 4, 2020 • edited Loading

Choose a reason for hiding this comment

satherton May 4, 2020 • edited Loading

Choose a reason for hiding this comment

alexmiller-apple commented May 5, 2020

satherton May 4, 2020 •

edited

Loading

satherton May 4, 2020 •

edited

Loading