exp/orderbook: Represent assets in orderbook graph as int32 instead of strings #4102

tamirms · 2021-11-29T08:21:24Z

PR Checklist

PR Structure

This PR has reasonably narrow scope (if not, break it down into smaller PRs).
This PR avoids mixing refactoring changes with feature changes (split into two PRs otherwise).
This PR's title starts with the name of the package that is most changed in the PR, e.g. services/friendbot, or all or doc if the changes are broad or impact many packages.

Thoroughness

This PR adds tests for the most critical parts of the new functionality or fixes.
I've updated any docs (developer docs, .md files, etc.) affected by this change. Take a look in the docs folder for a given service, like this one.

Release planning

I've updated the relevant CHANGELOG (here for Horizon) if needed with deprecations, added features, breaking changes, and DB schema changes.
I've decided if this PR requires a new major/minor version according to semver, or if it's mainly a patch change. The PR is targeted at the next release branch if it's not a patch change.

What

Previously, we represented the graph's adjacency list as a map from asset string to the list of offers and pools that buy or sell that asset.

Now, every asset is assigned an int32 id, and the adjacency list is represented as an array indexed by that id. This data structure is much more compact because the asset strings were very lengthy. It is also much faster, because indexing an array by an integer is significantly cheaper than a map lookup, which has to hash and compare long string keys.
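
To make the change concrete, here is a minimal Go sketch of the id-indexed layout. It assumes a simplified graph that tracks only selling-side venues, stored as plain strings. The type assetGraph and its methods idFor and addVenue are hypothetical; only the field names assetStringToID, idToAssetString and venuesForSellingAsset come from this PR, and the real OrderBookGraph stores richer edge types.

```go
package main

import "fmt"

// assetGraph is a hypothetical, simplified model of the layout described
// above (not the real OrderBookGraph): every asset string is interned to an
// int32 id once, and the adjacency list is a slice indexed by that id.
type assetGraph struct {
	assetStringToID       map[string]int32 // consulted only when an asset enters the graph
	idToAssetString       []string         // reverse lookup, e.g. for rendering paths
	venuesForSellingAsset [][]string       // venuesForSellingAsset[id] = venues selling asset id
}

func newAssetGraph() *assetGraph {
	return &assetGraph{assetStringToID: make(map[string]int32)}
}

// idFor returns the id of assetString, assigning the next free id on first use.
func (g *assetGraph) idFor(assetString string) int32 {
	if id, ok := g.assetStringToID[assetString]; ok {
		return id
	}
	id := int32(len(g.idToAssetString))
	g.assetStringToID[assetString] = id
	g.idToAssetString = append(g.idToAssetString, assetString)
	g.venuesForSellingAsset = append(g.venuesForSellingAsset, nil)
	return id
}

// addVenue records that venue sells the given asset.
func (g *assetGraph) addVenue(sellingAsset, venue string) {
	id := g.idFor(sellingAsset)
	g.venuesForSellingAsset[id] = append(g.venuesForSellingAsset[id], venue)
}

func main() {
	g := newAssetGraph()
	g.addVenue("usd", "offer-1")
	g.addVenue("eur", "offer-2")
	g.addVenue("usd", "pool-1")

	// Traversal works with int32 ids, so each neighbor lookup is a slice
	// index rather than a map lookup keyed by a long string.
	usd := g.idFor("usd")
	fmt.Println(usd, g.venuesForSellingAsset[usd]) // 0 [offer-1 pool-1]
}
```

In this sketch the map is consulted once when an asset enters the graph; after that, traversal touches only int32 ids and slices.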

Why

New benchmark:

goos: darwin
goarch: amd64
pkg: github.com/stellar/go/exp/orderbook
cpu: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
BenchmarkVibrantPath
BenchmarkVibrantPath-12    	     100	  12064900 ns/op	 2811558 B/op	   67441 allocs/op
PASS

Old benchmark:

goos: darwin
goarch: amd64
pkg: github.com/stellar/go/exp/orderbook
cpu: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
BenchmarkVibrantPath
BenchmarkVibrantPath-12    	      16	  74524903 ns/op	17888031 B/op	   65145 allocs/op
PASS

The new code reduces latency from 74.5 ms to 12.1 ms per call (about 6x faster) and memory allocated per call from 17.9 MB to 2.8 MB (about 6x less). The number of allocations per call is slightly higher (67,441 vs. 65,145).

Known limitations

[N/A]

2opremio · 2021-11-29T11:40:54Z

exp/orderbook/graph.go

+	if len(graph.vacantIDs) > 0 {
+		id = graph.vacantIDs[len(graph.vacantIDs)-1]
+		graph.vacantIDs = graph.vacantIDs[:len(graph.vacantIDs)-1]
+		graph.idToAssetString[id] = assetString
+	} else {
+		id = int32(len(graph.idToAssetString))
+		graph.idToAssetString = append(graph.idToAssetString, assetString)
+		graph.venuesForBuyingAsset = append(graph.venuesForBuyingAsset, nil)
+		graph.venuesForSellingAsset = append(graph.venuesForSellingAsset, nil)
+	}


I think this requires further clarification, particularly as to why there are two different ways to obtain the ID.

Also, do we really need the vacantIDs mechanism? It seems to complicate things and it's not obvious (to me at least) what gain it brings.

tamirms

The purpose of vacantIDs is to make sure the assets array does not waste any cells.

Let's say we start out with an empty graph. The assets array will be empty in that case.
Then we add the following assets:
0 -> 'usd'
1 -> 'eur'
2 -> 'chf'
3 -> 'sek'

Now, we remove all offers and pools that involve the chf asset, so we can remove chf itself. At this point the array looks like:

0 -> 'usd'
1 -> 'eur'
2 -> ''
3 -> 'sek'

The cell at index 2 is vacant, and we can reuse it the next time we add a new asset, for example 'yen':

0 -> 'usd'
1 -> 'eur'
2 -> 'yen'
3 -> 'sek'

Without the vacantIDs mechanism, we would either have to add 'yen' at index 4 and leave cell 2 empty forever, or reshuffle the mapping when removing 'chf' so that there are no empty cells. But reshuffling would be an expensive operation, since every reference to an asset whose id changed would have to be updated.
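
The walk-through above can be reproduced with a small Go sketch of the allocation logic from the diff: reuse the most recently vacated id if there is one, otherwise append a new cell. The type assetIDs and its add/remove methods are hypothetical and omit the adjacency lists; the field names mirror the PR's idToAssetString, assetStringToID and vacantIDs.

```go
package main

import "fmt"

// assetIDs is a hypothetical, trimmed-down model of the id bookkeeping in
// the diff above (adjacency lists omitted). Removed ids are pushed onto
// vacantIDs and handed out again before the slice is grown.
type assetIDs struct {
	assetStringToID map[string]int32
	idToAssetString []string
	vacantIDs       []int32
}

func newAssetIDs() *assetIDs {
	return &assetIDs{assetStringToID: make(map[string]int32)}
}

// add returns the id of assetString, reusing a vacant cell when one exists.
func (a *assetIDs) add(assetString string) int32 {
	if id, ok := a.assetStringToID[assetString]; ok {
		return id
	}
	var id int32
	if n := len(a.vacantIDs); n > 0 {
		// Pop the most recently vacated id and fill its empty cell.
		id = a.vacantIDs[n-1]
		a.vacantIDs = a.vacantIDs[:n-1]
		a.idToAssetString[id] = assetString
	} else {
		// No holes: grow the slice by one cell.
		id = int32(len(a.idToAssetString))
		a.idToAssetString = append(a.idToAssetString, assetString)
	}
	a.assetStringToID[assetString] = id
	return id
}

// remove empties the asset's cell and records its id as vacant. In the real
// graph this only happens once no offers or pools reference the asset.
func (a *assetIDs) remove(assetString string) {
	id, ok := a.assetStringToID[assetString]
	if !ok {
		return
	}
	delete(a.assetStringToID, assetString)
	a.idToAssetString[id] = ""
	a.vacantIDs = append(a.vacantIDs, id)
}

func main() {
	a := newAssetIDs()
	for _, s := range []string{"usd", "eur", "chf", "sek"} {
		a.add(s)
	}
	a.remove("chf")
	fmt.Printf("%q\n", a.idToAssetString) // ["usd" "eur" "" "sek"]
	fmt.Println(a.add("yen"))              // 2
	fmt.Printf("%q\n", a.idToAssetString) // ["usd" "eur" "yen" "sek"]
}
```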

2opremio

Ah, I see. Does wasting cells have a big performance impact? I presume we would consume a bit more memory, but maybe that's acceptable?

tamirms

I think there is a small performance impact, because the search algorithm iterates over all the cells here:

go/exp/orderbook/search.go, line 141 in 41b735e:

	for currentAsset := int32(0); currentAsset < totalAssets; currentAsset++ {

Empty cells are skipped by this check:

	currentAmount := bestAmount[currentAsset]
	if currentAmount == 0 {
		continue
	}

A few empty cells wouldn't matter, but if we never refilled them, performance would slowly degrade over time as more and more empty cells accrued, until Horizon restarts.
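
A rough Go sketch of why vacant cells still cost something during search: a loop shaped like the one quoted above visits every id below totalAssets, so a skipped cell still takes an iteration. visitNonEmpty is a hypothetical helper, and bestAmount is assumed to be a plain []int64 here; in the search, a zero amount marks cells with nothing to process, which includes vacant ids.

```go
package main

import "fmt"

// visitNonEmpty mirrors the loop shape quoted from search.go: it walks every
// id from 0 to len(bestAmount)-1 and skips cells whose amount is zero. It
// reports how many cells were visited and how many were actually processed.
func visitNonEmpty(bestAmount []int64) (visited, processed int) {
	totalAssets := int32(len(bestAmount))
	for currentAsset := int32(0); currentAsset < totalAssets; currentAsset++ {
		visited++
		currentAmount := bestAmount[currentAsset]
		if currentAmount == 0 {
			continue // empty or vacant cell: skipped, but still iterated over
		}
		processed++
	}
	return visited, processed
}

func main() {
	// Four live assets plus 996 ids that were vacated and never reused.
	bestAmount := make([]int64, 1000)
	for i := 0; i < 4; i++ {
		bestAmount[i] = 100
	}
	fmt.Println(visitNonEmpty(bestAmount)) // 1000 4
}
```

Reusing vacant ids keeps the visited count close to the number of live assets instead of letting it grow with every asset ever seen.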

bartekn · 2021-11-29T14:20:21Z

exp/orderbook/graph.go

+		// we assign id to asset
+		graph.idToAssetString = append(graph.idToAssetString, assetString)
+		graph.venuesForBuyingAsset = append(graph.venuesForBuyingAsset, nil)
+		graph.venuesForSellingAsset = append(graph.venuesForSellingAsset, nil)


Shouldn't we clear venuesForBuyingAsset and venuesForSellingAsset when assigning to a vacant id? It seems we don't do that in maybeDeleteAsset either.

tamirms

For an id to be added to the vacant id list, it is a necessary condition that graph.venuesForBuyingAsset[asset] and graph.venuesForSellingAsset[asset] are both empty:

func (graph *OrderBookGraph) maybeDeleteAsset(asset int32) {
	buyingEdgesEmpty := len(graph.venuesForBuyingAsset[asset]) == 0
	sellingEdgesEmpty := len(graph.venuesForSellingAsset[asset]) == 0

	if buyingEdgesEmpty && sellingEdgesEmpty {
		delete(graph.assetStringToID, graph.idToAssetString[asset])
		// When removing an asset we do not resize the idToAssetString array.
		// Instead, we allow the cell occupied by the id to be empty.
		// The next time we will add an asset to the graph we will allocate the
		// id to the new asset.
		graph.idToAssetString[asset] = ""
		graph.vacantIDs = append(graph.vacantIDs, asset)
	}
}
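
As a sketch of the invariant described above (an id only becomes vacant once both of its venue lists are empty, so a reused id always starts with empty lists), the following Go program wraps the quoted maybeDeleteAsset logic in a reduced, hypothetical graph type. Venues are plain strings, and removeSellingVenue is an illustrative helper, not part of the PR.

```go
package main

import "fmt"

// graph is a hypothetical reduction of OrderBookGraph that keeps only the
// fields maybeDeleteAsset touches; venues are plain strings here.
type graph struct {
	assetStringToID       map[string]int32
	idToAssetString       []string
	venuesForBuyingAsset  [][]string
	venuesForSellingAsset [][]string
	vacantIDs             []int32
}

// maybeDeleteAsset follows the quoted logic: an id becomes vacant only
// when no venue buys or sells the asset.
func (g *graph) maybeDeleteAsset(asset int32) {
	if len(g.venuesForBuyingAsset[asset]) == 0 && len(g.venuesForSellingAsset[asset]) == 0 {
		delete(g.assetStringToID, g.idToAssetString[asset])
		g.idToAssetString[asset] = ""
		g.vacantIDs = append(g.vacantIDs, asset)
	}
}

// removeSellingVenue (hypothetical) filters venue out of the selling list in
// place, then gives maybeDeleteAsset a chance to vacate the id.
func (g *graph) removeSellingVenue(asset int32, venue string) {
	kept := g.venuesForSellingAsset[asset][:0]
	for _, v := range g.venuesForSellingAsset[asset] {
		if v != venue {
			kept = append(kept, v)
		}
	}
	g.venuesForSellingAsset[asset] = kept
	g.maybeDeleteAsset(asset)
}

func main() {
	g := &graph{
		assetStringToID:       map[string]int32{"chf": 0},
		idToAssetString:       []string{"chf"},
		venuesForBuyingAsset:  [][]string{nil},
		venuesForSellingAsset: [][]string{{"offer-1", "pool-1"}},
	}
	g.removeSellingVenue(0, "offer-1")
	fmt.Println(g.vacantIDs) // [] (pool-1 still sells chf)
	g.removeSellingVenue(0, "pool-1")
	fmt.Println(g.vacantIDs, len(g.venuesForSellingAsset[0])) // [0] 0
}
```

Because vacating requires a length of zero for both lists, the reuse path in the diff has nothing to clear; in this sketch the emptied slice may keep its old capacity, but it holds no venues.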

sreuland · 2021-11-29T18:50:11Z

exp/orderbook/graph.go

+		return id
+	}
+	// before creating a new int32 asset id we will try to use
+	// a vacant id so that we can plug any empty cells in the


Great design. Just curious: would storing nil (in Go, the empty string) in idToAssetString give the same result as maintaining separate vacancy state, i.e., iterating over idToAssetString to find an empty entry instead? Perhaps that would mean less code, but just wondering.

tamirms

Yes, that's true: we could avoid having a vacantIDs list entirely by scanning idToAssetString for the first empty cell. But in the worst case, if there are no empty cells, we would have to scan the entire array before realizing we need to append to the end. Having vacantIDs makes adding a new asset faster (constant time instead of a scan over all assets).
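
The trade-off can be sketched in Go by putting the two strategies side by side: a linear scan for the first empty string in idToAssetString (Go strings cannot be nil, so the empty string serves as the marker) versus popping from a vacantIDs stack. Both helpers, firstEmptyByScan and popVacant, are hypothetical.

```go
package main

import "fmt"

// firstEmptyByScan is the alternative raised above: find a free cell by
// scanning idToAssetString for an empty string. It returns the id to use and
// how many cells it inspected.
func firstEmptyByScan(idToAssetString []string) (id int32, inspected int) {
	for i, s := range idToAssetString {
		inspected++
		if s == "" {
			return int32(i), inspected
		}
	}
	// No hole found: the caller must append, so the next id is len(slice).
	return int32(len(idToAssetString)), inspected
}

// popVacant is the vacantIDs approach: constant time regardless of how many
// assets are in the graph. size is the current length of idToAssetString.
func popVacant(vacantIDs []int32, size int) (id int32, rest []int32) {
	if n := len(vacantIDs); n > 0 {
		return vacantIDs[n-1], vacantIDs[:n-1]
	}
	return int32(size), vacantIDs
}

func main() {
	full := make([]string, 100000)
	for i := range full {
		full[i] = fmt.Sprintf("asset-%d", i)
	}
	id, inspected := firstEmptyByScan(full)
	fmt.Println(id, inspected) // 100000 100000: worst case touches every cell
	id, _ = popVacant(nil, len(full))
	fmt.Println(id) // 100000, without any scan
}
```

The scan is O(n) in the number of assets in the worst case, while popping from vacantIDs is O(1), at the cost of keeping one extra slice of int32 ids.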

sreuland

Great insight and improvement.

tamirms force-pushed the int32-nodes branch from 1151309 to 4df229c November 29, 2021 08:25

tamirms requested a review from a team November 29, 2021 08:26

Represent assets in orderbook graph as int32 instead of strings

06eebae

tamirms force-pushed the int32-nodes branch from 4df229c to 06eebae November 29, 2021 08:33

2opremio reviewed Nov 29, 2021

exp/orderbook/graph.go (outdated)

2opremio reviewed Nov 29, 2021

exp/orderbook/graph.go

2opremio mentioned this pull request Nov 29, 2021

xdr and exp/orderbook: Reduce path search allocations #4105

Merged

Add comments

41b735e

tamirms force-pushed the int32-nodes branch from 3057464 to 41b735e November 29, 2021 14:14

bartekn reviewed Nov 29, 2021

tamirms mentioned this pull request Nov 29, 2021

services/horizon: Improve performance of path finding endpoint #4106

Closed

tamirms linked an issue Nov 29, 2021 that may be closed by this pull request

services/horizon: Improve performance of path finding endpoint #4106

Closed

2opremio approved these changes Nov 29, 2021

sreuland reviewed Nov 29, 2021

sreuland approved these changes Nov 29, 2021

tamirms and others added 2 commits November 29, 2021 20:59

Merge branch 'master' into int32-nodes

0e78847

Add tests for path resource adapter

ad9c53f

tamirms merged commit b5d2058 into stellar:master Nov 30, 2021

tamirms deleted the int32-nodes branch November 30, 2021 09:05

tamirms added a commit that referenced this pull request Dec 1, 2021

exp/orderbook: Represent assets in orderbook graph as int32 instead of strings (#4102)

940b9a0

erika-sdf pushed a commit to erika-sdf/go that referenced this pull request Dec 3, 2021

exp/orderbook: Represent assets in orderbook graph as int32 instead of strings (stellar#4102)

d5a9a5e
