Conversation
I did some benchmark on the
This is on the PR:
It looks like the results are pretty similar. Also on some tests the PR results are a bit slower. We will soon have better reproducible tests. I believe we can do better than the current code. More ideas in my upcoming review.
In order to make this algorithm run faster:
1. When you scan the entire subtangle, save an in-memory map of hash to approvers. Use the in-memory map to traverse to approvers in the getRating method. You save on DB calls and thus increase performance.
2. Note that the size of the keyset of the map from step 1 is the CW of entrypoint. Using this will save you a large getRating calculation.
3. Even though on my testdb there were no exceptions, using actual recursion can lead to stack overflow exceptions. Use ArrayDeque as a stack to implement an iterative getRating().
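Taken together, the first and third suggestions might look roughly like the sketch below. This is illustrative only: it uses plain String stand-ins for IRI's Hash type, and the approversMap is a hypothetical in-memory map built while scanning the subtangle, not the actual PR code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class IterativeRatingSketch {

    // approversMap is assumed to be built once while scanning the subtangle,
    // so this method needs no further DB calls
    static int getRating(String entryPoint, Map<String, List<String>> approversMap) {
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>(); // iterative DFS, no recursion
        stack.push(entryPoint);
        seen.add(entryPoint);
        while (!stack.isEmpty()) {
            String current = stack.pop();
            for (String approver : approversMap.getOrDefault(current, List.of())) {
                if (seen.add(approver)) { // each transaction is pushed at most once
                    stack.push(approver);
                }
            }
        }
        // CW = number of distinct transactions reached, including the entry point
        return seen.size();
    }

    public static void main(String[] args) {
        Map<String, List<String>> approvers = new HashMap<>();
        approvers.put("A", List.of("B", "C")); // B and C approve A
        approvers.put("B", List.of("D"));
        approvers.put("C", List.of("D"));      // D approves both B and C
        System.out.println(getRating("A", approvers)); // prints 4: A, B, C, D counted once each
    }
}
```

Because every transaction is pushed at most once, the traversal depth is bounded by the subtangle size rather than the JVM call-stack limit.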
/**
 * Calculates the weight recursively/on the fly instead of building the tree and calculating after
 */
Can you give a better description of the algorithm?
Try not to refer to the other implementation.
Maybe delete the other implementation, then you can call this one Cumulative Weight Calculator. Then we won't have to maintain more code. Also it will make you improve the javadoc (you can define CW in the doc)
You should have copied the description but not the Link!!
The link explains the old algo which you deleted
Simply write exactly the following:
"Calculates the cumulative weight for each transaction above a certain entry point. Cumulative weight is an integer defined as the number of approving transactions plus one. We calculate it by doing a graph traversal from each transaction above a certain entry point and counting the number of different transactions we pass."
 */
public class RecursiveWeightCalculator implements RatingCalculator {

    public final Tangle tangle;
why public?
@Override
public UnIterableMap<HashId, Integer> calculate(Hash entryPoint) throws Exception {
    Set<Hash> toCheck = new HashSet<>(1);
    toCheck.add(entryPoint);
Besides here, where do you use this set?
toCheck.add(entryPoint);

// Initial capacity of 16, as default for java maps and lists
UnIterableMap<HashId, Integer> hashWeight = createTxHashToCumulativeWeightMap(16);
Instead of 16 let's try a more optimized number.
Let's say we have 3 tps, we normally use depth 3, and a milestone comes every 5 minutes. So we get 5 * 60 * 3 * 3 = 2700.
Since you have the entry point, you can know the exact depth btw.
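As a sanity check, the sizing arithmetic above can be written out directly. The constants (3 tps, a milestone every 5 minutes, depth 3) are the reviewer's assumptions, not measured values:

```java
public class CapacityEstimate {

    // Estimated map capacity: transactions per second * seconds between
    // milestones * tip-selection depth (reviewer's assumed figures)
    static int estimatedCapacity(int tps, int secondsPerMilestone, int depth) {
        return tps * secondsPerMilestone * depth;
    }

    public static void main(String[] args) {
        // 3 tps * 300 s per milestone * depth 3 = 2700 expected entries
        System.out.println(estimatedCapacity(3, 5 * 60, 3)); // prints 2700
    }
}
```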

// Initial capacity of 16, as default for java maps and lists
UnIterableMap<HashId, Integer> hashWeight = createTxHashToCumulativeWeightMap(16);
calculateRatingDfs(entryPoint, hashWeight);
I like the style of having the result as a return
value rather than an input parameter that changes
I have performed another test on an isolated node that has a mainnetdb synced to ms #110932. Spamming with a GTTA of 3 with Jmeter I obtained the following results now: Old CW:
New CW:
So the new implementation is approximately 3x faster.
Whoa, did not expect that @GalRogozinski.
You can start with just the change of removing the field...
I want to see how it affects the benchmark again.
Then after we have the results we can decide on the other changes.
private final Tangle tangle;
private final SnapshotProvider snapshotProvider;

private Map<Hash, ArrayDeque<Hash>> txToDirectApprovers = new HashMap<>();
Hmm this field makes the calculator stateful
Let's remove it since we will have caching soon
while (CollectionUtils.isNotEmpty(stack)) {
    Hash txHash = stack.peek();
    if (!hashWeight.containsKey(txHash)) {
        Collection<Hash> appHashes = getTxDirectApproversHashes(txHash, txToDirectApprovers, txToDirectApprovers);
I suppose the fallback is supposed to be the field...
Actually you are using the same parameter twice.
 */
private ArrayDeque<Hash> getTxDirectApproversHashes(Hash txHash,
        Map<Hash, ArrayDeque<Hash>> txToDirectApprovers,
        Map<Hash, ArrayDeque<Hash>> fallback)
no need for fallback
private int getRating(Hash hash, Set<HashId> seenHashes) throws Exception {
    int weight = 1;

    ArrayDeque<Hash> approvers = getTxDirectApproversHashes(hash, txToDirectApprovers, null);
If instead of the field you pass the parameter from calculateRatingDfs, you will also save on I/O calls.
for (Hash approver : approvers) {
    if (!seenHashes.contains(approver)) {
        seenHashes.add(approver);
        weight += getRating(approver, seenHashes);
It is better to have something iterative rather than recursive calls; it may prevent StackOverflows...
You are simply doing a DFS, so you can reuse the code of calculateRatingDfs.
@kwek20
@@ -269,6 +283,7 @@ public void testTangleWithCircle2() throws Exception {
}

@Test
@Ignore
I think we discussed this, but remind me again why are we ignoring a test that previously passed?
It tests a broken thing in our previous system: collisions between HashIds of hashes. transactionHash2 is a similar hash to the other one, and is therefore not included.
No idea why we ever did that, as it is literally skipping transactions this way.
Actually, I'd vote for removing the hashId entirely; having it might speed some things up, but if it blocks certain txs, it's still not worth it. (In the test, tx1 and tx2 work because they reference the same value in the end.)
An example of how to break it earlier is to make tx3 and tx1 the same instead of 1 and 2. Then 3 should have value 1, and not 2, which will error.
@@ -113,15 +107,14 @@ private Hash getAndRemoveApprover(Collection<Hash> appHashes) {
* @return |
why is this empty?
stack.push(entryPoint);

while (CollectionUtils.isNotEmpty(stack)) {
You can use regular !isEmpty(); let's get rid of CollectionUtils.
@@ -55,53 +53,49 @@ public RecursiveWeightCalculator(Tangle tangle, SnapshotProvider snapshotProvide

// Estimated capacity per depth, assumes 5 minute gap in between milestones, at 3tps
UnIterableMap<HashId, Integer> hashWeight = createTxHashToCumulativeWeightMap( 5 * 60 * 3 * depth);
Maybe call it hashToWeight or hashWeightMap.
The current name reads like an integer variable.
you changed it in one place, but you didn't change it in the others
weight += getRating(approver, seenHashes);
private int getRating(HashSet<Hash> nonDupes, Map<Hash, HashSet<Hash>> txToDirectApprovers) throws Exception {
    Deque<Hash> stack = new ArrayDeque<>(nonDupes);
    while (CollectionUtils.isNotEmpty(stack)) {
Get rid of CollectionUtils
 * Calculates the weight recursively/on the fly
 * Used to create a weighted random walks.
 *
 * @see <a href="cumulative.md">https://github.com/alongalky/iota-docs/blob/master/cumulative.md</a>
 */
public class RecursiveWeightCalculator implements RatingCalculator {
Change name to CumulativeWeightCalculator
 * @param startingSet
 * @param txToDirectApproversCache
 * @return
 * @throws Exception
Fill in the @param and @return.
If you decided to write a javadoc, do it correctly.
 * @param requester
 * @param stack
 * @return
 */
empty javadoc?
appHashes = Collections.emptySet();
} else {
    appHashes = approvers.getHashes();
}
I think it is cleaner to write it this way:
Collection<Hash> appHashes = (approvers == null || approvers.getHashes() == null)
? Collections.emptySet()
: approvers.getHashes();
Only change it if you agree, you don't have to. I care less about that. It is just fewer lines.
 */
private int getRating(Set<Hash> startingSet, Map<Hash, Set<Hash>> txToDirectApproversCache) throws Exception {
    Deque<Hash> stack = new ArrayDeque<>(startingSet);
    while (stack.isEmpty()) {
After seeing this test I ran the unit tests locally.
They failed...
 * @return
 * @throws Exception
 */
private int getRating(Set<Hash> startingSet, Map<Hash, Set<Hash>> txToDirectApproversCache) throws Exception {
Change the name to getWeight.
Optional: I think it will be cleaner to add the +1 when you return at the end instead. Or conversely, instead of startingSet you can have startingHash or entryHash. But then you will have that getApprovers call that you did before in calculateRatingDfs. It is presumably not so bad, because you have the cache and the code will become more readable (you get the weight of the tx in the param).
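One possible reading of the "+1 at the end" option is sketched below. Again this is illustrative, not the PR code: String stands in for Hash, and the approvers cache is a plain Map assumed to have been filled during the earlier DFS.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class GetWeightSketch {

    // Counts all distinct transactions reachable from the starting set of
    // direct approvers, then adds 1 for the transaction being rated,
    // so callers never pre-count the rated transaction themselves
    static int getWeight(Set<String> startingSet, Map<String, List<String>> approversCache) {
        Set<String> seen = new HashSet<>(startingSet);
        Deque<String> stack = new ArrayDeque<>(startingSet);
        while (!stack.isEmpty()) {
            String current = stack.pop();
            for (String approver : approversCache.getOrDefault(current, List.of())) {
                if (seen.add(approver)) {
                    stack.push(approver);
                }
            }
        }
        // the "+1" lives in the return, as the review suggests
        return seen.size() + 1;
    }

    public static void main(String[] args) {
        // direct approvers of the rated tx are B and C; D approves both
        Map<String, List<String>> cache = Map.of("B", List.of("D"), "C", List.of("D"));
        System.out.println(getWeight(Set.of("B", "C"), cache)); // prints 4
    }
}
```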
@@ -283,7 +283,6 @@ public void testTangleWithCircle2() throws Exception {
}

@Test
@Ignore
Are we using the prefix again?
How come this is passing?
When I asked before I just didn't check what this unit test does.
Then you reminded and convinced me that this should be ignored and probably deleted later once we finalize tipsel.
Just delete the link please
 * Implementation of {@link RatingCalculator} that calculates the cumulative weight
 * Calculates the weight recursively/on the fly for each transaction referencing {@code entryPoint}. <br>
 * Works using DFS search for new hashes and a BFS calculation.
 * Uses cached values to prevent double database lookup for approvers
 *
 * @see <a href="cumulative.md">https://github.com/alongalky/iota-docs/blob/master/cumulative.md</a>
Delete this link
}

return hashWeight;
// If we have a circular reference, its already added, otherwise we save a big calculation
circular -> self referencing
return hashWeight;
// If we have a circular reference, its already added, otherwise we save a big calculation
if (!hashWeightMap.containsKey(entryPoint)) {
    hashWeightMap.put(entryPoint, hashWeightMap.size() + 1);
nice
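For readers skimming the thread: the step being praised works because, after the traversal, the map holds exactly one entry per transaction above the entry point, so its size plus one is the entry point's own cumulative weight. A toy sketch with String stand-ins for Hash and illustrative values:

```java
import java.util.HashMap;
import java.util.Map;

public class EntryPointWeightSketch {

    // The map holds one entry per transaction above the entry point, so
    // size() + 1 is the entry point's cumulative weight, unless a self
    // reference already inserted the entry point during the traversal
    static int entryPointWeight(Map<String, Integer> hashWeightMap, String entryPoint) {
        if (!hashWeightMap.containsKey(entryPoint)) {
            hashWeightMap.put(entryPoint, hashWeightMap.size() + 1);
        }
        return hashWeightMap.get(entryPoint);
    }

    public static void main(String[] args) {
        Map<String, Integer> hashWeightMap = new HashMap<>();
        // suppose the DFS already rated three transactions above the entry point
        hashWeightMap.put("B", 1);
        hashWeightMap.put("C", 2);
        hashWeightMap.put("D", 3);
        System.out.println(entryPointWeight(hashWeightMap, "A")); // prints 4
    }
}
```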
@@ -18,7 +18,6 @@
import org.junit.AfterClass;
import org.junit.Assert;
import org.junit.BeforeClass;
import org.junit.Ignore;
So we're still using the transforming map.
You want to avoid doing more tests for now?
Fine with that, we can change it later.
🎉
Description
Updated the CW calculation to be up to 3x faster.
Changing the Transforming map to a regular HashMap made another small improvement.
Could not find a clear increase in memory usage, as no OOM occurred during any testing, even with -Xmx512M.
Fixes #1428
Type of change
How Has This Been Tested?
Tests ran, benchmarked, network tests with GTTA
**DEPTH 3**
1000*5 threads
hashId:
Cache:
summary = 5000 in 00:19:06 = 4.4/s Avg: 1013 Min: 379 Max: 24359 Err: 0 (0.00%)
summary = 5000 in 00:18:35 = 4.5/s Avg: 986 Min: 356 Max: 3006 Err: 0 (0.00%)
dev:
summary = 5000 in 00:35:36 = 2.3/s Avg: 1866 Min: 612 Max: 5439 Err: 0 (0.00%)
Just Hash:
summary = 5000 in 00:18:03 = 4.6/s Avg: 951 Min: 289 Max: 2891 Err: 0 (0.00%)
summary = 5000 in 00:18:04 = 4.6/s Avg: 953 Min: 271 Max: 2676 Err: 0 (0.00%)
-- All HashId Removed
**DEPTH 15**
100*10 threads, stopped at 83
summary = 83 in 00:16:50 = 0.1/s Avg: 113670 Min: 9931 Max: 234565 Err: 0 (0.00%)
**DEPTH 15**
10*20 threads
summary = 200 in 00:28:11 = 0.1/s Avg: 154674 Min: 8683 Max: 339657 Err: 0 (0.00%)
Checklist: