Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tune for higher transaction processing. #2294

Closed
wants to merge 1 commit into from

Conversation

mtrippled
Copy link
Collaborator

No description provided.

@ripplelabs-jenkins
Copy link
Collaborator

ripplelabs-jenkins commented Dec 10, 2017

Jenkins Build Summary

Built from this commit

Built at 20180117 - 17:11:31

Test Results

Build Type Result Status
clang.debug.unity 975 cases, 0 failed, t: 386s PASS ✅
coverage 975 cases, 0 failed, t: 612s PASS ✅
gcc.debug.unity 975 cases, 0 failed, t: 436s PASS ✅
clang.debug.nounity 973 cases, 0 failed, t: 709s PASS ✅
clang.release.unity 974 cases, 0 failed, t: 470s PASS ✅
gcc.debug.nounity 973 cases, 0 failed, t: 756s PASS ✅
gcc.release.unity 974 cases, 0 failed, t: 512s PASS ✅

@codecov-io
Copy link

codecov-io commented Dec 10, 2017

Codecov Report

Merging #2294 into develop will decrease coverage by 0.05%.
The diff coverage is 59.45%.

Impacted file tree graph

@@            Coverage Diff             @@
##           develop   #2294      +/-   ##
==========================================
- Coverage    70.96%   70.9%   -0.06%     
==========================================
  Files          691     691              
  Lines        51663   51409     -254     
==========================================
- Hits         36661   36452     -209     
+ Misses       15002   14957      -45
Impacted Files Coverage Δ
src/ripple/core/JobQueue.h 100% <ø> (ø) ⬆️
src/ripple/json/json_value.h 93.93% <ø> (+0.18%) ⬆️
src/ripple/overlay/Overlay.h 78.94% <ø> (ø) ⬆️
src/ripple/overlay/impl/PeerImp.cpp 0% <0%> (ø) ⬆️
src/ripple/app/misc/HashRouter.h 86.48% <0%> (-13.52%) ⬇️
src/ripple/app/misc/HashRouter.cpp 87.03% <0%> (-12.97%) ⬇️
src/ripple/core/impl/JobQueue.cpp 86.38% <100%> (-0.19%) ⬇️
src/ripple/app/misc/NetworkOPs.cpp 62.77% <100%> (ø) ⬆️
src/ripple/resource/impl/Fees.cpp 100% <100%> (ø) ⬆️
src/ripple/app/main/Application.cpp 60.61% <100%> (-0.35%) ⬇️
... and 114 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 090d813...e24689b. Read the comment docs.

Copy link
Collaborator

@scottschurr scottschurr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me. I left one thought for a potential change. I think it's a good idea, but it's not required.

I don't have any opinion on the constant value changes. I'll defer to @ximinez on those.

Thanks for chasing this down.

@@ -45,6 +45,9 @@ using namespace std::chrono_literals;

namespace ripple {

// The maximum number of transactions to have in the job queue.
int const max_transactions = 1000;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this constant is only used one place in the file, consider reducing its scope to just where it is needed. Here's an example of what I have in mind: scottschurr@99f3bfc

BTW, I like providing a name for the magic number. Good work with that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constant value changes decrease the cost by 10x for peers to send transactions to one another, while leaving the relative for all other activities the same.

@sublimator
Copy link
Contributor

sublimator commented Dec 11, 2017 via email

@scottschurr
Copy link
Collaborator

👍

Copy link
Collaborator

@ximinez ximinez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple of comments identifying unused Charge values.

I also suggest updating the commit message to something like

Tune for higher transaction processing:

* Decrease the relative cost for `feeLightPeer`, which is charged when a transaction is received, by increasing all the other charges.

If not, at least remove the ..

Otherwise, looks good, so I'll approve once those changes are made.

Charge const feeHighBurdenRPC ( 3000, "heavy RPC" );

Charge const feeLightPeer ( 1, "trivial peer request" );
Charge const feeLowBurdenPeer ( 20, "simple peer request" );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feeLowBurdenPeer is unused. May as well remove it.

Charge const feeReferenceRPC ( 20, "reference RPC" );
Charge const feeExceptionRPC ( 100, "exceptioned RPC" );
Charge const feeLightRPC ( 50, "light RPC" ); // DAVID: Check the cost
Charge const feeLowBurdenRPC ( 200, "low RPC" );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feeLightRPC and feeLowBurdenRPC are unused. May as well remove them.


Charge const feeNewTrustedNote ( 100, "trusted note" );
Charge const feeNewValidTx ( 100, "valid tx" );
Charge const feeSatisfiedRequest ( 100, "needed data" );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feeNewTrustedNote, feeNewValidTx, and feeSatisfiedRequest are unused. May as well remove them.

@mtrippled mtrippled force-pushed the transactions branch 2 times, most recently from 2695b2f to 565ab5a Compare December 14, 2017 00:51
Copy link
Collaborator

@ximinez ximinez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 pending CI.

@ximinez ximinez added the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Dec 14, 2017
@JoelKatz
Copy link
Collaborator

We found an issue where transactions were being dispatched each time they were received from a peer. This caused transactions to be dropped due to the inflated backlog and servers to be dropped due to the inflated counts of extraneous transactions received from peers. We changed this to suppress dispatching a transaction received from a peer if that transactions was dispatched due to peer receipt within the past ten seconds.

@mtrippled mtrippled force-pushed the transactions branch 2 times, most recently from 355981e to c73fc8e Compare December 14, 2017 21:52
@mtrippled
Copy link
Collaborator Author

@JoelKatz changes LGTM for the new suppressor
👍 plus it syncs to the network. So pending jenkins I say OK.

Copy link
Collaborator

@JoelKatz JoelKatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:neckbeard:👍 Let's try to get this to the cluster soon to improve the performance there.

@mtrippled
Copy link
Collaborator Author

@JoelKatz No transactions were dropped. However, each transaction was being distributed through the network multiple times to each peer. The duplicates caused congestion control limits to kick in, causing some peers to sever connections with other peers. The only peers who experienced issues with this were the public-facing client handlers and not the validators. No transactions were dropped.

@sublimator
Copy link
Contributor

sublimator commented Dec 15, 2017 via email

@mtrippled mtrippled removed the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Dec 20, 2017
@mtrippled
Copy link
Collaborator Author

@ximinez @scottschurr @nbougalis @JoelKatz I've reopened this PR because it has been modified since we last visited it. This PR prior to commit e24689b was put into 0.80.2. The latest commit (e24689b) mainly adds counters to track some behaviors surrounding desync events.

Copy link
Collaborator

@scottschurr scottschurr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, nice looking changes. Especially if they help us identify performance issues.

I left comments that you may want to address in a few places.

std::string str(std::to_string(value));
value_.string_ = valueAllocator ()->duplicateStringValue (
str.c_str (), str.size() );
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit nervous about introducing these constructors. Once these constructors exist, if someone constructs a Value with an int32_t, the Value has intValue. But if they construct with an int64_t, the Value has stringValue. I think this will potentially lead to bugs where folks don't know what size of integer they are handling (auto anyone?) and sometimes produce strings instead of integers.

The change is convenient for your particular use case. But I think it has the potential to make bugs easier to write.

If you want to keep this change you'll need to convince me that it is safe for the code base in general, not just the code you're introducing today.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without those constructors, creating json from 64bit input won't compile (at least on gcc):

/home/mtravis/projects/rippled/src/ripple/app/misc/NetworkOPs.cpp:2392:34: error: conversion from ‘uint64_t {aka long unsigned int}’ to ‘Json::Value’ is ambiguous
info[jss::jq_trans_overflow] = app_.overlay().getJqTransOverflow();

Referring to NetworkOPs.cpp:2392:
info[jss::jq_trans_overflow] = app_.overlay().getJqTransOverflow();


No existing 64bit integers rendered as Json will be harmed, because none exist other than what I'm putting there.

Copy link
Collaborator

@bachase bachase Dec 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other places where we want to store 64-bit integers in JSON, we use to_string to make it clear that JSON treats them differently.

I agree with @scottschurr that doing this by default has the potential for confusing semantics going forward.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this? ximinez@a51519b

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reverted the json 64bit constructor to go with the team's consensus that 64bit integers are not to be handled automatically but instead make the coder remember to make them strings. The only json change I kept is initializing the map_ object and protecting against double-free on destruction.

}

void
incPeerDisconnect()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be marked override.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

}

void
incPeerDisconnectCharges()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be marked override.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -71,6 +71,18 @@ bool HashRouter::addSuppressionPeer (uint256 const& key, PeerShortID peer, int&
return result.second;
}

bool HashRouter::shouldProcess (uint256 const& key, PeerShortID peer, int& flags,
Stopwatch::time_point now, std::chrono::seconds interval)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why make the caller responsible for now and interval? shouldRelay (which, yeah, I wrote) uses the clock from within the suppressionMap_.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, though using the clock from suppressionMap_ fails unit test so am using the C++ clock.

auto& s = result.first;
s.addPeer (peer);
flags = s.getFlags ();
return s.shouldProcess (now, interval);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use some unit tests.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

std::string str(std::to_string(value));
value_.string_ = valueAllocator ()->duplicateStringValue (
str.c_str (), str.size() );
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this? ximinez@a51519b

@nbougalis nbougalis mentioned this pull request Jan 10, 2018
@@ -120,7 +120,7 @@ class NetworkOPsImp final
{
struct Counters
{
std::uint64_t transitions = 0;
std::uint32_t transitions = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you intend to change this back to uint32_t? If so, then I think you could skip the uses of std::to_string() below. But I don't believe that's your intent.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I intended to revert and I did revert the conversion for transitions. And to_string() was already in place for the dur member, so that's unchanged.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Mark. It looks like the added std::to_string() calls deal with your removal of the 64-bit JSON interface. Sorry I didn't catch that before.

Copy link
Collaborator

@scottschurr scottschurr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I'm good with these changes. I'll let @ximinez speak for the changes he suggested in HashRouter.cpp.

@@ -120,7 +120,7 @@ class NetworkOPsImp final
{
struct Counters
{
std::uint64_t transitions = 0;
std::uint32_t transitions = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Mark. It looks like the added std::to_string() calls deal with your removal of the 64-bit JSON interface. Sorry I didn't catch that before.

@mtrippled mtrippled force-pushed the transactions branch 2 times, most recently from 77c5f7a to 43457be Compare January 13, 2018 05:38
Copy link
Collaborator

@ximinez ximinez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test can work with the suppression map clock. I left a comment describing how.


BEAST_EXPECT(router.shouldProcess(key, peer, flags, 1s));
BEAST_EXPECT(! router.shouldProcess(key, peer, flags, 1s));
std::this_thread::sleep_for(2s);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace this line with

++stopwatch;
++stopwatch;

Then change HashRouter::shouldProcess to use supressionMap_.clock().now() instead of steady_clock. Or cherry-pick ximinez@d1f584b

The test should now pass, and as a bonus, the tests aren't unnecessarily slowed down.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ximinez Fixed: it's almost as if stopwatch is something that can be started and stopped explicitly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. Now that I think about it, stopwatch is a terrible name. It's more of a manualClock.

@mtrippled mtrippled added the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Jan 17, 2018
@sublimator
Copy link
Contributor

sublimator commented Jan 17, 2018 via email

Copy link
Collaborator

@ximinez ximinez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to go. 👍

@ximinez ximinez added the Rebase label Jan 17, 2018
@sublimator
Copy link
Contributor

sublimator commented Jan 17, 2018 via email

Do not process a transaction received from a peer if it has
been processed within the past ten seconds.

Increase the number of transaction handlers that can be in
flight in the job queue and decrease the relative cost for
peers to share transaction and ledger data.

Additionally, make better use of resources by adjusting the
number of threads we initialize, by reverting commit
68b8ffd.

Performance counter modifications:
  * Create and display counters to track:
    1) Pending transaction limit overruns.
    2) Total peer disconnections.
    3) Peers disconnections due to resource consumption.

Avoid a potential double-free in Json library.
@scottschurr
Copy link
Collaborator

Incorporated into 0.90.0-b4 as 76ad06e.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants