Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Shard Family #3448

Closed
wants to merge 19 commits into from
Closed

Conversation

miguelportilla
Copy link
Contributor

@miguelportilla miguelportilla commented Jun 10, 2020

The App Family utilizes a single shared Tree Node and Full Below cache for all shards. This can create a problem when acquiring a shard that shares an account state node that was recently cached from another shard operation. The new Shard Family class solves this issue by managing separate Tree Node and Full Below caches for each shard.

The shard acquire process in LedgerMaster and InboundLedgers was improved with considering of the acquire reason.

@@ -34,35 +34,42 @@ class Family
public:
virtual ~Family() = default;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While we're in this area, I recommend deleting the two copy members (as you've done in the derived classes already). This will further require explicitly defaulting the default constructor.

Copy link
Contributor

@HowardHinnant HowardHinnant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job.

Left one nitpicky suggestion.

Copy link

@MarcelRaschke MarcelRaschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved changes ✅

@miguelportilla
Copy link
Contributor Author

Thank you, @MarcelRaschke

gregtatcam and others added 4 commits June 24, 2020 13:51
* Optimize parsing of compressed message headers
* Enforce protocol-defined message size maxima
* Update comments
 *Add majority timer configuration
 *FIXES: XRPLF#3396
Slice should, eventually, be replaced by std::string_view<std::uint8_t>
so begin adding some helpful functions to align its interface.
* Improve error reporting (more readable exception messages)
* Reduce function complexity (split oversized function to smaller pieces)
* Reduce code duplication
* Reduce buffer copying
Copy link
Collaborator

@scottschurr scottschurr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good. I left some comments for your consideration, but nothing that would need to block merging. The pointer handling is no riskier than it was before this pull request 😉

@@ -142,8 +142,9 @@ class InboundLedger final : public PeerSet,

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While digging around getting myself oriented in this code I noticed that TreeNodeCache.h includes two unnecessary lines. They don't hurt anything because the include guard works. But I was wondering if you would mind removing the spare lines in this pull request while you're vaguely in the area. The lines that could be removed are:

#include <ripple/shamap/TreeNodeCache.h>

and

class SHAMapAbstractNode;

Thanks for your consideration.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a suggestion that may slightly improve usability of enum class Reason declared on line 51.

Uses of this enum currently look like this:

InboundLedger::Reason::GENERIC

It's not hard to make this a little shorter and easier to read. Consider adding the following static constexpr member declarations:

    // Retain the sturdiness of an enum class, but make the enum members
    // less verbose to use.
    static constexpr auto forHISTORY = Reason::HISTORY;
    static constexpr auto forSHARD = Reason::SHARD;
    static constexpr auto forGENERIC = Reason::GENERIC;
    static constexpr auto forCONSENSUS = Reason::CONSENSUS;

The old uses of the enum can remain, but they can also be shortened to this:

InboundLedger::forCONSENSUS

I find that a little easier to read. Just for your consideration. forGeneric reads a little oddly, but I think the style works well for the other three.

If you choose to do this I'd suggest also switching all of the old-style uses of the enum to the shorter style. This will encourage future users to adopt the shorter style.

mHaveHeader = false;
mHaveTransactions = false;
mHaveState = false;
mLedger.reset();
tryDB(*app_.shardFamily());

tryDB(app_.shardFamily()->db());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Application.h doesn't offer any assurances that a call to Application::shardFamily will not return a nullptr. If there's a reason we don't need to guard for that perhaps it deserves a comment, either here or in Application.h.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shards remain an optional feature. If configured, the shard store pointer is created in the Application ctor and is valid for the Application's object lifetime. It is the responsibility of the caller to check the validity of the pointer. The Application ctor has a comment but documentation could be improved, I'll add some.

In the InboundLedger(s) classes, we can safely assume the shard store pointer to be valid when the reason is SHARD. The initiating code in LedgerMaster, (the only code to assign this reason) only does so if the shard store pointer is valid. We also assert in InboundLedgers::acquire on the reason being SHARD with an invalid shard store pointer.

std::min(ledgerIndex, shardStore->lastLedgerSeq(shardIndex));
}

auto haveHash{getLedgerHashForHistory(ledgerIndex, reason)};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haveHash could be const.

@@ -735,7 +735,18 @@ LedgerMaster::tryFill(Job& job, std::shared_ptr<Ledger const> ledger)
void
LedgerMaster::getFetchPack(LedgerIndex missing, InboundLedger::Reason reason)
{
auto haveHash{getLedgerHashForHistory(missing + 1, reason)};
LedgerIndex ledgerIndex{missing + 1};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, ledgerIndex could be made const by turning the following if into a lambda. It may not be worth it. Your call.

Copy link
Contributor Author

@miguelportilla miguelportilla Jun 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do, I prefer const when possible.

// All ledgers have been acquired, shard is complete
acquireInfo_.reset();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a comment referring to acquireInfo_ in Shard.h line 211 that...

// If the shard is complete, this member will be null.

Is that comment still accurate with the acquireInfo_.reset() calls removed here and on line 293? I don't know, but I thought I would ask.

Copy link
Contributor Author

@miguelportilla miguelportilla Jun 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! The comment should read If the shard is final, this member will be null.

return false;
}

std::shared_ptr<FullBelowCache> getFullBelowCache(std::uint32_t) override
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment in Family.h about the ignored parameter.

return fbCache_;
}

std::shared_ptr<TreeNodeCache> getTreeNodeCache(std::uint32_t) override
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment in Family.h about the ignored parameter.


if (!child)
{
auto const& childHash = parent->getChildHash(branch);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spotting, reducing the scope.

, tnTargetSize_(app.config().getValueFor(SizedItem::treeCacheSize))
, tnTargetAge_(app.config().getValueFor(SizedItem::treeCacheAge))
{
assert(app.getShardStore());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you hit this assert I think you already had undefined behavior, since you assigned the contents of a nullptr to a reference. So all bets were already off. The assert may or may not fire.

Since you're in a .cpp file, you can work around this with the following trick...

static NodeStore::Database&
getShardStore (Application& app)
{
    auto const dbPtr = app.getShardStore();
    assert (dbPtr);
    return *dbPtr;
}

ShardFamily::ShardFamily(Application& app, CollectorManager& cm)
    : app_(app)
    , db_(getShardStore(app))
    , cm_(cm)
    , j_(app.journal("ShardFamily"))
    , tnTargetSize_(app.config().getValueFor(SizedItem::treeCacheSize))
    , tnTargetAge_(app.config().getValueFor(SizedItem::treeCacheAge))
{
}

{
JLOG(j_.error()) << "Missing node in ledger sequence " << seq;

// Prevent recursive invocation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing you don't really mean "recursive" here. If it were recursive then I think you'd need a std:::recursive_mutex, which you're not using.

@carlhua carlhua added the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Jun 25, 2020
@carlhua carlhua removed the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Jun 25, 2020
mtrippled and others added 9 commits June 25, 2020 10:49
evaluations for validated ledger age.
* Document delete_batch, back_off_milliseconds, age_threshold_seconds.
* Convert those time values to chrono types.
* Fix bug that ignored age_threshold_seconds.
* Add a "recovery buffer" to the config that gives the node a chance to
  recover before aborting online delete.
* Add begin/end log messages around the SQL queries.
* Add a new configuration section: [sqlite] to allow tuning the sqlite
  database operations. Ignored on full/large history servers.
* Update documentation of [node_db] and [sqlite] in the
  rippled-example.cfg file.
* Resolves XRPLF#3321
* This was always a stub and enabled no functional changes
Corrects the public_key parameter name in the comment.
See XRPLF/xrpl-dev-portal#854 for context.
 *The tecUNFUNDED code is actively used when attempting to create payment
  channels; the messages incorrectly list it as deprecated. Meanwhile, the
  tecUNFUNDED_ADD code actually is an unused legacy code, dating back to
  when there was a WalletAdd transactor.

 *Engine result messages are not part of the binary format and are
  documented as subject to change without notice, so this should not
  require an amendment nor a new API version.

 *Mark terLAST, terFUNDS_SPENT deprecated
HowardHinnant and others added 5 commits June 25, 2020 10:49
* Fixes XRPLF#3269

* Work on a version 2 of the XRP Network API has begun.
  The new API returns the code `notSynced` in place of
  `noClosed`, `noCurrent`, and `noNetwork`.  And
  `invalidParams` is returned in place of `lgrIdxInvalid`.
* The version 2 API can be specified by adding
  "api_version" = 2 to your json request.  The default
  version remains 1 (if unspecified), except for the command
  line interface which tracks version 2.
* It can be re-enabled by setting ApiMaximumSupportedVersion to 2.
This commit, if merged, adds support to allow multiple indepedent nodes to
produce a binary identical shard for a given range of ledgers. The advantage
is that servers can use content-addressable storage, and can more efficiently
retrieve shards by downloading from multiple peers at once and then verifying
the integrity of a shard by cross-checking its checksum with the checksum
other servers report.
@miguelportilla miguelportilla added the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Jun 25, 2020
This was referenced Jun 25, 2020
This was referenced Jun 26, 2020
@miguelportilla miguelportilla deleted the acquire_family branch August 11, 2020 17:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required.
Projects
None yet
Development

Successfully merging this pull request may close these issues.