Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Base uint2 #3646

Closed
wants to merge 7 commits into from
Closed

Base uint2 #3646

wants to merge 7 commits into from

Conversation

nbougalis
Copy link
Contributor

@nbougalis nbougalis commented Oct 26, 2020

This PR builds on top of #3640 (the bottom-most commits are from there) and involves a refactoring and general code cleanup of several classes involved with SHAMaps. In addition to improving memory usage by 8 bytes per SHAMapAbstractNode, the changes also improve type safety by having distinct classes for each type of leaf we have, instead of a single class.

Each change is a separate commit (hence the volume of commits). Since this is critical code, I wanted to make sure that the changes were small enough to be easy to verify by inspection. For the same reason, I (mostly) avoided running clang-format and expect that there are several violations.

This is complex code and it can be challenging to grok, especially since the rationale behind some of these decisions exists only in the heads of three people: Arthur, David and myself. To that end, I made several improvements to the documentation as well, which should help make this code more tractable for others. Honestly, this is a WIP.

You only need to review the commits tagged [WIP] but please review them VERY CAREFULLY. If you're even slightly unsure that a change is correct, or if something is unclear, please leave a comment.

@nbougalis nbougalis added Tech Debt Non-urgent improvements Documentation README changes, code comments, etc. labels Oct 26, 2020
node.getNodeHash().as_uint256(), node.getSeq());
// FIXME: node.owner() is the wrong thing to pass: fetchNodeObject expects
// a ledger sequence, not a copy-on-write identifier.
dbRotating_->fetchNodeObject(node.getHash().as_uint256(), node.owner());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After mentioning this bug in chat, it was fixed by @miguelportilla in #3643 so it will be resolved in a different PR, and this comment will go away during the merge/conflict resolution.

Copy link
Collaborator

@scottschurr scottschurr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed this pull request carefully, one commit at a time. As far as I can see everything is in order.

I have two further suggestions for this pull request.

  • I think now would be good time to rename SHAMapTreeNode to SHAMapLeafNode. That seems to me to be what the class represents.

  • I really like the way you renamed SHAMap::seq_ to cowid_ Similarly, the rename of member variable SHAMapAbstractNode::mSeq to SHAMapAbstractNode::cowid_ is a good choice. However I think the member function SHAMapAbstractNode::owner() (which returns cowid_) would be better named SHAMapAbstractNode::cowid(). The value is easier to track and audit when it does not change names in the middle of usage. Similarly, I think the parameters of SHAMap*Node member functions named owner would be easier to follow if they were renamed cowid.

Neither of these suggests affect correctness, but I think they will make the code easier to follow.

// Inner nodes must be at a level strictly less than 64
// but leaf nodes (while notionally at level 64) can be
// at any depth:
if (newNode->isInner() && iNodeID.getDepth() == 64)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice comment! It leads me to ask the question of whether it is valid for any node to have a depth greater than 64. I wonder whether this might be a better check:

            // Inner nodes must be at a level strictly less than 64
            // but leaf nodes (while notionally at level 64) can be
            // at any depth up to 64:
            if (iNodeID.getDepth() > 64 ||
                (newNode->isInner() && iNodeID.getDepth() == 64))
            {
                // Map is provably invalid
                state_ = SHAMapState::Invalid;
                return SHAMapAddNode::useful();
            }

Copy link
Contributor Author

@nbougalis nbougalis Nov 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, but note that a previous set of changes has made it impossible to construct such a node; or rather, the constructor of the object has an assert that enforces this invariant. Still, an extra (basically free) check here seems like a good idea.

{
}

explicit SHAMapAbstractNode(std::uint32_t seq, SHAMapHash const& hash)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The explicit does no harm here, but it's not really useful since this constructor takes more than one argument.

Copy link
Collaborator

@seelabs seelabs Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scottschurr just to be pedantic, putting explicit on multi-parameter constructors prevents them from being initialized with braces. I.e. explicit Foo(int, int) prevents a Foo from being initialized with {1,2}. I'd probably prefer we keep the explicit, but I don't feel that strongly about it (and in fact I often leave it off multi-paramter constructors myself). One place this would matter is when a function takes the class as a parameter and it is called with braces. I.e the explicit two param constructor prevents bar(Foo f) from being called as bar({1,2}).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seelabs, thanks for that correction. Excellent point.

src/ripple/app/consensus/RCLConsensus.cpp Show resolved Hide resolved
src/ripple/shamap/impl/SHAMap.cpp Show resolved Hide resolved
getType() const override
{
return mType;
}

bool
isLeaf() const override
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider marking as final those virtual methods in this class that are not supposed to be overridden by further derived classes. In the current final state of this branch I believe those methods are isLeaf(), isInner(), invariants(), and getString().

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reopening this so this comment is seen. Marking these function final without the override is causing gcc8 to warn; we need both the final and the override. However, some of these classes are already marked as final. For final classes, I would not mark these functions as final (just to cut down on the noise of final override). I'll make a patch to resolve the warnings.

Copy link
Collaborator

@seelabs seelabs Nov 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nbougalis
Here's a patch to reformat with clang-format: seelabs@e9c5db5
Here's a patch to resolve the suggest-override warnings: seelabs@814a529

Hopefully this will resolve the CI issues. Also note, I'm not able to test those locally, as I do not have access to gcc 8 (gcc 9 and up do not warn on this).

src/ripple/app/ledger/impl/LedgerMaster.cpp Outdated Show resolved Hide resolved
src/ripple/shamap/SHAMap.h Outdated Show resolved Hide resolved
src/ripple/app/ledger/impl/LedgerMaster.cpp Show resolved Hide resolved
@@ -64,6 +64,10 @@ class Message : public std::enable_shared_from_this<Message>
int type,
boost::optional<PublicKey> const& validator = {});

/** Retrieve the size of the packed but uncompressed message data. */
std::size_t
getBufferSize();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getBufferSize() const?

src/ripple/app/ledger/impl/LedgerMaster.cpp Show resolved Hide resolved
Copy link
Collaborator

@seelabs seelabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Very nice set of cleanups. We do need to reformat this with clang-format before it's merged tho.

src/ripple/app/ledger/impl/InboundLedgers.cpp Outdated Show resolved Hide resolved
src/ripple/app/ledger/impl/LedgerMaster.cpp Show resolved Hide resolved
src/ripple/shamap/impl/SHAMap.cpp Show resolved Hide resolved

if ((pEnd - pBegin) & 1)
*pOut++ = charUnHex(*pBegin++);
throw c;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this improved API for base_uint, but I don't recommend using exceptions for flow control. It is expensive both in terms of performance and code size. Furthermore the static_cast is in the wrong place to be effective since the expression (hi << 4) + lo promotes everything back to int.

Here is what I recommend:

    [[nodiscard]] bool
    parseHex(std::string_view sv)
    {
        zero();

        if (sv == "0")
            return true;

        if (sv.size() != bytes * 2)
            return false;

        auto out = data();

        auto in = sv.begin();

        while (in != sv.end())
        {
            auto const hi = charUnHex(*in++);
            auto const lo = charUnHex(*in++);
            if (hi == -1 || lo == -1)
                return false;
            *out++ = static_cast<value_type>((hi << 4) + lo);
        }

        return true;
    }

Copy link
Collaborator

@scottschurr scottschurr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I left a few notes on the changes since my last review. Also, I think Message::getBufferSize() could be const. But none of the notes I left are mandatory. Thanks for your work here!


SHAMapTreeNode publicly inherits directly from SHAMapAbstractNode. It holds the
SHAMapLeafNode publicly inherits directly from SHAMapAbstractNode. It holds the
following data:

1. A shared_ptr to a const SHAMapItem.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't comment on the section in question (since it didn't change), but I looked through the section starting on line 326, SHAMap Improvements. Are there any changes suggested by that section which have not been done yet? Would it make sense to remove that section?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will look.

@@ -196,7 +196,7 @@ class SHAMapAbstractNode

/** Make a copy of this node, setting the owner. */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to change "owner" to "cowid" in this comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

@@ -337,36 +338,36 @@ class SHAMapInnerNode : public SHAMapAbstractNode,

// SHAMapTreeNode represents a leaf, and may eventually be renamed to reflect
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment looks obsolete.

@param hash The hash associated with this node, if any.
*/
/** @{ */
explicit SHAMapAbstractNode(std::uint32_t owner) noexcept : cowid_(owner)
explicit SHAMapAbstractNode(std::uint32_t cowid) noexcept : cowid_(cowid)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this file, SHAMapTreeNode, no longer represents any class in the code base. You could consider renaming the file to SHAMapAbstractNode. But I think maybe a better idea would be to rename the SHAMapAbstractNode class to SHAMapTreeNode, since that is precisely what a SHAMapAbstractNode is.

Not a requirement, just a thought.

@@ -568,7 +572,7 @@ SHAMapInnerNode::setFullBelowGen(std::uint32_t gen)

// SHAMapTreeNode
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment needs to be updated.

src/ripple/app/ledger/impl/LedgerMaster.cpp Show resolved Hide resolved
@@ -17,14 +17,14 @@ or account state can be used to navigate the trie.
A `SHAMap` is a trie with two node types:

1. SHAMapInnerNode
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This README.md still contains a few references to SHAMapTreeNode. Are those intentional, or should they be expunged?

@seelabs
Copy link
Collaborator

seelabs commented Nov 13, 2020

The CI issues need to be resolved as well (mostly looks like missing overrides).

Edit: I do not see these warnings on my local machine. The difference seems to be gcc 10 does not warn about missing overrides when you include final (correctly, IMHO). But gcc 8, which we use for CI does warn.

The existing SHAMapNodeID object has both a valid and an invalid state
and requirs callers to verify the state of an instance prior to using
it. A simple set of changes removes that restriction and ensures that
all instances are valid, making the code more robust.

This change also:

1. Introduces a new function to construct a SHAMapNodeID from a
   serialized blob; and
2. Reduces the amount of constructors the class exposes.
- Provide separate functions for serializing depending on whether
  one wants a "wire" version of a node, or one suitable for hashing.
- Remove unused functions
- Simplify and consolidate code for parsing hex input.
- Replace beast::endian::order with boost::endian::order.
- Simplify CountedObject code.
- Remove pre-C++17 workarounds in favor of C++17 based solutions.
- Improve `base_uint` and simplify its hex-parsing interface by
  consolidating the `SexHex` and `SetHexExact` methods into one
  API: `parseHex` which forces callers to verify the result of
  the operation; as a result some public-facing API endpoints
  may now return errors when passed values that were previously
  accepted.
- Remove the simple fallback implementations of SHA2 and RIPEMD
  introduced to reduce our dependency on OpenSSL. The code is
  slow and rarely, if ever, exercised and we rely on OpenSSL
  functionality for Boost.ASIO as well.
The Travis build matrix included builds for gcc 7 and clang 7 which
do not appear to have sufficient C++17 support to compile rippled
any longer.
This commit combines a number of cleanups, targeting both the
code structure and the code logic. Large changes include:

 - Using more strongly-typed classes for SHAMap nodes, instead of relying
   on runtime-time detection of class types. This change saves 16 bytes
   of memory per node.
 - Improving the interface of SHAMap::addGiveItem and SHAMap::addItem to
   avoid the need for passing two bool arguments.
 - Documenting the "copy-on-write" semantics that SHAMap uses to
   efficiently track changes in individual nodes.
 - Removing unused code and simplifying several APIs.
 - Improving function naming.
@seelabs
Copy link
Collaborator

seelabs commented Nov 17, 2020

We've got a linker error on mac:

Undefined symbols for architecture x86_64:

  "std::__1::__itoa::__u64toa(unsigned long long, char*)", referenced from:

      std::__1::__itoa::__traits_base<unsigned long long, void>::__convert(unsigned long long, char*) in libxrpl_core.a(unity_6_cxx.cxx.o)

@HowardHinnant
Copy link
Contributor

This comes from a new use of std::to_chars and does not happen except on platforms where the macOS is older than current-3 (I'm testing on current-3 and am linking). Apple simply does not ship std::to_chars on the older versions of macOS.

I think this is safe to ignore.

@scottschurr
Copy link
Collaborator

Great continues to look even better. I found one last remaining reference to SHAMapAbstractNode in a comment in SHAMap.cpp. It looks like this:

    @note The node must have already been unshared by having the caller
          first call SHAMapAbstractNode::share().

I think if you change that to SHAMapTreeNode::share() you'll have it all cleaned out.

@carlhua carlhua added the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Nov 18, 2020
@nbougalis nbougalis removed the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Nov 18, 2020
@seelabs
Copy link
Collaborator

seelabs commented Nov 19, 2020

@nbougalis This patch should fix the CI linker error: eae4367

@nbougalis
Copy link
Contributor Author

I've addressed all remaining concerns.

@nbougalis nbougalis linked an issue Nov 23, 2020 that may be closed by this pull request
Copy link
Collaborator

@seelabs seelabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nbougalis The mac linker errors still appear to be there. We need the patch referenced in this comment: #3646 (comment)

Edit: I see it's been added to the rollup PR. NVM.

Copy link
Collaborator

@gregtatcam gregtatcam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@nbougalis nbougalis mentioned this pull request Dec 4, 2020
@carlhua carlhua added the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Dec 4, 2020
@nbougalis nbougalis deleted the base_uint2 branch October 16, 2023 06:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation README changes, code comments, etc. Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. Tech Debt Non-urgent improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor base_uint SetHex and SetHexExact methods
6 participants