[IBD patch 4] Deprioritise slower downloaders (bitcoin#970)
* Update the block availability for all nodes during IBD

One of the problems with IBD is that we end up downloading
blocks from a single peer. This is because, after downloading
all the headers at startup, block availability is only being
updated for that one peer. We need to assume that all connected
or newly connected peers have the blocks that we need, but we
only do this during IBD, i.e. while IsInitialBlockDownload()
is true.
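A minimal sketch of that policy (names and shapes are illustrative, not the BU API): during IBD any full peer is treated as a candidate block source, while outside IBD only peers with demonstrated availability qualify.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// NODE_NETWORK is bit 0 of nServices in the Bitcoin P2P protocol.
static const uint64_t NODE_NETWORK = (1 << 0);

struct Peer
{
    int id;
    uint64_t nServices;
    bool fBlockAvailabilityKnown; // set once this peer has shown us a header/tip
};

// Return the peers we may request blocks from.
std::vector<int> CandidateSources(const std::vector<Peer> &peers, bool fIsInitialBlockDownload)
{
    std::vector<int> result;
    for (const Peer &p : peers)
    {
        // Outside IBD, only peers with demonstrated availability qualify.
        // During IBD, assume any full (NODE_NETWORK) peer can serve us.
        if (p.fBlockAvailabilityKnown || (fIsInitialBlockDownload && (p.nServices & NODE_NETWORK)))
            result.push_back(p.id);
    }
    return result;
}
```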

* Increase the block download window to 1024

This helps to keep IBD moving forward when we have a peer
that may be serving us blocks more slowly. And now that
block availability is being updated correctly we are getting
blocks from all connected peers, so the chance of having a
slow peer is much higher.
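The window's effect can be shown with a tiny sketch (the constant value is from this patch; the helper name is assumed): it caps how far past our validated tip we will request blocks, so one slow peer holding an early block doesn't stall requests for everything behind it.

```cpp
#include <algorithm>
#include <cassert>

// "Block download window": how far ahead of our current height we fetch.
static const int BLOCK_DOWNLOAD_WINDOW = 1024;

// Highest block height we may request right now, given our tip height and
// the best header height we know about.
int MaxFetchHeight(int nTipHeight, int nBestHeaderHeight)
{
    return std::min(nTipHeight + BLOCK_DOWNLOAD_WINDOW, nBestHeaderHeight);
}
```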

* Reset the single peer request mode age to 24 hours

One week isn't needed anymore because the new DAA is working
well and we're not falling behind in terms of blocks mined
per day.

* Remove the check for node ping in the request manager

This feature is causing occasional hangups during IBD and
is not really effective in selecting peers, especially during
IBD. Since it is not used once the chain is synced, there is
no need for this feature to remain; it only adds another
level of complexity to IBD. KISS can be applied here for a
better IBD experience.

* Only update xthin stats when IsChainNearlySyncd()

There is no need to lock vNodes and check all peers for
thinblocks in flight unless we are IsChainNearlySyncd() because
we would not have asked for any xthins if !IsChainNearlySyncd().

This is an expensive operation which causes a performance hit
during IBD.

* Disconnect non-NETWORK nodes during initial sync

We only want to have nodes we can download the full
blockchain from, connected to us while we do our initial sync.
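A sketch of the service-bit test (the constant follows the Bitcoin P2P protocol; the predicate name is illustrative): during initial sync, a peer that can't serve the whole chain is not useful to us.

```cpp
#include <cassert>
#include <cstdint>

// NODE_NETWORK is bit 0 of nServices and advertises a full, unpruned chain.
static const uint64_t NODE_NETWORK = (1 << 0);

// During initial sync we drop peers that cannot serve the full blockchain.
bool ShouldDisconnectDuringIBD(uint64_t nServices, bool fIsInitialBlockDownload)
{
    return fIsInitialBlockDownload && !(nServices & NODE_NETWORK);
}
```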

* Increase the default max outbound connections to 16

Having a few more outbound connections can help especially
during the process of IBD.

* During IBD, batch the block requests if possible.

By batching the blocks together we save a little bandwidth
and time in requesting blocks.

This method can then be applied to batching transactions,
which should yield a significant improvement during periods
where transaction rates are high.
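A hypothetical sketch of the batching idea: instead of one "getdata" message per block, collect the hashes destined for the same peer into a single inventory list and send one message. CInv/getdata naming follows the P2P protocol; the helper itself is illustrative.

```cpp
#include <cassert>
#include <string>
#include <vector>

// MSG_BLOCK is the inventory type for blocks in the P2P protocol.
static const int MSG_BLOCK = 2;

struct CInv
{
    int type;         // e.g. MSG_BLOCK
    std::string hash; // block hash (string stands in for uint256)
};

// Build one getdata payload instead of vHashes.size() separate messages.
std::vector<CInv> BuildBatchedGetData(const std::vector<std::string> &vHashes)
{
    std::vector<CInv> vGetData;
    vGetData.reserve(vHashes.size());
    for (const std::string &hash : vHashes)
        vGetData.push_back({MSG_BLOCK, hash});
    return vGetData; // caller sends the whole vector in one "getdata" message
}
```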

* Replace missing cs_main lock

* Fix formatting

* Fix locking order issue with cs_objectDownloader and cs_vNodes

cs_objectDownloader must be locked first to prevent a possible
deadlock.
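The rule being enforced can be sketched with std::mutex standing in for CCriticalSection: every code path takes cs_objectDownloader before cs_vNodes, so two threads can never hold the pair in opposite orders.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

std::mutex cs_objectDownloader;
std::mutex cs_vNodes;

// Records the acquisition order so the lock-order rule is visible/testable.
std::vector<std::string> LockInCorrectOrder()
{
    std::vector<std::string> order;
    std::lock_guard<std::mutex> lock1(cs_objectDownloader); // always first
    order.push_back("cs_objectDownloader");
    std::lock_guard<std::mutex> lock2(cs_vNodes); // always second
    order.push_back("cs_vNodes");
    return order;
}
```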

* Fix formatting

* Use Logging namespace to access LogAcceptCategory

LogAcceptCategory and the enum used to define the available
logging categories are defined inside the Logging namespace.
If used directly outside of util.{h,cpp}, a reference to that
namespace is needed.

This commit fixes issue bitcoin#950

* Don't favor XTHIN nodes when sending messages

This has a negative effect on IBD

* Use more descriptive variable names for nodes we are disconnecting

ptemp1 and ptemp2 are getting hard to follow. We'll say what they are:

pNonXthinNode and pNonNodeNetwork

* Take the cs_objDownloader lock earlier when requesting multiple objects

This prevents anyone else from asking for these same objects
before we've notified the request manager of their existence.
This could happen, for instance, during IBD where we look for
the next blocks to download but, before we've notified the
request manager of them all, we go back and possibly request
them again.

Also keeping the lock here allows the request manager to prepare
a better batch request of blocks and thereby keep a better order
of block requests.

* Call MarkBlocksAsInFlight() before asking for blocks

This ensures you don't receive any blocks back before you mark the block
in flight as could happen on regtest.

* Add/release node refs when making batch requests

We need to track the node refs correctly so we don't get
disconnected before we're done with our work.

* Don't ask for the same blocks twice when doing IBD

If we've already asked for a block, we don't have to ask for
it again in FindNextBlocksToDownload(); instead we can rely on
the request manager to handle potential re-requests.
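The dedup can be sketched with a set of already-scheduled hashes (illustrative names; a string stands in for uint256): a block is handed to the request manager at most once, and timeouts/re-requests stay the manager's job.

```cpp
#include <cassert>
#include <set>
#include <string>

struct Downloader
{
    std::set<std::string> setRequested; // blocks already handed to the request manager

    // Returns true only the first time a block hash is scheduled;
    // repeat calls are no-ops.
    bool ScheduleBlock(const std::string &hash)
    {
        return setRequested.insert(hash).second;
    }
};
```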

* Calculate MaxBlocksInTransitPerPeer on an individual node basis

Before, we were using the overall response times regardless of
which peer they came from. This made the selection of how many
blocks to download at one time unresponsive to faster or
slower nodes. By tracking response times on a per node basis
we can deprioritize slower downloaders and get more blocks
from the faster nodes.

In HEADERS net processing, remove MAX_BLOCKS_IN_TRANSIT_PER_PEER

  This has been replaced with pnode->nMaxBlocksInTransitPerPeer

Use atomic store, load, and fetch_add.
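A sketch of the per-node sizing (the field names follow the diff; the moving average and the thresholds are assumptions, not the actual formula): keep a running average block response time per node and let fast nodes carry more blocks in flight than slow ones.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct NodeTimings
{
    double nAvgBlkResponseTime = -1.0; // seconds; -1 means "no sample yet"
    std::atomic<int64_t> nMaxBlocksInTransit{16};

    void AddSample(double seconds)
    {
        if (nAvgBlkResponseTime < 0)
            nAvgBlkResponseTime = seconds; // first sample
        else // exponential moving average: 90% history, 10% new sample
            nAvgBlkResponseTime = 0.9 * nAvgBlkResponseTime + 0.1 * seconds;

        // Faster average response => larger in-flight window.
        // (Bounds here are illustrative only.)
        if (nAvgBlkResponseTime < 0.5)
            nMaxBlocksInTransit.store(32);
        else if (nAvgBlkResponseTime < 2.0)
            nMaxBlocksInTransit.store(16);
        else
            nMaxBlocksInTransit.store(4);
    }
};
```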

* Fixes bitcoin#941: Rotate vNodes by one peer every 60 seconds when sending messages

We do this only during IBD and it has the effect of
distributing the load more evenly between the peers.
Previously, because we use PV, the very first peer to connect
would always be favored and we would end up downloading a
disproportionate amount of blocks from that peer during IBD.
However, we will still download more from some peers based on
their download response time performance.
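The rotation itself is a one-liner with std::rotate, as in the diff's ThreadMessageHandler() change; here ints stand in for CNode pointers. Moving the last element to the front once a minute changes which peer is serviced first.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Rotate the node list by one: the last node becomes the first.
void RotateNodes(std::vector<int> &vNodes)
{
    if (vNodes.size() > 1)
        std::rotate(vNodes.begin(), vNodes.end() - 1, vNodes.end());
}
```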

* Small net optimization when sending messages

Just copy the entire vector rather than bothering to allocate
memory and then push_back each entry as we iterate through
vNodes.

Make sure to LOCK cs_vNodes first, then add the ref

* Fix getchaintips.py spurious failures

The hang results from a PV bug which is caused by a condition
where threads that are waiting for validation are forced to
quit during chain tip rollback. We don't in fact have to force
any threads to quit, since any validation threads on the
current chain will fail automatically once we disconnect the
tip. And then the waiting block validation thread for the new
chain can still continue without being inadvertently forced
to quit.

* Do not use an incorrect or invalid state

Use the node id from the iterator rather than looking up state
by the node we think we just passed in. Just in case they don't match
for some reason.

Also DbgAssert if the state is NULL and return false if we assert.

* Small edit to logprint

Add category BLK to logprint

remove printf

* During IBD only request blocks from peers that have block availability

Request from peers that are NODE_NETWORK and have demonstrated that
they have the blocks that we need.

Also move ProcessBlockAvailability() and
UpdateBlockAvailability() to requestManager.cpp

* Move FindNextBlocksToDownload() to requestManager.cpp

Also change all NULLs to nullptr.

* Move MarkBlockAsInFlight() and MarkBlockAsReceived() to requestManager.cpp

This is a move only. Although there are edits in order to
bring these into the requestManager class structure, there
are no logic changes.

C++ nullptr replaces NULLs

* Update block availability during the process of downloading headers

During initial sync we have to download the set of all
headers, but we also need to update the block availability for
each connected peer so that when the request manager starts
downloading blocks it has the correct list of peers that have
blocks available to download, rather than just assuming every
peer has all blocks.

* EXIT cs_objDownloader earlier.

We don't have to repeatedly take and release the locks.
Just do it once for all items in the entire batch request.

* Only ask for headers to update block availability if chain work is behind

When doing the initial request for headers, and we're in the
process of also updating the block availability for each
connected peer, we only need to ask for a header if our chain
work is actually behind that of a peer. This prevents, first,
requesting headers we don't need, and second, a possible
attack where we're being fed an invalid group of headers which
then causes us to request a large number of additional
headers.
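A sketch combining the new nodestate flag with the chain-work guard (the flag name matches the diff; the rest is illustrative, with uint64_t standing in for uint256 chain-work arithmetic): probe a peer's block availability with one getheaders, but only when we are actually behind.

```cpp
#include <cassert>
#include <cstdint>

struct AvailabilityState
{
    bool fRequestedInitialBlockAvailability = false;
};

// Decide whether to send the availability-probing getheaders to this peer.
bool ShouldRequestAvailabilityHeaders(AvailabilityState &state, uint64_t nOurWork, uint64_t nPeerTipWork)
{
    if (state.fRequestedInitialBlockAvailability)
        return false; // already probed this peer
    if (nPeerTipWork <= nOurWork)
        return false; // nothing to learn: we have at least as much work
    state.fRequestedInitialBlockAvailability = true;
    return true;
}
```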
ptschip authored and gandrewstone committed Mar 2, 2018
1 parent dbe1288 commit f7cb6ff
Showing 10 changed files with 632 additions and 447 deletions.
434 changes: 105 additions & 329 deletions src/main.cpp

Large diffs are not rendered by default.

9 changes: 0 additions & 9 deletions src/main.h
@@ -105,12 +105,6 @@ static const unsigned int VERACK_TIMEOUT = 60;
/** Number of headers sent in one getheaders result. We rely on the assumption that if a peer sends
* less than this number, we reached its tip. Changing this value is a protocol upgrade. */
static const unsigned int MAX_HEADERS_RESULTS = 2000;
/** Size of the "block download window": how far ahead of our current height do we fetch?
* Larger windows tolerate larger download speed differences between peer, but increase the potential
* degree of disordering of blocks on disk (which make reindexing and in the future perhaps pruning
* harder). We'll probably want to make this a per-peer adaptive value at some point. */
// static const unsigned int BLOCK_DOWNLOAD_WINDOW = 1024;
/** Time to wait (in seconds) between writing blocks/block index to disk. */
static const unsigned int DATABASE_WRITE_INTERVAL = 60 * 60;
/** Time to wait (in seconds) between flushing chainstate to disk. */
static const unsigned int DATABASE_FLUSH_INTERVAL = 24 * 60 * 60;
@@ -435,9 +429,6 @@ bool TestLockPointValidity(const LockPoints *lp);
*/
bool CheckSequenceLocks(const CTransaction &tx, int flags, LockPoints *lp = NULL, bool useExistingLockPoints = false);

/** Update tracking information about which blocks a peer is assumed to have. */
void UpdateBlockAvailability(NodeId nodeid, const uint256 &hash);

/**
* Class that keeps track of number of signature operations
* and bytes hashed to compute signature hashes.
55 changes: 37 additions & 18 deletions src/net.cpp
@@ -1745,7 +1745,8 @@ void ThreadOpenConnections()
int nOutbound = 0;
int nThinBlockCapable = 0;
set<vector<unsigned char> > setConnected;
CNode *ptemp = nullptr;
CNode *pNonXthinNode = nullptr;
CNode *pNonNodeNetwork = nullptr;
bool fDisconnected = false;
{
LOCK(cs_vNodes);
@@ -1759,7 +1760,11 @@
if (pnode->ThinBlockCapable())
nThinBlockCapable++;
else
ptemp = pnode;
pNonXthinNode = pnode;

// If sync is not yet complete then disconnect any pruned outbound connections
if (IsInitialBlockDownload() && !(pnode->nServices & NODE_NETWORK))
pNonNodeNetwork = pnode;
}
}
// Disconnect a node that is not XTHIN capable if all outbound slots are full and we
@@ -1768,9 +1773,18 @@
if (nOutbound >= nMaxOutConnections && nThinBlockCapable <= min(nMinXthinNodes, nMaxOutConnections) &&
nDisconnects < MAX_DISCONNECTS && IsThinBlocksEnabled() && IsChainNearlySyncd())
{
if (ptemp != nullptr)
if (pNonXthinNode != nullptr)
{
ptemp->fDisconnect = true;
pNonXthinNode->fDisconnect = true;
fDisconnected = true;
nDisconnects++;
}
}
else if (IsInitialBlockDownload())
{
if (pNonNodeNetwork != nullptr)
{
pNonNodeNetwork->fDisconnect = true;
fDisconnected = true;
nDisconnects++;
}
@@ -1793,8 +1807,12 @@
MilliSleep(500);
{
LOCK(cs_vNodes);
if (find(vNodes.begin(), vNodes.end(), ptemp) == vNodes.end())
if (find(vNodes.begin(), vNodes.end(), pNonXthinNode) == vNodes.end() ||
find(vNodes.begin(), vNodes.end(), pNonNodeNetwork) == vNodes.end())
{
nOutbound--;
break;
}
}
}
}
@@ -2081,23 +2099,22 @@ void ThreadMessageHandler()
vector<CNode *> vNodesCopy;
{
LOCK(cs_vNodes);
vNodesCopy.reserve(vNodes.size());
// Prefer thinBlockCapable nodes when doing communications.
for (CNode *pnode : vNodes)

// During IBD and because of the multithreading of PV we end up favoring the first peer that
// connected and end up downloading a disproportionate amount of data from that first peer.
// By rotating vNodes every time we send messages we can alleviate this problem.
// Rotate every 60 seconds so we don't do this too often.
static int64_t nLastRotation = GetTime();
if (IsInitialBlockDownload() && vNodes.size() > 0 && GetTime() - nLastRotation > 60)
{
if (pnode->ThinBlockCapable())
{
vNodesCopy.push_back(pnode);
pnode->AddRef();
}
std::rotate(vNodes.begin(), vNodes.end() - 1, vNodes.end());
nLastRotation = GetTime();
}

vNodesCopy = vNodes;
for (CNode *pnode : vNodes)
{
if (!pnode->ThinBlockCapable())
{
vNodesCopy.push_back(pnode);
pnode->AddRef();
}
pnode->AddRef();
}
}

@@ -2807,6 +2824,8 @@ CNode::CNode(SOCKET hSocketIn, const CAddress &addrIn, const std::string &addrNa
nXthinBloomfilterSize = 0;
addrFromPort = 0; // BU
nLocalThinBlockBytes = 0;
nAvgBlkResponseTime = -1.0;
nMaxBlocksInTransit = 16;

nMisbehavior = 0;
fShouldBan = false;
6 changes: 5 additions & 1 deletion src/net.h
@@ -79,7 +79,7 @@ static const size_t SETASKFOR_MAX_SZ = 2 * MAX_INV_SZ;
/** The maximum number of peer connections to maintain. */
static const unsigned int DEFAULT_MAX_PEER_CONNECTIONS = 125;
/** BU: The maximum numer of outbound peer connections */
static const unsigned int DEFAULT_MAX_OUTBOUND_CONNECTIONS = 12;
static const unsigned int DEFAULT_MAX_OUTBOUND_CONNECTIONS = 16;
/** BU: The minimum number of xthin nodes to connect */
static const uint8_t MIN_XTHIN_NODES = 8;
/** BU: The daily maximum disconnects while searching for xthin nodes to connect */
@@ -392,6 +392,10 @@ class CNode
uint32_t nXthinBloomfilterSize; // The maximum xthin bloom filter size (in bytes) that our peer will accept.
// BUIP010 Xtreme Thinblocks: end section

CCriticalSection cs_nAvgBlkResponseTime;
double nAvgBlkResponseTime;
std::atomic<int64_t> nMaxBlocksInTransit;

unsigned short addrFromPort;

protected:
9 changes: 5 additions & 4 deletions src/nodestate.cpp
@@ -11,16 +11,17 @@
*/
CNodeState::CNodeState()
{
pindexBestKnownBlock = NULL;
pindexBestKnownBlock = nullptr;
hashLastUnknownBlock.SetNull();
pindexLastCommonBlock = NULL;
pindexBestHeaderSent = NULL;
pindexLastCommonBlock = nullptr;
pindexBestHeaderSent = nullptr;
fSyncStarted = false;
nDownloadingSince = 0;
nBlocksInFlight = 0;
nBlocksInFlightValidHeaders = 0;
fPreferredDownload = false;
fPreferHeaders = false;
fRequestedInitialBlockAvailability = false;
}

/**
@@ -33,6 +34,6 @@ CNodeState *State(NodeId pnode)
{
std::map<NodeId, CNodeState>::iterator it = mapNodeState.find(pnode);
if (it == mapNodeState.end())
return NULL;
return nullptr;
return &it->second;
}
3 changes: 3 additions & 0 deletions src/nodestate.h
@@ -48,6 +48,9 @@ struct CNodeState
bool fFirstHeadersReceived;
//! Our current block height at the time we requested GETHEADERS
int nFirstHeadersExpectedHeight;
//! During IBD we need to update the block availability for each peer. We do this by requesting a header
//! when a peer connects and also when we ask for the initial set of all headers.
bool fRequestedInitialBlockAvailability;

std::list<QueuedBlock> vBlocksInFlight;
//! When the first entry in vBlocksInFlight started downloading. Don't care when vBlocksInFlight is empty.
