Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing support for emptyBlockPeriodSeconds in QBFT (Issue #3810) #6965

Merged
merged 44 commits into from
Sep 24, 2024

Conversation

amsmota
Copy link
Contributor

@amsmota amsmota commented Apr 18, 2024

PR description

Implementing xemptyblockperiodseconds for producing empty blocks at a specific interval independently of the value of the existing blockperiodseconds setting

Fixed Issue(s)

#3810

Thanks for sending a pull request! Have you done the following?

  • Checked out our contribution guidelines?
  • Considered documentation and added the doc-change-required label to this PR if updates are required.
  • Considered the changelog and included an update if required.
  • For database changes (e.g. KeyValueSegmentIdentifier) considered compatibility and performed forwards and backwards compatibility tests

Locally, you can run these tests to catch failures early:

  • unit tests: ./gradlew build
  • acceptance tests: ./gradlew acceptanceTest
  • integration tests: ./gradlew integrationTest
  • reference tests: ./gradlew ethereum:referenceTests:referenceTests

@amsmota
Copy link
Contributor Author

amsmota commented Apr 19, 2024

Hi @siladu and @matthew1001 can youse please review this one?

Cheers.

Copy link
Contributor

@matthew1001 matthew1001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few more comments. Was there any follow up to my earlier comment about what the behaviour should be if a new TX arrives during an empty block interval? I'm not sure waiting the entire remaining empty block interval to include the TX in a block is the right thing to do.

@matthew1001
Copy link
Contributor

Hi @siladu and @matthew1001 can youse please review this one?

Cheers.

Thanks for updating the PR @amsmota I've left a few more comments, the main one being about what the behaviour is when a new TX arrives during the empty block period?

@amsmota
Copy link
Contributor Author

amsmota commented Apr 22, 2024

Left a few more comments. Was there any follow up to my earlier comment about what the behaviour should be if a new TX arrives during an empty block interval? I'm not sure waiting the entire remaining empty block interval to include the TX in a block is the right thing to do.

Hi @matthew1001 , thanks for that. In relation to that comment above I did reply in the previous PR, as per my tests everithing looks OK, please look there the test log.

#6944 (comment)

I'll review your new comments ASAP too.

Cheers.

Signed-off-by: amsmota <amsmota@gmail.com>
@macfarla macfarla mentioned this pull request Apr 24, 2024
Signed-off-by: amsmota <amsmota@gmail.com>
@matthew1001
Copy link
Contributor

I've just pulled your latest commits on the PR @amsmota and I'm still seeing the behaviour of a new block not being produced if a TX arrives during the empty block period. E.g. if I have 2 second block interval, 30 second empty block interval the behaviour is:

  1. Block 1 - empty
    • (30 seconds)
  2. Block 2 - empty
    • (30 seconds)
  3. Block 3 - empty
    • (10 seconds pass, then I submit a new TX)
    • (20 more seconds until...)
  4. Block 4 - 1 TX

I'm not sure that's the right behaviour. I think it probably ought to be something like:

  1. Block 1 - empty
    • (30 seconds)
  2. Block 2 - empty
    • (30 seconds)
  3. Block 3 - empty
    • (10 seconds pass, then I submit a new TX)
    • block period timer of 2 seconds starts
    • (2 more seconds until...).
  4. Block 4 - 1 TX

It would be good to see what others think though. @siladu any thoughts?

@amsmota
Copy link
Contributor Author

amsmota commented Apr 24, 2024

It is not the behaviour I see in my tests. Did you saw the comment I mentioned above?

#3810 (comment)

2024-04-13 **15:10:57**.174+01:00 | QbftBesuControllerBuilder | Imported empty block #49 / 0 tx / 3 pending / 0 (0.0%) gas / (0x0ee851152f1a6faaa199d3f595234141c050966f7be46a65241f01908ce34d07)
2024-04-13 **15:11:27**.233+01:00 | PeerDiscoveryAgent | Writing node record to disk. NodeRecord{seq=2, publicKey=0x027c19f5309e268264c9a986704a2f24e01f466a541ef910f68cc831324ba1ee2c, udpAddress=Optional[/127.0.0.1:30303], tcpAddress=Optional[/127.0.0.1:30303], asBase64=-Je4QKHhLqyEBF13upUKqj-i_cALkD-_-h3yPWZgkdFdnsjZZZCyzpd63wDaWlVWriBpGu_JPciWtEhwgqy-XSiQp0YCg2V0aMfGhOcD3GaAgmlkgnY0gmlwhH8AAAGJc2VjcDI1NmsxoQJ8GfUwniaCZMmphnBKLyTgH0ZqVB75EPaMyDEyS6HuLIN0Y3CCdl-DdWRwgnZf, nodeId=0xc92329278474852a32bd72b721afbac4d309b7fa6e40862eaee847cc8f544045, customFields={tcp=30303, udp=30303, ip=0x7f000001, eth=[[0xe703dc66, 0x]], id=V4, secp256k1=0x027c19f5309e268264c9a986704a2f24e01f466a541ef910f68cc831324ba1ee2c}}
2024-04-13 **15:11:27**.258+01:00 | QbftBesuControllerBuilder | Imported empty block #50 / 0 tx / 3 pending / 0 (0.0%) gas / (0x4a43cebe16dc6a3c341cb92b109404580bb1bbe756cd2c855ecdc826d14ea9c1)
2024-04-13 **15:11:57**.359+01:00 | QbftBesuControllerBuilder | Imported empty block #51 / 0 tx / 3 pending / 0 (0.0%) gas / (0x02b83a1b6d33dcbafed7c6ceae9bd32cd757874151fb9232124681ac41befba1)
2024-04-13 **15:12:02**.377+01:00 | QbftBesuControllerBuilder | **Produced block #52** / 1 tx / 3 pending / 3,000,000 (63.8%) gas / (0x32dd8d703933b1de951c7f99d3a5e50b66e573494ecf8eafe003df578df1e204)
2024-04-13 **15:12:32**.236+01:00 | QbftBesuControllerBuilder | Imported empty block #53 / 0 tx / 3 pending / 0 (0.0%) gas / (0xc51bba3cc60545422f6f920827790ed5ba75306608e494501e970ae1cd84fbe9)

Look at non-empty block 52, it was produced after 5 seconds of the the last empty block and then the next one 30 seconds after (an empty-block). This was with

"blockperiodseconds" : 5,
"emptyblockperiodseconds" : 30

Can you show me the genesis config you used? And if you can post some logs please do.

Anyway, that test was 2 weeks ago, I'll repeat the test asap...

I've just pulled your latest commits on the PR @amsmota and I'm still seeing the behaviour of a new block not being produced if a TX arrives during the empty block period. E.g. if I have 2 second block interval, 30 second empty block interval the behaviour is:

  1. Block 1 - empty

    • (30 seconds)
  2. Block 2 - empty

    • (30 seconds)
  3. Block 3 - empty

    • (10 seconds pass, then I submit a new TX)
    • (20 more seconds until...)
  4. Block 4 - 1 TX

I'm not sure that's the right behaviour. I think it probably ought to be something like:

  1. Block 1 - empty

    • (30 seconds)
  2. Block 2 - empty

    • (30 seconds)
  3. Block 3 - empty

    • (10 seconds pass, then I submit a new TX)
    • block period timer of 2 seconds starts
    • (2 more seconds until...).
  4. Block 4 - 1 TX

It would be good to see what others think though. @siladu any thoughts?

amsmota added 4 commits April 24, 2024 18:39
Signed-off-by: amsmota <amsmota@gmail.com>
Signed-off-by: amsmota <amsmota@gmail.com>
Signed-off-by: amsmota <amsmota@gmail.com>
Signed-off-by: amsmota <amsmota@gmail.com>
@matthew1001
Copy link
Contributor

matthew1001 commented Apr 25, 2024

Sorry I couldn't find your previous comment, but that's great if that's the current behaviour. I think I pulled and re-built with your latest changes but I'll check. Here's the QBFT config I used:

"qbft" : {
      "blockperiodseconds" : 2,
      "emptyblockperiodseconds" : 30,
      "epochlength" : 30000,
      "requesttimeoutseconds" : 4
    }

I was using a single node to test though. It's possible that just using a single node gives different behaviour?

@amsmota
Copy link
Contributor Author

amsmota commented Apr 25, 2024

Sorry I couldn't find your previous comment, but that's great if that's the current behaviour. I think I pulled and re-built with your latest changes but I'll check. Here's the QBFT config I used:

"qbft" : {
      "blockperiodseconds" : 2,
      "emptyblockperiodseconds" : 30,
      "epochlength" : 30000,
      "requesttimeoutseconds" : 4
    }

I was using a single node to test though. It's possible that just using a single node gives different behaviour?

Please note that I'm not dismissing your findings, as I said the tests I did (my original post is in the Issue itself) were done 2 weeks ago, there were some merges and commits by now. I will test again as soon as I can, and I will test the single-node scenario. I suspect that won't behave as intended, at least considering that having my 4 nodes network with 1 node not correctly configured gave me all those issues I mentioned above.

My local network of 4 nodes was created following this page.

The rest of your comments are now all implemented too.

I see below those 4 workflows awaiting approval, what should I do to be able to run them?

@amsmota
Copy link
Contributor Author

amsmota commented Apr 25, 2024

Just tested again the emnpty/non-empty block times, it seems good to me. It's based on

  "blockperiodseconds": 10,
  "emptyblockperiodseconds": 30,

The block that contains 1 tx is the line 5: so the flow is

1: --> 30s --> 2: --> 30s --> 3: --> 30s --> 4: --> 10s --> 5: --> 30s --> 6: --> 30s --> 7: --> 30s --> 8:

1: 2024-04-25 16:22:10.096+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #88 / 0 tx / 0 pending / 0 (0.0%) gas / (0x77729663d055235bb84c8bd80425010dbce8dba6a348956d21352b9bfe30f3d8)
2: 2024-04-25 16:22:40.154+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Imported empty block #89 / 0 tx / 1 pending / 0 (0.0%) gas / (0x7224a81a9d0a53a23660d7103a859cbf09970a72f257ea4ce494ebe010e32e03)
3: 2024-04-25 16:23:10.148+01:00 | EthScheduler-Workers-0 | INFO  | PersistBlockTask | Imported #90 / 0 tx / 0 om / 0 (0.0%) gas / (0x5a4d0eb71ff7ea9c31fb49df659a7b0c97a7ff4d8e62202a40de98ff52e01df3) in 0.005s. Peers: 3
4: 2024-04-25 16:23:40.123+01:00 | EthScheduler-Workers-0 | INFO  | PersistBlockTask | Imported #91 / 0 tx / 0 om / 0 (0.0%) gas / (0xe11cf442d69007de17b954ddceaa527c1adad32ca35743d35ffd5ca120ef77af) in 0.001s. Peers: 3
5: 2024-04-25 16:23:50.138+01:00 | EthScheduler-Workers-0 | INFO  | PersistBlockTask | Imported #92 / 1 tx / 0 om / 3,000,000 (63.8%) gas / (0x7825a0381ee19519db2880f0546af325b15fd417a777052e53b83dd9bd865ebc) in 0.009s. Peers: 3
6: 2024-04-25 16:24:20.189+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Imported empty block #93 / 0 tx / 0 pending / 0 (0.0%) gas / (0x3a2f5bdc9d1976d97165b453925d9635d55e0165138ec77be9286dae671b937e)
7: 2024-04-25 16:24:50.183+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Imported empty block #94 / 0 tx / 0 pending / 0 (0.0%) gas / (0x032cdc8694057113331d3a6e4a2e743dcb4bfe77f2c9702725e8036b28836bea)
8: 2024-04-25 16:25:20.230+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Imported empty block #95 / 0 tx / 0 pending / 0 (0.0%) gas / (0x6cf6facb3fd887b490ab140cf76335b92eb7b503029c3c8543908123a2634ccc)

Anyway, to be sure is better if somebody else confirms this data.

@amsmota
Copy link
Contributor Author

amsmota commented Apr 25, 2024

One thing I notice in those logs above, in the previous logs I copied from tests all(most) blocks, profuced or imported, appeared on the QbftBesuControllerBuilder output, which is not the case now:

2024-04-25 16:22:10.096+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #88 / 0 tx / 0 pending / 0 (0.0%) gas / (0x77729663d055235bb84c8bd80425010dbce8dba6a348956d21352b9bfe30f3d8)
2024-04-25 16:22:40.154+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Imported empty block #89 / 0 tx / 1 pending / 0 (0.0%) gas / (0x7224a81a9d0a53a23660d7103a859cbf09970a72f257ea4ce494ebe010e32e03)
2024-04-25 16:23:10.148+01:00 | EthScheduler-Workers-0 | INFO  | PersistBlockTask | Imported #90 / 0 tx / 0 om / 0 (0.0%) gas / (0x5a4d0eb71ff7ea9c31fb49df659a7b0c97a7ff4d8e62202a40de98ff52e01df3) in 0.005s. Peers: 3
2024-04-25 16:23:40.123+01:00 | EthScheduler-Workers-0 | INFO  | PersistBlockTask | Imported #91 / 0 tx / 0 om / 0 (0.0%) gas / (0xe11cf442d69007de17b954ddceaa527c1adad32ca35743d35ffd5ca120ef77af) in 0.001s. Peers:  3

I wonder if this is expected ot it may indicate some issue...

I actually added some BPs on QbftBesuControllerBuilder and PersistBlockTask to check the upflow but I couldn't see any logic on the diference...

@matthew1001
Copy link
Contributor

I see below those 4 workflows awaiting approval, what should I do to be able to run them?

I've just approved them to run, so they're all going now

@matthew1001
Copy link
Contributor

Anyway, to be sure is better if somebody else confirms this data.

I'll pull your latest commits today and try with a 1 node and 4 node network, but given your findings I expected it's working fine now.

@amsmota
Copy link
Contributor Author

amsmota commented Apr 29, 2024

Anyway, to be sure is better if somebody else confirms this data.

I'll pull your latest commits today and try with a 1 node and 4 node network, but given your findings I expected it's working fine now.

I'm afraid I found a issue when testing with one node only but unfortunately I couldn't replicate again yet. It seems it happens with 2 non-empty blocks in a row:

...2024-04-29 13:04:30.025+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #72 / 0 tx / 0 pending / 0 (0.0%) gas / (0x0ed61b79ce98b122aca8c20088d514c0e2923d1047844fedabb031e6c2dc19f6)
...2024-04-29 13:05:00.030+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #73 / 0 tx / 0 pending / 0 (0.0%) gas / (0x76ca568e1d313d4f0e4b2e46493b0bf05948a0c18b7f01445e6dbcaf211ca9c2)
***2024-04-29 13:05:10.047+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #74 / 1 tx / 0 pending / 3,000,000 (63.8%) gas / (0xa81a1f4d165a8f13c70e2837e29b04b2434027ca4606e5d740ea9035e3006494)
***2024-04-29 13:05:40.038+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #75 / 1 tx / 0 pending / 3,000,000 (63.8%) gas / (0x1fda1fa7735207af755385f4a6d3686b33154e0621c754427e401ec58f8c847d)
...2024-04-29 13:06:10.034+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #76 / 0 tx / 0 pending / 0 (0.0%) gas / (0x5313c0d4ab90ef06698758268d9e5d1f08fbea769c86f7901733410b03e34c3b)
...2024-04-29 13:06:40.021+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #77 / 0 tx / 0 pending / 0 (0.0%) gas / (0xcc5b47a2528df351db18a8500c30a8a554a45fa97170db1ae5034ce8f254ff19)

Block 74 is created correctly but the next one is not, it should take 10 seconds and it took 30... It's this what you observed @matthew1001 ?

I'm testing with remix and metamask, and it's hard to send 2 transactions in a row, is there a more expedite way to do it?

Cheers.

Signed-off-by: amsmota <amsmota@gmail.com>
@matthew1001
Copy link
Contributor

matthew1001 commented Apr 29, 2024

I've just pulled your latest changes and re-tested and I do see the same issue still. I've captured some log snippets of Besu with timestamps and a hardhat script with timestamps to can see the timings. This is a 4 validator network with 2 second block period, 30 second empty block period:

Here are the Besu logs:

2024-04-29 17:20:05.062+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Imported empty block #644 / 0 tx / 0 pending / 0 (0.0%) gas / (0xedba0be8a01e17273fb5f452e8098db74d0e503edc264615b2db58a1b64602e5)
2024-04-29 17:20:35.084+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Imported empty block #645 / 0 tx / 0 pending / 0 (0.0%) gas / (0xc56079c814d91d7098394036a7a4dd6d55353627b7b4f442e69b418d108e5d06)
2024-04-29 17:21:05.281+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #646 / 10 tx / 0 pending / 19,048,280 (0.0%) gas / (0x11e38bc403d73d7657580365d334f90560c277768924067f14ce3a36695f7e45)

Here is the hardhat script which deploys 10 contracts and prints the timestamp before and after:

Mon Apr 29 17:20:39 BST 2024
Next nonce (latest):  351
Deployed contract. Address = 0xf7ED3b17A04A3bD2335b52D27821199C1B4e597E
Deployed contract. Address = 0xcF2D1dD61af473529ee0b539D7d0b14290Eb50Ad
Deployed contract. Address = 0x05B59cdc7244e378E9d75b0aF3BbB211ffC3d9D9
Deployed contract. Address = 0x1B4f02a84534a6576844607b3c8E4e60aF311B38
Deployed contract. Address = 0x66EA9eF99fc1E3650712c4194090B0C65128A58A
Deployed contract. Address = 0x6AF6D200c6047dab80420ABfab8bb09004A8C1A8
Deployed contract. Address = 0x4de391161FaCe36eAb1135368267b74490119329
Deployed contract. Address = 0x19f86338cE087E7D4cE087Cc25B8B07ED810CEF2
Deployed contract. Address = 0x3849a4B696Ea143D59DFbcC4AaF32ad47Fa59986
Deployed contract. Address = 0x3a273e5976409598ABc79c2154d78b62259cB438
Next nonce (latest):  351
Next nonce (pending):  361
Mon Apr 29 17:20:44 BST 2024

So by 17:20:44 all 10 transactions are in the TX pool. But the next block isn't until 17:21:05

@amsmota
Copy link
Contributor Author

amsmota commented Apr 30, 2024

Hi @matthew1001 , that seems different from what I saw in my one-node test, but it's hard for me to test correctly with only remix and metamask. I'm not very familiar with that hardhat, even if I;m installing right now.Can you send me that script so I can test the exact same way as you?

Thanks.

I've just pulled your latest changes and re-tested and I do see the same issue still. I've captured some log snippets of Besu with timestamps and a hardhat script with timestamps to can see the timings. This is a 4 validator network with 2 second block period, 30 second empty block period:

Here are the Besu logs:

2024-04-29 17:20:05.062+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Imported empty block #644 / 0 tx / 0 pending / 0 (0.0%) gas / (0xedba0be8a01e17273fb5f452e8098db74d0e503edc264615b2db58a1b64602e5)
2024-04-29 17:20:35.084+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Imported empty block #645 / 0 tx / 0 pending / 0 (0.0%) gas / (0xc56079c814d91d7098394036a7a4dd6d55353627b7b4f442e69b418d108e5d06)
2024-04-29 17:21:05.281+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #646 / 10 tx / 0 pending / 19,048,280 (0.0%) gas / (0x11e38bc403d73d7657580365d334f90560c277768924067f14ce3a36695f7e45)

Here is the hardhat script which deploys 10 contracts and prints the timestamp before and after:

Mon Apr 29 17:20:39 BST 2024
Next nonce (latest):  351
Deployed contract. Address = 0xf7ED3b17A04A3bD2335b52D27821199C1B4e597E
Deployed contract. Address = 0xcF2D1dD61af473529ee0b539D7d0b14290Eb50Ad
Deployed contract. Address = 0x05B59cdc7244e378E9d75b0aF3BbB211ffC3d9D9
Deployed contract. Address = 0x1B4f02a84534a6576844607b3c8E4e60aF311B38
Deployed contract. Address = 0x66EA9eF99fc1E3650712c4194090B0C65128A58A
Deployed contract. Address = 0x6AF6D200c6047dab80420ABfab8bb09004A8C1A8
Deployed contract. Address = 0x4de391161FaCe36eAb1135368267b74490119329
Deployed contract. Address = 0x19f86338cE087E7D4cE087Cc25B8B07ED810CEF2
Deployed contract. Address = 0x3849a4B696Ea143D59DFbcC4AaF32ad47Fa59986
Deployed contract. Address = 0x3a273e5976409598ABc79c2154d78b62259cB438
Next nonce (latest):  351
Next nonce (pending):  361
Mon Apr 29 17:20:44 BST 2024

So by 17:20:44 all 10 transactions are in the TX pool. But the next block isn't until 17:21:05

@amsmota
Copy link
Contributor Author

amsmota commented May 1, 2024

Here's a test I did today with a one-node network based on a 5/10 seconds intervals. Block 3 (the first non-empty) took `10 seconds when it should take 5. However, all the other non-empty blocks seem OK.

2024-05-01 19:44:41.377+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #1 / 0 tx / 0 pending /
2024-05-01 19:44:51.022+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #2 / 0 tx / 0 pending /
2024-05-01 19:45:01.038+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #3 / 2 tx / 0 pending /
2024-05-01 19:45:06.019+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #4 / 1 tx / 0 pending /
2024-05-01 19:45:11.027+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #5 / 1 tx / 0 pending /
2024-05-01 19:45:16.042+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #6 / 1 tx / 0 pending /
2024-05-01 19:45:26.023+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #7 / 0 tx / 0 pending /
2024-05-01 19:45:36.031+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #8 / 0 tx / 0 pending /
2024-05-01 19:45:46.017+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #9 / 0 tx / 0 pending /
2024-05-01 19:45:51.037+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #10 / 2 tx / 0 pending /
2024-05-01 19:45:56.039+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #11 / 1 tx / 0 pending /
2024-05-01 19:46:06.026+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #12 / 0 tx / 0 pending /
2024-05-01 19:46:16.024+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #13 / 0 tx / 0 pending /
2024-05-01 19:46:26.021+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #14 / 0 tx / 0 pending /

I'll try again to check if that 1st non-empty block is reproducible...

@amsmota
Copy link
Contributor Author

amsmota commented May 2, 2024

Here's my latest test with 5/10 seconds, one-node network. I logged some messages to try and understand the issue.

2024-05-02 18:37:46.387+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #1 / 0 tx / 0 pending /

2024-05-02 18:37:46.402+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO BLOCKPERIODSECONDS:  5
2024-05-02 18:37:51.018+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has NO transactions but EMPTYBLOCKPERIODEXPIRYTIME NOT EXPIRED YET: ConsensusRoundIdentifier{Sequence=2, Round=0}
2024-05-02 18:37:51.018+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO EMPTYBLOCKPERIODSECONDS:  10
2024-05-02 18:37:56.010+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has no transactions and this node is a proposer so it will send a proposal: ConsensusRoundIdentifier{Sequence=2, Round=0}
2024-05-02 18:37:56.044+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #2 / 0 tx / 0 pending /
CORRECT

2024-05-02 18:37:56.064+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO BLOCKPERIODSECONDS:  5
2024-05-02 18:38:01.021+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has transactions and this node is a proposer so it will send a proposal: ConsensusRoundIdentifier{Sequence=3, Round=0}
2024-05-02 18:38:01.063+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #3 / 3 tx / 0 pending /
CORRECT

2024-05-02 18:38:01.079+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO BLOCKPERIODSECONDS:  5
2024-05-02 18:38:06.011+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has transactions and this node is a proposer so it will send a proposal: ConsensusRoundIdentifier{Sequence=4, Round=0}
2024-05-02 18:38:06.028+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #4 / 2 tx / 0 pending /
CORRECT

2024-05-02 18:38:06.041+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO BLOCKPERIODSECONDS:  5
2024-05-02 18:38:11.016+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has NO transactions but EMPTYBLOCKPERIODEXPIRYTIME NOT EXPIRED YET: ConsensusRoundIdentifier{Sequence=5, Round=0}
2024-05-02 18:38:11.017+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO EMPTYBLOCKPERIODSECONDS:  10
2024-05-02 18:38:16.008+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has no transactions and this node is a proposer so it will send a proposal: ConsensusRoundIdentifier{Sequence=5, Round=0}
2024-05-02 18:38:16.023+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #5 / 0 tx / 0 pending /
CORRECT

2024-05-02 18:38:16.034+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO BLOCKPERIODSECONDS:  5
2024-05-02 18:38:21.010+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has NO transactions but EMPTYBLOCKPERIODEXPIRYTIME NOT EXPIRED YET: ConsensusRoundIdentifier{Sequence=6, Round=0}
2024-05-02 18:38:21.010+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO EMPTYBLOCKPERIODSECONDS:  10
2024-05-02 18:38:26.009+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has transactions and this node is a proposer so it will send a proposal: ConsensusRoundIdentifier{Sequence=6, Round=0}
2024-05-02 18:38:26.024+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #6 / 1 tx / 0 pending /
INCORRECT

2024-05-02 18:38:26.038+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO BLOCKPERIODSECONDS:  5
2024-05-02 18:38:31.016+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has NO transactions but EMPTYBLOCKPERIODEXPIRYTIME NOT EXPIRED YET: ConsensusRoundIdentifier{Sequence=7, Round=0}
2024-05-02 18:38:31.017+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO EMPTYBLOCKPERIODSECONDS:  10
2024-05-02 18:38:36.023+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has transactions and this node is a proposer so it will send a proposal: ConsensusRoundIdentifier{Sequence=7, Round=0}
2024-05-02 18:38:36.037+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #7 / 1 tx / 0 pending /
INCORRECT

2024-05-02 18:38:36.045+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO BLOCKPERIODSECONDS:  5
2024-05-02 18:38:41.008+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has NO transactions but EMPTYBLOCKPERIODEXPIRYTIME NOT EXPIRED YET: ConsensusRoundIdentifier{Sequence=8, Round=0}
2024-05-02 18:38:41.008+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO EMPTYBLOCKPERIODSECONDS:  10
2024-05-02 18:38:46.008+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has no transactions and this node is a proposer so it will send a proposal: ConsensusRoundIdentifier{Sequence=8, Round=0}
2024-05-02 18:38:46.024+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced empty block #8 / 0 tx / 0 pending /
CORRECT

You'll notice that the 2 INCORRECT ones have one thing in common, the 2 QbftBlockHeightManager lines have the first saying Block has NO transactions and the second Block has transactions !!!

2024-05-02 18:38:16.034+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO BLOCKPERIODSECONDS:  5
2024-05-02 18:38:21.010+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has NO transactions but EMPTYBLOCKPERIODEXPIRYTIME NOT EXPIRED YET: ConsensusRoundIdentifier{Sequence=6, Round=0}
2024-05-02 18:38:21.010+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO EMPTYBLOCKPERIODSECONDS:  10
2024-05-02 18:38:26.009+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has transactions and this node is a proposer so it will send a proposal: ConsensusRoundIdentifier{Sequence=6, Round=0}
2024-05-02 18:38:26.024+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #6 / 1 tx / 0 pending /
INCORRECT

2024-05-02 18:38:26.038+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO BLOCKPERIODSECONDS:  5
2024-05-02 18:38:31.016+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has NO transactions but EMPTYBLOCKPERIODEXPIRYTIME NOT EXPIRED YET: ConsensusRoundIdentifier{Sequence=7, Round=0}
2024-05-02 18:38:31.017+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | BlockTimer | NEW CURRENTBLOCKPERIODSECONDS SET TO EMPTYBLOCKPERIODSECONDS:  10
2024-05-02 18:38:36.023+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBlockHeightManager | Block has transactions and this node is a proposer so it will send a proposal: ConsensusRoundIdentifier{Sequence=7, Round=0}
2024-05-02 18:38:36.037+01:00 | BftProcessorExecutor-QBFT-0 | INFO  | QbftBesuControllerBuilder | Produced block #7 / 1 tx / 0 pending /
INCORRECT

So after staring at the screen and the occasionally banging my head on the table, I remembered the phrase of that Vulcan guy (or was it Sherlock Holmes?) "When you eliminate the impossible, whatever remains, however improbable, must be the truth" and the only explanation I can find is this:

AFAICS the way this works is:

  1. A BlockTimer is created
  2. When that timer expires the handleBlockTimerExpiry kicks in
  3. If the node is a Proposer, it send a ProposalMessage to the other nodes and at some point the block is added to the chain

The above is what's happening at the moment, with no emptyBlockPeriodSeconds... With it will be:

  1. A BlockTimer is created
  2. When that timer expires the handleBlockTimerExpiry kicks in
  3. If the node is a Proposer AND has TXs it send a ProposalMessage to the other nodes and at some point the block is added to the chain
  4. If the block HAS NO TXs
    • it checks if the emptyBlockPeriodSeconds also has expired
    • if yes, it sends a ProposalMessage to the other nodes and at some point the block is added to the chain
    • if not, it sets the BlockTimer to the emptyBlockPeriodSeconds
    • at some point the BlockTimer with the new period expires
    • and gets back to 1.

So, the only explanation for those INCORRECT blocks is that they got their transaction between the 5th and the 10th second after the previous block... It's only logical...

If this explanation is correct, then it's very bad because there is no way to correct it because, well, there's nothing to correct, it's just the way it is... There may be some solution but nothing that comes to mind now... Or I may be wrong, of course...

What do you guys think?

P.S. I wonder if we can create a PendingTransactionAddedListener in the QbftBlockHeightManager, that may do the trick...

@amsmota
Copy link
Contributor Author

amsmota commented May 7, 2024

Hi @siladu and @matthew1001 and everybody, I need some help here related to my last comment, at the moment I think this PR is working correctly (if my findings on my last post are correct), I have the idea of having a mempool listener in the QbftBlockHeightManager but I'm afraid that even if that works it'll add too much complexity.

Just to recall, the only explanation for the INCORRECT blocks is that the non-empty-bock was created after 10 seconds and not 5 is because the block got the transaction(s) between the 5th and the 10th second after the previous block...

Comment PLEASE...

@matthew1001
Copy link
Contributor

Thanks for doing some extra testing @amsmota At the moment I'm going to find it hard to put the time in to diagnose things in detail. I'm wondering if we want to consider putting the feature in documented as experimental since the basic behaviour seems to work and the exact behaviour in certain situations just needs sorting out. The main thing would be to ensure that we're confident nothing has regressed in the basic case (i.e. when the user hasn't set emptyblockperiodseconds).

@siladu what are your thoughts? @non-fungible-nelson do you have a view on whether this could be released as experimental to get more people trying it out and more focus over the coming releases on ironing out any remaining issues?

@amsmota
Copy link
Contributor Author

amsmota commented May 8, 2024

Thanks for doing some extra testing @amsmota At the moment I'm going to find it hard to put the time in to diagnose things in detail.

Yeah, I totally understand that, and I thank you for your valuable time. 💯

I'm wondering if we want to consider putting the feature in documented as experimental since the basic behaviour seems to work and the exact behaviour in certain situations just needs sorting out.

Well, I don't know what exactly implies being "experimental" but I agree with it...

The main thing would be to ensure that we're confident nothing has regressed in the basic case (i.e. when the user hasn't set emptyblockperiodseconds).

I'm pretty sure that there is no problem there, but looking at the code I do see that there are no provision or validation for that, I may change it to make it explicitly, and anyway I'm going to test that scenario more exhaustively

@siladu what are your thoughts? @non-fungible-nelson do you have a view on whether this could be released as experimental to get more people trying it out and more focus over the coming releases on ironing out any remaining issues?

That part of having more people trying it out I this it's crucial. Being tested by only a couple of persons is too limited, we really need a broader area of testing...

Anyway, I'm going to keep on testing, going to create a "standard" test to send transactions in a predefined way, not randomly like I'm doing now, and improve my 4 nodes network because it's very difficult to aggregate meaningful info from all nodes...

Thank youse all for your support.

@matthew1001
Copy link
Contributor

matthew1001 commented Sep 20, 2024

I don't have permission to push commits to your branch, but I've pushed your branch + a merge commit to my repo. If you take a look at c38970d you should be able to duplicate or cherry-pick that single commit into your branch.

Also you currently have a bunch of commented-out tests in the PR?

@amsmota
Copy link
Contributor Author

amsmota commented Sep 20, 2024

Also you currently have a bunch of commented-out tests in the PR?

Yeah, I commented on one of your suggestions above. for you to review

I remmeber now why I did that, since there was a createAndSendProposalMessage I actually break it in two, so I created

createBlock
sendProposalMessage
where createBlock + sendProposalMessage = createAndSendProposalMessage, so it have some kind of simetry.

But now I checked and createAndSendProposalMessage is not used anywhere except on 4 tests in QbftRoundTest, so I think we can delete both methods ands tests, what do you think?

The only thing I don't like is that IBFT has that createAndSendProposalMessage, it seems IBFT and QBFT are mirror implemenations, and this will break that simmetry... (yes, I suffer a liitle from OCD...).

Anyway, I commented both createAndSendProposalMessage and the tests for you to review...

@matthew1001
Copy link
Contributor

Yeah, I commented on one of your suggestions above. for you to review

Ah yes, apologies. I think if they're not needed they can be removed then. And any functions that are currently only called by tests can probably be removed as well.

Regarding the IBFT branch, I don't see any changes in the IBFT-specific code. Personally I'm OK with just concentrating on QBFT for now as it's the more commonly used protocol, and will be enough for the feature to be tested out before changing it from experimental in the future.

…radle props.

Signed-off-by: amsmota <amsmota@gmail.com>
@amsmota
Copy link
Contributor Author

amsmota commented Sep 20, 2024

Ok, just pushed with

  • Your change
  • Removed comented code
  • Added a few "x" in the empty block periods...

Please review your change but I think it's good...

Copy link
Contributor

@matthew1001 matthew1001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - excellent job on a challenging PR!

Could you add a CHANGELOG entry please? Then I think it's good to merge. I've approved so if you add one feel free to merge the PR.

@amsmota
Copy link
Contributor Author

amsmota commented Sep 20, 2024

Could you add a CHANGELOG entry please? Then I think it's good to merge. I've approved so if you add one feel free to merge the PR.

I just added it... 👍 There's still this message, can you check it?

Required statuses must pass before merging

Cheers.

Copy link
Contributor

@siladu siladu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @amsmota - just putting a blocker on this so it doesn't prematurely get merged before I get chance to re-review the latest changes, which I will do ASAP.

My most recent outstanding list was this:
#6965 (comment)

I believe you've resolved item (2) and (3), but I'm still not sure about item (1).

If this issue turns out to be limited to the experimental feature, I would be happy to merge the PR even if it requires a follow-up fix, however I just want to double check it's not a regression when the feature is disabled.

Also thanks @matthew1001 for taking the time to complete the review :)

@siladu
Copy link
Contributor

siladu commented Sep 24, 2024

@amsmota I approved the CI run and there are some NPEs in the unit tests
https://github.com/hyperledger/besu/actions/runs/10996157224/job/30554884643?pr=6965

Copy link
Contributor

@siladu siladu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you've resolved item (2) and (3), but I'm still not sure about item (1).

@amsmota I've retested the test case that was concerning me:

When no txs and round timer expires, node2 proposes

  • missing QbftBlockHeightManager | Round change from 0x... log messages

and the latest code appears to be working as I would expect now 🎉
At least, I can't detect any difference in round change behaviour/logs compared to the main branch (other than the expected empty block timings).

I want to echo @matthew1001's congrats on sticking with this PR and getting it through! It has also been a weight from my shoulders as I wasn't able to complete this before I had to move on to others tasks :)

I would be happy to merge this in once the unit tests are passing.

Signed-off-by: amsmota <amsmota@gmail.com>
Signed-off-by: amsmota <amsmota@gmail.com>
Signed-off-by: amsmota <amsmota@gmail.com>
@amsmota
Copy link
Contributor Author

amsmota commented Sep 24, 2024

@amsmota I approved the CI run and there are some NPEs in the unit tests https://github.com/hyperledger/besu/actions/runs/10996157224/job/30554884643?pr=6965

Hi @siladu, thanks for that, it was quite strange, I implemend that test back in April but for some reason one of the merges decided to drop one line (only one, line 160 below!!!) that caused this NPE now...

image

I just hope there are no missing lines elsewhere...

And thank you for your review and approval... 👍

@siladu siladu enabled auto-merge (squash) September 24, 2024 11:00
@siladu
Copy link
Contributor

siladu commented Sep 24, 2024

@amsmota I have enabled auto-merge. I don't have permission to merge main into your branch so you'll have to do the final step please...and actually I'll have to reapprove the CI build

@amsmota
Copy link
Contributor Author

amsmota commented Sep 24, 2024

I just did a merge from main, I hope nothing's wrong...

Cheers.

@siladu siladu merged commit aed6bb0 into hyperledger:main Sep 24, 2024
43 checks passed
Wolmin pushed a commit to lukso-network/network-besu that referenced this pull request Sep 27, 2024
…ledger#3810) (hyperledger#6965)

Implemented support for emptyBlockPeriodSeconds in QBFT (Issue hyperledger#3810)

Introduces experimental xemptyblockperiodseconds genesis config option for producing empty blocks at a specific interval independently of the value of the existing blockperiodseconds setting.

hyperledger#3810

---------

Signed-off-by: Antonio Mota <antonio.mota@citi.com>
Signed-off-by: amsmota <amsmota@gmail.com>
Signed-off-by: Wolmin <lamonos123@gmail.com>
@amsmota amsmota mentioned this pull request Oct 30, 2024
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants