Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: off-by-one causing "no further headers to download" bug #3264

Merged

Conversation

sdbondi
Copy link
Member

@sdbondi sdbondi commented Aug 30, 2021

Description

When entering the synchonize_headers function, a chain of headers has
been downloaded and validated but not committed. If there less than
1000 (not equal to as before), the function can exit without streaming
more as there are no more to send.

This PR correctly handles the case where the node is exactly 1000 headers behind
by: (1) correcting the off-by-one "no further headers to download" conditional and
(2) commiting headers before starting streaming if the PoW is stronger,
in case no further headers would be streamed.

Motivation and Context

Header sync ends prematurely when receiving exactly 1000 "pre-sync" headers.

How Has This Been Tested?

Manually - Sync from scratch.

@sdbondi sdbondi changed the title fix: off-by-one cause "no further headers to download" bug fix: off-by-one causing "no further headers to download" bug Aug 30, 2021
@aviator-app
Copy link
Contributor

aviator-app bot commented Aug 30, 2021

PR queued successfully. Your position in queue is: 10

@sdbondi sdbondi force-pushed the core-header-sync-off-by-one branch 2 times, most recently from 7770192 to 26d00a1 Compare August 30, 2021 11:22
delta1
delta1 previously approved these changes Aug 30, 2021
Copy link
Contributor

@delta1 delta1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK

When entering the `synchonize_headers` function, a chain of headers has
been downloaded and validated but not committed. If there less than
1000 (not equal to as before), the function can exit without streaming
more as there are no more to send.

This PR correctly handles the case where the node is exactly 1000 headers behind
by: (1) correcting the off-by-one "no further headers to download" conditional and
(2) commiting headers before starting streaming if the PoW is stronger,
in case no further headers would be streamed.
@aviator-app aviator-app bot merged commit 3502b39 into tari-project:development Aug 30, 2021
@sdbondi sdbondi deleted the core-header-sync-off-by-one branch August 31, 2021 05:05
stringhandler added a commit that referenced this pull request Sep 1, 2021
* Update LibWallet recovery task event handling

The recovery task broke out of its monitoring loop before getting the `UtxoScannerEvent::Completed` event. This PR just moves that break statement so that the final completed callback is made.

Also Ignore `test_store_and_forward_send_tx` due to it being flakey on CI and the functionality is covered by Cucumber tests.

* Update LibWallet `wallet_import_utxo` function to include valid TariScript

The `wallet_import_utxo` FFI function in LibWallet just used defaults for a number of the new UTXO fields when importing a faucet UTXO. The Faucet UTXO provided by the client is just the spending key and amount. The `metadata_signature` and `sender_offset_public_key` can both remain as default values as they are not used in spending an UTXO.  A Nop script is assumed and the spending key is used as the `script_private_key`. The final update is that the `input_data` it set as the public key of the spending key.

To test that the base node is happy for an UTXO imported in this way to be spent a Cucumber test is provided which imports a UTXO into a wallet and zeroes out the `metadata_signature` and`sender_offset_public_key` and updates the `input_data` and `script_private_key` in the same way as described above. This imported Faucet utxo is then successfully spent to another wallet.

* Move not found to status, and not boolean flag

* Introduce cache update cool down to console wallet

The current architecture of the wallet is that the AppState contains a cache of the current state that the UI uses to draw from and a second instance of the data that is updated in the background when events are received from the wallet services without interfering with drawing. When the background data has been updated by the event monitoring thread it flips a flag telling the UI that the cache has been invalidated so when the drawing is done on a Tick event the UI thread will clone the background data into the cache for future drawing calls.

A problem was found where when a large number of transactions were being processed by the wallet the UI would become unresponsive. The reason for this is that with a large amount of transactions there is quite a lot of AppState that is copied from the background into the UI cache which could take 300-400ms and this cache was being invalidated very often as the transactions are being handled by the wallet services. This would mean that a Cache copy occurred after every single draw cycle and would block the processing of Key events for 300-400ms.

This PR proposes a solution of introducing a cache update cooldown period, initially set to 2 seconds, so that if a cache update has occurred the soonest that the next update can occur is 2 seconds later giving the UI thread time to handle key events. In normal wallet operation state update events do not occur that often so this approach will allow the cache update to occur as soon as the cache is invalidated but will force a fast subsequent update to wait at least 2 seconds. In the mean time the background data can be worked on in the background thread.

* v0.9.2

* Improve cucumber tx reliability

* Miningcore Transcoder

Update proxy.rs

WIP

Added stratum configuration to miner.
Mostly implemented stratum.
Mostly implemented stratum controller.
Partially implemented stratum miner.

Rebased to latest dev

Import PR#3006

Rebased to latest dev and updated version to 0.9.0

Fixed tari_stratum_ffi tests

Clippy and cargo-fmt

Bug fixes

Return blockheader as json object.
Retrieve recipients from params instead of directly from body of request.
Fix bug in GetHeaderByHeight

Update stratum miner to receive blockheader instead of block

clippy

update

Update

Implemented keepalive
Bug fix for transfer results
Implemented stratum error code response handling in tari_mining_node

Rebase fix

Update stratum.rs

Update stratum.rs

Review Comments

Update and Fixes

Added ResumeJob to MinerMessage.
Fixed disconnection bug where miner would not resume solves on reconnect.
Added transcoder_host_address config variable to stop using proxy_host_address as it is already used by mm_proxy, this enables them both to be run simultaneously.

Update cucumber config variables

* fix: add timeout to protocol notifications + log improvements

- protocol notifications now have a set "safety" timeout.
- add log for inbound comms pipeline concurrency usage

* Fix: Correctly deal with new coinbase transactions for the same height

A bug was observed that now and then a Coinbase Transaction would not conclude its monitoring with this error:
`ValueNotFound(CompletedTransaction(15645503741743694020))`

This occurred when due to a small network reorg the miner would request a block for the same height. The new request would often have a different amount of fees which would mean this transaction needs to be cancelled in favour of the new transaction. However, with the transaction being cancelled the coinbase monitoring task will come around to checking the DB if the transaction is still there so it can continue to poll the base node and will get the ValueNotFound error. This is correct behaviour so should not have been logged as an error. The error case also means that the TransactionCancelled event is never fired which resulted in the console wallet UI not updating the state of that transaction which left it in the transaction list erroneous.

So this PR handles that “Error” properly and sends the event before ending the coinbase monitoring	task.

* Improve co mining cucumber test

increase key length

* Added links to mobile wallets' repos

* Fix search_kernel command

* Stratum Transcoder Config Cleanup

Moved stratum_transcoder config variables to its' own section.
Fixed bug in defaults for stratum_transcoder.
Updated example config and cucumber variable to reflect changes in configuration.
Added stratum mode configuration variables for tari_mining_node in sample config, commented out by default.

* test: cucumber for all wallet commands

* fix: in wallet block certain keys during popup

* chore(deps): bump tar from 6.1.0 to 6.1.6 in /integration_tests

Bumps [tar](https://github.com/npm/node-tar) from 6.1.0 to 6.1.6.
- [Release notes](https://github.com/npm/node-tar/releases)
- [Changelog](https://github.com/npm/node-tar/blob/main/CHANGELOG.md)
- [Commits](isaacs/node-tar@v6.1.0...v6.1.6)

---
updated-dependencies:
- dependency-name: tar
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* simplify and improve test

* fix: ban peer when merkle roots mismatch

* Fix two cucumber tests to be less flaky

- An "FundsPending" error returned from the coin split step will now be handled
  gracefully and not stop the test
- Wallet A will monitor transactions to be at least broadcast when sending
  one-sided transactions to wallet B before mining will commence

* Fix UTXO scan edge case when PC awakes from sleep

This PR fixes an issue where bottled up tokio interval events are fired
successively for the edge case where a computer wakes up from sleep. This
was evident in the UTXO scanning service where many many UTXO scanning tasks
would be started in parallel afterwards instead of only one.

* update to use sydney more address

* Fix flaky cucumber tests involving transactions

Added exception handling to gRPC methods submitting transactions in
cucumber integration test steps. Async timing is not always favourable
to conclude a transaction; retrying a transaction if it returns an
error is important for test robustness.

* Washing Machine Update

Fixed network bug in walletProcess.
Fixed grpcPort bug in walletProcess for custom grpc address.
Updated network in washing machine.
Added routing mechanism.
Added excludeTestEnvrs.

Added notifications via web hook if url is supplied

Update washing_machine.js

Reduced output noisiness

* feat: shared p2p rpc client session pool

Adds an RPC client session pool to efficiently maintain multiple shared
RPC sessions on the p2p network.

* fix: better method for getting an open port in cucumber tests

* fix(cucumber): update to @grpc/grpc-js library

Potentially fixes GRPC connection issues we've been having

* misc: update package lock

* fix(cucumber): retry grpc connect before giving up

* fix(cucumber): retry wallet grpc connections

* fix wallet resize

* feat: wallet connectivity service

Adds a service responsible for wallet connectivity. This service
is responsible for and abstracts any complexity in the management of
the base node connections and RPC session management.

This PR makes use of this service in the base node montoring service but
does not "plumb" the WalletConenctivityService into the protocols. This
is left as a TODO, but we should expect this to remove many lines of
code and greaty simplify these protocols by removing the budren of
connection management in the various wallet components.

A number of simplifications on the peer connection and substream code,
debatably reducing places that bugs could hide.

* Improve prune mode to stop panics.

* fix build binaries

* Update handling of SAF message propagation and deletion

This PR adds two changes to the way SAF messages are handled to fix two subtle bugs spotted while developing cucumber tests.

The first issue was that when a Node propagates a SAF message it was storing to other nodes in its neighbourhood the broadcast strategy it was using only chose from currently connected base nodes. This meant that if the Node had an active connection to a Communication Client (wallet) it would not just directly send the SAF message to that client but to other base nodes in the network region. This meant that the wallet would only receive new SAF message when it actively requested them on connection even though it was directly connected to the node.

This PR adds a new broadcast strategy called `DirectOrClosestNodes` which will first check if the node has a direct active connection and if it does just send the SAF message directly to its destination.

The second issue was a subtle problem where when a node starts to send SAF messages to a destination it would remove the messages from the database based only on whether the outbound messages were put onto the outbound message pipeline. The problem occurs when the TCP connection to that peer is actually broken the sending of those messages would fail at the end of the pipeline but the SAF messages were already deleted from the database.

This PR changes the way SAF messages are deleted. When a client asks a node for SAF message it will also provide a timestamp of the most recent SAF message it has received. The Node will then send all SAF messages since that timestamp that it has for the node and will delete all SAF messages from before the specified Timestamp.  This serves as a form of Ack that the client has received the older messages at some point and they are no longer needed.

* fix unit flaky test

* fix(dailys): use non-zero exit status when wallet recovery fails

Non-zero error code on failure and return more information about the error.
Adds logfile output for processes

* Fix wallet CLI cucumber tests

(1) When using the wallet CLI mode, the wallet is not always in the correct
    state to send transactions, depending on the last known metadata status
    from the previously connected base node. This PR introduces exception
    handling for wallet CLI commands so that it will automatically retry
    to execute the command if it fails.
    A follow-up PR should let the wallet wait for a base node connection
    before it executes certain commands.
(2) Added a new test step to explicitly test transaction statuses in the
    wallet.
(3) Fixed the UTXO scanning service so it would not go into an infinite loop
    trying to scan UTXOs if the blockchain does not have any UTXOs yet.
(4) Fixed an erroneous error return in the output manager service.

* Update log4rs for tari_stratum_transcoder

Fix typos

* mark cucumber test as flaky

* ci: remove flaky tests from ci

* Mining worker name for tari_mining_node

Adds optional field to config for tari_mining_node to allow the miner to be named when using the stratum configuration.

Updated successful server connection status from warn to info.

* chore(deps): bump jszip from 3.6.0 to 3.7.0 in /integration_tests

Bumps [jszip](https://github.com/Stuk/jszip) from 3.6.0 to 3.7.0.
- [Release notes](https://github.com/Stuk/jszip/releases)
- [Changelog](https://github.com/Stuk/jszip/blob/master/CHANGES.md)
- [Commits](Stuk/jszip@v3.6.0...v3.7.0)

---
updated-dependencies:
- dependency-name: jszip
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Set robust limits for busy a blockchain

Updated limits for the base node and wallet that would be robust for
a busy blockchain - this was simulated with two sizeable stress tests
of 15,000 transactions each.

* test: add wallet ffi testing

* Add wallet reorg cucumber tests

Added wallet reorg cucumber tests for coinbase and normal
transactions. These will be failing until wallets can
handle reorgs properly - failing steps have been commented
out.

* v0.9.3

* libwallet-0.17.3

* Add network selection to wallet_ffi

Update lib.rs

Review comments

* re_add validation test

* Cleanup stratum config terminal output in tari_mining_node

* Cleanup stratum config terminal output in tari_mining_node

* Fix stratum miner speed for tari_mining_node

Removed call to thread::sleep accidentally left in while debugging a prior issue.

Added nonce to display for solution.

Co-Authored-By: Hansie Odendaal <39146854+hansieodendaal@users.noreply.github.com>

Review comments

Co-Authored-By: Hansie Odendaal <39146854+hansieodendaal@users.noreply.github.com>

* Remove old unused integration tests

* fix: ensure peers are added to peer list before recovery starts

* fix: enforce unique commitments in utxo set

Adds a unique commitment db index for the UTXO set as well as unique
commitment check in the block validator.

* Update docs with pooled SHA3 mining

* Handle receiver cancelling an inbound transaction that is later received

This PR addresses the following scenario spotted by @stanimal:

 - NodeA sends to nodeB(offline) 
- NodeA goes offline 
- NodeB receives tx, and cancels it (weird I know) 
- NodeA comes online and broadcasts the transaction 
- NodeB is not aware of the transaction, transaction complete for NodeA

This is handled by adding logic that if a FinalizedTransaction is received with no active Receive Protocols that the database is checked if there is a matching cancelled inbound transaction from the same pubkey. If there is the receiver might as well restart that protocol and accept the finalized transaction.

A cucumber test is provided to test this case.

This required adding in functionality to the Transaction and Output Manager service to reinstate a cancelled inbound transaction, unit tests provided for that.

* add rfc docs to inclode unique kernels

* Fix console wallet buffer size bug

* feat: use nodejs cron for dailies, improve washingmachine reporting

- implement cron using `cron` nodejs library
- move "utils" to "daily_tests"
- daily_tests has its own dependencies and package.json
- improve washing machine MM reporting
- refactor dailies to allow them to be exported as modules

* feat: add sync rpc client pool to wallet connectivity

- add sync pool and `obtain_base_node_sync_rpc_client`
- add `get_header` to base node rpc

* Change RPC connection issues log status

Not all pertinent RPC connection issues were logged as warning or error

* fix dev after test merges

* [skip ci] auto deploy tags to s3

* v0.9.4

* ci: Add libwallet iOS build

* ci: Fix libwallet android github action

* Expose `get_mempool_stats` via gRPC and add cucumber test

This PR exposes the `get_mempool_stats` method via the base node gRPC interface. It also adds a cucumber test to add 5 transactions to the mempool and tests that the base node reports the correct stats.

* fix: Fix or remove ignored tests in pow_data.rs

This PR aimed to remove the ignore from the tests in pow_data.rs. These tests are failing tests so two of them that panicked with updated with the `should_panic` flag but the out of memory test aborts and cannot be handled in a test so it was removed.

* Remove `test_harness` feature from wallet

In the early days of wallet development we didn’t have a working base nodes or cucumber infrastructure for the Mobile Clients to be able to test transacting. The `test_harness` functionality in the wallet was created to allow wallet clients to generate test data. This is no longer used so this PR removes this code.

This code was also used to generate test data in the wallet_ffi integration test. There will soon be Cucumber infrastructure to test the FFI wrapper which is a far better way to perform this integration test so that huge test is removed from the FFI library.

* Make `send_transaction` request handling async in transaction service

It was noted that under stress test the transaction service select! loop was blocking for longer than 500ms at times. This turned out to be during the send_transaction calls which did the initial transaction setup synchronously, this involves selecting UTXOs and building the initial transaction which with a large UTXO database can take time.

In order to reduce this impact the `handle_request(…)`  function is changed in the transaction service. Instead of calling that function, waiting for a synchronous response and then sending the response down the reply one shot channel the one-shot channel is passed into the `handle_request(…)` function. For the send_transaction case the intensive work is then moved into the asynchronous `transaction_send_protocol` task and the reply channel is also sent into that task. The task is spawned and runs asynchronously when the `handle_request` method can end and return to the select! loop. Once the task has completed the initial tranasction setup it can send the response to the service API caller via the reply channel.

This is only implemented in the asynchronous way for `send_transaction` in this PR, all the other API requests will still do their work synchronously and send the response over the reply channel but the infrastructure is now there to convert any one of those API calls to reply asynchronously if needed. At the moment the other API requests don’t appear to take long enough to require this effort just yet.

`transaction_sender_protocols` are indexed by their TxId so in order to start the task with the ID before building the initial transaction the tx_id is now generated in the Transaction Service. The Transaction Protocol builder is updated to allow an optional manual specification of the TxId.

* fix: correct regexp for recovery and sync tests

* fix: improve p2p RPC robustness

- Correctly handles edge case where the client can receive a response after
  the deadline + grace period has expired due to extreme latency.
- Minor code simplifications
- Adds integration stress tests to comms
- Increase client-side deadline grace period. This is needed because at
  any point we could be transferring a lot of data, causing delays,
  which the client must tollerate.
- Minor performance optimisations (e.g removed usage of `.split()` which uses a BiLock)

* separate ffi tests

* test: add more wallet ffi testing

* test: Add integration test for ListHeaders

* wallet: Add NodeId to console wallet Who Am I tab

Just adds the console wallet’s own NodeId to the Who am I tab in the console wallet.

* fix: division by zero

* [wallet_ffi] Add null check for `transport_type` in FFI

The `transport_type` argument is not checked is it null before it is used in this method. This PR adds the check and appropriate error response.

* wip

* fix: bug in wallet base node peer switching

- Connectivity retries connecting indefinitely, however previously if
  this continuously fails and the user updates their peer, the new peer will not be
  read until the previous peer was connected to. This PR fixes this.
- Add protocl information to comms RPC logs
- Cleanup peer state, making the wallet connectivity service the source
  of truth for the base node peer
- Adds an `Ack` flag to RPC protocol, this is not currently used but
  could be implemented in the client side in future if required without
  breaking the network (server supports it, client support may or may
  not be needed).

* chore: better logging for lmdb_delete

* test: cucumber check block heights in order test (#3219)

## Description
Cucumber check block heights in order test

## Motivation and Context
Cucumber tests

## How Has This Been Tested?
npm test -- --name "Base node lists heights"

## Checklist:
* [x] I'm merging against the `development` branch.
* [x] I have squashed my commits into a single commit.

* test: almost complete ffi wallet test (#3220)

<!--- Provide a general summary of your changes in the Title above -->

## Description
Almost complete ffi wallet test.
- The async base node connection is not there.
- The SAF test is broken (the ffi wallet doesn't change the status, but the receiver node has correct status) 

## How Has This Been Tested?
npm test -- .\features\WalletFFI.feature

## Checklist:
<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
* [x] I'm merging against the `development` branch.
* [x] I have squashed my commits into a single commit.

* cucumber: Add mempool test for unconfirmed tx to mined tx (#3222)

## Description
Add mempool test for unconfirmed tx to mined tx

## Motivation and Context
cucumber tests

## How Has This Been Tested?
npm test -- --name "Mempool unconfirmed transaction to mined transaction"

## Checklist:
* [x] I'm merging against the `development` branch.
* [x] I have squashed my commits into a single commit.

* fix: chain error caused by zero-conf transactions and reorgs (#3223)

<!--- Provide a general summary of your changes in the Title above -->

## Description
<!--- Describe your changes in detail -->
If a zero-conf transaction was in a block and the block is rewound. The block_chain backend will try to delete in input which was never marked as unspent as it was immediately spent. When we rewound this block we should check this and not try and delete an output that was never an unspent output on the chain. 

## Motivation and Context
<!--- Why is this change required? What problem does it solve? -->
<!--- If it fixes an open issue, please link to the issue here. -->

## How Has This Been Tested?
<!--- Please describe in detail how you tested your changes. -->
<!--- Include details of your testing environment, and the tests you ran to -->
<!--- see how your change affects other areas of the code, etc. -->

## Checklist:
<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
* [x] I'm merging against the `development` branch.
* [x] I have squashed my commits into a single commit.

* ci: prebuild mining node on cucumber tests (#3221)

Co-authored-by: mergequeue[bot] <48659329+mergequeue[bot]@users.noreply.github.com>

* Add extra logging detail for wiremode warnings (#3216)

<!--- Provide a general summary of your changes in the Title above -->

## Description
<!--- Describe your changes in detail -->
Minor log warning improvements for the case where a wire format byte is not received 

## Motivation and Context
<!--- Why is this change required? What problem does it solve? -->
<!--- If it fixes an open issue, please link to the issue here. -->
Adding more log info for some observed warnings 

## How Has This Been Tested?
<!--- Please describe in detail how you tested your changes. -->
<!--- Include details of your testing environment, and the tests you ran to -->
<!--- see how your change affects other areas of the code, etc. -->
Replaced existing base node and started syncing a new node successfully 

## Checklist:
<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
* [x] I'm merging against the `development` branch.
* [x] I have squashed my commits into a single commit.

* fix: edge-case fixes for wallet peer switching in console wallet (#3226)


<!--- Provide a general summary of your changes in the Title above -->

## Description
<!--- Describe your changes in detail -->
- set peer using the watch to allow the connectivity service to
  immediately be aware of the new peer
- aborted the dial early if necessary, should the user set a different peer
- slightly reduce busy-ness of the wallet monitor by monitoring for less comms connectivity events
-  monitor for wallet connectivity peer status changes to improve the responsiveness of the status ui update. 

## Motivation and Context
<!--- Why is this change required? What problem does it solve? -->
<!--- If it fixes an open issue, please link to the issue here. -->
When a peer is selected, and the previous peer is offline, it appears as if the new peer is offline.
This allows the state to be immediately be updated (though there is still a delay where the frontend 
gets refreshed - probably waiting for a tick)
The wallet event monitor was kept very busy with all the comms connectivity events incl. events that 
have nothing

## How Has This Been Tested?
<!--- Please describe in detail how you tested your changes. -->
<!--- Include details of your testing environment, and the tests you ran to -->
<!--- see how your change affects other areas of the code, etc. -->
Tested on existing wallet

## Checklist:
<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
* [x] I'm merging against the `development` branch.
* [x] I have squashed my commits into a single commit.

* chore: simpler pull request template (#3231)

Simpler PR template

* feat: add `ping()` to all comms RPC clients (#3227)

<!--- Provide a general summary of your changes in the Title above -->

## Description
<!--- Describe your changes in detail -->
Adds a `ping()` function to all comms RPC clients. `ping()` sends an RPC
request with the `ACK` flag set. The server will immediately reply with
an `ACK` response. This accurately measures RPC latency without a
potentially slow backend. `ping()` is now used in the wallet monitor.

## Motivation and Context
<!--- Why is this change required? What problem does it solve? -->
<!--- If it fixes an open issue, please link to the issue here. -->

Previously, `get_last_request_latency` would refer to the latency of `get_tip_info` 
which will increase whenever the blockchain db is busy, for e.g:
- the base node is syncing
- another node(s) syncing from the base node
- one or many wallets scanning UTXOs from the base node
- one or many wallets running a recovery from the base node
- lots of lmdb writes e.g large reorg

A client wallet would, of course, have no idea that this is occurring
and simply display poor latency. This could be perceived to be a poor
RPC performance, when in fact, there are a number or non-network/RPC
related reasons why a ping > 1/2 seconds  is displayed.

`get_last_request_latency` is a better measure when determining
current base node performance (caveats: depending on the RPC method impl, 
current performance does not predict future performance). However,
it is misleading to use this as a user-facing value presented as network latency.

## How Has This Been Tested?
<!--- Please describe in detail how you tested your changes. -->
<!--- Include details of your testing environment, and the tests you ran to -->
<!--- see how your change affects other areas of the code, etc. -->
Unit test, console wallet test

## Checklist:
<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
* [x] I'm merging against the `development` branch.
* [x] I have squashed my commits into a single commit.

* fix: show warnings on console (#3225)

Show warnings and errors on apps as well as logs

Also moved a struct that was in the middle of a method into its own file

* test: improve comms limits to combat network timeouts (#3224)

<!--- Provide a general summary of your changes in the Title above -->

## Description
This PR:
- Improved various comms configuration settings to combat RPC connection and response timeouts
- Improved wire mode logging

The philosophy here is to rather wait for a connection or response than to abandon all and try again.

These settings were tested on two separate systems performing system-level tests where RPC timeouts and connection problems were previously prevalent:
- while base node/console wallet pairs were only monitoring the network or were linked to SHA3 or RandomX miners
- while performing a stress test and compiling Rust at the same time.

The former proved to run virtually without any errors while the latter registered some timouts, especially when performing Rust compilations (~ two orders of magnitude less).

**Edit:** 
- Fixed flaky cucumber test `Node should not sync from pruned node`
- Fixed magic numbers in unit test `test_txo_validation_rpc_timeout`

## Motivation and Context
See above

## How Has This Been Tested?
See above

## Checklist:
<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
* [X] I'm merging against the `development` branch.
* [X] I have squashed my commits into a single commit.

* chore: add extra seed node (#3234)

Description
---
Add additional seed node

Motivation and Context
---
N/A

How Has This Been Tested?
---
Tested with new temporary console wallet

* v0.9.5

* ci: add pr title check

* test: Add rebuild-db integration test (#3232)

Description:

Adds a cucumber test to cover the `--rebuild-db` blockchain recovery functionality on the base node 

Motivation and Context:

Improved test coverage

How Has This Been Tested?

`npm test -- --name "Blockchain database recovery"`

* Remove OpenSSL from Windows runtime

- Removed OpenSSL installers and dependencies from Windows runtime
- Removed stdout as appender from console wallet's log4rs logger

* fix: exit command and free up tokio thread (#3235)

Description
---
Exit command didn't exit the cli loop.
Rustyline was holding up a tokio thread - used a blocking thread instead

Motivation and Context
---
Bug

How Has This Been Tested?
---
Manually on base node

* fix: add status output to logs in non-interactive mode (#3244)

Description
---
Add status output to logs in non-interactive mode

Motivation and Context
---
Base node in non-interactive mode was logging status

How Has This Been Tested?
---
Manually run base node in non-interactive mode

* feat: add tab for error log to the wallet

* feat: add support for forcing sync from seeds (#3228)

## Description
Add support for forcing sync to a seed node. Specify index for the node from peer_seeds list.

## How Has This Been Tested?
Manually.

## Checklist:
<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
* [x] I'm merging against the `development` branch.
* [x] I have squashed my commits into a single commit.

* test: cucumber forced sync (#3230)

## Description
Add cucumber test that tests forced sync to single node.

## How Has This Been Tested?
npm test -- --name "Force sync many nodes agains one peer"

## Checklist:
<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
* [x] I'm merging against the `development` branch.
* [x] I have squashed my commits into a single commit.

* test: add multiple wallet recovery from peer (#3240)

Description
Added multiple wallet recovery from peer

Motivation and Context
Additional tests

How Has This Been Tested?
Manually (npm test -- --name "Multiple Wallet recovery from seed node")

* feat: base_node prompt user to create id if not found (#3245)

Description
Prompt user to create id if not found

Motivation and Context
Improvement to base node startup, specifically on first run.

How Has This Been Tested?
Manually

* fix: daily wallet recovery fixes (#3229)


<!--- Provide a general summary of your changes in the Title above -->

## Description
<!--- Describe your changes in detail -->
- Use the GRPC client connection given `WalletProcess`
- Outputs error details in webhook in recovery cron job
- Adds very basic mocha tests

## Motivation and Context
<!--- Why is this change required? What problem does it solve? -->
<!--- If it fixes an open issue, please link to the issue here. -->
Recovery daily is failing even though it succeeds. 

## How Has This Been Tested?
<!--- Please describe in detail how you tested your changes. -->
<!--- Include details of your testing environment, and the tests you ran to -->
<!--- see how your change affects other areas of the code, etc. -->

## Checklist:
<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
* [x] I'm merging against the `development` branch.
* [x] I have squashed my commits into a single commit.

* feat!: tell an FFI client that a recovery is in progress on `wallet_create` (#3249)

Description
---
### Note: This is a breaking change to LibWallet FFI

Currently if a wallet recovery was in progress and the wallet was shutdown the next time that wallet is start by an FFI client using the ‘wallet_create’ method there is no way for the FFI client to know that the recovery should be continued.

The wallet is able to resume the recovery from where it left off and it should so as not to lose funds but the FFI client must restart the recovery process with the same seed words. The FFI client has to do the restarting so that it can provide the callback through which the process is monitored.

Furthermore, the wallet does not respond to P2P transaction negotiation message if a recovery process is in progress so it is important that an FFI client completes any outstanding recoveries ASAP.

How Has This Been Tested?
---
untested in the backend.

* fix: fix base_node_service_config not read (#3251)

Description
---
Fixed the base_node_service_config not being initialized with values from the config file.

Motivation and Context
---
See above

How Has This Been Tested?
---
System level testing

* test: add flag to have Cucumber exit when tests are complete (#3252)

Description
---
Add the —exit flag to the Cucumber CI commands to force Cucumber to end when the tests are completed. This doesn’t solve the issue where something is keeping the Cucumber process running due to a poor shutdown though.

How Has This Been Tested?
---
N/A

* docs: rfc staged security (#3246)

Description
---

This Request for Comment (RFC) aims to describe Tari's ergonomic approach to securing funds in a hot wallet.
The focus is on mobile wallets, but the strategy described here is equally applicable to console or desktop wallets.

Motivation and Context
---

This philosophy has been partially implemented in Aurora already but has not been captured in community documentation before.

How Has This Been Tested?
---
N/A

* test: add tracing to comms via --tracing-enabled (#3238)

Description
---
Add tracing to comms to debug timings via the `--tracing-enabled` flag

Motivation and Context
---
It's currently difficult to understand the timings of network calls and errors in the application.

How Has This Been Tested?
---
Tested manually

* refactor: additional DB audit of methods (#2864)

<!--- Provide a general summary of your changes in the Title above -->

## Description
<!--- Describe your changes in detail -->

This provides a audit removing and increasing security over the following db methods from the WriteOperation enum:

```rust
InsertChainOrphanBlock(Arc<ChainBlock>),
InsertInput {
 header_hash: HashOutput,
input: Box<TransactionInput>,
mmr_position: u32,
},

InsertKernel {
 header_hash: HashOutput,
kernel: Box<TransactionKernel>,
mmr_position: u32,
},

InsertOutput {
 header_hash: HashOutput,
output: Box<TransactionOutput>,
mmr_position: u32,
},

DeleteHeader(u64),

DeleteOrphanChainTip(HashOutput),

InsertOrphanChainTip(HashOutput),

SetBestBlock {
 height: u64,
hash: HashOutput,
accumulated_difficulty: u128,
},

SetPruningHorizonConfig(u64),

SetPrunedHeight {
 height: u64,
kernel_sum: Commitment,
utxo_sum: Commitment,
},
```

## Motivation and Context
<!--- Why is this change required? What problem does it solve? -->
<!--- If it fixes an open issue, please link to the issue here. -->

## How Has This Been Tested?
<!--- Please describe in detail how you tested your changes. -->
<!--- Include details of your testing environment, and the tests you ran to -->
<!--- see how your change affects other areas of the code, etc. -->
This synced to tip, and passed all unit tests

## Checklist:
<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
* [x] I'm merging against the `development` branch.
* [ ] I have squashed my commits into a single commit.

* feat: allow DHT to be configured to repropagate messages for a number of rounds (#3211)

<!--- Provide a general summary of your changes in the Title above -->

## Description
<!--- Describe your changes in detail -->
Use the dedup cache hit count to allow certain duplicate messages through a
configurable number of times. 

## Motivation and Context
<!--- Why is this change required? What problem does it solve? -->
<!--- If it fixes an open issue, please link to the issue here. -->

~~This improves mempool synchronization.~~ Implements gossip repropagation that could be used for some message types in future.

## How Has This Been Tested?
<!--- Please describe in detail how you tested your changes. -->
<!--- Include details of your testing environment, and the tests you ran to -->
<!--- see how your change affects other areas of the code, etc. -->
New unit test. More manual system tests need to be done

## Checklist:
<!--- Go over all the following points, and put an `x` in all the boxes that apply. -->
* [x] I'm merging against the `development` branch.
* [x] I have squashed my commits into a single commit.

* refactor: refactor wallet ffi cucumber tests (#3259)

Description
Refactored WalletFFI.feature into a working state, tested locally.

Further refactoring and dead code removal would be beneficial.

Motivation and Context
Necessary to get WalletFFI.feature working.

How Has This Been Tested?
Tested locally, each scenario tested with:
`./node_modules/.bin/cucumber-js --name "${scenario_name}"`

* fix: add periodic connection check to wallet connectivity service (#3237)


Description
---
- Adds a periodic check of the connection status and attempts a
  reconnect if no longer connected. Previously it was assumed that this
  can be done lazily because some caller will always call
  `obtain_base_node_wallet_rpc_client`, but this may not be the case. A
  periodic check is added.
- Clean up some state checking to use the wallet connectivity service.

Motivation and Context
---
Improves snappiness of the connectivity and chain state updates in the wallet

How Has This Been Tested?
---
Manually on the console wallet + existing tests

* fix: send transactions to all connected peers (#3239)

Description
---
 Send transactions to all connected peers as we do with block propagation

Motivation and Context
---
Alternative to #3211.

How Has This Been Tested?
---
Existing tests / single line code change

* test: add random transactions to empty cucumber blocks (#3253)


Description
---
This adds in random transactions spending each other and unused coin bases to fill in blocks.

Motivation and Context
---
This is to allow us to test more thoroughly with all blocks having transactions and not just blocks that were explicitly created with transactions.  These are limited to 10 transactions per block to not make it too slow at the current validation speeds. This might be revisited in a later stage.

How Has This Been Tested?
---
Manually confirmed that the blocks do have the transactions in them. 
Ran all cucumber tests with the flags: critical and not broken and not flaky

* feat: improve basenode switch from listening to lagging mode (#3255)

Description
---
This PR changes the peer metadata push to listing mode speed to push every time it receives a chain metadata ping or pong message. 

Motivation and Context
---
This is introduced to allow a node to switch faster and not wait till it received it all the pings and pongs from a node.

How Has This Been Tested?
---

Run all unit tests and manually ran node.

* fix: small display bug (#3257)

Description
---
The escape sequence was eating up the string "Starting recovery at height: ".

How Has This Been Tested?
---
Manually/visually.

* feat: add Igor testnet (#3256)

Description
---
This PR adds support for the Igor testnet to the repo. This involves adding Igor to the Network Enum, adding a Igor generic block and adding a config file with the details of 4 Igor seed nodes (still to be rolled out)

Motivation and Context
---
We need a second testnet to test network switching

How Has This Been Tested?
---
Manually ran the network to generate seed nodes details.

* chore: tokio 1 and other crate upgrades (#3258)

Description
---
- upgrades to tokio 1.10
- upgrade multiaddr to 1.13
- updates select loops to use tokio::select!
- updates to use tokio mpsc and oneshot channels
- remove max_threads config
- removed tari_wallet dependency from tari base node
- moved emoji id library out of tari wallet into tari core (in order to
  remove dependency on `tari_wallet` for tari base node)
- Wait for bootstrap with mempool sync moved to the initializer
- Unit and integration test fixup
- Upgraded following crates that use or are required by tokio 1:
  `bytes`, `prost`, `tonic`, `reqwest`, `hyper`, `trust-dns-client`

~~Include changes from #3237 merged

Motivation and Context
---
Tokio runtime is perhaps the most critical dependency we have and was very out of date (was 0.2.x).
This PR takes advantage of bug fixes and optimisations of tokio 1.10.

How Has This Been Tested?
---
- Existing unit and integration tests run and pass
- Existing cucumber tests pass
- Ran all tari applications (base node, console wallet, miner, mm proxy, stratum transcoder)
- Ran a washing machine test on two upgraded wallets connected to an upgraded base node

* fix: off-by-one causing "no further headers to download" bug (#3264)

Description
---
When entering the `synchonize_headers` function, a chain of headers has
been downloaded and validated but not committed. If there less than
1000 (not equal to as before), the function can exit without streaming
more as there are no more to send.

This PR correctly handles the case where the node is exactly 1000 headers behind
by: (1) correcting the off-by-one "no further headers to download" conditional and
(2) commiting headers before starting streaming if the PoW is stronger,
in case no further headers would be streamed.

Motivation and Context
---
Header sync ends prematurely when receiving exactly 1000 "pre-sync" headers.

How Has This Been Tested?
---
Manually - Sync from scratch.

* fix: revert mining_node default logging config (#3262)

Description
---
The mining_node relies on its stdout logging for output for the binary and a recent global update to the logging filtered out the debug and info messages to the std out.

This PR updates the default logging config for the mining node so that debug and info messages are logged to stdout.

How Has This Been Tested?
---
Manually

* feat: allow network to be selected at application start (#3247)

Description
Network selection for applications

Motivation and Context
Allows network to be selected at application start

How Has This Been Tested?
Manually

* feat: add ability to bypass rangeproof (#3265)

Description
---
Adds the ability to bypass rangeproof verification. 

Motivation and Context
---
Warning: This should not be done by default as it can cause a fork. By default this should always be set to verify rangeproofs, but in some scenarios, you want to disable it to quickly download a chain or run on a slim device. 

The rangeproof verification also takes the majority of time when profiling, so by disabling it, we can monitor other performance bottlenecks

How Has This Been Tested?
---
Manually

> Note that I disabled checking of rangeproofs during wallet sending because it adds little value to validate a rangeproof that you created

* test: add trace tag to liveness data (#3269)

Description
---

Added trace tag info into the liveness log messages for improved tracing of ping-pong messages

Motivation and Context
---

This will help to investigate why ping-pong messages are not robust when using a single forced sync peer.

How Has This Been Tested?
---

System-level testing

* fix: auto update continuously checks auto_update_check_interval is disabled (#3270)

Description
---
Continuously checks for updates when auto_update_check_interval is disabled.

Thanks @mikethetike. for finding it and for the fix 

Add check to if no auto update URIs are configured 

Motivation and Context
---
Bug fix

When check_interval is disabled, stream::empty() is used to disable the update checking, however it
returns None continuously when polled, causing the update to continuously be checked.

Also sets MissedTickBehaviour::Skip - which will prevent bursts of checks if intervals are missed

How Has This Been Tested?
---
Ran base node with auto_update_check_interval = 0 (or equivalently without this setting set)

* fix: remove cucumber walletffi.js file that got re-included in rebase (#3271)

Description
Code was moved to ffiInterface.js and updated. Mistakenly got re-included when fixing a conflict in a rebase.

Motivation and Context
---

How Has This Been Tested?
---

* test: early subscription to connectivity events for mempool sync (#3272)

Description
---
Subscribe to connectivity events before waiting for the state machine to bootstrap

Motivation and Context
---
Causes cucumber ` Scenario: Transactions are synced` to fail. Could cause mempool sync not to happen in some fairly unlikely but possible cases in base node.

How Has This Been Tested?
---
Cucumber Scenario: Transactions are synced passes

* refactor: enable compile without sqlite, move emoji id and common types to tari_common_types (#3266)

Description
---
It moves emoji id and common types to tari_common_types

Motivation and Context
---
The main problem here was a dependency on tari_base_node -> tari_wallet for `EmojiId`. Then EmojiId references PublicKey, so ended up moving a whole bunch around. 

> Note: Hidden in all of this is feature to compile SQLite without having it installed as a lib

How Has This Been Tested?
---
Manually

* fix: make logging less noisy (#3267)

Description
---
Remove logging of errors from tracing instrument macros.

Motivation and Context
---
Was reported as making the base node unusable. Hopefully we are not swallowing important information, but probably the right choice

How Has This Been Tested?
---
Manually

> ~~Note: This PR is based on #3266 to enable compilation without SQLite installed~~

* chore: add network to Base Node status line (#3278)

Description
---
We want to see the network in the base node status line.

How Has This Been Tested?
---
manually

Co-authored-by: Philip Robinson <simian@tari.com>
Co-authored-by: mergequeue[bot] <48659329+mergequeue[bot]@users.noreply.github.com>
Co-authored-by: SW van Heerden <swvheerden@gmail.com>
Co-authored-by: Mike the Tike <mikethetike@tari.com>
Co-authored-by: striderDM <51991544+StriderDM@users.noreply.github.com>
Co-authored-by: Stanimal <sdbondi@users.noreply.github.com>
Co-authored-by: mongolsteppe <75075420+mongolsteppe@users.noreply.github.com>
Co-authored-by: Martin Stefcek <35243812+Cifko@users.noreply.github.com>
Co-authored-by: Martin Stefcek <gcifko@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Hansie Odendaal <pluto@tari.com>
Co-authored-by: Hansie Odendaal <39146854+hansieodendaal@users.noreply.github.com>
Co-authored-by: Cayle Sharrock <CjS77@users.noreply.github.com>
@sdbondi sdbondi restored the core-header-sync-off-by-one branch February 3, 2022 05:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants