Problem: bridge panics if there's an error retrieving gas price #93

Merged
akolotov merged 3 commits into omni:master on May 30, 2018

@yrashk yrashk commented May 25, 2018

Solution: instead, log an error and use previous price
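
The fallback described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the function name `next_gas_price`, the `u64` price type, and the use of `eprintln!` in place of the bridge's logger are all assumptions made for the example.

```rust
use std::fmt::Display;

/// Hypothetical helper: choose the gas price to use next.
/// On success the freshly retrieved price is adopted; on failure the
/// error is logged and the previously known price is kept, so the
/// caller never has to panic.
fn next_gas_price<E: Display>(retrieved: Result<u64, E>, last_price: u64) -> u64 {
    match retrieved {
        Ok(price) => price,
        Err(err) => {
            // Stand-in for a real logging call.
            eprintln!(
                "error retrieving gas price: {}; using previous price {}",
                err, last_price
            );
            last_price
        }
    }
}
```

Because the previous price is passed in explicitly, the initial value can simply be the configured default gas price.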

@yrashk yrashk requested a review from DrPeterVanNostrand May 25, 2018 08:44
@ghost ghost assigned yrashk May 25, 2018
@ghost ghost added the in progress label May 25, 2018
@akolotov akolotov left a comment

@yrashk and @DrPeterVanNostrand, what tests did you run to check these changes? How did you simulate incorrect oracle responses to cover all possible error cases? Thanks

@@ -44,6 +45,8 @@ impl GasPriceStream {
speed: node.gas_price_speed,
request_timer: timer.clone(),
interval: timer.interval_at(Instant::now(), CACHE_TIMEOUT_DURATION),
last_price: node.default_gas_price,
request_timeout: node.request_timeout,

@yrashk my expectation is that gas_price_timeout should be used here, since it was introduced for this purpose. The README.md says:

home/foreign.gas_price_timeout - the number of seconds to wait for an HTTP response from the gas price oracle before using the default gas price. Defaults to 10 seconds.

Yup, you are correct. That config option was added for this reason. I'm not sure how necessary this config option is, though. It's plausible that we could use a single timeout duration for all HTTP requests (web3 and gas-price).

These are different systems, and logically they could have different availability requirements. For example, the timeout for handling a response from the parity node (this is my understanding from @yrashk's explanation of how it works) could be larger than the timeout for getting a response from the gas price oracle. That's why I think they need to be differentiated.
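
To make the distinction concrete, here is a rough sketch of keeping the two timeouts separate and selecting one per endpoint. The field names `request_timeout` and `gas_price_timeout` come from this thread; the `NodeTimeouts` struct, the `Endpoint` enum, and the `timeout_for` method are hypothetical names invented for the illustration.

```rust
use std::time::Duration;

/// Hypothetical grouping of the two timeouts discussed in this thread.
struct NodeTimeouts {
    /// Timeout for requests to the parity node (web3).
    request_timeout: Duration,
    /// Timeout for HTTP requests to the gas price oracle.
    gas_price_timeout: Duration,
}

/// Hypothetical tag for the kind of remote system being called.
enum Endpoint {
    Node,
    GasPriceOracle,
}

impl NodeTimeouts {
    /// Return the timeout that applies to the given endpoint, so the
    /// oracle can fail fast without affecting node requests.
    fn timeout_for(&self, endpoint: Endpoint) -> Duration {
        match endpoint {
            Endpoint::Node => self.request_timeout,
            Endpoint::GasPriceOracle => self.gas_price_timeout,
        }
    }
}
```

With this shape, the oracle request would use `timeout_for(Endpoint::GasPriceOracle)`, which the README documents as defaulting to 10 seconds, while node requests keep their own, possibly larger, limit.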

@@ -12,7 +12,6 @@ use config::{GasPriceSpeed, Node};
use error::Error;

const CACHE_TIMEOUT_DURATION: Duration = Duration::from_secs(5 * 60);

Does it make sense to have CACHE_TIMEOUT_DURATION changeable through the configuration file?

It would be easy enough to add it to the config file. The only problem I can see is that a user who accidentally sets the cache duration too low could DoS the gas price oracle.

The duration of the cache should reflect Ethereum's gas-price volatility. For example, if the gas price were changing rapidly, you would want a shorter cache duration. However, looking at the current state of the Ethereum network, the average gas price across all confirmed transactions stays within the [5, 25] Gwei range, and has done so for quite some time. To me, this indicates that the cache duration should be a constant.

What are your thoughts?

Thanks for the good reasoning! I agree that the cache duration should not be too low. But that is also why I do not see a reason it was chosen to be 5 minutes. Could it be greater, 15 minutes for example? Since we could reach different views based on different analyses, I suggest leaving the hardcoded value for now and opening a new issue to make this parameter configurable in a later update. Any objections?
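
If the parameter does become configurable later, one way to address the concern about hammering the oracle is to enforce a lower bound. The sketch below is only an illustration of that idea: `MIN_CACHE_DURATION`, its 60-second value, and the `effective_cache_duration` function are assumptions, not something proposed in this PR. Only `CACHE_TIMEOUT_DURATION` and its 5-minute value come from the diff.

```rust
use std::time::Duration;

/// Value currently hardcoded in the diff under review.
const CACHE_TIMEOUT_DURATION: Duration = Duration::from_secs(5 * 60);

/// Hypothetical floor that protects the oracle from overly frequent
/// requests; the 60-second value is purely illustrative.
const MIN_CACHE_DURATION: Duration = Duration::from_secs(60);

/// Resolve the cache duration from an optional configured value (in
/// seconds): fall back to the constant when unset, and never go
/// below the floor.
fn effective_cache_duration(configured_secs: Option<u64>) -> Duration {
    match configured_secs {
        None => CACHE_TIMEOUT_DURATION,
        Some(secs) => Duration::from_secs(secs).max(MIN_CACHE_DURATION),
    }
}
```

A configured value of 900 seconds (the 15 minutes suggested above) passes through unchanged, while an accidental value of 1 second is raised to the floor.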

@DrPeterVanNostrand

@akolotov To test, I just ran cargo test basic and cd integration-tests && cargo test -- --test-threads=1 --nocapture. However, not all integration tests were passing.

When testing, I logged the retrieved gas prices and verified them manually by opening the oracle URL in my browser. This is definitely not the ideal way to test, but I found the test suite pretty confusing (the code layout, what the tests print out, and a parity node sometimes being left running in the background after a test fails).

With the changes in this PR, the two error cases for the dynamic gas price (HTTP request error, unrecognized JSON schema) will just fall back to the last retrieved price, so I'm not sure whether adding any new tests would be beneficial. @akolotov @yrashk, do you have any suggestions for better ways to test something like this?

@akolotov

@DrPeterVanNostrand is running just the integration tests enough to cover all the possible error cases addressed by these changes? I would like to make sure the changes are safe before running them on bridge-testnet, since deployment and testing there require at least one person-day of work. Are you able to run the bridge instance against real contracts (https://github.com/poanetwork/poa-bridge-contracts/releases/tag/1.0) and try to disconnect the oracle (e.g. through hosts), provide incorrect output from the oracle, etc.?

@yrashk

yrashk commented May 28, 2018

@akolotov I ran a number of manual tests on getting wrong responses from an oracle. However, I'm working on decomposing the gas price retriever to be able to simulate these scenarios in automated tests.
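
One common way to make such scenarios testable is to put retrieval behind a trait and drive the consumer with a scripted test double. The sketch below shows that general pattern only; it is not the decomposition that was actually committed. All names here (`RetrieveError`, `PriceRetriever`, `ScriptedRetriever`, `GasPriceTracker`) and the `u64` price type are hypothetical.

```rust
/// Hypothetical error type covering the two failure cases named in
/// this thread: an HTTP request error and an unrecognized JSON schema.
#[derive(Debug)]
enum RetrieveError {
    Http(String),
    UnrecognizedSchema,
}

/// Hypothetical abstraction over "ask the oracle for a price".
trait PriceRetriever {
    fn retrieve(&mut self) -> Result<u64, RetrieveError>;
}

/// Test double that replays a fixed list of responses in order.
struct ScriptedRetriever {
    responses: Vec<Result<u64, RetrieveError>>,
}

impl PriceRetriever for ScriptedRetriever {
    fn retrieve(&mut self) -> Result<u64, RetrieveError> {
        if self.responses.is_empty() {
            Err(RetrieveError::Http("no more scripted responses".into()))
        } else {
            self.responses.remove(0)
        }
    }
}

/// Consumer that remembers the last good price.
struct GasPriceTracker<R: PriceRetriever> {
    retriever: R,
    last_price: u64,
}

impl<R: PriceRetriever> GasPriceTracker<R> {
    fn new(retriever: R, default_gas_price: u64) -> Self {
        GasPriceTracker { retriever, last_price: default_gas_price }
    }

    /// Ask for a new price; on any error, log it and keep the old one.
    fn poll(&mut self) -> u64 {
        match self.retriever.retrieve() {
            Ok(price) => self.last_price = price,
            // Stand-in for a real logging call.
            Err(err) => eprintln!("gas price retrieval failed: {:?}", err),
        }
        self.last_price
    }
}
```

A test can then script a connection failure, a good price, and a malformed response in sequence, and assert that the tracker yields the default, the new price, and the new price again, without any network access or a running oracle.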

yrashk added 2 commits May 28, 2018 10:04
Solution: extract price retrieving from GasPriceStream
and test it.
However, this is different infrastructure and
might have different requirements.

Solution: use gas_price_timeout
@yrashk

yrashk commented May 28, 2018

I've added automated tests and switched to the gas-price-specific timeout.

@akolotov akolotov left a comment

@yrashk @DrPeterVanNostrand if you have nothing more to add, let's merge the changes and proceed with testing.

@DrPeterVanNostrand

@akolotov I don't have anything else to add

@akolotov akolotov merged commit 6cb2c56 into omni:master May 30, 2018
@ghost ghost removed the in progress label May 30, 2018
noot pushed a commit to noot/poa-bridge that referenced this pull request Jul 18, 2018
Problem: bridge panics if there's an error retrieving gas price