-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add reorged docker compose #1579
Conversation
9330f31
to
4e3b5c8
Compare
c22055b
to
b4891f5
Compare
fd0abb8
to
e5a54b8
Compare
It seems not all tests can survive reorgs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -62,6 +63,7 @@ defmodule InFlightExitsTests do | |||
@gas_process_exit_price 1_000_000_000 | |||
|
|||
setup do | |||
Reorg.finish_reorg() | |||
# as we're testing IFEs, queue needs to be empty | |||
0 = get_next_exit_from_queue() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is still a problem with eth hash nodes returning revert reasons rather then 0 right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a hack for it 1ef4b25
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@InoMurko It seems event with this hack, in-flight tests still fail
** (MatchError) no match of right hand side value: {:error, %{"code" => 3, "data" => "0x08c379a00000000000000000000000000000000000000000000000000000000000000020000000000000000000000000000000000000000000000000000000000000004050696767796261636b20697320706f737369626c65206f6e6c7920696e20746865206669727374207068617365206f6620746865206578697420706572696f64", "message" => "execution reverted: Piggyback is possible only in the first phase of the exit period"}}
I think reorgs leave these test in inconsistent state
8abc82c
to
201593c
Compare
2870dac
to
75acd8d
Compare
75acd8d
to
73a6cdb
Compare
4cbfa8f
to
22734ca
Compare
1e389e5
to
3ba3fa6
Compare
.circleci/config.yml
Outdated
- run: | ||
name: (Perf) Format generated code and check for warnings | ||
command: | | ||
cd priv/perf | ||
# run format ONLY on formatted code so that it cleans up quoted atoms because | ||
# we cannot exclude folders to --warnings-as-errors | ||
mix format apps/*_api/lib/*_api/model/*.ex | ||
make format-code-check-warnings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this too
case Jason.decode!(response.body)["data"] do | ||
%{ | ||
"result" => "complete", | ||
"transactions" => [ | ||
%{ | ||
"sign_hash" => sign_hash, | ||
"typed_data" => typed_data, | ||
"txbytes" => txbytes | ||
} | ||
] | ||
} -> | ||
{:ok, [sign_hash, typed_data, txbytes]} | ||
|
||
%{"code" => "create:client_error", "messages" => %{"code" => "operation:service_unavailable"}} = result -> | ||
if tries == 0 do | ||
result | ||
else | ||
Process.sleep(1_000) | ||
create_transaction(amount_in_wei, input_address, output_address, currency, tries - 1) | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps we should break this into functions.
result = Jason.decode!(response.body)["data"]
case process_transaction_result() do
{:ok, [sign_hash, typed_data, txbytes]} -> {:ok, [sign_hash, typed_data, txbytes]}
:unavailable ->
if tries == 0 do
result
else
Process.sleep(1_000)
create_transaction(amount_in_wei, input_address, output_address, currency, tries - 1)
end
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does operation:service_unavailable
happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's happening during reorgs. when one node is paused. I think it happens because Nginx continues to redirect to paused node
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aha, so perhaps an alarm is raised (ethereum connection error?)
get-alarms:
echo "Child Chain alarms" ; \
curl -s -X GET http://localhost:9656/alarm.get ; \
echo "\nWatcher alarms" ; \
curl -s -X GET http://localhost:${WATCHER_PORT}/alarm.get ; \
echo "\nWatcherInfo alarms" ; \
curl -s -X GET http://localhost:${WATCHER_INFO_PORT}/alarm.get
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see any alarms. but in Nginx logs I see
nginx | 2020/07/06 11:36:53 [error] 29#29: *37837 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 172.25.0.105, server: , request: "POST / HTTP/1.1", upstream: "http://172.25.0.103:8545/", host: "172.25.0.104:80"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alarms would be raised on enpoints watchers and childchain enpoints: /alarm.get
. Perhaps logged as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the issue is in Nginx. during reorgs when one node is paused, Nginx still tries to access it
@@ -295,6 +291,10 @@ defmodule Itest.Poller do | |||
Process.sleep(@sleep_retry_sec) | |||
submit_typed(typed_data_signed, counter - 1) | |||
|
|||
%{"messages" => %{"code" => "operation:service_unavailable"}} -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it didn't dissapear?
end | ||
|
||
defp hash(message), do: ExthCrypto.Hash.hash(message, ExthCrypto.Hash.kec()) | ||
def with_retries(func, total_time \\ 510, current_time \\ 0) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mixing private and public functions.
also, what is the reason we need to retry
. Is it so that all nodes unlock an account? is there a better way to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't find any other way. It is related to Nginx timeout. Some requests can no finish in 10 seconds. I set it 10 because during reorg Nginx continues to redirect to the paused node (til the first failure?), paused node accepts a request but do not return any response for N (timeout limit) seconds
9ed6bb0
to
49976e5
Compare
49976e5
to
240005e
Compare
@@ -96,4 +100,19 @@ defmodule Itest.PlasmaFramework do | |||
Map.put(acc, key, value) | |||
end) | |||
end | |||
|
|||
def with_retries(func, total_time \\ 120, current_time \\ 0) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mixing private and public functions
amount | ||
|> Currency.to_wei() | ||
|> Client.deposit(alice_account, Itest.PlasmaFramework.vault(Currency.ether())) | ||
end) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after the anon. function, the balance = Client.get_exact_balance(alice_account, expecting_amount)
would def. return the correct and LAST balance because the reorg is at this point finished, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not really, it takes some time for childchain to sync the latest data
priv/cabbage/apps/itest/lib/reorg.ex
Outdated
if Application.get_env(:cabbage, :reorg) do | ||
pause_container!(@node1) | ||
unpause_container!(@node2) | ||
|
||
Process.sleep(10_000) | ||
|
||
func.() | ||
|
||
Process.sleep(10_000) | ||
|
||
pause_container!(@node2) | ||
unpause_container!(@node1) | ||
|
||
Process.sleep(30_000) | ||
|
||
response = func.() | ||
|
||
Process.sleep(30_000) | ||
|
||
unpause_container!(@node2) | ||
unpause_container!(@node1) | ||
|
||
:ok = Poller.wait_until_peer_count(1) | ||
|
||
Process.sleep(20_000) | ||
|
||
response |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like these sleeps
, because they're undeterministic. Is there a way for us to make this less prone to error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can request the block number from nodes and wait for 5 generated blocks. I'll do this now.
I've noticed another issue: elixir-omg/priv/cabbage/apps/itest/test/itest/deposits_test.exs Lines 61 to 88 in f713eb6
Perhaps a better approach, scalable at least, would be to get the curent ethereum height and compare it with Or a even better approach, would be to get the block number at which the deposit transaction was made and compare it with |
@InoMurko how does it look? |
Overview
Adds a step to circle ci that runs cabbage tests using 2 node ethash geth that can be used to trigger reorgs