
restart strategy #1565

Merged (3 commits, Jun 3, 2020)


@InoMurko commented on Jun 2, 2020

Overview

Increase the supervisor's restart limit (max_restarts).

Changes

[Screenshot: Screen Shot 2020-06-02 at 22.40.26]

Testing

/

@InoMurko force-pushed the inomurko/increase-supervisor-restart-strategy branch from faf7a72 to f0f477f on June 2, 2020 20:50

@boolafish commented:

Wow... copying an explanation from Slack by Ino:


Our Ethereum connection monitor checks the connection every ~8 seconds, so it didn't catch the network glitch and didn't raise an ethereum_connection_error alarm.
The processes that depend on a working Ethereum connection started crashing, but the supervisor's crash limit was really low (3 restarts within a 5-second window), which means the supervisor terminates itself:

    # Assuming the values max_restarts and max_seconds,
    # then, if more than max_restarts restarts occur within max_seconds seconds,
    # the supervisor terminates all child processes and then itself.
    # The termination reason for the supervisor itself in that case will be shutdown.
    # max_restarts defaults to 3 and max_seconds defaults to 5.

    # really off, HeightMonitor should catch that in max 8 seconds and raise an alarm.
    max_restarts = 3
    max_seconds = 5
    opts = [strategy: :one_for_one, max_restarts: max_restarts * 5, max_seconds: max_seconds]
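For context, a minimal sketch of where options like these end up, assuming a module-based supervisor. The module name RestartSketch.Supervisor is hypothetical; the actual module changed in the PR isn't shown here. Supervisor.init/2 turns max_restarts and max_seconds into the OTP supervisor flags intensity and period, so this configuration tolerates up to 15 restarts within 5 seconds before the supervisor itself shuts down.

```elixir
defmodule RestartSketch.Supervisor do
  # Hypothetical module name; the real supervisor from the PR is not shown here.
  use Supervisor

  # Elixir's defaults, written out explicitly as in the diff above.
  @max_restarts 3
  @max_seconds 5

  def start_link(children) do
    Supervisor.start_link(__MODULE__, children, name: __MODULE__)
  end

  @impl true
  def init(children) do
    # Allow 5x the default number of restarts in the same 5-second window,
    # i.e. the 16th restart within 5 seconds makes the supervisor give up.
    opts = [
      strategy: :one_for_one,
      max_restarts: @max_restarts * 5,
      max_seconds: @max_seconds
    ]

    Supervisor.init(children, opts)
  end
end
```

Calling RestartSketch.Supervisor.init([]) directly returns {:ok, {flags, []}}, where flags includes strategy: :one_for_one, intensity: 15 and period: 5.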
Contributor commented:

> HeightMonitor should catch that in max 8 seconds and raise an alarm.

But you're still setting max_seconds to 5. Is it okay for it not to be > 8?

@InoMurko (author) replied:

The thing is, the timer resets itself if you don't reach max_restarts within max_seconds, so it's a bit difficult to figure out what the correct numbers are. For example, a delay in a process's init/1 could reset the timer I'm mentioning.
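The Erlang supervisor documentation phrases the limit as more than MaxR restarts within the last MaxT seconds, so older restarts age out of the window one at a time. Below is a simplified model of that counting, useful for reasoning about which numbers to pick. RestartWindow.simulate/3 is hypothetical and is not OTP's implementation; it assumes restart times in whole seconds, in ascending order, and simplifies boundary handling.

```elixir
defmodule RestartWindow do
  # Hypothetical, simplified model of supervisor restart intensity; not OTP's code.
  # `timestamps` are restart times in whole seconds, in ascending order.
  # Returns :shutdown as soon as more than `max_restarts` restarts fall within
  # the last `max_seconds` seconds, otherwise :ok.
  def simulate(timestamps, max_restarts, max_seconds) do
    result =
      Enum.reduce_while(timestamps, [], fn now, window ->
        # Drop restarts older than max_seconds, then record the current one.
        window = [now | Enum.filter(window, fn t -> now - t <= max_seconds end)]

        if length(window) > max_restarts do
          {:halt, :shutdown}
        else
          {:cont, window}
        end
      end)

    if result == :shutdown, do: :shutdown, else: :ok
  end
end
```

With the defaults (3 restarts, 5 seconds), six crashes one second apart ([0, 1, 2, 3, 4, 5]) return :shutdown, while six crashes two seconds apart ([0, 2, 4, 6, 8, 10]) return :ok because earlier restarts keep dropping out of the window. With the PR's limit of 15 restarts, the first sequence also returns :ok. The same number of crashes can therefore pass or fail depending only on their spacing, which is why slow init/1 calls make the limit hard to tune.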

Contributor replied:

> the timer resets itself if you don't reach max_restarts in max_seconds.

Oh wow... I see. I thought it would force a failure as long as it ran over the time limit.

@InoMurko merged commit 38746ce into master on Jun 3, 2020
@InoMurko deleted the inomurko/increase-supervisor-restart-strategy branch on June 3, 2020 07:26
T-Dnzt pushed a commit that referenced this pull request Jun 10, 2020
* restart strategy

* restart strategy

* restart strategy
unnawut added a commit that referenced this pull request Jun 23, 2020
* Inomurko/reorg block getter (#1554)

* dont store blockgetter events

* dont fetch blockgetter events through aggregator

* get_block_submitted_events

* get_block_submitted_events

* return tuple

* the right contract encoding

* unused api function

* prevent race condition for status cache (#1558)

* prevent race condition for status cache

* Changelog for v1.0.0 (#1556)

* restart strategy (#1565)

* restart strategy

* restart strategy

* restart strategy

* Update changelog for v1.0.0

* global block get interval (#1576)

* Update changelog for v1.0.0

* feat: increase ExitProcessor timeouts (#1592)

* increase timeouts

* docs: changelog

Co-authored-by: Unnawut Leepaisalsuwanna <unnawut@omisego.co>

* chore: update watcher docker-compose to v1.0.1

* docs: small non-content fix to changelog

Co-authored-by: Ino Murko <ino.murko@outlook.com>
Co-authored-by: Thibault <thibault@omisego.co>
boolafish added a commit that referenced this pull request Jun 25, 2020
* feat: sync v1.0.1 changes back to master (#1599)

* Add block processing queue to watcher info (#1560)

* [WIP] initial queuing work

* refactor: queue processing

* catch db timeouts

* update existing tests

* add tests

* continue with tests

* finish tests

* Fix integration test

* refactor

* refactor

* refactor tests

* remove retry_count

* remove on exist genserver shutdown

* add telemetry queue length event

* sobelow skip BinToTerm

* remove pending block status

* naming

* fix tests

* fix tests

* rename config

* remove unused alias

* fix PR minor comments

* missing file

* Add Transaction filter by end_datetime (#1595)

* add new query

* add if case

* format

* refactor

* add test

* fix feature spec

* use address

* fix assert

* fix test

* dummy assert

* add get single tx

* add wait tx

* working waiter pooler

* add wait helper cabbage test

* test end_datetime querty

* add swagger generator spec

* use second

* fix allow constraints

* update constraints validator

* fix test

* fix test

* edit name

* add empty line'

* more fixes

* fix warning

* refactor use the elixir way

* improve test

* fix test and typo

* use cheap fn

* sort

* use any

* use any boolean return

* use Enum.all

* final credo

* format

* use default params

* update constaints test for end_datetime

* mix format

* @default_paging constant

* fix test end_datetime

* fix test and rename feature file'

* mix format

* Revert "explain analyze updates (#1569)" (#1601)

This reverts commit 3431f26.
This commit is intended to be only deployed to develop env, thus a short life commit.
Reverting this in preparation of release.

* release artifacts (#1597)

* release artifacts

Co-authored-by: Unnawut Leepaisalsuwanna <921194+unnawut@users.noreply.github.com>
Co-authored-by: Ino Murko <ino.murko@outlook.com>
Co-authored-by: Thibault <thibault@omisego.co>
Co-authored-by: Mederic <32560642+mederic-p@users.noreply.github.com>
Co-authored-by: Jarindr Thitadilaka <jarindr23@gmail.com>
boolafish added a commit that referenced this pull request Jul 7, 2020
* chore: merge master back to v1.0.2 (#1606)

* docs: v1.0.2 change logs (#1611)

* chore: bump version in VERSION file (#1613)

Co-authored-by: Unnawut Leepaisalsuwanna <921194+unnawut@users.noreply.github.com>
Co-authored-by: Ino Murko <ino.murko@outlook.com>
Co-authored-by: Thibault <thibault@omisego.co>
Co-authored-by: Mederic <32560642+mederic-p@users.noreply.github.com>
Co-authored-by: Jarindr Thitadilaka <jarindr23@gmail.com>
@unnawut added the enhancement (New feature or request) label on Aug 28, 2020
Labels
enhancement New feature or request
5 participants