Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HystrixCollapser Refactor for Performance and Concurrency Bug Fixes #114

Merged
merged 1 commit into from
Feb 27, 2013
Merged

HystrixCollapser Refactor for Performance and Concurrency Bug Fixes #114

merged 1 commit into from
Feb 27, 2013

Conversation

benjchristensen
Copy link
Contributor

HystrixCollapser: Response Not Received #80

  • refactored HystrixTimer to be multi-threaded and use ScheduledThreadPoolExecutor to achieve this
  • refactored HystrixCollapser to:
    • timeout on wait and trigger batch execution from calling thread instead of relying on only background timer
    • use single CountDownLatch per batch rather than 2 per collapsed requests to dramatically reduce number of items being waited on
    • changed how batches are defined so the creation of a batch is a simpler step without a loop over all requests to register with the batch

Testing on a production canary shows these changes have given significant efficiency and thus performance gains.

I have tested it against 3 different applications including the Netflix API using a production canary.

IllegalStateException: Future Not Started #80

- refactored HystrixTimer to be multi-threaded and use ScheduledThreadPoolExecutor to achieve this
- refactored HystrixCollapser to:
  -- timeout on wait and trigger batch execution from calling thread instead of relying on only background timer
  -- use single CountDownLatch per batch rather than 2 per collapsed requests to dramatically reduce number of items being waited on
  -- changed how batches are defined so the creation of a batch is a simpler step without a loop over all requests to register with the batch

Testing on a production canary shows these changes have given significant efficiency and thus performance gains.
@benjchristensen
Copy link
Contributor Author

I have gotten positive test results of this from 3 teams including my own testing on production canaries so I am merging this code and releasing.

What I looked at in prod was:

  • no new errors showed up
  • concurrency bugs and "Response Not Received" are not gone
  • performance numbers under normal system load for application behavior did not change
  • thread dumps look clean
  • histo of objects on the heap looks fine after running through peak traffic and over night (so doesn't look like I created any memory leaks)

The other 2 teams did stress tests in the test environment and everything passed.

benjchristensen added a commit that referenced this pull request Feb 27, 2013
HystrixCollapser Refactor for Performance and Concurrency Bug Fixes
@benjchristensen benjchristensen merged commit 1c51f9e into Netflix:master Feb 27, 2013
@benjchristensen
Copy link
Contributor Author

I let 2 instances run as production canaries for 24+ hours with higher than normal load (higher RPS driven by weighted-load-balancing being increased for these boxes) and didn't see any "Null ResponseReference" logs on either machine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant