
Horizon Prerequisites Benchmarking #4960

Open
urvisavla opened this issue Jul 13, 2023 · 7 comments

Comments

@urvisavla
Contributor

What problem does your feature solve?

We want to update the Horizon prerequisite documentation with the minimum hardware requirements for running Horizon. To do this, we need to perform benchmarking and testing similar to what we've conducted previously.

What would you like to see?

Determine the minimum specifications required for the Horizon compute instance and the Postgres database instance, with a focus on measuring the memory, CPU, disk space, and IOPS requirements for both components.
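One way to sample per-process memory during these benchmarks (a minimal sketch, assuming a Linux host where `/proc` is readable; the process-name lookup is hypothetical, not part of any existing tooling):

```python
# Minimal sketch: sample a process's resident memory on Linux by reading
# VmRSS from /proc/<pid>/status. Process lookup by name is hypothetical.
import os

def rss_mb(pid):
    """Return the resident set size of `pid` in MiB (Linux /proc only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024  # reported in kB
    return 0.0

def find_pid(name):
    """Return the first PID whose command name matches `name` (e.g. "horizon")."""
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            try:
                with open(f"/proc/{entry}/comm") as f:
                    if f.read().strip() == name:
                        return int(entry)
            except OSError:
                pass  # process exited between listdir and open
    return None

if __name__ == "__main__":
    print(f"self RSS: {rss_mb(os.getpid()):.1f} MiB")
```

Sampling this in a loop alongside `iostat`-style disk metrics would cover the memory side of the measurements above.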

What alternatives are there?

@mollykarcher
Contributor

Some unknowns to investigate here. Notably, how much API load we assume users will have. We could potentially use the existing goreplay setup, filtered/reduced depending on what we decide.

@sreuland
Contributor

@urvisavla , during verification of compute resources, I wanted to mention that it should include ENABLE_CAPTIVE_CORE=true and CAPTIVE_CORE_USE_DB=true; I think those are the defaults at this point. Captive core with on-disk db usage dramatically lowers the amount of RAM used by captive core: the current prerequisites in the docs mention 32GB of RAM required, but with on-disk usage that should be well under 8GB in almost all cases, if not lower - #4092 (comment)
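For reference, the settings mentioned above as environment variables (values per this comment; verify against the deployed Horizon version):

```shell
# Horizon captive core configuration referenced above.
# Reportedly the defaults at this point.
export ENABLE_CAPTIVE_CORE=true
export CAPTIVE_CORE_USE_DB=true   # keep captive core state on disk instead of in RAM
```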

@urvisavla
Contributor Author

> @urvisavla , during verification of compute resources, wanted to mention it should include ENABLE_CAPTIVE_CORE=true and CAPTIVE_CORE_USE_DB=true, I think those are the defaults at this point. Since, captive core with disk db usage will dramatically lower the amount of RAM used by captive, current pre-reqs in docs mention 32GB of ram required, but with on-disk usage, that should be well under 8GB in almost all cases if not lower - #4092 (comment)

@sreuland
We observed RAM usage on the ingestion instance (dev cluster) to remain below 8GB, usually hovering around 6GB. However, during state verification, the RAM usage spikes to 11GB and remains at that level for the entire duration of state verification. Meanwhile, the memory usage on the API instance (prod cluster) remains consistently below 3GB. I believe the main contributor to memory usage is the in-memory graph for path payments.

Considering these observations, and given that our recommendations are for an instance serving all functions (API + ingestion), 16GB of RAM should be adequate. I will update our documentation to reflect this recommendation.
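Related note: for operators who don't need the path-finding endpoints, Horizon has a flag to skip the in-memory order book graph, which should reduce that footprint (flag name per the Horizon operator docs; worth double-checking against the deployed version):

```shell
# Optional operator tweak: skip the in-memory path-finding graph
# if the /paths endpoints are not needed. Verify the flag name
# against your Horizon version before relying on it.
export DISABLE_PATH_FINDING=true
```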

@urvisavla
Contributor Author

Update:

  • Shared a document with the team detailing observations from EC2 and RDS instances in the dev and prod clusters.

  • Updated the hardware specifications, including CPU, memory, and disk, in our public docs within the partner-experience branch (to be merged to the main branch soon).

  • Unfortunately, couldn't obtain hardware benchmarks for running an API instance because there is no API traffic in either the staging or dev clusters.

  • Explored options like the goreplay tool for mirroring traffic, but it proved infeasible #2461.

  • Next steps:
    Explore developing a custom tool to simulate requests from prod (using logs from AWS) and replay them on the dev cluster. For that we'd want to use instances with specifications similar to what we plan to recommend in our public docs. Created an ops request for provisioning new instances.
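A rough sketch of what that custom replay tool could look like (hypothetical throughout: the log layout is assumed to be AWS ALB access logs, and the dev-cluster base URL is a placeholder):

```python
# Hypothetical replay-tool sketch: parse request paths out of AWS ALB access
# logs and re-issue them as GETs against a dev Horizon instance.
# The log field layout and the dev base URL below are assumptions.
import re
import urllib.request

# ALB logs quote the request as: "GET https://host:port/path?query HTTP/x.y"
REQUEST_RE = re.compile(r'"GET \S+?://[^/\s]+(/[^ "]*) HTTP')

def extract_paths(log_lines):
    """Return the path+query of every GET request found in the log lines."""
    return [m.group(1) for line in log_lines
            if (m := REQUEST_RE.search(line))]

def replay(paths, base_url):
    """Re-issue each captured GET against `base_url` (the dev cluster)."""
    for path in paths:
        try:
            with urllib.request.urlopen(base_url + path, timeout=10) as resp:
                print(resp.status, path)
        except Exception as exc:
            print("ERR", path, exc)

if __name__ == "__main__":
    sample = ('https 2023-09-01T00:00:00.000000Z app/dev-alb/abc '
              '"GET https://horizon.stellar.org:443/ledgers?limit=10 HTTP/2.0"')
    print(extract_paths([sample]))
    # replay(extract_paths([sample]), "http://horizon-dev.example.internal:8000")
```

Only GETs are replayed, since re-issuing non-idempotent requests against any environment would be unsafe.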

@sreuland
Contributor

> Created https://github.com/stellar/ops/issues/2536 request for provisioning new instances.

Hello @urvisavla , I left a comment for consideration of using k8s for provisioning the new instances rather than EC2:
https://github.com/stellar/ops/issues/2536#issuecomment-1728517511

@sreuland
Contributor

sreuland commented Oct 2, 2023

@urvisavla , you mentioned a performance benchmarks doc was shared, can it be linked or summarized here also? Thanks!

@urvisavla
Contributor Author

> @urvisavla , you mentioned a performance benchmarks doc was shared, can it be linked or summ'd here also? Thanks!

Sorry, I missed this earlier. Here is the doc.
