Experiment with using a direct DB connection instead of an API to retrieve data for SSR #783
Comments
We might be investigating this approach in the coming weeks. @CharlieC3, do you think there might be a security concern if we follow this route?
Are there any reports showing the API is a cause of performance issues in the Explorer? The API is specifically designed to be as fast as possible at retrieving data from the DB, with the Explorer being a prime use case. What makes us believe the Explorer can do it better? Not trying to poke holes in this, just trying to understand the thought process around this 😄

Additionally, during times of high traffic (e.g. during a mint), it's been shown that the root cause/bottleneck is queries piling up in the database rather than the API service itself. If this happens, the Explorer is going to experience the same slowdown whether it routes requests through the API or directly to the backend database. The only way around this is to create a separate private API + DB deployment for Explorer SSRs, and at that point it would be easier to simply route SSR traffic through the private API anyway.

I think reducing the number of SSR fetches from 6 down to 1 should in itself yield massive performance improvements, by an order of magnitude under heavy load, but it's not clear to me that there would be any benefit to accessing the API database directly. Maybe @zone117x or @rafaelcr can weigh in here if they have additional perspective I'm missing.
Completely agree with what @CharlieC3 just described. Another important aspect is that we use cache headers very heavily in the API. If the Explorer connected directly to the API DB, we would not get any help from CF to serve this data to clients. We'll need to accelerate work on this issue on the API so we can serve as many cached requests as we can: hirosystems/stacks-blockchain-api#1230
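For illustration only (not the API's actual code): what makes the CF layer useful is that API responses carry caching headers, so Cloudflare and browsers can answer repeat requests without ever touching Postgres, something a direct DB connection bypasses entirely. A minimal Express-style sketch, where the route, header values, and helper are assumptions:

```typescript
import express from "express";

const app = express();

// Hypothetical endpoint sketch: the Cache-Control header is what lets
// Cloudflare (and browsers) serve repeated requests from cache instead of
// hitting the API and its Postgres database again.
app.get("/extended/v1/block/:hash", async (req, res) => {
  const block = await fetchBlockFromDb(req.params.hash); // placeholder lookup
  res.set("Cache-Control", "public, max-age=60, s-maxage=60");
  res.json(block);
});

// Placeholder for whatever DB access the real API performs.
async function fetchBlockFromDb(hash: string): Promise<unknown> {
  return { hash };
}

app.listen(3999);
```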
Thanks @CharlieC3 and @rafaelcr for those details, this is really helpful. I was not implying that there is any performance issue with the API (which is actually really fast), but that doing SSR with 6 API requests per user request on the Explorer server, just to show one simple HTML page with light data, is not the correct approach.
@andresgalante In light of @CharlieC3's and @rafaelcr's comments, and the high dev cost involved with 2), I think we should move to solution 3), whose performance would be close to 2).
Note that the typical bottlenecks we see are around Postgres server load. So whether this data is fetched via a single new API endpoint (which would still require multiple SQL queries) or via multiple direct SQL queries from the Explorer, the performance floor will not change.
This makes much more sense to me, and seems like a better user experience. For example, if I'm loading a page for the purpose of checking tx status, I don't want to wait around for all the other data to be queried before anything is visible. It would also help users when providing feedback: instead of something like "account page is slow", users could report specific issues like "the account balance section of this page is slow to render." It also increases the number of cache hits (speeding up page rendering), since user web browsers generally cache data for a longer period of time than Cloudflare. This assumes the Explorer SSR fetches are not performing HTTP caching, which means all those requests are at the mercy of Cloudflare's sporadic cache-eviction behavior.
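A minimal sketch of what per-section client-side fetching could look like in a Next.js/React page, assuming SWR as the data-fetching library; the component name, endpoint path, and response shape are assumptions, not the Explorer's actual implementation:

```tsx
import useSWR from "swr";

const fetcher = (url: string) => fetch(url).then((res) => res.json());

// Each section fetches and renders independently, so a slow balance query
// does not block rendering of the rest of the page. Responses can also be
// cached by the browser and by Cloudflare.
export function AccountBalance({ address }: { address: string }) {
  const { data, error } = useSWR(
    `https://stacks-node-api.mainnet.stacks.co/extended/v1/address/${address}/balances`,
    fetcher
  );
  if (error) return <span>Failed to load balance</span>;
  if (!data) return <span>Loading balance…</span>;
  return <span>{data.stx.balance} µSTX</span>;
}
```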
This is good to know, we can then confidently eliminate solution 2). It seems that we are all converging on a full client-side solution (3). @andresgalante, unless you have any argument for considering solution 2) or 1), I will continue my work on moving everything client-side and we can say bye bye to SSR 😄
I am not concerned about moving to CSR. Even though SEO is not something we care about at the moment, we might in the future. Not a concern, just something to keep in mind. If this decision can wait a week, I'd like Heidar to be involved. @beguene @CharlieC3 @zone117x @rafaelcr thanks a lot for the input and help here 🙇
@andresgalante
In our last meeting we already discussed this and he was OK with moving to client-side as long as we are consistent across pages and in the UX.

I don't think this can wait; we must get this code ready as soon as possible. We currently put unnecessary load on the API, do not leverage caching, make rate-limiting difficult, and the Explorer has poor performance during heavy spikes of traffic (and even during low traffic). The Explorer becomes the single bottleneck: if many users request a page from the Explorer at the same time, the Explorer server currently caches all the data for all those users on the server.

To give you an idea of the urgency, compare this report, https://gtmetrix.com/reports/explorer.stacks.co/qhW6fOvb/, with the staging environment (similar to the prod environment) running my PR branch. Or compare:

https://explorer.stacks.co/txid/0x28d5c2d5a9d070b378aa061a9a18eddf676d7747e9b232a34f538487e83e0302 — First Contentful Paint: 3.7s
https://explorer.staging.blockstack.xyz/txid/0x28d5c2d5a9d070b378aa061a9a18eddf676d7747e9b232a34f538487e83e0302 — First Contentful Paint: 1.0s

I don't think a 12-second wait for a simple block page is acceptable. Many users would think (incorrectly) that the blockchain is slow while it's actually the Explorer that is slow. I am fine with not deploying to production right now, but I don't understand why we would not continue working on this issue now.
You are right, it's not acceptable. Let's ship when it's done 👍 🚀
Just adding more context on the current setup. Generally, at all times there will be multiple load-balanced instances of the Explorer running to serve requests, especially when traffic is high. So even if they do have support for a server-side cache, it's not shared across all Explorer instances: if someone reloads their address page, they'd likely hit a different Explorer instance, which may not have cached data to serve them.

Moving the caching efforts client-side should open the door to resolving this, since we'd then be able to leverage the browser's cache and perhaps Cloudflare's cache. But if there's still any server-side caching going on, we should look into leveraging a middle caching layer (e.g. Cloudflare), or implementing support for a shared cache view for the Explorer with something like Redis or memcached.
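A minimal sketch of the shared-cache idea mentioned above, assuming `ioredis` as the client; the key scheme, TTL, and helper name are assumptions and not an existing part of the Explorer:

```typescript
import Redis from "ioredis";

// A single Redis instance shared by all load-balanced Explorer instances,
// so a value cached by one instance is visible to every other instance.
const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

// Hypothetical helper: return the cached value if present, otherwise fetch
// it, store it with a short TTL, and return it.
export async function getOrSet<T>(
  key: string,
  ttlSeconds: number,
  fetcher: () => Promise<T>
): Promise<T> {
  const cached = await redis.get(key);
  if (cached !== null) return JSON.parse(cached) as T;

  const fresh = await fetcher();
  await redis.set(key, JSON.stringify(fresh), "EX", ttlSeconds);
  return fresh;
}

// Usage sketch: cache an address balance for 30 seconds across instances.
// const balance = await getOrSet(`balance:${address}`, 30, () =>
//   fetch(`${API_URL}/extended/v1/address/${address}/balances`).then((r) => r.json())
// );
```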
For most of our pages, the Explorer server code makes one or more (up to 6) API calls to
https://stacks-node-api.mainnet.stacks.co/
server-side, which dramatically slows down our first server response. We should avoid this pattern because it puts too much load on the Explorer web server, and users would experience degraded performance on the Explorer as traffic increases (even if the API endpoint can handle it).
Example: 1000 users each request the /txid page once; the Explorer server itself makes 6 API calls per user and waits for all 6 requests to complete before returning a response to that user. That's 6000 API calls that can't really leverage per-user caching (the cache would be invalidated more often as the number of users grows), it consumes far more memory than necessary, etc.
Had those API calls been made client-side, those 6000 calls could be cached per user.
Reducing the load on the Explorer server means it will be able to handle more traffic with the same configuration, or the configuration can be scaled down to handle the same number of users.
(Note that nothing changes for the API: the API endpoint will still have to handle those 1000 × 6 requests.) A sketch of the pattern in question follows below.
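For illustration, a sketch of the kind of server-side fan-out being described, assuming Next.js `getServerSideProps`; the endpoint paths and prop names are hypothetical, not the Explorer's actual code:

```typescript
import type { GetServerSideProps } from "next";

const API = "https://stacks-node-api.mainnet.stacks.co";

// The pattern being criticized: the Explorer server fans out several API
// calls per incoming request and blocks the HTML response until every one
// of them has completed.
export const getServerSideProps: GetServerSideProps = async ({ params }) => {
  const txId = params?.txid as string;
  const [tx, mempool, supply, nodeInfo, fees] = await Promise.all([
    fetch(`${API}/extended/v1/tx/${txId}`).then((r) => r.json()),
    fetch(`${API}/extended/v1/tx/mempool`).then((r) => r.json()),
    fetch(`${API}/extended/v1/stx_supply`).then((r) => r.json()),
    fetch(`${API}/v2/info`).then((r) => r.json()),
    fetch(`${API}/v2/fees/transfer`).then((r) => r.json()),
  ]);
  return { props: { tx, mempool, supply, nodeInfo, fees } };
};
```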
My PR reduces the number of server-side API calls to a maximum of 1. This is a quick solution that will give us good performance without much change to the code.
Ideally, we should fetch the data server-side through a direct read-only connection to the same DB that the API server is using. This way the Explorer server will get all the data really fast and return a response quickly. Besides, we can leverage Next.js SSR and keep great SEO.
This task is to experiment with a simple Node DB connection to the PostgreSQL database and check the performance and the amount of code that needs to change. A sketch of the experiment follows below.
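A minimal sketch of the experiment, assuming the `pg` (node-postgres) driver inside a Next.js `getServerSideProps`; the environment variable, table name, and column names are assumptions about the API's schema and would need to be verified:

```typescript
import { Pool } from "pg";
import type { GetServerSideProps } from "next";

// Read-only pool against the same Postgres instance the API uses.
// READONLY_DATABASE_URL would point at a read replica or a read-only role.
const pool = new Pool({
  connectionString: process.env.READONLY_DATABASE_URL,
  max: 5,
});

export const getServerSideProps: GetServerSideProps = async ({ params }) => {
  const txId = params?.txid as string;

  // Hypothetical table/column names; the real API schema would need to be
  // checked before writing queries like this.
  const { rows } = await pool.query(
    "SELECT tx_id, status, block_height FROM txs WHERE tx_id = $1 LIMIT 1",
    [txId]
  );

  if (rows.length === 0) return { notFound: true };
  return { props: { tx: rows[0] } };
};
```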
Note that those changes could be made progressively (say, page by page) without changing the rest of the Explorer code.