Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flawed logic in the new visualization timeout #3030

Closed
3 tasks done
andreineculau opened this issue Jun 23, 2017 · 14 comments · Fixed by #3329
Closed
3 tasks done

flawed logic in the new visualization timeout #3030

andreineculau opened this issue Jun 23, 2017 · 14 comments · Fixed by #3329

Comments

@andreineculau
Copy link

Make sure these boxes are checked before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included it here as text if any
  • I have reproduced the issue with at least the latest released version of superset
  • I have checked the issue tracker for the same issue and I haven't found one similar

Superset version

0.18.5

Expected results

adhere to webserver/worker's timeouts e.g. as per the -t flag to superset runserver, or if superset is behind a reverse-proxy to that proxy's timeout

Actual results

adhere to hardcoded value in javascript client - 45 seconds

Steps to reproduce

superset runserver -t 300 will still show that a 60s query times out, because the timeout is set to 45 seconds in constants.js, despite 60 < 300

NOTE problem stems from #2763 which, for reasons that I cannot see nor guess, did not address the core problem: handle the 504 Gateway Time-out status code of the ajax call and instead implemented a client-side timeout / ping @graceguo-supercat

@mistercrunch
Copy link
Member

@graceguo-supercat can you take a look? We may just need to clarify/document what each one of these timeouts actually do. -t is a web server timeout, constants.js's timeout is the client's timeout, and there's at least one or two more timetouts (celery/sqllab) timeouts in the app...

@graceguo-supercat
Copy link

@andreineculau, please see updated faqs. please let me know if you have more concerns.

@andreineculau
Copy link
Author

@graceguo-supercat @mistercrunch while it's always good to have behaviours documented, in this case the timeout behaviour https://github.com/apache/incubator-superset/blob/master/docs/faq.rst#why-are-my-queries-timing-out , I would like to rephrase my concern via a couple of questions, hoping that this will clarify either the limitation that you addressed via the client-side timeout, either my goal in normalizing the behaviour across the board and not have a special client-side timeout.

  1. If I do not run superset behind a proxy, thus hypothetically no way to get a 504 response, but having data retrieval happening in more than 45 seconds, the client-side timeout will still kick in, not allowing the users to get queries that might have ended in 50 seconds, just 5 seconds after the timeout. Is this correct (=intended) behaviour?

  2. If I do run behind a proxy, which has a gateway timeout set to 10 seconds, the current behaviour is still to show 504 Gateway Time-out, instead of the "warning message". Is this correct (=intended) behaviour?

  3. Why is it that you cannot handle the 504 Gateway Time-out in a generic fashion, without having a client-side timeout?

  4. As it is now, the current behaviour introduced a warning message, which hides a perfectly fine HTTP error message (I believe other HTTP errors are still displayed), with a magic constant, that can only be configured by monkey-patching code. Where is the improvement?

@graceguo-supercat
Copy link

If I do not run superset behind a proxy, thus hypothetically no way to get a 504 response, but having data retrieval happening in more than 45 seconds, the client-side timeout will still kick in, not allowing the users to get queries that might have ended in 50 seconds, just 5 seconds after the timeout. Is this correct (=intended) behaviour?

If I do run behind a proxy, which has a gateway timeout set to 10 seconds, the current behaviour is still to show 504 Gateway Time-out, instead of the "warning message". Is this correct (=intended) behaviour?

If you are not behind proxy, or the proxy itself has a shorter timeout limit and you don't want user see the 504 Gateway time-out, currently you have to change js file and rebuild js package.

Why is it that you cannot handle the 504 Gateway Time-out in a generic fashion, without having a client-side timeout?

This article might be helpful. I have limited control on proxy server.

As it is now, the current behaviour introduced a warning message, which hides a perfectly fine HTTP error message (I believe other HTTP errors are still displayed), with a magic constant, that can only be configured by monkey-patching code. Where is the improvement?

Client-side timeout limit is just a quick fix. Ideally I am thinking we should provide a configuration webpage for admin user, so that he can easily configurate instead of hard-code it. If this feature has a good amount of request, I am happy to implement it. Also I am open to any suggestion to resolve this issue.

@andreineculau
Copy link
Author

andreineculau commented Jul 7, 2017

Why is it that you cannot handle the 504 Gateway Time-out in a generic fashion, I.E. CODE THE CLIENT TO SHOW THE WARNING MESSAGE WHENEVER THE BACKEND RESPONDS WITH 504, without having a client-side timeout?

Client-side timeout limit is just a quick fix.

The first question is more important but would you be so kind to tell also what does it quick fixes, beside showing a hard-coded message instead of the raw 504? Thanks (LATER EDIT: the stress is on "quick" - since you say this is just a quick fix and that a better/proper fix would come sooner or later, what triggered the emergency of the fix? #2734 called for an improvement not a fix, thus in my world there's no emergency that demands first a quick job followed by a better/proper one)

@graceguo-supercat
Copy link

graceguo-supercat commented Jul 7, 2017

Given the browser's implementation for XMLHttpRequest object, unfortunately when 504 happened, there is no build-in event being triggered. In dashboard/explore view, we use jQuery.ajax to fetch slice data. Here is a good example for how to handle request timeout. Using native XMLHttpRequest or other framework is pretty much the same way.

For your second question: quick fix means this fix only takes 2 hours, then users waiting for long-running queries won't see bold and ambiguous 504 error message any more. Hard-coded limit is not perfect for this problem. Admin's configuration page is one choice in my mind, and I am open to any better solution.

@andreineculau
Copy link
Author

Given the browser's implementation for XMLHttpRequest object, unfortunately when 504 happened, there is no build-in event being triggered.

What? Clearly the 504 error surfaces, because otherwise users wouldn't see the 504 literal in the first place, before this "fix". https://jsfiddle.net/654yLt7j/ shows unambiguously that you do get access to the 504 error, thus you can handle it gracefully.

In dashboard/explore view, we use jQuery.ajax to fetch slice data. Here is a good example for how to handle request timeout. Using native XMLHttpRequest or other framework is pretty much the same way.

Then why are we talking about jQuery? And why are we talking about the client-side timeout? I'm highlighting the fact that you do not need any client-side timeout. What is needed is handling the 504 error.

@andreineculau
Copy link
Author

Strictly to the point,

https://github.com/apache/incubator-superset/pull/2763/files#diff-d9a47bda12ab9d33eb61f5be2d924d51R250

should be

if (err.status === 504) {
  dispatch(chartUpdateTimeout(err.statusText));
} else {

and all references to QUERY_TIMEOUT_THRESHOLD should be removed.

Result: "nicer" text for timeouts, no client-side hardcoding.

@graceguo-supercat
Copy link

graceguo-supercat commented Jul 11, 2017

i see your point: instead of checking err.statusText === 'timeout', i can use err.status === 504 to handle 504 response. It seems when 504 happens, jQuery's err.statusText is still error, that's why i feel there is no way to identify 504 from any other type of errors.

But without client-side timeout limit, for users not behind proxy, they will wait either server timeout or browser timeout, whichever comes first. So probably we still need to handle browser timeout (assume we allow server timeout 5 minutes). Browser timeout doesn't have err.status === 504, but i think i can still use err.statusText === 'timeout'.

I think set ajax 'timeout' parameter is still necessary, but it doesn't need to be shorter than proxy timeout limit.

@andreineculau
Copy link
Author

andreineculau commented Jul 11, 2017

I don't know if you and @mistercrunch talked on other channels but the way that I read his issue #2734 is that he wants the 504 handled, not that the client should time out.

The client cannot know why the timeout occurs, it could very well be that the proxy server has high CPU usage, or that there is high network latency on some end, etc etc etc anything but my data set being large. From that perspective, the message itself is even more confusing and hedging with "Perhaps" doesn't help one bit. Just to be clear, that timeout will occur also if the backend already sent 200 OK, and partial headers/body but didn't yet signal the end of the HTTP message. Heck, it can even be that the browser pool of connections is filled and that the request doesn't even get sent from the queue - see definition of jquery.ajax' timeout.

But what is worse is that all of a sudden, superset instances become trapped by a magic 50s timeout. If indeed, @mistercrunch or someone else wanted/wants a client-side timeout instead of a world-wide known spinner that almost anyone knows it's either slow network or file too large or too much processing, then why not have it optional ? You show a friendly error message on all 504 errors, and optionally on a client-side timeout IFF the admin or maybe individual users in the future opt-in.

FWIW 1. I have lived through these times https://productforums.google.com/forum/#!topic/chrome/gb_bZzmCnm4
FWIW 2. the current http response timeout in browsers, or at least in Firefox is 5 minutes - quite a difference from superset's 50 seconds

@jonathortense
Copy link

As mentioned by @graceguo-supercat, the faq says:

to change client-side timeout limit settings from /superset/superset/assets/javascripts/constants.js file and rebuild js package

Please, can anyone explain me how to "rebuild js package"? because I already changed this file and I still see the message: "Query timeout - visualization query are set to time out at 45 seconds."

@mistercrunch
Copy link
Member

What about having the timeout setting on the client side be aligned with the timeout set on the backend? We could just align it with SUPERSET_WEBSERVER_TIMEOUT. Then it's a matter of passing that from the backend to the frontend when first loading the app(s).

@graceguo-supercat will be back from vacation in a few weeks and we can fix that up. In the meantime you'll have to follow the instructions in the CONTRIBUTING.md to make your own build.

@andreineculau
Copy link
Author

@mistercrunch you don't need a client setting in the first place...

@mistercrunch
Copy link
Member

There's a bit more to it though. Proxy and web servers have timeouts for a reason. With 5 minutes timeouts we potentially enable users to hammer the database(s) pretty heavily and allow for a not-so-interactive experience in the dashboard and explore view. The timeout forces users to summarize their data upfront which arguably is a good thing. Also if the web server has timed out, the user should be informed of it, and I think that (from what I know) without a client timeout there's no way to know in some cases.

If pushing the logic of long timeout to the extreme, you end up having to provision a lot of web workers to support hung-up database queries. When all web worker slots are taken by a poorly designed dashboard (or a database that is brought to its knees), you can't even serve the basic assets and pages anymore. I'm speaking from experience here where we experience flat out 504 on all pages when Presto comes to a crawl for whatever reason.

I've been thinking that it would be best to leverage the Celery worker infrastructure for all database interactions (when available), as opposed to increasing the timeouts and number of web worker slots.

I agree Superset shouldn't force you to define a specific timeout though. If you want to make it 10 minutes timeouts and however how many web workers you need to support that you should be able to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants