Query Times Out with Pad List #25
Comments
Can you tell us a little about your environment? Why are you experiencing this problem? What is your MySQL config, OS, versions, etc.?
@JohnMcLear - I did not get the template when I clicked 'New Issue'. I was just going to
Etherpad 1.8.14
MySQL has about 25+ million rows
MySQL is finicky about data structure and is easy to deploy in a non-optimal state. With that in mind, can you show the results of describing the database and tables of your Etherpad database? afaik the command is something like

I'm wondering if you have MyISAM when InnoDB would be required, but that's just a hunch atm. I'm wondering if ep_adminpads2

We do have testing for large dataset queries afaik: https://github.com/ether/ueberDB/blob/master/test/lib/mysql.sql#L38
We also support pooling: https://github.com/ether/ueberDB/blob/master/test/test_mysql.js#L42
Running the SQL directly
takes about 30 seconds and returns 19763 rows.
Cc @rhansen Should findKeys be this inefficient here? Is it returning pad contents for each pad, or just the pad ID? It should just be the key, which should have a very low computational cost, no? Can/should adminpads build its own pad database record on each new / delete / fork / copy event to make this query low cost?
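To make the "build its own pad database record" idea concrete, here is a minimal sketch of a plugin-side index of pad IDs that is updated on each event rather than rebuilt by scanning every key. The class name `PadIndex` and its methods are hypothetical, and the index lives only in memory here; a real plugin would have to persist it and wire the methods to whatever create / delete / fork / copy events are available.

```typescript
// Hypothetical sketch: keep a sorted list of pad IDs up to date per event,
// so listing pads never needs a full scan of the key/value store.
class PadIndex {
  private ids: string[] = []; // always kept in sorted order

  // Call when a pad is created, forked, or copied (the new pad's ID).
  add(padId: string): void {
    if (this.ids.indexOf(padId) === -1) {
      this.ids.push(padId);
      this.ids.sort();
    }
  }

  // Call when a pad is deleted.
  remove(padId: string): void {
    const i = this.ids.indexOf(padId);
    if (i !== -1) this.ids.splice(i, 1);
  }

  // Return one page of pad IDs, e.g. list(0, 50), then list(50, 50).
  list(offset: number, limit: number): string[] {
    return this.ids.slice(offset, offset + limit);
  }

  count(): number {
    return this.ids.length;
  }
}

const padIndex = new PadIndex();
padIndex.add('meeting-notes');
padIndex.add('draft');
padIndex.remove('draft');
console.log(padIndex.list(0, 50));
```

The trade-off is that the index must stay consistent with the real data: anything that creates or removes pads without going through these handlers would leave it stale.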
The database is probably taking so long because it has to examine every

I'm not sure how to fix this without changing the db schema or adding complex features to ueberdb. Instead of storing revisions at

To scale to an arbitrary number of pads, we would need a solution that allows us to fetch keys a page at a time (e.g., 0 through 49, then 50 through 99, and so on). One way to do this is to add cursor support to ueberdb. Another way, which might not be possible, is to add a generic

I think it would probably be better for admins of large Etherpad deployments to shard pads across multiple Etherpad instances and databases (e.g., the reverse proxy uses rendezvous hashing on the pad ID to pick the appropriate Etherpad backend).
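As an illustration of fetching keys a page at a time, here is a minimal sketch of cursor-style (keyset) paging. It is not ueberdb's API: `fetchPage` and the `Page` type are hypothetical, and a sorted in-memory array stands in for the database. The cursor is simply the last key of the previous page, so each request only has to look at keys after that point.

```typescript
interface Page {
  keys: string[];
  next: string | null; // cursor for the following page, or null when done
}

// Hypothetical stand-in for a database query over keys in sorted order.
// In SQL the same idea would be a "key > cursor ORDER BY key LIMIT n" query.
function fetchPage(sortedKeys: string[], after: string | null, limit: number): Page {
  if (limit < 1) throw new Error('limit must be at least 1');
  let start = 0;
  if (after !== null) {
    // Binary search for the first key strictly greater than the cursor.
    let lo = 0;
    let hi = sortedKeys.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (sortedKeys[mid] <= after) lo = mid + 1;
      else hi = mid;
    }
    start = lo;
  }
  const keys = sortedKeys.slice(start, start + limit);
  const more = start + limit < sortedKeys.length;
  return {keys, next: more ? keys[keys.length - 1] : null};
}

// Walk all keys two at a time.
const allKeys = ['pad:a', 'pad:b', 'pad:c', 'pad:d', 'pad:e'];
let cursor: string | null = null;
do {
  const page = fetchPage(allKeys, cursor, 2);
  console.log(page.keys);
  cursor = page.next;
} while (cursor !== null);
```

Unlike offset-based paging, the cursor stays valid if keys are added or removed before it between requests.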
One shortcoming of the sharding approach is that it breaks plugins like ep_padadmin2, as the list of pads you get would be only those on the instance that the result to
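For readers unfamiliar with the rendezvous hashing mentioned for the sharding approach, a minimal sketch follows. Each backend is scored by hashing its name together with the pad ID, and the highest score wins, so removing one backend only remaps the pads that lived on it. The function names, the backend names, and the choice of a 32-bit FNV-1a hash are all assumptions made for illustration; a production proxy would likely use a stronger hash.

```typescript
// 32-bit FNV-1a over UTF-16 code units; any well-mixed hash would do here.
function fnv1a(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    // Multiply by the FNV prime 16777619 using shifts to stay in integer range.
    hash += (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24);
    hash = hash >>> 0; // wrap to an unsigned 32-bit value
  }
  return hash;
}

// Pick the backend with the highest score for this pad ID.
function pickBackend(padId: string, backends: string[]): string {
  if (backends.length === 0) throw new Error('no backends configured');
  let best = backends[0];
  let bestScore = -1;
  for (const backend of backends) {
    const score = fnv1a(`${backend}|${padId}`);
    // Break ties by name so the result does not depend on list order.
    if (score > bestScore || (score === bestScore && backend < best)) {
      best = backend;
      bestScore = score;
    }
  }
  return best;
}

// Hypothetical backend names.
const backends = ['etherpad-1', 'etherpad-2', 'etherpad-3'];
console.log(pickBackend('meeting-notes', backends));
```

A plugin that needs the full pad list would still have to query every backend and merge the results, which is the shortcoming described above.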
Instead of introducing
That might work as a temporary solution, but there are some issues:
Actually, I'm mostly concerned about the size of the db entry and how frequently it gets updated. This is only useful for very large instances with a large number of pads and a lot of pad creations (which result in a large number of pads). So updating such a huge entry probably causes more trouble than it solves.

regarding 1: on first startup after this code is live, check if the

regarding 2: I know, but iirc I never saw public code (except john's proxy/sharding work). Those setups can cause some trouble, e.g. with ueberdb's buffer. Maybe we should start collecting a list of problems to take care of when running large instances. What I mean is: after you scale your database cluster, set up distribution to different nodes based on padid, and set up failover/HA for every group of nodes, what should you expect if you don't set up different databases for every node group? Is there anything that Etherpad should do to better support this?

regarding 3: Unless we drop support for any non-transactional databases, we probably won't be able to handle the crash case. "Normal" pad creation via API or UI shouldn't be too complicated to get right. When you say plugin actions, do you mean plugins that directly write to the database?

Instead of adding one
This is correct: on large instances, this query takes too long and hits the query timeout in ueberdb. After optimizing MySQL (increasing key_buffer_size, join_buffer_size, and query_cache_limit), I could speed up the query by 25%, from 75 seconds to 53 seconds. The query itself:
All entries in the database:
As a side note: the Etherpad instance runs on the MyISAM engine and not on InnoDB, due to performance recommendations from the wiki [1][2]. Is this still the recommended engine for better performance, or would InnoDB be the better choice? [1] - https://github.com/ether/etherpad-lite/wiki/Converting-from-InnoDB-to-MyISAM
I'm curious. Did you update to the latest version of Etherpad? Did something change? I updated many drivers, so it could work faster now. ScyllaDB (https://www.scylladb.com/) might also be interesting. It is used in Fortune 500 companies when performance is key, and because it is already a key-value store it might be even faster.
How do I get past this timeout issue trying to list the pads?