Slow page builds in 6.0 when generating navigation bar #381

Closed
2 tasks
charris opened this issue Apr 12, 2021 · 12 comments

@charris

charris commented Apr 12, 2021

See numpy/numpy#18756. Build times went from around 10 minutes to 30+ on CircleCI.

Things to try

  • Only ask for the toctree one time per page, and split it into two items for the navbar/sidebar
  • See if we can reduce the complexity of the toctree resolve in the "sidebar captions" functionality

To resolve this

We decided that the slowdown here is somewhat unavoidable, as long as you want to keep multiple levels within your navigation bar. For some tips about speeding up site builds, see: https://pydata-sphinx-theme.readthedocs.io/en/latest/user_guide/configuring.html#selectively-remove-pages-from-your-sidebar
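For reference, the linked section comes down to a couple of lines in conf.py. The following is a minimal sketch, assuming the sphinx-remove-toctrees extension is installed; the glob pattern is a hypothetical placeholder for wherever your autogenerated API pages live.

```python
# conf.py (sketch). Assumes the sphinx-remove-toctrees extension is installed.
extensions = [
    "sphinx_remove_toctrees",
]

# Hypothetical pattern: pages matching it are left out of the sidebar toctree.
# They are still built as part of the site.
remove_from_toctrees = ["reference/generated/*"]
```

The trade-off is that the removed pages no longer appear in the navigation, so readers reach them through links from other pages instead.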

@jorisvandenbossche
Member

I am having the same issue with pandas, which is related to the API docs -> #364
I assume for numpy it might be a similar issue? (I don't know the size of your API docs, but I assume also quite big)

@bollwyvl
Collaborator

Perhaps we need to consider adding some explicit benchmarks... in the meantime, i'll see if i can dig up some numbers AP for the docs site build here so we can bisect a little.

@bollwyvl
Collaborator

Here are some rough findings...

Looking at the data, my takeaway is that b822548 is the place to start looking: that is where the duration jumps up and doesn't come back down.

head_sha started_at duration
4540f6a 2020-04-23 03:38:37+00:00 4
1e8295d 2020-04-28 15:02:45+00:00 4
e86bcd1 2020-05-04 06:46:59+00:00 11
24ae3c5 2020-05-04 07:00:14+00:00 10
e089e6d 2020-05-04 22:07:36+00:00 9
8ae0c51 2020-05-06 01:14:22+00:00 8
4a33ff5 2020-05-20 20:40:31+00:00 12
4576172 2020-05-27 10:02:27+00:00 9
e6680a3 2020-05-27 16:36:03+00:00 11
45b4dc5 2020-06-08 06:28:18+00:00 10
b4a670f 2020-06-22 14:58:35+00:00 10
64fde80 2020-06-23 06:22:16+00:00 10
e95b564 2020-06-23 12:36:20+00:00 9
74ffef0 2020-06-23 17:56:04+00:00 10
d697ef0 2020-06-25 06:50:23+00:00 12
6de76cb 2020-08-22 08:40:19+00:00 10
190f32b 2020-08-22 08:43:09+00:00 11
65ed520 2020-09-16 05:59:07+00:00 11
404a6f5 2020-09-21 17:30:15+00:00 9
7163801 2020-09-23 18:58:00+00:00 7
15819a8 2020-09-28 08:16:19+00:00 8
bdf8224 2020-09-29 06:28:47+00:00 11
52cb046 2020-10-06 19:11:49+00:00 9
ef61ea0 2020-10-06 23:41:47+00:00 11
edc2a0b 2020-11-02 19:26:07+00:00 8
e987558 2020-11-05 12:46:52+00:00 11
f321520 2020-11-18 16:07:15+00:00 14
fe61e9d 2020-12-16 09:47:41+00:00 9
b324149 2020-12-23 09:21:05+00:00 9
f2c33be 2020-12-28 21:16:31+00:00 9
9697182 2020-12-28 21:27:28+00:00 9
2488b7d 2021-01-19 08:34:59+00:00 9
7d14f11 2021-01-19 09:07:58+00:00 9
ab92898 2021-01-19 12:10:44+00:00 9
d70d894 2021-01-24 15:45:27+00:00 10
c4a6425 2021-01-25 12:24:51+00:00 11
8a203b7 2021-01-26 10:34:29+00:00 8
270bf6c 2021-01-26 10:45:51+00:00 9
e32af5f 2021-03-09 18:40:28+00:00 9
65ca2db 2021-03-09 20:24:44+00:00 8
f2d189a 2021-03-10 14:25:33+00:00 10
82bf21c 2021-03-11 19:39:37+00:00 10
579eec6 2021-03-22 03:29:56+00:00 8
f81cf47 2021-03-22 08:51:40+00:00 12
199f69f 2021-03-22 08:53:01+00:00 12
ce961a8 2021-03-22 08:55:03+00:00 11
925ac87 2021-03-22 09:00:27+00:00 9
c36390f 2021-03-25 08:51:56+00:00 13
fd1709c 2021-03-25 09:01:57+00:00 9
d56e601 2021-03-26 09:54:10+00:00 9
9908530 2021-03-27 17:50:44+00:00 9
b822548 2021-03-27 23:17:46+00:00 12
f637474 2021-03-31 20:48:27+00:00 12
85d0b9c 2021-04-01 20:11:30+00:00 10
3683cf6 2021-04-03 13:04:50+00:00 11
013d4b8 2021-04-04 18:45:28+00:00 11
3b45a37 2021-04-04 22:02:42+00:00 11
f1e4d91 2021-04-09 12:38:28+00:00 10
fea9cdc 2021-04-09 09:01:24-04:00 11
25ca675 2021-04-09 09:24:41-04:00 14
a957651 2021-04-09 17:19:40-04:00 15
14ad42d 2021-04-09 17:24:29-04:00 13
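
One way to sanity-check a candidate commit against numbers like these is to compare the mean duration before it with the mean from it onward. A rough sketch, using a handful of rows copied from the table above; the helper names are made up for illustration, and the duration unit is whatever the table reports.

```python
from statistics import mean

# A few rows copied from the table above: head_sha, started_at, duration.
ROWS = """\
d56e601 2021-03-26 09:54:10+00:00 9
9908530 2021-03-27 17:50:44+00:00 9
b822548 2021-03-27 23:17:46+00:00 12
f637474 2021-03-31 20:48:27+00:00 12
85d0b9c 2021-04-01 20:11:30+00:00 10
"""


def parse_runs(text):
    """Return a list of (head_sha, duration) pairs in table order."""
    runs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each row splits into: sha, date, time-with-offset, duration.
        sha, _date, _time, duration = line.split()
        runs.append((sha, int(duration)))
    return runs


def mean_before_after(runs, candidate_sha):
    """Mean duration strictly before the candidate vs. from it onward."""
    index = [sha for sha, _ in runs].index(candidate_sha)
    before = [d for _, d in runs[:index]]
    after = [d for _, d in runs[index:]]
    return mean(before), mean(after)


before, after = mean_before_after(parse_runs(ROWS), "b822548")
print(f"before: {before:.1f}, after: {after:.1f}")
```

With so few runs and this much noise, a comparison like this only suggests where to look; it does not prove that a given commit is responsible.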

@jorisvandenbossche
Member

I think our demo docs are a bit too small to really find the culprit.
I am currently building the pandas docs locally under a profiler, will report back here in a bit (it takes a while to build though .. ;))

@jorisvandenbossche
Member

So I built a subset of the pandas API docs (removing the narrative user guide, as the slowdown comes from the writing phase, and it's the API docs that have many pages): https://gist.githubusercontent.com/jorisvandenbossche/f5ff72ee2eea52c30193abc2e9b5cd05/raw/bcc68040a7b3691e58828bf2dddaaed7d9866f57/profile-pandas-docs.svg

Most of the time is spent in generate_nav_html (more than 80% in this case). Digging deeper, around 30% is spent in resolve from sphinx (this has increased because, with collapse=False, the toctree to resolve has become much bigger). A significant part is certainly spent in bs4 as well. I would need to compare with lxml to see how much that can be reduced, but I assume part of it is simply due to the larger HTML of the resulting pages that gets parsed.

@jorisvandenbossche
Member

And here is the version using the lxml parser through bs4: https://gist.githubusercontent.com/jorisvandenbossche/8aab410b0231a74d755ed54e656e5b7c/raw/8720e2df4426088483b6b775bac982cdb35bdc19/profile-pandas-docs-lxml.svg

(for a big site like pandas, this doesn't seem to make much difference)

@jorisvandenbossche
Member

We probably want to include a configuration option like readthedocs theme has: collapse_navigation: https://sphinx-rtd-theme.readthedocs.io/en/latest/configuring.html#confval-collapse_navigation
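
As a point of comparison, this is how that option is set for the Read the Docs theme. It is only a sketch of the existing sphinx_rtd_theme setting; whether pydata-sphinx-theme would adopt the same name and semantics is an open question at this point in the thread.

```python
# conf.py (sketch) for a project that uses sphinx_rtd_theme.
html_theme = "sphinx_rtd_theme"
html_theme_options = {
    # Per the linked docs: with this enabled, navigation entries
    # are not expandable.
    "collapse_navigation": True,
}
```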

@bollwyvl
Collaborator

Well, if the lxml thing is a red herring (or maybe i don't have a handle on how to interpret the profiling): can more things be cached along the way? I don't know enough about the toctree data structure, but seems like it would be possible to generate The Tree, and then slice off the pieces needed per page?

@jorisvandenbossche
Member

Yeah, I think sphinx is definitely doing a lot of duplicated effort .. However, to do that on our side might require some deeper plumbing into sphinx.

For example, a large part of the time is spent in this resolve call:

toctree = self.resolve(docname, builder, toctreenode, prune=True, **kwargs)

The toctree it is resolving is the same in many cases, but each time (for each page), the docname is different, i.e. each call of this function is slightly different. So we can't easily "cache" it on our side.
I think the only difference between resolve calls for pages that have the same root index is that the "current" class in the HTML is tagged onto a different item in the navigation list. That's of course a tiny difference for repeating the expensive operation .. So we could maybe think about resolving it for the root index once and then add the "current" tags ourselves with some HTML bs4 manipulation.
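
To make the "resolve once, tag current ourselves" idea concrete, here is a minimal sketch. It is not the theme's implementation: it uses the standard-library xml.etree.ElementTree as a stand-in for bs4 so that it runs without extra dependencies, NAV_HTML is a hypothetical stand-in for the navigation resolved once for the root index, and it ignores the fact that real hrefs are relative to each page.

```python
import copy
import xml.etree.ElementTree as ET
from functools import lru_cache

# Hypothetical stand-in for the navigation HTML resolved once for the root.
NAV_HTML = (
    "<ul>"
    '<li><a href="api/index.html">API</a>'
    '<ul><li><a href="api/frame.html">DataFrame</a></li>'
    '<li><a href="api/series.html">Series</a></li></ul></li>'
    '<li><a href="guide.html">Guide</a></li>'
    "</ul>"
)


@lru_cache(maxsize=None)
def resolve_nav_once():
    """Parse the shared navigation a single time (the costly step)."""
    return ET.fromstring(NAV_HTML)


def nav_for_page(page_href):
    """Copy the cached tree and mark the entry for ``page_href`` as current."""
    root = copy.deepcopy(resolve_nav_once())
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    for link in root.iter("a"):
        if link.get("href") == page_href:
            node = link
            # Tag the link and every enclosing <li> up to the root.
            while node is not None:
                if node.tag in ("a", "li"):
                    existing = node.get("class", "")
                    node.set("class", (existing + " current").strip())
                node = parents.get(node)
            break
    return ET.tostring(root, encoding="unicode", method="html")


print(nav_for_page("api/frame.html"))
```

Each page still pays for a deep copy and a walk over the tree, but that is plain tree manipulation rather than a full toctree resolve.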

@ChaiByte
Contributor

ChaiByte commented May 19, 2021

Same problem. The average build time increased from less than 10 min to about 15 min.

writing output... [ 97%]
writing output... [ 97%]
waiting for workers...    <- that takes a long time :(   Does anybody know what exactly it is waiting for?

generating indices... genindex py-modindex done
copying notebooks ... [100%] 
highlighting module code... [100%] 
writing additional pages... search done

It seems that generating the collapsible sidebar takes a lot of time?


Update: the build takes 15 min on my 8-core, 16 GB machine. But on GitHub Actions (Ubuntu, 2 cores, 8 GB) it now takes over 1 hour: https://github.com/MegEngine/Documentation/runs/2619670415


It's hard to accept. :(


Update again: the build artifact is now over 1.2 GB, mostly from API HTML files, each of which has over 10000 lines.

@hawkinsp

hawkinsp commented Feb 4, 2022

In case anyone lands here and is looking for a workaround, the instructions in https://pydata-sphinx-theme.readthedocs.io/en/latest/user_guide/configuring.html#selectively-remove-pages-from-your-sidebar helped a lot for our project that uses sphinx-book-theme (which in turn uses pydata-sphinx-theme) and has many autogenerated API docs.

(I found this issue by profiling my sphinx-build command using cProfile and identifying that generate_nav_html took a very large fraction of the total build time.)
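
For anyone wanting to repeat that kind of measurement, here is a minimal sketch using the standard-library cProfile and pstats modules. `fake_generate_nav_html` is a hypothetical stand-in so the snippet runs on its own; for a real build you would profile the Sphinx invocation itself (for example by running it under `python -m cProfile -o build.prof`) and load the dump with `pstats.Stats`.

```python
import cProfile
import io
import pstats


def fake_generate_nav_html(n_pages):
    """Hypothetical stand-in for an expensive per-page navigation step."""
    total = 0
    for _ in range(n_pages):
        total += sum(i * i for i in range(200))
    return total


def profile_call(func, *args, **kwargs):
    """Run ``func`` under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    # Sort by cumulative time so the dominant call chain shows up first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, report.getvalue()


result, report = profile_call(fake_generate_nav_html, 500)
print(report)
```

Sorting by cumulative time is what makes a function like generate_nav_html stand out, since its cost is spread over the calls it makes.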

@choldgraf
Collaborator

Ah thanks for linking that @hawkinsp - that section was added to address this issue, so I think that we can close this one and I'll update the top comment with a link to that section

@choldgraf changed the title from "The jump from 5.2 to 6.0 causes a big slow down." to "Slow page builds in 6.0 when generating navigation bar" on Feb 7, 2022