Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: env.toctree_includes different between parallel and serial builds #12409

Closed
larsoner opened this issue Jun 4, 2024 · 3 comments · Fixed by #12888
Closed

BUG: env.toctree_includes different between parallel and serial builds #12409

larsoner opened this issue Jun 4, 2024 · 3 comments · Fixed by #12888

Comments

@larsoner
Copy link
Contributor

larsoner commented Jun 4, 2024

Describe the bug

In SciPy we had a bug where using pydata-sphinx-theme in serial builds was fine, but in parallel builds a jinja context did some incorrect HTML replacement. The short version is that the PST function looks for the ancestors of a given page using _get_toctree_ancestors(env.toctree_includes, pagename) and decides based on the ancestors what HTML to show.

In the correct-behavior case, which always happens in serial mode and sometimes happens in parallel mode, for the reference/cluster.hierarchy page we get what you'd expect:

ancestors=['reference/cluster.hierarchy', 'reference/cluster', 'reference/index']

In the incorrect case, which happens non-deterministically in parallel builds only, we get:

ancestors=['reference/cluster.hierarchy', 'reference/cluster', 'reference/index', 'dev/contributor/contributor_toc', 'dev/index']

You can see the addition of 'dev/contributor/contributor_toc' -- this isn't entirely unreasonable because reference/index is added to a hidden toctree of that page. But it does deviate from the serial behavior to have those in there.

I wonder if it's perhaps due to when the forked reading subprocesses return, and which env gets merged into which (and what takes precedence). There is also some setdefault code that could be involved. I'm not sure 🤷

How to Reproduce

After following SciPy's dev docs:

python dev.py doc -j6

or if you don't want it to build but have SciPy and pydata-sphinx-theme installed, clone SciPy main (before scipy/scipy#20897 gets merged, like on scipy/scipy@9d543ae) and then enter the doc dir and do:

python -msphinx -WT --keep-going -b html -d build/doctrees -E source build/html -j6

it will complain about bad refs and stuff but then if you inspect the output you might see the incorrect behavior (left) or the correct behavior (right) 🎲 :

Incorrect Correct
Screenshot 2024-06-04 at 12 31 30 PM Screenshot 2024-06-04 at 12 16 30 PM

Environment Information

Tried on latest `master` and Sphinx 5.3. Replicated on macOS but originally seen on Linux in CIs. Tried with Python 3.11 and 3.12, etc.

Sphinx extensions

I commented out a few of them to speed up the build but ultimately used:

    'sphinx.ext.autodoc',
    'sphinx.ext.autosummary',
    'sphinx.ext.coverage',
    'sphinx.ext.mathjax',
    'sphinx.ext.intersphinx',
    'numpydoc',
    'sphinx_design',
    'doi_role',
    'matplotlib.sphinxext.plot_directive',

Additional context

No response

@picnixz
Copy link
Member

picnixz commented Jun 5, 2024

Tried on latest master and Sphinx 5.3.

So the bug has been since 5.3? I'm really surprised that no one noticed... Just wondering but.. what happens if you try using Python 3.9 and Python 3.10? If the same happens, I think we might have an issue with the merge environment process. By the way, does it also happen using another theme?

I'd like a MWE (with at least 10 documents because otherwise you cannot really trigger the parallel build) which does not use any custom extensions so that I can know which are the side-effects.

@drammock
Copy link

drammock commented Jun 6, 2024

So the bug has been since 5.3?

5.3 and latest master are the only sphinx versions we tested (5.3 is lowest supported by pydata sphinx theme). Problem may occur with even earlier sphinx versions, we don't know.

what happens if you try using Python 3.9 and Python 3.10

I've repro'd it locally with 3.9 before, never expressly tried 3.10 I don't think

does it also happen using another theme?

Not expected to occur with other themes, because as mentioned:

the PST function looks for the ancestors of a given page using _get_toctree_ancestors(env.toctree_includes, pagename) and decides based on the ancestors what HTML to show.

... and most themes don't do that (at least none that I'm aware of). This is also probably why nobody noticed the problem before. Indeed so far it is only SciPy that has reported this behavior, most likely because their docs site is quite large; many other smaller sites using the PyData theme build in parallel and haven't encountered this.

@jayaddison
Copy link
Contributor

Adding a note to mention that I'm currently investigating a very similar behaviour -- perhaps a duplicate -- in #6714.

jayaddison added a commit to jayaddison/sphinx that referenced this issue Sep 11, 2024
Bugreports sphinx-doc#6714 and sphinx-doc#12409 indicate that the table-of-contents
collection process is not currently implemented in a way that
guarantees deterministic resolution in the presence of parallelism.

Until such time as we can implement that, disable parallel reads
for the `TocTreeCollector` so that the tree is resolved serially.
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 9, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.