Optimize doc builds (#14856)
cudf docs are generally very slow to build. This problem was exacerbated by the recent addition of libcudf C++ API documentation to the Sphinx build. This PR aims to ameliorate this issue for both local and CI builds by making the following changes:
- The XML parsing logic used to clean up doxygen XML now avoids rewriting files unless they are actually modified. This prevents Sphinx from doing extra work during a second (text) build after the first (HTML) build.
- toctrees on the generated API pages are removed (see https://pydata-sphinx-theme.readthedocs.io/en/stable/user_guide/performance.html#selectively-remove-pages-from-your-sidebar).
- Text docs are disabled in PRs and only occur in nightly/branch builds.

The net result is roughly a halving of the CI run time for the builds (~40 min to ~20 min). Further potential optimizations:
- Reenabling parallel builds. We cannot fully revert #14796 until the theme is fixed, but if we can put in a warning filter we could reenable parallelism and have it work on just the reading steps of the build and not the writes. That would still improve performance.
- Better caching of notebooks. [myst-nb supports caching](https://myst-nb.readthedocs.io/en/latest/computation/execute.html#execute-cache), but there are various caveats w.r.t. 1) local vs CI builds, 2) proper cache invalidation, e.g. when notebook source does not change but underlying libraries do, and 3) forcing rebuilds. Alternatively, we could enable some environment variable that allows devs to turn off notebook execution locally. Making it opt-in would make the default behavior safe while providing an escape hatch for power users who want the builds to be fast.
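
The opt-in escape hatch from the last bullet could look roughly like the sketch below. It is not part of this PR. The variable name `CUDF_DOCS_SKIP_NOTEBOOKS` and the helper `notebook_execution_mode` are hypothetical, and the sketch assumes the docs otherwise rely on myst-nb's default `nb_execution_mode` of `"auto"`.

```python
import os

# Hypothetical variable name, chosen for illustration only.
SKIP_ENV_VAR = "CUDF_DOCS_SKIP_NOTEBOOKS"


def notebook_execution_mode(environ=os.environ, default="auto"):
    """Return a value for myst-nb's ``nb_execution_mode`` setting.

    Notebooks are executed unless the developer explicitly opts out by
    setting the (hypothetical) environment variable to a truthy value.
    """
    flag = environ.get(SKIP_ENV_VAR, "").strip().lower()
    if flag in {"1", "true", "yes"}:
        return "off"  # myst-nb skips notebook execution entirely
    return default


# In conf.py this would be a module-level assignment read by myst-nb.
nb_execution_mode = notebook_execution_mode()
```

Because an unset variable leaves the default untouched, CI and unconfigured local builds keep executing notebooks; only a developer who sets the variable gets the faster build.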

Authors:
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - AJ Schmidt (https://github.com/ajschmidt8)
  - Lawrence Mitchell (https://github.com/wence-)

URL: #14856
vyasr authored Jan 26, 2024
1 parent a0c637f commit a41238f
Showing 5 changed files with 27 additions and 7 deletions.
18 changes: 12 additions & 6 deletions ci/build_docs.sh
@@ -41,19 +41,25 @@ popd
 rapids-logger "Build Python docs"
 pushd docs/cudf
 make dirhtml
-make text
-mkdir -p "${RAPIDS_DOCS_DIR}/cudf/"{html,txt}
+mkdir -p "${RAPIDS_DOCS_DIR}/cudf/html"
 mv build/dirhtml/* "${RAPIDS_DOCS_DIR}/cudf/html"
-mv build/text/* "${RAPIDS_DOCS_DIR}/cudf/txt"
+if [[ "${RAPIDS_BUILD_TYPE}" != "pull-request" ]]; then
+  make text
+  mkdir -p "${RAPIDS_DOCS_DIR}/cudf/txt"
+  mv build/text/* "${RAPIDS_DOCS_DIR}/cudf/txt"
+fi
 popd
 
 rapids-logger "Build dask-cuDF Sphinx docs"
 pushd docs/dask_cudf
 make dirhtml
-make text
-mkdir -p "${RAPIDS_DOCS_DIR}/dask-cudf/"{html,txt}
+mkdir -p "${RAPIDS_DOCS_DIR}/dask-cudf/html"
 mv build/dirhtml/* "${RAPIDS_DOCS_DIR}/dask-cudf/html"
-mv build/text/* "${RAPIDS_DOCS_DIR}/dask-cudf/txt"
+if [[ "${RAPIDS_BUILD_TYPE}" != "pull-request" ]]; then
+  make text
+  mkdir -p "${RAPIDS_DOCS_DIR}/dask-cudf/txt"
+  mv build/text/* "${RAPIDS_DOCS_DIR}/dask-cudf/txt"
+fi
 popd
 
 rapids-upload-docs
1 change: 1 addition & 0 deletions conda/environments/all_cuda-118_arch-x86_64.yaml
@@ -93,6 +93,7 @@ dependencies:
 - sphinx-autobuild
 - sphinx-copybutton
 - sphinx-markdown-tables
+- sphinx-remove-toctrees
 - sphinxcontrib-websupport
 - streamz
 - sysroot_linux-64==2.17
1 change: 1 addition & 0 deletions conda/environments/all_cuda-120_arch-x86_64.yaml
@@ -91,6 +91,7 @@ dependencies:
 - sphinx-autobuild
 - sphinx-copybutton
 - sphinx-markdown-tables
+- sphinx-remove-toctrees
 - sphinxcontrib-websupport
 - streamz
 - sysroot_linux-64==2.17
1 change: 1 addition & 0 deletions dependencies.yaml
@@ -472,6 +472,7 @@ dependencies:
 - sphinx-autobuild
 - sphinx-copybutton
 - sphinx-markdown-tables
+- sphinx-remove-toctrees
 - sphinxcontrib-websupport
 notebooks:
 common:
13 changes: 12 additions & 1 deletion docs/cudf/source/conf.py
@@ -16,10 +16,12 @@
 # add these directories to sys.path here. If the directory is relative to the
 # documentation root, use os.path.abspath to make it absolute, like shown here.
 #
+import filecmp
 import glob
 import os
 import re
 import sys
+import tempfile
 import xml.etree.ElementTree as ET
 
 from docutils.nodes import Text
@@ -62,13 +64,16 @@ class PseudoLexer(RegexLexer):
     "sphinx.ext.autodoc",
     "sphinx.ext.autosummary",
     "sphinx_copybutton",
+    "sphinx_remove_toctrees",
     "numpydoc",
     "IPython.sphinxext.ipython_console_highlighting",
     "IPython.sphinxext.ipython_directive",
     "PandasCompat",
     "myst_nb",
 ]
 
+remove_from_toctrees = ["user_guide/api_docs/api/*"]
+
 
 # Preprocess doxygen xml for compatibility with latest Breathe
 def clean_definitions(root):
@@ -126,7 +131,13 @@ def clean_all_xml_files(path):
     for fn in glob.glob(os.path.join(path, "*.xml")):
         tree = ET.parse(fn)
         clean_definitions(tree.getroot())
-        tree.write(fn)
+        with tempfile.NamedTemporaryFile() as tmp_fn:
+            tree.write(tmp_fn.name)
+            # Only write files that have actually changed.
+            if not filecmp.cmp(tmp_fn.name, fn):
+                tree.write(fn)
+
+
 
 
 # Breathe Configuration
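
The write-only-if-changed idea in `clean_all_xml_files` can also be sketched as a standalone helper. This is a variant, not the code from this commit: it compares the serialized bytes in memory instead of going through a temporary file and `filecmp.cmp`, and the name `write_tree_if_changed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def write_tree_if_changed(tree, path):
    """Serialize ``tree`` and rewrite ``path`` only if the bytes differ.

    Returns True when the file was rewritten and False when it was left
    untouched (so its modification time is preserved).
    """
    new_bytes = ET.tostring(tree.getroot())
    with open(path, "rb") as f:
        if f.read() == new_bytes:
            return False
    with open(path, "wb") as f:
        f.write(new_bytes)
    return True
```

Leaving an unchanged file alone keeps its modification time stable, which is what lets a later Sphinx build treat the doxygen XML as up to date.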