Redesign How Autodoc Works #19208

Merged
merged 25 commits into from
Mar 11, 2024

andrewrk
Member

@andrewrk andrewrk commented Mar 7, 2024

This branch deletes the Autodoc implementation and replaces it with a new one.

High Level Strategy

The old implementation looked like this:

  5987 src/Autodoc.zig
   435 src/autodoc/render_source.zig
 10270 lib/docs/commonmark.js
  1245 lib/docs/index.html
  5242 lib/docs/main.js
  2146 lib/docs/ziglexer.js
 25325 total

After compilation (sizes are for standard library documentation):

272K commonmark.js
3.8M data-astNodes.js
360K data-calls.js
767K data-comptimeExprs.js
2.2M data-decls.js
896K data-exprs.js
 13K data-files.js
  45 data-guideSections.js
 129 data-modules.js
  15 data-rootMod.js
 294 data-typeKinds.js
3.2M data-types.js
 38K index.html
158K main.js
 36M src/ (470 .zig.html files)
 78K ziglexer.js

Total output size: 47M (5.7M gzipped)

src/Autodoc.zig processed ZIR code, outputting JSON data for a web application to consume. This resulted in a lot of code ineffectively trying to reconstruct the AST from no-longer-available data.

lib/docs/commonmark.js was a third-party markdown implementation that supported too many features; for example, I don't want it to be possible to have HTML tags in doc comments, because that would make source code uglier. Only markdown that looks good both as source and rendered should be allowed.

lib/docs/ziglexer.js was an implementation of Zig language tokenization in JavaScript, despite Zig already exposing its own tokenizer in the standard library. When I saw this added to the zig project, a little part of me died inside.

src/autodoc/render_source.zig was a tool that converted .zig files to syntax-highlighted but non-interactive .zig.html files.

The new implementation looks like this:

   942 lib/docs/main.js
   403 lib/docs/index.html
   933 lib/docs/wasm/markdown.zig
   226 lib/docs/wasm/Decl.zig
  1500 lib/docs/wasm/markdown/Parser.zig
   254 lib/docs/wasm/markdown/renderer.zig
   192 lib/docs/wasm/markdown/Document.zig
   941 lib/docs/wasm/main.zig
  1038 lib/docs/wasm/Walk.zig
  6630 total

After compilation (sizes are for standard library documentation):

 12K index.html
 32K main.js
192K main.wasm
 12M sources.tar

Total output size: 12M (2.3M gzipped)

As you can see, it is dramatically simpler in terms of both implementation and build artifacts. Now there are exactly 4 files instead of approximately one gajillion, with a 4x reduction in total file size of the generated web app.
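As a quick sanity check on those ratios, here is a minimal sketch that recomputes them from the totals listed above. The variable and function names are made up for illustration; the numbers are the ones quoted in this description. It suggests the "4x" figure is a rounding of roughly 3.9x for the uncompressed output, while the gzipped output shrinks by roughly 2.5x.

```javascript
// Totals copied from the listings above (sizes in MB, as reported).
const before = { lines: 25325, outputMB: 47, gzippedMB: 5.7 };
const after = { lines: 6630, outputMB: 12, gzippedMB: 2.3 };

// Ratio of old to new, rounded to one decimal place.
function reduction(oldValue, newValue) {
  return Math.round((oldValue / newValue) * 10) / 10;
}

const summary = {
  lines: reduction(before.lines, after.lines),
  output: reduction(before.outputMB, after.outputMB),
  gzipped: reduction(before.gzippedMB, after.gzippedMB),
};
console.log(summary);
```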

However, not only is it simpler, it's actually more powerful than the old system, because instead of processing ZIR, this system processes the source files directly, meaning it has 100% of the information and never needs to piece anything together backwards.

This strategy uses a WebAssembly module written in Zig. This allows it to reuse components from the compiler, such as the tokenizer, parser, and other utilities for operating on Zig code.

The sources.tar file, after being decompressed by the HTTP layer, is fed directly into the wasm module's memory. The tar file is parsed using std.tar and source files are parsed in place, with some additional computations added to hash tables on the side.
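On the JavaScript side, feeding the tar file into the module's memory could look roughly like the sketch below. This is an illustration under assumptions, not a copy of lib/docs/main.js: the export names `alloc` and `unpack`, the helper names, and the `imports` parameter are all hypothetical stand-ins for whatever the wasm module actually exposes.

```javascript
// Copy a byte array into a wasm instance's linear memory and return its address.
// `alloc` is an assumed export that returns a pointer to `len` writable bytes.
function copyIntoWasm(wasmExports, bytes) {
  const ptr = wasmExports.alloc(bytes.length);
  // Create the view only after alloc(): growing memory detaches older views.
  const view = new Uint8Array(wasmExports.memory.buffer, ptr, bytes.length);
  view.set(bytes);
  return ptr;
}

// Browser-side usage sketch. `imports` is whatever import object the module
// requires; `unpack` is an assumed export that parses the tar bytes in place.
async function loadDocs(imports) {
  const [wasm, tar] = await Promise.all([
    WebAssembly.instantiateStreaming(fetch("main.wasm"), imports),
    fetch("sources.tar").then((response) => response.arrayBuffer()),
  ]);
  const wasmExports = wasm.instance.exports;
  const bytes = new Uint8Array(tar);
  wasmExports.unpack(copyIntoWasm(wasmExports, bytes), bytes.length);
}
```

Because the bytes are copied once and then parsed where they sit, no intermediate JSON or per-file requests are needed.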

There is room for introducing worker threads to speed up the parsing, although single-threaded it's already so fast that it doesn't really seem necessary.

Zig Installation

Before this branch, a Zig installation came with a docs/std/ directory that contained those 47M of output artifacts mentioned above.

This branch removes those artifacts from Zig installations, instead offering the zig std command, which hosts std lib autodocs and spawns a browser window to view them. When this command is activated, lib/compiler/std-docs.zig is compiled from source to perform this operation (#19063).

The HTTP server creates the requested files on the fly, including rebuilding main.wasm if any of its source files changed and constructing sources.tar. This means that any source changes to the documented files, or to the autodoc system itself, are immediately reflected when viewing docs. Prefixing the URL with /debug results in a debug build of the WebAssembly module.

This means contributors can test changes to Zig standard library documentation, as well as autodocs functionality, by pressing refresh in their browser window.

In total, the Zig installation size is reduced from 317M to 268M (-15%).

Time to Build the Compiler

Since many lines were deleted from the compiler, we might hope for it to compile faster.

Benchmark 1 (3 runs): before/zig build-exe ...
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          86.8s  ± 3.28s     84.1s  … 90.5s           0 ( 0%)        0%
  peak_rss           4.58GB ±  492KB    4.58GB … 4.58GB          0 ( 0%)        0%
  cpu_cycles          350G  ± 1.99G      348G  …  352G           0 ( 0%)        0%
  instructions        505G  ±  205M      505G  …  506G           0 ( 0%)        0%
  cache_references   21.4G  ±  128M     21.3G  … 21.5G           0 ( 0%)        0%
  cache_misses       1.76G  ± 15.3M     1.75G  … 1.78G           0 ( 0%)        0%
  branch_misses      2.43G  ± 2.19M     2.43G  … 2.43G           0 ( 0%)        0%
Benchmark 2 (3 runs): after/zig build-exe ...
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          85.9s  ± 3.63s     82.8s  … 89.9s           0 ( 0%)          -  1.1% ±  9.0%
  peak_rss           4.51GB ±  259KB    4.51GB … 4.51GB          0 ( 0%)        ⚡-  1.5% ±  0.0%
  cpu_cycles          346G  ± 2.29G      343G  …  347G           0 ( 0%)          -  1.2% ±  1.4%
  instructions        499G  ±  185M      498G  …  499G           0 ( 0%)        ⚡-  1.3% ±  0.1%
  cache_references   21.0G  ±  209M     20.8G  … 21.2G           0 ( 0%)          -  1.9% ±  1.8%
  cache_misses       1.73G  ± 16.9M     1.71G  … 1.75G           0 ( 0%)          -  1.9% ±  2.1%
  branch_misses      2.41G  ± 2.16M     2.41G  … 2.41G           0 ( 0%)          -  0.7% ±  0.2%

Not much difference here.

A ReleaseSmall build of the compiler shrinks from 10M to 9.8M (-1%).

Time to Build Autodocs

Autodocs generation is now done properly as part of the pipeline of the compiler rather than tacked on at the end. It also no longer has any dependencies on other parts of the pipeline.

This is how long it now takes to generate standard library documentation:

Benchmark 1 (3 runs): old/zig test /home/andy/dev/zig/lib/std/std.zig -fno-emit-bin -femit-docs=docs
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          13.3s  ±  405ms    12.8s  … 13.6s           0 ( 0%)        0%
  peak_rss           1.08GB ±  463KB    1.08GB … 1.08GB          0 ( 0%)        0%
  cpu_cycles         54.8G  ±  878M     54.3G  … 55.8G           0 ( 0%)        0%
  instructions        106G  ±  313K      106G  …  106G           0 ( 0%)        0%
  cache_references   2.11G  ± 35.4M     2.07G  … 2.14G           0 ( 0%)        0%
  cache_misses       41.3M  ±  455K     40.8M  … 41.7M           0 ( 0%)        0%
  branch_misses       116M  ± 67.8K      116M  …  116M           0 ( 0%)        0%
Benchmark 2 (197 runs): new/zig build-obj -fno-emit-bin -femit-docs=docs ../lib/std/std.zig
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          24.6ms ± 1.03ms    22.8ms … 28.3ms          4 ( 2%)        ⚡- 99.8% ±  0.3%
  peak_rss           87.3MB ± 60.6KB    87.2MB … 87.4MB          0 ( 0%)        ⚡- 91.9% ±  0.0%
  cpu_cycles         38.4M  ±  903K     37.4M  … 46.1M          13 ( 7%)        ⚡- 99.9% ±  0.2%
  instructions       39.7M  ± 12.4K     39.7M  … 39.8M           0 ( 0%)        ⚡-100.0% ±  0.0%
  cache_references   2.65M  ± 89.1K     2.54M  … 3.43M           3 ( 2%)        ⚡- 99.9% ±  0.2%
  cache_misses        197K  ± 5.71K      186K  …  209K           0 ( 0%)        ⚡- 99.5% ±  0.1%
  branch_misses       184K  ± 1.97K      178K  …  190K           6 ( 3%)        ⚡- 99.8% ±  0.0%
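To put the delta column in more familiar terms, the sketch below recomputes the wall-time and peak-memory changes from the mean values above. The names are illustrative, and 1.08GB is assumed to mean 1080MB. Under that assumption the computed deltas match the reported -99.8% and -91.9%, and the wall-time ratio works out to roughly 540x.

```javascript
// Mean values copied from the benchmark output above.
const oldRun = { wallSeconds: 13.3, peakRssMB: 1080 }; // 1.08GB taken as 1080MB
const newRun = { wallSeconds: 0.0246, peakRssMB: 87.3 };

// Percent change from `before` to `after`, rounded to one decimal place.
function percentDelta(before, after) {
  return Math.round(((after - before) / before) * 1000) / 10;
}

const wallDelta = percentDelta(oldRun.wallSeconds, newRun.wallSeconds);
const rssDelta = percentDelta(oldRun.peakRssMB, newRun.peakRssMB);
// How many times faster the new run is, rounded to the nearest integer.
const speedup = Math.round(oldRun.wallSeconds / newRun.wallSeconds);
console.log({ wallDelta, rssDelta, speedup });
```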

Regressed Features

  • Guides
    • I don't want to port the langref to a guide. I think that should remain a separate document.
    • I think there is room for guides to be added back to this system - likely they will actually work better since there is now support for parsing and linkifying arbitrary code.

New Features

Reliable Linkification

This stems from the fact that with full source files we have all the information, and can write more robust code to look up identifiers from the context they occur in.

Interactive Source Listings

Press u to go to source code for any declaration:

image

The links take you to the API page for that specific link by changing the location hash.

Embedded Source Listings

image

Search Includes Doc Comments

Pretty straightforward. The current autodoc seems to not support this for some reason.

image

Planning to also add struct field names, struct field docs, parameter names, and parameter docs to this.

Error Set View

Merged error sets are detected:

image

Errors that come from other declarations are linked:

image

Errors are also shown on function view:

image

Correct Type Detection

image

The previous implementation guessed wrong on the type of options as well as DynLib.

Correct Implementation of Scroll History

See andrewrk/autodoc@6d96a63

Follow-Up Work

I do not consider these to be merge blockers.

  • make the panic handler reflect the failure in the user interface
  • when navigating back to search results, up+down arrow should keep working
  • redundant search results (search "format")
  • in query_exec_fallible, sorting should also check the local namespace inside the file
  • walk assign_destructure not implemented yet
  • escape URLs when rendering html (look for missing_feature_url_escape)
  • implement renderHome for multiple modules
  • struct fields: render each component separate rather than via source rendering
  • infer comptime_int constants (example: members of #std.time)
  • when global const has a type of type, categorize it as a type despite its value
  • show abbreviated doc comments in types and namespaces listings
  • show type function names as e.g. ArrayList(T)
  • enum fields should not be linkified (example: std.log.Level)
  • shrink Ast to fit the slices
  • linkification of methods (example: std.array_hash_map.ArrayHashMap.count)
  • navigating to source from a decl should scroll to the decl
  • in source view, make @imports into links, but keep same syntax highlighting
  • include struct field names and doc comments in search query matching
  • include function parameter names and doc comments in search query matching
  • instead of logging "can't index foo because it has syntax errors" put it in the UI
  • in Walk.expr() it is missing support for asm_input/asm_output nodes
  • in renderNamespace, handle an aliasing loop
  • add a history item when clicking a search result (it already works when keyboard triggered)
  • instead of "declaration not found", show the decl that can't be penetrated (example: #std.os.system.fd_t)
  • when rendering source code, better handle indentation (example: #std.array_hash_map.ArrayHashMapUnmanaged.count)

closes #3403
closes #13512
closes #15865
closes #16490
closes #16728
closes #16741
closes #16763
closes #16898
closes #17061

@expikr
Contributor

expikr commented Mar 7, 2024

does the interactive source listings also close #18587 ?

@Jarred-Sumner
Contributor

reliable linkification

Does that mean it no longer relies on location.hash ("#/std/fs/File/") and instead uses pathnames for generated documentation pages? This would let Google display relevant search results for Zig's standard library documentation for searches like "read file zig" or "close file zig" or "zig http client".

If it is still using #, there are ways to fix that without losing offline access, without always needing a server to host the files, and without depending on another tool or framework.

You can use a Service Worker to handle the "fetch" event: when a request is made to a page like /std/fs/File, it can use the Cache API to retrieve the generated HTML for the page and store it. Zig developers would then be able to go to https://ziglang.org/documentation/master/std/fs/File/write in their browser.

For indexing, separately, you can have a cron job run ~daily that generates static versions of all the HTML pages and saves the output. This can probably reuse the code from the service worker, as it's a similar operation.

@andrewrk
Member Author

Does that mean it no longer relies on location.hash ("#/std/fs/File/") and instead uses pathnames for generated documentation pages?

No, it's still a location.hash based system.

You can [do a bunch of stuff]

What's the problem statement?

@zraineri
Contributor

Having the generated URLs be pathnames instead of the hash structure allows for better indexing by Google and other search engines.

Generating

https://ziglang.org/documentation/master/std/fs/File/write

instead of

https://ziglang.org/documentation/master/std/#A;std:fs.File.write

would be ideal
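For illustration, the mapping between the two URL styles can be sketched as a pair of pure functions. This is only a sketch under assumptions: it uses the dotted hash form that appears elsewhere in this PR (for example #std.os.system.fd_t) rather than the older #A;std:fs.File.write form, the function names are hypothetical, and it ignores identifiers that themselves contain dots.

```javascript
// Convert a hash-style fragment such as "#std.fs.File.write"
// into a path such as "/std/fs/File/write".
function hashToPath(hash) {
  const name = hash.startsWith("#") ? hash.slice(1) : hash;
  if (name === "") return "/";
  return "/" + name.split(".").map(encodeURIComponent).join("/");
}

// Inverse mapping: "/std/fs/File/write" becomes "#std.fs.File.write".
// Empty path segments (leading, trailing, or doubled slashes) are dropped.
function pathToHash(pathname) {
  const parts = pathname.split("/").filter((part) => part !== "");
  return "#" + parts.map(decodeURIComponent).join(".");
}

console.log(hashToPath("#std.fs.File.write"));
console.log(pathToHash("/std/fs/File/write"));
```

A path-based front end could translate an incoming pathname with something like pathToHash and hand the result to the existing hash-based navigation, leaving the rest of the application unchanged.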

@RossComputerGuy
Sponsor Contributor

Generating

https://ziglang.org/documentation/master/std/fs/File/write

instead of

https://ziglang.org/documentation/master/std/#A;std:fs.File.write

Another note about this is that it also makes search bar suggestions based on history a little better. In my experience, Firefox tends not to surface results when hash-based routing is used, but it surfaces more results when path-based routing is used.

upstream commit 1f921d540e1a8bb40839be30239019c820eb663d

after this branch is merged, ziglang/zig becomes the new repository for
this code.

A lot of these "shorthand" doc comments were redundant, low quality
filler content. Better to let the actual modules speak for themselves
with top level doc comments rather than trying to document their
aliases.

Fixes a merge conflict with one of mlugg's recent branches.

@andrewrk
Member Author

Here is a link to play with the new implementation: https://andrewkelley.me/temp/autodoc-preview-1397342/index.html

@rpkak

rpkak commented Mar 11, 2024

Probably unwanted behavior:

std.mem.doNotOptimizeAway includes links to std.mem.builtin.zig_backend, but because std.mem.builtin.zig_backend is part of std.mem.builtin, which is @import("builtin"), a "Declaration not found." page is displayed.

@andrewrk
Member Author

that corresponds to this follow-up issue:

instead of "declaration not found", show the decl that can't be penetrated (example: #std.os.system.fd_t)

@andrewrk andrewrk merged commit d0c06ca into master Mar 11, 2024
10 checks passed
@andrewrk andrewrk deleted the rework-autodoc branch March 11, 2024 08:37
@andrewrk andrewrk added the release notes and autodoc labels Mar 11, 2024
@andrewrk andrewrk added this to the 0.12.0 milestone Apr 18, 2024