From 3182a379bd1f5d6ad54266f34b97418fd0d3aac3 Mon Sep 17 00:00:00 2001
From: Claudio Wunder
Date: Tue, 1 Nov 2022 18:39:40 +0100
Subject: [PATCH] tools: add documentation regarding our api tooling

Introduces a proper imperative description of how the current API documentation build system works.

Refs: https://github.com/nodejs/next-10/issues/169
---
 doc/contributing/node-api-tooling.md | 179 +++++++++++++++++++++++++++
 1 file changed, 179 insertions(+)
 create mode 100644 doc/contributing/node-api-tooling.md

diff --git a/doc/contributing/node-api-tooling.md b/doc/contributing/node-api-tooling.md
new file mode 100644
index 00000000000000..fa45fca324de5c
--- /dev/null
+++ b/doc/contributing/node-api-tooling.md
@@ -0,0 +1,179 @@
+# Node.js's API Documentation Tooling
+
+The Node.js API documentation is generated by in-house tooling that resides within the `tools/doc` directory. The tooling can be summarised as follows:
+
+1. The entry point is `tools/doc/generate.mjs`.
+1. The tooling supports a few CLI arguments, which are listed in a table below.
+1. The tooling processes one file at a time.
+1. The tooling uses a set of dependencies, as described in the dependencies section.
+1. The tooling parses the input file and applies several transformations to the AST (Abstract Syntax Tree).
+1. The tooling generates a JSON output that contains the metadata and content of the Markdown file.
+1. The tooling generates an HTML output that contains a human-readable, ready-to-view version of the file.
+
+These are the summarised steps of the API tooling. The following sections dive deeper into what each part does, what it means and how it works.
+
+This documentation mainly serves to explain the existing tooling processes, to allow easier maintenance and evolution of the tooling. It is not meant to be a guide on how to write documentation for Node.js.
+
+#### Vocabulary & Good to Know's
+
+- AST means "Abstract Syntax Tree"; it is a data structure that represents the structure of a certain data format. In our case, the AST is a "graph" representation of the contents of the Markdown file.
+- MDN means "Mozilla Developer Network"; it is a website that contains documentation for web technologies. We use it as a reference for the structure of the documentation.
+- "Stability Index" is an internal concept for the stability of a given Node.js module. There's no concrete documentation of the "Stability Index", but it goes as follows:
+  - Stability 0: Deprecated. (This module is Deprecated)
+  - Stability 1: Experimental. (This module is Experimental)
+  - Stability 2: Stable. (This module is Stable)
+  - Stability 3: Legacy. (This module is Legacy)
+- Within Remark, YAML snippets (embedded in HTML comments) are considered HTML nodes, because YAML isn't valid Markdown content (it doesn't abide by the Markdown spec).
+- "New Tooling" refers to the (written from scratch) API build tooling introduced in `nodejs/nodejs.dev`, which might replace the current one from `nodejs/node`.
+
+## CLI Arguments
+
+The tooling requires a main `filename` argument and supports extra arguments (some also required) as shown below:
+
+| Argument | Description | Required | Example |
+| -------- | ----------- | -------- | ------- |
+| `--node-version=` | The version of Node.js that is being documented. It defaults to `process.version`, which is supplied by Node.js itself. | No | `v19.0.0` |
+| `--output-directory=` | The directory where the output files will be generated. | Yes | `./out/api/` |
+| `--apilinks=` | This file is used as an index to specify the source file for each module. | No | `./out/doc/api/apilinks.json` |
+| `--versions-file=` | This file is used to specify an index of all previous versions of Node.js. It is used for the Version Navigation on the API docs page. | No | `./out/previous-doc-versions.json` |
+
+**Note:** The `apilinks` file is generated by the Node.js build process (Makefile) and contains a JSON object.
+
+**Note:** The `versions-file` file is generated by the Node.js build process (Makefile) and contains a JSON object.
+
+### Basic Usage
+
+```bash
+npm run node-doc-generator ${filename}
+```
+
+## Dependencies and how the Tooling works internally
+
+The API tooling currently uses an AST-like library to process the input file as a graph that supports easy modification and update of its nodes. [unified](https://github.com/unifiedjs/unified) is the library currently used for AST manipulation. Besides `unified`, we also use other libraries on top of it to help convert the Markdown back and forth (parsing from Markdown to X and from X to Markdown).
+
+For this process we use [Remark](https://github.com/remarkjs/remark) for manipulating the Markdown part, and [Rehype](https://github.com/rehypejs/rehype) for manipulating the HTML part.
+
+### What are the steps of the internal tooling?
+
+The tooling uses the pipe-like engine of `unified` to pipe each part of the process. (The description below is a simplified version.)
+
+- It starts by reading the Frontmatter section of the Markdown file with [remark-frontmatter](https://www.npmjs.com/package/remark-frontmatter).
+- Then the tooling parses the Markdown by using `remark-parse` and adds support for [GitHub Flavoured Markdown](https://github.github.com/gfm/).
+- The tooling proceeds by parsing some of the Markdown nodes and transforming them to HTML.
+- The tooling proceeds to generate the JSON output of the file.
+- Finally, it does its final node transformations and generates a stringified HTML.
+- It then stores the JSON output in a JSON file, adds extra styling to the HTML and stores the HTML file.
+
+### What is each file responsible for?
+
+The files listed below are the ones referenced and actually used during the build process of the API docs as seen on https://nodejs.org/api. The remaining files in the directory might be used by other steps of the Node.js Makefile, or might be deprecated remnants of old processes that need to be revisited or removed.
+
+- **`html.mjs`**: Responsible for transforming nodes by decorating them with visual artifacts for the HTML pages;
+  - For example, transforming man or JS doc references into links that correctly refer to the respective external documentation.
+- **`json.mjs`**: Responsible for generating the JSON output of the file;
+  - It is mostly responsible for going through the whole Markdown file and generating a JSON object that represents the metadata of a specific module.
+  - For example, for the FS module, it will generate an object with all its methods, events and classes, using several regular expressions (RegEx) to extract the information needed.
+- **`generate.mjs`**: Main entry point of doc generation for a specific file. It does end-to-end (e2e) processing of a documentation file;
+- **`allhtml.mjs`**: A script executed after all files are generated to create a single "all" page containing all the HTML documentation;
+- **`alljson.mjs`**: A script executed after all files are generated to create a single "all" page containing all the JSON entries;
+- **`markdown.mjs`**: Basically contains one utility to replace Markdown links so that they work with the https://nodejs.org/api/ website.
+- **`common.mjs`**: Contains a few utility functions that are used by the other files.
+- **`type-parser.mjs`**: Used to replace "type references" (e.g. "String", or "Buffer") with links to the correct internal/external documentation pages (i.e. MDN or other Node.js documentation pages).
+
+**Note:** Other files not mentioned here might be used during the process but are not relevant to the generation of the API docs themselves. You will notice that a lot of the logic within the build process is **specific** to the current https://nodejs.org/api/ infrastructure, such as adding some JavaScript snippets and styles, transforming certain Markdown elements into HTML, and adding certain HTML classes.
+
+**Note:** Regarding the previous **Note**, we're currently working on an API tooling that is generic and independent of the current nodejs.org infrastructure. [The new tooling that is functional is available at the nodejs.dev repository](https://github.com/nodejs/nodejs.dev/blob/main/scripts/syncApiDocs.js) and uses plain RegEx (no AST) and [MDX](https://mdxjs.com/).
+
+## The Build Process
+
+The build process that happens in `generate.mjs` follows the steps below:
+
+- Links within the Markdown are replaced directly within the source Markdown (AST) (`markdown.replaceLinks`)
+  - This happens within `markdown.mjs`; it basically adds suffixes to or modifies link references within the Markdown
+  - This is necessary for the `https://nodejs.org` infrastructure, as all pages are suffixed with `.html`
+- Text (and some YAML) nodes are transformed/modified through `html.preprocessText`
+- JSON output is generated through `json.jsonAPI`
+- The title of the page is inferred through `html.firstHeader`
+- Nodes are transformed into HTML elements through `html.preprocessElements`
+- The HTML Table of Contents (ToC) is generated through `html.buildToc`
+
+### `html.mjs`
+
+This file is responsible for AST node transformations that either update Markdown nodes to decorate them with more data or transform them into HTML nodes that take on a certain visual responsibility; for example, generating the "Added in" label, the Source Links, the Stability Index or the History table.
+
+**Note:** Methods not listed below are either not relevant or are utility methods for string/array/object manipulation (e.g. they are used by the other methods mentioned below).
+
+#### `preprocessText`
+
+**New Tooling:** Most of the features within this method are available within the new tooling.
+
+This method does two things:
+
+- Replaces the Source Link YAML entry `<!-- source_link= -->` with the "Source Link" HTML anchor element.
+- Replaces type references within the Markdown (text) (e.g. "String", "Buffer") with the correct HTML anchor element that links to the correct documentation page.
+  - The original node then gets mutated from text to HTML.
+  - It also updates references to Linux "MAN" pages to point to actual Web versions of them.
+
+#### `firstHeader`
+
+**New Tooling:** All features within this method are available within the new tooling.
+
+This method attempts to extract the first heading of the page (recursively) to define the "title" of the page.
+
+**Note:** As all API Markdown files start with a heading, this could possibly be simplified.
+
+#### `preprocessElements`
+
+**New Tooling:** All features within this method are available within the new tooling.
+
+This method is responsible for multiple transformations of the AST nodes, mostly transforming the source node into the respective HTML elements with diverse responsibilities, such as:
+
+- Updating Markdown `code` blocks by adding language highlighting
+  - It also adds the "CJS"/"MJS" switch to nodes that are followed by their CJS/ESM equivalents.
+- Increasing the heading level of each heading
+- Parsing YAML blocks and transforming them into HTML elements (see more at the `parseYAML` method)
+- Updating blockquotes that are prefixed by the word "Stability" into a Stability Index HTML element.
+
+#### `parseYAML`
+
+**New Tooling:** Most of the features within this method are available within the new tooling.
+
+This method is responsible for parsing the `<!-- YAML snippets -->` and transforming them into HTML elements.
+
+It follows a certain kind of "schema" that basically consists of the following options:
+
+| YAML Key | Description | Example | Example Result | Available on new tooling |
+| -------- | ----------- | ------- | -------------- | ------------------------ |
+| `added` | Used to reference when a certain "module", "class" or "method" was added to Node.js | `added: v0.1.90` | `Added in: v0.1.90` | Yes |
+| `deprecated` | Used to reference when a certain "module", "class" or "method" was deprecated in Node.js | `deprecated: v0.1.90` | `Deprecated since: v0.1.90` | Yes |
+| `removed` | Used to reference when a certain "module", "class" or "method" was removed from Node.js | `removed: v0.1.90` | `Removed in: v0.1.90` | No |
+| `changes` | Used to describe all the (historical) changes that happened to a certain "module", "class" or "method" in Node.js | `[{ version: v0.1.90, pr-url: '', description: '' }]` | -- | Yes |
+| `napiVersion` | Used to describe in which version of the N-API this "module", "class" or "method" is available within Node.js | `napiVersion: 1` | `N-API version: 1` | Yes |
+
+**Note:** The `changes` field gets prepended with the `added`, `deprecated` and `removed` fields if they exist. The table only gets generated if a `changes` field exists. In the new tooling only `added` is prepended for now.
+
+#### `buildToc`
+
+**New Tooling:** This feature is natively available within the new tooling through MDX.
+
+This method generates the Table of Contents based on all the headings of the Markdown file.
+
+#### `altDocs`
+
+**New Tooling:** All features within this method are available within the new tooling.
+
+This method generates a version picker for the current page to be shown in older versions of the API docs.
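Looking back at the `parseYAML` schema table, the mapping from YAML keys to their rendered labels could be sketched roughly as below. This is only an illustration under stated assumptions: `LABELS` and `renderMetaLabels` are hypothetical names that do not come from `html.mjs`, the input is assumed to be an already-parsed plain object (the YAML parsing itself is skipped), and the sketch returns strings, whereas the method described above produces HTML elements.

```javascript
// Hypothetical sketch: NOT the actual parseYAML implementation from html.mjs.
// The label texts are taken from the "Example Result" column of the schema table.
const LABELS = {
  added: 'Added in',
  deprecated: 'Deprecated since',
  removed: 'Removed in',
  napiVersion: 'N-API version',
};

// Takes an already-parsed metadata object and returns one label per known key,
// in the order in which the keys are declared in LABELS.
function renderMetaLabels(meta) {
  return Object.keys(LABELS)
    .filter((key) => meta[key] !== undefined)
    .map((key) => `${LABELS[key]}: ${meta[key]}`);
}

console.log(renderMetaLabels({ added: 'v0.1.90', napiVersion: 1 }));
```

The `changes` key is left out on purpose, since according to the table it feeds a history table rather than a single label.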
+
+### `json.mjs`
+
+This file is responsible for generating a JSON object that (supposedly) is used for IDE IntelliSense or for indexing all the "methods", "classes", "modules", "events", "constants" and "globals" available within a certain Markdown file.
+
+It attempts a best-effort extraction of the data by using several regular expression patterns (RegEx).
+
+**Note:** JSON output generation is currently not supported by the new tooling, but it is in the pipeline for development.
+
+#### `jsonAPI`
+
+This method traverses all the AST nodes by iterating over each one of them and infers through RegEx what kind of information each node contains. It then mutates the data and appends it to the final JSON object.
+
+For more in-depth information we recommend referring to the `json.mjs` file, as it contains a lot of comments.
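To make the RegEx-based inference described for `jsonAPI` more concrete, here is a minimal sketch of classifying heading text with regular expressions. Everything in it is an assumption made for illustration: `HEADING_PATTERNS` and `classifyHeading` are hypothetical names, the patterns are simplified guesses at common API heading shapes and are not copied from `json.mjs`, and the sketch works on plain heading strings rather than on AST nodes.

```javascript
// Hypothetical sketch: these patterns are NOT the ones used in json.mjs.
// They only illustrate inferring the kind of entry from heading text.
const HEADING_PATTERNS = [
  { type: 'class', re: /^Class: +(\S+)$/ },
  { type: 'event', re: /^Event: +'([^']+)'$/ },
  { type: 'method', re: /^([\w.]+)\((.*)\)$/ },
];

// Returns the inferred entry type and name for a single heading string.
function classifyHeading(rawText) {
  // Drop inline-code backticks so the patterns see plain text.
  const text = rawText.replace(/`/g, '').trim();
  for (const { type, re } of HEADING_PATTERNS) {
    const match = re.exec(text);
    if (match) return { type, name: match[1] };
  }
  // Anything unrecognised is kept as a generic section.
  return { type: 'misc', name: text };
}

const headings = [
  'Class: `fs.Dir`',
  "Event: `'close'`",
  '`fs.readFile(path[, options], callback)`',
  'File system',
];
console.log(headings.map(classifyHeading));
```

An ordered list of patterns keeps the first match authoritative, which is one simple way to get the "best effort" behaviour described above; a real implementation would also need to collect signatures, parameters and nesting.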