Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse .-prefixed reference identifiers (relative identifiers) #37

Closed
wants to merge 3 commits into from

Conversation

lmmx
Copy link

@lmmx lmmx commented Jan 14, 2024

This PR intends to add support for relative references

class A:
    """This is a class [A][.] which has a method [m][.m]."""
    def m(self) -> None:
        """This is the method [m][.] on the class [A][..]."""
        return
  • docs: Document precisely what evalId produces in both cases, stripping off the square brackets
  • style: Added a line-length config in a tool.ruff block in pyproject.toml so ruff-lsp will not auto-lint the lines incorrectly while editing [clash with pre-commit duty rule]

Testing

I have set up a Vercel deployment to test the effect of the proposal as I work on it, using a branch on a Vercel/mkdocs "hello world" type demo repo:

@lmmx
Copy link
Author

lmmx commented Jan 14, 2024

As suggested, it looks like we can get the reference from autorefs and thus be language agnostic.

An initial debugging session with the code has highlighted the existence of AutorefsPlugin._url_map which is used by _get_item_url which provides the url variable in the get_item_url method which is executed in the url_mapper partial that gets passed to fix_refs to carry out the work of applying the autorefs to the output (which is the HTML page string that was preprocessed by the evalID -> handleMatch -> makeTag routine in references.py which populates the all-important span tag for each link with a data-autorefs-identifier attribute bearing the identifier in question, which we wish to permit to begin with a . character to indicate a relative identifier)

Click to show a pysnooper.snoop debug dump of the relevant source code

Note: pdmdoc is an alias for pdm run mkdocs serve

(autoref-docs) louis 🌟 ~/lab/docs/autoref-test $ pdmdoc -a localhost:8001 
INFO    -  Building documentation...
INFO    -  Cleaning site directory
Source path:... /home/louis/dev/autorefs/src/mkdocs_autorefs/plugin.py
Starting var:.. self = <mkdocs_autorefs.plugin.AutorefsPlugin object at 0x7fe2750e4eb0>
Starting var:.. output = '\n<!doctype html>\n<html lang="en" class="no-js....js"></script>\n      \n    \n  </body>\n</html>'
Starting var:.. page = Page(title='Home', url='/')
Starting var:.. kwargs = {'config': {'config_file_path': '/home/louis/lab...absolute_links': 20, 'unrecognized_links': 20}}}}
16:06:23.371158 call       203     def on_post_page(
16:06:23.371416 line       226         log.debug(f"Fixing references in page {page.file.src_path}")
16:06:23.371673 line       228         url_mapper = functools.partial(
16:06:23.371825 line       229             self.get_item_url, from_url=page.url, fallback=self.get_fallback_anchor
16:06:23.371957 line       228         url_mapper = functools.partial(
New var:....... url_mapper = functools.partial(<bound method AutorefsPlugin.g...andlers.base.Handlers object at 0x7fe2750e4e20>>)
16:06:23.372166 line       231         fixed_output, unmapped = fix_refs(output, url_mapper)
New var:....... fixed_output = '\n<!doctype html>\n<html lang="en" class="no-js....js"></script>\n      \n    \n  </body>\n</html>'
New var:....... unmapped = []
16:06:23.372318 line       233         if unmapped and log.isEnabledFor(logging.WARNING):
16:06:23.372569 line       240         return fixed_output
16:06:23.372830 return     240         return fixed_output
Return value:.. '\n<!doctype html>\n<html lang="en" class="no-js....js"></script>\n      \n    \n  </body>\n</html>'
Elapsed time: 00:00:00.001986
Starting var:.. self = <mkdocs_autorefs.plugin.AutorefsPlugin object at 0x7fe2750e4eb0>
Starting var:.. output = '\n<!doctype html>\n<html lang="en" class="no-js....js"></script>\n      \n    \n  </body>\n</html>'
Starting var:.. page = Page(title='APIs', url='/api/')
Starting var:.. kwargs = {'config': {'config_file_path': '/home/louis/lab...absolute_links': 20, 'unrecognized_links': 20}}}}
16:06:23.374654 call       203     def on_post_page(
16:06:23.374821 line       226         log.debug(f"Fixing references in page {page.file.src_path}")
16:06:23.375039 line       228         url_mapper = functools.partial(
16:06:23.375212 line       229             self.get_item_url, from_url=page.url, fallback=self.get_fallback_anchor
16:06:23.375376 line       228         url_mapper = functools.partial(
New var:....... url_mapper = functools.partial(<bound method AutorefsPlugin.g...andlers.base.Handlers object at 0x7fe2750e4e20>>)
16:06:23.375526 line       231         fixed_output, unmapped = fix_refs(output, url_mapper)
New var:....... fixed_output = '\n<!doctype html>\n<html lang="en" class="no-js....js"></script>\n      \n    \n  </body>\n</html>'
New var:....... unmapped = []
16:06:23.375715 line       233         if unmapped and log.isEnabledFor(logging.WARNING):
16:06:23.375965 line       240         return fixed_output
16:06:23.376203 return     240         return fixed_output
Return value:.. '\n<!doctype html>\n<html lang="en" class="no-js....js"></script>\n      \n    \n  </body>\n</html>'
Elapsed time: 00:00:00.001890
Starting var:.. self = <mkdocs_autorefs.plugin.AutorefsPlugin object at 0x7fe2750e4eb0>
Starting var:.. output = '\n<!doctype html>\n<html lang="en" class="no-js....js"></script>\n      \n    \n  </body>\n</html>'
Starting var:.. page = Page(title='Foo', url='/api/foo/')
Starting var:.. kwargs = {'config': {'config_file_path': '/home/louis/lab...absolute_links': 20, 'unrecognized_links': 20}}}}
16:06:23.381089 call       203     def on_post_page(
16:06:23.381283 line       226         log.debug(f"Fixing references in page {page.file.src_path}")
16:06:23.381513 line       228         url_mapper = functools.partial(
16:06:23.381690 line       229             self.get_item_url, from_url=page.url, fallback=self.get_fallback_anchor
16:06:23.381873 line       228         url_mapper = functools.partial(
New var:....... url_mapper = functools.partial(<bound method AutorefsPlugin.g...andlers.base.Handlers object at 0x7fe2750e4e20>>)
16:06:23.382057 line       231         fixed_output, unmapped = fix_refs(output, url_mapper)
    Source path:... /home/louis/dev/autorefs/src/mkdocs_autorefs/plugin.py
    Starting var:.. self = <mkdocs_autorefs.plugin.AutorefsPlugin object at 0x7fe2750e4eb0>
    Starting var:.. identifier = '.x'
    Starting var:.. from_url = 'api/foo/'
    Starting var:.. fallback = <bound method Handlers.get_anchors of <mkdocstri...handlers.base.Handlers object at 0x7fe2750e4e20>>
    16:06:23.382299 call       105     def get_item_url(
    16:06:23.382355 line       121         url = self._get_item_url(identifier, fallback)
> /home/louis/dev/autorefs/src/mkdocs_autorefs/plugin.py(90)_get_item_url()
-> try:
(Pdb) p identifier
'.x'
(Pdb) p fallback
<bound method Handlers.get_anchors of <mkdocstrings.handlers.base.Handlers object at 0x7fe2750e4e20>>
(Pdb) n
> /home/louis/dev/autorefs/src/mkdocs_autorefs/plugin.py(91)_get_item_url()
-> return self._url_map[identifier]
(Pdb) p self._url_map
{'foo': 'api/foo/#foo', 'foo.main': 'api/foo/#foo.main', 'foo.main.A': 'api/foo/#foo.main.A', 'foo.main.A.x': 'api/foo/#foo.main.A.x', 'foo.main.A.y': 'api/foo/#foo.main.A.y'}

@lmmx
Copy link
Author

lmmx commented Jan 14, 2024

I presume that the right place to 'intervene' (process the relative path) is here, but don't want to rush in with a thoughtless implementation:

def _get_item_url(
self,
identifier: str,
fallback: Callable[[str], Sequence[str]] | None = None,
) -> str:
try:
return self._url_map[identifier]
except KeyError:
if identifier in self._abs_url_map:
return self._abs_url_map[identifier]
if fallback:
new_identifiers = fallback(identifier)
for new_identifier in new_identifiers:
with contextlib.suppress(KeyError):
url = self._get_item_url(new_identifier)
self._url_map[identifier] = url
return url
raise

e.g. in my example above the _url_map was

(Pdb) pp self._url_map
{'foo': 'api/foo/#foo',
 'foo.main': 'api/foo/#foo.main',
 'foo.main.A': 'api/foo/#foo.main.A',
 'foo.main.A.x': 'api/foo/#foo.main.A.x',
 'foo.main.A.y': 'api/foo/#foo.main.A.y'}

Looking at it, we don't have the information available to compute a relative path here: we would need to have already expanded it at an earlier stage into a full identifier from the _url_map. So I don't think the _get_item_url is the place to de-relativise the identifier (let's just say "expand" the relative identifier).

We could however use a lookup (to save time for repeated operations) if we only target the relative prefix (as we are only using one or more dots), and then we could just do a simple replacement.

What we're looking for would be something like this:

{1: 'foo.main.A',
 2: 'foo.main',
 3: 'foo'}

We would take the length of the . prefix at the start of the identifier, expand it by lookup to the above, and then pass along the identifier to the _url_map lookup.

So in the example of ".x" it would go:

  • count the . prefix length: 1
  • look up in the dict: {1: 'foo.main.A'}
  • append the rest of the identifier -> 'foo.main.A' +'x' -> foo.main.A.x
  • do the identifier lookup as originally: {'foo.main.A.x': 'api/foo/#foo.main.A.x'}

@pawamoy
Copy link
Member

pawamoy commented Jan 14, 2024

The data flow is not the easiest one 🙂

What I envisioned was to take advantage of the intermediate data format used by autorefs, i.e. the HTML span with arbitrary attributes. In there we could store the path of the origin object (the one that has the relative ref in its docstring), as well as other helpful debugging information (line numbers, etc.). Then, in the process of "fixing" refs, we could use this information to rebuild absolute refs (and use additional info in error messages if needed).

The issue is, how do we pass this extra information to the spans? We could infer some from the surrounding HTML context, but we definitely do not want to parse HTML to extract it (super costly). Maybe we could have a pre-processing step during Markdown rendering, thanks to an additional Markdown processor that would act only on refs [][] that can't be rendered? When we render a docstring (in Jinja templates), we do have all the necessary context (in the Jinja environment, and therefore anywhere else we want). Obviously it implies that it would be a coordinated effort between autorefs and handlers, though I'm sure that with a good design we can first implement in autorefs and then add support for it in handlers later.

(no links, no diagrams, let me know if it's unclear. I'd like to elaborate anyway and sometimes put a clear explanation of the data flow in the docs 🙂 )

@lmmx
Copy link
Author

lmmx commented Jan 14, 2024

It's not at all bad! The state is relatively simple in the demo I've made and there are only 2 modules. 😇

I just traced back where to hook into the identifier lookup and as you rightly say it would have to be where the url_mapper is applied, which is in the inner() higher order function inside the fix_ref() funcdef in references.py.

In this function we get access to the HTML span, I just stuck a breakpoint on it:

> /home/louis/dev/autorefs/src/mkdocs_autorefs/references.py(167)inner()
-> try:
(Pdb) p identifier
'.x'
(Pdb) p title
'.x'
(Pdb) p kind
'autorefs-identifier'
(Pdb) p match
<re.Match object; span=(20120, 20165), match='<span data-autorefs-identifier=".x">.x</span>'>

I like your suggestion: there may well be further benefits we can get from populating this span with the location.

For now, all we need is the qualname of the object the docstring is on (its "target" shall we say).

We could infer some from the surrounding HTML context, but we definitely do not want to parse HTML to extract it (super costly). Maybe we could have a pre-processing step during Markdown rendering, thanks to an additional Markdown processor that would act only on refs [][] that can't be rendered? When we render a docstring (in Jinja templates), we do have all the necessary context (in the Jinja environment, and therefore anywhere else we want). Obviously it implies that it would be a coordinated effort between autorefs and handlers, though I'm sure that with a good design we can first implement in autorefs and then add support for it in handlers later.

I don't fully follow this: I was proposing to keep it entirely in the autorefs code, so handlers get it "for free".

It sounds like you're saying that isn't entirely possible: the handler needs to provide some more information (the target's qualname, at least).

Tracing the origin of this span, it began in AutoRefInlineProcessor: when self.handleMatch() called self.makeTag().

When it did this, all it passed along was identifier and text (i.e. for [.x][] both values would be be the string ".x").

Therefore we need to retrace our steps further: the handleMatch() function is not called anywhere in the autorefs code though, and it's also worth noticing that self.getText() [which is called in handleMatch to get the text] is not defined here.

  • getText is a stub in AutoRefInlineProcessor's base class ReferenceInlineProcessor's base class LinkInlineProcessor, and I presume is meant to be overridden by each handler.
  • the w(here) PDB command shows that handleMatch was called from the markdown package, not the mkdocs/trings ecosystem
Click to show full traceback to the handleMatch call
(Pdb) w
  /home/louis/miniconda3/envs/autoref-docs/bin/mkdocs(8)<module>()
-> sys.exit(cli())
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/click/core.py(1157)__call__()
-> return self.main(*args, **kwargs)
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/click/core.py(1078)main()
-> rv = self.invoke(ctx)
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/click/core.py(1688)invoke()
-> return _process_result(sub_ctx.command.invoke(sub_ctx))
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/click/core.py(1434)invoke()
-> return ctx.invoke(self.callback, **ctx.params)
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/click/core.py(783)invoke()
-> return __callback(*args, **kwargs)
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocs/__main__.py(270)serve_command()
-> serve.serve(**kwargs)
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocs/commands/serve.py(86)serve()
-> builder(config)
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocs/commands/serve.py(67)builder()
-> build(config, live_server=None if is_clean else server, dirty=is_dirty)
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocs/commands/build.py(322)build()
-> _populate_page(file.page, config, files, dirty)
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocs/commands/build.py(175)_populate_page()
-> page.render(config, files)
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocs/structure/pages.py(271)render()
-> self.content = md.convert(self.markdown)
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/markdown/core.py(357)convert()
-> root = self.parser.parseDocument(self.lines).getroot()
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/markdown/blockparser.py(117)parseDocument()
-> self.parseChunk(self.root, '\n'.join(lines))
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/markdown/blockparser.py(136)parseChunk()
-> self.parseBlocks(parent, text.split('\n\n'))
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/markdown/blockparser.py(158)parseBlocks()
-> if processor.run(parent, blocks) is not False:
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocstrings/extension.py(124)run()
-> html, handler, data = self._process_block(identifier, block, heading_level)
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocstrings/extension.py(220)_process_block()
-> rendered = handler.render(data, options)
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocstrings_handlers/python/handler.py(334)render()
-> return template.render(
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/jinja2/environment.py(1299)render()
-> return self.environment.concat(self.root_render_func(ctx))  # type: ignore
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocstrings_handlers/python/templates/material/module.html(15)root()
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocstrings_handlers/python/templates/material/_base/module.html(77)root()
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocstrings_handlers/python/templates/material/_base/module.html(129)block_contents()
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocstrings_handlers/python/templates/material/_base/module.html(162)block_children()
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocstrings_handlers/python/templates/material/children.html(15)root()
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocstrings_handlers/python/templates/material/_base/children.html(185)root()
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocstrings_handlers/python/templates/material/module.html(15)root()
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocstrings_handlers/python/templates/material/_base/module.html(77)root()
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocstrings_handlers/python/templates/material/_base/module.html(129)block_contents()
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocstrings_handlers/python/templates/material/_base/module.html(162)block_children()
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocstrings_handlers/python/templates/material/children.html(15)root()
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocstrings_handlers/python/templates/material/_base/children.html(125)root()
-> {% include attribute|get_template with context %}
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocstrings_handlers/python/templates/material/class.html(15)root()
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocstrings_handlers/python/templates/material/_base/class.html(79)root()
-> {% endfor -%}
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocstrings_handlers/python/templates/material/_base/class.html(186)block_contents()
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocstrings_handlers/python/templates/material/_base/class.html(233)block_docstring()
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocstrings_handlers/python/templates/material/docstring.html(15)root()
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocstrings_handlers/python/templates/material/_base/docstring.html(33)root()
-> {% endfor %}
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/mkdocstrings/handlers/base.py(276)do_convert_markdown()
-> return Markup(self._md.convert(text))
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/markdown/core.py(361)convert()
-> newRoot = treeprocessor.run(root)
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/markdown/treeprocessors.py(385)run()
-> self.__handleInline(text), child
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/markdown/treeprocessors.py(136)__handleInline()
-> data, matched, startIndex = self.__applyPattern(
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/markdown/treeprocessors.py(286)__applyPattern()
-> node, start, end = pattern.handleMatch(match, data)
> /home/louis/dev/autorefs/src/mkdocs_autorefs/references.py(66)handleMatch()
-> return self.makeTag(identifier, text), m.start(0), end
...
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/markdown/treeprocessors.py(385)run()
-> self.__handleInline(text), child
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/markdown/treeprocessors.py(136)__handleInline()
-> data, matched, startIndex = self.__applyPattern(
  /home/louis/miniconda3/envs/autoref-docs/lib/python3.10/site-packages/markdown/treeprocessors.py(286)__applyPattern()
-> node, start, end = pattern.handleMatch(match, data)

We can go read this code for treeprocessors.py over in the Python-Markdown/markdown repo

It's not clear to me how to supply the required info, so I'm glad you can see!

@lmmx
Copy link
Author

lmmx commented Jan 14, 2024

If you could give me a pointer to where to go next I'd appreciate it, I hit a bit of a dead end there.

I'm going to take a break now and come back, I have other things to get onto, so please no rush! 😺

I modified the test repo in the demo repo (which broke the deployment build, obviously, as this feature isn't created yet):

@pawamoy
Copy link
Member

pawamoy commented Jan 14, 2024

Of course, take your time 🙂

It sounds like you're saying that isn't entirely possible: the handler needs to provide some more information (the target's qualname, at least).

You got it right. I read again autorefs' code and had forgotten it already works with a Markdown inline processor.

Put shortly, there are 4 steps:

  1. autorefs registers its Markdown extension into MkDocs configuration.
  2. Docstrings are rendered to HTML by mkdocstrings in a nested/inner rendering step. Since mkdocstrings uses the MkDocs-configured Markdown extensions, autorefs' extension runs while docstring are rendered too, and unresolvable references like [method][.method] are transformed to spans
  3. In the outer Markdown rendering step, the same thing happens for references outside docstrings (in Markdown pages).
  4. Once all pages have been rendered to HTML, autorefs regex-find-and-replaces spans using the data it has stored (URL maps).

So now I have a clearer idea where the hook between handlers and autorefs should happen: in the autorefs reference processor. Actually it already works with a hook to handlers in the post-page processing step, so we could do something similar in the Markdown reference processor: if handlers are hooked, use them to retrieve additional information, that is then stored in spans. If not, do as usual. Then, in the post-page processing step, use that information (as described in previous comments, to rebuild absolute refs).

Or, instead of fetching info and storing it into spans, maybe we could ask the handlers themselves to directly resolve relative refs to absolute ones, so that we can simply store the absolute one in the span 🙂

@oprypin
Copy link
Member

oprypin commented Jan 14, 2024

Hi. I'd just like to point out one thing. The dot character has no significance to autorefs whatsoever.

When moving in this direction, it may become best to put all the code of autorefs back into mkdocstrings and then proceed there. Because by now there's way too much data flow.

And even still then, the dot will have no significance to mkdocstrings. I assume that you're assuming the Python handler, but why? What about the Crystal handler which uses :: among others as separators?

Oh actually there's another thing. In Python you say .foo.bar to mean relative and foo.bar to mean absolute. Whereas in Crystal you say Foo::Bar to mean relative or absolute, and ::Foo::Bar to mean definitely absolute. In fact I even already implement that in the Crystal handler. This makes me fairly certain that this should be implemented through the mkdocstrings handlers. So, you have a ref string, send it to the handler with context to turn it into a canonical identifier, and then search for the ref at that identifier.

@oprypin
Copy link
Member

oprypin commented Jan 14, 2024

Actually with the latter in mind, I think this should just be implemented in each handler itself. The Crystal handler implements relative refs directly and I saw no benefit from doing it any other way. Maybe only with some shared utilities.

@oprypin
Copy link
Member

oprypin commented Jan 14, 2024

You may think that cross-language refs couldn't work that way.. but what would a relative ref from one language to another even mean? It is best that they don't work, that'd be a real surprise otherwise

@pawamoy
Copy link
Member

pawamoy commented Jan 14, 2024

Thanks for sharing how this works in Crystal. I think with my latest suggestion

ask the handlers themselves to directly resolve relative refs to absolute ones

...this effectively moves the logic specific to each language into handlers themselves 👍

The benefit of "hooking" handlers to autorefs this way is that each handler does not have to re-implement a Markdown inline reference processor, or other less efficient parsing to transform references in Markdown docstrings.

@pawamoy
Copy link
Member

pawamoy commented Jan 14, 2024

Regarding cross-language abilities: at some point it might be necessary to support it in autorefs (but not relative refs, you're right, it doesn't seem to make sense). For example a same project could declare stuff in a language, and have bindings in another, where both have the same identifier. In that case you'll get into trouble trying to reference the object from language A instead of B, depending in which order the headings were rendered. What I mean is that for projects with multiple languages, we'll possibly need a feature like Sphinx' domains, with a way to choose the domain (or other attributes) directly within the markup.

@oprypin
Copy link
Member

oprypin commented Jan 14, 2024

Thanks for the response.

each handler does not have to re-implement a Markdown inline reference processor

Actually that brings me to another point.

Yes, this saves the need to re-implement the [foo.bar][] syntax. But in fact, language handlers should instead take the opportunity to use a different syntax that works better with the language's conventions.

  • For the Crystal handler I instead use the syntax `Foo::Bar.baz` which is the syntax that its native Markdown-based doc renderer understands. (yes, it's just a code tag, all of them get linkified as appropriate, even the implementation is just a treeprocessor, not a preprocessor)

  • If one were to implement a handler for Dart, one would want to have the syntax [Foo.bar] because that's the native syntax for Dart (also Markdown-based) and that's how basically all docs are already written.

So maybe sharing the implementation of the syntax is not even that good of a goal to have? The Python handler could consider a more convenient syntax as well, by the way. After all, it's already not pure Markdown, what with the Google-style sections and such.

With this idea, the [xxx][] syntax could remain as an absolute-only syntax and could also be extended to support language domain prefixes. Whereas relative (or still possibly absolute) link syntaxes would be left to each language's handler.

@lmmx
Copy link
Author

lmmx commented Jan 14, 2024

Thanks for chiming in @oprypin, I was aware there is also a Crystal handler too! 🙂

Hi. I'd just like to point out one thing. The dot character has no significance to autorefs whatsoever.

When moving in this direction, it may become best to put all the code of autorefs back into mkdocstrings and then proceed there. Because by now there's way too much data flow.

Wow! I didn’t intend to cause that sort of disruption to implement this feature, I saw it as an extension of existing functionality.

And even still then, the dot will have no significance to mkdocstrings. I assume that you're assuming the Python handler, but why? What about the Crystal handler which uses :: among others as separators?

I was not: the use of . as a separator and use of . to indicate parent is intended in the style of accessing parent namespaces in Python relative imports, but could equally be compared to the ../ syntax for accessing parent directories in shell scripting languages like Bash. The use of . as a separator is not a requirement for use of . as an indicator for relative paths (much like the separator for shell script paths is / while the indicator for the current directory is . and parent is ..)

In Python you say .foo.bar to mean relative and foo.bar to mean absolute. Whereas in Crystal you say Foo::Bar to mean relative or absolute, and ::Foo::Bar to mean definitely absolute.

The preceding . notation is not a part of Python but rather something proposed to indicate “parent”.

This therefore isn’t Python-specific. You would simply take the absolute path to the object being documented, which as you say might be Foo::bar (i.e. the bar method on the Foo class), and then follow the same logic:

  • . -> Foo::bar (the method)
  • .. -> Foo (the parent class)

I am not familiar with Crystal syntax any further than this, but I don’t see any particular need to use a separate syntax.

In any case it sounds like a per-handler approach is now being proposed (I just wanted to emphasise that this is not due to something I overlooked in the proposal above).

I laid out a pseudo-code routine for how this would work in this comment above

We would take the length of the . prefix at the start of the identifier, expand it by lookup to the above, and then pass along the identifier to the _url_map lookup.

So in the example of ".x" it would go:

  • count the . prefix length: 1
  • look up in the dict: {1: 'foo.main.A'}
  • append the rest of the identifier -> 'foo.main.A' +'x' -> foo.main.A.x
  • do the identifier lookup as originally: {'foo.main.A.x': 'api/foo/#foo.main.A.x'}

This routine is language agnostic, for Crystal you’d just change the values in the dict of parent levels:

  • count the . prefix length: 1
  • look up in the dict: {1: 'foo::main::A'}
  • append the rest of the identifier -> 'foo::main::A' +'x' -> foo.main.A::x
  • do the identifier lookup as originally: {'foo.main.A.x': 'api/foo/#foo::main::A::x'}

I see now why you thought that I’d overlooked Crystal, I just elided the separator as “+”.

No matter what separator you want to use, it will always be possible to ascend a path, provided you have the separators to split it on, which for Python is {.} and for Crystal is apparently {::,.}.

This was just meant to be a rough outline, so good to discuss a full spec for the work to be done.

I’m happy to shift this into a per-handler if the data flow proves too taxing to do in a common way - though I don’t think it would be impossible or even difficult to do it for all languages (just specifying the separator in the handler would suffice from what I understand here).

It is a bit of a shame to have to split this off per-language, I thought a simple universal approach would work.

Hopefully this explains why . can work as a universal relative reference indicator, which was chosen due to the only existing implementation of relative references (from the mkdocs Python repo).

If all languages have the _url_map variable I was thinking it’d be viable to make use of that to do this ‘retracing the path’ routine. Ideally only the language-specific bits would be delegated.

If you feel strongly that’s unwise please say so, this is my first time developing with this code ecosystem.

@oprypin
Copy link
Member

oprypin commented Jan 14, 2024

Hi @lmmx , thanks for the consideration. I have expressed why I think a language-specific approach could be better - with a more natural syntax for the target language and with relative lookup rules familiar to developers in that language. But yes, I definitely missed the fact that your proposal could still work universally.

  • In Crystal actually the lookup can go an arbitrary number of levels up. Inside Foo::Bar::Baz you can find Foo::Other as just `Other` (applies the same to actual code and to the doc generator). Of course people would be confused by needing to use [..Other][] instead.
    (The Crystal handler probably has like 2 users total, so it's not like this matters at all, but just interesting to study the case.)

  • And in Dart, the lookup of identifiers also matches up between actual code and the doc generator.
    (Again this is purely theoretical for mkdocstrings but also can be interesting)

I'll not block anything either way. Just to me seems that the design considerations are much harder even than the implementation considerations, so wanted to cover everything.

@pawamoy
Copy link
Member

pawamoy commented Jan 14, 2024

@oprypin Interesting.

I'm curious to see examples for many languages now.

  • Python: [foo][] (source: us)
  • Crystal: `foo`
  • Go: [foo] (source)
  • Dart: [foo]
  • Ruby: foo (source)
  • Java: {@link foo} (source)
  • Rust: [foo] and others, very similar to Python (source)
  • C/C++: \ref foo (entire line, no inline support I think) (source)
  • TypeScript: {@link foo} (source)

It's nice and all that most languages have their way of writing cross-references, but I'd personally find it cumbersome to use different syntax depending on the language when the docs themselves are all written in Markdown. Or, in a more enthusiast way, it would be nice to have a single syntax based on Markdown for every language.

@lmmx perfectly explained what are also my thoughts: the . prefix is not Python-specific (even if it nicely matches the current . delimitation character, even though : will again be introduced in the future, for reasons that are off-topic here), and if it's an issue for a particular handler, it could be made configurable per handler.

I also understand that if a language specifies documentation comments with non-Markdown markup, supporting just the cross-ref ability in a Markdown-like syntax does not make much sense. Writers will want to stay close to their language specs, to allow compatibility with other tools from their ecosystem, etc.. Unification of the syntax is therefore maybe just a sweet dream. Or maybe it's just too early, and we should reconsider when we have many more handlers. Even if touted as superior by design, rST did not manage to bring its cross-language abilities everywhere. Markdown is much, much more popular so maybe it has a chance.

With this idea, the [xxx][] syntax could remain as an absolute-only syntax and could also be extended to support language domain prefixes. Whereas relative (or still possibly absolute) link syntaxes would be left to each language's handler.

I see "two birds with one stone" here. If we extend it to support language domains, it would be nice to also allow handlers to transform relative refs to absolute ones, without moving the entire responsibility on them.


In the end, as you said @oprypin, it's a hard thing to design. I've not yet made up my mind and will need more exploration/discussion 🙂
@lmmx apologies for the back and forth between autorefs and handlers. Thank you for your patience and for helping us here!

@oprypin
Copy link
Member

oprypin commented Jan 14, 2024

🙂🙂 Nice discussion yes

It's nice and all that most languages have their way of writing cross-references, but I'd personally find it cumbersome to use different syntax depending on the language when the docs themselves are all written in Markdown.

The way I was thinking about it is .. you're writing the comments for the current programming language which surely you're familiar with so it can't be cumbersome to use its typical syntax. On the other side, having to bend to the particular doc generator in use seemed like the worse tradeoff to me, even if that doc generator is consistent with itself. I mean, it's easy to imagine a team lead putting a veto on a doc generator that forces developers to write comments in an un-idiomatic style. Whereas with what I'm proposing, you could freely switch between mkdocstrings and the native doc generator without needing to edit the original code comments.

  • Again I'll bring up something about Dart - the language parser parses the doc comments and makes the cross-references syntax-highlighted and clickable in IDEs 🙂 so surely there'd be a lot to lose by switching to a non-native syntax just to use a different doc generator

But this is interesting actually. My thinking is biased by the fact that most languages these days already have Markdown-based documentation syntax so I started to think of mkdocstrings mainly as a way to insert docs into a Markdown page, not necessarily written in the exact same dialect of Markdown as the top-level page, or even Markdown at all. But of course for the case of Python that actually was whole point - mkdocstrings-python wouldn't want to be based on reStructuredText just because that's what the developers are used to - that would lose a huge point of the project. But still I think "mkdocstrings-python-rst" would be a valid and feasible project if someone really likes rst but not sphinx and wanted to develop such an alternative. It would render rst docs into an MkDocs site. In which case surely the refs would preferably use rst syntax and not this one, right?

@oprypin
Copy link
Member

oprypin commented Jan 14, 2024

(continuing from the above)
Hmm or maybe it wouldn't be a valid project, mkdocstrings could just declare that all handlers are supposed to use Markdown 😅 Maybe that's part of the concept

@pawamoy
Copy link
Member

pawamoy commented Jan 14, 2024

That's kinda part of the concept yes 😂 But I see your point 🙂

@pawamoy
Copy link
Member

pawamoy commented May 16, 2024

@lmmx I'm working on this, see #46. Closing this, thanks for the great conversation 🙂

@pawamoy pawamoy closed this May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants