Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change search_index to be a JSON file #2517

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

aaruni96
Copy link
Contributor

@aaruni96 aaruni96 commented May 31, 2024

Not the best thing at the moment, because the fetch() function assumes index lives in "./". So search only works from /index.html.
I don't really know how to fix this.

Addresses #2516 , when its fixed.

aaruni96 and others added 2 commits May 31, 2024 19:40
Not the best thing at the moment, because the fetch() function assumes site is served from /
So it won't find search_index.json if site is project.com/docs instead of docs.project.com
@aaruni96
Copy link
Contributor Author

aaruni96 commented Jun 7, 2024

Search now works from all pages in the built docs, and the basic file is a valid JSON file, which can be manipulated using standard JSON tools.

@fingolfin
Copy link
Contributor

@mortenpi @fredrikekre could you please let the CI tests run on this one?

It seems quite useful to me (but I am biased, I work with @aaruni96 ;-) ).

@mortenpi mortenpi added the Format: HTML Related to the default HTML output label Jun 13, 2024
@mortenpi
Copy link
Member

Sorry for the delay here.

So, the reason we have it as a .js file is that you could load it from the .html files if you open them locally with the file:/// protocol in the browser (i.e. without having a web server, like LiveServer) running. Last I checked, browser security rules don't allow you to fetch files from a higher level directory. I.e. for example if you have build/search_index.json and try to fetch it from build/foo/bar.html, it won't be allowed.

And I think we still want to support that file:/// case (especially if prettyurls=false), e.g. for the Julia offline documentation.

Just a thought I had that we could try though is put the fetch call into the search_index.js file. I suspect it won't work, but maybe it will -- browsers can be weird 🤷

If that doesn't work, then I'm inclined to leave the default to generate the .js file. But we could have an option to change to the .json, if there's demand. Alternatively, we could also add an option to generate the .json in parallel maybe, but still use .js in the Documenter HTML? But I would like to understand the use case in more detail, before adding the complexity. There is also code out there that depends on search_index.js having the format that it has, so changing this will have some cost.

Last note: purely on aesthetic grounds, I agree that a JSON file would be cleaner. And would also make it much easier for anyone to consume or post-process it. But it's the file:/// requirement that made me implement it this way.

@aaruni96
Copy link
Contributor Author

browser security rules don't allow you to fetch files from a higher level directory

MDN seems to suggest fetch() will just fail if the URL starts with file:// ( https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS/Errors/CORSRequestNotHttp ).

Just a thought I had that we could try though is put the fetch call into the search_index.js file. I suspect it won't work,

That idea doesn't work for two reasons: 1) fetch still throws a CORS error, and 2) I couldn't figure out a way to have the path of the index be relative to the current page, if it was all inside the javascript.

But I would like to understand the use case in more detail, before adding the complexity

Currently the documentation for the OSCAR project (https://docs.oscar-system.org) includes the documentation for sub projects like Nemo, Hecke, AbstractAlgebra. We do want some, but not all documentation pages from these sub projects. While we make the unwanted pages unreachable via navigation, they still show up in search, being confusing to users (functions with the same name but different method might work in Nemo, but not Oscar). If the search index were a standard json file, we could run a post process script to trim out these unwanted pages from search.

This is the situation as I understand it, maybe @fingolfin can correct me.

@aaruni96
Copy link
Contributor Author

aaruni96 commented Jun 13, 2024

So, the reason we have it as a .js file is that you could load it from the .html files if you open them locally with the file:/// protocol in the browser

Local browsing anyways doesn't work well if prettyurls is true (default).

Maybe a compromise can be, use a valid json file if prettyurls is true, and the current method of loading the index to a variable from within the search_index.js if prettyurls is false ?

@aaruni96
Copy link
Contributor Author

To uncomplicate the two codepaths, we could keep the name search_index.js.

if prettyurls is true, add the fetch to HTML pages, and don't add the assignment to search_index.js
if prettyurls is false, don't write the fetch to the HTML pages, and add the assignment to the front of search_index.js

Would that be an idea worth considering ?

@mortenpi
Copy link
Member

I would prefer not to complicate prettyurls -- it is meant to configure one thing. In an ideal world, instead of individual options, I think we'd have two modes: (1) web server / deploy mode, (2) local build mode. But we don't at the moment.

For the OSCAR use case.. what about hooking into the search index generation in Julia, in make.jl? Would that work, if we'd have some systematic way to do it? Or are you pulling in pre-built HTML?

But if you want to process the JSON file, then I'd say let's make this opt-in, configured via an option to Documenter.HTML.

browser security rules don't allow you to fetch files from a higher level directory

MDN seems to suggest fetch() will just fail if the URL starts with file:// ( https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS/Errors/CORSRequestNotHttp ).

Uh, yea, okay, looks like things have become more strict, not less, over time. My knowledge predated CVE-2019-11730 😄

# convert Vector{SearchRecord} to a JSON string + do additional JS escaping
println(io, JSDependencies.json_jsescape(ctx.search_index), "\n}")
println(io, JSDependencies.json_jsescape(ctx.search_index))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to be double checked, but JSON might need a different escaping function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me like json_jsescape only escapes some newline unicode, so it turns already valid JSON to also be valid javascript (also, according to comments in the source code, this is not required since 2019).

More closely looking at the function, it calls JSON.json(), and then escapes the problematic unicode in whatever is returned, so this should be fine (at least for this change).

@aaruni96
Copy link
Contributor Author

what about hooking into the search index generation in Julia, in make.jl?

I'm going to need some more explanation here...

Or are you pulling in pre-built HTML?

Best as I can tell, we somehow convince Documenter to build the documentation for Oscar + all the sub projects at the same time, right @fingolfin ?

@fingolfin
Copy link
Contributor

We do all kinds of crude things and hacks in https://github.com/oscar-system/Oscar.jl/blob/master/docs/make_work.jl to make it work, including using a nasty hack to workaround the fact that PR #552 here (from 2017) still hasn't been merged (perhaps I can volunteer @aaruni96 to help finish that? ;-) )

But I'll try to stay on topic: we actually copy documentation from Hecke, Nemo, AbstractAlgebra, but we do by copying (or rather: symlinking) .md files, not compiled HTML. See lines 142ff in the linked make_work.jl.

So I assume that the index creation is after that, so hooking into the search index generation might indeed work?

@mortenpi
Copy link
Member

Sorry for the delay. So, the search index is just a linear list we populate while HTMLWriter goes through the document

search_index :: Vector{SearchRecord}

It then gets written out here at the end

open(joinpath(doc.user.build, ctx.search_index_js), "w") do io
println(io, "var documenterSearchIndex = {\"docs\":")
# convert Vector{SearchRecord} to a JSON string + do additional JS escaping
println(io, JSDependencies.json_jsescape(ctx.search_index), "\n}")
end

We could add an interface where you e.g. pass a callback to Documenter.HTML that gets passed that list just before it's written and you can then to whatever processing you want in make.jl, and you return a new list. And it defaults to identity, of course.

The SearchRecord-s themselves are pretty straightforward:

struct SearchRecord
src :: String
page :: Documenter.Page
fragment :: String
category :: String
title :: String
page_title :: String
text :: String
end

function JSON.lower(rec::SearchRecord)
# Replace any backslashes in links, if building the docs on Windows
src = replace(rec.src, '\\' => '/')
ref = string(src, rec.fragment)
Dict{String, String}(
"location" => ref,
"page" => rec.page_title,
"title" => rec.title,
"category" => rec.category,
"text" => rec.text
)
end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Format: HTML Related to the default HTML output
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants