Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index path with original path name #5785

Merged
merged 12 commits into from
Jun 18, 2019

Conversation

stsewd
Copy link
Member

@stsewd stsewd commented Jun 11, 2019

Wen we index to es from the ImportedFiles models, we don't save the path from the model, but instead the page name from the fjson file. The page name doesn't include the extension, we used to rely on the doctype in the resolver, but that got removed.

We talked about reindexing all with path being the real path from the model, but that would require more changes, and we are not sure if that's the solution (removing page name = path).

So, for now I'm just adding a new field with the original path value.

Fix #5397

@stsewd stsewd added the Needed: design decision A core team decision is required label Jun 12, 2019
@stsewd
Copy link
Member Author

stsewd commented Jun 12, 2019

This is going to break all projects, and they will need to trigger a build to be fixed... Guessing we don't want that. So, my thoughts:

I think we should serve this file from our servers
https://github.com/rtfd/readthedocs.org/blob/master/readthedocs/core/static-src/core/js/doc-embed/search.js

so, if we change something in the search api, we have control on how we represen it
Currently search relies in https://github.com/rtfd/readthedocs.org/blob/8258a3c98eae25175cafbdab70ad671cf40445a2/readthedocs/core/static-src/core/js/doc-embed/search.js#L41-L41
but we already know the full file path (with the extension!)
Or we can just keep serving that file from each build, but don't depend on anything else from the local js
like DOCUMENTATION_OPTIONS
So, another way to not break existing projects, create a new attribute doc.docurl
that returns the full correct link
and keep around doc.link with the current old behavior till we figure out how to migrate that

@ericholscher
Copy link
Member

Yea, I think we should start indexing the full filename in a new field in ES, then add it as a new endpoint in the docsearch API. Our new code can use this, and the old stuff will continue to work.

@stsewd stsewd removed the Needed: design decision A core team decision is required label Jun 17, 2019
@stsewd
Copy link
Member Author

stsewd commented Jun 17, 2019

Let me know if you want me to upload the minified files here

if 'current_page_name' in data:
path = data['current_page_name']
else:
log.info('Unable to index file due to no name %s', filename)
return None
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we return none from this function, get_processed_json is going to return None too, we always expect a dict from here. And get_processed_json has a default https://github.com/stsewd/readthedocs.org/blob/a2e0e3f5e442b0072a4b7cbac9efadbe2a41c224/readthedocs/projects/models.py#L1254-L1261

@stsewd stsewd requested review from ericholscher and a team June 18, 2019 02:13
@humitos
Copy link
Member

humitos commented Jun 18, 2019

We already have this information from the model

Could you edit the description of the PR to include a summary of the problem and what's the solution proposed by the PR? It's hard to review without this context to me.

@stsewd
Copy link
Member Author

stsewd commented Jun 18, 2019

@humitos updated ^

@stsewd
Copy link
Member Author

stsewd commented Jun 18, 2019

Hold the review for a moment, I think there is a bettter way to put the new field

@stsewd
Copy link
Member Author

stsewd commented Jun 18, 2019

Done!

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a much simpler change, and lets us slowly roll out the change across all the places we use search. 👍

log.info('Unable to read file: %s', filename)
return None
log.info('Unable to read file: %s', fjson_filename)
raise
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this cause all indexing to fail if a single file is missing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see we're catching it at a higher level, 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were returning None, but we expect to always have a dict.

This is caught by https://github.com/stsewd/readthedocs.org/blob/96a85fa8af3cac8b139bdf99598d119eae0e0163/readthedocs/projects/models.py#L1244-L1260

Which returns a default dict

@stsewd stsewd merged commit a908684 into readthedocs:master Jun 18, 2019
@stsewd stsewd deleted the index-with-real-path-name branch June 18, 2019 16:43
@stsewd
Copy link
Member Author

stsewd commented Jun 18, 2019

Added to the deploy card that we need to do a full re-index to see this taking effect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

readthedocs.org "Search all docs" endpoint forwards to incorrect link (.html missing)
3 participants