Index path with original path name #5785

stsewd · 2019-06-11T17:29:21Z

When we index to ES from the ImportedFile models, we don't save the path from the model; instead we save the page name from the fjson file. The page name doesn't include the extension. We used to rely on the doctype in the resolver for that, but it got removed.

We talked about reindexing everything with path set to the real path from the model, but that would require more changes, and we are not sure that's the right solution (it would drop the "page name = path" equivalence).

So, for now I'm just adding a new field with the original path value.
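A minimal sketch of the idea: keep the old `path` value and store the model's original path in a new field. The field name `full_path` matches the later commit "Keep path around, add full_path"; the helper `build_index_document()` and its arguments are hypothetical, not the actual indexing code.

```python
def build_index_document(model_path, fjson_data):
    """Hypothetical helper: build the ES document for one imported file."""
    return {
        # Old field: page name from the fjson file, no extension.
        'path': fjson_data['current_page_name'],
        # New field: the original path stored on the model, with extension.
        'full_path': model_path,
    }
```

For example, `build_index_document('guides/install.html', {'current_page_name': 'guides/install'})` keeps `path` as `'guides/install'` and adds `full_path` as `'guides/install.html'`.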

Fix #5397

We already have this information from the model

stsewd · 2019-06-12T04:35:08Z

This is going to break all projects, and each one would need to trigger a build to be fixed. I'm guessing we don't want that, so here are my thoughts.

I think we should serve this file from our servers:
https://github.com/rtfd/readthedocs.org/blob/master/readthedocs/core/static-src/core/js/doc-embed/search.js

That way, if we change something in the search API, we control how it is represented.
Currently search relies on https://github.com/rtfd/readthedocs.org/blob/8258a3c98eae25175cafbdab70ad671cf40445a2/readthedocs/core/static-src/core/js/doc-embed/search.js#L41-L41
but we already know the full file path (with the extension!).

Alternatively, we can keep serving that file from each build, but stop depending on anything else from the local JS, like DOCUMENTATION_OPTIONS.

Another way to avoid breaking existing projects: create a new attribute, doc.docurl, that returns the full, correct link, and keep doc.link with the current (old) behavior until we figure out how to migrate it.
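The `doc.docurl` proposal lives in the JavaScript client; as an illustration only, here is a Python sketch of the same shape. `SearchHit` and its attributes are hypothetical: `link` keeps the old extension-less value that clients complete themselves, while `docurl` returns the complete path taken from the model.

```python
from dataclasses import dataclass


@dataclass
class SearchHit:
    page_name: str  # from the fjson file, no extension
    full_path: str  # from the model, with extension

    @property
    def link(self):
        # Old behavior, kept for existing clients: they append the suffix
        # themselves (e.g. from DOCUMENTATION_OPTIONS in the built docs).
        return self.page_name

    @property
    def docurl(self):
        # Proposed attribute: the full, correct path; no client-side guessing.
        return self.full_path
```

Because `link` is untouched, old clients keep working while new code switches to `docurl`.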

ericholscher · 2019-06-12T20:05:49Z

Yea, I think we should start indexing the full filename in a new field in ES, then add it as a new endpoint in the docsearch API. Our new code can use this, and the old stuff will continue to work.
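A minimal sketch of why adding a field keeps old consumers working, assuming a hypothetical `serialize_hit()` that turns an ES `_source` dict into an API result. Existing keys are unchanged; the new key is only present for documents indexed after the change, which is also why a full re-index is needed before every result carries it.

```python
def serialize_hit(source):
    """Hypothetical serializer for one search hit from the ES ``_source``."""
    # Existing field: left as-is so current clients keep working.
    data = {'path': source['path']}
    # New field: omitted for documents that have not been re-indexed yet.
    if 'full_path' in source:
        data['full_path'] = source['full_path']
    return data
```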

stsewd · 2019-06-17T20:05:45Z

Let me know if you want me to upload the minified files here

stsewd · 2019-06-18T01:26:16Z

readthedocs/search/parse_json.py

    if 'current_page_name' in data:
        path = data['current_page_name']
    else:
        log.info('Unable to index file due to no name %s', filename)
-        return None


If we return None from this function, get_processed_json will return None too, but we always expect a dict here. And get_processed_json already has a default: https://github.com/stsewd/readthedocs.org/blob/a2e0e3f5e442b0072a4b7cbac9efadbe2a41c224/readthedocs/projects/models.py#L1254-L1261
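A simplified sketch of the parser side, assuming a stand-in `process_file()` that reads the fjson file: it either returns a dict or raises, and never returns None. Raising `ValueError` for a missing `current_page_name` is an assumption for illustration; the PR only shows the `return None` being removed.

```python
import json
import logging

log = logging.getLogger(__name__)


def process_file(fjson_filename):
    """Simplified stand-in for the fjson parser: returns a dict or raises."""
    try:
        with open(fjson_filename) as f:
            data = json.load(f)
    except OSError:
        log.info('Unable to read file: %s', fjson_filename)
        raise  # let the caller fall back to its default dict
    if 'current_page_name' not in data:
        log.info('Unable to index file due to no name %s', fjson_filename)
        # Assumption: signal the problem with an exception instead of None.
        raise ValueError('no current_page_name in %s' % fjson_filename)
    return {'path': data['current_page_name'], 'title': data.get('title', '')}
```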

humitos · 2019-06-18T09:08:55Z

We already have this information from the model

Could you edit the PR description to include a summary of the problem and the solution this PR proposes? Without that context it's hard for me to review.

stsewd · 2019-06-18T15:18:51Z

@humitos updated ^

stsewd · 2019-06-18T15:37:50Z

Hold the review for a moment, I think there is a better way to add the new field.

stsewd · 2019-06-18T16:21:02Z

Done!

ericholscher

This is a much simpler change, and lets us slowly roll out the change across all the places we use search. 👍

ericholscher · 2019-06-17T20:14:51Z

readthedocs/search/parse_json.py

-        log.info('Unable to read file: %s', filename)
-        return None
+        log.info('Unable to read file: %s', fjson_filename)
+        raise


Won't this cause all indexing to fail if a single file is missing?

Ah, I see we're catching it at a higher level, 👍

We were returning None, but we always expect a dict.

This is caught by https://github.com/stsewd/readthedocs.org/blob/96a85fa8af3cac8b139bdf99598d119eae0e0163/readthedocs/projects/models.py#L1244-L1260, which returns a default dict.
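A sketch of the caller side that answers "won't this cause all indexing to fail?": each file is parsed inside its own try/except, so one missing file yields a default dict instead of aborting the run. The wrapper mirrors the idea of `get_processed_json`, but its signature, `index_all()`, and the keys of the default dict are assumptions, not the actual model code.

```python
import logging

log = logging.getLogger(__name__)


def get_processed_json(path, parse):
    """Hypothetical wrapper: one bad file never breaks the whole index run."""
    try:
        return parse(path)
    except Exception:
        log.warning('Unable to process file: %s', path)
    # Assumed shape of the default dict; the real one may differ.
    return {'path': path, 'title': '', 'sections': []}


def index_all(paths, parse):
    # One result per file, even when some files are missing or malformed.
    return [get_processed_json(p, parse) for p in paths]
```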

stsewd · 2019-06-18T16:44:09Z

Added to the deploy card that we need to do a full re-index to see this taking effect.

stsewd added 4 commits June 11, 2019 12:27

Index path with original path name

8cd3e7e

We already have this information from the model

Don't depend on DOCUMENTATION_OPTIONS

abf4a75

Don't return None

0874fba

Fix test

2fcd502

stsewd added the "Needed: design decision" label (a core team decision is required) Jun 12, 2019

stsewd added 3 commits June 17, 2019 12:55

Merge branch 'master' into index-with-real-path-name

a4edbad

Add new link field

28fb13f

Use doc.url in api response

ec839cd

stsewd removed the "Needed: design decision" label (a core team decision is required) Jun 17, 2019

stsewd added 2 commits June 17, 2019 18:11

Keep path around, add full_path

c971410

Revert changes in the search api

a2e0e3f

stsewd commented Jun 18, 2019


Fix tests

0e55e48

stsewd requested review from ericholscher and a team June 18, 2019 02:13

stsewd added 2 commits June 18, 2019 10:52

Better way to add a new field

f55b971

Fix test

96a85fa

ericholscher approved these changes Jun 18, 2019


stsewd merged commit a908684 into readthedocs:master Jun 18, 2019

stsewd deleted the index-with-real-path-name branch June 18, 2019 16:43

stsewd mentioned this pull request Jun 18, 2019

Use full_path to show search results #5821 (closed)
