-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementation Plan: Additional search views #2676
Conversation
73637ba
to
c0e7f53
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is excellent, I'm only requesting changes based on the volume of cleanup suggestions I've made.
Actually, one potential blocker. I believe we need to include the necessary API changes in this plan, rather than have a separate plan for the API. Since the API changes are so small.
...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md
Outdated
Show resolved
Hide resolved
...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md
Outdated
Show resolved
Hide resolved
...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md
Outdated
Show resolved
Hide resolved
...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md
Outdated
Show resolved
Hide resolved
...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md
Outdated
Show resolved
Hide resolved
...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md
Outdated
Show resolved
Hide resolved
...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md
Outdated
Show resolved
Hide resolved
...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md
Show resolved
Hide resolved
...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md
Outdated
Show resolved
Hide resolved
...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Zack, but will suggest that either the API changes should be included in this IP or they should be described in a separate IP first.
Without having a deep knowledge of how the frontend stores work, and because it isn't described in this implementation plan, it's very hard to understand how the searches for these collection pages is meant to be implemented. And in the interest of keeping the API as simple as possible, I'd prefer to have the API's approach guide the frontend implementation rather than the other way around.
Nuxt allows creating nested dynamic routes like | ||
`/pages/_collection/_mediaType/_term`. | ||
|
||
Here, the collection can be `tag`, `creator` or `source`. `mediaType` can be one | ||
of the supported media types (currently, `image` and `audio`). `term` refers to | ||
the actual value of the tag, creator or source name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fantastic! It could be useful for SEO as well. Speaking of which, what considerations do we need to apply to robots.txt
for this change, if any? I don't really understand our current approach to robots.txt, is it leftover from when we were iframed? @zackkrida I believe you may know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our current robots.txt is blank, which allows all pages of the site to be crawled and indexed. That works exactly the same as:
User-agent: *
Disallow:
The staging site is blocked from indexing.
I don't think we need to do anything here. These pages can be crawled and indexed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These new pages are essentially search views with static URLs though (not query params). Should that kind of thing be indexed?
Sorry if this is already discussed somewhere. Might be helpful to document the rationale of our robots.txt in the docs site.
...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md
Outdated
Show resolved
Hide resolved
The header should also display the number of results, "251 audio files with the | ||
selected tag", "604 images provided by this source", "37 images by this creator | ||
in Hirshhorn Museum and Sculpture Garden". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it at all relevant for us to consider sources that have a single creator? Boston Public Library, as a source, for example, only has Boston Public Library as the "creator" of the work. It's weird to think that there would be two separate pages for them, but then I'm not sure how we would communicate this to the user if we automatically redirected the BPL creator search to the BPL source search 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a note to the plan. By the way, we don't have a Boston Public Library as a source filter on the frontend. What is the link to their API or collection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I've just realised that BPL is just a creator we've picked up through Flickr, not an actual source. Maybe an easy source to add then!
#### Update the `searchBy` filter in the `search` store | ||
|
||
Currently, this filter is shaped as other filters: it is a list of objects with | ||
`code`, `name` and `checked` properties. This is a possible value now: `{ | ||
searchBy: { code: "creator", name: "filters.searchBy.creator", checked: true }}. | ||
|
||
Since this filter is mutually exclusive (you cannot search by both the creator | ||
and the tag, for example), this shape is very difficult to update. | ||
|
||
It is better to set the `searchBy` filter to have one of the possible values: | ||
`{ searchBy: <null|creator|source|tag> }`. | ||
|
||
We should also create a new method, `setSearchBy`, in the `search` store, that | ||
would allow directly set the `searchBy` filter value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify whether this is just a DevEx/code-quality refactor or if it's necessary for the new features? I don't fully understand why this particular change is relevant to the creator/tag/source pages. Are they going to use the same search store and filters? If so, that would be helpful background to have somewhere as well (may have missed it though?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I actually re-used the same search store in the draft implementation that I linked in the issue description. But to get the store to handle different searchBy
values, we should add this change.
Thank you for asking this question! I realized that I didn't write anything about the data side: what API calls will be made, how the stores will be used and how they need to be updated. I'll add this during the revision round.
I am not sure what events we want to add, if any. Should we add `<VIEW>_CLICKED` | ||
events for the new views, or would page views be sufficient? | ||
|
||
Should we add a new `SELECT_COLLECTION_ITEM` event similar to | ||
`SELECT_SEARCH_RESULT`? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. SELECT_SEARCH_RESULT
can be narrowed to /search
or /tag*
(for example) to determine where the event occurred'
...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md
Outdated
Show resolved
Hide resolved
...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md
Outdated
Show resolved
Hide resolved
146dcec
to
12f655a
Compare
Full-stack documentation: https://docs.openverse.org/_preview/2676 Please note that GitHub pages takes a little time to deploy newly pushed code, if the links above don't work or you see old versions, wait 5 minutes and try again. You can check the GitHub pages deployment action list to see the current status of the deployments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have only three truly blocking issues, the rest are just additional questions or suggestions. The blockers are:
- The various issues I've described with using Postgres to query for the creator view must be addressed. I've suggested we continue to use ES by adding new
.raw
sub-fields for creator and source. - The frontend routes need to disambiguate creators by source. I've suggested alternative paths that mirror my suggested path (rather than query param) based approach for the API and would solve this problem.
- For me to fully understand how the plan would be implemented, I need an ordered, high-level step-by-step implementation of all the tasks that would be converted into individual issues. I think we should clarify this in the implementation plan template to have something like
ordered steps
andstep descriptions
so that it's possible to get both a high-level view of the plan separated from the detailed descriptions and nuances of each step.
Overall the plan looks very good though. The first two issues are critical blockers for the plan because the plan will not succeed without addressing them. The last is just a personal blocker for me to get a proper understanding of the plan.
I'm starting AFK tomorrow and am confident that you and Zack will be able to resolve the blockers without me. The plan otherwise is great. Please consider this plan approved by me once those blockers are resolved 🚀
|
||
<!-- Describe the implementation step necessary for completion. --> | ||
|
||
### API changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding this section. It helps a lot to understand how this will all fit together.
I like the approach to add new routes on the API rather than trying to stuff everything into the existing search endpoint. That makes things a lot nicer for the creator and provider especially.
However, there are some problems with using the database rather than ES to access anything:
- The database does not cache queries in the same way that ES does. Repeated queries will not necessarily be as efficient as from ES.
- The database does not score documents at all, so the order will different dramatically to the way that ES would order the documents. That's an issue with respect to popularity data today already, but will become even more of an issue if we start to score documents based on other metrics as theorised by our search relevancy discussions.
creator
is not indexed in the API database, so a query against it will be very slow.
Only one of these can be "fixed", the third, but it's much easier to add a creator.raw
field to the index instead, like we've done for title, description, and tag name. Adding a new field to the index is far easier than adding a new index to the database (which would require careful application to avoid a long-running migration) and which could significantly increase the time it takes to do a data refresh.
I'd also strongly recommend against the route described as it is harder to cache in a sensible way (on the Cloudflare side of things) or to perform cache invalidation that might be required by #1969. Instead of using query strings, we can describe the resource via the path: /<media type>/source/<source>/creator/<creator>
is very clean, easy to read and understand, and very easy to manage the cache for because it is a static path. The source page can use the same route by leaving off the creator. This removes the need to manage specific query params as well and would allow us to add querying within these routes more easily in the future behind the regular q
parameter if we wanted.
For the tag route, I'd recommend the singular tag
rather than plural tags
, for legibility if we are presenting a single tag. Again, using a route-based approach instead of query parameter approach makes it easier to manage the cache if needed and could allow us to more easily feature specific tags with high quality results as example pages if we wanted, like /images/tag/dog
or /images/tag/flying%20birds
.
Path elements still need to be URL encoded to preserve special characters and spaces, but for most cases like /images/source/nasa
, /images/sources/nypl
, /images/sources/inaturalist
, etc we've suddenly got a very good-looking URL to point people to.
const prepareSearchQuery = ( | ||
searchParams: Record<string, string>, | ||
mediaType: SupportedMediaType | ||
) => { | ||
if (searchParams.searchBy) { | ||
if (searchBy === "creator") { | ||
// the name of the source is stored in the `<mediaType>Provider` filter | ||
return { | ||
creator: searchTerm, | ||
source: searchParams[`${mediaType}Provider`], | ||
} | ||
} else if (searchBy === "source") { | ||
return { source: searchParams[`${mediaType}Provider`] } | ||
} else if (searchBy === "tag") { | ||
// The parameter is plural in the API | ||
return { tags: searchTerm } | ||
} | ||
} else { | ||
// Current search query transform | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting that regardless of whether we use my fully route based approach above, we'll still need to determine the path pattern to use. Should this function also handle that, maybe returning either an array of path segments (including holes for parameters and a final object for query parameters if still needed) or the query parameter object to send to the default search
route?
Nuxt allows creating nested dynamic routes like | ||
`/pages/_collection/_mediaType/_term`. | ||
|
||
Here, the collection can be `tag`, `creator` or `source`. `mediaType` can be one | ||
of the supported media types (currently, `image` and `audio`). `term` refers to | ||
the actual value of the tag, creator or source name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Realising now that for these to work, creator needs to have source included. It could make sense to make these match the API routes: /_mediaType/source/_source/creator/_creator
, /_mediaType/tag/_tag
etc.
The header should also display the number of results, "251 audio files with the | ||
selected tag", "604 images provided by this source", "37 images by this creator | ||
in Hirshhorn Museum and Sculpture Garden". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I've just realised that BPL is just a creator we've picked up through Flickr, not an actual source. Maybe an easy source to add then!
...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md
Show resolved
Hide resolved
1af2cce
to
a886fd7
Compare
I've updated the plan to add the API and Elasticsearch changes. In particular, I updated the paths to match @sarayourfriend's suggestions. I also rewrote the plan to make the steps descriptions and steps sequence clearer. |
I'll review this today! |
|
||
<!-- What hard blockers exist which might prevent further work on this project? --> | ||
|
||
The main blocker could be the maintainer capacity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is always a potential blocker, so I don't think we need to explicitly mention it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is in a good enough place to merge. You've done a really excellent job of outlining the steps necessary for the project and coming up with good frontend patterns for the feature.
...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md
Outdated
Show resolved
Hide resolved
Co-authored-by: Zack Krida <zackkrida@pm.me>
Based on the medium urgency of this PR, the following reviewers are being gently reminded to review this PR: @krysal Excluding weekend1 days, this PR was ready for review 9 day(s) ago. PRs labelled with medium urgency are expected to be reviewed within 4 weekday(s)2. @obulat, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings. Footnotes
|
We're going to merge this on the basis of @sarayourfriend's blockers being resolved 🥳 |
Fixes
Resolves #2560 by @zackkrida
Assigned reviewers
Note
A very rough draft implementation of the additional search view pages is in branch draft/collection_pages.
Current round
The discussion is currently in the Decision round.
The deadline for review of this round is 2023-07-25.