Implementation Plan: Additional search views #2676

obulat · 2023-07-19T17:37:14Z

Fixes

Assigned reviewers

Note

A very rough draft implementation of the additional search view pages is in branch draft/collection_pages.

Current round

Note
This discussion is following the Openverse decision-making process. Information about this process can be found on the Openverse documentation site. Requested reviewers or participants will be following this process. If you are being asked to give input on a specific detail, you do not need to familiarise yourself with or follow the process.

The discussion is currently in the Decision round.

The deadline for review of this round is 2023-07-25.

zackkrida

This is excellent, I'm only requesting changes based on the volume of cleanup suggestions I've made.

Actually, there is one potential blocker: I believe we need to include the necessary API changes in this plan, rather than having a separate plan for the API, since the API changes are so small.

...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md

sarayourfriend

I agree with Zack, but would suggest that either the API changes should be included in this IP or they should be described in a separate IP first.

Without having a deep knowledge of how the frontend stores work, and because it isn't described in this implementation plan, it's very hard to understand how the searches for these collection pages are meant to be implemented. And in the interest of keeping the API as simple as possible, I'd prefer to have the API's approach guide the frontend implementation rather than the other way around.

sarayourfriend · 2023-07-20T02:09:06Z

...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md

+Nuxt allows creating nested dynamic routes like
+`/pages/_collection/_mediaType/_term`.
+
+Here, the collection can be `tag`, `creator` or `source`. `mediaType` can be one
+of the supported media types (currently, `image` and `audio`). `term` refers to
+the actual value of the tag, creator or source name.
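As a minimal sketch of how these route params might be checked, assuming a Nuxt 2 page with a `validate({ params })` hook: the helper `isValidCollectionRoute` and its constant lists are hypothetical, with values copied from the excerpt above rather than from the Openverse codebase.

```typescript
// Allowed values, copied from the plan excerpt (assumed, not the real constants).
const COLLECTIONS = ["tag", "creator", "source"] as const
const SUPPORTED_MEDIA_TYPES = ["image", "audio"] as const

type Collection = (typeof COLLECTIONS)[number]
type SupportedMediaType = (typeof SUPPORTED_MEDIA_TYPES)[number]

// Route params as Nuxt would expose them for /pages/_collection/_mediaType/_term.
interface CollectionRouteParams {
  collection?: string
  mediaType?: string
  term?: string
}

const isCollection = (value: string): value is Collection =>
  (COLLECTIONS as readonly string[]).includes(value)

const isSupportedMediaType = (value: string): value is SupportedMediaType =>
  (SUPPORTED_MEDIA_TYPES as readonly string[]).includes(value)

// Hypothetical helper for a page-level `validate({ params })` hook; in Nuxt 2,
// returning false from `validate` renders the 404 error page.
const isValidCollectionRoute = (params: CollectionRouteParams): boolean =>
  params.collection !== undefined &&
  params.mediaType !== undefined &&
  isCollection(params.collection) &&
  isSupportedMediaType(params.mediaType) &&
  // The term (tag, creator or source name) must not be blank.
  (params.term ?? "").trim().length > 0
```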


That's fantastic! It could be useful for SEO as well. Speaking of which, what considerations do we need to apply to robots.txt for this change, if any? I don't really understand our current approach to robots.txt; is it left over from when we were iframed? @zackkrida I believe you may know.

Our current robots.txt is blank, which allows all pages of the site to be crawled and indexed. That works exactly the same as:

User-agent: *
Disallow:

The staging site is blocked from indexing.

I don't think we need to do anything here. These pages can be crawled and indexed.
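One way to express both behaviours described above, as a hypothetical sketch: production serves an empty `Disallow:` rule (equivalent to a blank file), and staging disallows everything. The `buildRobotsTxt` helper and the environment names are assumptions; this reply does not say how staging is actually blocked.

```typescript
type DeployEnvironment = "production" | "staging"

// Builds a robots.txt body. An empty `Disallow:` value blocks nothing, so the
// production output behaves the same as a blank file.
const buildRobotsTxt = (env: DeployEnvironment): string => {
  const disallowRule = env === "staging" ? "Disallow: /" : "Disallow:"
  return ["User-agent: *", disallowRule].join("\n") + "\n"
}
```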

These new pages are essentially search views with static URLs though (not query params). Should that kind of thing be indexed?

Sorry if this has already been discussed somewhere. It might be helpful to document the rationale for our robots.txt on the docs site.

...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md

sarayourfriend · 2023-07-20T02:12:38Z

...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md

+The header should also display the number of results, "251 audio files with the
+selected tag", "604 images provided by this source", "37 images by this creator
+in Hirshhorn Museum and Sculpture Garden".


Is it at all relevant for us to consider sources that have a single creator? Boston Public Library, as a source, for example, only has Boston Public Library as the "creator" of the work. It's weird to think that there would be two separate pages for them, but then I'm not sure how we would communicate this to the user if we automatically redirected the BPL creator search to the BPL source search 🤔

Added a note to the plan. By the way, we don't have a Boston Public Library as a source filter on the frontend. What is the link to their API or collection?

Oops, I've just realised that BPL is just a creator we've picked up through Flickr, not an actual source. Maybe an easy source to add then!
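The header strings quoted in the diff could be assembled along these lines. This is an illustrative sketch only: `collectionHeaderText` is a hypothetical helper, and the real frontend would use translated, properly pluralised i18n strings rather than hard-coded English.

```typescript
type SupportedMediaType = "image" | "audio"
type Collection = "tag" | "creator" | "source"

// English-only nouns for illustration; real strings would come from i18n.
const NOUNS: Record<SupportedMediaType, [singular: string, plural: string]> = {
  image: ["image", "images"],
  audio: ["audio file", "audio files"],
}

const collectionHeaderText = (
  count: number,
  mediaType: SupportedMediaType,
  collection: Collection,
  sourceName?: string
): string => {
  const [singular, plural] = NOUNS[mediaType]
  const noun = count === 1 ? singular : plural
  if (collection === "tag") return `${count} ${noun} with the selected tag`
  if (collection === "source") return `${count} ${noun} provided by this source`
  // Creator names are only unique within a source, so name the source when known.
  return sourceName
    ? `${count} ${noun} by this creator in ${sourceName}`
    : `${count} ${noun} by this creator`
}
```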

sarayourfriend · 2023-07-20T02:16:25Z

...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md

+#### Update the `searchBy` filter in the `search` store
+
+Currently, this filter is shaped like the other filters: it is a list of objects with
+`code`, `name` and `checked` properties. This is a possible value now: `{
+searchBy: { code: "creator", name: "filters.searchBy.creator", checked: true }}`.
+
+Since this filter is mutually exclusive (you cannot search by both the creator
+and the tag, for example), this shape is very difficult to update.
+
+It is better to set the `searchBy` filter to have one of the possible values:
+`{ searchBy: <null|creator|source|tag> }`.
+
+We should also create a new method, `setSearchBy`, in the `search` store, that
+would allow setting the `searchBy` filter value directly.


Can you clarify whether this is just a DevEx/code-quality refactor or if it's necessary for the new features? I don't fully understand why this particular change is relevant to the creator/tag/source pages. Are they going to use the same search store and filters? If so, that would be helpful background to have somewhere as well (I may have missed it, though).

Yes, I actually re-used the same search store in the draft implementation that I linked in the issue description. But to get the store to handle different searchBy values, we should add this change.
Thank you for asking this question! I realized that I didn't write anything about the data side: what API calls will be made, how the stores will be used and how they need to be updated. I'll add this during the revision round.
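To make the shape change under discussion concrete, here is a sketch comparing the two shapes. `FilterItem` mirrors the `code`/`name`/`checked` objects from the diff, while `SearchState`, `setSearchBy` and `selectLegacySearchBy` are simplified, hypothetical stand-ins for the `search` store's state and actions, not its real code.

```typescript
// Current shape: checkbox-like items, where nothing stops several being checked.
interface FilterItem {
  code: string
  name: string
  checked: boolean
}

// Proposed shape: a single value, so the options are mutually exclusive by construction.
type SearchBy = null | "creator" | "source" | "tag"

// Simplified stand-in for the relevant part of the `search` store state.
interface SearchState {
  searchTerm: string
  searchBy: SearchBy
}

// Stand-in for a `setSearchBy` store action: one assignment replaces the old value.
const setSearchBy = (state: SearchState, value: SearchBy): SearchState => ({
  ...state,
  searchBy: value,
})

// For comparison, selecting one option with the old shape means rewriting
// the `checked` flag of every item in the list.
const selectLegacySearchBy = (items: FilterItem[], code: string): FilterItem[] =>
  items.map((item) => ({ ...item, checked: item.code === code }))
```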

sarayourfriend · 2023-07-20T02:18:15Z

...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md

+I am not sure what events we want to add, if any. Should we add `<VIEW>_CLICKED`
+events for the new views, or would page views be sufficient?
+
+Should we add a new `SELECT_COLLECTION_ITEM` event similar to
+`SELECT_SEARCH_RESULT`?


Agreed. SELECT_SEARCH_RESULT can be narrowed to /search or /tag* (for example) to determine where the event occurred.
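A hypothetical sketch of the narrowing suggested above: derive the view from the route path and include it in the `SELECT_SEARCH_RESULT` payload. It assumes the `/<mediaType>/tag/<tag>` and `/<mediaType>/source/<source>[/creator/<creator>]` paths proposed later in this review; `searchView` and the payload fields are illustrative, not the actual analytics event definition.

```typescript
type SearchView = "search" | "tag" | "creator" | "source" | "unknown"

// Maps a route path (without the query string) to the view it belongs to.
const searchView = (path: string): SearchView => {
  const segments = path.split("/").filter(Boolean)
  if (segments[0] === "search") return "search"
  if (segments[1] === "tag") return "tag"
  if (segments[1] === "source") {
    // A creator page is a source page with a /creator/<creator> suffix.
    return segments[3] === "creator" ? "creator" : "source"
  }
  return "unknown"
}

// Hypothetical payload shape for the existing event.
interface SelectSearchResultPayload {
  id: string
  mediaType: string
  view: SearchView
}

const selectSearchResultPayload = (
  path: string,
  id: string,
  mediaType: string
): SelectSearchResultPayload => ({ id, mediaType, view: searchView(path) })
```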

...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md

github-actions · 2023-07-21T07:32:23Z

Full-stack documentation: https://docs.openverse.org/_preview/2676

Please note that GitHub pages takes a little time to deploy newly pushed code. If the links above don't work or you see old versions, wait 5 minutes and try again.

You can check the GitHub pages deployment action list to see the current status of the deployments.

sarayourfriend

I have only three truly blocking issues; the rest are just additional questions or suggestions. The blockers are:

1. The various issues I've described with using Postgres to query for the creator view must be addressed. I've suggested we continue to use ES by adding new .raw sub-fields for creator and source.
2. The frontend routes need to disambiguate creators by source. I've suggested alternative paths that mirror my suggested path-based (rather than query-parameter-based) approach for the API, which would solve this problem.
3. For me to fully understand how the plan would be implemented, I need an ordered, high-level step-by-step implementation of all the tasks that would be converted into individual issues. I think we should clarify this in the implementation plan template to have something like ordered steps and step descriptions, so that it's possible to get a high-level view of the plan separately from the detailed descriptions and nuances of each step.

Overall the plan looks very good though. The first two issues are critical blockers for the plan because the plan will not succeed without addressing them. The last is just a personal blocker for me to get a proper understanding of the plan.

I'm going AFK starting tomorrow, and I'm confident that you and Zack will be able to resolve the blockers without me. The plan otherwise is great. Please consider this plan approved by me once those blockers are resolved 🚀

sarayourfriend · 2023-07-24T00:19:23Z

...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md

+
+<!-- Describe the implementation step necessary for completion. -->
+
+### API changes


Thank you for adding this section. It helps a lot to understand how this will all fit together.

I like the approach of adding new routes to the API rather than trying to stuff everything into the existing search endpoint. That makes things a lot nicer, especially for the creator and provider routes.

However, there are some problems with using the database rather than ES to access anything:

The database does not cache queries in the same way that ES does, so repeated queries will not necessarily be as efficient as they are with ES.

The database does not score documents at all, so the order will differ dramatically from the way that ES would order the documents. That's already an issue with respect to popularity data today, but it will become even more of an issue if we start to score documents based on other metrics, as theorised in our search relevancy discussions.

creator is not indexed in the API database, so a query against it will be very slow.

Only the third of these can be "fixed", but it's much easier to add a creator.raw field to the index instead, as we've done for title, description, and tag name. Adding a new field to the index is far easier than adding a new index to the database, which would require careful application to avoid a long-running migration and could significantly increase the time it takes to do a data refresh.
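For reference, a `creator.raw` sub-field is an Elasticsearch multi-field. The sketch below writes a mapping fragment and an exact-match query as plain TypeScript objects in the shape of the Elasticsearch mapping and query DSL. The field names follow this comment; the real Openverse mapping may differ, for example by adding a normalizer or an `ignore_above` limit to the keyword field, and `mediaMapping`/`creatorQuery` are hypothetical names.

```typescript
// Index mapping fragment: `creator` and `source` stay analysed `text` fields for
// full-text search, while the `raw` sub-fields keep the exact value as a `keyword`.
const mediaMapping = {
  properties: {
    creator: { type: "text", fields: { raw: { type: "keyword" } } },
    source: { type: "text", fields: { raw: { type: "keyword" } } },
  },
}

// Query body matching one creator exactly within one source. Term queries in
// `filter` context do not affect scoring and can be cached by Elasticsearch; a
// real query would pair them with the scoring clauses used by regular search.
const creatorQuery = (source: string, creator: string) => ({
  query: {
    bool: {
      filter: [
        { term: { "source.raw": source } },
        { term: { "creator.raw": creator } },
      ],
    },
  },
})
```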

I'd also strongly recommend against the route described, as it is harder to cache in a sensible way (on the Cloudflare side of things) and harder to perform the cache invalidation that might be required by #1969. Instead of using query strings, we can describe the resource via the path: /<media type>/source/<source>/creator/<creator> is very clean, easy to read and understand, and very easy to manage the cache for because it is a static path. The source page can use the same route by leaving off the creator. This also removes the need to manage specific query params, and would allow us to more easily add querying within these routes in the future behind the regular q parameter if we wanted.

For the tag route, I'd recommend the singular tag rather than plural tags, for legibility if we are presenting a single tag. Again, using a route-based approach instead of query parameter approach makes it easier to manage the cache if needed and could allow us to more easily feature specific tags with high quality results as example pages if we wanted, like /images/tag/dog or /images/tag/flying%20birds.

Path elements still need to be URL encoded to preserve special characters and spaces, but for most cases, like /images/source/nasa, /images/source/nypl, /images/source/inaturalist, etc., we've suddenly got a very good-looking URL to point people to.
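A small sketch of the path-based routes suggested above, showing where URL encoding applies: each dynamic segment is encoded on its own. The route shapes come from this comment, while the `joinPath`, `sourcePath` and `tagPath` helpers and the `images`/`audio` prefixes are assumptions for illustration.

```typescript
type ApiMediaType = "images" | "audio"

// Encodes each segment separately so that spaces, slashes and other reserved
// characters inside a name cannot change the shape of the route.
const joinPath = (...segments: string[]): string =>
  "/" + segments.map((segment) => encodeURIComponent(segment)).join("/")

// Source page, or creator page when a creator is given.
const sourcePath = (mediaType: ApiMediaType, source: string, creator?: string): string =>
  creator === undefined
    ? joinPath(mediaType, "source", source)
    : joinPath(mediaType, "source", source, "creator", creator)

const tagPath = (mediaType: ApiMediaType, tag: string): string =>
  joinPath(mediaType, "tag", tag)
```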

sarayourfriend · 2023-07-24T00:23:03Z

...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md

+const prepareSearchQuery = (
+  searchParams: Record<string, string>,
+  mediaType: SupportedMediaType
+) => {
+  // The search term is assumed to be stored under the `q` key
+  const { searchBy, q: searchTerm } = searchParams
+  // The name of the source is stored in the `<mediaType>Provider` filter
+  const source = searchParams[`${mediaType}Provider`]
+  if (searchBy === "creator") {
+    return { creator: searchTerm, source }
+  } else if (searchBy === "source") {
+    return { source }
+  } else if (searchBy === "tag") {
+    // The parameter is plural in the API
+    return { tags: searchTerm }
+  } else {
+    // Current search query transform
+  }
+}


Just noting that regardless of whether we use my fully route based approach above, we'll still need to determine the path pattern to use. Should this function also handle that, maybe returning either an array of path segments (including holes for parameters and a final object for query parameters if still needed) or the query parameter object to send to the default search route?
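One possible answer to the question above, sketched under assumptions: return the path segments for the collection routes (or the default search route) together with whatever is still sent as query parameters. The `q` and `<mediaType>Provider` keys follow the plan's excerpt; `prepareSearchRequest`, `SearchRequest` and the `images`/`audio` prefixes are hypothetical. Encoding is left to whatever joins the segments into a URL.

```typescript
type SupportedMediaType = "image" | "audio"

interface SearchRequest {
  // Path segments after the API root, e.g. ["images", "tag", "cat"]
  pathSegments: string[]
  // Parameters still sent as a query string
  query: Record<string, string>
}

// Assumed mapping from frontend media type to the API path prefix.
const API_PREFIX: Record<SupportedMediaType, string> = {
  image: "images",
  audio: "audio",
}

const prepareSearchRequest = (
  searchParams: Record<string, string>,
  mediaType: SupportedMediaType
): SearchRequest => {
  const prefix = API_PREFIX[mediaType]
  const { searchBy, q = "", ...rest } = searchParams
  const source = searchParams[`${mediaType}Provider`]

  if (searchBy === "tag") {
    return { pathSegments: [prefix, "tag", q], query: {} }
  }
  if (searchBy === "source" && source) {
    return { pathSegments: [prefix, "source", source], query: {} }
  }
  if (searchBy === "creator" && source) {
    return { pathSegments: [prefix, "source", source, "creator", q], query: {} }
  }
  // Default search route: the existing query-string transform would apply here.
  return { pathSegments: [prefix], query: { q, ...rest } }
}
```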

sarayourfriend · 2023-07-24T00:25:56Z

...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md

+Nuxt allows creating nested dynamic routes like
+`/pages/_collection/_mediaType/_term`.
+
+Here, the collection can be `tag`, `creator` or `source`. `mediaType` can be one
+of the supported media types (currently, `image` and `audio`). `term` refers to
+the actual value of the tag, creator or source name.


Realising now that for these to work, creator needs to have source included. It could make sense to make these match the API routes: /_mediaType/source/_source/creator/_creator, /_mediaType/tag/_tag etc.
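A sketch of how the frontend could read the nested paths suggested above, so that a creator is always scoped to its source. In Nuxt these values would arrive as route params; here a plain path string is parsed so the example stays self-contained. `parseCollectionPath` and `CollectionParams` are hypothetical names.

```typescript
type CollectionParams =
  | { collection: "tag"; mediaType: string; tag: string }
  | { collection: "source"; mediaType: string; source: string }
  | { collection: "creator"; mediaType: string; source: string; creator: string }

// Parses /<mediaType>/tag/<tag>, /<mediaType>/source/<source> and
// /<mediaType>/source/<source>/creator/<creator>; returns null otherwise.
// Segments are split before decoding so an encoded "/" stays inside one name.
// Note: decodeURIComponent throws a URIError on malformed escapes.
const parseCollectionPath = (path: string): CollectionParams | null => {
  const segments = path.split("/").filter(Boolean).map(decodeURIComponent)
  const [mediaType, kind, value, sub, creator] = segments
  if (!mediaType || !value) return null
  if (kind === "tag" && segments.length === 3) {
    return { collection: "tag", mediaType, tag: value }
  }
  if (kind === "source" && segments.length === 3) {
    return { collection: "source", mediaType, source: value }
  }
  if (kind === "source" && sub === "creator" && creator && segments.length === 5) {
    return { collection: "creator", mediaType, source: value, creator }
  }
  return null
}
```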

sarayourfriend · 2023-07-24T00:27:21Z

...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md

+The header should also display the number of results, "251 audio files with the
+selected tag", "604 images provided by this source", "37 images by this creator
+in Hirshhorn Museum and Sculpture Garden".


Oops, I've just realised that BPL is just a creator we've picked up through Flickr, not an actual source. Maybe an easy source to add then!

...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md

obulat · 2023-07-26T19:21:50Z

I've updated the plan to add the API and Elasticsearch changes. In particular, I updated the paths to match @sarayourfriend's suggestions. I also rewrote the plan to make the step descriptions and the sequence of steps clearer.
@zackkrida, this is ready for re-review.

zackkrida · 2023-07-28T11:27:29Z

I'll review this today!

zackkrida · 2023-07-28T16:28:05Z

...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md

+
+<!-- What hard blockers exist which might prevent further work on this project? -->
+
+The main blocker could be the maintainer capacity.


This is always a potential blocker, so I don't think we need to explicitly mention it.

zackkrida

I think this is in a good enough place to merge. You've done a really excellent job of outlining the steps necessary for the project and coming up with good frontend patterns for the feature.

...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md

Co-authored-by: Zack Krida <zackkrida@pm.me>

openverse-bot · 2023-08-02T00:00:12Z

Based on the medium urgency of this PR, the following reviewers are being gently reminded to review this PR:

@krysal
This reminder is being automatically generated due to the urgency configuration.

Excluding weekend¹ days, this PR was ready for review 9 day(s) ago. PRs labelled with medium urgency are expected to be reviewed within 4 weekday(s)².

@obulat, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings.

¹ Specifically, Saturday and Sunday.
² For the purpose of these reminders we treat Monday to Friday as weekdays. Please note that the operation that generates these reminders runs at midnight UTC on Monday to Friday. This means that, depending on your timezone, you may be pinged outside of the expected range.
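The day counting described in the footnotes can be illustrated with a small function that counts elapsed weekdays in UTC, skipping Saturdays and Sundays. This is a hypothetical reconstruction for illustration, not openverse-bot's actual code; `weekdaysBetween` and `isOverdue` are made-up names.

```typescript
// Counts Monday to Friday days after `start`, up to and including `end`,
// stepping one UTC day at a time. Both dates are truncated to UTC midnight.
const weekdaysBetween = (start: Date, end: Date): number => {
  const day = new Date(Date.UTC(start.getUTCFullYear(), start.getUTCMonth(), start.getUTCDate()))
  const last = Date.UTC(end.getUTCFullYear(), end.getUTCMonth(), end.getUTCDate())
  let count = 0
  while (day.getTime() < last) {
    day.setUTCDate(day.getUTCDate() + 1)
    const weekday = day.getUTCDay() // 0 = Sunday, 6 = Saturday
    if (weekday !== 0 && weekday !== 6) count++
  }
  return count
}

// A PR is overdue once more weekdays have passed than its urgency allows.
const isOverdue = (readySince: Date, now: Date, allowedWeekdays: number): boolean =>
  weekdaysBetween(readySince, now) > allowedWeekdays
```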

Requested changes were made

zackkrida · 2023-08-02T15:37:58Z

We're going to merge this on the basis of @sarayourfriend's blockers being resolved 🥳

obulat self-assigned this Jul 19, 2023

obulat requested a review from a team as a code owner July 19, 2023 17:37

obulat requested review from krysal and stacimc July 19, 2023 17:37

obulat changed the title ~~Additional search views implementation plan~~ Implementation Plan: Additional search views Jul 19, 2023

obulat requested review from zackkrida and removed request for stacimc July 19, 2023 17:37

github-actions bot added the 🧱 stack: documentation Related to Sphinx documentation label Jul 19, 2023

obulat requested a review from sarayourfriend July 19, 2023 17:39

obulat force-pushed the project/additional_search_views_ip branch from 73637ba to c0e7f53 July 19, 2023 17:55

zackkrida requested changes Jul 19, 2023

sarayourfriend reviewed Jul 20, 2023

obulat force-pushed the project/additional_search_views_ip branch 4 times, most recently from 146dcec to 12f655a July 21, 2023 07:22

obulat requested review from zackkrida and sarayourfriend July 23, 2023 04:08

sarayourfriend previously requested changes Jul 24, 2023

sarayourfriend mentioned this pull request Jul 24, 2023

Add additional details to implementation plan template #2703

Merged

5 tasks

obulat added 3 commits July 26, 2023 22:18

Additional search views implementation plan

3a2c0fc

Add changes from the review

2e1ad5a

Formatting changes

06e6e6c

obulat added 2 commits July 26, 2023 22:18

add local changes

89a96a3

Add API changes to fix blockers

a886fd7

obulat force-pushed the project/additional_search_views_ip branch from 1af2cce to a886fd7 July 26, 2023 19:19

zackkrida reviewed Jul 28, 2023

zackkrida approved these changes Jul 28, 2023

zackkrida reviewed Jul 28, 2023

...ts/proposals/additional_search_views/20230719-implementation_plan_additional_search_views.md Outdated

obulat and others added 2 commits July 29, 2023 18:56

Replace Elasticsearch changes with Search controller update

8ca4d4e

Add approval

36c1b81

Co-authored-by: Zack Krida <zackkrida@pm.me>

fcoveram mentioned this pull request Aug 2, 2023

Additional search views for the frontend #410

Closed

6 tasks

zackkrida requested a review from sarayourfriend August 2, 2023 15:37

zackkrida merged commit 4c99418 into main Aug 2, 2023
41 checks passed

zackkrida deleted the project/additional_search_views_ip branch August 2, 2023 15:38
