-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use search query parameters for additional search views in the API #3887
Conversation
6fcec5b
to
af3a43b
Compare
af3a43b
to
0e09c97
Compare
bef5f8a
to
43e93c4
Compare
I've seen this as a way that some other APIs handle arrays 😮 is this something we should support too? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite the simplification! Noting that ENABLE_COLLECTIONS=True just api/up
did not work for me, I explicitly needed to set ENABLE_COLLECTIONS=true
in the api/.env
file instead. I also have a few questions.
), | ||
], | ||
) | ||
def test_collection_parameters(path, expected_params, api_client): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these removed because they're tested by schemathesis?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that we don't need these tests because no longer have extra endpoints, and we do have the serializer tests that check that when a search request has the collection-related query parameters, we create the correct query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit different, unit vs integration level tests. Integration level tests would be nice for these complex endpoints because it's easier to build the real edge cases into the tests. Schemathesis might do that too, but I don't think this style of test is duplicative of the unit/serializer level tests. Just because we have serializer tests to test serializer output, doesn't mean those tests also ensure the serializer output integrates with the actual search requests themselves in all instances. Integration tests do that testing of the glue.
Incidentally, integration tests could obviate the need for some of those unit tests, but only reliably so if we had coverage collection.
With the complexity of the endpoints parameters, I think it's not a bad idea to have tests of the sort removed here that send a variety of query parameter configurations and assert at minimum whether the result is a successful search or not.
We could, but I don't see a reason for that. From the "Zen of Python": There should be one-- and preferably only one --obvious way to do it. 😄 |
To Madison's point, it really would be ideal if we were able to switch away from using You can verify this with Moscow, Idaho, where queries for Anyway, all of that to say, the current approach of using a comma doesn't work intuitively. You can URL encode the comma inside the tag, so long as we split on It isn't something we can change for existing parameters, and whether it's a pattern we should change for future ones, is subject to Olga's comment. Consistency can be more useful than correctness. |
It was really useful for me to write this response as I clearly understood the distinction between different parameters and how we handle them Additional search views The "list" queries are not used in the additional search views. Here, every For instance, Filter parameters in default or non-q search The list is split by a comma for the filter parameters only. There are two ways we handle such parameters
openverse/api/api/serializers/fields.py Line 55 in 697c178
Analyzing the search query openverse/api/api/controllers/search_controller.py Lines 335 to 349 in 697c178
Note that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks and works great! Thanks for taking the time to explain the specifics of how we handle multiple arguments for the various parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requesting changes just for the environment variable name, but also asking for clarification on some things. The serializer changes we made to accommodate the different endpoints sharing code, now seem to be unnecessary, as it's all one endpoint. The abstraction is no longer necessary, and makes the code much harder to understand (especially now that it's all one endpoint). It's not necessarily a blocker, but some interdependencies of the validation are hard to keep track of now when needing to jump between multiple serializers.
|
||
def validate_source(self, input_sources): | ||
return self.validate_source_field(input_sources) | ||
sources_list = ", ".join([f"'{s}'" for s in allowed_sources]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this operation is unnecessary for most queries. Can we move it into the error branch itself on line 726 and bypass it for all other search queries?
# For collection=tag, return the value as is. It is ignored in the query builder. | ||
if collection == "tag": | ||
return value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this raise a validation error, as they are not compatible parameters, rather than just ignored? I'm having a hard time keeping straight what we're ignoring vs validating... spreading these out between separate serializers (MediaSearchRequestSourceSerializer and the regular search request serializer) is making it hard to keep track of what is validating what and where!
Are we able to reduce down to a single serializer, by chance? Or could the validation be built into a new SourceField
that validates differently based on the strategy?
), | ||
], | ||
) | ||
def test_collection_parameters(path, expected_params, api_client): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit different, unit vs integration level tests. Integration level tests would be nice for these complex endpoints because it's easier to build the real edge cases into the tests. Schemathesis might do that too, but I don't think this style of test is duplicative of the unit/serializer level tests. Just because we have serializer tests to test serializer output, doesn't mean those tests also ensure the serializer output integrates with the actual search requests themselves in all instances. Integration tests do that testing of the glue.
Incidentally, integration tests could obviate the need for some of those unit tests, but only reliably so if we had coverage collection.
With the complexity of the endpoints parameters, I think it's not a bad idea to have tests of the sort removed here that send a variety of query parameter configurations and assert at minimum whether the result is a successful search or not.
2006fdf
to
98628a6
Compare
I am a bit confused about which serializer changes this comment is. If the comment is about the separate The main I tried moving the @dhruvkb, as a person who worked on this code, do you think there could be any problems with using the If the comment is about |
The only issue with using context is if there are cases where the context cannot be defined, but the serialiser needs it to initialise. I don't think we have that in our application, we always know whether we're working on an image or audio request at the time we perform deserialization of request parameters. To me, it's a perfectly reasonable and DRF-y use of serializer context 👍 I'll re-review this PR in depth later today. Thanks for the explanation of the different serializers and how they worked together. I had not remembered the separation pre-existed the collections work, but now that validation is so much more complex, I think it does make sense to tighten the relationships but a bit, which it sounds like you've done. I look forward to reviewing the changes later today. |
Based on the high urgency of this PR, the following reviewers are being gently reminded to review this PR: @sarayourfriend Excluding weekend1 days, this PR was ready for review 7 day(s) ago. PRs labelled with high urgency are expected to be reviewed within 2 weekday(s)2. @obulat, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings. Footnotes
|
Review this now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The serializer changes are excellent, and the cleanup and code deletions you've managed in this PR are awesome! Even with the addition of a large amount of new documentation strings, this PR still nets fewer lines of code 🎉
I have one thing that might seem small, but I think is actually a very significant issue with the template strings. Here is a patch that should resolve it:
diff --git a/api/api/docs/audio_docs.py b/api/api/docs/audio_docs.py
index dfaadc78f..06f5d573c 100644
--- a/api/api/docs/audio_docs.py
+++ b/api/api/docs/audio_docs.py
@@ -42,10 +42,10 @@ from api.serializers.provider_serializers import ProviderSerializer
serializer = AudioSearchRequestSerializer(context={"media_type": "audio"})
audio_filter_fields = fields_to_md([f for f in serializer.field_names if f != "q"])
-audio_search_description = SEARCH_DESCRIPTION % {
- "filter_fields": audio_filter_fields,
- "media_type": "audio files",
-}
+audio_search_description = SEARCH_DESCRIPTION.format(
+ filter_fields=audio_filter_fields,
+ media_type="audio files",
+)
search = custom_extend_schema(
desc=audio_search_description,
diff --git a/api/api/docs/base_docs.py b/api/api/docs/base_docs.py
index c80fd333a..96e3f6cee 100644
--- a/api/api/docs/base_docs.py
+++ b/api/api/docs/base_docs.py
@@ -144,7 +144,7 @@ SEARCH_DESCRIPTION_DEFAULT = """
Return audio files that match the query.
This endpoint allows you to search within specific fields, or to retrieve
-a collection of all %(media_type)s from a specific source, creator or tag.
+a collection of all {media_type} from a specific source, creator or tag.
Results are paginated on the basis of the `page` parameter. The `page_size`
parameter controls the total number of pages.
@@ -163,7 +163,7 @@ according to specific needs.
By default, this endpoint performs a full-text search for the value of `q` parameter.
You can search within the `creator`, `title` or `tags` fields by omitting
the `q` parameter and using one of these field parameters.
-These results can be filtered by %(filter_fields)s.
+These results can be filtered by {filter_fields}.
The default search results are sorted by relevance.
@@ -180,11 +180,11 @@ additions appearing first. The filters such as `license` are not available for c
"""
SEARCH_DESCRIPTION_COLLECTIONS_DISABLED = """
-Search %(media_type)s using a query string.
+Search {media_type} using a query string.
By using this endpoint, you can obtain search results based on specified
query and optionally filter results by
-%(filter_fields)s.
+{filter_fields}.
Results are ranked in order of relevance and paginated on the basis of the
`page` param. The `page_size` param controls the total number of pages.
diff --git a/api/api/docs/image_docs.py b/api/api/docs/image_docs.py
index f193308ec..5770f02c7 100644
--- a/api/api/docs/image_docs.py
+++ b/api/api/docs/image_docs.py
@@ -44,10 +44,10 @@ from api.serializers.provider_serializers import ProviderSerializer
serializer = ImageSearchRequestSerializer(context={"media_type": "image"})
image_filter_fields = fields_to_md([f for f in serializer.field_names if f != "q"])
-image_search_description = SEARCH_DESCRIPTION % {
- "filter_fields": image_filter_fields,
- "media_type": "images",
-}
+image_search_description = SEARCH_DESCRIPTION.format(
+ filter_fields=image_filter_fields,
+ media_type="images",
+)
search = custom_extend_schema(
desc=image_search_description,
diff --git a/api/api/serializers/docs.py b/api/api/serializers/docs.py
index 631a8eab7..5c333f2b6 100644
--- a/api/api/serializers/docs.py
+++ b/api/api/serializers/docs.py
@@ -40,17 +40,17 @@ Should be used with `{TAG}`, `source` or `creator`+`source`"""
EXCLUDED_SOURCE_HELP_TEXT = """
A comma separated list of data sources to exclude from the search.
-Valid values are `source_name`s from the stats endpoint: %(origin)s/v1/%(media_path)s/stats/.
+Valid values are `source_name`s from the stats endpoint: {origin}/v1/{media_path}/stats/.
"""
SOURCE_HELP_TEXT_COLLECTIONS_DISABLED = """
A comma separated list of data sources; valid values are
-`source_name`s from the stats endpoint: %(origin)s/v1/%(media_path)s/stats/."""
+`source_name`s from the stats endpoint: {origin}/v1/{media_path}/stats/."""
SOURCE = """
For default search, a comma separated list of data sources.
When the `collection` parameter is used, this parameter only accepts a single source.
-Valid values are `source_name`s from the stats endpoint: %(origin)s/v1/%(media_path)s/stats/.
+Valid values are `source_name`s from the stats endpoint: {origin}/v1/{media_path}/stats/.
"""
SOURCE_HELP_TEXT = (
diff --git a/api/api/serializers/media_serializers.py b/api/api/serializers/media_serializers.py
index 3b4e94f7f..809562cba 100644
--- a/api/api/serializers/media_serializers.py
+++ b/api/api/serializers/media_serializers.py
@@ -303,8 +303,8 @@ class MediaSearchRequestSerializer(PaginatedRequestSerializer):
"media_path": media_path,
}
- self.fields["source"].help_text = SOURCE_HELP_TEXT % variables
- self.fields["excluded_source"].help_text = EXCLUDED_SOURCE_HELP_TEXT % variables
+ self.fields["source"].help_text = SOURCE_HELP_TEXT.format(**variables)
+ self.fields["excluded_source"].help_text = EXCLUDED_SOURCE_HELP_TEXT.format(**variables)
def is_request_anonymous(self):
request = self.context.get("request")
Really want to emphasise that I'm not trying to be nitpicky here, and recognise that it's rare to block on something that may seem like preference, but %
formatted strings are so tricky and unusual in new Python code that I think it is worth doing in this case for the reasons I've shared.
# "collection", | ||
# "tag", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this, will these need the unstable__
prefix to be accurate, like unstable_authority
and the others below?
It's commented, so wouldn't matter either way, but after uncommenting? I wonder why we included unstable in the commented versions below, if not 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the comment to link to the GitHub issue to remove the unstable__
prefix. So, these will be uncommented after the prefix is gone.
I am not sure what the best process for launching would be. This is one of the scenarios I can think of:
- Switch the frontend
additional_search_views
flag on, without fully removing the flag, and setSHOW_COLLECTION_DOCS
env variable in the API to True in prod. This means that the users on the frontend can see the additional search views as fully-launched, and the API users can start using them with theunstable__
parameters. - Test how well both API and frontend work in prod for ~ 1 week. If needed, we can revert using the frontend flag and the API env variable.
- If everything is successful, add the stable parameters to the frontend.
- Stabilize the API parameters (remove the conditionals from the additional search views documentation and the
unstable__
prefix fromtag
andcollection
parameters) - Remove the additional frontend parameters and the feature flag.
These lines will be uncommented after step 4, then. So, they won't need the prefix. And the unstable versions will be seen after step 1, but will be removed after step 4.
label="licenses", | ||
help_text=make_comma_separated_help_text(LICENSE_GROUPS["all"], "licenses"), | ||
source = serializers.CharField( | ||
label="provider", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary to change in this PR, but should we update this to align with the actual parameter name "source" so it isn't confused with provider?
label
doesn't affect the actual API, I think, just description: https://www.django-rest-framework.org/api-guide/fields/#label
If so, let me know and I'll create an issue for this. Again, not something to implement/change in this PR, just wanting a second opinion :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Co-authored-by: Madison Swain-Bowden <bowdenm@spu.edu>
Co-authored-by: Madison Swain-Bowden <bowdenm@spu.edu>
Co-authored-by: Madison Swain-Bowden <bowdenm@spu.edu>
Co-authored-by: Madison Swain-Bowden <bowdenm@spu.edu>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Co-authored-by: FelixSjogren <felixarvidsjogren@gmail.com> FEAT - Add lot for the content instead of a prop (#2) Closes #2 Co-authored-by: Stagge <jonatan.stagge@gmail.com> TEST - Add tests for VTag (#7) Added tests for Vtag, tests include: All props are sent to VButton VTag renders slot content Renders anchor tag. Co-authored-by: Stagge <jonatan.stagge@gmail.com> FEAT - Ensure inner VButton emits and handles events in VTag #4 Closes #4 Added accessibility label (#10) - Added aria-label to indicate that that the link is a tag Improvements from review lint Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
3809137
to
68664a4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I'll create the issues for the field labels tomorrow. I need to log off for the day now 🙂
Fixes
Fixes #3869 by @obulat
Description
This PR removes the collection API endpoints and uses the
search
endpoint withcollection
-related query parameters.This PR adds an
ENABLE_COLLECTIONS
environment variable. When it's set toTrue
, the API documentation includes the additional documentation fortag
,collection
query parameters, updated search endpoint descriptions and updated descriptions ofcreator
andsource
query parameters.The source serializer is updated to validate a single source when
collection
parameter is present.Note: the
source
serializer takes the value of the last parameter if there is more than one:source=flickr&source=stocksnap
only sendsstocksnap
to the source validator.Testing Instructions
Documentation
Add
ENABLE_COLLECTIONS=True
to your/api/.env
fileRun the API
just api/up
.Check the documentation for the images and audio search endpoints.
Text improvement suggestions/rewrites are very welcome!
Set
ENABLE_COLLECTIONS
env variable toFalse
and see that the new parameters (unstable__collection
andunstable__tag
are not listed, and the documentation is almost the same as it currently is in prod. Note I removedq
from the list of parameters that can be used to filter results since it's not used for that.Successful collection requests
These requests should return the collection results
Serializer validation errors
"The `creator` and `source` parameters are required when `unstable__collection` is set to `creator`."
"The `collection` parameter cannot be used with the `q` parameter."
"Invalid source parameter 'stocksnap,flickr'. Use one of the valid sources: 'stocksnap', 'flickr'"
Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin