Refactor CollectionVersion Upload to use pulpcore machinery #1176
Conversation
Force-pushed from 02ee428 to d7d9b91
Force-pushed from 7176d3f to d0a2fda
@@ -309,27 +309,17 @@ def test_collection_download(artifact, pulp_client, collection_detail):
    assert f.content == tarball


-def test_collection_upload_repeat(pulp_client, known_collection, pulp_dist):
+def test_collection_upload_repeat(pulp_client, artifact, pulp_dist, collection_upload):
what kind of a fixture is "artifact"? If it's a pulp_ansible one can we name it collection_artifact?
Are the changes in this file supposed to be part of this PR at all?
Yes, I realized the tests for collection upload were wrong and incorrectly passing when they shouldn't have been.
# TODO: Add this back when namespace, name, and version are a part of the CollectionImport
# for key, value in collection_upload.items():
#     if key in COLLECTION_METADATA.keys():
#         assert COLLECTION_METADATA[key] == value, collection_upload
If these do not assert anymore, I think we are seeing a behavioural change unrelated to the bug / improvement advertised in the PR.
The asserts were never being run in the first place. collection_upload is a CollectionImport response and does not have the fields in COLLECTION_METADATA: namespace, name, version. So commenting them out does nothing; I could have left it as is.
Are you saying the keys should have been there? Then we should not guard with the "if". If not, can we just remove the block completely?
No idea if they should have been there. But they will be once I add Namespaces.
Force-pushed from d0a2fda to d6d1b03
Force-pushed from 14c1616 to 4788bc8
Force-pushed from fe2c1c7 to 8538fed
pulp_ansible/app/galaxy/v3/views.py
Outdated
pulp_tag_name = "Pulp_Ansible: Artifacts Collections V3"

DEFAULT_ACCESS_POLICY = _PERMISSIVE_ACCESS_POLICY

_declared_fields = serializer_class._declared_fields
INLINE_SERIALIZER = inline_serializer(
Can we call this the FakeCollectionVersionUploadSerializer and define it outside of the class scope? Also, can you remind me why we need to fake it?
So I can hide the Pulp-specific fields: artifact, upload, and repository from the generated schema. I tried to just not include them in the Meta class, but then it would error when artifact or repository was set by the viewset's logic.
I think this is dangerous. Just not adding them to the docs means that the serializer still accepts them. And it shouldn't!?
You are right. I'll add validation for this serializer.
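A minimal sketch of what that validation could look like, assuming the Galaxy V3 serializer keeps the field names artifact, upload, and repository and subclasses the shared CollectionVersionUploadSerializer. The class name, hook, and error wording here are illustrative, not the final implementation:

from gettext import gettext as _

from rest_framework import serializers

from pulp_ansible.app.serializers import CollectionVersionUploadSerializer


class GalaxyCollectionVersionUploadSerializer(CollectionVersionUploadSerializer):
    """Galaxy V3 variant that refuses the Pulp-specific upload fields."""

    PULP_ONLY_FIELDS = ("artifact", "upload", "repository")

    def validate(self, data):
        # Hiding the fields from the schema is not enough; actively reject
        # them so the endpoint cannot silently accept what it does not document.
        offending = [f for f in self.PULP_ONLY_FIELDS if f in self.initial_data]
        if offending:
            raise serializers.ValidationError(
                _("Fields not allowed on this endpoint: {}").format(", ".join(offending))
            )
        return super().validate(data)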
tags = validated_data.pop("tags")
origin_repository = validated_data.pop("origin_repository")
# Create CollectionVersion from its metadata and add it to the repository if specified
content = super().create(validated_data)
Does it make sense to add the add-to-repository-in-a-subtask feature to pulpcore?
I'm thinking about it. I already know how I am going to implement the feature in pulp_ansible, but putting it into pulpcore would make it much simpler.
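For context, a rough sketch of the pattern being discussed, written as a method body on the upload serializer (not standalone code). The Tag and CollectionVersion model names come from pulp_ansible, but the body is simplified and the "add in a subtask" part is left inline:

# Sketch of a create() method on the upload serializer (simplified).
def create(self, validated_data):
    from pulp_ansible.app.models import CollectionVersion, Tag

    tags = validated_data.pop("tags")
    origin_repository = validated_data.pop("origin_repository", None)

    # Create the CollectionVersion from its metadata.
    content = super().create(validated_data)
    for name in tags:
        tag, _created = Tag.objects.get_or_create(name=name)
        content.tags.add(tag)

    if origin_repository is not None:
        # The "add-to-repository-in-a-subtask" idea would dispatch this part
        # as its own task; here it runs inline for simplicity.
        with origin_repository.new_version() as new_version:
            new_version.add_content(CollectionVersion.objects.filter(pk=content.pk))
    return content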
log = logging.getLogger(__name__)


def process_collection_artifact(artifact, namespace, name, version):
This is the thing we call in deferred_validate now? Great achievement!
Yep, extracted out the relevant parts of import_collection for gathering metadata into this.
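A condensed sketch of how deferred_validate can hand the artifact to that helper, assuming the helper returns a dict of collection metadata. The expected_* field names come from later in this thread; the exact body is illustrative:

def deferred_validate(self, data):
    """Runs inside the task, once the Artifact is available."""
    data = super().deferred_validate(data)
    # Extract galaxy.yml / MANIFEST.json metadata from the tarball and merge
    # it into the data used to create the CollectionVersion.
    collection_info = process_collection_artifact(
        artifact=data["artifact"],
        namespace=data.get("expected_namespace"),
        name=data.get("expected_name"),
        version=data.get("expected_version"),
    )
    data.update(collection_info)
    return data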
pulp_ansible/app/serializers.py
Outdated
"""Check and set the namespace, name & version.""" | ||
# This validation is for creating CollectionVersions | ||
if self.instance: | ||
return data |
Shouldn't we still call super first?
I'm thinking of removing this check and instead overriding the update method to raise an error if called. validate should only be called on create or update, so if self.instance is set then update will be called and we shouldn't allow it.
Edit: NVM, validate gets called before update or create, so I have to leave this check here, and the question is whether I should raise an error. If I call super it will try to call the process_collection code, which is probably not what we want to do if they are trying to update. But should we allow them to use the serializer to update a collection?
Hmm. Hard question. From the API we should never allow updates. But we want to allow for a repair command. And it would be super neat if that repair command could just use this serializer to actually call the process_collection part again. No?
Yes, that would be a good use. There would still be other fields that would need to be set if you wanted to call super in that case (artifact, expected_namespace/name/version), else the serializer would fail validation. I'll remove the check and put a TODO comment to revisit this for a future issue (#1106).
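To illustrate the repair idea from this exchange, a hypothetical invocation could look roughly like this. The field names, context handling, and variable names are assumptions for illustration, not part of this PR:

# Hypothetical "repair" usage: feed an existing CollectionVersion back through
# the upload serializer so the process_collection part runs again for it.
serializer = CollectionVersionUploadSerializer(
    instance=collection_version,
    data={
        "artifact": artifact_href,
        "expected_namespace": collection_version.namespace,
        "expected_name": collection_version.name,
        "expected_version": collection_version.version,
    },
    context=deferred_context,  # same context the upload task normally provides
)
serializer.is_valid(raise_exception=True)
serializer.save()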
Force-pushed from c42344c to 11b02ad
@mdellweg @newswangerd Ready for review.
Summary of changes:
- /ansible/collections/ & Galaxy V2 Collection upload have been deprecated.
- CollectionVersionUploadSerializer now has all the old fields of CollectionOneShotSerializer and the entirety of the collection upload logic needed.
- Galaxy V3 Collection upload is now a SingleArtifactContentUploadViewSet and uses the CollectionVersionUploadSerializer to perform the collection upload.
- CollectionVersionSerializer now has CollectionVersionUploadSerializer as a base class, so the Pulp CV API uses the same base upload logic as the Galaxy V3 API.
- The Pulp CV API now accepts artifact and upload fields when uploading a collection.
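As a rough usage example of the last two points, a client upload against the Pulp CV API could look like this. The endpoint path, port, credentials, and the optional repository parameter are assumptions based on the summary above, not a verified transcript of the final API:

import requests

with open("namespace-name-1.0.0.tar.gz", "rb") as fh:
    response = requests.post(
        "http://localhost:24817/pulp/api/v3/content/ansible/collection_versions/",
        auth=("admin", "password"),
        files={"file": fh},
        data={"repository": "/pulp/api/v3/repositories/ansible/ansible/<repo_uuid>/"},
    )
response.raise_for_status()
# Content creation is dispatched as a task; poll it for the created content.
task_href = response.json()["task"]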
def is_valid(self, raise_exception=False):
    """
    Allow this serializer to be used for validating before saving a model.
I didn't touch any of the old import_collection task code, but this functionality with this serializer is how that task verifies the validity of the imported collection. Interestingly, only pulp_ansible & pulp_deb follow this validation model, and pulp_deb had to create two serializers to perform it.
I'm still not sure we need this. Is it there to support code we plan to deprecate/delete? If I read correctly it allows the serializer to consume validated data, right? I do not think they are designed that way.
My thinking is that if we hand the same raw request data that we get in the viewset to the task, then the serializer in the task can validate in the same way plus the additional deferred_validate step. No need for any serializer field to behave differently.
Not blocking on this now. But can you confirm that this will fall in another refactor to come?
Yes, it most likely will be removed once the import_collection code is finally removed. read_only fields are not validated when initialized in a serializer and they don't appear in the validate method.
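The read_only point can be seen with a plain DRF serializer. This is a standalone illustration (assuming a configured Django/DRF environment), not pulp_ansible code:

from rest_framework import serializers


class ExampleSerializer(serializers.Serializer):
    name = serializers.CharField()
    state = serializers.CharField(read_only=True)

    def validate(self, data):
        # read_only fields are excluded from deserialization, so they never
        # show up in validate() or in validated_data.
        assert "state" not in data
        return data


s = ExampleSerializer(data={"name": "demo", "state": "ignored"})
assert s.is_valid(), s.errors
assert "state" not in s.validated_data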
# Set up logging for CollectionImport object
CollectionImport.objects.get_or_create(task_id=Task.current().pulp_id)
user_facing_logger = logging.getLogger("pulp_ansible.app.tasks.collection.import_collection")
This is set up under the assumption we are performing a collection import. A slight change would be needed here if we wanted to also use this for repairing existing collection metadata.
sha256 = data["file"].hashers["sha256"].hexdigest()
artifact = Artifact.objects.filter(sha256=sha256).first()
if artifact:
    raise serializers.ValidationError(_("Artifact already exists"))
Is it safe to just take the existing artifact at this time? (The user uploaded the proper file, right?)
No, this calls the old import_collection code and it doesn't handle duplicate artifacts, so the task would fail.
Do we plan to (eventually?) change that task then too?
I plan to eventually remove the task, as it is now duplicate code, and then remove the deprecated endpoints that call it. Git sync still uses it as well, so that will also require a separate PR to refactor it.
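Once the new serializer path owns this logic, taking the existing artifact (as asked above) could look roughly like this. Artifact.init_and_validate is pulpcore API; the reuse branch itself is an illustrative sketch, not code from this PR:

sha256 = data["file"].hashers["sha256"].hexdigest()
existing = Artifact.objects.filter(sha256=sha256).first()
if existing is not None:
    # Reuse the already-stored artifact instead of rejecting the upload.
    # The old import_collection task cannot handle this, which is why the
    # current code still raises ValidationError here.
    data["artifact"] = existing
else:
    data["artifact"] = Artifact.init_and_validate(data["file"])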
Force-pushed from d148e42 to 00e8775
Force-pushed from 00e8775 to a48903d
Force-pushed from a48903d to d172762
I'm still a bit confused by the large number of serializers we have for creating CollectionVersions. Maybe we need individual ones for each version of the API, and that is fine.
We still need confirmation from the galaxy side before we can merge this.
3. ``pulpcore.plugin.tasks.general_create`` is dispatched with this serializer, the task payload
   and a deferred context determined by the viewset (default an empty context).
4. ``general_create`` calls ``validate()`` again with deferred context now.
   ``deferred_validate()`` is now called and the upload object (if present) is converted to an
Isn't this where we also call import_collection to extract all the metadata? That's the process that may be offloaded into a different application cluster?
Yes, I wrote these steps describing how the pulpcore machinery works since I feel it is not clear from just looking at the serializer what is going on.
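A simplified sketch of the two-phase validation those steps describe. This mirrors how the pulpcore upload serializers branch on the deferred context, condensed here for illustration rather than copied from the PR:

def validate(self, data):
    """Phase one runs for the API request, phase two inside general_create."""
    data = super().validate(data)
    if "request" not in self.context:
        # We are inside the task now (steps 3-4): the deferred context has
        # replaced the request, so run the expensive artifact-based checks.
        data = self.deferred_validate(data)
    return data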
Force-pushed from d172762 to c21565b
[noissue] Required PR: pulp/pulp_ansible#1176
Force-pushed from b95db90 to a0dd173
[noissue] Required PR: pulp/pulp_ansible#1176
Force-pushed from c479463 to 0b03aa0
Force-pushed from 0b03aa0 to 0dc2543
No-Issue Required PR: pulp/pulp_ansible#1176
No-Issue Required PR: pulp/pulp_ansible#1176
No-Issue Required PR: pulp/pulp_ansible#1176
env: LOCK_REQUIREMENTS=0
env: PULP_ANSIBLE_REVISION=0dc254376919b92ba28c4121058a844cd037d010
fixes: #1175