Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor CollectionVersion Upload to use pulpcore machinery #1176

Merged
merged 1 commit into from
Dec 1, 2022

Conversation

gerrod3
Copy link
Contributor

@gerrod3 gerrod3 commented Aug 17, 2022

fixes: #1175

@gerrod3 gerrod3 marked this pull request as ready for review August 18, 2022 02:11
@gerrod3 gerrod3 requested a review from a team as a code owner August 18, 2022 02:11
@gerrod3 gerrod3 force-pushed the collection-test-fix branch 4 times, most recently from 7176d3f to d0a2fda Compare August 18, 2022 19:38
@@ -309,27 +309,17 @@ def test_collection_download(artifact, pulp_client, collection_detail):
assert f.content == tarball


def test_collection_upload_repeat(pulp_client, known_collection, pulp_dist):
def test_collection_upload_repeat(pulp_client, artifact, pulp_dist, collection_upload):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what kind of a fixture is "artifact"? If it's a pulp_ansible one can we name it collection_artifact?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the changes in this file supposed to be part of this PR at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I realized the tests for collection upload were wrong and incorrectly passing when it shouldn't.

Comment on lines +198 to +204
# TODO: Add this back when namespace, name, and version are apart of the CollectionImport
# for key, value in collection_upload.items():
# if key in COLLECTION_METADATA.keys():
# assert COLLECTION_METADATA[key] == value, collection_upload
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these do not assert anymore, i think we are seeing a change behavioural change unrelated to the bug / improvement advertised in the PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The asserts were never being ran in the first place. collection_upload is a CollectionImport response and does not have the fields in COLLECTION_METADATA : namespace, name, version. So commenting them out does nothing I could have left it as is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying the keys should have been there? Then we should not guard with the "if". If not can we just remove the block completely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea if they should have been there. But they will be once I add Namespaces.

@gerrod3 gerrod3 marked this pull request as draft August 22, 2022 01:34
@gerrod3 gerrod3 force-pushed the collection-test-fix branch 2 times, most recently from 14c1616 to 4788bc8 Compare August 24, 2022 15:42
@gerrod3 gerrod3 changed the title Fix duplicate collection upload error msg & tests Refactor CollectionVersion Upload to use pulpcore machinery Aug 24, 2022
@gerrod3 gerrod3 force-pushed the collection-test-fix branch 3 times, most recently from fe2c1c7 to 8538fed Compare August 25, 2022 18:14
pulp_tag_name = "Pulp_Ansible: Artifacts Collections V3"

DEFAULT_ACCESS_POLICY = _PERMISSIVE_ACCESS_POLICY

_declared_fields = serializer_class._declared_fields
INLINE_SERIALIZER = inline_serializer(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call this the FakeCollectionVersionUploadSerializer and define it outside of the class scope?

Also, can you remind me why we need to fake it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I can hide the Pulp-specific fields: artifact, upload, and repository from the generated schema. I tried to just not include them in the Meta class, but then it would error when artifact or repository was set by the viewset's logic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It think, this is dangerous. Just not adding them to the docs, means that the serializer still accepts them. And it shouldn't!?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I'll add validation for this serializer.

tags = validated_data.pop("tags")
origin_repository = validated_data.pop("origin_repository")
# Create CollectionVersion from its metadata and adds to repository if specified
content = super().create(validated_data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to add the add-to-repository-in-a-subtask feature to pulpcore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about it. I already know how I am going to implement the feature in pulp_ansible, but putting it into pulpcore would make it much more simple.

log = logging.getLogger(__name__)


def process_collection_artifact(artifact, namespace, name, version):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the thing we call in deferred_validate now? Great achievement!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, extracted out the relevant parts of import_collection for gathering metadata into this.

"""Check and set the namespace, name & version."""
# This validation is for creating CollectionVersions
if self.instance:
return data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we still call super first?

Copy link
Contributor Author

@gerrod3 gerrod3 Aug 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking of removing this check and instead over-riding the update method to raise an error if called. validate should only be called on create or update so if self.instance is set then update will be called and we shouldn't allow it.

Edit NVM, validate gets called before update or create, so I have to leave this check here and the question is should I raise an error. If I call super it will try to call the process_collection code which is probably not what we want to do if they are trying to update. But should we allow them to use the serializer to update a collection?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Hard question. From the API we should never allow updates. But we want to allow for a repair command. And it would be super neat if that repair command could just use this serializer to actually call the process_collection part again. No?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that would be a good use. There would still need to be other fields that would need to be set if you wanted to call super in that case (artifact, expected_namespace/name/version/) else the serializer would fail validation. I'll remove the check and put a TODO comment to revisit this for a future issue (#1106).

@gerrod3 gerrod3 force-pushed the collection-test-fix branch 5 times, most recently from c42344c to 11b02ad Compare August 26, 2022 04:40
Copy link
Contributor Author

@gerrod3 gerrod3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mdellweg @newswangerd Ready for review.

Summary of changes:

  • /ansible/collections/ & Galaxy V2 Collection upload have been deprecated.
  • CollectionVersionUploadSerializer now has all the old fields of CollectionOneShotSerializer and the entirety of the collection upload logic needed.
  • Galaxy V3 Collection upload is now a SingleArtifactContentUploadViewSet and uses the CollectionVersionUploadSerializer to perform the collection upload.
  • CollectionVersionSerializer now has CollectionVersionUploadSerializer as a base class so the Pulp CV api uses the same base upload logic as the Galaxy V3 api.
  • Pulp CV api now accepts artifact and upload fields when uploading a collection.

pulp_ansible/app/galaxy/mixins.py Outdated Show resolved Hide resolved
pulp_ansible/app/serializers.py Outdated Show resolved Hide resolved
Comment on lines +617 to +631
def is_valid(self, raise_exception=False):
"""
Allow this serializer to be used for validating before saving a model.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't touch any of the old import_collection task code, but this functionality with this serializer is how that task verifies the validity of the imported collection. Interestingly only pulp_ansible & pulp_deb follow this validation model and pulp_deb had to create two serializers to perform it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sure we need this. Is it there to support code we plan to deprecate/delete? If i read correctly it allows the serializer to consume validated data, right? I do not think they are designed that way.
My thinking is if we hand the same raw request data that we get in the viewset to the task, then the serializer in the task can validate in the same way plus the additional deferred_validate step. No need for any serializer field to behave differently.
Not blocking on this now. But can you confirm that this will fall for another refactor to come?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it most likely will be removed once the import_collection code is finally removed. read_only fields are not validated when initialized in a serializer and they don't appear in the validate method.

log = logging.getLogger(__name__)


def process_collection_artifact(artifact, namespace, name, version):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, extracted out the relevant parts of import_collection for gathering metadata into this.

Comment on lines +26 to +28
# Set up logging for CollectionImport object
CollectionImport.objects.get_or_create(task_id=Task.current().pulp_id)
user_facing_logger = logging.getLogger("pulp_ansible.app.tasks.collection.import_collection")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is set up under the assumption we are performing a collection import. A slight change would be needed here if we wanted to also use this for repairing existing collection metadata.

@gerrod3 gerrod3 marked this pull request as ready for review August 26, 2022 16:26
sha256 = data["file"].hashers["sha256"].hexdigest()
artifact = Artifact.objects.filter(sha256=sha256).first()
if artifact:
raise serializers.ValidationError(_("Artifact already exists"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to just take the existing artifact at this time? (The user uploaded the proper file, right?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this calls the old import_collection code and it doesn't handle duplicate artifacts so the task would fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we plan to (eventually?) change that task then too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I eventually plan to remove the task as it is duplicate code now and eventually remove the deprecated endpoints that call it. Git sync still uses it as well so that will also require a separate PR to refactor it.

pulp_tag_name = "Pulp_Ansible: Artifacts Collections V3"

DEFAULT_ACCESS_POLICY = _PERMISSIVE_ACCESS_POLICY

_declared_fields = serializer_class._declared_fields
INLINE_SERIALIZER = inline_serializer(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It think, this is dangerous. Just not adding them to the docs, means that the serializer still accepts them. And it shouldn't!?

pulp_ansible/app/serializers.py Outdated Show resolved Hide resolved
pulp_ansible/app/serializers.py Outdated Show resolved Hide resolved
pulp_ansible/app/serializers.py Outdated Show resolved Hide resolved
pulp_ansible/app/serializers.py Outdated Show resolved Hide resolved
@gerrod3 gerrod3 force-pushed the collection-test-fix branch 2 times, most recently from d148e42 to 00e8775 Compare August 29, 2022 18:26
Copy link
Member

@mdellweg mdellweg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a bit confused by the big number of serializers we have for creating CollectionVersions. Maybe we need individual ones for each version of the api, and that is fine.
We still need confirmation from the galaxy side before we can merge this.

3. ``pulpcore.plugin.tasks.general_create`` is dispatched with this serializer, the task payload
and a deferred context determine by the viewset (default an empty context).
4. ``general_create`` calls ``validate()`` again with deferred context now.
``deferred_validate()`` is now called and the upload object (if present) is converted to an
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this where we also call the import_collection to extract all the metadata? That process that may be offloaded into a different application cluster?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I wrote these steps describing how the pulpcore machinery works since I feel it is not clear from just looking at the serializer what is going on.

Comment on lines +617 to +631
def is_valid(self, raise_exception=False):
"""
Allow this serializer to be used for validating before saving a model.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sure we need this. Is it there to support code we plan to deprecate/delete? If i read correctly it allows the serializer to consume validated data, right? I do not think they are designed that way.
My thinking is if we hand the same raw request data that we get in the viewset to the task, then the serializer in the task can validate in the same way plus the additional deferred_validate step. No need for any serializer field to behave differently.
Not blocking on this now. But can you confirm that this will fall for another refactor to come?

@mdellweg mdellweg marked this pull request as draft September 12, 2022 13:41
gerrod3 added a commit to gerrod3/galaxy_ng that referenced this pull request Sep 26, 2022
@newswangerd newswangerd removed the request for review from a team October 4, 2022 16:23
@gerrod3 gerrod3 force-pushed the collection-test-fix branch 2 times, most recently from b95db90 to a0dd173 Compare November 17, 2022 05:13
gerrod3 added a commit to gerrod3/galaxy_ng that referenced this pull request Nov 17, 2022
@gerrod3 gerrod3 force-pushed the collection-test-fix branch 2 times, most recently from c479463 to 0b03aa0 Compare November 17, 2022 16:18
gerrod3 added a commit to gerrod3/galaxy_ng that referenced this pull request Nov 21, 2022
gerrod3 added a commit to gerrod3/galaxy_ng that referenced this pull request Nov 21, 2022
gerrod3 added a commit to gerrod3/galaxy_ng that referenced this pull request Nov 21, 2022
No-Issue
Required PR: pulp/pulp_ansible#1176
env: LOCK_REQUIREMENTS=0
env: PULP_ANSIBLE_REVISION=0dc254376919b92ba28c4121058a844cd037d010
@gerrod3 gerrod3 marked this pull request as ready for review November 30, 2022 20:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Collection Upload doesn't raise a pretty error when uploading a duplicate Artifact.
2 participants