Allow passing a local profile path for profile validation #54
There is indeed no way to specify/override the bag profile in the CLI and it certainly seems reasonable to want parity between the API and CLI for this function. However, after looking at this with fresh eyes (it's been quite a while since it was first implemented), there are some additional points I'd like to raise. The
So while I would say this is currently supported, it is not clearly stated anywhere in the documentation. In fact the current documentation for However, as I review some of this logic, it now occurs to me that providing a way to override the profile path via
    if bag_info[profile_id_tag] != self.url:
        self._fail(
            "%s: '%s' tag does not contain this profile's URI: <%s> != <%s>"
            % (bag, profile_id_tag, bag_info[profile_id_tag], self.url)
        )
This has got me thinking that implementing I am curious how you tested this with a local path. Did you include the local path as a string in your test bag's My current thinking is now that this parameter should be deprecated, and the documentation updated to reflect that it is possible to specify a local path via
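To make the concern about the identifier check concrete, here is a minimal sketch. It assumes the check compares the bag's BagIt-Profile-Identifier tag against whatever location the profile was loaded from; the helper `identifier_matches`, the example URL, and the local path are hypothetical and not taken from the library.

```python
BAG_PROFILE_TAG = "BagIt-Profile-Identifier"


def identifier_matches(bag_info, expected):
    """Return True if the bag's profile identifier tag equals the expected value."""
    return bag_info.get(BAG_PROFILE_TAG) == expected


# Hypothetical profile document, as it might be stored in a local JSON file.
profile_json = {
    "BagIt-Profile-Info": {
        BAG_PROFILE_TAG: "https://example.org/profiles/demo.json",
    }
}

# Hypothetical bag-info.txt contents of the bag being validated.
bag_info = {BAG_PROFILE_TAG: "https://example.org/profiles/demo.json"}

# Hypothetical location the profile was actually loaded from.
loaded_from = "/tmp/demo-profile.json"

# Comparing against the load location fails for a local path ...
matches_location = identifier_matches(bag_info, loaded_from)

# ... while comparing against the identifier declared inside the profile succeeds.
declared = profile_json["BagIt-Profile-Info"][BAG_PROFILE_TAG]
matches_declared = identifier_matches(bag_info, declared)
```

Under these assumptions, a bag that names the published URL would only pass the check against a locally loaded profile if the comparison uses the identifier the profile declares for itself, rather than the path it was read from.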
Hi @mikedarcy, thanks for taking a look so promptly and sharing your concerns! I was aware of the possibility to pass
That definitely sounds worth fixing, true.
This is not currently a problem in this PR, as the URL is explicitly passed to the
    # Instantiate a profile, supplying its URI.
    if not profile_url:
        profile_url = bag.info.get(BAG_PROFILE_TAG, None)
    if not profile_url:
        raise bdbp.ProfileValidationError("Bag does not contain a BagIt-Profile-Identifier")
This logic was there before my changes as well but using But it's good that you bring this up as it made me notice that the profile URL the
This way you would then indeed get an error if the specified profile identifies itself differently than the profile targeted by the bag, which is what the spec intends, I believe. Leaving it as I have it currently would never fail, as the source of the URL is the bag, which will of course always match itself further down. Arguably that's more in line with what I'd expect, having a Not sure which is the better way to go; deprecating the option entirely would not be a good outcome in my view. But I don't know if I made where I'm coming from understandable or if you still disagree with the goal?
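The fallback described in this comment can be sketched in isolation. This is only an illustration: `resolve_profile_url` and the local `ProfileValidationError` class are hypothetical stand-ins, and the URL and path are made up.

```python
BAG_PROFILE_TAG = "BagIt-Profile-Identifier"


class ProfileValidationError(Exception):
    """Hypothetical stand-in for the library's profile validation error."""


def resolve_profile_url(bag_info, profile_url=None):
    """Prefer an explicitly supplied profile location, else use the bag's tag."""
    if not profile_url:
        profile_url = bag_info.get(BAG_PROFILE_TAG)
    if not profile_url:
        raise ProfileValidationError("Bag does not contain a BagIt-Profile-Identifier")
    return profile_url


bag_info = {BAG_PROFILE_TAG: "https://example.org/profiles/demo.json"}

# No override: the location comes from the bag itself.
from_bag = resolve_profile_url(bag_info)

# Override: a (hypothetical) local path takes precedence over the bag's tag.
from_override = resolve_profile_url(bag_info, "/tmp/demo-profile.json")
```

The open question in the thread is which of these two values the later identifier check should be compared against; the sketch only shows how the location to load from is chosen.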
OK, this makes sense to me now that I've looked at it a bit more. I failed to notice that the Profile constructor takes an already instantiated
I can understand why you made some stylistic changes in places with simplifying some boolean truthiness and removing some line breaks but that stuff really doesn't belong to the feature request and makes for a more confusing PR and merge history. I take note of those and would be happy to consider them in a separate PR. Also, you relocated the conditionalized processing of the profile validation step in
Additionally, I think renaming the existing
Lastly, this is going to need a unit test. It's a pretty simple one so I don't see that being too cumbersome.
All that being said, I would ask that you close this PR and submit a new one with just the minimum necessary changes and a test case. Other changes are welcome in other PRs. Alternatively, since I now get the gist of this feature request, you could just file an issue outlining the use-case and I can make the changes necessary, including adding a test case. I'm amenable to either path, so I will leave it up to you.
In any case thanks for bringing this up! I do agree it is a useful feature and also illuminates some mistakes in the original implementation. Cheers 👍
…passing a local profile per PR: #54
@prettybits I've added the functionality discussed above in 3961d9e for the
Thanks for taking this up again @mikedarcy, I got sidetracked a bit too much and lost sight of this for a while, sorry. I think the implementation of
The check for matching profile URLs is not overridden by the
You're right, sorry. It's a less-than-ideal tendency of mine to make stylistic changes and linter fixes while I'm working in an area of code, but I fully understand this can make the essential changes harder to see.
That was partially motivated by the intended workflow as stated in the BagIt Profiles Specification ("specifically, it must complete the Bag if fetch.txt is present, validate the complete Bag against the Profile, and then [emphasis mine] validate the Bag against the cannonical BagIt spec."), and also by it making more sense to me to fail faster in the comparatively cheap profile validation step, before checksums are computed during full bag validation. I'm not sure that switching the order would affect your stated motivation, but I'd have to think this through again. In any case, you're right that I should at least have provided motivation for that change in my initial description.
… before bag validation in CLI execution order. See #54. Add unit test and doc entry for above. Fix a typo in a related logging statement.
Sorry, I just plum forgot that the CLI modification was still necessary. Since I've already integrated part of your PR, I will just integrate the CLI part as well, and add another unit test for it.
OK, that makes sense. Unfortunately there is a small conflict with moving this because of how the existing order allows for profile validation upon bag creation as well. Specifically, the profile serialization can be validated after a bag serialization step during the creation process. Admittedly, this logic is a bit convoluted, but I think I have a solution for this where both behaviors can be supported. See 98b716e.
Thanks for the prompt commit, I left a comment there as well just now. The logic is a bit hard to follow, yes. If I'm reading this right, I don't think it's currently even possible to create and validate a bag in the same invocation going by this conditional, but it works with updating? Besides that, couldn't the creation of the archive file (via the
You are right about the conditional. The funny part is that this used to work in earlier versions when doing the bag create, but the introduction of extra logic into that conditional some versions back actually broke it. I am going to try to restore that part of the conditional to allow for this sequence of events in the bag creation case. Now it will be even more convoluted ;). I'd rather not move archiving ahead of validation, since fail-fast behavior on profile validation aborts the archiving if validation fails, saving a potentially lengthy archive and cleanup operation.
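The fail-fast ordering argued for here can be illustrated with a small, generic sketch. Everything in it is hypothetical: `run_pipeline` and the step names are placeholders and do not reflect how the CLI is actually structured.

```python
def run_pipeline(steps):
    """Run (name, callable) steps in order and stop at the first one that raises."""
    completed = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            # Abort here: later, more expensive steps are never started.
            return completed, f"{name} failed: {exc}"
        completed.append(name)
    return completed, None


def failing_profile_check():
    """Placeholder for a cheap profile validation step that rejects the bag."""
    raise ValueError("profile mismatch")


def noop():
    """Placeholder for an expensive step such as checksumming or archiving."""


steps = [
    ("validate_profile", failing_profile_check),  # cheap, runs first
    ("validate_bag", noop),                       # would compute checksums
    ("archive", noop),                            # would build the archive file
]

completed, error = run_pipeline(steps)
```

With the cheap check first, a profile failure leaves `completed` empty, so neither the checksum pass nor the archive step is started and there is nothing to clean up.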
Support for this has been added to the 1.7.1 release. Closing PR.
This introduces the option to specify a locally stored BagIt profile for profile validation instead of always trying to fetch the profile from the network. I was missing this when developing a new profile and testing validation, where the profile doesn't have a reachable URL yet. It would also be useful in a fixed workflow where the profile JSON could simply be stored locally, saving the network request when validating incoming packages.
I chose --profile-path as the name for the new option. Is this something you would be OK with introducing? Happy to discuss this further if needed.
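As a rough illustration of the proposed option, the following sketch wires a --profile-path argument into a standalone argparse parser and loads the profile from disk. It is not the project's CLI: the parser, the `--validate-profile` flag as modelled here, `load_local_profile`, and the example profile content are all assumptions made for the example.

```python
import argparse
import json
import os
import tempfile


def build_parser():
    """Build a hypothetical parser with a profile validation flag and a local path option."""
    parser = argparse.ArgumentParser(description="Hypothetical validation CLI sketch")
    parser.add_argument("--validate-profile", action="store_true",
                        help="validate the bag against its BagIt profile")
    parser.add_argument("--profile-path", metavar="<file>",
                        help="local profile JSON to use instead of fetching it")
    return parser


def load_local_profile(path):
    """Read a profile document from a local JSON file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Write a made-up profile to a temporary file to stand in for a local profile.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump({"BagIt-Profile-Info": {
        "BagIt-Profile-Identifier": "https://example.org/profiles/demo.json"}}, tmp)

args = build_parser().parse_args(["--validate-profile", "--profile-path", tmp.name])

# Only read from disk when a local path was given; otherwise the profile
# would be fetched from the URL named in the bag (not shown here).
profile = load_local_profile(args.profile_path) if args.profile_path else None
os.remove(tmp.name)
```

This covers both use cases from the description: a profile under development that has no reachable URL yet, and a fixed workflow that keeps the profile JSON on disk to avoid a network request.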