Proposed changes for 1.0 (updated source repo) #19
This appears to have been commented out since at least 2008.
This follows the guidelines in RFC 3629.
* Note the existence of namespaces in the security considerations section
* Update previously un-displayed list of reserved DOS/Windows filenames
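As a rough illustration of why the reserved-name list matters to implementers, the sketch below flags path components that collide with Windows device names. The set used here is the one from Microsoft's file-naming documentation, which is an assumption: the specification's own list is not quoted in this thread and may differ. The function name `reserved_components` is hypothetical.

```python
from pathlib import PurePosixPath

# Device names reserved by Windows (assumed from Microsoft's documentation;
# the list in the specification itself may differ).
RESERVED_WINDOWS_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def reserved_components(path: str) -> list[str]:
    """Return the components of a bag-relative path that match a reserved name."""
    hits = []
    for part in PurePosixPath(path).parts:
        # Windows matches device names without regard to case or extension.
        stem = part.split(".", 1)[0].rstrip(" ").upper()
        if stem in RESERVED_WINDOWS_NAMES:
            hits.append(part)
    return hits
```

A receiver on a non-Windows platform could still run such a check before forwarding a bag to a Windows host.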
Clarify that this section is part of the specification but is not considered a hard requirement for an implementation.
Update the section describing md5sum’s output format and clarify that accepting bags produced with md5sum, which will not pass strict validation, is strictly optional.
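A minimal sketch of what "optional acceptance" could look like in a manifest parser. It assumes the difference at issue is the asterisk that md5sum writes before the file name in binary mode; the function name and the `accept_md5sum_marker` flag are hypothetical, not taken from the specification or from any existing tool.

```python
def parse_manifest_line(line: str, accept_md5sum_marker: bool = False) -> tuple[str, str]:
    """Split one manifest line into (checksum, path).

    With accept_md5sum_marker=True, a leading "*" on the path (as written by
    md5sum in binary mode) is tolerated and stripped; otherwise it is an error.
    """
    fields = line.rstrip("\r\n").split(None, 1)
    if len(fields) != 2:
        raise ValueError(f"malformed manifest line: {line!r}")
    checksum, path = fields
    if path.startswith("*"):
        if not accept_md5sum_marker:
            raise ValueError(f"md5sum binary-mode marker not accepted: {line!r}")
        path = path[1:]
    return checksum, path
```

Keeping the lenient behaviour behind an explicit flag lets a strict validator and a tolerant importer share one parser.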
This adds background information for problems related to case-sensitivity and Unicode normalization and adds a list of recommendations for implementors.
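One way an implementation might act on this background is to look for payload paths that differ only by case or by Unicode normalization form before writing a bag. The sketch below is an assumption about how such a check could be built (NFC followed by case folding); the recommendations in the specification itself are not reproduced here, and `find_colliding_paths` is a hypothetical name.

```python
import unicodedata
from collections import defaultdict


def find_colliding_paths(paths):
    """Group paths that become identical after NFC normalization and case folding."""
    groups = defaultdict(list)
    for path in paths:
        key = unicodedata.normalize("NFC", path).casefold()
        groups[key].append(path)
    # Only groups with more than one member are potential collisions.
    return [group for group in groups.values() if len(group) > 1]
```

Paths reported together would map to a single file on a case-insensitive or normalizing filesystem.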
This adds the note that, unlike other metadata tags, this element must not be repeated and clarifies that the Payload-Oxum value is not sufficient for validation.
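For context, Payload-Oxum has the form `OctetCount.StreamCount`: the total number of payload bytes and the number of payload files. The sketch below computes it for a bag on disk, assuming the payload lives under `data/`; the function name is hypothetical. As the commit notes, a matching value is only a quick sanity check and does not replace checksum validation.

```python
from pathlib import Path


def payload_oxum(bag_dir) -> str:
    """Return "OctetCount.StreamCount" for the files under <bag_dir>/data."""
    files = [p for p in (Path(bag_dir) / "data").rglob("*") if p.is_file()]
    octets = sum(p.stat().st_size for p in files)
    return f"{octets}.{len(files)}"
```

Two different payloads with the same total size and file count produce the same value, which is why it cannot stand in for the manifests.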
This triggers the standard formatting in HTML, etc. outputs
* Use <organization> for relevant <author> entries
* Omit empty <date> attributes
* Remove reference to GRABIT since the spec is now returning HTTP 404 and there are no known public implementations.
* Add METALINK (RFC 5854) as an alternative which supports mirrors and protocols such as BitTorrent.
This wording is shorter and doesn’t distinguish between validation for payload and tag files.
The spec shouldn't need to include mechanistic transfer details: if the results validate, it's a bag.
…rror handling "Upon discovering errors in bags, an implementation is free to take action (for example, logging or reporting) in an application-specific manner. This document does not mandate any particular action."
some displays ended with an extra blank line
Per reviewer comment:

> Section 2.1.3), a file named "bagit.txt" (see Section 2.1.1), and
> zero or more additional tag files (see Section 2.2). The tag files
> in the optional tag directories are arbitrary file hierarchies and
> the tag directories MAY have any name that is not reserved for a file
> or directory in this specification.

Above (2) seems to say that all tag directories are optional. Hence constantly including the word 'optional' for them, in the rest of the document, is distracting.

> The base directory MAY have any name.
>
> <base directory>/
> | bagit.txt
> | manifest-<algorithm>.txt
> | [optional additional tag files]
> \--- data/
> | [payload files]
> \--- [optional tag directories]/
> | [optional tag files]

The square brackets are probably enough to indicate being optional. The word just makes things wordier.

_The word “optional” has been removed as redundant, given the bracketing and that all tag directories have been described previously as optional._
bagit.xml
Outdated
@@ -287,8 +287,7 @@ The base directory can have any name.
 |
 +-- [optional tag directories]/
 |
-+-- [optional tag files]
-</artwork>
++-- [optional tag files] </artwork>
Was the intention of this to avoid an extra line in the rendered text output? I'm not a huge fan of the closing tag being on the end of the line like this but I'm not sure it's worth changing everything to go the other way.
Exactly, and it definitely makes the XML ugly. It may be a defect in the xml2rfc tool. If you can find a way around it, go for it, but if push comes to shove, I think getting the rendered human-oriented document consistent and correct is more important than making the XML pretty.
Per reviewer comment:

> A payload manifest is a tag file that lists payload files and

probably: that lists payload file names and

Clarified. Saying "lists" does imply names and not the file contents, but for some reason I think the modified form will be clearer.

> checksums for those payload files generated using a particular bag

I'm pretty sure it's not the payload files that are generated using a checksum algorithm... I assume it's a manifest payload file listing...

_That sentence was stricken during recent editing rounds. A similar sentence has been reworded: “Every payload manifest MUST list every payload file name exactly once.”_
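The "exactly once" requirement can be checked mechanically. The following is a sketch under the assumption that the manifest has already been parsed into a list of paths and that the payload file names are known; `check_manifest_coverage` and the keys of its result are hypothetical names.

```python
from collections import Counter


def check_manifest_coverage(manifest_paths, payload_paths):
    """Compare the paths listed in one manifest against the actual payload files."""
    counts = Counter(manifest_paths)
    listed = set(counts)
    payload = set(payload_paths)
    return {
        # Listed more than once in the manifest.
        "duplicated": sorted(p for p, n in counts.items() if n > 1),
        # Present in the payload but not listed.
        "missing": sorted(payload - listed),
        # Listed but not present in the payload.
        "unknown": sorted(listed - payload),
    }
```

A manifest satisfies the reworded sentence only when the "duplicated" and "missing" lists are both empty.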
Per reviewer comment:

> checksum algorithm. Every bag MUST contain one payload manifest
> file, and MAY contain more than one. A payload manifest file MUST

I think this is unusual enough to warrant, again, an initial, summary statement. If I'm understanding, it should be something like:

A bag can have more than one data integrity manifest, with each using a different validation algorithm.

_This sentence has been added: A bag can have more than one payload manifest, with each using a different validation algorithm._
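To make the added sentence concrete, here is a small sketch that builds the lines of several payload manifests for the same payload, one per algorithm. It assumes the payload is held in memory as a mapping from path to bytes and that manifests are named `manifest-<algorithm>.txt` as in the directory layout quoted earlier; the function name and the default algorithm choice are illustrative only.

```python
import hashlib


def manifest_entries(payload: dict[str, bytes], algorithms=("sha256", "sha512")) -> dict[str, list[str]]:
    """Return {manifest file name: [lines]} with one manifest per algorithm."""
    manifests = {}
    for algorithm in algorithms:
        lines = []
        for path in sorted(payload):
            digest = hashlib.new(algorithm, payload[path]).hexdigest()
            lines.append(f"{digest} {path}")
        manifests[f"manifest-{algorithm}.txt"] = lines
    return manifests
```

Each manifest lists the same paths, so a receiver that supports only one of the algorithms can still validate the whole payload.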
Per reviewer comment:

> Source-Organization Organization transferring the content.

...

> Organization-Address Mailing address of the organization.

organization -> source organization

> Contact-Name Person at the source organization who is responsible
> for the content transfer.
>
> Contact-Phone International format telephone number of person or
> position responsible.
>
> Contact-Email Fully qualified email address of person or position
> responsible.
>
> ...
> External-Description A brief explanation of the contents and
> provenance.

...

> Bagging-Date Date (YYYY-MM-DD) that the content was prepared for
> delivery.

I think you mean 'transfer' rather than 'delivery'...
Per reviewer comment:

> The "fetch.txt" file allows a bag to be transmitted with "holes" in
> it, which can be practical for several reasons. For example, it
> obviates the need for the sender to stage a large serialized copy of
> the content while the bag is transferred to the receiver. Also, this
> method allows a sender to construct a bag from components that are
> either a subset of logically related components (e.g., the localized
> logical object could be much larger than what is intended for export)
> or assembled from logically distributed sources (e.g., the object
> components for export are not stored locally under one filesystem
> tree).

This paragraph would be a better introduction to the section.

_Done._
Per reviewer comment:

> Implementors of tools that complete bags by retrieving URLs listed in
> a "fetch.txt" file need to be aware that some of those URLs may point
> to hosts, intentionally or unintentionally, that are not under
> control of the bag's sender. Checksums are intended as a reasonable
> guarantee against corruption during transit, not a strong
> cryptographic protection against intentional spoofing.

Oh?

_This wording was meant to apply to checksums as they are used in bags, as well as to address criticism that many legacy bags used easily broken MD5 checksums. That last sentence has now been reworded to: Moreover, older checksum algorithms, even if reasonable for detecting corruption during transit, may not offer strong cryptographic protection against intentional spoofing._
Per reviewer comment:

> In all text tag files except for the bag declaration file, text MUST
> be encoded in the character encoding specified in the "bagit.txt" bag

be encoded in the character encoding -> use the character encoding

_Done._
Per reviewer comment:

> The size of files, as optionally reported in the "fetch.txt" file,
> cannot be guaranteed to match the actual file size to be downloaded.
> Implementors SHOULD take care to appropriately handle cases where the
> actual file size does not match the file size reported in the
> fetch.txt. Implementors SHOULD NOT use the file size in the
> "fetch.txt" file for critical resource allocation, such as buffer
> sizing or storage requisitioning.

Absent specification of what "appropriately handle" means, this guidance lacks substance.

_Reworded the second sentence to be: Implementers SHOULD take steps to monitor and abort transfer when the received file size exceeds the file size reported in the fetch file._
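The reworded guidance (monitor the transfer and abort once the received size exceeds the reported size) could be implemented along these lines. This is a sketch, not a prescribed mechanism: `copy_with_limit`, `FetchSizeExceeded`, and the convention that `declared_size=None` means "no size reported" are all assumptions made for the example.

```python
class FetchSizeExceeded(Exception):
    """Raised when more bytes arrive than the fetch file reported."""


def copy_with_limit(source, sink, declared_size=None, chunk_size=64 * 1024):
    """Copy source to sink in chunks, aborting if declared_size is exceeded.

    source and sink are binary file-like objects. Returns the byte count copied.
    """
    received = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return received
        received += len(chunk)
        if declared_size is not None and received > declared_size:
            # Stop before writing the chunk that crosses the limit.
            raise FetchSizeExceeded(
                f"received {received} bytes, fetch file reported {declared_size}"
            )
        sink.write(chunk)
```

Reading in bounded chunks also avoids sizing any buffer from the reported length, which matches the SHOULD NOT in the quoted text.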
Update Justin's contact info
updated email address
Changed reference to character set registry.
Added clarification about malicious attackers.
This is a replacement for #17 reflecting the move from the old loc-rdc organization to the primary LibraryOfCongress. The primary notable change from #17 is restoring the `fetch.txt` section following discussion with @jkunze, @dbrunton, and @johnscancella.