Recommend MultiGzDecoder over GzDecoder in docs #324

Merged
merged 8 commits into from
Jul 30, 2023

jsha
Contributor

@jsha jsha commented Oct 31, 2022

Part of #178.

I removed the text that says "the specification, however, allows ..." and the comment about bioinformatics because they make it sound like MultiGzDecoder is a rare thing that you should only enable if you need it, but it's actually the correct choice for almost all cases since it implements what the RFC considers a "gzip file."

src/gz/bufread.rs Outdated Show resolved Hide resolved
@jsha
Contributor Author

jsha commented Jul 13, 2023

I see you've also made some tweaks to these docs in #347. Thanks! I went ahead and updated this PR, and I still think it's worthwhile.

In this PR I reference the concept of the "gzip file" instead of multistream, which is what the RFC talks about as the core concept (with "members" being a sub-concept).

Also, I removed the list of cases where gzip files containing multiple members are commonly used, since that seems to suggest that gzip files containing multiple members are only used in particular endeavors, when really they can be used any time you have a .gz file or a Content-Encoding: gzip HTTP response.

Similarly, I removed the comment "Use [MultiGzDecoder] if your file has multiple streams" because in many cases the user of this library doesn't know in advance if they will be processing a file with multiple streams or not; but it is always correct to use a MultiGzDecoder because it will handle a single-stream file just fine.
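
To make the difference concrete, here is a minimal, std-only sketch of the member structure. It is not flate2's implementation: `stored_member`, `decode_first`, and `decode_all` are hypothetical helpers, and the toy decoder only understands members that hold a single stored (uncompressed) deflate block. `decode_first` plays the role of `GzDecoder` and `decode_all` the role of `MultiGzDecoder`.

```rust
// Toy model of gzip members (RFC 1952), std only. Not flate2's code: it only
// handles members that hold one stored (uncompressed) deflate block.

/// Bitwise CRC-32, as used in the gzip trailer.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg(); // all ones if the low bit is set
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical helper: wrap `payload` (at most 65535 bytes) in one gzip member.
fn stored_member(payload: &[u8]) -> Vec<u8> {
    let len = payload.len() as u16;
    assert_eq!(len as usize, payload.len(), "payload too long for one stored block");
    // ID1, ID2, CM=8 (deflate), FLG=0, MTIME=0 (4 bytes), XFL=0, OS=255 (unknown)
    let mut out: Vec<u8> = vec![0x1f, 0x8b, 8, 0, 0, 0, 0, 0, 0, 255];
    out.push(0x01); // BFINAL=1, BTYPE=00: a single stored block
    out.extend_from_slice(&len.to_le_bytes());
    out.extend_from_slice(&(!len).to_le_bytes());
    out.extend_from_slice(payload);
    out.extend_from_slice(&crc32(payload).to_le_bytes());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out
}

/// Decode the first member only; returns the payload and the bytes consumed.
fn decode_first(input: &[u8]) -> Result<(Vec<u8>, usize), String> {
    if input.len() < 15 || !input.starts_with(&[0x1f, 0x8b, 8, 0]) || input[10] != 0x01 {
        return Err("not a stored-block gzip member".to_string());
    }
    let len = u16::from_le_bytes([input[11], input[12]]) as usize;
    let end = 15 + len + 8; // headers + payload + CRC32 + ISIZE
    if input.len() < end {
        return Err("truncated member".to_string());
    }
    let payload = input[15..15 + len].to_vec();
    let trailer = &input[15 + len..end];
    let stored_crc = u32::from_le_bytes([trailer[0], trailer[1], trailer[2], trailer[3]]);
    if stored_crc != crc32(&payload) {
        return Err("CRC mismatch".to_string());
    }
    Ok((payload, end))
}

/// Decode every member and concatenate the payloads.
fn decode_all(mut input: &[u8]) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    while !input.is_empty() {
        let (payload, used) = decode_first(input)?;
        out.extend_from_slice(&payload);
        input = &input[used..];
    }
    Ok(out)
}

fn main() {
    let single = stored_member(b"hello ");
    let double = [single.clone(), stored_member(b"world")].concat();

    // One member: both strategies agree.
    assert_eq!(decode_first(&single).unwrap().0, decode_all(&single).unwrap());
    // Two members: first-only stops early, decode_all returns everything.
    assert_eq!(decode_first(&double).unwrap().0, b"hello ".to_vec());
    assert_eq!(decode_all(&double).unwrap(), b"hello world".to_vec());
}
```

On a single-member file the two strategies return the same bytes, which is the sense in which the multi-member decoder is a safe choice when the member count is unknown.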

Member

@Byron Byron left a comment

Thanks a lot!

I feel the narrative is now inverted, and the way it should be, by making MultiGzDecoder effectively the default. To me it feels like ultimately, MultiGzDecoder really wants to be the default also by name, as right now it still feels special in that regard. Of course, I am not proposing to make name changes to types in this PR, which already gets the documentation to a place where most people who want to use GzDecoder will end up using the MultiGzDecoder instead.

src/gz/bufread.rs Outdated Show resolved Hide resolved
@Byron Byron requested a review from JohnTitor July 14, 2023 15:18
src/gz/bufread.rs Outdated Show resolved Hide resolved
@Byron Byron self-assigned this Jul 17, 2023
@Byron
Member

Byron commented Jul 17, 2023

I think it would be nice if @JohnTitor could have another look. Overall, I believe this PR would help to get a better understanding of the intended use of MultiGzDecoder, and I would definitely have benefitted from that while working on #348. So I see a lot of value in getting it merged. Thank you.

@Byron Byron self-requested a review July 20, 2023 06:36
@joshtriplett
Member

Whether the specification supports it or not, in many cases I think people will be surprised if they encounter a gzip stream with multiple files in it. I think that people typically think of gzip as a single-file format, and combine it with things like tar when storing multiple files.

So, I don't think we should go as far as saying that most people should use MultiGzDecoder. I think it's perfectly fine to adjust the documentation to say that it's allowed, and explain the distinction, but I think we should actually explain the distinction and the common usage rather than presenting MultiGzDecoder as the most common case.

src/gz/write.rs Outdated
/// bioinformatics, for example when using the BGZF compressed data.
/// A gzip file consists of a series of "members" concatenated one after another.
/// MultiGzDecoder decodes all members of a file, while [GzDecoder] will only decode
/// the first member. MultiGzDecoder is preferable in most cases.
Member

Suggested change
/// the first member. MultiGzDecoder is preferable in most cases.
/// the first member. Many gzip files in the wild will typically have one member,
/// but not all. Use MultiGzDecoder if you need to handle multiple members, or
/// [GzDecoder] if you know you only need to handle a single member.

Member

@Byron Byron Jul 21, 2023

I have applied this change here and to the other 2 places where the same text appears.

Edit: I reverted the change as it is contended; I didn't read the messages below until now.

src/gz/write.rs Outdated
@@ -373,17 +373,16 @@ impl<W: Read + Write> Read for GzDecoder<W> {
}
}

/// A gzip streaming decoder that decodes all members of a multistream
/// A gzip streaming decoder that decodes a full [gzip file].
Member

Suggested change
/// A gzip streaming decoder that decodes a full [gzip file].
/// A gzip streaming decoder that decodes a [gzip file] with multiple members.

src/gz/write.rs Outdated
Comment on lines 169 to 170
/// A decoder for a single member of a gzip file. Prefer [MultiGzDecoder] for
/// most uses.
Member

Suggested change
/// A decoder for a single member of a gzip file. Prefer [MultiGzDecoder] for
/// most uses.
/// A decoder for a gzip file with a single member.

I'd agree that we shouldn't just present this as a decoder for gzip in general, and should capture the distinction in the summary, but I'd stop short of telling people to prefer MultiGzDecoder.

///
/// This structure exposes a [`Write`] interface that will emit uncompressed data
/// to the underlying writer `W`.
/// Use [`MultiGzDecoder`] if your file has multiple streams.
///
Member

@joshtriplett joshtriplett Jul 21, 2023

Suggested change
///
///
/// This decoder only handles gzipped data with a single stream.
/// Use [`MultiGzDecoder`] for gzipped data with multiple streams.
///

src/gz/read.rs Outdated
/// for a chunk of input data.
/// A gzip file consists of a series of "members" concatenated one after another.
/// MultiGzDecoder decodes all members of a file, while [GzDecoder] will only decode
/// the first member. MultiGzDecoder is preferable in most cases.
Member

@joshtriplett joshtriplett Jul 21, 2023

Suggested change
/// the first member. MultiGzDecoder is preferable in most cases.
/// the first member. Many gzip files in the wild will typically have one member,
/// but not all. Use MultiGzDecoder if you need to handle multiple members, or
/// [GzDecoder] if you know you only need to handle a single member.

/// for a chunk of input data.
/// A gzip file consists of a series of "members" concatenated one after another.
/// MultiGzDecoder decodes all members of a file, while [GzDecoder] will only decode
/// the first member. MultiGzDecoder is preferable in most cases.
Member

Suggested change
/// the first member. MultiGzDecoder is preferable in most cases.
/// the first member. Many gzip files in the wild will typically have one member,
/// but not all. Use MultiGzDecoder if you need to handle multiple members, or
/// [GzDecoder] if you know you only need to handle a single member.

Comment on lines 170 to 171
/// A decoder for a single member of a gzip file. Prefer [MultiGzDecoder] for
/// most uses.
Member

Suggested change
/// A decoder for a single member of a gzip file. Prefer [MultiGzDecoder] for
/// most uses.
/// A decoder for a gzip file with a single member.

I'd agree that we shouldn't just present this as a decoder for gzip in general, and should capture the distinction in the summary, but I'd stop short of telling people to prefer MultiGzDecoder.

src/gz/read.rs Outdated
Comment on lines 93 to 94
/// A decoder for a single member of a gzip file. Prefer [MultiGzDecoder] for
/// most uses.
Member

Suggested change
/// A decoder for a single member of a gzip file. Prefer [MultiGzDecoder] for
/// most uses.
/// A decoder for a gzip file with a single member.

I'd agree that we shouldn't just present this as a decoder for gzip in general, and should capture the distinction in the summary, but I'd stop short of telling people to prefer MultiGzDecoder.

@@ -397,20 +397,16 @@ impl<R: BufRead + Write> Write for GzDecoder<R> {
}
}

/// A gzip streaming decoder that decodes all members of a multistream
/// A gzip streaming decoder that decodes a complete [gzip file].
Member

Suggested change
/// A gzip streaming decoder that decodes a complete [gzip file].
/// A gzip streaming decoder that decodes a [gzip file] with multiple members.

src/gz/read.rs Outdated
@@ -180,21 +178,16 @@ impl<R: Read + Write> Write for GzDecoder<R> {
}
}

/// A gzip streaming decoder that decodes all members of a multistream
/// A gzip streaming decoder that decodes a full [gzip file].
Member

Suggested change
/// A gzip streaming decoder that decodes a full [gzip file].
/// A gzip streaming decoder that decodes a [gzip file] with multiple members.

///
/// This structure exposes a [`Read`] interface that will consume compressed
/// data from the underlying reader and emit uncompressed data.
/// Use [`MultiGzDecoder`] if your file has multiple streams.
///
/// [`Read`]: https://doc.rust-lang.org/std/io/trait.Read.html
///
Member

Suggested change
///
///
/// This decoder only handles gzipped data with a single stream.
/// Use [`MultiGzDecoder`] for gzipped data with multiple streams.
///

///
/// This structure consumes a [`BufRead`] interface, reading compressed data
/// from the underlying reader, and emitting uncompressed data.
/// Use [`MultiGzDecoder`] if your file has multiple streams.
///
Member

Suggested change
///
///
/// This decoder only handles gzipped data with a single stream.
/// Use [`MultiGzDecoder`] for gzipped data with multiple streams.
///

Contributor

Whatever the final text, it should stick to the RFC terminology of "member" rather than using "stream" in some places.

@joshtriplett
Member

joshtriplett commented Jul 21, 2023

I added some suggestions implementing my comment #324 (comment) .

I am not attached to exact wording here, feel free to wordsmith. Just going for the general spirit of documenting the situation without pushing people towards MultiGzDecoder.

@workingjubilee
Member

Is there a disadvantage to using MultiGzDecoder when the stream only has a single file?

@workingjubilee
Member

I agree with jsha: the evidence that Rust programs which do not adhere to this advisory will drop data on the floor is evidence enough that we should in fact recommend that MultiGzDecoder be used. While not memory unsoundness, Rust programmers often are... irked when they find out an interface that they were using sends half or more of their data to /dev/null.

@jongiddy
Contributor

There is a difference in behavior that should be noted. MultiGzDecoder behaves differently for a gzip member embedded in other data. If there is non-gzip data following the gzip data, GzDecoder succeeds and MultiGzDecoder fails.

As an example, for the write versions, GzDecoder returns the number of bytes consumed, leaving any remaining bytes for the caller to handle. MultiGzDecoder returns an Err since it expects more gzip data.

Correct behavior depends on knowledge of the data. Is it only gzip data and may have multiple members, or is it a stream of data elements that contains gzipped data? HTTP clients have a real problem here as some servers send multi-member gzip responses and some other servers send additional data after the gzip-encoded content. There is probably a server that does both.

That is why it is legitimate to have two types of decoder, and it is the caller who needs to choose which is most appropriate for their data.
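
The two outcomes can be sketched with a deliberately simplified stand-in for gzip framing (a two-byte magic number, a one-byte length, then the payload). This is neither real gzip nor flate2's code; `decode_one` and `decode_all` are hypothetical helpers that only mimic the behaviors described above: the single-member decoder reports how many bytes it consumed and leaves the rest to the caller, while the multi-member decoder treats anything that is not another member as an error.

```rust
// Simplified stand-in for gzip member framing: [0x1f, 0x8b, len, payload...].
// Real gzip members carry a longer header, deflate data, and a CRC trailer.
const MAGIC: [u8; 2] = [0x1f, 0x8b];

/// Models the single-member decoder: decode one member, report bytes consumed.
fn decode_one(input: &[u8]) -> Result<(Vec<u8>, usize), String> {
    if input.len() < 3 || !input.starts_with(&MAGIC) {
        return Err("expected a member header".to_string());
    }
    let end = 3 + input[2] as usize;
    if input.len() < end {
        return Err("truncated member".to_string());
    }
    Ok((input[3..end].to_vec(), end))
}

/// Models the multi-member decoder: everything up to EOF must be members.
fn decode_all(mut input: &[u8]) -> Result<Vec<u8>, String> {
    let mut out = Vec::new();
    while !input.is_empty() {
        let (payload, used) = decode_one(input)?;
        out.extend_from_slice(&payload);
        input = &input[used..];
    }
    Ok(out)
}

fn main() {
    // One member followed by bytes that are not a member.
    let mut data = vec![0x1f, 0x8b, 2, b'h', b'i'];
    data.extend_from_slice(b"TRAILER");

    let (payload, used) = decode_one(&data).unwrap();
    assert_eq!(payload, b"hi".to_vec());
    // The caller decides what to do with the remaining bytes.
    assert_eq!(data[used..].to_vec(), b"TRAILER".to_vec());
    // Multi-member decoding rejects the trailer.
    assert!(decode_all(&data).is_err());
}
```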

@Byron
Member

Byron commented Jul 21, 2023

After having read this I feel that instead of nudging towards a particular implementation, it would be preferable to teach about the concept of "members" on the front-page while providing some of the examples we have seen here to explain what this can mean for an application. From there, users should be able to make the choice for their particular application.

For this PR this could mean one more section in the lib.rs documentation, along with the explicit lack of preferences in the wording for the documentation of Gz and MultiGz variants of types.

If this sounds feasible, give this comment a thumbs up. Otherwise, you could thumbs down (just to have an easy count) along with suggestions on what to do instead. Thanks everyone!

@joshtriplett
Member

joshtriplett commented Jul 21, 2023

(FWIW, I withdraw some of the github suggestions I left, in favor of what's summarized in this comment.)

@jongiddy wrote:

MultiGzDecoder behaves differently for a gzip member embedded in other data. If there is non-gzip data following the gzip data, GzDecoder succeeds and MultiGzDecoder fails.

That's one of the reasons, yeah. We should document that as a reason people should consider when choosing what to use.

Also, as @jsha noted, single-member decoding seems to be what many browsers and web clients do, and it'll be surprising if you get different results in different tools. Especially if the subsequent members are silently appended when they may be intended to be different "files".

It'd be worth coordinating with browser vendors to find out whether they have any practical considerations here (e.g. some hard-won knowledge about what happens in the wild). Are there some Firefox or Curl developers in the relevant area that we could poke?

@jsha wrote:

If we were designing from scratch, I would argue that the library shouldn't have a single-stream decoder that wraps Read / BufRead, because of the likelihood of silently dropping data. Instead, to support use cases that want to know about stream boundaries, there should be a secondary API that yields an Iterator<Item = GzipMember>,

Yeah, I'd agree with this. We could still add that API, which seems much more reasonable than MultiGzDecoder since it doesn't silently traverse the boundaries between files. And that API seems more reasonable to steer people towards, because it doesn't make the same tradeoffs of not being able to continue reading. In particular, if we're careful, you could iterate over the first member and then read the underlying stream for non-gzip data (without any issue at the boundary). (The iterator type could have a "give me the rest of the data as a Read/BufRead instance" method that consumes the iterator.) And conversely, we could have a method on the iterator that consumes the iterator and gives you a single stream but errors if there's more data afterwards, so that when people aren't expecting appended data they can check that assumption.
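
As a rough illustration of that API shape, here is a std-only sketch. Everything in it is hypothetical: `Members`, `into_rest`, and `into_single` do not exist in flate2, the members use a simplified stand-in framing (two magic bytes, a one-byte length, then the payload) rather than real gzip, and error reporting for malformed members is left out.

```rust
// Hypothetical API shape only; none of these types exist in flate2.
// Members use a simplified stand-in framing: [0x1f, 0x8b, len, payload...].

/// Iterator over the decoded payloads of consecutive members.
struct Members<'a> {
    rest: &'a [u8],
}

impl<'a> Members<'a> {
    fn new(input: &'a [u8]) -> Self {
        Members { rest: input }
    }

    /// Consume the iterator and hand back the bytes not yet decoded.
    fn into_rest(self) -> &'a [u8] {
        self.rest
    }

    /// Consume the iterator, expecting exactly one member and nothing after it.
    fn into_single(mut self) -> Result<Vec<u8>, String> {
        let first = self.next().ok_or("no member found")?;
        if self.rest.is_empty() {
            Ok(first)
        } else {
            Err("unexpected data after the first member".to_string())
        }
    }
}

impl<'a> Iterator for Members<'a> {
    type Item = Vec<u8>;

    /// Yields the next member, or `None` at EOF or at bytes that do not form
    /// a complete member.
    fn next(&mut self) -> Option<Vec<u8>> {
        let input = self.rest;
        if input.len() < 3 || !input.starts_with(&[0x1f, 0x8b]) {
            return None;
        }
        let end = 3 + input[2] as usize;
        if input.len() < end {
            return None;
        }
        self.rest = &input[end..];
        Some(input[3..end].to_vec())
    }
}

fn main() {
    // Two members followed by non-member bytes.
    let mut data = vec![0x1f, 0x8b, 1, b'a', 0x1f, 0x8b, 1, b'b'];
    data.extend_from_slice(b"tail");

    // Read the first member, then keep the remaining bytes untouched.
    let mut members = Members::new(&data);
    assert_eq!(members.next(), Some(b"a".to_vec()));
    assert_eq!(members.into_rest().to_vec(), data[4..].to_vec());

    // Expecting a single member surfaces the extra data as an error.
    assert!(Members::new(&data).into_single().is_err());
}
```

The point of the sketch is that member boundaries become visible to the caller, who has to choose between continuing, taking the rest of the input, or asserting that nothing follows.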

Summarizing:

  • Could we check with Firefox and/or Curl developers to find out if there's practical knowledge here that we're lacking? Or, are there open bugs, where knowledge like "this is a bug but here's why we can't fix it" or "this is a bug and we should actually fix it but we haven't yet" might live? If they have specific hard-won considerations about data in the wild, I'd be happy to defer to those. But the fact that multiple pieces of software out there don't decode multiple streams by default makes me hesitant to diverge.
  • In the meantime, it seems entirely reasonable for the documentation to document, at length, the considerations that should go into picking between the options we have. Perhaps we could explain this in detail at the top level documentation of the crate, and then link that from all the various decoders to avoid duplication? This includes explaining the single-stream/multi-stream issue, the tradeoffs about being able to have non-gzip data after one gzip stream, the fact that the single-stream decoder will ignore a subsequent gzip stream, the fact that many pieces of software don't seem to decode multiple streams...
  • We might be able to have a better API that better encourages people to explicitly handle this consideration. I don't think we should block merging better documentation until we have a better API, but I do think that having a better API would make the issue visible in the type system so that people need to handle it.

@workingjubilee
Member

Hearing there are common use-cases that combine a gz-encoded stream with something else does seem worth thinking about. It feels like people should be introduced to the idea that if they are handling a simple "gz file" that they probably should just handle it with MultiGzDecoder but that if they are handling a more complex encoding they might need to do something more complex and that is where GzDecoder comes into play. Ironically, it is the "simpler" version that is used for the "complex" cases, which violates many intuitions.

@jongiddy
Contributor

Exactly, and it is also the case that multi-member gzip files are uncommon (outside of bioinformatics), so GzDecoder works for the common use case most of the time, which makes it unexpected when it doesn't.

I wonder whether it is worth introducing new types GzipDecoder and GzipMemberDecoder that alias MultiGzDecoder and GzDecoder and then deprecate the Gz types? On one hand it fixes the simple vs complex problem, on the other it might just add more confusion.

@workingjubilee
Member

Maybe in the future? It seems like it'd just add more confusion right now. Improving docs and adding the hypothetical iterator seems like the "way to go" in terms of cleaning up the API, probably.

@jongiddy
Contributor

The iterator idea is a cool abstraction of the problem but not really needed by users of this crate. Almost no-one cares about the individual members, they just want all the uncompressed data from the gzip data whether or not it is multi-member and whether or not it is embedded in other data. The one exception of which I am aware is the BGZF format, where an index is used to seek to the correct gzip member and then uncompress just that one member. Neither benefits from the iterator abstraction.

@workingjubilee
Member

I don't disagree that most users will basically go .iter().collect::<String>() or whatever, but it was also feature requested in #192 for "My data is HUGE! Please let me lazily iterate it!" use-cases.

@the8472
Member

the8472 commented Jul 22, 2023

The iterator idea is a cool abstraction of the problem but not really needed by users of this crate. Almost no-one cares about the individual members, they just want all the uncompressed data from the gzip data whether or not it is multi-member and whether or not it is embedded in other data.

As a low-level crate it still makes sense to expose the basic building blocks for those who need it and then provide an abstraction for the most common use-cases.

@workingjubilee
Member

along with the explicit lack of preferences in the wording for the documentation of Gz and MultiGz variants of types.

If this sounds feasible, give this comment a thumbs up. Otherwise, you could thumbs down (just to have an easy count) along with suggestions on what to do instead. Thanks everyone!

Altogether, for this PR in particular, I'm in favor of this, modulo "we shouldn't be shy about saying that MultiGzDecoder is the simpler option for a 'basic' file". 👍

Could we check with Firefox and/or Curl developers to find out if there's practical knowledge here that we're lacking? Or, are there open bugs, where knowledge like "this is a bug but here's why we can't fix it" or "this is a bug and we should actually fix it but we haven't yet" might live? If they have specific hard-won considerations about data in the wild, I'd be happy to defer to those. But the fact that multiple pieces of software out there don't decode multiple streams by default makes me hesitant to diverge.

Let's make this a separate issue, but I suspect this might be a case of "EXTREMELY pedantic adherence to a specific RFC's spec in a way that is unnecessarily underachieving".

@workingjubilee
Member

Also see:

This issue is an extremely common papercut.

Byron and others added 2 commits July 23, 2023 15:21
I also added a reference to the general section about the differences
in the crate documentation.

Co-Authored-By: Josh Triplett <josh@joshtriplett.org>
@Byron
Member

Byron commented Jul 23, 2023

I have added a new top-level section to explain the intricacies of the gzip file format. Further I did my best to integrate all of Josh's suggestions, while also not applying a new paragraph if it appeared to say the same thing as a paragraph above it.

I particularly invite a thorough review of the new top-level section.

Contributor Author

@jsha jsha left a comment

This is looking great. Thanks for all the thoughtful conversation and feedback, all. One of the things it's clarified is that the main thing that winds up being risky / surprising is that the single-member variants drop data on the floor; so we should explicitly document that fact on those variants. I'll write up another commit with that change plus some minor tweaks that I just commented, and push that shortly.

src/gz/write.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
- Use relative paths to link to the introduction.
- Use consistent language across {Read,BufRead,Write}{Gz,MultiGz}Decoder.
- Use `member` rather than `stream`.
- Document what happens to unused data for `Gz` variants.
@jsha
Contributor Author

jsha commented Jul 25, 2023

Okay, pushed a batch of changes. Sorry they wound up bigger than I intended; I was hoping to converge piecemeal, but wound up wanting to try and get all six places using the same language.

Could we check with Firefox and/or Curl developers to find out if there's practical knowledge here that we're lacking?

This is a good idea. I'm not likely to get it done soon, but I think it would be really informative and interesting.

To make sure I got it all right, copy-pasting from the generated docs for {Read,BufRead,Write}{Gz,MultiGz}Decoder:

bufread::GzDecoder

A decoder for the first member of a gzip file.

This structure exposes a BufRead interface, reading compressed data from the underlying reader, and emitting uncompressed data.

After reading the first member of a gzip file (which is often, but not always, the only member), this reader will return Ok(0) even if there are more bytes available in the underlying reader. If you want to be sure not to drop bytes on the floor, call into_inner() after Ok(0) to recover the underlying reader.

To handle gzip files that may have multiple members, see MultiGzDecoder or read more in the introduction.

read::GzDecoder

A decoder for the first member of a gzip file.

This structure exposes a Read interface that will consume compressed data from the underlying reader and emit uncompressed data.

After reading the first member of a gzip file (which is often, but not always, the only member), this reader will return Ok(0) even if there are more bytes available in the underlying reader. If you want to be sure not to drop bytes on the floor, call into_inner() after Ok(0) to recover the underlying reader.

To handle gzip files that may have multiple members, see MultiGzDecoder or read more in the introduction.

write::GzDecoder

A decoder for the first member of a gzip file.

This structure exposes a Write interface, receiving compressed data and writing uncompressed data to the underlying writer.

After decoding the first member of a gzip file, this writer will return XXX to all subsequent writes.

To handle gzip files that may have multiple members, see MultiGzDecoder or read more in the introduction.

bufread::MultiGzDecoder

A gzip streaming decoder that decodes a gzip file that may have multiple members.

This structure exposes a BufRead interface that will consume compressed data from the underlying reader and emit uncompressed data.

A gzip file consists of a series of members concatenated one after another. MultiGzDecoder decodes all members of a file and returns Ok(0) once the underlying reader does.

To handle members separately, see GzDecoder or read more in the introduction.

read::MultiGzDecoder

A gzip streaming decoder that decodes a gzip file that may have multiple members.

This structure exposes a Read interface that will consume compressed data from the underlying reader and emit uncompressed data.

A gzip file consists of a series of members concatenated one after another. MultiGzDecoder decodes all members of a file and returns Ok(0) once the underlying reader does.

To handle members separately, see GzDecoder or read more in the introduction.

write::MultiGzDecoder

A gzip streaming decoder that decodes a gzip file with multiple members.

This structure exposes a Write interface that will consume compressed data and write uncompressed data to the underlying writer.

A gzip file consists of a series of members concatenated one after another. MultiGzDecoder decodes all members of a file and writes them to the underlying writer one after another.

To handle members separately, see GzDecoder or read more in the introduction.

Member

@Byron Byron left a comment

This looks great to me, thanks everyone for their help and contributions!

I'd be fine merging this PR and probably will do so if there was silence here for a week or so, or nobody does it before me 😁.

src/gz/bufread.rs Outdated Show resolved Hide resolved
src/gz/bufread.rs Outdated Show resolved Hide resolved
src/gz/bufread.rs Outdated Show resolved Hide resolved
@@ -90,13 +90,22 @@ impl<R: Read + Write> Write for GzEncoder<R> {
}
}

/// A gzip streaming decoder
/// A decoder for the first member of a [gzip file].
Contributor

Same changes as for bufread

src/gz/write.rs Outdated Show resolved Hide resolved
src/gz/write.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated
Comment on lines 80 to 85
//! `gunzip`, and `zcat` command line tools.
//!
//! Chrome and Firefox appear to implement behavior like `GzDecoder`, ignoring data
//! after the first member. `curl` appears to implement behavior somewhat like
//! `GzDecoder`, only decoding the first member, but emitting an error if there is
//! data after the first member, whether or not it is gzip data.
Contributor

This is interesting for determining common practice, but the documentation should describe flate2 behavior, not that of other tools.

Member

I agree and will remove the last paragraph.

@jongiddy
Contributor

We can wordsmith endlessly, so these are just suggestions that would clarify things for me. The only one that definitely needs to be changed is the instance that currently says writer will return XXX.

src/gz/read.rs Outdated Show resolved Hide resolved
@Byron
Member

Byron commented Jul 30, 2023

Thanks a lot, @jongiddy , for pushing this PR over the finishing line.

I have applied all of your suggestions and followed up on all comments as well as I could. Given the massive size of this PR, working with it doesn't get easier and I would like to make the cut here by merging what we have, and invite those interested to double-check what's there to possibly submit follow-up PRs for fixes.

I for one learned a lot here and believe this PR represents a substantial improvement to the docs that will also help users to avoid these member-related pitfalls (along with pitfalls related to GzDecoder reading a little bit ahead of the underlying stream so one wants to use the BufRead implementation to see these nonetheless).

@Byron Byron merged commit 956397a into rust-lang:main Jul 30, 2023
10 checks passed
bors added a commit to rust-lang/cargo that referenced this pull request Sep 1, 2023
chore(deps): update compatible

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [anyhow](https://togithub.com/dtolnay/anyhow) | workspace.dependencies | patch | `1.0.72` -> `1.0.75` |
| [base64](https://togithub.com/marshallpierce/rust-base64) | workspace.dependencies | patch | `0.21.2` -> `0.21.3` |
| [bytesize](https://togithub.com/hyunsik/bytesize) | workspace.dependencies | minor | `1.2` -> `1.3` |
| [clap](https://togithub.com/clap-rs/clap) | workspace.dependencies | minor | `4.3.23` -> `4.4.2` |
| [filetime](https://togithub.com/alexcrichton/filetime) | workspace.dependencies | patch | `0.2.21` -> `0.2.22` |
| [flate2](https://togithub.com/rust-lang/flate2-rs) | workspace.dependencies | patch | `1.0.26` -> `1.0.27` |
| [memchr](https://togithub.com/BurntSushi/memchr) | workspace.dependencies | minor | `2.5.0` -> `2.6.2` |
| [openssl](https://togithub.com/sfackler/rust-openssl) | workspace.dependencies | patch | `0.10.55` -> `0.10.57` |
| [serde-untagged](https://togithub.com/dtolnay/serde-untagged) | workspace.dependencies | patch | `0.1.0` -> `0.1.1` |
| [serde_json](https://togithub.com/serde-rs/json) | workspace.dependencies | patch | `1.0.104` -> `1.0.105` |
| [snapbox](https://togithub.com/assert-rs/trycmd/tree/main/crates/snapbox) ([source](https://togithub.com/assert-rs/trycmd)) | workspace.dependencies | patch | `0.4.11` -> `0.4.12` |
| [syn](https://togithub.com/dtolnay/syn) | workspace.dependencies | patch | `2.0.28` -> `2.0.29` |
| [tar](https://togithub.com/alexcrichton/tar-rs) | workspace.dependencies | patch | `0.4.39` -> `0.4.40` |
| [tempfile](https://stebalien.com/projects/tempfile-rs/) ([source](https://togithub.com/Stebalien/tempfile)) | workspace.dependencies | minor | `3.7.0` -> `3.8.0` |
| [thiserror](https://togithub.com/dtolnay/thiserror) | workspace.dependencies | patch | `1.0.44` -> `1.0.47` |
| [unicase](https://togithub.com/seanmonstar/unicase) | workspace.dependencies | minor | `2.6.0` -> `2.7.0` |
| [url](https://togithub.com/servo/rust-url) | workspace.dependencies | patch | `2.4.0` -> `2.4.1` |

---

### Release Notes

<details>
<summary>dtolnay/anyhow (anyhow)</summary>

### [`v1.0.75`](https://togithub.com/dtolnay/anyhow/releases/tag/1.0.75)

[Compare Source](https://togithub.com/dtolnay/anyhow/compare/1.0.74...1.0.75)

-   Partially work around rust-analyzer bug (rust-lang/rust-analyzer#9911)

### [`v1.0.74`](https://togithub.com/dtolnay/anyhow/releases/tag/1.0.74)

[Compare Source](https://togithub.com/dtolnay/anyhow/compare/1.0.73...1.0.74)

-   Add bootstrap workaround to allow rustc to depend on anyhow ([#320](https://togithub.com/dtolnay/anyhow/issues/320), thanks [@RalfJung](https://togithub.com/RalfJung))

### [`v1.0.73`](https://togithub.com/dtolnay/anyhow/releases/tag/1.0.73)

[Compare Source](https://togithub.com/dtolnay/anyhow/compare/1.0.72...1.0.73)

-   Update backtrace support to nightly's new Error::provide API (rust-lang/rust#113464, [#319](https://togithub.com/dtolnay/anyhow/issues/319))

</details>

<details>
<summary>marshallpierce/rust-base64 (base64)</summary>

### [`v0.21.3`](https://togithub.com/marshallpierce/rust-base64/blob/HEAD/RELEASE-NOTES.md#0213)

[Compare Source](https://togithub.com/marshallpierce/rust-base64/compare/v0.21.2...v0.21.3)

-   Implement `source` instead of `cause` on Error types
-   Roll back MSRV to 1.48.0 so Debian can continue to live in a time warp
-   Slightly faster chunked encoding for short inputs
-   Decrease binary size

</details>

<details>
<summary>hyunsik/bytesize (bytesize)</summary>

### [`v1.3.0`](https://togithub.com/hyunsik/bytesize/releases/tag/v1.3.0): Release 1.3.0

[Compare Source](https://togithub.com/hyunsik/bytesize/compare/v1.2.0...v1.3.0)

#### Changes

-   Improved performance by eliminating String creation by utilizing the original `&str` slice [#31](https://togithub.com/hyunsik/bytesize/issues/31) ([@ChanTsune](https://togithub.com/ChanTsune))

</details>

<details>
<summary>clap-rs/clap (clap)</summary>

### [`v4.4.2`](https://togithub.com/clap-rs/clap/blob/HEAD/CHANGELOG.md#442---2023-08-31)

[Compare Source](https://togithub.com/clap-rs/clap/compare/v4.4.1...v4.4.2)

##### Performance

-   Improve build times by removing `once_cell` dependency

### [`v4.4.1`](https://togithub.com/clap-rs/clap/blob/HEAD/CHANGELOG.md#441---2023-08-28)

[Compare Source](https://togithub.com/clap-rs/clap/compare/v4.4.0...v4.4.1)

##### Features

-   Stabilize `Command::styles`

### [`v4.4.0`](https://togithub.com/clap-rs/clap/blob/HEAD/CHANGELOG.md#440---2023-08-24)

[Compare Source](https://togithub.com/clap-rs/clap/compare/v4.3.24...v4.4.0)

##### Compatibility

-   Update MSRV to 1.70.0

### [`v4.3.24`](https://togithub.com/clap-rs/clap/blob/HEAD/CHANGELOG.md#4324---2023-08-23)

[Compare Source](https://togithub.com/clap-rs/clap/compare/v4.3.23...v4.3.24)

##### Fixes

-   Ensure column padding is preserved in `--help` with custom templates

</details>

<details>
<summary>alexcrichton/filetime (filetime)</summary>

### [`v0.2.22`](https://togithub.com/alexcrichton/filetime/compare/0.2.21...0.2.22)

[Compare Source](https://togithub.com/alexcrichton/filetime/compare/0.2.21...0.2.22)

</details>

<details>
<summary>rust-lang/flate2-rs (flate2)</summary>

### [`v1.0.27`](https://togithub.com/rust-lang/flate2-rs/releases/tag/1.0.27)

[Compare Source](https://togithub.com/rust-lang/flate2-rs/compare/1.0.26...1.0.27)

#### What's Changed

-   Move GzHeader into GzState by [@jongiddy](https://togithub.com/jongiddy) in rust-lang/flate2-rs#344
-   Move blocked_partial_header_read test to read module by [@jongiddy](https://togithub.com/jongiddy) in rust-lang/flate2-rs#345
-   Move gzip header parsing out of bufread module by [@jongiddy](https://togithub.com/jongiddy) in rust-lang/flate2-rs#346
-   Fix a comment on the `Compression` struct by [@JohnTitor](https://togithub.com/JohnTitor) in rust-lang/flate2-rs#351
-   Add notes about multiple streams to `GzDecoder` by [@JohnTitor](https://togithub.com/JohnTitor) in rust-lang/flate2-rs#347
-   better error message when compiling with `--no-default-features` or `default-features = false` by [@Byron](https://togithub.com/Byron) in rust-lang/flate2-rs#360
-   Fix Read encoder examples by [@markgoddard](https://togithub.com/markgoddard) in rust-lang/flate2-rs#356
-   Add CIFuzz Github action by [@DavidKorczynski](https://togithub.com/DavidKorczynski) in rust-lang/flate2-rs#326
-   Fix GzDecoder Write partial filenames and comments by [@jongiddy](https://togithub.com/jongiddy) in rust-lang/flate2-rs#323
-   Fix header CRC calculation of trailing zeros by [@jongiddy](https://togithub.com/jongiddy) in rust-lang/flate2-rs#363
-   Fix broken link on README.md by [@wcampbell0x2a](https://togithub.com/wcampbell0x2a) in rust-lang/flate2-rs#366
-   Recommend MultiGzDecoder over GzDecoder in docs by [@jsha](https://togithub.com/jsha) in rust-lang/flate2-rs#324
-   Add functionality for custom (de)compress instances by [@PierreV23](https://togithub.com/PierreV23) in rust-lang/flate2-rs#361
-   Add maintenance document by [@Byron](https://togithub.com/Byron) in rust-lang/flate2-rs#362
-   Document that `read::GzDecoder` consumes bytes after end of gzip by [@jongiddy](https://togithub.com/jongiddy) in rust-lang/flate2-rs#367
-   prepare 1.0.27 release by [@Byron](https://togithub.com/Byron) in rust-lang/flate2-rs#369

#### New Contributors

-   [@Byron](https://togithub.com/Byron) made their first contribution in rust-lang/flate2-rs#360
-   [@markgoddard](https://togithub.com/markgoddard) made their first contribution in rust-lang/flate2-rs#356
-   [@jsha](https://togithub.com/jsha) made their first contribution in rust-lang/flate2-rs#324
-   [@PierreV23](https://togithub.com/PierreV23) made their first contribution in rust-lang/flate2-rs#361

**Full Changelog**: rust-lang/flate2-rs@1.0.26...1.0.27

</details>

<details>
<summary>BurntSushi/memchr (memchr)</summary>

### [`v2.6.2`](https://togithub.com/BurntSushi/memchr/compare/2.6.1...2.6.2)

[Compare Source](https://togithub.com/BurntSushi/memchr/compare/2.6.1...2.6.2)

### [`v2.6.1`](https://togithub.com/BurntSushi/memchr/compare/2.6.0...2.6.1)

[Compare Source](https://togithub.com/BurntSushi/memchr/compare/2.6.0...2.6.1)

### [`v2.6.0`](https://togithub.com/BurntSushi/memchr/compare/2.5.0...2.6.0)

[Compare Source](https://togithub.com/BurntSushi/memchr/compare/2.5.0...2.6.0)

</details>

<details>
<summary>sfackler/rust-openssl (openssl)</summary>

### [`v0.10.57`](https://togithub.com/sfackler/rust-openssl/releases/tag/openssl-v0.10.57)

[Compare Source](https://togithub.com/sfackler/rust-openssl/compare/openssl-v0.10.56...openssl-v0.10.57)

#### What's Changed

-   Expose chacha20\_poly1305 on LibreSSL by [@alex](https://togithub.com/alex) in sfackler/rust-openssl#2011
-   Add openssl::cipher_ctx::CipherCtx::clone by [@johntyner](https://togithub.com/johntyner) in sfackler/rust-openssl#2017
-   Add X509VerifyParam::set_email by [@dhouck](https://togithub.com/dhouck) in sfackler/rust-openssl#2018
-   Add perl-FindBin dep for fedora by [@JadedBlueEyes](https://togithub.com/JadedBlueEyes) in sfackler/rust-openssl#2023
-   Update to bitflags 2.2.1. by [@qwandor](https://togithub.com/qwandor) in sfackler/rust-openssl#1906
-   Release openssl v0.10.57 and openssl-sys v0.9.92 by [@alex](https://togithub.com/alex) in sfackler/rust-openssl#2025

#### New Contributors

-   [@johntyner](https://togithub.com/johntyner) made their first contribution in sfackler/rust-openssl#2017
-   [@dhouck](https://togithub.com/dhouck) made their first contribution in sfackler/rust-openssl#2018
-   [@JadedBlueEyes](https://togithub.com/JadedBlueEyes) made their first contribution in sfackler/rust-openssl#2023
-   [@qwandor](https://togithub.com/qwandor) made their first contribution in sfackler/rust-openssl#1906

**Full Changelog**: sfackler/rust-openssl@openssl-v0.10.56...openssl-v0.10.57

### [`v0.10.56`](https://togithub.com/sfackler/rust-openssl/releases/tag/openssl-v0.10.56): openssl v0.10.56

[Compare Source](https://togithub.com/sfackler/rust-openssl/compare/openssl-v0.10.55...openssl-v0.10.56)

</details>

<details>
<summary>dtolnay/serde-untagged (serde-untagged)</summary>

### [`v0.1.1`](https://togithub.com/dtolnay/serde-untagged/compare/0.1.0...0.1.1)

[Compare Source](https://togithub.com/dtolnay/serde-untagged/compare/0.1.0...0.1.1)

</details>

<details>
<summary>serde-rs/json (serde_json)</summary>

### [`v1.0.105`](https://togithub.com/serde-rs/json/releases/tag/v1.0.105)

[Compare Source](https://togithub.com/serde-rs/json/compare/v1.0.104...v1.0.105)

-   Support bool in map keys ([#1054](https://togithub.com/serde-rs/json/issues/1054))

</details>

<details>
<summary>assert-rs/trycmd (snapbox)</summary>

### [`v0.4.12`](https://togithub.com/assert-rs/trycmd/compare/snapbox-v0.4.11...snapbox-v0.4.12)

[Compare Source](https://togithub.com/assert-rs/trycmd/compare/snapbox-v0.4.11...snapbox-v0.4.12)

</details>

<details>
<summary>dtolnay/syn (syn)</summary>

### [`v2.0.29`](https://togithub.com/dtolnay/syn/releases/tag/2.0.29)

[Compare Source](https://togithub.com/dtolnay/syn/compare/2.0.28...2.0.29)

-   Partially work around rust-analyzer bug (rust-lang/rust-analyzer#9911)

</details>

<details>
<summary>alexcrichton/tar-rs (tar)</summary>

### [`v0.4.40`](https://togithub.com/alexcrichton/tar-rs/compare/0.4.39...0.4.40)

[Compare Source](https://togithub.com/alexcrichton/tar-rs/compare/0.4.39...0.4.40)

</details>

<details>
<summary>Stebalien/tempfile (tempfile)</summary>

### [`v3.8.0`](https://togithub.com/Stebalien/tempfile/blob/HEAD/CHANGELOG.md#380)

[Compare Source](https://togithub.com/Stebalien/tempfile/compare/v3.7.1...v3.8.0)

-   Added `with_prefix` and `with_prefix_in` to `TempDir` and `NamedTempFile` to make it easier to create temporary files/directories with nice prefixes.
-   Misc cleanups.

### [`v3.7.1`](https://togithub.com/Stebalien/tempfile/blob/HEAD/CHANGELOG.md#371)

[Compare Source](https://togithub.com/Stebalien/tempfile/compare/v3.7.0...v3.7.1)

-   Tempfile builds on haiku again.
-   Under the hood, we've switched from the unlinkat/linkat syscalls to the regular unlink/link syscalls where possible.

</details>

<details>
<summary>dtolnay/thiserror (thiserror)</summary>

### [`v1.0.47`](https://togithub.com/dtolnay/thiserror/releases/tag/1.0.47)

[Compare Source](https://togithub.com/dtolnay/thiserror/compare/1.0.46...1.0.47)

-   Work around rust-analyzer bug (rust-lang/rust-analyzer#9911)

### [`v1.0.46`](https://togithub.com/dtolnay/thiserror/releases/tag/1.0.46)

[Compare Source](https://togithub.com/dtolnay/thiserror/compare/1.0.45...1.0.46)

-   Add bootstrap workaround to allow rustc to depend on thiserror ([#248](https://togithub.com/dtolnay/thiserror/issues/248), thanks [@RalfJung](https://togithub.com/RalfJung))

### [`v1.0.45`](https://togithub.com/dtolnay/thiserror/releases/tag/1.0.45)

[Compare Source](https://togithub.com/dtolnay/thiserror/compare/1.0.44...1.0.45)

-   Update backtrace support to nightly's new Error::provide API (rust-lang/rust#113464, [#246](https://togithub.com/dtolnay/thiserror/issues/246))

</details>

<details>
<summary>seanmonstar/unicase (unicase)</summary>

### [`v2.7.0`](https://togithub.com/seanmonstar/unicase/releases/tag/v2.7.0)

[Compare Source](https://togithub.com/seanmonstar/unicase/compare/v2.6.0...v2.7.0)

#### What's Changed

-   Update to Unicode 15.0.0 by [@seanmonstar](https://togithub.com/seanmonstar) in seanmonstar/unicase#59

</details>

<details>
<summary>servo/rust-url (url)</summary>

### [`v2.4.1`](https://togithub.com/servo/rust-url/releases/tag/v2.4.1)

[Compare Source](https://togithub.com/servo/rust-url/compare/v2.4.0...v2.4.1)

##### What's Changed

-   Move debugger_visualizer tests to separate crate by [@lucacasonato](https://togithub.com/lucacasonato) in servo/rust-url#853
-   Remove obsolete badge references by [@atouchet](https://togithub.com/atouchet) in servo/rust-url#852
-   Fix trailing spaces in scheme / pathname / search setters by [@lucacasonato](https://togithub.com/lucacasonato) in servo/rust-url#848
-   fix: implement std::error::Error for data-url by [@lucacasonato](https://togithub.com/lucacasonato) in servo/rust-url#698
-   Enable the GitHub merge queue by [@mrobinson](https://togithub.com/mrobinson) in servo/rust-url#851
-   Rewrite WPT runner by [@lucacasonato](https://togithub.com/lucacasonato) in servo/rust-url#857
-   Implement std::error::Error for InvalidBase64 by [@lucacasonato](https://togithub.com/lucacasonato) in servo/rust-url#856
-   Add `--generate-link-to-definition` option when building on docs.rs by [@GuillaumeGomez](https://togithub.com/GuillaumeGomez) in servo/rust-url#858
-   Stabilize debugger_visualizer feature by [@lucacasonato](https://togithub.com/lucacasonato) in servo/rust-url#855
-   Update WPT data and expectations by [@lucacasonato](https://togithub.com/lucacasonato) in servo/rust-url#859
-   Fix no_std Support for idna by [@domenukk](https://togithub.com/domenukk) in servo/rust-url#843
-   Fix panic in set_path for file URLs by [@valenting](https://togithub.com/valenting) in servo/rust-url#865

##### New Contributors

-   [@mrobinson](https://togithub.com/mrobinson) made their first contribution in servo/rust-url#851
-   [@GuillaumeGomez](https://togithub.com/GuillaumeGomez) made their first contribution in servo/rust-url#858
-   [@domenukk](https://togithub.com/domenukk) made their first contribution in servo/rust-url#843

**Full Changelog**: servo/rust-url@v2.4.0...v2.4.1

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "before 5am on the first day of the month" (UTC), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

👻 **Immortal**: This PR will be recreated if closed unmerged. Get [config help](https://togithub.com/renovatebot/renovate/discussions) if that's undesired.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View repository job log [here](https://developer.mend.io/github/rust-lang/cargo).