Consider runtime CPU feature detection #514
I'm not thrilled about introducing proc-macro-generated unsafe code. At the moment we've reduced the (non-std) transitive uses of unsafe to only adler32 and CRC computations, and the former at least may plausibly be replaced by …
I think multiversioning is low-risk as far as unsafe code goes because, unlike checksum computation, it never operates on attacker-controlled values.
Assuming the performance is compelling enough, I think multiversion is fine as an optional, disabled-by-default feature. I have no reason to believe multiversion has any soundness issues, but to explain a bit why I feel slightly uncomfortable with it: it doesn't operate on attacker-controlled program inputs, which certainly reduces risk, but IMO generating the unsafe blocks via proc macros adds its own risks, or at least increases the difficulty of auditing for the remaining ones. Consider this unsafe block: you can't simply look at the immediate surrounding context to figure out whether it is OK; you need a broad understanding of the rest of the crate to convince yourself. Another approach is inspecting the output after macro expansion, but that doesn't tell you whether seemingly innocuous changes at the macro expansion site could result in different, unsound code being produced.
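For contrast, the kind of dispatch that `multiversion` generates can also be written by hand, which keeps each `unsafe` block right next to the runtime check that justifies it, so the block can be audited locally. A minimal sketch using only `std` (the function names and the summing workload are made up for illustration, not taken from this crate):

```rust
// Scalar fallback, always available on every target.
fn sum_scalar(data: &[u32]) -> u32 {
    data.iter().copied().sum()
}

// Same code, but compiled with SSE4.1 enabled so the optimizer may
// auto-vectorize it. Calling it is unsafe: the caller must have
// verified at runtime that the CPU supports SSE4.1.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.1")]
unsafe fn sum_sse41(data: &[u32]) -> u32 {
    sum_scalar(data)
}

// Safe entry point that performs the runtime dispatch.
fn sum(data: &[u32]) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse4.1") {
            // SAFETY: the feature check directly above guarantees
            // that SSE4.1 is available on this CPU.
            return unsafe { sum_sse41(data) };
        }
    }
    // Portable fallback for non-x86_64 targets or older CPUs.
    sum_scalar(data)
}

fn main() {
    assert_eq!(sum(&[1, 2, 3, 4]), 10);
}
```

The point of the hand-rolled form is that the safety argument fits in one screen; the trade-off is boilerplate per multiversioned function, which is exactly what the proc macro automates away.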
I don't have a strong opinion.
(*) I understand that auditability is not a black-and-white property of a crate, but it really helps when:
And the example linked above by @fintelia illustrates how an audit may be tricky.
I understand the concern about auditability, but I don't really see optional dependencies as a particularly good way of addressing it. We do include the standard library. Consequently it's clear that the presence of …

Runtime feature detection as in …

As a counter-example, I'd very much like to avoid dependencies such as …

I'm not in favor of optional dependencies as a solution. It's a weak mitigation in itself against the purely potential threat of …

Alternative to consider: a team of …
Google already publishes their security team's audits through …
Okay, so I've experimented with this a bit in #515. This is what the trade-offs look like:

In decoding, multiversioning gets up to 4% better end-to-end performance on large images that use the Paeth filter, on some CPUs. On CPUs that do not benefit, there is potential for a ~5% end-to-end improvement from multiversioning something other than the filters (source). I'd say we shouldn't be pursuing multiversioning in decoding while other opportunities for performance improvements are available, such as image-rs/fdeflate#31. The development effort is better spent there, not on multiversioning.

There are promising results for encoding, but not in multiversioning filters. This might be worth following up on, but AFAIK Chromium doesn't use this crate for encoding, so this will not be relevant to Skia/Chromium use cases.
I've tried compiling the entire decode+encode roundtrip with …

Baseline: https://share.firefox.dev/4dzkdW8

Code used for measurements:

```rust
use std::{fs::File, io::BufWriter};

fn main() {
    // Loop 10 times to get lots of samples and a clearer profile.
    for _i in 0..10 {
        let filename = std::env::args_os().nth(1).expect("No input file provided");
        let mut decoder = png::Decoder::new(File::open(filename).unwrap());
        // Expand paletted images into RGBA instead of simply outputting palette indices.
        decoder.set_transformations(png::Transformations::EXPAND);
        let mut reader = decoder.read_info().unwrap();
        // Allocate the output buffer.
        let mut buf = vec![0; reader.output_buffer_size()];
        // Read the next frame. An APNG might contain multiple frames.
        let info = reader.next_frame(&mut buf).unwrap();
        let mut out_info = png::Info::default();
        out_info.width = info.width;
        out_info.height = info.height;
        out_info.color_type = info.color_type;
        out_info.bit_depth = info.bit_depth;
        let out_file = File::create("/tmp/output.png").unwrap();
        let w = BufWriter::new(out_file);
        let mut encoder = png::Encoder::with_info(w, out_info).unwrap();
        encoder.set_compression(png::Compression::Fast);
        encoder.set_adaptive_filter(png::AdaptiveFilterType::Adaptive);
        let mut writer = encoder.write_header().unwrap();
        writer.write_image_data(&buf).unwrap();
    }
}
```

So it seems we won't be able to get these gains via multiversioning, or at least not with anything short of …

Therefore, given the concerns about the unsafe code generated by proc macros, I don't think there is a sufficiently strong performance benefit to justify multiversioning at this time.
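For reference, the usual way to run this kind of whole-program experiment is to raise the compiler's CPU baseline for the entire build via `RUSTFLAGS`. The exact target level used in the measurement above isn't preserved in this thread; `x86-64-v3` below is only an illustrative choice:

```shell
# Build the whole program (including dependencies) against a raised
# CPU baseline. x86-64-v3 roughly means AVX2-era CPUs.
RUSTFLAGS="-C target-cpu=x86-64-v3" cargo build --release

# List the CPU levels and names the installed compiler knows about.
rustc --print target-cpus
```

Note that unlike multiversioning, this produces a binary that will not run at all on CPUs below the chosen baseline, which is why browsers can't simply ship it.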
I've tried the `multiversion` crate on top of #513 just to see what would happen.

In unfiltering, only Paeth benefits. On stable the only important change is bpp=4 improving by 15%, while 3 and 1 are unchanged. Also 2 regresses while 6 and 8 improve, but those are for 16-bit images only, so they don't matter as much.
Explicit SIMD benefits a bit more, with +7% on 3bpp and +17% on 4, 6 and 8. Same regression on 2, but 2 is rare, so who cares.
There is no benefit to using AVX or AVX2: SSE4.1 is sufficient to get optimal performance in unfiltering. This is not surprising, since we literally never use 256-bit vectors in unfiltering. AVX might still be of use in filtering, because it does operate on 256-bit wide chunks, but we don't have any benchmarks for those at present, so I haven't tried that yet.
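For context on why Paeth dominates these numbers: every byte of a Paeth-filtered row runs the predictor below (as defined in the PNG specification), and its data-dependent branches are exactly the kind of code that wider instruction sets help with. A minimal sketch:

```rust
/// Paeth predictor from the PNG spec: picks whichever of
/// left (a), above (b), or upper-left (c) is closest to a + b - c.
fn paeth_predictor(a: i16, b: i16, c: i16) -> i16 {
    let p = a + b - c; // initial estimate
    let pa = (p - a).abs();
    let pb = (p - b).abs();
    let pc = (p - c).abs();
    // Ties break in the order a, b, c, as required by the spec.
    if pa <= pb && pa <= pc {
        a
    } else if pb <= pc {
        b
    } else {
        c
    }
}

fn main() {
    // With left=3, above=4, upper-left=2: p = 5, and "above" is closest.
    assert_eq!(paeth_predictor(3, 4, 2), 4);
    assert_eq!(paeth_predictor(0, 0, 0), 0);
}
```

Because the prediction for each byte depends on the already-unfiltered byte to its left, vectorizing this is non-trivial, which is why per-bpp specializations and feature levels matter so much in the benchmarks above.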
- Full unfiltering benchmark results on #513 (stable)
- Full unfiltering benchmark results on #512 (nightly)
The drawbacks of multiversioning are slightly increased binary size and some trivial unsafe code in a dependency, but we get to keep our `#![forbid(unsafe_code)]`.

I'm considering adding an optional feature for multiversioning.
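As a sketch of what such an optional, disabled-by-default feature could look like in `Cargo.toml` (the feature name and version number here are illustrative, not a committed design):

```toml
[features]
# Disabled by default; opts in to runtime CPU feature dispatch.
multiversioning = ["dep:multiversion"]

[dependencies]
multiversion = { version = "0.7", optional = true }
```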
@anforowicz, would this be of interest to Chromium? Right now Chromium only mandates SSE3, while we get improved performance from SSE4.1, so it should be beneficial. Are these gains meaningful? Would this kind of transitive `unsafe` be trivial enough to enable?