Use rustc_layout_scalar_valid_range_end(usize::MAX - 1) for the index #72

jyn514 · 2020-06-14T14:14:43Z

The index is always less than the length. So even if the length is usize::MAX, the index will be at most MAX - 1 and so cannot overflow.

It would be great to tell rustc this is the case by using rustc_layout_scalar_valid_range_end(usize::MAX - 1). That would allow storing Option<index> in usize instead of needing an extra bit, which in some cases could double the size of the struct due to alignment requirements.
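For illustration only, here is a rough sketch of how the same space saving can be approximated on stable Rust without `rustc_layout_scalar_valid_range_end`. The `BoundedIndex` type is hypothetical and not part of memchr. It relies on the assumption stated above that an index is never `usize::MAX`: it stores the bitwise complement of the index in a `NonZeroUsize`, so the forbidden value `usize::MAX` maps to the zero niche.

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

/// Hypothetical wrapper (not part of memchr): an index known to be
/// strictly less than `usize::MAX`. The complement `!i` is stored, so
/// `i == usize::MAX` would map to 0, the value `NonZeroUsize` forbids.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct BoundedIndex(NonZeroUsize);

impl BoundedIndex {
    /// Returns `None` only when `i == usize::MAX`.
    fn new(i: usize) -> Option<BoundedIndex> {
        NonZeroUsize::new(!i).map(BoundedIndex)
    }

    /// Recovers the original index by complementing again.
    fn get(self) -> usize {
        !self.0.get()
    }
}

fn main() {
    // `usize` has no spare bit pattern, so `Option<usize>` needs extra space.
    assert!(size_of::<Option<usize>>() > size_of::<usize>());
    // The zero niche lets `None` fit in the same word as the index.
    assert_eq!(size_of::<Option<BoundedIndex>>(), size_of::<usize>());

    let idx = BoundedIndex::new(42).expect("42 is less than usize::MAX");
    assert_eq!(idx.get(), 42);
    assert!(BoundedIndex::new(usize::MAX).is_none());
}
```

The cost of this workaround is one extra bitwise NOT per conversion, and callers have to wrap and unwrap the index themselves, which is why a layout attribute on the returned type would be more convenient.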

Failing that (since rustc_layout_scalar_valid_range_end is unstable and likely will never be stabilized), would it be possible to document that the index is always less than usize::MAX?

BurntSushi · 2020-06-14T14:20:52Z

I guess I'm not opposed to documenting this, although it kind of seems self-evident to me? memchr returns the position of a particular byte in a slice, if it exists. If it doesn't exist, then None is returned. Since the largest possible size of a slice is usize::MAX and since memchr can never possibly return an index greater than or equal to the length of the given slice (since it would be an invalid index), it follows immediately that the value returned is guaranteed to be less than usize::MAX.
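The argument above can be sketched in code. The `naive_memchr` function below is a hypothetical standard-library stand-in with the same shape as `memchr::memchr(needle, haystack)`; it is not the crate's implementation, only a way to show the invariant on the returned index.

```rust
/// Hypothetical stand-in for `memchr::memchr`: returns the position of
/// the first occurrence of `needle` in `haystack`, or `None` if absent.
fn naive_memchr(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

fn main() {
    let haystack = b"hello world";

    if let Some(i) = naive_memchr(b'o', haystack) {
        // A returned index always refers to an element of the slice...
        assert!(i < haystack.len());
        // ...and since `len() <= usize::MAX`, it is below `usize::MAX`.
        assert!(i < usize::MAX);
        assert_eq!(haystack[i], b'o');
    }

    // A byte that does not occur yields `None` rather than a sentinel index.
    assert_eq!(naive_memchr(b'z', haystack), None);
}
```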

This commit primarily adds vectorized substring search routines in a new memmem sub-module. They were originally taken from bstr, but heavily modified to incorporate a variant of the "generic SIMD" algorithm[1]. The main highlights:

* We guarantee `O(m + n)` time complexity and constant space complexity.
* Two-Way is the primary implementation that can handle all cases.
* Vectorized variants handle a number of common cases.
* Vectorized code uses a heuristic informed by a frequency background distribution of bytes, originally devised inside the regex crate. This makes it more likely that searching will spend more time in the fast vector loops.

While adding memmem to this crate is perhaps a bit of a scope increase, I think it fits well. It also puts a core primitive, substring search, very low in the dependency DAG and therefore making it widely available. For example, it is intended to use these new routines in the regex, aho-corasick and bstr crates.

This commit does a number of other things, mainly as a result of convenience. It drastically improves test coverage for substring search (as compared to what bstr had), completely overhauls the benchmark suite to make it more comprehensive and adds `cargo fuzz` support for all API items in the crate.

Closes #58, Closes #72

[1] - http://0x80.pl/articles/simd-strfind.html#algorithm-1-generic-simd

BurntSushi closed this as completed in 1233467 Apr 30, 2021
