- Feature Name: const_char_encode_utf8
- Start Date: 2024-09-17
- RFC PR: rust-lang/rfcs#3696
- Rust Issue: rust-lang/rust#130512
char::encode_utf8 should be marked const to allow for compile-time conversions.
Since mutable references are now stable in const contexts, the implementation is trivial and requires no compiler magic.
The encode_utf8 method (in char) is currently not marked as const and is therefore unusable in scenarios that require const-compatibility.
With the recent stabilisation of const_mut_refs, implementing encode_utf8 with the current signature is trivial and would (in practice) yield no incompatibilities with existing code.
I expect that implementing this RFC, despite its limited scope, will prove useful in supporting compile-time string handling in the future.
Currently, the encode_utf8 method has the following prototype:
pub fn encode_utf8(self, dst: &mut [u8]) -> &mut str;
This would simply be marked as const:
pub const fn encode_utf8(self, dst: &mut [u8]) -> &mut str;
This is not a breaking change.
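As a minimal sketch of what the const signature permits, the following encodes a character during constant evaluation. The helper encode_const and the constant EURO are hypothetical names chosen for illustration, and the example assumes a toolchain on which both char::encode_utf8 and mutable references are usable in const contexts.

```rust
// Hypothetical helper: encodes a character at compile time into a fixed
// four-byte buffer and returns the buffer with the number of bytes used.
const fn encode_const(c: char) -> ([u8; 4], usize) {
    let mut buf = [0u8; 4];
    // Assumption: `char::encode_utf8` carries the `const` qualifier.
    let len = c.encode_utf8(&mut buf).len();
    (buf, len)
}

// Evaluated entirely at compile time.
const EURO: ([u8; 4], usize) = encode_const('€');

fn main() {
    let (bytes, len) = EURO;
    // The compile-time result matches the bytes of the string literal.
    assert_eq!(&bytes[..len], "€".as_bytes());
    println!("{:x?}", &bytes[..len]);
}
```

Existing callers are unaffected, since a const fn can still be called at run time exactly as before.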
Other than just adding the const qualifier to the function prototype, the function body would have to be changed due to some constructs currently not being supported in constant expressions.
A working implementation can be found at bjoernager/rust:const-char-encode-utf8.
Required changes are in /library/core/src/char/methods.rs.
Note that this implementation assumes const_slice_from_raw_parts_mut.
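To give a rough idea of the kind of body that is accepted in a const context, here is a hypothetical sketch. It is not the code from the reference branch: the name encode_utf8_sketch is invented, it returns the number of bytes written instead of &mut str, and it assumes a toolchain that allows mutable references in const fn.

```rust
// Hypothetical stand-in for a const-compatible encoder; not the code from the
// reference branch. Returns the number of bytes written to `dst`.
const fn encode_utf8_sketch(c: char, dst: &mut [u8]) -> usize {
    let code = c as u32;
    let len = c.len_utf8();

    // A literal message is used because the formatting machinery is not const.
    assert!(
        dst.len() >= len,
        "encode_utf8: buffer does not have enough bytes to encode code point"
    );

    // Plain indexing and bit manipulation only; no iterators or formatters.
    match len {
        1 => {
            dst[0] = code as u8;
        }
        2 => {
            dst[0] = 0xC0 | (code >> 6) as u8;
            dst[1] = 0x80 | (code & 0x3F) as u8;
        }
        3 => {
            dst[0] = 0xE0 | (code >> 12) as u8;
            dst[1] = 0x80 | ((code >> 6) & 0x3F) as u8;
            dst[2] = 0x80 | (code & 0x3F) as u8;
        }
        _ => {
            dst[0] = 0xF0 | (code >> 18) as u8;
            dst[1] = 0x80 | ((code >> 12) & 0x3F) as u8;
            dst[2] = 0x80 | ((code >> 6) & 0x3F) as u8;
            dst[3] = 0x80 | (code & 0x3F) as u8;
        }
    }
    len
}

// The sketch can be driven from a constant initialiser.
const SHARP_S: ([u8; 4], usize) = {
    let mut buf = [0u8; 4];
    let len = encode_utf8_sketch('ß', &mut buf);
    (buf, len)
};

fn main() {
    // Cross-check against the standard library at run time.
    for c in ['a', 'ß', '€', '😀'] {
        let mut ours = [0u8; 4];
        let mut std_buf = [0u8; 4];
        let len = encode_utf8_sketch(c, &mut ours);
        assert_eq!(&ours[..len], c.encode_utf8(&mut std_buf).as_bytes());
    }
    assert_eq!(&SHARP_S.0[..SHARP_S.1], "ß".as_bytes());
}
```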
Implementing this RFC at the current moment could degrade diagnostics, as the assert call in the encode_utf8_raw function relies on formatters that are non-const.
The reference implementation resolves this by instead using a generic message, although this may not be desired:
encode_utf8: buffer does not have enough bytes to encode code point
This could be changed to have the number of bytes required hard-coded, but doing so may instead sacrifice code readability.
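The hard-coded variant might look roughly like the sketch below, which selects one literal message per encoded length. The helper check_capacity and the exact message wording are hypothetical and only illustrate the readability trade-off.

```rust
// Hypothetical helper showing one literal panic message per encoded length.
// Neither the name nor the wording is taken from the reference implementation.
const fn check_capacity(c: char, available: usize) {
    match c.len_utf8() {
        1 => assert!(available >= 1, "encode_utf8: need 1 byte to encode code point"),
        2 => assert!(available >= 2, "encode_utf8: need 2 bytes to encode code point"),
        3 => assert!(available >= 3, "encode_utf8: need 3 bytes to encode code point"),
        _ => assert!(available >= 4, "encode_utf8: need 4 bytes to encode code point"),
    }
}

// Checked during constant evaluation: three bytes suffice for U+20AC.
const _: () = check_capacity('€', 3);

fn main() {
    check_capacity('a', 1);
    check_capacity('😀', 4);
}
```

Each arm repeats nearly the same text, which is the readability cost mentioned above; the messages also still omit the actual buffer length, since that value is only known dynamically.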
If the existing diagnostics are deemed to be worth more than const-compatibility, then an encode_utf8_unchecked method could be considered instead:
pub const unsafe fn encode_utf8_unchecked(self, dst: &mut [u8]) -> &mut str;
// ... or...
pub const unsafe fn encode_utf8_unchecked(self, dst: *mut u8) -> *mut str;
This function would perform the same operation but without testing the length of dst, allowing for const conversions at least in the short-term (until formatters are stabilised).
Currently none that I know of.
The problem with degraded diagnostics could be solved by allowing the formatters in question to be used in const environments. I do not know whether such a feature already exists for use by the standard library.
I suspect that having a similar decode_utf8 method may be desired.