Feature Name: const_char_encode_utf8
Start Date: 2024-09-17
RFC PR: rust-lang/rfcs#3696
Rust Issue: rust-lang/rust#130512

Summary

char::encode_utf8 should be marked const to allow for compile-time conversions. Now that mutable references are stable in const contexts, the implementation would be trivial and would require no compiler magic.

Motivation

The encode_utf8 method on char is currently not marked const and therefore cannot be used in contexts that require const-compatibility.

With the recent stabilisation of const_mut_refs, implementing encode_utf8 with the current signature is trivial and would (in practice) yield no incompatibilities with existing code.

Despite its limited scope, I expect that implementing this RFC will prove useful in supporting compile-time string handling in the future.

Guide-level explanation

Currently, the encode_utf8 method has the following prototype:

pub fn encode_utf8(self, dst: &mut [u8]) -> &mut str;

The proposal is simply to mark it as const:

pub const fn encode_utf8(self, dst: &mut [u8]) -> &mut str;

This is not a breaking change.
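As an illustration of what the const qualifier would permit, the following is a minimal sketch that assumes a toolchain on which this RFC is implemented. The helper encode_char and the constant EURO are hypothetical names introduced only for this example.

```rust
/// Hypothetical helper: encodes `c` during constant evaluation and returns
/// the backing buffer together with the number of bytes that were written.
const fn encode_char(c: char) -> ([u8; 4], usize) {
    let mut buf = [0u8; 4];
    // Only compiles in a const fn if `char::encode_utf8` is itself const.
    let len = c.encode_utf8(&mut buf).len();
    (buf, len)
}

/// The UTF-8 encoding of U+20AC, computed at compile time.
const EURO: ([u8; 4], usize) = encode_char('€');

// Compile-time check: U+20AC needs three bytes in UTF-8.
const _: () = assert!(EURO.1 == 3);
```

Existing runtime callers are unaffected, since a const fn can still be called like any other function.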

Reference-level explanation

Besides adding the const qualifier to the function prototype, the function body would have to change, because it uses some constructs that are not currently supported in constant expressions.

A working implementation can be found at bjoernager/rust:const-char-encode-utf8. Required changes are in /library/core/src/char/methods.rs.

Note that this implementation assumes const_slice_from_raw_parts_mut.
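To give a rough idea of what a const-compatible body could look like, here is a sketch written as free functions. It is not the reference implementation: the names len_utf8_sketch and encode_utf8_sketch are hypothetical, and it uses split_at_mut rather than raw-parts construction, assuming a toolchain where mutable references and these slice and str helpers are usable in const fn.

```rust
/// Hypothetical helper: number of bytes needed to encode `code` in UTF-8.
const fn len_utf8_sketch(code: u32) -> usize {
    if code < 0x80 {
        1
    } else if code < 0x800 {
        2
    } else if code < 0x1_0000 {
        3
    } else {
        4
    }
}

/// Hypothetical const-compatible stand-in for `char::encode_utf8`.
pub const fn encode_utf8_sketch(c: char, dst: &mut [u8]) -> &mut str {
    let code = c as u32;
    let len = len_utf8_sketch(code);
    // A fixed message: formatting the lengths is not possible in const fn.
    assert!(
        dst.len() >= len,
        "encode_utf8: buffer does not have enough bytes to encode code point"
    );
    if len == 1 {
        dst[0] = code as u8;
    } else if len == 2 {
        dst[0] = 0xC0 | (code >> 6) as u8;
        dst[1] = 0x80 | (code & 0x3F) as u8;
    } else if len == 3 {
        dst[0] = 0xE0 | (code >> 12) as u8;
        dst[1] = 0x80 | ((code >> 6) & 0x3F) as u8;
        dst[2] = 0x80 | (code & 0x3F) as u8;
    } else {
        dst[0] = 0xF0 | (code >> 18) as u8;
        dst[1] = 0x80 | ((code >> 12) & 0x3F) as u8;
        dst[2] = 0x80 | ((code >> 6) & 0x3F) as u8;
        dst[3] = 0x80 | (code & 0x3F) as u8;
    }
    let (encoded, _rest) = dst.split_at_mut(len);
    // SAFETY: the bytes written above are the UTF-8 encoding of a valid `char`.
    unsafe { core::str::from_utf8_unchecked_mut(encoded) }
}

/// Compile-time use of the sketch: the encoded length of U+20AC.
const EURO_LEN: usize = {
    let mut buf = [0u8; 4];
    let len = encode_utf8_sketch('€', &mut buf).len();
    len
};
```

The if/else chain and plain index assignments are chosen only because they are straightforward to evaluate in a constant expression; the actual change to methods.rs may be structured differently.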

Drawbacks

Implementing this RFC at the current moment could degrade diagnostics, as the assert call in the encode_utf8_raw function relies on formatters that are non-const.

The reference implementation resolves this by instead using a generic message, although this may not be desired:

encode_utf8: buffer does not have enough bytes to encode code point

This could be changed to have the number of bytes required hard-coded, but doing so may instead sacrifice code readability.
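The hard-coded alternative might look roughly like the sketch below. The function name check_capacity_sketch and the exact message wording are assumptions made for illustration; only the general shape (one literal message per encoded length) reflects the idea described above.

```rust
/// Hypothetical length check with one hard-coded message per encoded length.
/// `required` is the number of bytes the code point needs (1 to 4) and
/// `available` is the length of the destination buffer.
const fn check_capacity_sketch(required: usize, available: usize) {
    if available >= required {
        return;
    }
    // Each arm panics with a string literal, which is allowed in const fn,
    // whereas interpolating `required` into the message is not.
    match required {
        1 => panic!("encode_utf8: buffer too small to encode code point (1 byte required)"),
        2 => panic!("encode_utf8: buffer too small to encode code point (2 bytes required)"),
        3 => panic!("encode_utf8: buffer too small to encode code point (3 bytes required)"),
        _ => panic!("encode_utf8: buffer too small to encode code point (4 bytes required)"),
    }
}

// The check can run during constant evaluation when the buffer is large enough.
const _: () = check_capacity_sketch(2, 4);
```

This recovers the required length in the message but still cannot report the actual buffer length, and the repeated arms show the readability cost.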

Rationale and alternatives

If the existing diagnostics are deemed to be worth more than const-compatibility, then an encode_utf8_unchecked method could be considered instead:

pub const unsafe fn encode_utf8_unchecked(self, dst: &mut [u8]) -> &mut str;

// ... or...

pub const unsafe fn encode_utf8_unchecked(self, dst: *mut u8) -> *mut str;

This function would perform the same operation but without testing the length of dst, allowing for const conversions at least in the short term (until formatters become usable in const contexts).

Prior art

Currently none that I know of.

Unresolved questions

The degraded diagnostics could be avoided by allowing the relevant formatters in const environments. I do not know whether such a feature already exists for use by the standard library.

Future possibilities

I suspect that having a similar decode_utf8 method may be desired.
