Commit

Add safety comment for String splice
Manishearth committed Oct 11, 2024
1 parent 37dd9c2 commit 4715a27
Showing 1 changed file with 8 additions and 0 deletions.
8 changes: 8 additions & 0 deletions tokenizers/src/tokenizer/normalizer.rs
@@ -411,8 +411,16 @@ impl NormalizedString {
             .collect::<String>();

         self.alignments.splice(n_range.clone(), alignments);
+
+        // This bounds check already happens above (`self.normalized[n_range.clone()]`), but future
+        // code could change to mutate `self` or `self.normalized` in the interim.
+        // Perform it again and hope the optimizer collapses it.
+        assert!(self.normalized.get(n_range.clone()).is_some());
         unsafe {
             self.normalized
+                // Safety: This is safe as long as we do not splice across a
+                // UTF-8 character, and we only add UTF-8 text. `normalized` is a String
+                // so the latter is trivially true, and we assert for the former above.
                 .as_mut_vec()
                 .splice(n_range, normalized.bytes());
         }
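
The pattern in this change can be tried in isolation. The following is a minimal sketch, not code from the tokenizers crate: `splice_str` and its parameters are hypothetical names chosen for illustration. It relies on `str::get` returning `None` when a range is out of bounds or does not start and end on UTF-8 character boundaries, so the assertion covers both conditions that the safety comment depends on.

```rust
use std::ops::Range;

/// Hypothetical helper (an assumption for illustration, not part of the
/// tokenizers crate): replaces the bytes in `range` of `target` with
/// `replacement`, following the same check-then-splice pattern as the diff.
fn splice_str(target: &mut String, range: Range<usize>, replacement: &str) {
    // `str::get` yields `None` if the range is out of bounds or if either
    // end is not on a UTF-8 character boundary, so this panics before any
    // unsafe code runs on a bad range.
    assert!(target.get(range.clone()).is_some());
    unsafe {
        // Safety: both ends of `range` lie on character boundaries (asserted
        // above) and `replacement` is a `&str`, so the buffer stays valid UTF-8.
        target.as_mut_vec().splice(range, replacement.bytes());
    }
}
```

For example, in `"héllo"` the character `é` occupies bytes 1..3, so calling `splice_str(&mut s, 1..3, "e")` leaves `s` equal to `"hello"`, while a range of 1..2 would cut `é` in half and trip the assertion instead of corrupting the string.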