Don't attempt to process ANSI sequences in non-UTF8 input #1117

Merged 1 commit on Jul 2, 2022
20 changes: 18 additions & 2 deletions src/delta.rs
@@ -13,6 +13,7 @@ use crate::handlers::hunk_header::ParsedHunkHeader;
use crate::handlers::{self, merge_conflict};
use crate::paint::Painter;
use crate::style::DecorationStyle;
use crate::utils;

#[derive(Clone, Debug, PartialEq)]
pub enum State {
@@ -181,10 +182,25 @@ impl<'a> StateMachine<'a> {
    }

    fn ingest_line(&mut self, raw_line_bytes: &[u8]) {
        // TODO: retain raw_line as Cow
        self.raw_line = String::from_utf8_lossy(raw_line_bytes).to_string();
        match String::from_utf8(raw_line_bytes.to_vec()) {
            Ok(utf8) => self.ingest_line_utf8(utf8),
            Err(_) => {
                let raw_line = String::from_utf8_lossy(raw_line_bytes);
                let truncated_len = utils::round_char_boundary::floor_char_boundary(
                    &raw_line,
                    self.config.max_line_length,
                );
                self.raw_line = raw_line[..truncated_len].to_string();
                self.line = self.raw_line.clone();
            }
        }
    }

    fn ingest_line_utf8(&mut self, raw_line: String) {
        self.raw_line = raw_line;
        // When a file has \r\n line endings, git sometimes adds ANSI escape sequences between the
        // \r and \n, in which case byte_lines does not remove the \r. Remove it now.
        // TODO: Limit the number of characters we examine when looking for the \r?
        if let Some(cr_index) = self.raw_line.rfind('\r') {
            if ansi::strip_ansi_codes(&self.raw_line[cr_index + 1..]).is_empty() {
                self.raw_line = format!(
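The branching in `ingest_line` above can be illustrated in isolation. The following is a minimal sketch, not the code from this PR: the `Ingested` enum, the `ingest` function, and the explicit `max_line_length` parameter are hypothetical stand-ins for the state machine and `self.config.max_line_length`, and the boundary search uses the standard `str::is_char_boundary` rather than the helper added in this change. It shows strict UTF-8 decoding first, then a lossy decode truncated on a character boundary when decoding fails.

```rust
/// Hypothetical result type: which path a line took through ingestion.
#[derive(Debug, PartialEq)]
enum Ingested {
    /// Valid UTF-8; in the PR this goes on to the ANSI-aware processing.
    Utf8(String),
    /// Invalid UTF-8; lossily decoded and truncated, with no ANSI handling.
    LossyTruncated(String),
}

/// Largest char boundary in `s` that is <= `index` (clamped to `s.len()`).
/// Written with `str::is_char_boundary` for clarity; an assumption, not the
/// implementation used in the PR.
fn floor_char_boundary(s: &str, index: usize) -> usize {
    if index >= s.len() {
        return s.len();
    }
    let mut i = index;
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Sketch of the decode-or-truncate decision. `max_line_length` is a
/// hypothetical parameter standing in for the configured limit.
fn ingest(raw_line_bytes: &[u8], max_line_length: usize) -> Ingested {
    match String::from_utf8(raw_line_bytes.to_vec()) {
        Ok(utf8) => Ingested::Utf8(utf8),
        Err(_) => {
            // Invalid bytes become U+FFFD, which is 3 bytes long in UTF-8.
            let lossy = String::from_utf8_lossy(raw_line_bytes);
            let len = floor_char_boundary(&lossy, max_line_length);
            Ingested::LossyTruncated(lossy[..len].to_string())
        }
    }
}
```

Truncating at a floor boundary matters here because the replacement character inserted by the lossy decode is multi-byte, so slicing at an arbitrary byte index could panic.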
1 change: 1 addition & 0 deletions src/utils/mod.rs
@@ -3,4 +3,5 @@ pub mod bat;
pub mod path;
pub mod process;
pub mod regex_replacement;
pub mod round_char_boundary;
pub mod syntect;
24 changes: 24 additions & 0 deletions src/utils/round_char_boundary.rs
@@ -0,0 +1,24 @@
// Taken from https://github.com/rust-lang/rust/pull/86497
// TODO: Remove when this is in the version of the Rust standard library that delta is building
// against.

#[inline]
const fn is_utf8_char_boundary(b: u8) -> bool {
    // This is bit magic equivalent to: b < 128 || b >= 192
    (b as i8) >= -0x40
}

#[inline]
pub fn floor_char_boundary(s: &str, index: usize) -> usize {
    if index >= s.len() {
        s.len()
    } else {
        let lower_bound = index.saturating_sub(3);
        let new_index = s.as_bytes()[lower_bound..=index]
            .iter()
            .rposition(|b| is_utf8_char_boundary(*b));

        // SAFETY: we know that the character boundary will be within four bytes
        unsafe { lower_bound + new_index.unwrap_unchecked() }
    }
}
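To make the behaviour of the helper above concrete, here is a small standalone sketch. It copies the two functions from this file with one deliberate change: the `unsafe` `unwrap_unchecked` is replaced by `expect`, purely so the example stays in safe Rust. The `bit_trick_matches_range_check` function is a hypothetical addition that exhaustively compares the signed-cast trick against the range check named in the comment.

```rust
/// Same predicate as in the file above: true for ASCII bytes and for the
/// leading byte of a multi-byte sequence, false for continuation bytes.
const fn is_utf8_char_boundary(b: u8) -> bool {
    (b as i8) >= -0x40
}

/// Copy of `floor_char_boundary` using `expect` instead of
/// `unwrap_unchecked` (a swap made for this sketch only).
fn floor_char_boundary(s: &str, index: usize) -> usize {
    if index >= s.len() {
        s.len()
    } else {
        // A UTF-8 character is at most 4 bytes, so a boundary must lie in
        // the window [index - 3, index].
        let lower_bound = index.saturating_sub(3);
        let new_index = s.as_bytes()[lower_bound..=index]
            .iter()
            .rposition(|b| is_utf8_char_boundary(*b));
        lower_bound + new_index.expect("a boundary exists within four bytes")
    }
}

/// Hypothetical check: the cast trick agrees with `b < 128 || b >= 192`
/// for every possible byte value.
fn bit_trick_matches_range_check() -> bool {
    (0..=u8::MAX).all(|b| is_utf8_char_boundary(b) == (b < 128 || b >= 192))
}

/// Example input with 1-, 2-, 3- and 4-byte characters:
/// 'a' at byte 0, 'é' at 1..3, '€' at 3..6, '😀' at 6..10.
fn sample() -> &'static str {
    "aé€😀"
}
```

For `sample()`, an index that falls inside a multi-byte character is rounded down to that character's first byte, and an index at or past the end is clamped to the string length, so `&s[..floor_char_boundary(s, i)]` is always a valid slice.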