Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Quotes within quotes of Formatter IR isn't escaped #1056

Closed
1 task done
Yash-Singh1 opened this issue Dec 5, 2023 · 11 comments · Fixed by #3036
Closed
1 task done

🐛 Quotes within quotes of Formatter IR isn't escaped #1056

Yash-Singh1 opened this issue Dec 5, 2023 · 11 comments · Fixed by #3036
Assignees
Labels
A-Formatter Area: formatter good first issue Good for newcomers S-Bug-confirmed Status: report has been confirmed as a valid bug S-Help-wanted Status: you're familiar with the code base and want to help the project

Comments

@Yash-Singh1
Copy link
Contributor

Yash-Singh1 commented Dec 5, 2023

Environment information

This is running Biomejs playground with no special options enabled.

What happened?

  1. Had a string in Biome.js playground: https://biomejs.dev/playground/?code=YQB3AGEAaQB0ACAAIgBhACIAIAA%3D
await "a" 
  1. Double quotes is not escaped or enclosing string is not single quoted (as in Prettier Formatter IR):
    Screenshot 2023-12-04 at 7 53 08 PM
  2. This also causes broken syntax highlighting

Expected result

Expected result is either the double quotes surrounding "a" are escaped:

["await \"a\";", hard_line_break]

Or single quotes used for enclosing quotes instead:

['await "a";', hard_line_break]

Code of Conduct

  • I agree to follow Biome's Code of Conduct
@ematipico ematipico added good first issue Good for newcomers S-Help-wanted Status: you're familiar with the code base and want to help the project A-Formatter Area: formatter S-Bug-confirmed Status: report has been confirmed as a valid bug labels Dec 5, 2023
@yossydev
Copy link

yossydev commented Dec 5, 2023

@ematipico
I would like to address this issue!

@yossydev
Copy link

yossydev commented Dec 14, 2023

@ematipico

Thanks for your hard work! I have a question about this issue!
I'm looking at a lot of implementations right now, and the one that needs to be fixed is https://github.com/biomejs/biome/blob/main/crates/biome_js_formatter/src/js/expressions/await_ In the :expression.rs, am I correct? 👀.

@Yash-Singh1
Copy link
Contributor Author

@ematipico Thanks for your hard work! I have a question about this issue! I'm looking at a lot of implementations right now, and the one that needs to be fixed is https://github.com/biomejs/biome/blob/main/crates/biome_js_formatter/src/js/expressions/await_ In the :expression.rs, am I correct? 👀.

The bug is around how the formatter IR is stringified in the playground regardless of what the inputted code is. I think this will be somewhere in the playground packages, so you can start by bundling that and looking at how the WASM libraries run the actual Rust code. I went along that a bit and found the

#[wasm_bindgen(js_name = getFormatterIr)]
function to be related.

@yossydev
Copy link

@Yash-Singh1
Thanks for the advice!
I expected that there must be a further conversion process in getFormatterIr, so I went deeper and found the format_node, is this related...?

pub fn format_node<L: FormatLanguage>(
root: &SyntaxNode<L::SyntaxLanguage>,
language: L,
) -> FormatResult<Formatted<L::Context>> {
tracing::trace_span!("format_node").in_scope(move || {
let (root, source_map) = match language.transform(&root.clone()) {
Some((transformed, source_map)) => {
// we don't need to insert the node back if it has the same offset
if &transformed == root {
(transformed, Some(source_map))
} else {
match root
.ancestors()
// ancestors() always returns self as the first element of the iterator.
.skip(1)
.last()
{
// current root node is the topmost node we don't need to insert the transformed node back
None => (transformed, Some(source_map)),
Some(top_root) => {
// we have to return transformed node back into subtree
let transformed_range = transformed.text_range();
let root_range = root.text_range();
let transformed_root = top_root
.replace_child(root.clone().into(), transformed.into())
// SAFETY: Calling `unwrap` is safe because we know that `root` is part of the `top_root` subtree.
.unwrap();
let transformed = transformed_root.covering_element(TextRange::new(
root_range.start(),
root_range.start() + transformed_range.len(),
));
let node = match transformed {
NodeOrToken::Node(node) => node,
NodeOrToken::Token(token) => {
// if we get a token we need to get the parent node
token.parent().unwrap_or(transformed_root)
}
};
(node, Some(source_map))
}
}
}
}
None => (root.clone(), None),
};
let context = language.create_context(&root, source_map);
let format_node = FormatRefWithRule::new(&root, L::FormatRule::default());
let mut state = FormatState::new(context);
let mut buffer = VecBuffer::new(&mut state);
write!(buffer, [format_node])?;
let mut document = Document::from(buffer.into_vec());
document.propagate_expand();
state.assert_formatted_all_tokens(&root);
let context = state.into_context();
let comments = context.comments();
comments.assert_checked_all_suppressions(&root);
comments.assert_formatted_all_comments();
Ok(Formatted::new(document, context))
})
}

@ematipico
Copy link
Member

ematipico commented Dec 14, 2023

Hi @Yash-Singh1 , apologies for let you stranded with this PR.

The issue isn't related to the playground, it's related to our internal formatter.

First, head to crates/biome_js_formatter/tests/quick_test.rs and run the quick_test test. You might need to remove the #[ignore] macro and add the -- --show-output argument if you plan to run test from the CLI. If you use a IntelliJ IDE, you just run it from the UI and it should work.

Inside src put the code await "a" and the test should print the IR and the formatted code. You will notice that IR isn't correct.

How to debug it? Use the playground and inspect the CST. You will notice that it's a JsAwaitExpression. Then go in biome_js_formatter/src/js and look for JsAwaitExpression and eventually you will find the actual implementation, which is here:

impl FormatNodeRule<JsAwaitExpression> for FormatJsAwaitExpression {

It's possible that we aren't handling string literals correctly.

@ematipico
Copy link
Member

Also, the expected result should be the same as prettier, so use prettier's IR as solution

@yossydev
Copy link

@ematipico
Sorry, I'm sorry for all the explanations, but I felt I couldn't solve this problem myself...

I am very sorry, but I would like someone else to tackle this problem 🙏

@southball
Copy link

@ematipico (sorry for pinging, this is not urgent)

I have taken a look at the issue, and if there seems to be a solution I would like to try to implement it.

First of all, I think this would be a reasonable testcase to try to make pass:

await "a";
await 'a';
await "'a'";
await '"a"';
f("a");
f('a');
f("'a'");
f('"a"');

https://biomejs.dev/playground/?code=YQB3AGEAaQB0ACAAIgBhACIAOwAKAGEAdwBhAGkAdAAgACcAYQAnADsACgBhAHcAYQBpAHQAIAAiACcAYQAnACIAOwAKAGEAdwBhAGkAdAAgACcAIgBhACIAJwA7AAoAZgAoACIAYQAiACkAOwAKAGYAKAAnAGEAJwApADsACgBmACgAIgAnAGEAJwAiACkAOwAKAGYAKAAnACIAYQAiACcAKQA7AA%3D%3D

There are two issues as you have mentioned:

  1. String literals aren't formatted correctly
  2. await literals do not emit the same IR as prettier.

The second issue should be (I'm not sure) easier to solve, but the first issue isn't.

I have taken a look at the code and since the code formats correctly, I believe that the issue lies in the trait implementation of std::fmt::Display of Document. Digging deeper, I think this part of the code looks suspicious.

if !in_text {
write!(f, [text("\"")])?;
}
in_text = true;
match element {
FormatElement::Space => {
write!(f, [text(" ")])?;
}
element if element.is_text() => f.write_element(element.clone())?,
_ => unreachable!(),
}
let is_next_text = iter.peek().map_or(false, |e| e.is_text() || e.is_space());
if !is_next_text {
write!(f, [text("\"")])?;
in_text = false;
}

It seems that we are adding StaticText("\""), LocatedTokenText("\"a\""), StaticText("\"") to the buffer, and it is stored in a VecBuffer through the call to biome_formatter::format! which delegates to biome_formatter::format. We have then lost the context that "a" is nested in double quotes.

I am not so sure what we can do in order to fix this problem. Some potential solutions I can think of are

  1. Introducing a new FormatElement::String variant that wraps a Vec<FormatElement> or Box<[FormatElement]> that represents strings wrapped in strings. This probably is not the solution as it would not be used outside IRs and also it would make invalid state like FormatElement::String(Box::new([Tag(...)])) representable.
  2. Escape each text in FormatElement::StaticText | FormatElement::DynamicText | FormatElement::LocatedTokenText before passing it to write_element. This encodes the knowledge that these tokens are in text before passing them to the formatter, but I am not sure if this is correct.

Please kindly advise if there are any ideas, or feel free to take over the implementation.

@ematipico
Copy link
Member

ematipico commented Jan 18, 2024

Thank you @southball for looking into this.

This is indeed a tricky issue, and you did a great job to narrow down the culprit!

Between the two solutions, my gut tells me to attempt the second solution, mostly because adding a new element to the IR might have more side-effects (memory, performance, etc.).

@southball
Copy link

Thank you!

I may have time to implement this at the end of this month / at the beginning of next month, so if anyone wants to send a PR please feel free to do so! Otherwise I will send a PR as soon as I can get to it!

@dyc3
Copy link
Contributor

dyc3 commented May 31, 2024

I'd like to give this a shot. Thanks @southball for the writeup!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Formatter Area: formatter good first issue Good for newcomers S-Bug-confirmed Status: report has been confirmed as a valid bug S-Help-wanted Status: you're familiar with the code base and want to help the project
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants