Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't format s-strings #1284

Open
max-sixty opened this issue Dec 17, 2022 · 15 comments
Open

Don't format s-strings #1284

max-sixty opened this issue Dec 17, 2022 · 15 comments

Comments

@max-sixty
Copy link
Member

From #965, broadening that issue to being able to avoid the auto-formatting. For context, this is can be important with non-standard SQL, like Snowflake's semi-structured data.

A couple of options for how we could do this:

  • Implement an "ignore" option upstream, in Possible to provide an escape hatch for expressions? shssoichiro/sqlformat-rs#15. I'm not sure whether upstream would accept a PR (though it seems reasonable, and worst case we could maintain a fork)
  • Replace s-strings after the auto-formatting, using $s_string_n. This requires passing s-strings all the way through the compiler, reducing the modularity of the compiler phases, but otherwise being fairly simple.
  • Relying on the new Options struct to disable formatting entirely. This has the disadvantage of being an all-or-nothing option, but might be an acceptable temporary solution.
@aljazerzen aljazerzen changed the title Escaping auto-SQL formatting Don't format s-strings Dec 21, 2022
@aljazerzen
Copy link
Member

Yeah, we should never format s-strings. I think the second in is the best option and we can fallback to using the third for the time being.

@BlurrechDev
Copy link
Contributor

I'm going to look at this. I will explore a fix for this based on the second option above to see how feasible this is.

@max-sixty
Copy link
Member Author

Great @BlurrechDev !

If that becomes unwieldy (i.e. we're passing a huge struct along), then we can reassess.

Feel free to post half-complete code — either for feedback or to merge something initial.

@aljazerzen
Copy link
Member

Great. If you need some pointers, this is how I'd do it:

  • the SQL is generated here, so before that, I'd replace all s-strings in RQ AST with generated s-strings.
  • for that, I'd implement rq::RqFold, similar to CidRedirector.
  • generated s-strings should be something that will never appear in actual queries. Max's suggestion of $s_string_1, $s_string_2, $s_string_3, ... is ok.
  • the s-string-extractor must return the new AST and something like a Vec<(String, String)> (vec of pairs of generated and actual s-strings).
  • after SQL is generated and formatted, replace the generated placeholders with actual s-strings using basic string replacement.

@max-sixty
Copy link
Member Author

This issue prevents folks from using the escape hatch, so bumping this to "Priority"...

@aljazerzen
Copy link
Member

I've did a little work on this, but it's much harder than I anticipated.

My idea was to replace s-strings with some unique identifier before compiling to SQL. After SQL is formatted, we can replace the identifier back with s-strings.

Starting PRQL:

from my_table
select s"COUNT ( DISTINCT {my_col})"

AST:

 From: my_table
 Select:
   SString:
   - "COUNT ( DISTINCT "
   - my_col
   - ")"

SStrings extracted:

From: my_table
Select:
  SString:
  - '_anchor_1'
  - my_col
  - '_anchor_2'

Compiled to SQL:

SELECT '_anchor_1'my_col'_anchor_2' FROM my_table

Formatted:

SELECT
  '_anchor_1' my_col '_anchor_2'
FROM
  my_table

Inject SStrings back in:

SELECT
  COUNT ( DISTINCT  my_col )
FROM
  my_table

... which is pretty close to what we'd want. The spacing after COUNT and before DISTINCT was preserved, as intended. But because formatting adds spacing between anchors, there are spaces around my_col.

I'm not sure we want to merge this, as it feels like a workaround using hacky text manipulation.

@max-sixty
Copy link
Member Author

Is there anything to having the whole S-string as a variable?

So the expression to be formatted is:

SELECT 
- '_anchor_1'my_col'_anchor_2' 
+ $_s_string_
FROM my_table

...and then we replace the variable after the formatting? So the S-string is completely opaque to the formatter.


a workaround using hacky text manipulation.

This used to be all of the compiler! 😀

@aljazerzen
Copy link
Member

That's a good idea, but a bit problematic because you have to translate my_col somehow. If can could do this separately, then that's the way to go.

@max-sixty
Copy link
Member Author

I was thinking of starting on this. But is it now intractable — everything is an s-string since the stdlib changes?

Or could we format the expressions that go into the s-strings separately? That seems quite difficult, if I'm thinking about it correctly.

@aljazerzen
Copy link
Member

No, not really. The s-strings in std.sql.prql have a completely separate codepath and never land in the AST as s-strings.

So it still as tractable as it was before.

@max-sixty
Copy link
Member Author

Ah great! I hadn't realized that

@aljazerzen
Copy link
Member

I did remember you just asking about this: #2694 (comment)

:D

@max-sixty
Copy link
Member Author

Sorry, that's bad memory even by my standards!

@philpep
Copy link
Contributor

philpep commented Oct 22, 2024

Another use case for this with postgres range operator:

from foo
select x=s"range @> time"

Produce:

SELECT
  range @ > time AS x
FROM
  foo

Which is invalid due to extra space added in @ >.

Any workaround would be appreciated, thanks!

@aljazerzen
Copy link
Member

The workaround is to disable formatting. In the CLI, there is --no-format, in Rust API there is Options::format, but in the Playground this option is not accessible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants