Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SELECT * EXCEPT/EXCLUDE #27

Open
johnedquinn opened this issue Feb 6, 2023 · 0 comments
Open

SELECT * EXCEPT/EXCLUDE #27

johnedquinn opened this issue Feb 6, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

@johnedquinn
Copy link
Member

johnedquinn commented Feb 6, 2023

Description

I have personally seen several users of PartiQL with obscenely large queries using SELECT VALUE for the specific use-case of filtering out attributes from the input binding tuple. As an illustrative example, consider a table containing 26 attributes, where each attribute's name is a letter in the English alphabet.

In order to exclude an attribute (say, d) from the table in a PartiQL query, we would need to write something like:

SELECT
  a, b, c, e, ...
  q, r, s, t, u, v, w, x, y, z
FROM
  alphabet_table

Or, another common approach from PartiQL users is:

SELECT VALUE {
  'a': a,
  'b': b,
  ...
  'y': y,
  'z': z
}
FROM
  alphabet_table

I've even conversed with some individuals who needed to generate queries to exclude certain attributes from tables that have multiple thousands of attributes.

Possible Solutions

SELECT * EXCEPT (BigQuery)

A possible solution is syntax such as:

SELECT * EXCEPT (d)
FROM alphabet_table

The above is syntax supported by Google's BigQuery. Similarly, they have support for syntax such as:

SELECT * REPLACE (quantity/2 AS quantity)
FROM some_table

We could also accomplish the same as SELECT * EXCEPT by writing:

SELECT * REPLACE (MISSING AS d)
FROM alphabet_table

As MISSING is stripped from the output result in PartiQL, the above is likely valid as a replacement for EXCEPT!

Allowing LET to Modify Binding Tuples

A somewhat similar approach of replacement could be a syntax such as:

SELECT *
FROM some_table AS t
LET MISSING AS t.d

By expanding support for introducing variables in the LET to include replacing attributes of a binding tuple, this approach could be possible. However, there are some immediate cons such as ambiguity in naming for JOINs, projection pushdown, and variables introduced AFTER the LET.

Nested Data (Using both SELECT REPLACE and a function)

While SELECT * EXCEPT could be valuable for removing columns from the final projection, how could we remove attributes from any nested data (struct) in the result projection? As an example, consider the binding tuple:

<
  'a': 0,
  'b': {
    'c': 2,
    'd': 3,
    ...
  },
  'x': 23,
  'y': 25,
  'z': 26
>

How could we remove d from b while still outputting the rest of the result?

SELECT * REPLACE (
  (SELECT * EXCEPT d FROM src.b) AS b
)
FROM alphabet_nested_table AS src

This could theoretically work if we coerce src.b into a bag, but now we have a bag in the projection list. Perhaps, then, for nested attributes, implementing a function to redact a struct's columns could be easier & worthwhile. I am personally unknowledgeable on the topic of projection pushdown for nested structures, but this custom function could possibly negatively affect our ability to perform the pushdown. However, let's keep the train of thought:

Let's try to remove both y and b.d from the result:

SELECT * REPLACE (
  MISSING AS y,
  redact_columns(b, ['d']) AS b
)
FROM alphabet_nested_table

Or, with some naming:

SELECT * REPLACE (
  MISSING AS y,
  redact_columns(b, columns_to_redact_from_b) AS b
)
LET ['d'] AS columns_to_redact_from_b
FROM alphabet_nested_table

Well, that's certainly interesting and readable.

Revisiting Nested Data and Projection Pushdown

Ideally, PartiQL could support filtering of both columns and nested attributes of structs. I'll take this opportunity to explore some possible syntax:

SELECT * REPLACE (
  MISSING AS y,
  y.* EXCEPT y.b,  --- Or maybe just b. Or, even allowing just `y EXCEPT b` (not `y.*`)
  z.* REPLACE (
    some_nested_attr + 1 AS some_nested_attr,
    MISSING AS some_attr_to_be_removed
  )
)
FROM alphabet_nested_table

Then, we'd have to define rules for type mismatches, missing, null, etc. But, I'd argue that this is also interesting and readable! Even more interesting is possibly allowing replacing attributes of a struct as a first class expression! So, the following could be a valid query:

a REPLACE (
  b + 1 AS b
)
===
{
  'b': 2
}

And with highly nested data such as:

{
  'a': {
    'b': {
      'c': {
        'd': 0,
        'e': 1,
        'f': 2
      }
    }
  }
}
SELECT * REPLACE (

  -- Option # 1 (Un-nesting to replace)
  a REPLACE (
    b REPLACE (
      c REPLACE (
        d + 1 AS d
 )))

  -- Option # 2 (Unsure about this)
  a.b.c REPLACE (
    d + 1 AS d
  ) AS a.b.c

  -- Option # 3 (Same as option # 2, but we don't have the nested renaming)
  a.b.c REPLACE (
    d + 1 AS d
  )

) FROM some_highly_nested_table

Other Links

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant