Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(hogql): automatic person table joins #14286

Merged
merged 170 commits into from
Feb 28, 2023
Merged

feat(hogql): automatic person table joins #14286

merged 170 commits into from
Feb 28, 2023

Conversation

mariusandra
Copy link
Collaborator

@mariusandra mariusandra commented Feb 16, 2023

Changes

  • Adds a LazyTable database type and a corresponding LazyTableSymbol symbol.
  • Adds a separate "lazy table resolution" step after symbol resolution. During this step we gather all fields that were accessed on this table, and pass them on to a custom AST generator that gives us the code for the JOINs.
  • Implements events.pdi and events.pdi.person joins using lazy tables, including materialised columns where applicable.
  • Adds a VirtualTable database type and a corresponding VirtualTableSymbol symbol.
  • These tables are just a different view on top of the same physical database table.
  • Implements poe as a virtual table with a structure resembling the persons table
  • Implements a FieldTraverser class and symbol. This lets us override that "events.person" == "events.pdi.person".

Here's a table where I'm comparing the "email" property from PoE and from PDI.

image

It does not yet implement a mechanism for swapping between PoE on and off as the default behaviour for events.person, but locks it to be events.pdi.person (PoE off) because that's where we are. I'd like implement a proper swap in another PR, as that requires creating a new virtual database table for each team, and other fun.

Also out of scope: we will currently copy the entire properties field between shards. It's easy to fix (fields are already lazily added after all), but I'd like to keep this PR from totally exploding.

How did you test this code?

Added tests

@mariusandra mariusandra marked this pull request as ready for review February 23, 2023 00:19
@mariusandra mariusandra requested a review from a team February 24, 2023 11:56
Copy link
Member

@pauldambra pauldambra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

12 files now, 12 files on Monday 🤣

def clickhouse_table(self):
# This is a bit of a hack to make sure person.properties.x works
return "events"
def join_with_max_person_distinct_id_table(from_table: str, to_table: str, requested_fields: List[str]):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's maybe too much refactoring for now but this method and join_with_persons_table are almost identical and feel like they'd change together... but they're also almost next to each other in a single file so 🤷‍♀️

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will change even more when I add support for querying "is person in cohort", which needs a similar join. So I'd punt on refactoring this file much further for now. The repetitive patterns and absstractions will emerge as soon as we add a few more joins.

elif isinstance(symbol.table, ast.SelectQuerySymbol) or isinstance(symbol.table, ast.SelectQueryAliasSymbol):
field_sql = self._print_identifier(symbol.name)
if isinstance(symbol.table, ast.SelectQueryAliasSymbol) or symbol_with_name_in_scope != symbol:
field_sql = f"{self.visit(symbol.table)}.{field_sql}"

# :KLUDGE: Legacy person properties handling. Only used within non-HogQL queries, such as insights.
if self.context.legacy_person_property_handling and field_sql == "events__pdi__person.properties":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something like within_non_hogql_query is shorter than legacy_person_property_handling and carries more information. Then the comment can be deleted

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a much better name 👍

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 1)
  • webkit: 0 added, 0 modified, 0 deleted
  • firefox: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 1)
  • webkit: 0 added, 0 modified, 0 deleted
  • firefox: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

@posthog-bot
Copy link
Contributor

📸 UI snapshots have been updated

1 snapshot changes in total. 0 added, 1 modified, 0 deleted:

  • chromium: 0 added, 1 modified, 0 deleted (diff for shard 1)
  • webkit: 0 added, 0 modified, 0 deleted
  • firefox: 0 added, 0 modified, 0 deleted

Triggered by this commit.

👉 Review this PR's diff of snapshots.

Copy link
Collaborator

@Twixes Twixes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks quite right to me

@Twixes
Copy link
Collaborator

Twixes commented Feb 27, 2023

I don't have comments though, as I haven't really worked with this code and so at the increased level of complexity I might be missing something

@mariusandra mariusandra merged commit 80fba8c into master Feb 28, 2023
@mariusandra mariusandra deleted the person-table-joins branch February 28, 2023 07:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants