Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blog - Handling null for fun and profit #6115

Closed
ivanagas opened this issue Jun 12, 2023 · 1 comment · Fixed by #6126
Closed

Blog - Handling null for fun and profit #6115

ivanagas opened this issue Jun 12, 2023 · 1 comment · Fixed by #6126
Assignees

Comments

@ivanagas
Copy link
Contributor

ivanagas commented Jun 12, 2023

Strapline

Explain the idea in a sentence or two

Our adventures in handling NULL in HogQL, the problem we ran into, potential solutions we came up with, and ultimate solution we chose.

Why should we do it?

How it will it be useful or interesting for readers/viewers

Handling null is something a lot of projects need to do, our experience could help them.

Change of pace technical blog post, which we haven’t done in a while.

  1. https://posthog.slack.com/archives/C0368RPHLQH/p1686307855624409
  2. https://posthog.slack.com/archives/C0368RPHLQH/p1685047495690929

Headlines options

think of two of three possible headlines we could use

  • Handling null for fun and profit
  • The many solutions to handling null
  • Our adventures in null handling

Outline

Bullet point outline of structure / questions / topics to be covered

  • During the development of HogQL, we discover null are difficult to handle
    • SQL is strict about nulls
    • We traditionally convert null to strings
    • How ClickHouse handles
  • If any of the values (properties) of a statement were null, that would make the entire statement null.
    • For example, For example, math like sumIf(properties.toppings, event='pizza') + sumIf(properties.cheese, event='pasta') will return null if any of the two sides is null , which might happen if there are just no matching events, or the property is null
    • concat('code: ', properties.code) returns null if there's no code property, though I'd expect either 'code: '  or 'code: null'
    • Spent time debugging customer issues around this.
  • Potential solutions
    • Property returns default value such as “” or 0 or false depending on type.
      • Problem with this is the statement properties.bla is null needs to be handled too
      • Would require rewriting is set and is not set logic to be ≠ ‘’ or empty().
    • Nullable checkbox in data management
    • is set and is not set as native constructs.
    • JS-like 'bla' (NOT) IN properties for "is (not) set"
      • might be too complex,
      • doesn’t solve is null for unset property
    • wrap sumIf in nullIf(x, 0)
      • Not elegant
    • Use sumOrDefaultIf
    • non-ANSI compliant string operator, e.g. f'{NULL}-haha-{properties.bla}',
    • Some other ones
  • Chosen solution
    • Don’t return 0 on numbers, ClickHouse will skip these in aggregations
    • Make concat null-tolerant
      • concat(null, 'a', 3, toString(4), toString(NULL), properties.$screen_width) turns into concat('', 'a', toString(3), toString(4), '')
      • Basically add a bunch of ifNull for users instead of having them do it
      • Leads to a “dorky” looking ClickHouse SQL string
    • Dealing with nulls are hard
@ivanagas ivanagas self-assigned this Jun 12, 2023
@ivanagas ivanagas linked a pull request Jun 14, 2023 that will close this issue
4 tasks
@ivanagas
Copy link
Contributor Author

Looks like this isn't over yet: PostHog/posthog#16259

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant