Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(hog): remove semicolons, require "as" for column aliases #22868

Merged
merged 22 commits into from
Jun 12, 2024

Conversation

mariusandra
Copy link
Collaborator

@mariusandra mariusandra commented Jun 11, 2024

Problem

One of the biggest subtle improvements in my programming life has been the elimination of semicolons

print('here we go');
let i := 0;
while (i < 3) {
    i := i + 1;
    print(i);
}
print(i);

We're using them in Hog because it made getting started easier, but I'd like to get rid of them. Every Hog developer so far has had many cases of forgetting a semicolon. I'm confident it'll frustrate our users as well.

Changes

Main change

Unchains the Removes semicolons from Hog:

print('here we go')
let i := 0
while (i < 3) {
    i := i + 1
    print(i)
}
print(i)

It might not look like much of a difference, but when writing code... it really adds up.

"Side-effect" - breaking HogQL change

A breaking change needs to be made to HogQL if we want to get rid of semicolons. It might not be as painful as it sounds.

This is currently allowed, but soon won't be:

select event ev, properties.$browser browsy, properties.$os as ossy from events

This is how you'll need to write it:

select event as ev, properties.$browser as browsy, properties.$os as ossy from events

Note: table aliases are unaffected, so this will keep working:

select e.event as ev from events e; -- no need for an explicit `AS` for the table

We need to get rid of this columnExpr alias rule (and keep columnExpr AS alias), because the parser ignores whitespace and the alias part often catches the symbol from the next line. let i := 0 while (){...} is read as let i := 0 as "while";;{...};.

I've personally almost always write "select bla as bla", and so do our users: out of ~5k saved HogQL insights on US cloud, only 117 of them contain a column alias without the explicit "as". On EU it was 210.

I have now updated all of them and added " as " in front of each alias, so there are zero such saved HogQL insights on cloud.

How did I get the data and how did I clean it up?

I copy/pasted bits of code that looked something like this into a CLI in production:

from posthog.hogql.grammar.HogQLParser import HogQLParser
from posthog.hogql.parser import parse_expr, parse_select, RULE_TO_PARSE_FUNCTION, get_parser, HogQLParseTreeConverter
from posthog.hogql.parse_string import parse_string_literal_ctx
from posthog.hogql.constants import RESERVED_KEYWORDS
from django.db.models import Q
from posthog.hogql import ast
from posthog.models import Insight
insight_debug = 0
class HogQLParseTreeConverter_(HogQLParseTreeConverter):
    def visitColumnExprAlias(self, ctx: HogQLParser.ColumnExprAliasContext):
        alias: str
        if ctx.alias():
            alias = self.visit(ctx.alias())
            print("!!!!!! " + insight_debug)
        elif ctx.identifier():
            alias = self.visit(ctx.identifier())
        elif ctx.STRING_LITERAL():
            alias = parse_string_literal_ctx(ctx.STRING_LITERAL())
        else:
            raise NotImplementedError(f"Must specify an alias")
        expr = self.visit(ctx.columnExpr())
        if alias.lower() in RESERVED_KEYWORDS:
            raise SyntaxError(f'"{alias}" cannot be an alias or identifier, as it\'s a reserved keyword')
        return ast.Alias(expr=expr, alias=alias)

RULE_TO_PARSE_FUNCTION['python'] = {
    "expr": lambda string, start: HogQLParseTreeConverter_(start=start).visit(get_parser(string).expr()),
    "order_expr": lambda string: HogQLParseTreeConverter_().visit(get_parser(string).orderExpr()),
    "select": lambda string: HogQLParseTreeConverter_().visit(get_parser(string).select()),
    "full_template_string": lambda string: HogQLParseTreeConverter_().visit(get_parser(string).fullTemplateString()),
}

insights = (
    Insight.objects.filter(
        Q(query__source__kind="HogQLQuery"),
        saved=True,
        deleted=False,
    )
    .order_by("id")
    .all()
)

new_insights = []
for insight in insights:
    insight.refresh_from_db()
    query = insight.query['source']['query']
    insight_debug = f"insight {insight.id} {insight.short_id}, team: {insight.team_id}"
    try:
        last_start = 0
        ignore = parse_select(query, backend='python')
        if last_start != 0:
            # print("<<" + query[last_start-10:last_start] + "<<")
            # print(">>" + query[last_start:last_start+10] + ">>")
            new_query = query[:last_start] + "as " + query[last_start:]
            print("-----------------------------------")
            print("old query: " + query)
            print("-----------------------------------")
            print("NEW query: " + new_query)
            # insight.query['source']['query'] = new_query
            # insight.save()
            new_insights.append(insight)
    except Exception as e:
        print(f"Error: {e} {insight_debug}")
insights = new_insights

When I was sure everything looked good, I saved the changes into the insights.

For example, this insight for us changed from this:

Screenshot 2024-06-11 at 21 24 47

to this:

Screenshot 2024-06-11 at 21 34 02

Thus this should be safe to merge. Is there anything I've missed?

  • Notebooks? I should probably do the same quick check and conversion as I did for insights. Done.
  • People hitting the API. I'm personally OK with breaking it here. HogQL is beta, we're allowed to do this. The number of people impacted should be really tiny: people hitting out API using raw HogQL queries and aliasing columns without an "as" keyword. Hopefully they'll adapt the query based on the error.
  • Self hosted? We'll hear on Github, and hopefully they'll adapt.

Are we OK rolling it in?

Other changes

  • Empty return probably works again (will test in a later PR)
  • Removed a bunch of typing errors from some HogQL files.

Alternatives

We're currently reusing columnExpr between Hog and HogQL. It's a huge set of rules. We could duplicate this set: 1) once for Hog without the alias rule, 2) once for HogQL with it... but this sounds messy. I tried and couldn't figure a clean way to split and share part of the rules, going rather deep. We'd effectively need to split the parser into Hog and HogQL modes (we already have HogQL and String Template modes). However that requires duplicating a lot of code.

How did you test this code?

Added tests.

TODO: figure out the impact:

  • Do tests pass? now they do
  • How many saved insights use this type of aliasing? 117 + EU
  • What will break if we enable this? funnels
  • Is this safe to roll out or do we need to split the grammar? There may also be other reasons to split it. safe if editing old insights manually. Will take just a bit of work.
  • update all HogQL insights on EU & US clouds to have "as " in front of the alias

@mariusandra mariusandra mentioned this pull request Jun 11, 2024
51 tasks
@posthog-bot
Copy link
Contributor

It looks like the code of hogql-parser has changed since last push, but its version stayed the same at 1.0.12. 👀
Make sure to resolve this in hogql_parser/setup.py before merging!

@posthog-bot
Copy link
Contributor

It looks like the code of hogql-parser has changed since last push, but its version stayed the same at 1.0.12. 👀
Make sure to resolve this in hogql_parser/setup.py before merging!

@posthog-bot
Copy link
Contributor

It looks like the code of hogql-parser has changed since last push, but its version stayed the same at 1.0.12. 👀
Make sure to resolve this in hogql_parser/setup.py before merging!

@mariusandra mariusandra changed the title feat(hog): yeet ; feat(hog): remove semicolons Jun 11, 2024
@mariusandra mariusandra marked this pull request as ready for review June 11, 2024 16:09
@mariusandra mariusandra requested review from benjackwhite and a team June 11, 2024 16:10
@mariusandra mariusandra changed the title feat(hog): remove semicolons feat(hog): remove semicolons, require "as" for column aliases Jun 11, 2024
@mariusandra mariusandra merged commit 779b17f into master Jun 12, 2024
107 checks passed
@mariusandra mariusandra deleted the no-semicolons branch June 12, 2024 09:46
Copy link

sentry-io bot commented Jun 12, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ SocketTimeoutError: (ch-offline.posthog.net:9440) posthog.tasks.tasks.process_query_task View Issue
  • ‼️ SocketTimeoutError: (ch-offline.posthog.net:9440) posthog.tasks.tasks.process_query_task View Issue
  • ‼️ SocketTimeoutError: (ch-offline.posthog.net:9440) posthog.tasks.tasks.process_query_task View Issue
  • ‼️ SocketTimeoutError: (ch-offline.posthog.net:9440) /api/projects/{parent_lookup_team_id}/query/ View Issue
  • ‼️ SocketTimeoutError: (ch-offline.posthog.net:9440) /api/projects/{parent_lookup_team_id}/query/ View Issue

Did you find this useful? React with a 👍 or 👎

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants