Merge pull request #530 from mabel-dev/FIX/#527
Fix/#527 `SET` variables not available to subqueries
joocer authored Sep 17, 2022
2 parents 3dbf9fe + 513ce94 commit f0b0834
Showing 9 changed files with 60 additions and 21 deletions.
20 changes: 20 additions & 0 deletions .github/build_number.py
@@ -0,0 +1,20 @@
+import os
+
+BUILD_NUMBER = os.environ.get("GITHUB_RUN_NUMBER", 0)
+
+try:
+    with open("opteryx/version.py", mode="rt", encoding="UTF8") as vf:
+        version_file_contents = vf.read()
+
+    version_file_contents = version_file_contents.replace(
+        "{BUILD_NUMBER}", str(BUILD_NUMBER)
+    )
+
+    with open("opteryx/version.py", mode="wt", encoding="UTF8") as vf:
+        vf.write(version_file_contents)
+
+    print(f"updated to build {BUILD_NUMBER}")
+
+except Exception as e:
+
+    print(f"failed to update build - {e}")
13 changes: 13 additions & 0 deletions docs/Deployment/Internals/Query Engine.md
@@ -46,3 +46,16 @@ The Query Plan can be seen for a given query using the `EXPLAIN` query.
 The goal of the Query Executor is to produce the results for the user. It takes the Plan and executes the steps in the plan.

 Opteryx implements a vectorized Volcano model executor. This means that the planner starts at the node closest to the end of the plan (e.g. `LIMIT`) and asks it for a page of data. This node asks its preceding node for a page of data, and so on, until it reaches the node which acquires data from the source. The data is then processed by each node until it is returned to the `LIMIT` node at the end.
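
The pull-based flow described above can be illustrated with Python generators. This is a minimal sketch and not Opteryx's executor: the node functions (`scan`, `select`, `limit`) and the use of lists of dicts as pages are assumptions made for illustration only.

```python
from typing import Callable, Iterator, List

Row = dict
Page = List[Row]  # assumption: a page is modelled as a plain list of rows


def scan(rows: List[Row], page_size: int) -> Iterator[Page]:
    """Source node: hands out the data one page at a time."""
    for start in range(0, len(rows), page_size):
        yield rows[start:start + page_size]


def select(child: Iterator[Page], predicate: Callable[[Row], bool]) -> Iterator[Page]:
    """Selection node: pulls a page from its child and filters it."""
    for page in child:
        yield [row for row in page if predicate(row)]


def limit(child: Iterator[Page], count: int) -> Iterator[Page]:
    """Last node in the plan: stops pulling once `count` rows are returned."""
    remaining = count
    if remaining <= 0:
        return
    for page in child:
        taken = page[:remaining]
        yield taken
        remaining -= len(taken)
        if remaining <= 0:
            return  # upstream nodes are never asked for another page


# The consumer asks the final node for pages; each node asks the one before it.
rows = [{"id": i} for i in range(10)]
plan = limit(select(scan(rows, page_size=3), lambda row: row["id"] % 2 == 0), count=3)
result = [row for page in plan for row in page]
```

Because every node is lazy, the source is only read as far as the `limit` node needs, which is the property the Volcano model is chosen for.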


+## Performance Features
+
+The following features are built into the query engine to improve performance:
+
+- Small pages are merged together before activities which operate on an entire page at a time (such as selections)
+- Projections are pushed to the parser, either to prevent parsing of unwanted fields (Parquet), or before passing to the next operation
+- A page cache is used (local or memcached) to reduce reads to attached or remote storage
+- An LRU-K cache eviction strategy with a fixed eviction budget per query to help ensure effective use of the page cache
+- Aggressive pruning of date-partitioned datasets
+- SIMD and vectorized execution where available (via NumPy and PyArrow)
+- Projection before GROUP BY to reduce data handled by the aggregators
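
As an illustration of the first item in the list above, small pages can be buffered until they reach a useful size before being handed to a page-at-a-time operator. The sketch below is hypothetical: the function name `merge_small_pages`, the `min_rows` threshold and the list-of-dicts page representation are assumptions, not Opteryx's implementation.

```python
from typing import Iterable, Iterator, List

Page = List[dict]  # assumption: a page is modelled as a plain list of rows


def merge_small_pages(pages: Iterable[Page], min_rows: int) -> Iterator[Page]:
    """Combine consecutive pages until at least `min_rows` rows are buffered."""
    buffer: Page = []
    for page in pages:
        buffer.extend(page)
        if len(buffer) >= min_rows:
            yield buffer
            buffer = []
    if buffer:
        # emit whatever is left, even if it is below the threshold
        yield buffer


small_pages = [[{"id": 1}], [{"id": 2}], [{"id": 3}, {"id": 4}], [{"id": 5}]]
merged = list(merge_small_pages(small_pages, min_rows=3))
```

Here five rows spread over four pages reach the downstream operator as two pages, so any per-page overhead is paid twice rather than four times.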
10 changes: 3 additions & 7 deletions docs/SQL Reference/02 Statements.md
@@ -136,19 +136,15 @@ LIMIT count

 ## SET

-Specifies the value of a query variable.
+Specifies the value of a variable; the variable is available within the scope of the executing query batch.

 ~~~sql
 SET variable = value
 ~~~

-### SET clause
+User-defined variable names must be prefixed with an 'at' symbol (`@`) and the value must be a literal value. The variable can be used within `SELECT` clauses within the same query batch. A `SET` statement without a `SELECT` statement is invalid.

-Specifies the value of a variable available to the scope of the executing query batch. The variable name must start with an at sign (`@`) and the value must be a literal value.
-
-The variable can be used within `SELECT` clauses within the same query batch.
-
-A `SET` statement without a `SELECT` statement is invalid.
+Note: system variables are prefixed with a dollar sign (`$`).

 ## SHOW COLUMNS

13 changes: 10 additions & 3 deletions docs/stylesheets/extra.css
@@ -1,6 +1,5 @@
 table {
     width: 100% !important;
-    font-size: 0.9em !important;
     line-height: 1.3em;
 }

@@ -12,13 +11,20 @@ th {
     font-weight: 300;
     color: #fff;
     background-color: #666;
+    font-size: 1.2em !important;
 }

 td {
     padding-top: 0.4em !important;
     padding-bottom: 0.4em !important;
     margin-left: 0.8em !important;
     margin-right: 0.8em !important;
+    font-size: 1.2em !important;
+    min-width: 9em;
+}
+
+td code {
+    font-size: 1.1em !important;
 }

 code {
@@ -27,11 +33,12 @@ code {
     color: #D63384 !important;
     font-weight: 300;
     word-wrap: break-word;
+    white-space:pre-wrap;
     background-color: transparent !important;
     padding: 0px !important;
 }

-pre>code {
+pre > code {
     background-color: #faf7fd !important;
     font-size: 1em !important;
     padding: 1em !important;
@@ -49,7 +56,7 @@ p, li {
 h1 {
     font-family: Georgia, 'Times New Roman', Times, serif;
     font-weight: 500 !important;
-    font-size: 2em !important;
+    font-size: 2.2em !important;
     margin-bottom: 0.7em !important;
     color: #333 !important;
 }
9 changes: 5 additions & 4 deletions opteryx/managers/query/planner/planner.py
@@ -83,7 +83,7 @@ def __init__(self, statistics, cache=None, properties=None):

         self._statistics = statistics

-        self._properties = (
+        self.properties = (
             QueryProperties()
             if not isinstance(properties, QueryProperties)
             else properties
@@ -102,6 +102,7 @@ def copy(self):
             statistics=self._statistics,
             cache=self._cache,
         )
+        planner.properties = self.properties
         planner.start_date = self.start_date
         planner.end_date = self.end_date
         return planner
@@ -309,9 +310,9 @@ def _filter_extract(self, function):
         if "Identifier" in function:
             token_name = function["Identifier"]["value"]
             if token_name[0] == "@":
-                if token_name not in self._properties.variables:  # pragma: no cover
+                if token_name not in self.properties.variables:  # pragma: no cover
                     raise SqlError(f"Undefined variable found in query `{token_name}`.")
-                return self._properties.variables.get(token_name)
+                return self.properties.variables.get(token_name)
             else:
                 return ExpressionTreeNode(
                     token_type=NodeType.IDENTIFIER,
@@ -730,7 +731,7 @@ def _set_variable_planner(self, ast, statistics):
             raise SqlError("Variable definitions must start with '@'.")
         value = self._build_literal_node(ast["SetVariable"]["value"][0]["Value"])

-        self._properties.variables[key] = value
+        self.properties.variables[key] = value

     def _show_create_planner(self, ast, statistics):

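
The hunks above rename `_properties` to `properties` and make `copy()` hand the same properties object to the copied planner. The toy model below sketches why that matters for `SET` variables. It assumes, without confirmation from this diff, that subqueries are planned with a copied planner; `ToyPlanner`, `set_variable`, `resolve` and the `share_properties` flag are hypothetical names used only for this illustration.

```python
class QueryProperties:
    """Per-batch state; only the variables mapping is modelled here."""

    def __init__(self):
        self.variables = {}


class ToyPlanner:
    """Hypothetical stand-in for the planner, reduced to variable handling."""

    def __init__(self, properties=None):
        self.properties = (
            QueryProperties()
            if not isinstance(properties, QueryProperties)
            else properties
        )

    def copy(self, share_properties=True):
        planner = ToyPlanner()
        if share_properties:
            # mirrors the line added to copy() in this commit
            planner.properties = self.properties
        return planner

    def set_variable(self, key, value):
        if not key.startswith("@"):
            raise ValueError("Variable definitions must start with '@'.")
        self.properties.variables[key] = value

    def resolve(self, name):
        if name not in self.properties.variables:
            raise KeyError(f"Undefined variable found in query `{name}`.")
        return self.properties.variables[name]


outer = ToyPlanner()
outer.set_variable("@v", 1)

fixed = outer.copy(share_properties=True)    # copy sees the outer batch's variables
broken = outer.copy(share_properties=False)  # copy starts with an empty variable set
```

In this model `fixed.resolve("@v")` returns the value set by the outer planner, while `broken.resolve("@v")` raises, which matches the symptom named in the commit title.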
2 changes: 1 addition & 1 deletion opteryx/operators/base_plan_node.py
@@ -30,7 +30,7 @@ def __init__(
         different nodes differently to record what happened during the query
         execution.
         """
-        self._properties = properties
+        self.properties = properties
         self._statistics = statistics

     def __call__(self):
10 changes: 5 additions & 5 deletions opteryx/utils/display.py
@@ -39,7 +39,7 @@ def sanitize(htmlstring):
             "{ " + ", ".join([f'"{k}": {v}' for k, v in htmlstring.items()]) + " }"
         )
     if not isinstance(htmlstring, str):
-        return htmlstring
+        return str(htmlstring)
     escapes = {'"': "&quot;", "'": "&#39;", "<": "&lt;", ">": "&gt;", "$": "&#x24;"}
     # This is done first to prevent escaping other escapes.
     htmlstring = htmlstring.replace("&", "&amp;")
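
The ordering noted in the comment above (ampersands first) and the new `str()` conversion can be seen in a standalone sketch. The hunk does not show the rest of `sanitize`, so the replacement loop and the omission of the dict branch below are assumptions; `sanitize_sketch` is a hypothetical name.

```python
def sanitize_sketch(value):
    """Escape a value for HTML display; non-strings are converted to str."""
    if not isinstance(value, str):
        return str(value)
    escapes = {'"': "&quot;", "'": "&#39;", "<": "&lt;", ">": "&gt;", "$": "&#x24;"}
    # Ampersands go first, otherwise the "&" in "&lt;" would be escaped again.
    value = value.replace("&", "&amp;")
    # Assumption: the remaining characters are replaced one by one.
    for char, replacement in escapes.items():
        value = value.replace(char, replacement)
    return value


escaped = sanitize_sketch("<b>Tom & Jerry</b>")
number = sanitize_sketch(42)
```

Returning `str(value)` for non-string input means callers always receive a string, which is the behavioural change this hunk makes.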
@@ -122,15 +122,15 @@ def format_value(val):
             break

         cache.append(row)
-        for k, v in row.items():
-            v = format_value(v)
-            length = max(len(str(v)), len(str(k)))
+        for k, value in row.items():
+            value = format_value(value)
+            length = max(len(str(value)), len(str(k)))
             if length > columns.get(k, 0):
                 columns[k] = length

     # draw table
     bars = []
-    for header, width in columns.items():
+    for _, width in columns.items():
         bars.append("-" * (width + 2))

     # display headers
2 changes: 1 addition & 1 deletion opteryx/version.py
@@ -17,4 +17,4 @@
 """

 # __version__ = "0.4.0-alpha.6"
-__version__ = "0.5.0-alpha.1"
+__version__ = "0.5.0-alpha.2"
2 changes: 2 additions & 0 deletions tests/sql_battery/test_battery.py
@@ -671,6 +671,8 @@
         ("SELECT CONCAT(LIST(name)) FROM $planets GROUP BY gravity", 8, 1),
         # AGG (FUNCTION)
         ("SELECT SUM(IIF(year < 1970, 1, 0)), MAX(year) FROM $astronauts", 1, 2),
+        # [#527] variables referenced in subqueries
+        ("SET @v = 1; SELECT * FROM (SELECT @v);", 1, 1),
     ]
 # fmt:on

