Merge pull request #447 from mabel-dev/FEATURE/#442-PART-01
FEATURE/#442 Various functions
joocer authored Aug 29, 2022
2 parents eec7fa0 + 0f2f534 commit 30c1471
Showing 9 changed files with 134 additions and 4 deletions.
1 change: 1 addition & 0 deletions docs/Release Notes/Change Log.md
@@ -11,6 +11,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
- [[#366](https://github.com/mabel-dev/opteryx/issues/336)] Implement 'function not found' suggestions. ([@joocer](https://github.com/joocer))
- [[#443](https://github.com/mabel-dev/opteryx/issues/443)] Introduce a CLI. ([@joocer](https://github.com/joocer))
- [[#351](https://github.com/mabel-dev/opteryx/issues/351)] Support `SHOW FUNCTIONS`. ([@joocer](https://github.com/joocer))
- [[#442](https://github.com/mabel-dev/opteryx/issues/442)] Various functions. ([@joocer](https://github.com/joocer))

**Changed**

34 changes: 33 additions & 1 deletion docs/SQL Reference/06 Functions.md
@@ -5,13 +5,24 @@ Definitions noted with a 🔻 accept different input arguments.
## Conversion Functions

!!! function "`BOOLEAN` (**any**: _any_) → _boolean_"
Cast **any** to a `boolean`, raises an error if cast is not possible.
Cast **any** to a `BOOLEAN`, raises an error if cast is not possible.
Alias for `CAST`(**any** AS BOOLEAN).

!!! function "`CAST` (**any**: _any_ AS **type**) → _[type]_"
Cast **any** to **type**, raises an error if cast is not possible.
Also implemented as individual cast functions.

!!! function "`INT` (**num**: _numeric_) → _numeric_"
Alias for `INTEGER`.

!!! function "`INTEGER` (**num**: _numeric_) → _numeric_"
Convert **num** to an integer.
`INTEGER` is a pseudo-type; `CAST` is not supported and values may be coerced to `NUMERIC`.

!!! function "`FLOAT` (**num**: _numeric_) → _numeric_"
Convert **num** to a floating point number.
`FLOAT` is a pseudo-type; `CAST` is not supported and values may be coerced to `NUMERIC`.

!!! function "`NUMERIC` (**any**: _any_) → _numeric_"
Cast **any** to a floating point number, raises an error if cast is not possible.
Alias for `CAST`(**any** AS NUMERIC).
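
The `INT`, `INTEGER` and `FLOAT` entries documented above are registered later in this commit with Python's built-in `int` and `float`. The wrapper that applies them to each value (`_iterate_single_parameter`) is not shown in the diff, so the helper below is a hypothetical stand-in. Under that assumption, a minimal sketch of the per-value behaviour:

```python
def iterate_single_parameter(func):
    """Hypothetical stand-in for the engine's per-value wrapper (not shown in this diff)."""
    def _inner(values):
        return [func(value) for value in values]
    return _inner


# Cut-down registry mirroring the three entries this commit adds.
REGISTRY = {
    "INT": iterate_single_parameter(int),
    "INTEGER": iterate_single_parameter(int),
    "FLOAT": iterate_single_parameter(float),
}

ints = REGISTRY["INT"]([3.9, -3.9, "7"])   # int() truncates toward zero, it does not round
floats = REGISTRY["FLOAT"]([1, "2.5"])

try:
    REGISTRY["INTEGER"](["3.9"])           # a decimal string is not a valid int literal
    decimal_string_ok = True
except ValueError:
    decimal_string_ok = False
```

If the engine does pass values straight to `int`, then `INT(3.9)` would give 3 rather than 4, and a string such as `'3.9'` would raise an error instead of being truncated.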
@@ -259,6 +270,18 @@ For more details, see [Working with Structs](https://mabel-dev.github.io/opteryx

## Other Functions

!!! function "`BASE64_DECODE` (**any**) → _varchar_"
Decode a value which has been encoded using BASE64 encoding.

!!! function "`BASE64_ENCODE` (**any**) → _varchar_"
Encode value with BASE64 encoding.

!!! function "`BASE85_DECODE` (**any**) → _varchar_"
Decode a value which has been encoded using BASE85 encoding.

!!! function "`BASE85_ENCODE` (**any**) → _varchar_"
Encode value with BASE85 encoding.

!!! function "`COALESCE` (**arg1**, **arg2**, ...) → _[input type]_"
Return the first item from args which is not `NULL`.

@@ -280,6 +303,15 @@ For more details, see [Working with Structs](https://mabel-dev.github.io/opteryx
!!! function "`HASH` (**any**) → _varchar_"
Calculate the [CityHash](https://opensource.googleblog.com/2011/04/introducing-cityhash.html) (64 bit).

!!! function "`HEX_DECODE` (**any**) → _varchar_"
Decode a value which has been encoded using HEX (BASE16) encoding.

!!! function "`HEX_ENCODE` (**any**) → _varchar_"
Encode value with HEX (BASE16) encoding.

!!! function "`NORMAL` () → _numeric_"
Random number from a normal (Gaussian) distribution; the distribution is centred at 0.0 and has a standard deviation of 1.0.

!!! function "`MD5` (**any**) → _varchar_"
Calculate the MD5 hash.

5 changes: 4 additions & 1 deletion docs/SQL Reference/07 Aggregates.md
@@ -8,6 +8,9 @@ Aggregate functions generally ignore `NULL` values when performing calculations.

## General Functions

!!! function "`ANY_VALUE` (**column**) → _any_"
Select any single value from the grouping.

!!! function "`APPROXIMATE_MEDIAN` (**column**: _numeric_) → _numeric_"
Approximate median of a column with T-Digest algorithm.

@@ -42,7 +45,7 @@ Aggregate functions generally ignore `NULL` values when performing calculations.
The minimum and maximum values in **column**.

!!! function "`ONE` (**column**) → _any_"
Select a single value from the grouping.
Alias for `ANY_VALUE`.

!!! function "`PRODUCT` (**column**: _numeric_) → _numeric_"
The product of values in **column**.
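
The `ONE` / `ANY_VALUE` aliasing documented above is implemented later in this commit by mapping both names to the same `hash_one` aggregator. The sketch below shows that idea in plain Python. The `any_value_per_group` helper is hypothetical: it keeps the first value seen per group, whereas the real aggregator only promises some value from the group.

```python
# The two alias entries as they appear in the aggregate_node.py hunk of this commit.
AGGREGATORS = {"ONE": "hash_one", "ANY_VALUE": "hash_one"}


def any_value_per_group(rows, key, column):
    """Hypothetical pure-Python stand-in: keep the first value seen for each group."""
    picked = {}
    for row in rows:
        picked.setdefault(row[key], row[column])
    return picked


rows = [
    {"planetId": 5, "name": "Io"},
    {"planetId": 5, "name": "Europa"},
    {"planetId": 6, "name": "Titan"},
]

same_aggregator = AGGREGATORS["ONE"] == AGGREGATORS["ANY_VALUE"]
picked = any_value_per_group(rows, "planetId", "name")
```

Because either name resolves to the same aggregator, queries should not rely on which value of the group is returned.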
15 changes: 15 additions & 0 deletions opteryx/functions/__init__.py
@@ -163,6 +163,7 @@ def _raise_exception(text):
# the first entry is NONE.
FUNCTIONS = {
"VERSION": _repeat_no_parameters(get_version),

# TYPE CONVERSION
"TIMESTAMP": cast("TIMESTAMP"),
"BOOLEAN": cast("BOOLEAN"),
@@ -174,6 +175,7 @@ def _raise_exception(text):
"TRY_NUMERIC": try_cast("NUMERIC"),
"TRY_VARCHAR": try_cast("VARCHAR"),
"TRY_STRING": try_cast("VARCHAR"), # alias for VARCHAR

# STRINGS
"LEN": _iterate_single_parameter(get_len), # LENGTH(str) -> int
"LENGTH": _iterate_single_parameter(get_len), # LENGTH(str) -> int
@@ -185,6 +187,7 @@ def _raise_exception(text):
"REVERSE": compute.utf8_reverse,
"SOUNDEX": string_functions.soundex,
"TITLE": compute.utf8_title,

# HASHING & ENCODING
"HASH": _iterate_single_parameter(lambda x: format(CityHash64(str(x)), "X")),
"MD5": _iterate_single_parameter(string_functions.get_md5),
@@ -193,6 +196,14 @@ def _raise_exception(text):
"SHA512": _iterate_single_parameter(string_functions.get_sha512),
"RANDOM": number_functions.random,
"RAND": number_functions.random,
"NORMAL": number_functions.random_normal,
"BASE64_ENCODE": _iterate_single_parameter(string_functions.get_base64_encode),
"BASE64_DECODE": _iterate_single_parameter(string_functions.get_base64_decode),
"BASE85_ENCODE": _iterate_single_parameter(string_functions.get_base85_encode),
"BASE85_DECODE": _iterate_single_parameter(string_functions.get_base85_decode),
"HEX_ENCODE": _iterate_single_parameter(string_functions.get_hex_encode),
"HEX_DECODE": _iterate_single_parameter(string_functions.get_hex_decode),

# OTHER
"GET": _iterate_double_parameter(_get), # GET(LIST, index) => LIST[index] or GET(STRUCT, accessor) => STRUCT[accessor]
"LIST_CONTAINS": _iterate_double_parameter(other_functions.list_contains),
@@ -213,6 +224,10 @@ def _raise_exception(text):
"TRUNC": compute.trunc,
"TRUNCATE": compute.trunc,
"PI": _repeat_no_parameters(number_functions.pi),
"INT": _iterate_single_parameter(int),
"INTEGER": _iterate_single_parameter(int),
"FLOAT": _iterate_single_parameter(float),

# DATES & TIMES
"DATE_TRUNC": _iterate_double_parameter_field_second(date_trunc),
"TIME_BUCKET": date_functions.date_floor,
7 changes: 7 additions & 0 deletions opteryx/functions/number_functions.py
@@ -28,3 +28,10 @@ def round(*args):

def random(size):
    return numpy.random.uniform(size=size)


def random_normal(size):
    from numpy.random import default_rng

    rng = default_rng()
    return rng.standard_normal(size)
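
As a quick sanity check on `random_normal`, the sketch below draws a large sample and looks at its mean and standard deviation. The `seed` parameter is an addition for this illustration only, so that the run is repeatable. The function in the diff takes just `size` and is unseeded.

```python
import numpy
from numpy.random import default_rng


def random_normal(size, seed=None):
    # `seed` is not part of the function in the diff; it is added here for repeatability
    rng = default_rng(seed)
    return rng.standard_normal(size)


sample = random_normal(10_000, seed=42)
sample_mean = float(numpy.mean(sample))   # expected to be close to 0.0
sample_std = float(numpy.std(sample))     # expected to be close to 1.0
```

With ten thousand draws, the sample mean should sit within a few hundredths of 0.0 and the standard deviation close to 1.0, which matches the documented description of `NORMAL`.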
55 changes: 55 additions & 0 deletions opteryx/functions/string_functions.py
@@ -10,6 +10,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.

from email.mime import base
import numpy


@@ -92,3 +93,57 @@ def get_sha512(item):
    import hashlib # delay the import - it's rarely needed

    return hashlib.sha512(str(item).encode()).hexdigest() # nosec - meant to be MD5


def get_base64_encode(item):
    """calculate BASE64 encoding of a string"""
    import base64

    if not isinstance(item, bytes):
        item = str(item).encode()
    return base64.b64encode(item).decode("UTF8")


def get_base64_decode(item):
    """decode a BASE64 encoded string"""
    import base64

    if not isinstance(item, bytes):
        item = str(item).encode()
    return base64.b64decode(item).decode("UTF8")


def get_base85_encode(item):
    """calculate BASE85 encoding of a string"""
    import base64

    if not isinstance(item, bytes):
        item = str(item).encode()
    return base64.b85encode(item).decode("UTF8")


def get_base85_decode(item):
    """decode a BASE85 encoded string"""
    import base64

    if not isinstance(item, bytes):
        item = str(item).encode()
    return base64.b85decode(item).decode("UTF8")


def get_hex_encode(item):
    """calculate HEX encoding of a string"""
    import base64

    if not isinstance(item, bytes):
        item = str(item).encode()
    return base64.b16encode(item).decode("UTF8")


def get_hex_decode(item):
    """decode a HEX encoded string"""
    import base64

    if not isinstance(item, bytes):
        item = str(item).encode()
    return base64.b16decode(item).decode("UTF8")
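
The six helpers above share one pattern: coerce to bytes, call the standard-library codec, then decode the result as UTF-8. The sketch below exercises that pattern with two generic helpers (`encode_text` and `decode_text` are names invented for this illustration) and points out two edge cases that follow from the standard library's behaviour: `b16decode` rejects lower-case hex digits by default, and decoded bytes that are not valid UTF-8 cannot be turned back into a string.

```python
import base64
import binascii


def encode_text(text, encoder):
    # same shape as the helpers in the diff: str -> bytes -> encoded bytes -> str
    return encoder(str(text).encode()).decode("UTF8")


def decode_text(text, decoder):
    return decoder(str(text).encode()).decode("UTF8")


original = "this is a string"
round_trips = {
    "base64": decode_text(encode_text(original, base64.b64encode), base64.b64decode),
    "base85": decode_text(encode_text(original, base64.b85encode), base64.b85decode),
    "hex": decode_text(encode_text(original, base64.b16encode), base64.b16decode),
}

hello_b64 = encode_text("hello", base64.b64encode)
hello_hex = encode_text("hello", base64.b16encode)

# b16decode only accepts upper-case digits unless casefold=True is passed
try:
    base64.b16decode(b"68656c6c6f")
    lowercase_hex_accepted = True
except binascii.Error:
    lowercase_hex_accepted = False

# decoded bytes that are not valid UTF-8 cannot be converted back to str
try:
    base64.b64decode(b"/w==").decode("UTF8")
    non_utf8_ok = True
except UnicodeDecodeError:
    non_utf8_ok = False
```

If lower-case hex input or binary payloads need to be supported, passing `casefold=True` to `b16decode` and returning bytes instead of a decoded string would be two options to consider. Neither is part of this commit.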
1 change: 1 addition & 0 deletions opteryx/operators/aggregate_node.py
@@ -57,6 +57,7 @@
"MINIMUM": "min", # alias
"MIN_MAX": "min_max",
"ONE": "hash_one",
"ANY_VALUE": "hash_one",
"PRODUCT": "product",
"STDDEV": "stddev",
"SUM": "sum",
2 changes: 1 addition & 1 deletion opteryx/version.py
@@ -16,4 +16,4 @@
2) we can import it in setup.py for the same reason
"""

__version__ = "0.3.0"
__version__ = "0.4.0-alpha.1"
18 changes: 17 additions & 1 deletion tests/sql_battery/test_battery_shape.py
@@ -197,6 +197,9 @@
("SELECT VARCHAR(planetId) FROM $satellites GROUP BY planetId, VARCHAR(planetId)", 7, 1),
("SELECT TIMESTAMP(planetId) FROM $satellites GROUP BY planetId, TIMESTAMP(planetId)", 7, 1),
("SELECT NUMERIC(planetId) FROM $satellites GROUP BY planetId, NUMERIC(planetId)", 7, 1),
("SELECT INT(planetId) FROM $satellites GROUP BY planetId, INT(planetId)", 7, 1),
("SELECT INTEGER(planetId) FROM $satellites GROUP BY planetId, INTEGER(planetId)", 7, 1),
("SELECT FLOAT(planetId) FROM $satellites GROUP BY planetId, FLOAT(planetId)", 7, 1),
("SELECT CAST(planetId AS BOOLEAN) FROM $satellites", 177, 1),
("SELECT CAST(planetId AS VARCHAR) FROM $satellites", 177, 1),
("SELECT CAST(planetId AS TIMESTAMP) FROM $satellites", 177, 1),
@@ -473,6 +476,7 @@
("SELECT COUNT_DISTINCT(planetId) FROM $satellites", 1, 1),
("SELECT LIST(name), planetId FROM $satellites GROUP BY planetId", 7, 2),
("SELECT ONE(name), planetId FROM $satellites GROUP BY planetId", 7, 2),
("SELECT ANY_VALUE(name), planetId FROM $satellites GROUP BY planetId", 7, 2),
("SELECT MAX(planetId) FROM $satellites", 1, 1),
("SELECT MAXIMUM(planetId) FROM $satellites", 1, 1),
("SELECT MEAN(planetId) FROM $satellites", 1, 1),
@@ -502,6 +506,18 @@
("SELECT * FROM $planets INNER JOIN $planets AS b USING (id)", 9, 40),
("SELECT ROUND(5 + RAND() * (10 - 5)) rand_between FROM $planets", 9, 1),

("SELECT BASE64_DECODE(BASE64_ENCODE('this is a string'));", 1, 1),
("SELECT BASE64_ENCODE('this is a string');", 1, 1),
("SELECT BASE64_DECODE('aGVsbG8=')", 1, 1),
("SELECT BASE85_DECODE(BASE85_ENCODE('this is a string'));", 1, 1),
("SELECT BASE85_ENCODE('this is a string');", 1, 1),
("SELECT BASE85_DECODE('Xk~0{Zv')", 1, 1),
("SELECT HEX_DECODE(HEX_ENCODE('this is a string'));", 1, 1),
("SELECT HEX_ENCODE('this is a string');", 1, 1),
("SELECT HEX_DECODE('68656C6C6F')", 1, 1),
("SELECT NORMAL()", 1, 1),
("SELECT NORMAL() FROM $astronauts", 357, 1),

# These are queries which have been found to return the wrong result or not run correctly
# FILTERING ON FUNCTIONS
("SELECT DATE(birth_date) FROM $astronauts FOR TODAY WHERE DATE(birth_date) < '1930-01-01'", 14, 1),
@@ -606,7 +622,7 @@ def test_sql_battery(statement, rows, columns):

print(f"RUNNING BATTERY OF {len(STATEMENTS)} SHAPE TESTS")
for index, (statement, rows, cols) in enumerate(STATEMENTS):
print(f"{index:04}", statement)
print(f"{(index + 1):04}", statement)
test_sql_battery(statement, rows, cols)

print("✅ okay")
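
The last change in this file switches the printed test counter from 0-based to 1-based. A small sketch of the two equivalent ways to get that numbering, using a two-entry `STATEMENTS` list taken from the cases added above (the list here is cut down for illustration):

```python
STATEMENTS = [
    ("SELECT NORMAL()", 1, 1),
    ("SELECT HEX_DECODE('68656C6C6F')", 1, 1),
]

# the approach in the diff: shift the 0-based index when formatting
labels_shifted = [f"{(index + 1):04}" for index, _ in enumerate(STATEMENTS)]

# an alternative: ask enumerate to start counting at 1
labels_start = [f"{index:04}" for index, _ in enumerate(STATEMENTS, start=1)]
```

Both produce zero-padded labels beginning at `0001`. Using `start=1` keeps the format string simpler, though the commit's version behaves the same.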
