expression vector processing improvements #17561

Open
wants to merge 1 commit into
base: master

@clintropolis clintropolis commented Dec 12, 2024

Description

changes:

  • introduces FilteredInputBinding, which improves conditional expression processing by using a VectorMatch internally to selectively evaluate input vectors instead of precomputing all inputs; nvl is updated to take advantage of this
  • refactors the expression vector processor implementations for simple functions (most math and logical operations) to use some new factory classes, which streamlines them
  • updates the vector identifier expression processor to delegate evaluating results directly to the input binding selectors via ExprEvalBindingVector
  • adds maxVectorSize() to ExprVectorProcessor to avoid having to pass the max vector size around everywhere
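
To illustrate the selective-evaluation idea behind the first bullet, here is a minimal, self-contained sketch. It is not the code in this PR: `NvlSketch`, `LongVectorExpr`, and `nvlSelective` are hypothetical names, plain arrays stand in for the real vector bindings, and the sketch assumes the fallback expression never produces nulls. It only shows the general technique of building a selection of the rows where the first argument is null and evaluating the second argument for those rows alone.

```java
import java.util.Arrays;

// Hypothetical illustration only; these types are not part of Druid.
class NvlSketch {
    /** Stand-in for a vectorized expression that can be evaluated on a subset of rows. */
    interface LongVectorExpr {
        // Evaluate only rows selection[0..selectionSize) and write results into out[row].
        void eval(int[] selection, int selectionSize, long[] out);
    }

    /** nvl(left, fallback): the fallback is evaluated only for rows where left is null. */
    static long[] nvlSelective(long[] left, boolean[] leftNulls, LongVectorExpr fallback) {
        int n = left.length;
        long[] out = Arrays.copyOf(left, n);

        // Collect the row positions that actually need the fallback value.
        int[] selection = new int[n];
        int selectionSize = 0;
        for (int i = 0; i < n; i++) {
            if (leftNulls[i]) {
                selection[selectionSize++] = i;
            }
        }

        // Skip the fallback entirely when no row is null.
        if (selectionSize > 0) {
            fallback.eval(selection, selectionSize, out);
        }
        return out;
    }

    public static void main(String[] args) {
        // Shaped like NVL(long1, long5 + long3) from the benchmark queries.
        long[] long1 = {10, 0, 30, 0};
        boolean[] long1Nulls = {false, true, false, true};
        long[] long5 = {1, 2, 3, 4};
        long[] long3 = {5, 6, 7, 8};

        long[] result = nvlSelective(long1, long1Nulls, (sel, size, out) -> {
            for (int k = 0; k < size; k++) {
                int row = sel[k];
                out[row] = long5[row] + long3[row];
            }
        });
        System.out.println(Arrays.toString(result)); // prints [10, 8, 30, 12]
    }
}
```

In this sketch the addition runs for two of the four rows; the fewer nulls the first argument has, the more work is skipped compared with precomputing the fallback for every row.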

some benchmarks with nvl before and after:

SELECT NVL(string2, CONCAT(string1, '-', long2)), SUM(double1) FROM expressions GROUP BY 1 ORDER BY 2
SELECT NVL(string1, CONCAT(string3, '-', long2)), SUM(double1) FROM expressions GROUP BY 1 ORDER BY 2
SELECT NVL(long1, long5 + long3), SUM(double1) FROM expressions GROUP BY 1 ORDER BY 2

before:

Benchmark                        (complexCompression)  (deferExpressionDimensions)  (query)  (rowsPerSegment)  (schemaType)  (storageType)  (stringEncoding)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlExpressionBenchmark.querySql                  none                 singleString       49           1500000      explicit           MMAP              UTF8        force  avgt    5  258.528 ±  2.046  ms/op
SqlExpressionBenchmark.querySql                  none                 singleString       53           1500000      explicit           MMAP              UTF8        force  avgt    5  275.814 ±  1.727  ms/op
SqlExpressionBenchmark.querySql                  none                 singleString       57           1500000      explicit           MMAP              UTF8        force  avgt    5   68.072 ±  1.334  ms/op
SqlExpressionBenchmark.querySql                  none                   fixedWidth       49           1500000      explicit           MMAP              UTF8        force  avgt    5  475.695 ±  5.691  ms/op
SqlExpressionBenchmark.querySql                  none                   fixedWidth       53           1500000      explicit           MMAP              UTF8        force  avgt    5  476.026 ± 19.507  ms/op
SqlExpressionBenchmark.querySql                  none                   fixedWidth       57           1500000      explicit           MMAP              UTF8        force  avgt    5  479.159 ±  6.044  ms/op
SqlExpressionBenchmark.querySql                  none         fixedWidthNonNumeric       49           1500000      explicit           MMAP              UTF8        force  avgt    5  477.816 ±  6.072  ms/op
SqlExpressionBenchmark.querySql                  none         fixedWidthNonNumeric       53           1500000      explicit           MMAP              UTF8        force  avgt    5  470.072 ± 14.624  ms/op
SqlExpressionBenchmark.querySql                  none         fixedWidthNonNumeric       57           1500000      explicit           MMAP              UTF8        force  avgt    5   69.851 ±  1.485  ms/op
SqlExpressionBenchmark.querySql                  none                       always       49           1500000      explicit           MMAP              UTF8        force  avgt    5  477.870 ±  3.244  ms/op
SqlExpressionBenchmark.querySql                  none                       always       53           1500000      explicit           MMAP              UTF8        force  avgt    5  474.052 ± 15.498  ms/op
SqlExpressionBenchmark.querySql                  none                       always       57           1500000      explicit           MMAP              UTF8        force  avgt    5  471.010 ±  3.207  ms/op

after:

Benchmark                        (complexCompression)  (deferExpressionDimensions)  (query)  (rowsPerSegment)  (schemaType)  (storageType)  (stringEncoding)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlExpressionBenchmark.querySql                  none                 singleString       49           1500000      explicit           MMAP              UTF8        force  avgt    5  204.239 ±  2.762  ms/op
SqlExpressionBenchmark.querySql                  none                 singleString       53           1500000      explicit           MMAP              UTF8        force  avgt    5  216.547 ±  1.920  ms/op
SqlExpressionBenchmark.querySql                  none                 singleString       57           1500000      explicit           MMAP              UTF8        force  avgt    5   38.502 ±  0.968  ms/op
SqlExpressionBenchmark.querySql                  none                   fixedWidth       49           1500000      explicit           MMAP              UTF8        force  avgt    5  480.898 ±  8.471  ms/op
SqlExpressionBenchmark.querySql                  none                   fixedWidth       53           1500000      explicit           MMAP              UTF8        force  avgt    5  457.998 ±  5.167  ms/op
SqlExpressionBenchmark.querySql                  none                   fixedWidth       57           1500000      explicit           MMAP              UTF8        force  avgt    5  476.647 ±  4.343  ms/op
SqlExpressionBenchmark.querySql                  none         fixedWidthNonNumeric       49           1500000      explicit           MMAP              UTF8        force  avgt    5  476.482 ±  3.979  ms/op
SqlExpressionBenchmark.querySql                  none         fixedWidthNonNumeric       53           1500000      explicit           MMAP              UTF8        force  avgt    5  457.149 ±  9.338  ms/op
SqlExpressionBenchmark.querySql                  none         fixedWidthNonNumeric       57           1500000      explicit           MMAP              UTF8        force  avgt    5   38.647 ±  0.943  ms/op
SqlExpressionBenchmark.querySql                  none                       always       49           1500000      explicit           MMAP              UTF8        force  avgt    5  478.346 ±  5.641  ms/op
SqlExpressionBenchmark.querySql                  none                       always       53           1500000      explicit           MMAP              UTF8        force  avgt    5  477.180 ± 14.496  ms/op
SqlExpressionBenchmark.querySql                  none                       always       57           1500000      explicit           MMAP              UTF8        force  avgt    5  478.263 ±  3.889  ms/op
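
To put the two tables in relative terms, the following small calculation compares the singleString rows, with scores copied from the before and after tables above. The class and method names are hypothetical, and the error margins reported by the benchmark are ignored, so treat the percentages as approximate.

```java
// Hypothetical helper for reading the benchmark tables; not part of the PR.
class NvlSpeedup {
    /** Percentage reduction in ms/op going from before to after. */
    static double percentLower(double beforeMs, double afterMs) {
        return 100.0 * (beforeMs - afterMs) / beforeMs;
    }

    public static void main(String[] args) {
        int[] queries = {49, 53, 57};
        // singleString scores (ms/op) copied from the before and after tables.
        double[] before = {258.528, 275.814, 68.072};
        double[] after = {204.239, 216.547, 38.502};

        for (int i = 0; i < queries.length; i++) {
            System.out.printf(
                "query %d: %.1f%% lower ms/op%n",
                queries[i],
                percentLower(before[i], after[i]));
        }
    }
}
```

For the singleString rows this works out to roughly 21% less time for queries 49 and 53 and roughly 43% less for query 57.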

note that this does seem to be a case where deferred expression processing is slower than normal vector processing, though I'm not sure if or how we should tweak the strategies at the moment.

matchMaker.setSelectionSize(3);

double[] doubles = filteredVectorInputBinding.getDoubleVector("double");
boolean[] nulls = filteredVectorInputBinding.getNullVector("double");

Code scanning / CodeQL notice (test): Unread local variable. Variable 'boolean[] nulls' is never read.