forked from apache/arrow
Gandiva C++ Merge. #7
Closed. praveenbingo wants to merge 52 commits into dremio:gandiva-merge from praveenbingo:gandiva-merge-final
Bootstrap evaluation using llvm code generation
LLVM code generation is done using a mix of:
- glue IR code that loops over the vector and generates function calls, and
- byte-code files generated from simple c++ functions using clang (emit-llvm).
The glue-code and pre-compiled byte code are merged and optimized together.
Expressions are specified using a "tree builder" where each node is an arrow vector, or a binary/unary function. During code generation, the expressions are "decomposed" so that the value array and bitmap array are evaluated separately to compute the expression result. This avoids the use of too many branch/conditional instructions (checks for "if null"), and hence, can be vectorized efficiently.
Support added for arithmetic and logical expressions on numeric types. Travis CI support added for build on ubuntu.
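The value/bitmap decomposition described above can be illustrated with a minimal sketch in plain C++. This is not the IR that Gandiva generates: `AddValues`, `AndBitmaps` and `IsValid` are hypothetical helpers, and the bitmap layout (one bit per row, least-significant bit first) is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Value pass: add every slot unconditionally, including slots that are null.
// The loop body has no "if null" branch, so a compiler is free to vectorize it.
std::vector<int32_t> AddValues(const std::vector<int32_t>& a,
                               const std::vector<int32_t>& b) {
  assert(a.size() == b.size());
  std::vector<int32_t> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = a[i] + b[i];
  }
  return out;
}

// Validity pass: the sum is valid only where both inputs are valid, so the
// output bitmap is the byte-wise AND of the two input bitmaps.
std::vector<uint8_t> AndBitmaps(const std::vector<uint8_t>& a,
                                const std::vector<uint8_t>& b) {
  assert(a.size() == b.size());
  std::vector<uint8_t> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = static_cast<uint8_t>(a[i] & b[i]);
  }
  return out;
}

// Reads the validity bit of `row` (assumed layout: least-significant bit first).
bool IsValid(const std::vector<uint8_t>& bitmap, std::size_t row) {
  return ((bitmap[row / 8] >> (row % 8)) & 1) != 0;
}
```

The two passes never look at each other's data, which is the point of the decomposition: values stored in null slots are computed but ignored, because the bitmap says they are not valid.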
Separate out the public and private target dependencies. For arrow, export an interface target. This avoids the need to add include dirs for each dependency on arrow. Removed dependency on gtest. Instead, build it as an external project. This is the recommended practice for googletest. For pre-compiled files, generate the bitcode files for each of them independently and then link them to generate a unified bitcode file. Removed cpplint exceptions since there is no more sourcing of .cc files. Separate out the public include files from private includes, and add them in the dependency list in cmake. Pass the bytecode filepath from cmake (instead of /tmp).
…he#8)
* GDV-43: [C++] Introduce error codes as error handling strategy.
Introduced status codes and used them as the error handling strategy. The decision was taken to accommodate existing libraries that use error codes, and because Arrow also uses error codes rather than exceptions. Changed the signatures across the board accordingly.
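As a rough illustration of the status-code style described in this commit, the sketch below returns a status object and passes the result through an out-parameter instead of throwing. The `Status` class and `GetField` function here are hypothetical and written only for this example; they are not the Gandiva or Arrow definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class StatusCode { OK, Invalid };

// Hypothetical, minimal status type: a code plus a human-readable message.
class Status {
 public:
  static Status OK() { return Status(StatusCode::OK, ""); }
  static Status Invalid(std::string msg) {
    return Status(StatusCode::Invalid, std::move(msg));
  }

  bool ok() const { return code_ == StatusCode::OK; }
  StatusCode code() const { return code_; }
  const std::string& message() const { return message_; }

 private:
  Status(StatusCode code, std::string msg)
      : code_(code), message_(std::move(msg)) {}

  StatusCode code_;
  std::string message_;
};

// The result travels through `out`; the return value only reports success or
// failure, so callers check a code instead of catching an exception.
Status GetField(const std::vector<std::string>& schema, std::size_t index,
                std::string* out) {
  assert(out != nullptr);
  if (index >= schema.size()) {
    return Status::Invalid("field index out of range");
  }
  *out = schema[index];
  return Status::OK();
}
```

A caller would typically test `status.ok()` after each call and propagate the status upward on failure, which is why adopting this style changes signatures throughout a code base.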
The pre-compiled functions take an extra arg (bool *) to set the result validity. At decompose time, a local bitmap is assigned to track the result validity bits for each such function. At evaluate time, a sufficient number of local bitmaps is allocated to cover all of them. For the final computation of the expression validity, the input bitmaps can be either one of the value-vector bitmaps, or a local bitmap.
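The mechanism above can be sketched as follows. Everything in this example is an assumption for illustration: `divide_int32` stands in for a pre-compiled function that reports validity through its `bool*` argument, `Column` is a toy column type, and `std::vector<bool>` stands in for a real bitmap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pre-compiled function: the extra bool* reports whether the
// result is valid. Division by zero and INT32_MIN / -1 yield an invalid result.
int32_t divide_int32(int32_t num, int32_t den, bool* valid) {
  if (den == 0 || (num == INT32_MIN && den == -1)) {
    *valid = false;
    return 0;
  }
  *valid = true;
  return num / den;
}

// Toy column: values plus one validity flag per row.
struct Column {
  std::vector<int32_t> values;
  std::vector<bool> valid;
};

Column Divide(const Column& num, const Column& den) {
  assert(num.values.size() == den.values.size());
  const std::size_t n = num.values.size();
  Column out{std::vector<int32_t>(n), std::vector<bool>(n)};

  // Local bitmap: holds the validity bits written by the function itself.
  std::vector<bool> local(n);
  for (std::size_t i = 0; i < n; ++i) {
    bool valid = false;
    out.values[i] = divide_int32(num.values[i], den.values[i], &valid);
    local[i] = valid;
  }

  // Final validity combines the value-vector bitmaps with the local bitmap.
  for (std::size_t i = 0; i < n; ++i) {
    out.valid[i] = num.valid[i] && den.valid[i] && local[i];
  }
  return out;
}
```

In this sketch a row ends up invalid either because an input was null or because the function itself flagged the result, which mirrors the two kinds of bitmaps mentioned in the commit message.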
Replaced MakeUnaryFunction, MakeBinaryFunction with a simpler MakeFunction that takes a vector of args.
An if-else expression has three sub-expressions:
- condition
- then-expression
- else-expression
Each of these can again be a node in the expression tree. The result validity of the if-else expression is saved in a local bitmap.
Also, moved all of the integ tests to a different folder (integ) so that there is no mix of include files.
- moved the expression decomposition logic from Node class to a visitor class (ExprDecomposer)
- moved node.h out of external includes
- renamed Evaluator to Projector
Added support for literals (int32, int64, float, double and bool).
In case of nested if-else conditions, e.g.
if A else if B else if C else D
the else parts of A & C will not update validity bitmaps. Only the if parts and the terminal else (i.e. D) update bitmaps.
Split gandiva into two sub modules: codegen & jni.
- codegen is the core having cpp APIs and LLVM
- jni deals with protobufs & interfacing with java
Dremio allocates the output vectors in java and passes the pointers to gandiva. In that case, gandiva will use the passed in buffers. Made Evaluate use ArrayData internally for output buffers, since Array is expected to be immutable.
Also, added "Adapted from XX" comments in ci/travis
- Added definitions for other integer types (int8, int16)
- Added definitions for unsigned types
- Added a test for arithmetic ops on all int types
- The functions should be inlined in the pre-compiled library, but not in the unit tests. Added a compiler flag to control this.
* GDV-58: [CPP] Fix order of includes.
Fixing the order of includes to follow the style guideline. The order to follow is documented here: https://google.github.io/styleguide/cppguide.html#Names_and_Order_of_Includes
Also enabled the check in lint.
GDV-7: Gandiva Java APIs
- Added the JNI implementation of the Java APIs
- Added Java based unit and integration tests
- Use cmake to build gandiva_jni
- Added pom.xml to build Java files
Validating the input schema and expressions during the projector build.
- tree builder api for and/or
- decomposer/validator for and/or
- code generator for and/or
- tests for and/or
- add tree-builder, codegen support for null literals
- moved the code for final bitmap computation to class BitMapAccumulator
add java bindings for and/or
add java bindings for null literals
Support date/time types in Java.
Add cpp/Java tests for date/time types.
Loading Gandiva dynamically in java bindings. Packaging the dynamic library and byte code files in Gandiva JAR. Introduced configuration object to customize Gandiva at runtime.
- Track offsets buffer for string/binary
- annotator/generator support for string/binary
- literal support for string/binary
Modified the build to package the gandiva jni as a stand alone library that can be packaged in the Gandiva JAR. Also producing two versions of gandiva core - a static and a shared one. Fixed LLVM dependencies to be target based.
- added target "make stylecheck" to check style
- added target "make stylefix" to fix style
- fixed README.md
- fixed ci script
- used stylefix to fix all existing style violations
- added java bindings for varlen types/literals
- minor cleanups in llvm generator and engine (reported by clang-tidy)
Added microbenchmarks in both cpp and Java
Support isnull, isnotnull, equal, and not_equal for date/time types.
Support date/time types for less_than, less_than_or_equal_to, greater_than, greater_than_or_equal_to.
Implement all extractXxx functions.
- Switched to gcc-4.9, since the stdc++ linked with 4.8 doesn't work with llvm libs.
- Build arrow in travis instead of the conda build (the conda built libarrow.a has undefined symbols je_arror_allocx, ..)
- fixed an error in node.h that showed up when I toyed with clang compiler
Exporting supported data types and functions from Gandiva. Added a JNI bridge to access this from the java layer.
* Fix the missing include directory setting for gtest
* Fix to use the same format as other dependencies
Fixed the implementation of extract second from time.
* GDV-28: [C++] Add hash functions on all data types
* GDV-28: Fix stylecheck in travis to print diff
* GDV-28: pick clang-format from llvm-binary dir
* GDV-28: handle case when seed is null
* GDV-28: [C++] Fix a style check
Added support for literals and null for time types.
Class references are local by default and eligible for GC. They need to be converted to global references on library load so that they can be safely used for the program lifetime.
Add support for timestampaddXxx functions.
Add support for is_distinct_from, is_not_distinct_from, isnull, isnotnull, date_add/add, date_sub/subtract/date_diff, date_trunc_Xxx functions.
…e#74)
* Temporarily matching what Dremio does for mod zero.
* Used the latest Arrow APIs for allocating buffers.
- similar to projection, filter is built for a specific schema and condition (i.e. expression)
- the output of filter is a selection vector (Int16Array)
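To make the idea of a selection vector concrete, here is a minimal sketch: the filter does not copy matching rows, it only records their 16-bit row indices. `BuildSelectionVector` is a hypothetical helper, `std::vector<bool>` stands in for a validity bitmap, and treating null rows as "not selected" is an assumption of this example rather than something the commit message states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Returns the indices of the rows for which `condition` holds.
// Assumption: a null row is never selected.
template <typename Predicate>
std::vector<int16_t> BuildSelectionVector(const std::vector<int32_t>& values,
                                          const std::vector<bool>& valid,
                                          Predicate condition) {
  assert(values.size() == valid.size());
  // Every row index must be representable as an int16_t.
  assert(values.size() <=
         static_cast<std::size_t>(std::numeric_limits<int16_t>::max()) + 1);

  std::vector<int16_t> selected;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (valid[i] && condition(values[i])) {
      selected.push_back(static_cast<int16_t>(i));
    }
  }
  return selected;
}
```

For the values `{5, 12, 7, 30}` with the last row null and the condition `v > 6`, the sketch yields the indices 1 and 2. The 16-bit index type bounds how many rows one batch can address, which is why the size check is part of the sketch.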
* Add java bindings for filter expr
* Move selection vector impl to internal
Fixed some bugs in the filter code path.
Change the selection vector arrays to unsigned to match dremio.
1. Added lock to holder read to address potential race condition.
2. Fixed log message.
3. Addressed breaking arrow change.
1. In evaluate, look up the module without the lock first, and fall back to the locked lookup only if the module is not found.
2. Use release builds in travis.
Introducing a cache to hold the projectors and filters for re-use. The cache is an LRU cache that can hold 100 entries.
* GDV-31: [Java][C++] Fixed concurrency issue in cache.
Modifications were happening in get without a mutex. Wrote a test to verify and prevent regression.
Literal string conversion was ignoring types, leading to a mismatch in hashing of expressions.
- add a registry for "function holders" implemented in cpp
- the function holder is instantiated at expression decomposition time
- at eval time, the registered fn gets an extra param (the function holder)
- To get around the java load issue, create a native library and load it in the LLVM module. This module has the hooks for all the c++ function helpers.
- for files that are compiled in libgandiva_helpers, add into gandiva::helpers namespace.
- merged status.cc into status.h
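The cache commits above describe an LRU cache whose `get` also modifies state and therefore needs a mutex. A minimal sketch of that idea follows. `LruCache` is a hypothetical class written for this example and is not the Gandiva implementation; the capacity is a constructor argument, where the commit message mentions 100 entries.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

template <typename Key, typename Value>
class LruCache {
 public:
  explicit LruCache(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // A hit moves the entry to the front of the recency list. That is a
  // modification, so Get takes the same mutex as Put.
  bool Get(const Key& key, Value* out) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = index_.find(key);
    if (it == index_.end()) {
      return false;
    }
    entries_.splice(entries_.begin(), entries_, it->second);
    *out = it->second->second;
    return true;
  }

  void Put(const Key& key, Value value) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = index_.find(key);
    if (it != index_.end()) {
      it->second->second = std::move(value);
      entries_.splice(entries_.begin(), entries_, it->second);
      return;
    }
    if (entries_.size() == capacity_) {
      // Evict the least recently used entry, which sits at the back.
      index_.erase(entries_.back().first);
      entries_.pop_back();
    }
    entries_.emplace_front(key, std::move(value));
    index_[key] = entries_.begin();
  }

 private:
  using Entry = std::pair<Key, Value>;

  std::size_t capacity_;
  std::mutex mutex_;
  std::list<Entry> entries_;  // most recently used first
  std::unordered_map<Key, typename std::list<Entry>::iterator> index_;
};
```

Because a lookup reorders the recency list, a reader is also a writer here. That is the kind of unprotected modification in `get` that the GDV-31 fix describes.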
pprudhvi pushed a commit that referenced this pull request on May 26, 2020
This PR enables tests for `ARROW_COMPUTE`, `ARROW_DATASET`, `ARROW_FILESYSTEM`, `ARROW_HDFS`, `ARROW_ORC`, and `ARROW_IPC` (default on). apache#7131 enabled a minimal set of tests as a starting point.
I confirmed that these tests pass locally with the current master. In the current TravisCI environment, we cannot see this result due to a lot of error messages in `arrow-utility-test`.
```
$ git log | head -1
commit ed5f534
% ctest
...
Start 1: arrow-array-test
1/51 Test #1: arrow-array-test ..................... Passed 4.62 sec
Start 2: arrow-buffer-test
2/51 Test #2: arrow-buffer-test .................... Passed 0.14 sec
Start 3: arrow-extension-type-test
3/51 Test #3: arrow-extension-type-test ............ Passed 0.12 sec
Start 4: arrow-misc-test
4/51 Test #4: arrow-misc-test ...................... Passed 0.14 sec
Start 5: arrow-public-api-test
5/51 Test #5: arrow-public-api-test ................ Passed 0.12 sec
Start 6: arrow-scalar-test
6/51 Test #6: arrow-scalar-test .................... Passed 0.13 sec
Start 7: arrow-type-test
7/51 Test #7: arrow-type-test ...................... Passed 0.14 sec
Start 8: arrow-table-test
8/51 Test #8: arrow-table-test ..................... Passed 0.13 sec
Start 9: arrow-tensor-test
9/51 Test #9: arrow-tensor-test .................... Passed 0.13 sec
Start 10: arrow-sparse-tensor-test
10/51 Test #10: arrow-sparse-tensor-test ............. Passed 0.16 sec
Start 11: arrow-stl-test
11/51 Test #11: arrow-stl-test ....................... Passed 0.12 sec
Start 12: arrow-concatenate-test
12/51 Test #12: arrow-concatenate-test ............... Passed 0.53 sec
Start 13: arrow-diff-test
13/51 Test #13: arrow-diff-test ...................... Passed 1.45 sec
Start 14: arrow-c-bridge-test
14/51 Test #14: arrow-c-bridge-test .................. Passed 0.18 sec
Start 15: arrow-io-buffered-test
15/51 Test #15: arrow-io-buffered-test ............... Passed 0.20 sec
Start 16: arrow-io-compressed-test
16/51 Test #16: arrow-io-compressed-test ............. Passed 3.48 sec
Start 17: arrow-io-file-test
17/51 Test #17: arrow-io-file-test ................... Passed 0.74 sec
Start 18: arrow-io-hdfs-test
18/51 Test #18: arrow-io-hdfs-test ................... Passed 0.12 sec
Start 19: arrow-io-memory-test
19/51 Test #19: arrow-io-memory-test ................. Passed 2.77 sec
Start 20: arrow-utility-test
20/51 Test #20: arrow-utility-test ...................***Failed 5.65 sec
Start 21: arrow-threading-utility-test
21/51 Test #21: arrow-threading-utility-test ......... Passed 1.34 sec
Start 22: arrow-compute-compute-test
22/51 Test #22: arrow-compute-compute-test ........... Passed 0.13 sec
Start 23: arrow-compute-boolean-test
23/51 Test #23: arrow-compute-boolean-test ........... Passed 0.15 sec
Start 24: arrow-compute-cast-test
24/51 Test #24: arrow-compute-cast-test .............. Passed 0.22 sec
Start 25: arrow-compute-hash-test
25/51 Test #25: arrow-compute-hash-test .............. Passed 2.61 sec
Start 26: arrow-compute-isin-test
26/51 Test #26: arrow-compute-isin-test .............. Passed 0.81 sec
Start 27: arrow-compute-match-test
27/51 Test #27: arrow-compute-match-test ............. Passed 0.40 sec
Start 28: arrow-compute-sort-to-indices-test
28/51 Test #28: arrow-compute-sort-to-indices-test ... Passed 3.33 sec
Start 29: arrow-compute-nth-to-indices-test
29/51 Test #29: arrow-compute-nth-to-indices-test .... Passed 1.51 sec
Start 30: arrow-compute-util-internal-test
30/51 Test #30: arrow-compute-util-internal-test ..... Passed 0.13 sec
Start 31: arrow-compute-add-test
31/51 Test #31: arrow-compute-add-test ............... Passed 0.12 sec
Start 32: arrow-compute-aggregate-test
32/51 Test #32: arrow-compute-aggregate-test ......... Passed 14.70 sec
Start 33: arrow-compute-compare-test
33/51 Test #33: arrow-compute-compare-test ........... Passed 7.96 sec
Start 34: arrow-compute-take-test
34/51 Test #34: arrow-compute-take-test .............. Passed 4.80 sec
Start 35: arrow-compute-filter-test
35/51 Test #35: arrow-compute-filter-test ............ Passed 8.23 sec
Start 36: arrow-dataset-dataset-test
36/51 Test #36: arrow-dataset-dataset-test ........... Passed 0.25 sec
Start 37: arrow-dataset-discovery-test
37/51 Test #37: arrow-dataset-discovery-test ......... Passed 0.13 sec
Start 38: arrow-dataset-file-ipc-test
38/51 Test #38: arrow-dataset-file-ipc-test .......... Passed 0.21 sec
Start 39: arrow-dataset-file-test
39/51 Test #39: arrow-dataset-file-test .............. Passed 0.12 sec
Start 40: arrow-dataset-filter-test
40/51 Test #40: arrow-dataset-filter-test ............ Passed 0.16 sec
Start 41: arrow-dataset-partition-test
41/51 Test #41: arrow-dataset-partition-test ......... Passed 0.13 sec
Start 42: arrow-dataset-scanner-test
42/51 Test #42: arrow-dataset-scanner-test ........... Passed 0.20 sec
Start 43: arrow-filesystem-test
43/51 Test #43: arrow-filesystem-test ................ Passed 1.62 sec
Start 44: arrow-hdfs-test
44/51 Test #44: arrow-hdfs-test ...................... Passed 0.13 sec
Start 45: arrow-feather-test
45/51 Test #45: arrow-feather-test ................... Passed 0.91 sec
Start 46: arrow-ipc-read-write-test
46/51 Test #46: arrow-ipc-read-write-test ............ Passed 5.77 sec
Start 47: arrow-ipc-json-simple-test
47/51 Test #47: arrow-ipc-json-simple-test ........... Passed 0.16 sec
Start 48: arrow-ipc-json-test
48/51 Test #48: arrow-ipc-json-test .................. Passed 0.27 sec
Start 49: arrow-json-integration-test
49/51 Test #49: arrow-json-integration-test .......... Passed 0.13 sec
Start 50: arrow-json-test
50/51 Test #50: arrow-json-test ...................... Passed 0.26 sec
Start 51: arrow-orc-adapter-test
51/51 Test #51: arrow-orc-adapter-test ............... Passed 1.92 sec

98% tests passed, 1 tests failed out of 51

Label Time Summary:
arrow-tests = 27.38 sec (27 tests)
arrow_compute = 45.11 sec (14 tests)
arrow_dataset = 1.21 sec (7 tests)
arrow_ipc = 6.20 sec (3 tests)
unittest = 79.91 sec (51 tests)

Total Test time (real) = 79.99 sec

The following tests FAILED:
20 - arrow-utility-test (Failed)
Errors while running CTest
```
Closes apache#7142 from kiszk/ARROW-8754
Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Replicating the gandiva cpp tree structure into arrow.
Note: It does not build yet; the work is being done locally.