-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Jitify versions of binaryops for non-homogeneous types #892
[REVIEW] Jitify versions of binaryops for non-homogeneous types #892
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only did a top-level review of the headers.
@devavret Just want to make sure you address my concerns in my review of the original libgdf PR (rapidsai/libgdf#94), as well as any concerns expressed by other reviewers. Thanks! |
* branch-0.6: (74 commits) accidental whitespace changelog remove 0.6 version specification from readme Remove duplicate imports and cimport of numpy np.ndim will try REALLY hard to cast to a numpy array which makes things take basically forever CHANGELOG and repair a few tests. Add DataFrame.groupby(level=0) functionality CHANGELOG Improve Series groupby fix indentation in merge Add agg method Update csv.py Update CHANGELOG.md Groupby support for Series Resolve ambiguous names ENH: Use RMM::exec_policy for thrust operations Update changelog for PR rapidsai#889 Deleted old version of test_rmm.py which is now moved to RMM repo change wording Update CHANGELOG.md ...
@harrism I presumed those would've been addressed already. I see they haven't. I'm on it. |
…nops * fea-ext-test-utils-improvements: Added member to column_wrapper. CHANGELOG. Updated size_t to gdf_size_type. Added initial table_wrapper class for an abstraction of multiple gdf_columns of different types that can be used for unit testing. Added CUDA error checking to column wrapper. Updated operator== to only compare non-null values. Added constructor to column_wrapper that accepts solely a host_data vector. Defaults the valid buffer to null. Updates for column_wrapper. Added wrapper class for gdf_column for unit testing purposes. Updated uint32_t to int32_t because unsigned types are not supported gdf_dtypes. Added type_to_enum traits struct. Updated print_gdf_column to use type_dispatcher. Hash test currently failing. # Conflicts: # CHANGELOG.md
…nops * fea-ext-test-utils-improvements: moved order of includes. Added option to allocat bitmask to column_size constructor for column_wrapper. Added documentation. Added set_null_count function.
…nops * fea-ext-test-utils-improvements: Documentation for gdf_column conversion operator. Added conversion operator from column_wrapper to gdf_column to avoid need to use .get(). Added first test of column_wrapper. Temporarily deleted copy and assignment from column_wrapper. Moved tuple for_each to utility folder. Documentation. Temporarily undefine NDEBUG to run Death tests for type dispatcher. Updated Dispatcher Tests to test new traits. Also added additional tests to cause failed tests if a new type is added without updating types. Updated constructor documentaiton. Added intitial unit tests file for column_wrapper. For_each over tuple elements and documentation.
- Modulo - Power
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking the time to update this PR and get it working.
Several remnants that were inherited from the original PR need to be updated:
- Parameter names like
vax
andvay
are confusing. Should instead be renamed withlhs
andrhs
orleft
andright
. - Most of the internal implementation files are missing Doxymentation
- Lots of custom and reimplemented traits that are already present in Jitify or that should be added to Jitify (e.g., std::common_type). Removing these will significantly simplify this code.
- Non-standard error checking functions that should user
GDF_REQUIRE
. - The null handling in the kernels is unnecessary and inefficient. It can instead be done as a simple memcpy (outside of the kernel).
cpp/include/cudf/types.h
Outdated
@@ -85,6 +85,36 @@ typedef struct { | |||
// here we can also hold info for decimal datatype or any other datatype that requires additional information | |||
} gdf_dtype_extra_info; | |||
|
|||
/**---------------------------------------------------------------------------* | |||
* @union gdf_data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be gdf_datum
, signifying the singular rather than the plural?
gdf_data
sounds super- vague, and is also confusing IMHO since it can be interpreted to mean lots of data, or all data in a column etc.
return "None"; | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Come on... don't do this. Use an array. Or a flat map (if we have one; probably not). Or at worst, an unordered map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor changes needed
.gitmodules
Outdated
@@ -10,3 +10,6 @@ | |||
path = thirdparty/rmm | |||
url = https://github.com/rapidsai/rmm.git | |||
branch = branch-0.6 | |||
[submodule "thirdparty/jitify"] | |||
path = thirdparty/jitify | |||
url = https://github.com/NVIDIA/jitify.git |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we pin this to a specific branch / tag / commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please pin to a commit to avoid build issues in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this do? Or do I need to specify it in .gitmodules
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to specify it in .gitmodules
otherwise a git submodule update --init --recursive --remote
will update it to the latest commit of master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* branch-0.6: (30 commits) Update column.cpp Update cpp/src/dataframe/column.cpp added brief non doxygen comments to column.cpp, removed some more horizontal lines from functions.h removed horizontal lines from types.h, made one line comments us ///< instead of /**<, removed verbose language from comments changelog Use device array instead of numpy array for index gather call. Allow DeviceNDArrays to be used for getitem calls. Moved CUDA_CHECK_LAST after CUDA_TRY. Documentation. Update cpp/tests/error/error_handling_test.cu More tests for CHECK_STREAM. Documentation. Added initial tests for CHECK_STREAM. Added STREAM_CHECK macro to synchronize and error check a stream. Added test catching exception from CUDA_TRY. Updated CUDA_EXPECTS to CUDA_TRY. CUDA_EXPECTS. In progress. removed Reason: text. Added test for the contents of the what() message. Removed references to cudf::detail::logic_error in place of cudf::logic_error. Moved the exception out of the detail namespace. ... # Conflicts: # cpp/include/cudf/types.h
.gitmodules
Outdated
@@ -10,3 +10,6 @@ | |||
path = thirdparty/rmm | |||
url = https://github.com/rapidsai/rmm.git | |||
branch = branch-0.6 | |||
[submodule "thirdparty/jitify"] | |||
path = thirdparty/jitify | |||
url = https://github.com/NVIDIA/jitify.git |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please pin to a commit to avoid build issues in the future.
DEPENDS stringify | ||
MAIN_DEPENDENCY ${CMAKE_CURRENT_SOURCE_DIR}/include/cudf/types.h) | ||
|
||
add_custom_target(stringify_run DEPENDS ${CMAKE_BINARY_DIR}/include/types.h.jit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DEPENDS
does not have to be a path, it can be a CMake variable.
For example:
add_custom_command(OUTPUT TYPES_JIT
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/include
COMMAND ${CMAKE_BINARY_DIR}/stringify cudf/types.h > ${CMAKE_BINARY_DIR}/include/types.h.jit
...)
...
add_custom_target(stringify_run DEPENDS TYPES_JIT)
...
As long as the MAIN_DEPENDENCY
is properly specified to be ${CMAKE_CURRENT_SOURCE_DIR}/include/cudf/types.h
, CMake will update jit_types
if that file changes.
Note that this is helpful if later you expect your command to have multiple outputs.
I would also consider renaming stringify_run
to something more descriptive. I'm not really sure what the purpose of this target is off the cuff, and that obfuscates our build process for new developers, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this but it didn't work. Running make
rebuilds launcher.cpp
every time.
add_custom_command(OUTPUT STRINGIFIED_HEADERS
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/include
COMMAND ${CMAKE_BINARY_DIR}/stringify cudf/types.h > ${CMAKE_BINARY_DIR}/include/types.h.jit
COMMENT "Run stringify on header types.h to convert it to c-str for use in JIT compiled code"
DEPENDS stringify
MAIN_DEPENDENCY ${CMAKE_CURRENT_SOURCE_DIR}/include/cudf/types.h)
add_custom_target(stringified_headers DEPENDS STRINGIFIED_HEADERS)
add_dependencies(cudf stringified_headers)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry what does the cmake code above have to do with launcher.cpp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is for making and running an executable that comes with Jitify. This executable, called stringify, converts source files into c-strings and writes them to another source file. Now those stringified source files can be used in JIT code compilation. In our specific use case, we wanted to be able to use the definitions in types.h
in the JIT kernels. So we use stringify to convert types.h
into a string and store it in types.h.jit
in the build/include directory. That types.h.jit
is included in launcher.cpp
. I wanted a way to use CMake to run stringify only when types.h
is changed. With @mt-jones ' suggestion, CMake runs it all the time and every time i run make
, launcher.cpp
is rebuilt because the include is touched.
I think the only thing left is the change of the submodule to a fork on the rapidsai project. Once that's done I'll approve. |
Can one of the admins verify this patch? |
add to whitelist |
I'm only waiting on the ops ticket to fork jitify. I cannot see the issue right now. (404 for me). So I'm waiting to be added to the rapids org. Edit: nvm, found the fork. |
Work in this PR
This PR carries forward the work of @ironbit from libgdf PR 94.
Closes #593
Closes #1057
Depends on #694
Investigation:
Implementation:
Results of Investigation:
Jitify Overheads
I ran a simple vector add on 10 million elements, multiple times to find the overhead of JIT compiling and the overhead of retrieving and running a cached compiled kernel. The results are in the table below (all times are in ms):
The cost of JIT compiling is indeed large but the cost of running a cached kernel (hashing and retrieving) is negligible. So any future effort needs to be in cache persistence.
Since the API cost is almost double the compile cost, I also tried to break down the cost of JIT program creation which was divided into two parts: reading the program string and compiling.
Running different instantiations of kernels (different binop/Types) after the first one, does not incur this Program init cost, only the Compile cost.
I didn't dig deeper into what the cost of reading the program string is all about.
Global cache implementation
There's three levels of cache persistence I considered:
Thread level
This was already implemented in libgdf PR 94
Process level
Jitify has an environment variable
JITIFY_THREAD_SAFE
that can be enabled to turn on thread safety. The cache was simply stored globally with that option turned on and a CMake option was added to turn this off if you want.I confirmed that this worked and did provide performance benefit by running a simple script:
Which when ran with thread local cache, took 23 s and with process level cache, took 0.7282 s
Inter process level
This was investigated but wasn't implemented because the Jitify API to serialize/deserialize is experimental and also requires the user to take care of the caching. Also to limit the scope of this PR.
Possible ways to implement this:
std::filesystem::temp_directory_path
but would need .lock files as unix doesn't have exclusive access to files.In either case, more information is needed about usage scenarios.