Skip to content
This repository has been archived by the owner on Dec 21, 2018. It is now read-only.

[WIP] Binary Operators #94

Open
wants to merge 74 commits into
base: master
Choose a base branch
from
Open

[WIP] Binary Operators #94

wants to merge 74 commits into from

Conversation

wmalpica
Copy link
Contributor

Implemented so far is the first implementation concept using nested templating. It has been implemented with support for literals and 14 different binary operators.

We would like feedback on the API implemented, which is:

``
gdf_error gdf_scalar_operation(gdf_column* out, gdf_column* vax, gdf_scalar* vay, gdf_scalar* def, gdf_binary_operator ope);

gdf_error gdf_vector_operation(gdf_column* out, gdf_column* vax, gdf_column* vay, gdf_scalar* def, gdf_binary_operator ope);

enum gdf_binary_operator {
GDF_ADD,
GDF_SUB,
GDF_MUL,
GDF_DIV,
GDF_TRUE_DIV,
GDF_FLOOR_DIV,
GDF_MOD,
GDF_POW,
//GDF_COMBINE,
//GDF_COMBINE_FIRST,
//GDF_ROUND,
GDF_EQUAL,
GDF_NOT_EQUAL,
GDF_LESS,
GDF_GREATER,
GDF_LESS_EQUAL,
GDF_GREATER_EQUAL,
//GDF_PRODUCT,
//GDF_DOT
};

struct gdf_scalar {
union gdf_data {
std::int8_t si08;
std::int16_t si16;
std::int32_t si32;
std::int64_t si64;
float fp32;
double fp64;
};

gdf_data  data;
gdf_dtype type;

};
``

We have already started on a second implementation using NVRTC which uses the same API.

Unit tests have not yet been implemented and will be implemented after an implementation method has been decided.

This current implementation right now only supports the second operand being able to be a scalar, which can be improved upon easily later.

In order to compile the binary operation functionality, use the cmake variable "BINARY_OPERATION_VERSION" and choose the binary version.

Example:
cmake -DCMAKE_BUILD_TYPE=Release -DBINARY_OPERATION_VERSION:STRING=V1 ../../code/libgdf

This cmake switch was added right now because compilation time is very large due to all the functions generated by the templated code. For example compilation on a laptop with an 8-core i7 processor and 16Gb ram was:
Time to compile: 77m10.996s
Ram memory used: More than 13.5GB
Release binary size: 69,931,016 bytes

Implemented first approach, multiple dispatch and generic programming.
@kkraus14 kkraus14 added the 2 - In Progress Currenty a work in progress label Aug 14, 2018
@scopatz
Copy link

scopatz commented Aug 14, 2018

This has a conflict with master now

Christian Noboa Mardini and others added 14 commits August 17, 2018 15:17
The 2nd approach using NVRTC ("Jitify" library).
Base version for review.
The 2nd approach using NVRTC ("Jitify" library).
Base and stable version of the 2nd approach.
The 2nd approach using NVRTC ("Jitify" library).
Added kernels for binary operations with default values.
The 2nd approach using NVRTC ("Jitify" library).
Improved kernel implementation. Scalar type will be converted at kernel execution (type defined).
The 2nd approach using NVRTC ("Jitify" library).
Added all base types to gdf_dtypes.
The 2nd approach using NVRTC ("Jitify" library).
Changed gdf_scalar interface.
The 2nd approach using NVRTC ("Jitify" library).
Erased 'valid' field input for kernels without a default value.
The 2nd approach using NVRTC ("Jitify" library).
Updated valid field in kernels and added integration tests for all kernels.
Using Jitify utility to choose the optimal block size at kernel calls.
Fixed gtest compilation issues.
Fixed gtest issues
Added shared libraries for tests.
The 2nd approach using NVRTC ("Jitify" library).
Erased unused functionality.
The 2nd approach using NVRTC ("Jitify" library).
Added binary operations.
Copy link
Member

@harrism harrism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My biggest concern with this PR is that there is a lot of code and basically zero documentation. First, I think it needs high-level documentation of the functionality and organization, and it needs operational documentation for the main APIs being introduced (both external and internal APIs).

src/binary/binary2/kernel.cpp Outdated Show resolved Hide resolved
@@ -16,6 +20,23 @@ typedef enum {
N_GDF_TYPES, /* additional types should go BEFORE N_GDF_TYPES */
} gdf_dtype;

union gdf_data {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the names in this union required to only be four characters for some reason? It would be nice to spell them out better, especially "invd", "tmst", "dt32" and "dt64".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't a particular reason. Is there any name style and code style for this project?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Human readable variable names is good practice for every project.

}
}

template <typename TypeOut, typename TypeVax, typename TypeVay, typename TypeDef, typename TypeOpe>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, TypeDef is very risky typename, considering its proximity to typedef. Could just use TDef, TVax, etc. Or reorder, putting Type last in the names.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem, it will be changed.


template <typename TypeOut, typename TypeVax, typename TypeVay, typename TypeDef, typename TypeOpe>
__global__
void kernel_v_s_d(int size, gdf_data def_data,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should gdf_data on this line be TypeDef?

AbstractOperation<TypeOpe> operation;
out_data[i] = operation.template operate<TypeOut, TypeVax, TypeVay>(vax_data_aux, (TypeVay)vay_data);

__syncwarp();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for __syncwarp() here? I don't see any sharing of data between threads. Is it necessary? Same question for all the other __syncwarp() calls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not correct. It'll be removed.

return *this;
}

Launcher& Launcher::instantiate(gdf_column* out, gdf_column* vax, gdf_column* vay, gdf_binary_operator ope) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since each of these instantiate functions varies only in the type of one argument, and their bodies are identical, couldn't you template them to reduce redundancy? You might even use variadic templates to collapse all into one instantiate and one launch function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point

src/binary/binary2/operation.cpp Outdated Show resolved Hide resolved
R"***(
#pragma once

struct IntegralSigned {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm, I believe Jitify already has some support for std::type_traits, e.g. https://github.com/NVIDIA/jitify/blob/master/jitify.hpp#L1124
Do you need to roll your own?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some issue with that. I expressed it in my daily scrum. It is supposed to be working, but it isn't. I included the libraries, however "common_type" and others aren't working, because of the error messages using nvrtc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's work with Kate Clark and Ben Barsdell on making Jitify better. Are you in touch with them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I'm not. I remember having issues related to "type_traits" header and with compiler option "--device-as-default-execution-space", which wasn't working.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will contact them.

return (TypeOut)pow((double)vax, (double)vay);
}
};
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's nice in this situation (closing brackets far from their openings) to comment what is being closed, e.g.:

} // namespace operation
} // namespace test
} // namespace gdf

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation is important. I was focused on implementation.

ironbit and others added 12 commits August 28, 2018 16:33
The 2nd approach using NVRTC ("Jitify" library).
Erased syncwarp primitive.
Changed submodule "Jitify" to https protocol.
Added CUDA configuration for NVRTC library.
Added 'stubs' directory in cmake for CUDA library.
Changed default string variable for selection of binary operations.
Added google benchmark for binary operations - NVRTC implementation.
Changed interface for cffi python.
Solved issues related to other tests due to a fix in the storage duration of Jitify cache.
Added 'stubs' directory in the library path (python) for NVRTC and CUDA libraries.
Improved vector library code.
Fixed issue related to inputs verification.
Copy link
Contributor

@nsakharnykh nsakharnykh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did a live PR review with @jrhemstad @williamBlazing and @ironbit - requesting more changes. Also, we need the python bindings updated to use the new binary ops API: @mt-jones, can you post an update on where we are with this?

@@ -0,0 +1,36 @@
#=============================================================================
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use this for CI? @mike-wendt

GDF_GREATER,
GDF_LESS_EQUAL,
GDF_GREATER_EQUAL,
//GDF_COMBINE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a description/comment why those are not implemented

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or, preferably, please remove.

message(STATUS "Binary Operation Version: V2 - NVRTC")

set(source_files
src/binary/binary2/binary.cpp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename binary2 folder to binary_nvrtc or binary_jitify

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The source code has been reorganized, that's why I think it's convenient to reply to every observation.
https://github.com/BlazingDB/libgdf/tree/binary-operators-draft/src/binary-operation/jit


template <typename TypeOut, typename TypeVax, typename TypeVay, typename TypeOpe>
__global__
void kernel_v_s(int size, TypeOut* out_data, TypeVax* vax_data, gdf_data vay_data) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need kernel_s_v as well - should be 6 combinations overall, one kernel per the top level GDF function

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented the remaining scalar-vector operations.

#include "binary/binary2/cuda.h"

namespace gdf {
static thread_local jitify::JitCache JitCache;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to store JitCache when exiting the process to store in a file, then use an env variable to read it from the specified file to have the cache enabled between different processes

}
};
/*
struct Add : public AbstractOperation<Add> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should add a comment with the description why this could be useful, i.e. for the overflows - if we ever wanted to enable this "feature" in the future

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The source code has been optimized for jit.
The commented code has been moved outside the jit code and it has been added a description.
https://github.com/BlazingDB/libgdf/blob/96012d91aff817e6105a8fa8983e1587d237493f/src/binary-operation/jit/code/operation.cpp#L159

@@ -0,0 +1,109 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move library under bench/binary or tests/binary since it's only used in tests, probably also rename to something like binary_helper

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -0,0 +1,203 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to be moved to src/bench/binary to have a folder per libgdf operator (one folder for binary, one for parquet, one for joins, etc.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


for (int i=start; i<size; i+=step) {
AbstractOperation<TypeOpe> operation;
out_data[i] = operation.template operate<TypeOut, TypeVax, TypeVay>(vax_data[i], (TypeVay)vay_data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to set the output valid bit mask - I don't see it being handled in the code. It should be an OR between the two bit masks of the two input operands.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The output valid bitmask is processed in all kernels.
The kernels also have been optimized using 'Bit Hacks'. In the benchmark, the kernel (v_v_v_d) reduces its time approx. in 5us in all of its benchmarks, while the kernel (v_v_v) increments its time approx. in 7us due to the bitmask processing.
https://github.com/BlazingDB/libgdf/blob/729cbfac6ae2281894b99644c58643184bd18d85/src/binary-operation/jit/code/kernel.cpp#L82

Christian Noboa Mardini added 3 commits September 25, 2018 12:33
Reorganized the source code.
Optimized the source code in Jit.
Added namespace documentation.
return "Pow";
//case GDF_COMBINE:
//case GDF_COMBINE_FIRST:
//case GDF_ROUND:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these simply not implemented? Probably should remove or make a comment

Christian Noboa Mardini and others added 11 commits September 27, 2018 12:28
Moved 'util' library used in tests and benchmark.
Optimized Jit kernel code (bit hacks) and added the output valid bitmask in some kernels.
Added documentation related to the new interface.
Changed the interface ('valid' field in gdf_scalar).
Added the changes for scalar vector operations.
Added the standard header for math functions.
Added a 'libcuda' soft link.
@mike-wendt mike-wendt added 4 - Needs Rework Additional work is needed migrate to cudf and removed 2 - In Progress Currenty a work in progress labels Oct 24, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
4 - Needs Rework Additional work is needed migrate to cudf
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants