Skip to content
Merged
Show file tree
Hide file tree
Changes from 111 commits
Commits
Show all changes
65 commits
Select commit Hold shift + click to select a range
0421c02
The core implementation. Ported from PR 94 on libgdf.
devavret Feb 6, 2019
d66bfda
Jitify binops google tests.
devavret Feb 7, 2019
c23fda5
Cleanups
devavret Feb 7, 2019
f84a330
Merge branch 'branch-0.6' into enh-ext-jitify-binops
devavret Feb 11, 2019
ca64bde
Merge branch 'fea-ext-test-utils-improvements' into enh-ext-jitify-bi…
devavret Feb 11, 2019
061d4e5
Merge branch 'fea-ext-test-utils-improvements' into enh-ext-jitify-bi…
devavret Feb 12, 2019
c9791b1
Refactored scalar implementation, added a test util to work with scal…
devavret Feb 12, 2019
590bb5c
Merge branch 'fea-ext-test-utils-improvements' into enh-ext-jitify-bi…
devavret Feb 13, 2019
9017317
The tests are all clean now and only use our own utils.
devavret Feb 13, 2019
df64885
Added cython bindings
devavret Feb 16, 2019
8fd5081
Cleanup types and doc.
devavret Feb 16, 2019
beec867
Use jit binop in cuDF for heterogeneous types.
devavret Feb 16, 2019
7fe29b8
Fixed bug where jit kernel assumes we always have null mask.
devavret Feb 16, 2019
8e54098
fixed another bug where null mask existence was assumed, now in cuDF
devavret Feb 17, 2019
ada453e
Enabled global JIT cache and confirmed performance and correctness. t…
devavret Feb 17, 2019
88c57c1
Added remaining binary ops were previously unimplemented
devavret Feb 18, 2019
b6650ce
Added pytests for the two new ops
devavret Feb 18, 2019
f9dea20
renamed Vax and Vay to Lhs and Rhs.
devavret Feb 20, 2019
48302f7
Documentation cleanup in types.
devavret Feb 20, 2019
d553db0
change name of namespace gdf to cudf.
devavret Feb 20, 2019
4413a78
changed error handling to use GDF_REQUIRE
devavret Feb 20, 2019
ca9e43e
documentation for launcher
devavret Feb 20, 2019
a55f111
Some cleanup of traits.
devavret Feb 20, 2019
b3673fc
Merge branch 'bug-binops-nullmask-and' into enh-ext-jitify-binops
devavret Feb 21, 2019
52892b6
Merge branch 'fea-ext-test-utils-improvements' into enh-ext-jitify-bi…
devavret Feb 21, 2019
96f83e7
modified `scalar_wrapper` to work with updated type_dispatcher
devavret Feb 21, 2019
fea6751
fix merge conflict that wasn't detected earlier
devavret Feb 21, 2019
bf4dc61
Removed valid mask binary op from JIT kernel
devavret Feb 21, 2019
358e0c0
fix bug in null mask calculation
devavret Feb 22, 2019
877ac61
Merge branch 'branch-0.6' into enh-ext-jitify-binops
devavret Feb 25, 2019
ca0b31f
We didn't need common_type after all
devavret Feb 25, 2019
b269ee4
Removed passing valid mask pointers to jit kernels
devavret Feb 25, 2019
322e4a5
Jit kernel now uses gdf_size_type
devavret Feb 25, 2019
35c2c46
Changed `getTypeName` to use compiler generated string instead of har…
devavret Feb 26, 2019
d93fa0b
style fix
devavret Feb 26, 2019
df641c4
Added modulo to index because we support it now.
devavret Feb 26, 2019
aaae814
fixing pytests.
devavret Feb 26, 2019
02e2a9e
Merge branch 'branch-0.6' into enh-ext-jitify-binops
devavret Feb 26, 2019
e0c8fab
fix inlining for util functions.
devavret Feb 26, 2019
be31178
Merge branch 'fea-ext-test-utils-improvements' into enh-ext-jitify-bi…
devavret Feb 27, 2019
c5cbd8b
documentation update
devavret Feb 27, 2019
b5c729b
changed casts and traits in jit code
devavret Feb 27, 2019
febbad6
more doc changes.
devavret Feb 27, 2019
d0e30b2
removed changelog duplicates (artifacts of merge conflicts)
devavret Feb 27, 2019
dde39a7
refactored the mask calculation.
devavret Feb 27, 2019
86ea59b
removed redundant null_count calculation and fixed a bug in calculati…
devavret Feb 28, 2019
11b67dc
fix rebuilding of launcher.cpp everytime because types.h.jit was alwa…
devavret Feb 28, 2019
9520853
Changed gdf_scalar to use union and is_valid bool member.
devavret Mar 4, 2019
be21879
added nvidia license
devavret Mar 4, 2019
3946e1c
Merge branch 'fea-ext-test-utils-improvements' into enh-ext-jitify-bi…
devavret Mar 5, 2019
78a3cc8
Merge branch 'branch-0.6' into enh-ext-jitify-binops
devavret Mar 5, 2019
99a943f
added TODO #1119
devavret Mar 6, 2019
500e188
Merge branch 'branch-0.6' into enh-ext-jitify-binops
devavret Mar 6, 2019
ec1116b
documentation cleanup
devavret Mar 6, 2019
c0243e6
one more doc change that i missed.
devavret Mar 7, 2019
9c729c7
changed JITIFY thread safe macro in CMakeLists
devavret Mar 7, 2019
4626f7d
Changed union members to typedef'd versions
devavret Mar 7, 2019
b5d1f78
Merge branch 'branch-0.6' into enh-ext-jitify-binops
devavret Mar 7, 2019
7fe729a
include the changed location of `error_utils.hpp`
devavret Mar 7, 2019
79d5111
updated jitify
devavret Mar 7, 2019
8c11073
Changed Cython binding map to dict.
devavret Mar 7, 2019
87e0b41
Clean Cmakelists
devavret Mar 7, 2019
0d797fe
Merge branch 'branch-0.6' into enh-ext-jitify-binops
kkraus14 Mar 8, 2019
09ffe36
Use rapids' fork of Jitify submodule.
devavret Mar 8, 2019
2bfa21e
now pointing to a branch in Forked Jitify
devavret Mar 8, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .gitmodules
Original file line number Diff line number Diff line change
Expand Up @@ -10,3 +10,6 @@
path = thirdparty/rmm
url = https://github.com/rapidsai/rmm.git
branch = branch-0.6
[submodule "thirdparty/jitify"]
path = thirdparty/jitify
url = https://github.com/NVIDIA/jitify.git
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we pin this to a specific branch / tag / commit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please pin to a commit to avoid build issues in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this do? Or do I need to specify it in .gitmodules.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Collaborator

@kkraus14 kkraus14 Mar 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to specify it in .gitmodules otherwise a git submodule update --init --recursive --remote will update it to the latest commit of master.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't seem to find a way to pin this to a commit/tag in the .gitmodules file. I read this and update seems to be the only option. But details on update say that the only way to do this is to use none. Is that what I should do?

5 changes: 4 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@

## Improvements

- PR #892 Add support for heterogeneous types in binary ops with JIT
- PR #730 Improve performance of `gdf_table` constructor
- PR #813 unified libcudf API functions by replacing gpu_ with gdf_
- PR #822 Add support for `__cuda_array_interface__` for ingest
Expand All @@ -46,14 +47,16 @@
- PR #836 Add ingest support for arrow chunked arrays in Column, Series, DataFrame creation
- PR #763 Format doxygen comments for csv_read_arg struct
- PR #532 CSV Reader: Use type dispatcher instead of switch block
- PR #694 Unit test utilities improvements
- PR #878 Add better indexing to Groupby
- PR #554 Add `empty` method and `is_monotonic` attribute to `Index`
- PR #1040 Fixed up Doxygen comment tags
- PR #909 CSV Reader: Avoid host->device->host copy for header row data
- PR #916 Improved unit testing and error checking for `gdf_column_concat`
- PR #941 Replace `numpy` call in `Series.hash_encode` with `numba`
- PR #943 Updated `count_nonzero_mask` to return `num_rows` when the mask is null
- PR #942 Added increment/decrement operators for wrapper types
- PR #943 Updated `count_nonzero_mask` to return `num_rows` when the mask is null
- PR #952 Added trait to map C++ type to `gdf_dtype`
- PR #966 Updated RMM submodule.
- PR #998 Add IO reader/writer modules to API docs, fix for missing cudf.Series docs
- PR #1017 concatenate along columns for Series and DataFrames
Expand Down
35 changes: 34 additions & 1 deletion cpp/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -141,6 +141,7 @@ include_directories("${ARROW_INCLUDE_DIR}"
"${CMAKE_SOURCE_DIR}/include"
"${CMAKE_SOURCE_DIR}/src"
"${CMAKE_SOURCE_DIR}/thirdparty/cub"
"${CMAKE_SOURCE_DIR}/thirdparty/jitify"
"${CMAKE_SOURCE_DIR}/thirdparty/moderngpu/src"
"${CMAKE_SOURCE_DIR}/thirdparty/rmm/include"
"${ZLIB_INCLUDE_DIRS}")
Expand Down Expand Up @@ -189,6 +190,13 @@ add_library(cudf SHARED
src/groupby/groupby.cu
src/groupby/new_groupby.cu
src/binary/binary_ops.cu
src/binary/jit/code/kernel.cpp
src/binary/jit/code/operation.cpp
src/binary/jit/code/traits.cpp
src/binary/jit/core/binop.cpp
src/binary/jit/core/launcher.cpp
src/binary/jit/util/operator.cpp
src/binary/jit/util/type.cpp
src/bitmask/bitmask_ops.cu
src/bitmask/valid_ops.cu
src/compaction/stream_compaction_ops.cu
Expand All @@ -213,6 +221,24 @@ add_library(cudf SHARED
#Override RPATH for cudf
SET_TARGET_PROPERTIES(cudf PROPERTIES BUILD_RPATH "\$ORIGIN")

###################################################################################################
# - jitify ----------------------------------------------------------------------------------------

add_executable(stringify "${CMAKE_SOURCE_DIR}/thirdparty/jitify/stringify.cpp")
execute_process(WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
COMMAND ${CMAKE_COMMAND} -E make_directory ${CMAKE_BINARY_DIR}/include)

add_custom_command(OUTPUT ${CMAKE_BINARY_DIR}/include/types.h.jit
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/include
COMMAND ${CMAKE_BINARY_DIR}/stringify cudf/types.h > ${CMAKE_BINARY_DIR}/include/types.h.jit
COMMENT "stringify header types.h"
DEPENDS stringify
MAIN_DEPENDENCY ${CMAKE_CURRENT_SOURCE_DIR}/include/cudf/types.h)

add_custom_target(stringify_run DEPENDS ${CMAKE_BINARY_DIR}/include/types.h.jit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEPENDS does not have to be a path, it can be a CMake variable.

For example:

add_custom_command(OUTPUT  TYPES_JIT
	                   WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/include
	                   COMMAND ${CMAKE_BINARY_DIR}/stringify cudf/types.h > ${CMAKE_BINARY_DIR}/include/types.h.jit
                           ...)
...
add_custom_target(stringify_run DEPENDS TYPES_JIT)
...

As long as the MAIN_DEPENDENCY is properly specified to be ${CMAKE_CURRENT_SOURCE_DIR}/include/cudf/types.h, CMake will update jit_types if that file changes.

Note that this is helpful if later you expect your command to have multiple outputs.

I would also consider renaming stringify_run to something more descriptive. I'm not really sure what the purpose of this target is off the cuff, and that obfuscates our build process for new developers, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this but it didn't work. Running make rebuilds launcher.cpp every time.

add_custom_command(OUTPUT STRINGIFIED_HEADERS
                   WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/include
                   COMMAND ${CMAKE_BINARY_DIR}/stringify cudf/types.h > ${CMAKE_BINARY_DIR}/include/types.h.jit
                   COMMENT "Run stringify on header types.h to convert it to c-str for use in JIT compiled code"
                   DEPENDS stringify
                   MAIN_DEPENDENCY ${CMAKE_CURRENT_SOURCE_DIR}/include/cudf/types.h)

add_custom_target(stringified_headers DEPENDS STRINGIFIED_HEADERS)

add_dependencies(cudf stringified_headers)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry what does the cmake code above have to do with launcher.cpp?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is for making and running an executable that comes with Jitify. This executable, called stringify, converts source files into c-strings and writes them to another source file. Now those stringified source files can be used in JIT code compilation. In our specific use case, we wanted to be able to use the definitions in types.h in the JIT kernels. So we use stringify to convert types.h into a string and store it in types.h.jit in the build/include directory. That types.h.jit is included in launcher.cpp. I wanted a way to use CMake to run stringify only when types.h is changed. With @mt-jones ' suggestion, CMake runs it all the time and every time i run make, launcher.cpp is rebuilt because the include is touched.


add_dependencies(cudf stringify_run)

###################################################################################################
# - build options ---------------------------------------------------------------------------------

Expand All @@ -230,11 +256,18 @@ if(HT_LEGACY_ALLOCATOR)
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --define-macro HT_LEGACY_ALLOCATOR")
endif(HT_LEGACY_ALLOCATOR)

option(JITIFY_THREAD_SAFE "Use a global cache for JIT compiled kernels" ON)
if(JITIFY_THREAD_SAFE)
message(STATUS "Using global cache for JIT compiled kernels")
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --define-macro JITIFY_THREAD_SAFE")
endif(JITIFY_THREAD_SAFE)


###################################################################################################
# - link libraries --------------------------------------------------------------------------------

target_link_libraries(cudf rmm "${ARROW_LIB}" ${ZLIB_LIBRARIES} NVStrings)
# TODO: better nvrtc linking with optional variables
target_link_libraries(cudf rmm "${ARROW_LIB}" ${ZLIB_LIBRARIES} NVStrings nvrtc)

###################################################################################################
# - python cffi bindings --------------------------------------------------------------------------
Expand Down
60 changes: 60 additions & 0 deletions cpp/include/cudf/functions.h
Original file line number Diff line number Diff line change
Expand Up @@ -611,6 +611,66 @@ gdf_error gdf_extract_datetime_second(gdf_column *input, gdf_column *output);

/* binary operators */

/**
* @brief Binary operation function between gdf_scalar and gdf_column structs.
*
* The function performs the binary operation of a gdf_scalar operand and a
* gdf_column operand.
*
* The desired output type needs to be specified in out->dtype
*
* If the valid field in the gdf_column output is not nullptr, then the valid
* mask from rhs gdf_column is copied into the data pointer to by out->valid
*
* @param out (gdf_column) Output of the operation.
* @param lhs (gdf_scalar) First operand of the operation.
* @param rhs (gdf_column) Second operand of the operation.
* @param ope (enum) The binary operator that is going to be used in the operation.
* @return GDF_SUCCESS if the operation was successful, otherwise an appropriate
* error code
*/
gdf_error gdf_binary_operation_s_v(gdf_column* out, gdf_scalar* lhs, gdf_column* rhs, gdf_binary_operator ope);

/**
* @brief Binary operation function between gdf_column and gdf_scalar structs.
*
* The function performs the binary operation of a gdf_column operand and a
* gdf_scalar operand.
*
* The desired output type needs to be specified in out->dtype
*
* If the valid field in the gdf_column output is not nullptr, then the valid
* mask from lhs gdf_column is copied into the data pointer to by out->valid
*
* @param out (gdf_column) Output of the operation.
* @param lhs (gdf_column) First operand of the operation.
* @param rhs (gdf_scalar) Second operand of the operation.
* @param ope (enum) The binary operator that is going to be used in the operation.
* @return GDF_SUCCESS if the operation was successful, otherwise an appropriate
* error code
*/
gdf_error gdf_binary_operation_v_s(gdf_column* out, gdf_column* lhs, gdf_scalar* rhs, gdf_binary_operator ope);

/**
* @brief Binary operation function between two gdf_column structs.
*
* The function performs the binary operation of two gdf_column operands.
*
* The desired output type needs to be specified in out->dtype
*
* If the valid field in the gdf_column output is not nullptr, then it will be
* filled with the bitwise AND of the valid masks of lhs and rhs gdf_column's
*
* @param out (gdf_column) Output of the operation.
* @param lhs (gdf_column) First operand of the operation.
* @param rhs (gdf_column) Second operand of the operation.
* @param ope (enum) The binary operator that is going to be used in the operation.
* @return GDF_SUCCESS if the operation was successful, otherwise an appropriate
* error code
*/
gdf_error gdf_binary_operation_v_v(gdf_column* out, gdf_column* lhs, gdf_column* rhs, gdf_binary_operator ope);


/* arith */

gdf_error gdf_add_generic(gdf_column *lhs, gdf_column *rhs, gdf_column *output);
Expand Down
34 changes: 34 additions & 0 deletions cpp/include/cudf/types.h
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,17 @@ typedef struct {
// here we can also hold info for decimal datatype or any other datatype that requires additional information
} gdf_dtype_extra_info;


/**---------------------------------------------------------------------------*
* @brief A struct to hold a scalar (single) value and its type information
*
*---------------------------------------------------------------------------**/
typedef struct {
void* data; /**< Pointer to the scalar data */
Copy link
Collaborator

@eyalroz eyalroz Mar 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be a void*. The use of void * is unavoidable, and somewhat reasonable, for a gdf_column, but here it should just be a union of the different possible column element types.

Using a pointer means:

  • A gdf_scalar can never be owning.
  • Probably-unrecoverable loss of optimization opportunities in tight loops.
  • Inability to compare two gdf_scalars trivially.
  • Confusing to the users/callers: When, if ever, can they free the memory to which the gdf_scalar is pointing to?
  • Since pointers cannot typically be used both on the device and on the host, this us in the on-host-vs-on-device mess, which is bad enough for columns. A proper plain-old-data scalar would avoid this altogether
  • Additional risk of null/invalid pointer dereferencing. While it's true we can't avoid these risks with gdf_column - we should not introduce more of them if we can help it.

Finally,:

  • It isn't really consistent with gdf_column. This has an extra level of indirection which gdf_column does not; and it is is either non-nullable, or nullable in a different way than gdf_column.
  • Since a pointer is typically 8 bytes, there's no space savings from not having the union.
  • If we did want something consistent with gdf_column for scalars, we'd use... well, we'd use gdf_columns constrained to have length 1.

Bottom line: Vehemently oppose the design of this structure. Instead, use a distinguished union and a boolean, e.g. like so:

typedef struct {
    union {
        cudf::column_element_type_t< GDF_INT8      > int8;
        cudf::column_element_type_t< GDF_INT16     > int16;
        cudf::column_element_type_t< GDF_INT32     > int32;
        cudf::column_element_type_t< GDF_INT64     > int64;
        cudf::column_element_type_t< GDF_FLOAT32   > float32;
        cudf::column_element_type_t< GDF_FLOAT64   > float64;
        cudf::column_element_type_t< GDF_DATE32    > date32;
        cudf::column_element_type_t< GDF_DATE64    > date64;
        cudf::column_element_type_t< GDF_TIMESTAMP > timestamp;
        cudf::column_element_type_t< GDF_CATEGORY  > category;
    } datum;
    gdf_dtype dtype;
    bool valid; 
} gdf_scalar;

or whatever @jrhemstad 's dtype-to-actual-type trait is called.

FYI @harrism , @williamBlazing , @jrhemstad .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eyalroz I understand most of your argument, but you claim that scalars must be nullable without any argument for why you feel that to be true.

What is an example use case where an invalid scalar must be passable to a binary op?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harrism : The motivation is that (nullable) columns can hold the NULL value. Their element type is T disjoint-union Unit. NULL is "just" another value; it has some special semantics, but I could just as well may pass NULL to a binary op as I may pass a non-null value. In particular, my scalar may be the result of some lookup operation in a nullable column. There's no reason my code should special-case the lookup result being NULL rather than 123 or what-not.

Also, a binary operator applied to a column and a scalar can (and should be) considered equivalent to using a uniform column of the same value as an operand. And, obviously, uniform columns of a certain element type can, in particular, be uniformly NULL.

Copy link
Contributor

@jrhemstad jrhemstad Mar 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I do not support a union is because it is incompatible with the type_dispatcher.

The type_dispatcher is predicated on the idea that for a given gdf_dtype, it dispatches you the correct T, and you can static_cast your type-erased date (void*) to T.

However, you cannot cast a union to one of it's member types:

https://wandbox.org/permlink/zTfeMxDUtBTNiNuL

Furthermore, using a union increases the amount of code that needs to be manually updated whenever a new gdf_dtype is added (you have to update the members of the union and any code/tests that refer to members of the union.)

Lastly,

 cudf::column_element_type_t< GDF_INT8      > int8;

No such trait exists. In fact, when designing the type_dispatcher and I proposed such a trait to map gdf_dtypes to Ts, you assured me that there was no reason for such a trait to exist :)

You're thinking of the inverse of mapping a T to a gdf_dtype. https://github.com/rapidsai/cudf/pull/952/files#diff-7a5c6cb1fc7d9f8ada8ffb4e15d130efR170

Copy link
Contributor

@jrhemstad jrhemstad Mar 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so you could make a union work like this:

union U {
    int myInt;
    float myFloat;
};

int main() { 
    U foo;
    foo.myInt = 42;
    std::cout << *reinterpret_cast<int*>(&foo) << std::endl;
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're confusing C++ and C here. For the C API, there is obviously no type dispatcher and we have to work with what C offers. For C++, of course we wouldn't use a union - we'd use a variant. Not std::variant that's not available to us, but there are nice variant implementations available, like Michael Park's..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're confusing C++ and C here. For the C API, there is obviously no type dispatcher and we have to work with what C offers. For C++, of course we wouldn't use a union - we'd use a variant. Not std::variant that's not available to us, but there are nice variant implementations available, like Michael Park's..

Switching to C++ internals that could support a variant is months away and vastly outside the scope of this PR. As such, it is mandatory that a gdf_scalar be compatible with the type_dispatcher.

A union will work via the example I provided above. As such, I support using a union for the gdf_scalar (sorry Devavret :( ).

Copy link
Collaborator

@eyalroz eyalroz Mar 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jrhemstad :

Switching to C++ internals that could support a variant is months away

You don't have to switch everything to support variants. Right now we don't have any code which takes a gdf_scalar. So instead of gradually switching to supporting a struct-with-void-* scalar we gradually switch to an mp::variant scalar.

Regardless - a union is certainly a reasonable initial choice. Do you like my proposed union-based struct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, your union-based struct is good.

We'd either need to create the cudf::column_element_type_t trait you suggested, or just manually specify the types.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No such trait exists. In fact, when designing the type_dispatcher and I proposed such a trait to map gdf_dtypes to Ts, you assured me that there was no reason for such a trait to exist :)

I don't remember what I said, but you didn't need it for the type dispatcher. Also, that trait could be implemented using a tuple type and a decltype + std::get + declval, I think.

gdf_dtype dtype; /**< The datatype of the scalar's data */
} gdf_scalar;


typedef struct gdf_column_{
void *data; /**< Pointer to the columns data */
gdf_valid_type *valid; /**< Pointer to the columns validity bit mask where the 'i'th bit indicates if the 'i'th row is NULL */
Expand Down Expand Up @@ -156,6 +167,29 @@ typedef enum {
GDF_NUM_COLORS, /** Add new colors above this line */
} gdf_color;


/**---------------------------------------------------------------------------*
* @brief Types of binary operations that can be performed on data.
*
*---------------------------------------------------------------------------**/
typedef enum {
GDF_ADD, /**< operator + */
GDF_SUB, /**< operator - */
GDF_MUL, /**< operator * */
GDF_DIV, /**< operator / using common type of lhs and rhs */
GDF_TRUE_DIV, /**< operator / after promoting type to floating point*/
GDF_FLOOR_DIV, /**< operator / after promoting to float and then flooring the result */
GDF_MOD, /**< operator % */
GDF_POW, /**< lhs ^ rhs */
GDF_EQUAL, /**< operator == */
GDF_NOT_EQUAL, /**< operator != */
GDF_LESS, /**< operator < */
GDF_GREATER, /**< operator > */
GDF_LESS_EQUAL, /**< operator <= */
GDF_GREATER_EQUAL, /**< operator >= */
} gdf_binary_operator;


/* --------------------------------------------------------------------------*/
/**
* @brief This struct holds various information about how an operation should be
Expand Down
35 changes: 35 additions & 0 deletions cpp/src/binary/jit/code/code.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
/*
* Copyright 2018-2019 BlazingDB, Inc.
* Copyright 2018 Christian Noboa Mardini <[email protected]>
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

#ifndef GDF_BINARY_OPERATION_JIT_CODE_CODE_H
#define GDF_BINARY_OPERATION_JIT_CODE_CODE_H

namespace cudf {
namespace binops {
namespace jit {
namespace code {

extern const char* kernel;
extern const char* traits;
extern const char* operation;

}
}
}
}

#endif
67 changes: 67 additions & 0 deletions cpp/src/binary/jit/code/kernel.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@
/*
* Copyright 2018-2019 BlazingDB, Inc.
* Copyright 2018 Christian Noboa Mardini <[email protected]>
* Copyright 2018 Rommel Quintanilla <[email protected]>
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

namespace cudf {
namespace binops {
namespace jit {
namespace code {

const char* kernel =
R"***(
#include "operation.h"
#include "cudf/types.h"

template <typename TypeOut, typename TypeLhs, typename TypeRhs, typename TypeOpe>
__global__
void kernel_v_s(gdf_size_type size,
TypeOut* out_data, TypeLhs* lhs_data, TypeRhs* rhs_data) {
int tid = threadIdx.x;
int blkid = blockIdx.x;
int blksz = blockDim.x;
int gridsz = gridDim.x;

int start = tid + blkid * blksz;
int step = blksz * gridsz;

for (gdf_size_type i=start; i<size; i+=step) {
out_data[i] = TypeOpe::template operate<TypeOut, TypeLhs, TypeRhs>(lhs_data[i], rhs_data[0]);
}
}

template <typename TypeOut, typename TypeLhs, typename TypeRhs, typename TypeOpe>
__global__
void kernel_v_v(gdf_size_type size,
TypeOut* out_data, TypeLhs* lhs_data, TypeRhs* rhs_data) {
int tid = threadIdx.x;
int blkid = blockIdx.x;
int blksz = blockDim.x;
int gridsz = gridDim.x;

int start = tid + blkid * blksz;
int step = blksz * gridsz;

for (gdf_size_type i=start; i<size; i+=step) {
out_data[i] = TypeOpe::template operate<TypeOut, TypeLhs, TypeRhs>(lhs_data[i], rhs_data[i]);
}
}
)***";

} // namespace code
} // namespace jit
} // namespace binops
} // namespace cudf
Loading