Releases: libocca/occa
v2.0
Features
- The maximum number of kernel arguments can be adjusted at build time. [#718]
- SYCL subgroup size can be set via kernel property or @simd_length attribute. [#726]
- Initial support for compiler attribute statements. [#729]
Breaking Changes
- memory::size() returns the number of dtype entries instead of byte-length. [#711]
- Memory copies are now datatype aware for consistency. [#728]
- The CMake variables OCCA_<MODE>_ENABLED are set in parent scope. [#720]
- CMake build options ENABLE_<OPTION> have been renamed OCCA_ENABLE_<OPTION>. [#733]
- memoryPool has graduated from an experimental feature and is now in the main occa namespace. [#741]
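To make the memory::size() change concrete, here is a minimal standalone sketch that does not use OCCA itself. The TypedBuffer struct, its members, and legacyEntryCount are hypothetical names chosen for illustration; the only point is the arithmetic relating the old byte-length to the new entry count.

```cpp
#include <cstddef>

// Hypothetical stand-in for a typed allocation; not part of the OCCA API.
struct TypedBuffer {
  std::size_t byteLength;  // total allocation size in bytes
  std::size_t dtypeBytes;  // size of one dtype entry in bytes

  // Pre-2.0 meaning of size(): the byte-length.
  std::size_t sizeInBytes() const { return byteLength; }

  // 2.0 meaning of size(): the number of dtype entries.
  std::size_t sizeInEntries() const { return byteLength / dtypeBytes; }
};

// Old-style computation: callers divided the byte-length by sizeof(T).
// Applying this to a value that is already an entry count would divide twice.
template <class T>
std::size_t legacyEntryCount(std::size_t sizeReturnedInBytes) {
  return sizeReturnedInBytes / sizeof(T);
}
```

Code that divided size() by sizeof(T) before upgrading is the pattern to look for, since the result would now be too small by a factor of sizeof(T).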
Bugfixes
- Correctly sync all streams in device::finishAll(). [#723]
- Corruption of memory datatypes when using slices. [#727]
Contributors
OCCA is a community driven project that relies on the support of people like you. Thank you everyone who contributed to this release!
Full Changelog: v1.6.0...v2.0.0
v1.6.0
Features
- Devices can be shared by multiple host threads [#672]
- Pass general objects to kernels by value [#676]
- Quick return from some memory functions for zero sized allocations [#678]
- OKL support for typedef enums and unions [#705]
Bugfixes
- Correctly set source and binary filenames when building a launchedKernel [#666]
Contributors
OCCA is a community driven project that relies on the support of people like you. Thank you everyone who contributed to this release!
- @ooreilly
- @BenWibking
- @topazus
- @cjatin
- @mkbosmans
- @TejaX-Alaghari
- @kian-ohara
- @deukhyun-cha
- @amikstcyr
- @thilinarmtb
- @noelchalmers
- @kris-rowe
Full Changelog: v1.5.0...v1.6.0
v1.5.0
Features
Memory Pools
A device memory pool implementation is available in the occa::experimental namespace, targeting applications that frequently allocate/deallocate memory. See Example 17 for more details.
Provide feedback or share your use cases for this feature in the Experimental discussion category.
Outward Interoperability
An unwrap function has been added to the core classes (device, memory, stream, and streamTag), which returns a void* pointer to the mode-specific object used to implement each class.
This advanced feature is intended to facilitate interoperability between occa and other accelerated libraries. Application developers are responsible for casting the returned pointer to the correct mode-specific type.
In the future, a type-safe interface will be provided for the C++ API.
Breaking Changes
- Compiler flags set via occa::json kernel properties now take precedence over the corresponding environment variables. [#622]
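The new precedence order can be sketched as a small standalone function. This is only a model: resolveCompilerFlags is a hypothetical helper, not an OCCA function, and it assumes that an empty string means "not set" and that some built-in default exists as a last resort.

```cpp
#include <string>

// Hypothetical helper (not an OCCA function). An empty string means "not set".
std::string resolveCompilerFlags(const std::string &kernelProp,
                                 const std::string &envValue,
                                 const std::string &fallback) {
  if (!kernelProp.empty()) return kernelProp;  // kernel property wins
  if (!envValue.empty()) return envValue;      // then the environment variable
  return fallback;                             // then the assumed default
}
```

For example, resolveCompilerFlags("-O2", "-O0", "-O3") yields "-O2" under this model, whereas before this release the environment value would have been used.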
Bugfixes
- Dynamic @exclusive sizes [#121]
- Build artifacts (e.g., binaries for kernel + launcher) are not durable [#515]
- Missing initial index value on @inner/@outer loops causes a segfault during translation. [#610]
- A segfault is encountered when destroying an occa::kernel that was created via a multi-kernel OKL file in CUDA, HIP, or DPC++ modes. [#624]
Contributors
OCCA is a community driven project that relies on the support of people like you. Thank you everyone who contributed to this release!
Full Changelog: v1.4.0...v1.5.0
Release Version 1.4.0
Features
Stream and device synchronization
- The member function stream::finish() allows for synchronization with a specific stream.
- The member function device::finishAll() synchronizes all streams on a device.
  - This is in contrast to device::finish(), which only synchronizes the current stream.
- Related C and Fortran interfaces have been included for both functions.
- Example 07 streams has been updated to demonstrate intended usage.
Breaking Changes
- All MPI related functionality has been removed from OCCA.
Bugfixes
- A race condition that occurs when writing kernel caches. [#594]
- Reported CPU frequencies are now scaled correctly. [#601]
- Redundant OpenMP and library flags have been removed from the CMake build. [#607]
Contributors
OCCA is a community driven project that relies on the support of people like you. Thank you everyone who contributed to this release!
Full Changelog: v1.3.0...v1.4.0
Release version 1.3.0
Features
CMake Package Files [#533]
OCCA now provides CMake package files which are configured during installation. These package files define an imported target, OCCA::libocca, and look for all required dependencies.
For example, the CMakeLists.txt of downstream projects using OCCA would include
find_package(OCCA REQUIRED)
add_executable(downstream-app ...)
target_link_libraries(downstream-app PRIVATE OCCA::libocca)
add_library(downstream-lib ...)
target_link_libraries(downstream-lib PUBLIC OCCA::libocca)
In the case of a downstream library, linking OCCA using the PUBLIC specifier ensures that CMake will automatically forward OCCA's dependencies to applications which use the library.
Environment Module [#580]
During installation, the Env Modules file /modulefiles/occa is generated. When this module is loaded, paths to the installed bin, lib, and include directories are appended to environment variables such as PATH and LD_LIBRARY_PATH.
To use this modulefile, add the following line to your .modulerc file
module use -a <occa-install-prefix>/modulefiles
then call
module load occa
Non-blocking Streams [#498]
The CUDA and HIP backends now support the creation of non-blocking streams.
An example has been added demonstrating how to enable this feature.
Additionally, a new API has been added to wrap native backend streams. [#525]
Profiling and Debugging
An interface has been added for logging the memory high watermark. [#522]
OCCA preprocessor error messages have also been improved [#572]
OKL
A new attribute, @nobarrier, prevents the automatic addition of barriers to @inner loop blocks. [#544]
Kernel Loop Ranges [#531]
When @inner loop ranges are known at compile-time, compiler optimization directives are added to translated kernel code for the CUDA, HIP, OpenCL, and SYCL backends.
If @inner loop ranges are passed as a kernel argument, the OKL translator will not automatically add optimization directives. In this case, the attribute @max_inner_dims can be used to achieve the same effect.
Dependency Changes
- The minimum version of CMake required is now v3.17 [#528]
Bugfixes
- Compilation on MacOS [#485]
- streamTag timings for OpenCL were corrected to give meaningful results [#518]
- Git ignore build dir if it's a symlink [#536]
- Use mpi_f08 module to fix Intel compiler warnings [#539]
- Use correct directory in run_examples script [#540]
- HIP compiler error and warnings [#547]
- Examples JSON [#549]
- Broken caching for "output" file [#554]
Contributors
OCCA is a community driven project that relies on the support of people like you. Thank you everyone who contributed to this release!
- @Luthaf
- @AljenU
- @deukhyun-cha
- @wjhorne
- @MalachiTimothyPhillips
- @stgeke
- @SFrijters
- @noelchalmers
- @kris-rowe
Full Changelog: v1.2.0...v1.3.0
v1.2.0
🔥 Exciting News
We introduce 3 awesome new features to the OCCA library. They are still in the 🃏 experimental stage, mainly due to performance reasons. We found an initial approach to enabling inlined lambdas and wanted to see how far we could go with them.
Future work includes profiling and optimizing build + launch of the inlined lambdas. How we cache kernel builds and fetch from the cache is still up in the air, but we are looking forward to tackling this fun problem 😃.
🃏 occa::forLoop and inlined kernels
Basic Example
Here we generate a for-loop that goes through [0, N) and is tiled by tileSize
occa::forLoop()
.tile({N, tileSize})
.run(scope, OCCA_FUNCTION([=](const int index) -> void {
// ...
}));
We can do it manually by calling .outer and .inner
occa::forLoop()
.outer(occa::range(0, N, tileSize))
.inner(tileSize)
.run(scope, OCCA_FUNCTION([=](const int outerIndex, const int innerIndex) -> void {
const int index = outerIndex + innerIndex; // outerIndex advances in steps of tileSize
// ...
}));
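The outer/inner decomposition can be checked with plain C++ loops. The sketch below makes no OCCA calls; tiledIndices is a hypothetical function, and it assumes the outer range yields tile start offsets (0, tileSize, 2 * tileSize, ...) as occa::range(0, N, tileSize) suggests. The bounds guard for a partial last tile is also an assumption about what a kernel body would need.

```cpp
#include <vector>

// Plain C++ model of the outer/inner decomposition; no OCCA calls are made.
std::vector<int> tiledIndices(const int N, const int tileSize) {
  std::vector<int> visited;
  // Outer loop: one iteration per tile, stepping by tileSize.
  for (int outerIndex = 0; outerIndex < N; outerIndex += tileSize) {
    // Inner loop: offset within the current tile.
    for (int innerIndex = 0; innerIndex < tileSize; ++innerIndex) {
      const int index = outerIndex + innerIndex;
      if (index < N) {  // guard the last, possibly partial, tile
        visited.push_back(index);
      }
    }
  }
  return visited;
}
```

With N = 10 and tileSize = 4, the loops visit every index in [0, 10) exactly once, in order.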
Indices + Multiple Dimensions
We give an example where an index array is passed rather than a simple occa::range.
Additionally, this @inner loop has 2 dimensions, so the expected OCCA_FUNCTION should take an int2 for the inner indices
occa::array<int> indices;
// ...
occa::forLoop()
.outer(indices)
.inner(X, Y)
.run(scope, OCCA_FUNCTION([=](const int outerIndex, const int2 innerIndex) -> void {
// ...
}));
🃏 occa::array and Functional Programming
We introduce a simple wrapper on occa::memory which is typed and contains some of the core map and reduce functional methods.
Example
const double dynamicValue = 10;
const double compileTimeValue = 100;
occa::scope scope({
// Passed as arguments
{"dynamicValue", dynamicValue}
}, {
// Passed as compile-time #defines
{"defines/compileTimeValue", compileTimeValue}
});
occa::array<double> doubleValues = (
values.map(OCCA_FUNCTION(scope, [](int value) -> double {
return compileTimeValue + (dynamicValue * value);
}))
);
We also include a helper method occa::range which implements most of the occa::array methods but can be used without allocating data before iteration. It's useful if there is no specific input/output but you still need to call a generic map or reduce function.
// Iterates through [0, 1, 2]
occa::range(3).map(...);
// Iterates through [3, 4, 5]
occa::range(3, 6).map(...);
// Iterates through [6, 5, 4]
occa::range(6, 3).map(...);
// Iterates through [0, 2, 4]
occa::range(0, 6, 2).map(...);
// Iterates through [6, 4, 2]
occa::range(6, 0, -2).map(...);
// No-op since there isn't anything to iterate through
occa::range(6, 0, 1).map(...);
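The iteration orders listed above can be reproduced with a small standalone model. This is not OCCA's implementation: rangeValues is a hypothetical function, and the rule that the two-argument form picks its direction from the sign of (end - start) is inferred from the [6, 5, 4] example.

```cpp
#include <vector>

// Hypothetical model of the iteration order listed above.
std::vector<int> rangeValues(const int start, const int end, const int step) {
  std::vector<int> values;
  if (step > 0) {
    for (int i = start; i < end; i += step) values.push_back(i);
  } else if (step < 0) {
    for (int i = start; i > end; i += step) values.push_back(i);
  }
  return values;  // a zero step yields nothing in this sketch
}

// Two-argument form: the direction follows the sign of (end - start).
std::vector<int> rangeValues(const int start, const int end) {
  return rangeValues(start, end, start <= end ? 1 : -1);
}

// One-argument form: iterates through [0, end).
std::vector<int> rangeValues(const int end) {
  return rangeValues(0, end);
}
```

A step that points away from the end value, as in the (6, 0, 1) case, produces an empty sequence, which matches the no-op behaviour described above.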
- Core methods: forEach, mapTo, map, reduce
- Reduction: every, max, min, some
- Re-indexing: reverse, shiftLeft, shiftRight
- Utility methods: cast, clamp, clampMax, clampMin, concat, dot, fill, slice
- Search: findIndex, find, includes, indexOf, lastIndexOf
🃏 Atomics
It's still in its 🃏 experimental stage, but OKL now allows for basic atomic operations!
ℹ️ @atomic should be fully available for Serial and OpenMP modes. There is probably still room for improvement in the OpenMP implementation!
The HIP, CUDA, and OpenCL modes don't have general atomics implemented and only support the following basic updates:
@atomic value += update;
@atomic value -= update;
@atomic value &= update;
@atomic value |= update;
@atomic value ^= update;
Inlined @atomic
@atomic *ptr += value;
Block @atomic
If you prefer, you can use blocks, which are equivalent to inlined @atomic use where possible
@atomic {
*ptr += value;
}
However, generic @atomic blocks are also possible
@atomic {
*ptr += value;
*ptr2 += value2;
}
🃏 DPC++ Backend
The DPC++ backend was added by the great work completed jointly by ALCF and Intel, with contributions from:
- Anoop Madhusoodhanan Prabha (Intel)
- Cedric Andreolli (Intel)
- Kris Rowe (ALCF)
- Phillipe Thierry (Intel)
- Saumil Patel (ALCF)
Notes
Currently only building with CMake is supported.
Code Transformation Rewrite
The way statement and expression code transformations are done has been fully rewritten! A functional occa::lang::array class was introduced to help with statement (statement_t) and expression (exprNode) iteration and transformation. More information on PR #404.
Additionally the occa::lang::expr class helps create expressions easily without having to worry about pointers or underlying node objects. More information on PR #407.
⚠️ Breaking Changes
- This is more of a potential breaking change, but in a series of commits we finally split up the public/private API!
- occa::properties is now deprecated and replaced with occa::json

  occa::properties wasn't adding much on top of occa::json, instead making auto-casting harder since we had to handle both json and prop objects. We still keep the properties and props naming convention throughout the library, since that's what they are, but have transitioned the types to occa::json.

  We still have a
  typedef json properties;
  so there shouldn't be any type-breaking changes for C++. The big difference is how std::string is being cast to json/properties:
  - std::string → occa::properties: The std::string value is parsed into its JSON value. For example, we can pass {key: 1} or key: 1
  - std::string → occa::json: The occa::json value is a literal string value. For example, if we pass {key: 1} then the occa::json value will be a string whose value is "{key: 1}".

  Details about the refactor:
  - [C++] The only breaking change is property strings now need to have the surrounding braces ({}) to make it valid JSON
  - [C] All property methods have been removed and should be replaced with the Json methods
  - [Fortran] All property methods have been removed and should be replaced with the Json methods
- [#475] We're removing umalloc + UVA since it's only adding extra overhead and introduces a 3rd way to manage memory along with occa::memory and occa::array.
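For C++ code that still builds property strings without braces, a small migration helper along these lines may be useful. This is a sketch only: ensureBraces is a hypothetical function, not part of OCCA, and it performs a purely textual check rather than any real JSON parsing.

```cpp
#include <string>

// Hypothetical migration helper (not part of OCCA): wraps a brace-less
// property string such as "key: 1" so it reads as the object "{key: 1}".
std::string ensureBraces(const std::string &props) {
  const std::string whitespace = " \t\n\r";
  const std::size_t first = props.find_first_not_of(whitespace);
  if (first == std::string::npos) {
    return "{}";  // empty or all-whitespace input becomes an empty object
  }
  const std::size_t last = props.find_last_not_of(whitespace);
  const std::string trimmed = props.substr(first, last - first + 1);
  if (trimmed.front() == '{' && trimmed.back() == '}') {
    return trimmed;  // already wrapped, returned without outer whitespace
  }
  return "{" + trimmed + "}";
}
```

Passing the result of ensureBraces("key: 1") wherever a property string was previously accepted keeps the old call sites working under the stricter rule described in the [C++] note.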
⭐ Features
- [#376] Adds host: true option to malloc for better host-allocation strategies (Thanks @noelchalmers!)
- [#404] Code transformation just got easier with the introduction of the very functional statementArray and exprNodeArray which makes it easy to:
  - Iterate through statements (forEach or nestedForEach (recursive))
  - Filter statements (filter or flatFilter (recursive))
  - Transform expressions (exprNode) through exprNodeArray::inplaceMap
- [#407] Introduces the occa::lang::expr helper class to build expressions without having to know the underlying exprNode types or worry about pointers!
- [#408] Adds okl/strict_headers kernel property to avoid erroring on headers OCCA can't find. Useful for mode-specific system headers.
- [#409] Adds sourceCodeStatement to inject non-standard source code when needed.
- [#410] Adds @atomic support (TODO: Finish most base implementations)
- [#411] Updates bash autocomplete
- [#420] Adds occa::setOccaCacheDir to programmatically set the OCCA_CACHE_DIR at runtime
- [#421] Handle new HIP output formats in our builds (Thanks @dmcdougall!)
- [#427] Adds occa::getDeviceCount (Thanks @noelchalmers!)
- [#425] We now test CMake builds in our Github Actions (Thanks @noelchalmers!)
- [#435] Adds device.wrapMemory to wrap native pointers into occa::memory objects
- [#459] Adds occaKernelRunWithArgs which takes an occaType pointer
- [#494] Adds DPC++ backend (Thanks @kris-rowe 🎉 🎉 🎉)
- [#490] Change...
v1.1.0
🔥 Exciting News
Fortran API
I'm super excited to announce the Fortran API! This was single-handedly designed and built by @awehrfritz, so huge thanks!! The API is not finalized but most likely not changing much in the future since the design matches our other language APIs.
For more information, the initial PR can be found here: #341
Collaboration
For a lot of the OCCA development, most of the work was done by a very small group of people. The project has grown over the last few years from it being a research project to it being used by a few organizations.
During this release, we added CMake support. While it's not directly adding any development features, it will enable the use of the OCCA library to a greater audience which some might say is even more impactful than adding features. What makes this even more exciting is how many unrelated collaborators took part in this work!
Lots of PRs that made this happen: #310, #313, #319, #323, #329, #344, #345, #357
Many thanks to
⚠️ Breaking Changes
- [de598e6] OCCA now compiles with C++11. C++ projects will need the -std=c++11 flag for most compilers added to compilation.
- [f4fea62] Renamed occa::hash_t methods
  - hash_t::toString() → hash_t::getString()
  - hash_t::toFullString() → hash_t::getFullString()
- [#322] Updates occaFree to take in the argument by reference rather than value
  occaFree(value)
  ↓
  occaFree(&value)
🃏Experimental
- [#341] The Fortran API
- [08b3a68] Adds OCCA_JIT and OCCA_JIT_WITH_SCOPE macro. Examples are provided for C++ and C. For example:
  OCCA_JIT(
    (entries, a, b, ab),
    (
      for (int i = 0; i < entries; ++i; @tile(16, @outer, @inner)) {
        ab[i] = 100 * (a[i] + b[i]);
      }
    )
  );
- [0a77696] Adds okl-mode.el for editing OKL kernels in Emacs 🎉
⭐️ Features
- [f813c34] Adds templated malloc for easier use while keeping backwards compatibility

  Original malloc
  occa::memory mem = occa::malloc(10 * sizeof(float), src);
  ↓
  Initial dtype malloc
  occa::memory mem = occa::malloc(10, occa::dtype::float_, src);
  ↓
  New malloc
  occa::memory mem = occa::malloc<float>(10, src);
- [92ffb58] Adds templated umalloc for easier use while keeping backwards compatibility
  float *a = (float*) occa::umalloc(10, occa::dtype::float_, src);
  void *b = occa::umalloc(10 * sizeof(float), src);
  ↓
  float *a = occa::umalloc<float>(10, src);
  void *b = occa::umalloc(10 * sizeof(float), src);
- [c61d636] Adds templated ptr for easier use. Defaults to the return value of void* for backwards compatibility.
  occa::memory mem = occa::malloc(10, occa::dtype::float_, src);
  float *ptr = (float*) mem.ptr();
  ↓
  occa::memory mem = occa::malloc(10, occa::dtype::float_, src);
  float *ptr = mem.ptr<float>();
- [c61d636] Adds use_host_pointer to memory props to auto-wrap source pointers during malloc calls
  float *hostPtr = new float[10];
  occa::memory mem = occa::malloc<float>(10, occa::dtype::float_, hostPtr, "use_host_pointer: true");
  mem.ptr<float>() == hostPtr;
- Adds polyfills to test compilation of locally unsupported modes
- [284aff8] Adds method to get the kernel hash
  - C++: occa::kernel::hash() which returns an occa::hash_t object
  - C: occaKernelGetHash and occaKernelGetFullHash which return the hash as a const char*
- [f2f21a3] Adds Metal backend for GPGPU in MacOS
  - Requires MacOS to be at least 10.14 (Mojave)
  - Requires XCode version to be at least 10.2.1
  - Metal does not support double or long types
  - Issues with global typedef due to missing address space qualifiers
- [386bc4c] Adds occa translate --launcher to get the host code needed to launch the device kernels (CUDA, HIP, OpenCL, Metal modes)
- [#246] Adds the @directive preprocessor attribute to add directives inside macros, such as OCCA_JIT
  @directive("#pragma ivdep")
  ↓
  #pragma ivdep
- [#265] Adds OCCA_CONFIG config file to set defaults. There is a config.defaults.json file with explanation of possible properties that can be set, including mode-specific properties.
- [#266] Allows HIP to compile CUDA kernels (Thanks @noelchalmers!)
- [#270] Adds occa::null for passing a NULL equivalent to occa::kernels (occaNull in C)
- [#284] Adds OCCA_LDFLAGS along with kernel/compiler_linker_flags (Thanks @stgeke!)
- [#308] Adds OCCA_SHARED_FLAGS along with kernel/compiler_shared_flags
- [#308] Adds support to build native C kernels (disabling OKL with okl/enabled set to false and setting kernel/compiler_language to C which defaults to C++) (Thanks @amikstcyr!)
- [#346] Supports #include of standard C and C++ headers in OKL kernels. Note this will print warnings since adding these headers is not a portable solution across supported backends.
- [#347] Adds some standard defines on OKL kernels so users can check if the kernel is being processed by an OKL kernel or not. This is useful when reusing source code for OCCA kernels and non-OCCA kernels.
- [#349] Keeps some comments around after applying OKL transformation for cleaner generated kernels.
- [#354] Adds OKL_KERNEL_HASH define to help debug which kernel is currently being run (Tip: printf and std::cout are available in Serial and OpenMP modes!)
- [#349][#355][#358][#364] Keeps comments around when transpiling kernels
🐛 Bugs Fixed
- [ebdb659] Updates to HIP backend (Thanks @noelchalmers!)
- [ac117fb] Fixed caching bugs (Thanks Nigel Nunn!)
- [5420005] Use .dylib instead of .so on MacOS (Thanks @thilinarmtb!)
- [ce4df26] Properly copy over artifacts when building with PREFIX (Thanks @thilinarmtb!)
- [#243] Properly avoid overriding and duplicating compiler shared flags (Thanks @noelchalmers!)
- [f23ce88] Avoids writing lockfile when checking compiler vendor
- [3df3955] Properly fixed untyped umalloc in C
- [4d5d5bc] Kernels from strings were badly generating the launcher kernel
- [27a7420] OpenCL translation was converting the const pointer typedefs const qualifier → __constant
- [#261] Invalid read in json -> properties unsafe cast (Thanks for pointing it out @stgeke!)
- [#265] Fixes object/mode specific properties from not propagating
- [86dead2] OpenCL timing was done backwards, resulting in negative times. (Thanks @tcew!)
- [#293] Fixed some reference counting issues with the kernelBuilder
- [#400] CUDA context was not being set in a few places (Thanks @amikstcyr!)
🎉 Contributors
v1.0.9
⭐️ Features
- [beec086] Added struct support to OKL

  There are still a few missing features when using structs, such as:
  - typedef-ing structs
    typedef struct { } foo;
  - Expanding @attributes on struct variables
    struct mat3 {
      int *values @dim(3, 3);
    }
    mat3 m;
    // Error since the parser right now doesn't "know" `values` is a @dim(3, 3)
    m.values(0, 0);
  - Access level modifiers are not supported at the moment
    struct foo {
      private:
      ...
    }
- [bf1dd16] @restrict expands to __restrict__ by default
  - OpenCL mode overrides it to restrict
  - Setting the property options/restrict overrides either of those two values. For example:
    - disable will make it so @restrict is ignored
    - Any other value will be used instead (e.g. setting it to '__declspec(restrict)' would be preferred in Windows)
- [897f600] Defaults compiler flags to optimize compilation (e.g. -O3)
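The @restrict rules above amount to a short decision chain, sketched here as a standalone function. This is a model rather than OCCA's implementation: resolveRestrict is a hypothetical name, the mode is passed as a plain string, and an empty option string stands for "options/restrict is not set".

```cpp
#include <string>

// Hypothetical model of the rules listed above; not OCCA's implementation.
// `mode` is a backend name such as "OpenCL"; `option` is the value of
// options/restrict, with an empty string meaning the property is unset.
std::string resolveRestrict(const std::string &mode, const std::string &option) {
  if (option == "disable") return "";       // @restrict is ignored
  if (!option.empty()) return option;       // any other value is used verbatim
  if (mode == "OpenCL") return "restrict";  // OpenCL override
  return "__restrict__";                    // default expansion
}
```

Note that the property is checked first, so it overrides both the default and the OpenCL-specific keyword.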
🐛 Bugs Fixed
- [e21962d] CPU wrapped memory was being freed by the occa::memory object
v1.0.8
📢 Announcement
Python API released!
Check it out at libocca/occa.py or install by running
pip install occa
- Most of the core API is ported to Python
- Numpy arrays are used seamlessly with occa.memory objects
- First steps to supporting JIT-compiled Python functions as OKL kernels:
@okl.kernel
def py_add_vectors(a: Const[List[np.float32]],
b: Const[List[np.float32]],
ab: List[np.float32]) -> None:
for i in okl.range(entries).tile(16):
ab[i] = a[i] + b[i]
↓
@kernel void py_add_vectors(const float *a,
const float *b,
float *ab) {
for (int i = 0; i < entries; ++i; @tile(16, @outer, @inner)) {
ab[i] = a[i] + b[i];
}
}
⭐️ Features
- [54f4003] Added dtypes which can be optionally used for runtime type checking
  - New class occa::dtype_t
  - Optional typed occa::memory allocation
    occa::malloc(10 * sizeof(float)); // Regular malloc
    occa::malloc(10, occa::dtype::float_); // Typed malloc
    occa::malloc(10, occa::dtype::get<float>()); // Templated typed malloc

    occaMalloc(10 * sizeof(float), NULL, occaDefault); // Regular malloc
    occaTypedMalloc(10, occaDtypeFloat, NULL, occaDefault); // Typed malloc
  - API for creating custom dtypes, for example:
    occa::dtype_t vec3; // { float x, y, z }
    vec3.addField("x", occa::dtype::float_);
    vec3.addField("y", occa::dtype::float_);
    vec3.addField("z", occa::dtype::float_);
- [994eb2a] Added more kernel methods for the C API
  void occaKernelPushArg(occaKernel kernel, occaType arg);
  void occaKernelClearArgs(occaKernel kernel);
  void occaKernelRunFromArgs(occaKernel kernel);
  void occaKernelVaRun(occaKernel kernel, const int argc, va_list args);
- [f6333f2] Custom kernel library paths, for example:
  // Application code
  occa::io::addLibraryPath("mylibrary", "./path/to/kernels/dir");
  occa::io::addLibraryPath("mylibrary", "${MY_LIBRARY_DIR}");
  // Kernel
  #include "occa://mylibrary/kernel.okl"
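To illustrate how a custom dtype such as the vec3 example relates to typed allocation sizes, here is a simplified standalone model. DtypeModel is a hypothetical class, not occa::dtype_t; it assumes a packed layout with no padding and only tracks field names and the running byte total.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model of a composite dtype; not occa::dtype_t.
class DtypeModel {
 public:
  // Composite dtype that starts with no fields.
  DtypeModel() : bytes_(0) {}

  // Primitive dtype with a fixed byte size.
  explicit DtypeModel(std::size_t bytes) : bytes_(bytes) {}

  // Appends a field; the packed size grows by the field's size.
  DtypeModel &addField(const std::string &name, const DtypeModel &field) {
    fieldNames_.push_back(name);
    bytes_ += field.bytes();
    return *this;
  }

  std::size_t bytes() const { return bytes_; }
  std::size_t fieldCount() const { return fieldNames_.size(); }

  // Bytes needed for a typed allocation of `entries` elements.
  std::size_t allocationBytes(std::size_t entries) const {
    return entries * bytes_;
  }

 private:
  std::size_t bytes_;
  std::vector<std::string> fieldNames_;
};
```

Under this model a vec3 built from three float fields occupies 3 * sizeof(float) bytes per entry, so a typed allocation of 10 entries corresponds to the same byte count that a regular malloc call would have to spell out by hand.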
🐛 Bugs Fixed
v1.0.7
⚠️ Breaking Changes
- [cd68708] Updated wrapMemory to take in an occa::device and occa::properties
  Before
  occa::cpu::wrapMemory(void* ptr, const udim_t bytes)
  After
  occa::cpu::wrapMemory(occa::device device, void* ptr, const udim_t bytes, occa::properties props)
- [959ec4a] Renamed occaSetDeviceFromInfos to fit the rest of the methods
  Before
  occaSetDeviceFromInfos(const char *info)
  After
  occaSetDeviceFromString(const char *info)
- [7735c66] Removed some redundant stream methods
  Before
  occa::device::freeStream(occa::stream) // C++
  occaDeviceFreeStream(occaStream) // C
  After (Not new)
  occa::stream::free() // C++
  occaFree(occaStream) // C
- [f81054d] Removed occa::opencl::event() and moved it to occa::opencl::streamTag::clEvent
- [f81054d] Removed occa::cuda::event() and moved it to occa::cuda::streamTag::cuEvent
- [f81054d] Removed occa::streamTag::tagTime. Tags can only be used for:
  - Waiting for queued tasks to finish (e.g. launched kernels or memory copies)
  - Time gaps between 2 tags
⭐️ Features
- [daf0300] Faster make build and added make info @v-dobrev
- [1024a62] Switched garbage collection strategy to NULL out existing device/kernel/memory objects when one is freed. This switches SEGFAULT issues to occa::exception errors that can be more easily debugged.
- [527494c] Linalg methods reuse device buffers for reductions
- [ce46013] Loading cached kernels is sped up by avoiding locks if possible
- [e27b29e] Added occaJson
- [fdd2d7c] Added occaCreateDeviceFromString
- [fdd2d7c] Added CLI to C exampleOpenCL mode
- [959ec4a] Added UVA methods to C API
- [7735c66] The occa::stream class can now be extended
- [f81054d] The occa::streamTag class can now be extended
🐛 Bugs Fixed
- [99ce6fb] Linalg properly deletes array allocations @jdahm
- [b7384bc] Kernel hashes are generated only from needed props (e.g. ignores verbose)
- [780a06a] OpenCL __global, __local, and __kernel are properly inserted in the beginning
- [dba0db9] memory::slice was improperly freeing UVA pointers in
- [3260a05] The verbose property was being overwritten in CUDA mode