15 Feb 08:55

havogt

7cd24d2

GridTools version 2.2.3

Bug fixes

Fix CUDA 12.0 compilation (#1741)
Improvements to Python packaging (#1742, #1743, #1744)

CI

Add gcc-11, gcc-12, CUDA 12.0 (#1738, #1739, #1740)

12 Dec 10:30

havogt

v2.2.2

919cc3a

GridTools version 2.2.2

fn: SID neighbor table wrapper (#1730)

Adds a simple class that wraps a SID and implements the neighbor table concept. (Picked into 2.2.2 for convenience.)

Support for Python packaging (#1720)

Starting with this release, GridTools C++ is published on pypi.org, which makes it easier to consume from GT4Py.

Bug fixes

Fix get_keys of empty hymap (#1728)
fn: CUDA early exit on empty grid - an empty domain skips execution instead of erroring (#1729)
fn: prefer qualified names over ADL for fn builtins (they are not customization points for the user) (#1731, #1732)
Enable workarounds for CUDA 11.8 (#1734)
Enable workarounds for Clang 15 (#1735)
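
The difference between an unqualified (ADL) call and a qualified call, as in the fn builtin fix above (#1731, #1732), can be sketched as follows. This is a minimal illustration only: the namespaces lib and user and the function deref are hypothetical names, not GridTools identifiers.

```cpp
namespace lib {
    // Library builtin that is NOT meant to be a customization point.
    template <class T>
    constexpr int deref(T const &) {
        return 1;
    }

    // Unqualified call: argument-dependent lookup also considers
    // functions in the namespace of T, so user code can hijack it.
    template <class T>
    constexpr int call_unqualified(T const &x) {
        return deref(x);
    }

    // Qualified call: always resolves to lib::deref.
    template <class T>
    constexpr int call_qualified(T const &x) {
        return lib::deref(x);
    }
} // namespace lib

namespace user {
    struct iterator {};

    // Happens to share the builtin's name; a non-template exact match
    // wins overload resolution against the library template.
    constexpr int deref(iterator const &) {
        return 2;
    }
} // namespace user

static_assert(lib::call_unqualified(user::iterator{}) == 2);
static_assert(lib::call_qualified(user::iterator{}) == 1);
```

With the qualified call, a user function that accidentally shares a builtin's name no longer changes the library's behavior.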

Build fixes

Fix perftests CMake target when no tests are added (#1724)

04 Aug 08:57

havogt

v2.2.1

bc2c8dd

GridTools version 2.2.1

Bug fixes

Update pybind11 version to fix wrong C++ standard (#1723)
Fix perfect forwarding in sid::composite::make_values (#1722)
Workaround for NVCC bug in gcl (present in 11.6, 11.7 and most likely in 11.8) (#1726)

Performance fixes

Alternative skip value check in fn, which improves CUDA performance (#1721)

Cleanup

Replace boost::variant by std::variant (#1718)
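
For readers migrating their own code in the same way, here is a minimal sketch of std::variant with std::visit in C++17. The names value_t, describe and the local overload helper are assumptions made for this example; they are not the types or helpers used inside GridTools.

```cpp
#include <string>
#include <variant>

// Local helper for this example: merges several lambdas into one visitor.
template <class... Fs>
struct overload : Fs... {
    using Fs::operator()...;
};
template <class... Fs>
overload(Fs...) -> overload<Fs...>;

// Hypothetical variant type used only for illustration.
using value_t = std::variant<int, double, std::string>;

// Returns the name of the alternative currently held by v.
inline std::string describe(value_t const &v) {
    return std::visit(overload{[](int) { return std::string("int"); },
                          [](double) { return std::string("double"); },
                          [](std::string const &) { return std::string("string"); }},
        v);
}
```

Calling describe(value_t{1}) selects the int overload, and a value_t holding a std::string selects the last one.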

06 Jul 10:36

havogt

v2.2.0

240e8b0

GridTools version 2.2.0

C++ standard upgraded to C++17

Starting with this version, GridTools requires the C++17 standard (#1680), and the code base was improved using C++17 features (#1693, #1716, #1697):

tuple_util::make is removed
GT_CONSTEXPR and GT_CONSTEXPR_TARGET are removed
wstd is removed
is_trivially_copy_constructible is now used consistently instead of is_trivially_copyable wherever data is passed across the host/device boundary, because it is exactly the property that is needed
the make_[smth] pattern is replaced by class template argument deduction (CTAD) in several places; the old pattern is deprecated
composite is rewritten using C++17
overload is rewritten using C++17
std::[smth]_v<...> is used instead of std::[smth]<...>::value
static_assert(<cond>) is used instead of static_assert(<cond>, "")
CTAD for simple_ptr_holder (#1701, #1708)
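
The difference between the two type traits mentioned in the list, together with the C++17 spellings (_v variable templates and static_assert without a message), can be sketched as below. The types counter and custom_copy and the function can_pass_by_value are hypothetical and exist only for this illustration; they are not part of GridTools.

```cpp
#include <type_traits>

// Trivial copy constructor, but a user-provided copy assignment operator:
// trivially copy constructible, yet NOT trivially copyable.
struct counter {
    int value;
    counter() = default;
    counter(counter const &) = default;
    counter &operator=(counter const &other) {
        value = other.value + 1;
        return *this;
    }
};

// User-provided copy constructor: not trivially copy constructible.
struct custom_copy {
    custom_copy() = default;
    custom_copy(custom_copy const &) {}
};

// C++17 spelling: std::..._v<T> and static_assert without a message.
static_assert(std::is_trivially_copy_constructible_v<counter>);
static_assert(!std::is_trivially_copyable_v<counter>);

// Pre-C++17 spelling of the first check, for comparison.
static_assert(std::is_trivially_copy_constructible<counter>::value, "");

// Hypothetical guard for values that are copied by value across a
// host/device boundary: only copy construction has to be trivial.
template <class T>
constexpr bool can_pass_by_value() {
    return std::is_trivially_copy_constructible_v<T>;
}
```

A type like counter is accepted by the weaker requirement, although the stricter is_trivially_copyable check would reject it.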

If you were using functionality from the internal library common, you might have to update your code (all of common is considered internal API, see Release process). The most common change is using CTAD instead of makers where possible. Where CTAD is not possible due to compiler bugs, the maker pattern was updated to be independent of tuple_util::make. E.g. replace

tuple_util::make<tuple>(...) by tuple(...)
tuple_util::make<array>(...) by array(...)
sid::composite::make<...>(...) by sid::composite::keys<...>::make_values(...)
tuple_util::make<hymap::keys<...>::values>(...) by hymap::keys<...>::make_values(...)
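
The idea behind these replacements can be illustrated with the standard library alone. The sketch below uses std::tuple and std::array as stand-ins for the GridTools types, and the function make is a hypothetical maker written for this example (it is not the removed tuple_util::make).

```cpp
#include <array>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical pre-C++17 style maker: the class template is named
// explicitly and the element types are deduced from the arguments.
template <template <class...> class Tuple, class... Ts>
constexpr Tuple<std::decay_t<Ts>...> make(Ts &&... args) {
    return Tuple<std::decay_t<Ts>...>{std::forward<Ts>(args)...};
}

// Old style: go through the maker.
constexpr auto old_style() {
    return make<std::tuple>(1, 2.5, 'c');
}

// New style: class template argument deduction (CTAD) on the constructor.
constexpr auto new_style() {
    return std::tuple(1, 2.5, 'c');
}

// CTAD also deduces both the element type and the size of std::array.
constexpr auto deduced_array() {
    return std::array{1, 2, 3};
}

// Both styles produce exactly the same type.
static_assert(std::is_same_v<decltype(old_style()), decltype(new_style())>);
static_assert(std::is_same_v<decltype(deduced_array()), std::array<int, 3>>);
```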

New library `fn`: functional model backend

The fn library provides functionality for Declarative GT4Py to implement a backend for the functional model. It supports CPU execution (naive, without blocking) and efficient GPU (CUDA) execution on structured (Cartesian) and unstructured grids. See the examples in tests/regression/fn/.
The library provides a high-level, human-readable frontend, but is mainly meant as a target for code generators.

Introduce functional model backend (#1648, #1666, #1679)
Implements fn::extents (#1683)
Column Stage (#1685)
New Backend Backends (#1695)
Fn Frontend (#1698)
Performance References for FN Backends (#1711)
Add fn::tuple_get and fn::make_tuple (#1713)
Allow setting CUDA stream (#1712)

Minor new features

int_vector library (#1672)
add conversion assign to hymap (#1702)

Minor improvements

Extensions to meta and hymap (#1663)
Soften sid value type requirements from std::trivially_copyable to std::trivially_copy_constructible (#1663)
is_tuple_like (#1676) and is_hymap (#1677)

Bug fixes

Propagate CXX_STANDARD to all tests (#1664)
Compilation fixes for nvcc 11.5 and clang 12 with std=c++20 (#1665)
Workaround for nvcc bug https://godbolt.org/z/orrev1xnM (#1681)
c_bindings example: fix typo and split cpu and gpu fortran sources (#1684)
Fix unused param warnings (#1706)
Fix Compilation with CUDA 11.6 (#1710)
Support for Clang 14 (#1707)

Testing

Add C++20 with Cray Clang on Piz Daint to Jenkins CI (#1675)
Perftest Updates (#1690)
CI dom: Downgrade to gcc 10.3 for CUDA toolkit support (#1699)

Contributions

This release contains contributions from
@anstaf, @fthaler, @havogt.

25 Oct 13:09

havogt

v2.1.0

a8039cb

GridTools version 2.1.0

New features

Dump backend: outputs a json representation of the stencil specification (#1456)
Reduction library with naive, CPU and GPU backends (#1590, #1594, #1619)
SID: Python cuda array interface support (#1596)

Extended features

Support for compile time length in data stores (#1545)
Several SID improvements (#1548)
Structured bindings support for gridtools tuple-like (#1556)
Improvements for Hugepage Allocation (#1562)
Add protection against misuse of device namespace (#1581)
fortran_array_view: allow to disable openacc (#1603)
Introduce sid::unknown_kind (#1605)
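
As background for the structured bindings item above (#1556), the sketch below shows the C++17 protocol that makes any tuple-like type usable in a structured binding: std::tuple_size, std::tuple_element and a get function found by argument-dependent lookup. The type demo::pair_like and the function sum are hypothetical; this is not the GridTools implementation.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

namespace demo {
    // Hypothetical tuple-like class with private data members.
    class pair_like {
        int first_;
        double second_;

      public:
        constexpr pair_like(int first, double second) : first_(first), second_(second) {}
        constexpr int first() const { return first_; }
        constexpr double second() const { return second_; }
    };

    // Element access by index, found via argument-dependent lookup.
    template <std::size_t I>
    constexpr auto get(pair_like const &p) {
        static_assert(I < 2);
        if constexpr (I == 0)
            return p.first();
        else
            return p.second();
    }
} // namespace demo

// Opt in to the tuple protocol used by structured bindings.
namespace std {
    template <>
    struct tuple_size<demo::pair_like> : std::integral_constant<std::size_t, 2> {};
    template <>
    struct tuple_element<0, demo::pair_like> {
        using type = int;
    };
    template <>
    struct tuple_element<1, demo::pair_like> {
        using type = double;
    };
} // namespace std

// Decomposes the object with a structured binding and adds the parts.
inline double sum(demo::pair_like const &p) {
    auto [a, b] = p;
    return a + b;
}
```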

Non-functional changes

Hold the sids within sid::composite as tuple (#1564)
Various cleanups and c++17 related changes (#1579)
C++17 versions of meta::fold (#1549)
Sid as a proper C++20 concept (#1580, #1582)

Performance

More Inlining in cpu_kfirst Backend (#1634)
Support for Compile-Time Unit Stride Dimension for Python SID Adapter (#1635, #1651)

Bug fixes

K-cache fixes (#1530)
CMake: Fix storage_gpu for HIPCC-AMDGPU (#1540)
Remove a hugepage_alloc warning about a problem that only affects testing code (#1560)
Improve HIP + OpenMP Compilation (#1578)
Fix empty composite and add composite::make helper (#1583)
Fix as_const to work with any SID and be compatible with std::as_const (#1601, #1611)
SID composite: add static_assert against incorrect kinds (#1604)
Workaround a CUDA problem: tuple_util::concat remove constexpr var (#1606)
Improve Compliance with Parallel Model: Limit fusion of k-parallel execution with k-offsets (#1612)
GCC 9.x: Optimize multishift (#1630)
Python SID adapter: fix integer format check (#1632)
GCC 11.x: Compilation fixes (#1641, #1646)
Fixes for CUDA 11.4 (#1644)

Testing

Update to GTest v1.11 and minor changes to adapt for changed gtest interface (#1655)

Documentation

Clarifications to the execution model (#1541)

Contributions

This release contains contributions from
@anstaf, @fthaler, @havogt, @lukasm91.

04 Oct 08:42

havogt

v1.1.4

4de50ec

GridTools version 1.1.4

Bug fixes

Speed up compile time (#1608)
Support for GPU backend with custom block sizes in boundary conditions (#1438)
Fix sid shift origin (#1517)

Compatibility with new compilers

Added support for GCC 11.x (#1652, #1654)
Fix for CUDA 11 (#1520)

31 Jul 09:28

havogt

v2.0.0

8101c64

GridTools version 2.0.0

GridTools v2.0.0 comes with an improved API for stencil composition and storage construction.
These changes and a few others (see below) are breaking changes.

Changes since v1.1.0

New API: Stencil Composition

The make_computation API for composing stencils is replaced by a new stencil specification API, e.g.

auto horizontal_diffusion_spec = [](auto coeff, auto in, auto out) {
    GT_DECLARE_TMP(double, lap, flx, fly);
    return st::execute_parallel()
        .ij_cached(lap, flx, fly)
        .stage(lap_function(), lap, in)
        .stage(flx_function(), flx, in, lap)
        .stage(fly_function(), fly, in, lap)
        .stage(out_function(), out, in, flx, fly, coeff);
};

st::run(horizontal_diffusion_spec, stencil_backend_t(), grid, coeff, in, out);

instead of

auto horizontal_diffusion = gt::make_computation<backend_t>(grid,
    p_coeff{} = coeff,
    gt::make_multistage(gt::enumtype::execute<gt::enumtype::parallel, 20>{},
        define_caches(gt::cache<gt::IJ, gt::cache_io_policy::local>(p_lap{}, p_flx{}, p_fly{})),
        gt::make_stage<lap_function>(p_lap{}, p_in{}),
        gt::make_independent(gt::make_stage<flx_function>(p_flx{}, p_in{}, p_lap{}),
            gt::make_stage<fly_function>(p_fly{}, p_in{}, p_lap{})),
        gt::make_stage<out_function>(p_out{}, p_in{}, p_flx{}, p_fly{}, p_coeff{})));

horizontal_diffusion.run(p_in{} = in, p_out{} = out);

See the documentation and examples for details about the new API.

Related PRs: #1388

New API: Storage Builder

Datastores are now created using a builder API, e.g.

auto storage_builder = gt::storage::builder<storage_traits_t>.dimensions(d1, d2, d3).halos(halo, halo, 0);

auto in = storage_builder.type<double const>().value(42).build();
auto coeff = storage_builder.type<double const>().value(42).build();
auto out = storage_builder.type<double>().build();

The builder returns a shared_ptr to a data_store (previously the shared_ptr was held inside the data_store).

Other storage related changes:

Memory alignment is applied in bytes (instead of in elements).
Host/device buffers are automatically synchronized on creation of views or on access of the underlying pointer (the sync method is removed).
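
To see why the unit of the alignment matters, consider the following arithmetic sketch. It is an assumption-laden illustration, not the GridTools padding logic: the functions round_up, padded_length_elem_aligned and padded_length_byte_aligned are hypothetical, and the sketch assumes that the innermost dimension is padded up to a multiple of the alignment.

```cpp
#include <cstddef>

// Round n up to the next multiple of m (assumes m > 0).
constexpr std::size_t round_up(std::size_t n, std::size_t m) {
    return (n + m - 1) / m * m;
}

// Alignment given in ELEMENTS: the padded length (in elements) is the
// same for every element type.
constexpr std::size_t padded_length_elem_aligned(std::size_t length, std::size_t alignment) {
    return round_up(length, alignment);
}

// Alignment given in BYTES: the padded length (still counted in elements)
// depends on sizeof(T). Assumes alignment_bytes is a multiple of sizeof(T).
template <class T>
constexpr std::size_t padded_length_byte_aligned(std::size_t length, std::size_t alignment_bytes) {
    return round_up(length * sizeof(T), alignment_bytes) / sizeof(T);
}

// 10 elements aligned to 64 elements are padded to 64 elements.
static_assert(padded_length_elem_aligned(10, 64) == 64);
// 10 one-byte elements aligned to 64 bytes are padded to 64 elements.
static_assert(padded_length_byte_aligned<char>(10, 64) == 64);
```

On a platform with an 8-byte double, 10 elements aligned to 64 bytes are padded to 16 elements (128 bytes), which is much less padding than a 64-element alignment would introduce.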

See the documentation and examples for details about the new API.

Related PRs #1388, #1534

API break: New Backend names

Our backend names (cuda, mc, x86) were a source of confusion, as users had a certain (but wrong) idea of, e.g., when to use x86.

The new names are (#1490):

gpu instead of cuda as the same backend works for HIP.
cpu_kfirst instead of x86, the innermost dimension is k, suitable for vertical stencils and architectures that emphasize caches over vector instructions.
cpu_ifirst instead of mc, the innermost dimension is i, suitable for modern CPUs where vector instructions are key for performance.
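
What "innermost dimension" means for memory access can be sketched with two linear index functions. Only the choice of the unit-stride dimension is taken from the descriptions above; the struct extents, the function names, and the order of the two outer dimensions are assumptions made for this example and do not describe the actual GridTools layouts.

```cpp
#include <cstddef>

// Hypothetical 3D domain sizes.
struct extents {
    std::size_t ni, nj, nk;
};

// k is the innermost (unit-stride) dimension, as in cpu_kfirst.
constexpr std::size_t index_kfirst(extents e, std::size_t i, std::size_t j, std::size_t k) {
    return (i * e.nj + j) * e.nk + k;
}

// i is the innermost (unit-stride) dimension, as in cpu_ifirst.
constexpr std::size_t index_ifirst(extents e, std::size_t i, std::size_t j, std::size_t k) {
    return (k * e.nj + j) * e.ni + i;
}

// Neighbors along the innermost dimension are adjacent in memory.
static_assert(index_kfirst({4, 3, 2}, 0, 0, 1) - index_kfirst({4, 3, 2}, 0, 0, 0) == 1);
static_assert(index_ifirst({4, 3, 2}, 1, 0, 0) - index_ifirst({4, 3, 2}, 0, 0, 0) == 1);
```

A loop over k touches consecutive addresses in the first layout, which suits vertical stencils, while a loop over i touches consecutive addresses in the second, which suits vectorization along i.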

Additionally, we introduced a new backend gpu_horizontal (#1445), which works only for purely horizontal (parallel) stencils.
For most stencils, gpu_horizontal performs better than gpu; however, we recommend benchmarking both backends.

Other API breaking changes

Backend declarations (traits) are removed from common/defs.hpp and are now provided in component specific headers for stencil, timer, gcl and storage (#1388).
We improved the code structure by introducing finer-grained namespaces (#1388)
The storage repository was removed (#1456)

New functionality

New sid::rename_dimensions (#1533)
New regression test illustrating c-arrays as SIDs (#1525)
A Python SID adapter including regression test for calling computations from Python (#1523)
Introduced the threadpool concept (#1484, #1498, #1504) and added an HPX threadpool (#1437)
Added an example for calling CUDA GridTools computations from Fortran with OpenACC (#1454)

Improved functionality

GCL is now header-only, so all of GridTools is now header-only
The CMake build scripts are rewritten, see the documentation and examples for how to use GridTools CMake targets (#1421, #1441, #1442, #1450, #1509)

Bug Fixes / Cleanup

Fixes to SID concept helpers (#1524, #1527, #1531)
Fixes for CUDA 11 (#1529), thanks @lukasm91
Fixes for HIP compilation (#1488)
Better error diagnostics at the frontend (#1495)
Performance tests are now included in a single binary (#1453)
Layout transformations are refactored (#1388)
and many other small fixes

Infrastructure/Development

Environments are renamed to describe more precisely what they are (#1507)
Added testing on the new MeteoSwiss machine Tsa to Jenkins (#1452)
Moved tests from Travis to GitHub actions (#1446), added tests for different CMake setups (#1443).
Added a Gitpod configuration (#1423)
Added testing with Clang-based Cray compiler on Daint (#1382)

Contributions

This release contains contributions from
@anstaf, @fthaler, @havogt, @jdahm, @lukasm91, @mbianco, @tehrengruber, @wdeconinck.

29 Jul 07:22

havogt

v2.0.0rc2

50c5e50

GridTools version 2.0.0rc2

Pre-release, see final release

15 Jun 12:09

havogt

v2.0.0rc1

53910ee

GridTools version 2.0.0rc1

Pre-release, see final release

20 Jan 14:52

havogt

v1.1.3

d33fa6f

GridTools version 1.1.3

Performance fixes

Revert a #pragma unroll to be optimal for the COSMO dycore on V100 (#1400)

Other

CMake: Add a missing policy workaround_mpi.cmake (#1398)

Releases: GridTools/gridtools
