Releases: GridTools/gridtools

GridTools version 2.2.3

15 Feb 08:55

Bug fixes

CI

GridTools version 2.2.2

12 Dec 10:30

fn: SID neighbor table wrapper (#1730)

Adds a simple class that wraps a SID and implements the neighbour table concept. (Picked for convenience into 2.2.2.)

Support for Python packaging (#1720)

Starting with this release we will publish GridTools C++ on pypi.org to make it easier to consume GridTools C++ from GT4Py.

Bug fixes

  • Fix get_keys of empty hymap (#1728)
  • fn: CUDA early exit on empty grid - an empty domain skips execution instead of erroring (#1729)
  • fn: prefer qualified names over ADL for fn builtins (they are not customization points for the user) (#1731, #1732)
  • Enable workarounds for CUDA 11.8 (#1734)
  • Enable workarounds for Clang 15 (#1735)

Build fixes

  • Fix perftests CMake target when no tests are added (#1724)

GridTools version 2.2.1

04 Aug 08:57

Bug fixes

  • Update pybind11 version to fix wrong C++ standard (#1723)
  • Fix perfect forwarding in sid::composite::make_values (#1722)
  • Workaround for NVCC bug in gcl (present in 11.6, 11.7 and most likely in 11.8) (#1726)

Performance fixes

  • Alternative skip value check in fn, which improves CUDA performance (#1721)

Cleanup

  • Replace boost::variant by std::variant (#1718)

GridTools version 2.2.0

06 Jul 10:36
240e8b0

C++ standard upgraded to C++17

Starting with this version, GridTools requires the C++17 standard (#1680), and the code base has been improved using C++17 features (#1693, #1716, #1697):

  • Get rid of tuple_util::make
  • GT_CONSTEXPR and GT_CONSTEXPR_TARGET go away
  • wstd stuff goes away
  • is_trivially_copy_constructible check is consistently used instead of is_trivially_copyable where data is passed across the host/device boundary, because it is exactly what is needed.
  • make_[smth] pattern is replaced by template argument deduction in several places; the old pattern is deprecated
  • composite is rewritten using C++17
  • overload is rewritten using C++17
  • std::[smth]_v<...> are used instead of std::[smth]<...>::value
  • static_assert(<cond>) used instead of static_assert(<cond>, "")
  • CTAD for simple_ptr_holder (#1701, #1708)
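
Several of the items above are plain C++17 features and can be illustrated without GridTools. The following is a minimal sketch using only the standard library. The type device_handle and the helper can_pass_by_value are hypothetical names introduced for illustration; they are not part of GridTools. The sketch shows a type that is trivially copy-constructible but not trivially copyable, checked with the _v trait shorthand and a message-less static_assert.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type for illustration only (not a GridTools type).
// The copy constructor is defaulted (trivial), but the copy assignment
// operator is user-provided, so the type is NOT trivially copyable.
struct device_handle {
    int *ptr;

    explicit device_handle(int *p) : ptr(p) {}
    device_handle(device_handle const &) = default;
    device_handle &operator=(device_handle const &other) {
        ptr = other.ptr;
        return *this;
    }
};

// C++17: std::..._v<T> instead of std::...<T>::value,
// and static_assert without a message string.
static_assert(std::is_trivially_copy_constructible_v<device_handle>);
static_assert(!std::is_trivially_copyable_v<device_handle>);

// Hypothetical helper: following the reasoning in the note above, passing
// an object by value only requires copy construction, so the weaker trait
// is the one that is checked here.
template <class T>
constexpr bool can_pass_by_value() {
    return std::is_trivially_copy_constructible_v<T>;
}
```

A check based on is_trivially_copyable would reject device_handle, although copying it into a by-value argument involves nothing beyond a trivial copy construction.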

If you were using functionality from the internal library common, you might have to update your code (all of common is considered internal API, see Release process). The most common change is using CTAD instead of makers where possible. Where that is not possible due to compiler bugs, the maker pattern was updated to be independent of tuple_util::make. For example, replace

  • tuple_util::make<tuple>(...) by tuple(...)
  • tuple_util::make<array>(...) by array(...)
  • sid::composite::make<...>(...) by sid::composite::keys<...>::make_values(...)
  • tuple_util::make<hymap::keys<...>::values>(...) by hymap::keys<...>::make_values(...)
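
The replacements listed above rely on class template argument deduction (CTAD). Below is a minimal sketch of the same idea with the standard-library counterparts. It uses std::tuple and std::array rather than the GridTools types, and ptr_holder is a hypothetical class added only to show a user-defined deduction guide (it is not the GridTools simple_ptr_holder).

```cpp
#include <array>
#include <cassert>
#include <tuple>
#include <type_traits>

// Before C++17: a maker function deduces the template arguments.
inline auto old_style() { return std::make_tuple(1, 2.5); }

// C++17: the class template deduces its own arguments (CTAD).
inline auto new_style() { return std::tuple(1, 2.5); }

// Both spellings produce the same type, std::tuple<int, double>.
static_assert(std::is_same_v<decltype(old_style()), decltype(new_style())>);
static_assert(std::is_same_v<decltype(std::array{1, 2, 3}), std::array<int, 3>>);

// Hypothetical holder type, for illustration only.
template <class T>
struct ptr_holder {
    T *ptr;
    T &operator()() const { return *ptr; }
};

// In C++17 an aggregate needs an explicit deduction guide for CTAD.
template <class T>
ptr_holder(T *) -> ptr_holder<T>;

inline int read_through_holder() {
    int value = 42;
    ptr_holder holder{&value}; // deduces ptr_holder<int>
    return holder();
}
```

The maker pattern remains useful as a fallback when a compiler does not handle CTAD correctly, which is the situation the note above describes.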

New library fn: functional model backend

The fn library provides functionality for the Declarative GT4Py to implement a backend for the functional model. It supports (naive, no-blocking) CPU and (efficient) GPU (CUDA) execution for structured (Cartesian) and unstructured grids. See examples in tests/regression/fn/.
The library provides a high-level, human-readable frontend, but is mainly meant as a target for code generators.

  • Introduce functional model backend (#1648, #1666, #1679)
  • Implements fn::extents (#1683)
  • Column Stage (#1685)
  • New Backend Backends (#1695)
  • Fn Frontend (#1698)
  • Performance References for FN Backends (#1711)
  • Add fn::tuple_get and fn::make_tuple (#1713)
  • Allow setting CUDA stream (#1712)

Minor new features

  • int_vector library (#1672)
  • add conversion assign to hymap (#1702)

Minor improvements

  • Extensions to meta and hymap (#1663)
  • Soften sid value type requirements from std::trivially_copyable to std::trivially_copy_constructible (#1663)
  • is_tuple_like (#1676) and is_hymap (#1677)

Bug fixes

  • Propagate CXX_STANDARD to all tests (#1664)
  • Compilation fixes for nvcc 11.5 and clang 12 with std=c++20 (#1665)
  • Workaround for nvcc bug https://godbolt.org/z/orrev1xnM (#1681)
  • c_bindings example: fix typo and split cpu and gpu fortran sources (#1684)
  • Fix unused param warnings (#1706)
  • Fix Compilation with CUDA 11.6 (#1710)
  • Support for Clang 14 (#1707)

Testing

  • Add C++20 with Cray Clang on Piz Daint to Jenkins CI (#1675)
  • Perftest Updates (#1690)
  • CI dom: Downgrade to gcc 10.3 for CUDA toolkit support (#1699)

Contributions

This release contains contributions from
@anstaf, @fthaler, @havogt.

GridTools version 2.1.0

25 Oct 13:09
a8039cb

New features

  • Dump backend: outputs a json representation of the stencil specification (#1456)
  • Reduction library with naive, CPU and GPU backends (#1590, #1594, #1619)
  • SID: Python cuda array interface support (#1596)

Extended features

  • Support for compile time length in data stores (#1545)
  • Several SID improvements (#1548)
  • Structured bindings support for gridtools tuple-like (#1556)
  • Improvements for Hugepage Allocation (#1562)
  • Add protection against misuse of device namespace (#1581)
  • fortran_array_view: allow to disable openacc (#1603)
  • Introduce sid::unknown_kind (#1605)
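
For background on the structured bindings item above: C++17 enables structured bindings for a user-defined tuple-like type through the tuple protocol (std::tuple_size, std::tuple_element and a get). The sketch below shows that general mechanism for a hypothetical pair_like class. It is not the GridTools implementation from #1556, and all names in the demo namespace are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>

namespace demo {
    // Hypothetical tuple-like class. The members are private, so
    // structured bindings can only work through the tuple protocol.
    class pair_like {
        int m_i;
        double m_d;

      public:
        pair_like(int i, double d) : m_i(i), m_d(d) {}

        // Element access by compile-time index, returned by value.
        template <std::size_t I>
        auto get() const {
            static_assert(I < 2);
            if constexpr (I == 0)
                return m_i;
            else
                return m_d;
        }
    };
} // namespace demo

// Opt in to the tuple protocol by specializing the standard traits.
namespace std {
    template <>
    struct tuple_size<demo::pair_like> : integral_constant<size_t, 2> {};
    template <>
    struct tuple_element<0, demo::pair_like> {
        using type = int;
    };
    template <>
    struct tuple_element<1, demo::pair_like> {
        using type = double;
    };
} // namespace std

inline double sum_of_elements(demo::pair_like const &p) {
    auto [i, d] = p; // i is an int, d is a double
    return i + d;
}
```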

Non-functional changes

  • Hold the sids within sid::composite as tuple (#1564)
  • Various cleanups and C++17 related changes (#1579)
  • C++17 versions of meta::fold (#1549)
  • Sid as a proper C++20 concept (#1580, #1582)

Performance

  • More Inlining in cpu_kfirst Backend (#1634)
  • Support for Compile-Time Unit Stride Dimension for Python SID Adapter (#1635, #1651)

Bug fixes

  • K-cache fixes (#1530)
  • CMake: Fix storage_gpu for HIPCC-AMDGPU (#1540)
  • Remove a warning in hugepage_alloc which warns about a problem which only affects testing code (#1560)
  • Improve HIP + OpenMP Compilation (#1578)
  • Fix empty composite and add composite::make helper (#1583)
  • Fix as_const to work with any SID and be compatible with std::as_const (#1601, #1611)
  • SID composite: add static_assert against incorrect kinds (#1604)
  • Workaround a CUDA problem: tuple_util::concat remove constexpr var (#1606)
  • Improve Compliance with Parallel Model: Limit fusion of k-parallel execution with k-offsets (#1612)
  • GCC 9.x: Optimize multishift (#1630)
  • Python SID adapter: fix integer format check (#1632)
  • GCC 11.x: Compilation fixes (#1641, #1646)
  • Fixes for CUDA 11.4 (#1644)

Testing

  • Update to GTest v1.11 and minor changes to adapt for changed gtest interface (#1655)

Documentation

  • Clarifications to the execution model (#1541)

Contributions

This release contains contributions from
@anstaf, @fthaler, @havogt, @lukasm91.

GridTools version 1.1.4

04 Oct 08:42

Bug fixes

  • Speed up compile time (#1608)
  • Support for GPU backend with custom block sizes in boundary conditions (#1438)
  • Fix sid shift origin (#1517)

Compatibility with new compilers

GridTools version 2.0.0

31 Jul 09:28
8101c64

GridTools v2.0.0

GridTools v2.0.0 comes with an improved API for stencil composition and storage construction.
These changes and a few others (see below) are breaking changes.

Changes since v1.1.0

New API: Stencil Composition

The make_computation API for composing stencils is replaced by a new stencil specification API, e.g.

auto horizontal_diffusion_spec = [](auto coeff, auto in, auto out) {
    GT_DECLARE_TMP(double, lap, flx, fly);
    return st::execute_parallel()
        .ij_cached(lap, flx, fly)
        .stage(lap_function(), lap, in)
        .stage(flx_function(), flx, in, lap)
        .stage(fly_function(), fly, in, lap)
        .stage(out_function(), out, in, flx, fly, coeff);
};

st::run(horizontal_diffusion_spec, stencil_backend_t(), grid, coeff, in, out);

instead of

auto horizontal_diffusion = gt::make_computation<backend_t>(grid,
    p_coeff{} = coeff,
    gt::make_multistage(gt::enumtype::execute<gt::enumtype::parallel, 20>{},
        define_caches(gt::cache<gt::IJ, gt::cache_io_policy::local>(p_lap{}, p_flx{}, p_fly{})),
        gt::make_stage<lap_function>(p_lap{}, p_in{}),
        gt::make_independent(gt::make_stage<flx_function>(p_flx{}, p_in{}, p_lap{}),
            gt::make_stage<fly_function>(p_fly{}, p_in{}, p_lap{})),
        gt::make_stage<out_function>(p_out{}, p_in{}, p_flx{}, p_fly{}, p_coeff{})));

horizontal_diffusion.run(p_in{} = in, p_out{} = out);

See the documentation and examples for details about the new API.

Related PRs: #1388

New API: Storage Builder

Datastores are now created using a builder API, e.g.

auto storage_builder = gt::storage::builder<storage_traits_t>.dimensions(d1, d2, d3).halos(halo, halo, 0);

auto in = storage_builder.type<double const>().value(42).build();
auto coeff = storage_builder.type<double const>().value(42).build();
auto out = storage_builder.type<double>().build();

The type returned by the builder is a shared_ptr to a data_store (previously the shared_ptr was inside the data_store).

Other storage related changes:

  • Memory alignment is applied in bytes (instead of in elements).
  • Host/device buffers are automatically synchronized on creation of views or on access of the underlying pointer (the sync method is removed).

See the documentation and examples for details about the new API.

Related PRs: #1388, #1534

API break: New Backend names

Our backend names (cuda, mc, x86) were a source of confusion, as users had a certain (but wrong) idea of, e.g., when to use x86.

The new names are (#1490):

  • gpu instead of cuda as the same backend works for HIP.
  • cpu_kfirst instead of x86, the innermost dimension is k, suitable for vertical stencils and architectures that emphasize caches over vector instructions.
  • cpu_ifirst instead of mc, the innermost dimension is i, suitable for modern CPUs where vector instructions are key for performance.
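
To illustrate what "innermost dimension" means for the two CPU backends, the sketch below uses plain index arithmetic. It assumes that the memory layout follows the loop order, and it ignores halos, alignment and blocking; the actual GridTools storage layouts are not reproduced here. The names extents, index_kfirst and index_ifirst are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical domain sizes in i, j and k.
struct extents {
    std::size_t ni, nj, nk;
};

// k is the fastest-varying index: neighbours in k are adjacent in memory,
// which favours stencils that walk along the vertical.
constexpr std::size_t index_kfirst(extents e, std::size_t i, std::size_t j, std::size_t k) {
    return (i * e.nj + j) * e.nk + k;
}

// i is the fastest-varying index: neighbours in i are adjacent in memory,
// which favours vectorization along the horizontal.
constexpr std::size_t index_ifirst(extents e, std::size_t i, std::size_t j, std::size_t k) {
    return (k * e.nj + j) * e.ni + i;
}

// A step of one in the innermost index moves by exactly one element.
static_assert(index_kfirst({4, 5, 6}, 1, 2, 4) - index_kfirst({4, 5, 6}, 1, 2, 3) == 1);
static_assert(index_ifirst({4, 5, 6}, 2, 2, 3) - index_ifirst({4, 5, 6}, 1, 2, 3) == 1);
```

In this toy model a unit step in the other direction is strided: a step in i costs nj * nk elements in the k-first layout, and a step in k costs ni * nj elements in the i-first layout.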

Additionally, we introduced a new backend gpu_horizontal (#1445), which works only for purely horizontal (parallel) stencils.
Performance of gpu_horizontal is improved over gpu for most stencils; however, we recommend benchmarking both backends.

Other API breaking changes

  • Backend declarations (traits) are removed from common/defs.hpp and are now provided in component specific headers for stencil, timer, gcl and storage (#1388).
  • We improved the code structure by introducing finer-grained namespaces (#1388)
  • The storage repository was removed (#1456)

New functionality

  • New sid::rename_dimensions (#1533)
  • New regression test illustrating c-arrays as SIDs (#1525)
  • A Python SID adapter including regression test for calling computations from Python (#1523)
  • Introduced the threadpool concept (#1484, #1498, #1504) and added an HPX threadpool (#1437)
  • Added an example for calling CUDA GridTools computations from Fortran with OpenACC (#1454)

Improved functionality

  • GCL is now header-only (all of GridTools is therefore header-only)
  • The CMake build scripts are rewritten, see the documentation and examples for how to use GridTools CMake targets (#1421, #1441, #1442, #1450, #1509)

Bug Fixes / Cleanup

  • Fixes to SID concept helpers (#1524, #1527, #1531)
  • Fixes for CUDA 11 (#1529), thanks @lukasm91
  • Fixes for HIP compilation (#1488)
  • Better error diagnostics at the frontend (#1495)
  • Performance tests are now included in a single binary (#1453)
  • Layout transformations are refactored (#1388)
  • and many other small fixes

Infrastructure/Development

  • Environments are renamed to describe more precisely what they are (#1507)
  • Added testing on the new MeteoSwiss machine Tsa to Jenkins (#1452)
  • Moved tests from Travis to GitHub actions (#1446), added tests for different CMake setups (#1443).
  • Added a Gitpod configuration (#1423)
  • Added testing with Clang-based Cray compiler on Daint (#1382)

Contributions

This release contains contributions from
@anstaf, @fthaler, @havogt, @jdahm, @lukasm91, @mbianco, @tehrengruber, @wdeconinck.

GridTools version 2.0.0rc2

29 Jul 07:22
50c5e50
Pre-release

see final release

GridTools version 2.0.0rc1

15 Jun 12:09
53910ee
Compare
Choose a tag to compare
Pre-release

see final release

GridTools version 1.1.3

20 Jan 14:52
d33fa6f

Performance fixes

  • Revert a #pragma unroll to be optimal for the COSMO dycore on V100 (#1400)

Other

  • CMake: Add a missing policy workaround_mpi.cmake (#1398)