Description
Motivation
deck.gl has multiple feature areas that would benefit from a shared & standardized computation module. Example use cases:
- To consume data input in memory-efficient formats such as Arrow and Parquet
- Attribute animation/transition (currently implemented with ad-hoc transforms)
- Data aggregation (currently implemented with ad-hoc transforms)
- To perform one-time data transforms such as 64-bit split and coordinate system conversion (currently implemented on the CPU)
Proposal
Create a new module @luma.gl/gpgpu.
The proposed syntax is strongly inspired by tensorflow.js, especially the functions in Creation, Slicing and Joining, Arithmetic, Basic Math & Reduction.
Example API
Creation: returns a wrapper of a GPU buffer
// A column with stride of 1, example: Int32Array [property0, property1, property2, ...]
gpu.array1d(data: TypedArray): GPUAttribute
// A column with stride > 1, example: Float64Array [x0, y0, x1, y1, x2, y2, ...]
gpu.array2d(data: TypedArray, shape: [number, number]): GPUAttribute
// Constant scalar|vec2|vec3|vec4
gpu.constant(value: number | NumericArray): GPUAttribute
Reshape: joining, slicing and/or rearranging GPU buffers
// Interleaving multiple columns, example: [x0, x1, x2, ...] + [y0, y1, y2, ...] -> [x0, y0, x1, y1, x2, y2, ...]
gpu.stack(values: GPUAttribute[]): GPUAttribute
// N-D slicing of an interleaved buffer, example: [a0, b0, c0, a1, b1, c1, a2, b2, c2, ...] -> [a0, c0, a1, c1, a2, c2, ...]
gpu.slice(value: GPUAttribute, begin: number[], size: number[]): GPUAttribute
Transform: math operations on GPU buffers
// element-wise add
gpu.add(value: GPUAttribute, operand: number | GPUAttribute): GPUAttribute
// min value across dimensions
gpu.min(value: GPUAttribute): number
// map each 64-bit float element to two 32-bit floats, as in highPart = Math.fround(x) and lowPart = x - highPart
gpu.fp64Split(value: GPUAttribute): [highPart: GPUAttribute, lowPart: GPUAttribute]
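For illustration, a hedged end-to-end sketch that chains the calls above (every function here is part of the proposed, not-yet-existing API, so names and signatures are tentative):

```ts
import {gpu} from '@luma.gl/gpgpu';

// Wrap two plain columns as GPU-backed attributes
const x = gpu.array1d(new Float64Array([0, 1, 2, 3]));
const y = gpu.array1d(new Float64Array([10, 11, 12, 13]));

// Interleave into [x0, y0, x1, y1, ...], then shift every element by 1
const positions = gpu.stack([x, y]);
const shifted = gpu.add(positions, 1);

// Split 64-bit floats into high/low 32-bit parts for precision-safe rendering
const [highPart, lowPart] = gpu.fp64Split(shifted);

// Reductions read a single value back to the CPU
const minValue = gpu.min(shifted);
```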
Interface with loaders.gl
There is no direct dependency on loaders.gl, but the module can be "loaders friendly" by accepting a Table-shaped input:
gpu.array1d(table: Table, columnNameOrIndex: string | number): GPUAttribute
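For example, a hypothetical usage with a loaded Arrow table could look like this (load and ArrowLoader are existing loaders.gl exports; the gpu.array1d overload is the one proposed above):

```ts
import {load} from '@loaders.gl/core';
import {ArrowLoader} from '@loaders.gl/arrow';
import {gpu} from '@luma.gl/gpgpu';

// Load an Arrow file into a table, then wrap one of its columns as a
// GPU attribute without first copying it into a plain JS array
const table = await load('points.arrow', ArrowLoader);
const radius = gpu.array1d(table, 'radius_value');
```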
Interface with deck.gl
deck.gl could add support for accessors that return a @luma.gl/gpgpu GPUAttribute object. If such an accessor is provided, instead of filling an attribute's value array on the CPU, the underlying GPU buffer is directly transferred.
Sample layer with JSON input:
import {ScatterplotLayer} from '@deck.gl/layers';
import {extent} from 'd3-array';
import {scaleLog} from 'd3-scale';
import memoize from 'memoize';
const getColorScaleMemoized = memoize(
  data => scaleLog()
    .domain(extent(data, d => d.color_value))
    .range([[0, 200, 255, 255], [255, 180, 0, 255]])
);
const layer = new ScatterplotLayer({
data: 'points.json',
getPosition: d => [d.x, d.y, d.z],
getRadius: d => Math.max(Math.min(d.radius_value * 10, 100), 1),
getFillColor: (d, {data}) => getColorScaleMemoized(data)(d.color_value)
});
Equivalent layer with Arrow input (option A):
import {ScatterplotLayer} from '@deck.gl/layers';
import {gpu} from '@luma.gl/gpgpu';
import {ArrowLoader} from '@loaders.gl/arrow';
import type {Table, ArrowTableBatch} from '@loaders.gl/schema';
const layer = new ScatterplotLayer({
data: 'points.arrow',
loaders: [ArrowLoader],
getPosition: (_, {data}: {data: Table | ArrowTableBatch}) => {
const x = gpu.array1d(data, 'x');
const y = gpu.array1d(data, 'y');
const z = gpu.constant(0);
return gpu.stack([x, y, z]);
},
getRadius: (_, {data}: {data: Table | ArrowTableBatch}) => {
const value = gpu.array1d(data, 'radius_value');
return value.mul(10).clipByValue(1, 100);
},
getFillColor: (_, {data}: {data: Table | ArrowTableBatch}) => {
const value = gpu.array1d(data, 'color_value');
return value.scaleLog([[0, 200, 255, 255], [255, 180, 0, 255]]);
}
});
Equivalent declarative layer with Arrow input (option B):
{
"type": "ScatterplotLayer",
"data": "points.arrow",
"getPosition": ["x", "y", "z"],
"getRadius": {
"source": "radius_value",
"transform": [
{"func": "mul", "args": [10]},
{"func": "clipByValue", "args": [1, 100]}
]
},
"getFillColor": {
"source": "color_value",
"transform": [
{"func": "scaleLog", "args": [[0, 200, 255, 255], [255, 180, 0, 255]]}
]
}
}
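As a sketch of how option B could map onto option A, a JSON converter might resolve each declarative accessor into chained calls on the proposed API. The spec types and resolveAccessor below are illustrative only:

```ts
import {gpu} from '@luma.gl/gpgpu';
import type {GPUAttribute} from '@luma.gl/gpgpu';
import type {Table} from '@loaders.gl/schema';

// Hypothetical shape of the declarative accessor spec shown above
type TransformStep = {func: string; args: unknown[]};
type AccessorSpec = {source: string; transform?: TransformStep[]};

// One possible way to resolve a spec into chained GPUAttribute method calls
function resolveAccessor(spec: AccessorSpec, data: Table): GPUAttribute {
  let value = gpu.array1d(data, spec.source);
  for (const step of spec.transform ?? []) {
    // e.g. {"func": "mul", "args": [10]} becomes value.mul(10)
    value = (value as any)[step.func](...step.args);
  }
  return value;
}
```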
Implementation Considerations
- It might be appropriate to move the BufferTransform and TextureTransform classes from the engine module to this new module.
- The module will contain multiple "backends" for WebGL2 and WebGPU. Dynamic import can be used to reduce runtime footprint.
- Actual GPU resources (shaders/buffers) will need to be lazily allocated/written when the buffer is accessed, as sketched below. This allows a) the JS wrapper to be created without waiting for an available device; b) batching calculations for performance, instead of running one render pass for each JS function; c) the buffer to be created on the same device where it will be used for rendering:
gpuAttribute.getBuffer(device: Device): Buffer;
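A minimal sketch of this lazy pattern, assuming a hypothetical internal structure for GPUAttribute (the compute callback and per-device cache are illustrative; Device and Buffer are the existing @luma.gl/core types):

```ts
import type {Device, Buffer} from '@luma.gl/core';

// Illustrative only: all GPU work is deferred until a buffer is requested
class GPUAttribute {
  private cache = new Map<Device, Buffer>();

  // `compute` encapsulates the (possibly batched) operations producing this attribute
  constructor(private compute: (device: Device) => Buffer) {}

  getBuffer(device: Device): Buffer {
    // Allocate and write on first access, on the device that will render the result
    let buffer = this.cache.get(device);
    if (!buffer) {
      buffer = this.compute(device);
      this.cache.set(device, buffer);
    }
    return buffer;
  }
}
```

With this shape, chained operations can be fused inside compute into a single transform pass instead of one pass per JS call.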
- Release of resources that are no longer needed. Consider the following case:
getPosition: (_, {data}: {data: Table | ArrowTableBatch}) => {
  const x = gpu.array1d(data, 'x'); // intermediate buffer that will not be needed after evaluation
  const y = gpu.array1d(data, 'y'); // intermediate buffer that will not be needed after evaluation
  const z = gpu.constant(0);
  return gpu.stack([x, y, z]); // output buffer that will be used for render
}
We could have something similar to tf.tidy(fn), which cleans up all intermediate tensors allocated by fn except those returned by fn. Alternatively, we could consider using FinalizationRegistry to clean up intermediate buffers, though the application would have less control over when the cleanup happens (e.g. the standard deck.gl Layer tests will fail due to unreleased WebGL resources).
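A minimal sketch of such a tidy-style scope, under the assumption that the module tracks allocations and that GPUAttribute exposes a delete() method (both are assumptions, not decided API):

```ts
import type {GPUAttribute} from '@luma.gl/gpgpu'; // proposed module

// Illustrative only: assumes every gpu.* creation call pushes the new attribute
// onto `allocationScope` before returning it
const allocationScope: GPUAttribute[] = [];

export function tidy<T extends GPUAttribute>(fn: () => T): T {
  const start = allocationScope.length;
  const result = fn();
  // Release everything allocated inside fn, except the attribute it returned
  for (const attr of allocationScope.splice(start)) {
    if (attr !== result) attr.delete();
  }
  return result;
}
```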
Discussion
- Do we want to use an existing external library instead of rolling our own?
First of all, I have not conducted an extensive investigation of existing offerings, so additional comments on this are much welcomed. Based on my own experience, the main pain point (with a long maintenance tail) is context sharing, which is required for deck.gl to reuse the output GPU buffer without reading it back to the CPU.
- tensorflow.js: a proof-of-concept is available here. It is very mature, with a large user base, cross-platform availability, and a variety of backend implementations (WebGL, WebGPU, WebAssembly). The library itself is fairly heavyweight (> 1 MB minified) with extra machine-learning functionality, though the size could likely be reduced if we redistribute a tree-shaken bundle. Forcing it to use an external WebGL context is painful because the context state handoff is not clean.
- gpu.js: the ability to write JavaScript functions that get translated to shader code is very appealing. However, the library has not been updated for 2 years, and I doubt there will be WebGPU support.
- TBD