@cjboyle
Contributor

@cjboyle cjboyle commented Aug 23, 2025

This adds client and backend support for reading/writing irregular arrays using the ragged package. As ragged is more or less a wrapper around awkward, this PR reuses implementations from that structure family where possible and adds similar ones where needed (e.g. for serialization).

Implements #801.

Checklist

  • Add a Changelog entry
  • Add the ticket number which this PR closes to the comment section
  • Writing ragged data from client (file storage)
  • Reading ragged data to client (file storage)
    • in full
    • sliced
    • from block
    • from block, sliced
    • with variable chunks
  • Writing [None] shaped data from Bluesky/TiledWriter into SQL storage
  • Reading data from RaggedAdapter returned by SQLAdapter (SQL storage)
  • Reading ragged data to client (SQL storage)
    • in full
    • sliced
    • from block
    • from block, sliced
    • with variable chunks
  • Serialization
    • JSON
    • Arrow
    • Parquet
    • others?

@danielballan
Member

Awesome!

It looks like you found all the modules that need to be touched to add this.

The aspect that will need the most careful thought is the structure description and the HTTP APIs. These are designed to be used not only from the built-in Python client, but also from curl with tools like jq, browser-based applications, maybe Julia or Rust someday....

The Awkward form is quite complex. I suspect that only Python and C++ based clients, with access to awkward / awkward-cpp libraries, will be able to parse the form and engage with Tiled's Awkward structures in detail. (Unless, that is, IRIS-HEP builds Awkward libraries in other languages.) Clients without knowledge of Awkward can still get the data—exporting it to JSON, for example—but they probably cannot introspect or slice it in sophisticated ways.

If we were willing to similarly restrict ragged to clients with access to an awkward implementation, we wouldn't even really need to add a new structure family. We could implement it fully client-side, as a wrapper of the awkward client. But I see advantages in using the comparative simplicity of ragged to make it more accessible to simple clients.

This form construct is more flexible than ragged requires:

{'class': 'ListOffsetArray',
 'offsets': 'i64',
 'content': {'class': 'NumpyArray',
  'primitive': 'int64',
  'inner_shape': [],
  'parameters': {},
  'form_key': 'node1'},
 'parameters': {},
 'form_key': 'node0'}

A ragged form is always composed of one numpy "content" array and some number of "offset" arrays—full stop. It can be described thus (from #801):

class RaggedStructure(ArrayStructure):
    shape: Tuple[None | int, ...]  # override base class which has this as Tuple[int, ...]
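A minimal sketch of the content-plus-offsets layout described above, for a single ragged dimension. Plain Python lists stand in for numpy arrays, and the helper names are hypothetical; they are not part of ragged, awkward, or Tiled.

```python
from typing import List, Optional, Tuple


def to_content_and_offsets(rows: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Flatten one level of nesting into a flat content list plus offsets."""
    content: List[int] = []
    offsets: List[int] = [0]  # offsets[i]:offsets[i + 1] bounds row i
    for row in rows:
        content.extend(row)
        offsets.append(len(content))
    return content, offsets


def from_content_and_offsets(
    content: List[int], offsets: List[int]
) -> List[List[int]]:
    """Rebuild the nested rows by slicing content between adjacent offsets."""
    return [content[start:stop] for start, stop in zip(offsets[:-1], offsets[1:])]


rows = [[1, 2, 3], [], [4, 5]]
content, offsets = to_content_and_offsets(rows)

# In the RaggedStructure sense, the outer dimension is regular (3 rows)
# and the inner dimension is irregular, which is recorded as None.
shape: Tuple[Optional[int], ...] = (len(offsets) - 1, None)
```

Under this layout, a client only needs two flat buffers and the shape to reconstruct the rows, which is what makes the description simpler than a general Awkward form.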

I'm not sure whether ragged always stores offset arrays with an int64 dtype. If other integer types, such as uint, may be needed, then we will need a supplemental offset_datatype, similar to the supplemental coord_data_type in sparse structures:

coord_data_type: Optional[BuiltinDtype] = field(
    default_factory=lambda: BuiltinDtype(
        Endianness("little"), Kind("u"), 8
    )  # numpy 'uint' dtype
)

Reusing the awkward form keeps things simple, assuming your client already consumes awkward. Even so, I think a custom, much more constrained structure JSON is worthwhile, to make ragged arrays a more portable and accessible concept.
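As a rough illustration of how small such a structure description could be, here is a sketch of a constrained structure serialized to JSON. The class name, the string dtype encoding, and the `offset_datatype` field are assumptions made for this example only; this is not Tiled's actual `ArrayStructure` or its wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass
class RaggedStructureSketch:
    """Hypothetical stand-in for a constrained ragged structure description."""

    # None marks an irregular (variable-length) dimension.
    shape: Tuple[Optional[int], ...]
    # Dtype of the single flat content array (assumed string encoding).
    data_type: str = "<i8"
    # Assumed supplemental field for the dtype of the offset arrays.
    offset_datatype: str = "<i8"

    def to_json(self) -> str:
        # Tuples become JSON arrays and None becomes null.
        return json.dumps(asdict(self))


structure = RaggedStructureSketch(shape=(3, None))
payload = structure.to_json()
decoded = json.loads(payload)
```

A payload of this kind can be read with jq or any JSON parser, with no knowledge of Awkward forms: an irregular dimension simply appears as `null` in the `shape` array.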
