Support for RFC 8746: Tags for Typed Arrays #367

@axman6

RFC 8746 specifies tags for use with CBOR byte strings to indicate they contain arrays of data of known type, which can be efficiently copied directly into native arrays.

The encoding supports arrays of {signed/unsigned}int{8,16,32,64} and float{16,32,64,128} types, and records the endianness of the contained data. It maps directly to the internal representation of Data.Vector.Storable or Data.Vector.Primitive (with some byte swapping needed if the endianness doesn't match the current architecture), as well as several other array types. The RFC also has tags for indicating that a CBOR array is homogeneous (corresponding to [a], Vector a etc.) and for multidimensional arrays. I've only considered the TypedArray style, though the others might be useful for users as well.
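To make the tag scheme concrete, here is a minimal sketch of how the tag number could be computed. It assumes the bit layout `0b010_f_s_e_ll` from RFC 8746 section 2.1 (f = float, s = signed, e = little endian, ll = size exponent). All type and function names are hypothetical and exist in neither cborg nor serialise; the "uint8 clamped" tag (68) is deliberately not modelled.

```haskell
import Data.Bits (shiftL, (.|.))

-- Hypothetical types for illustration only.
data Width  = W8 | W16 | W32 | W64     deriving (Eq, Enum)
data FWidth = F16 | F32 | F64 | F128   deriving (Eq, Enum)
data Order  = BigEndian | LittleEndian deriving (Eq, Enum)

data ElemType
  = UInt Width       -- unsigned integer elements
  | SInt Width       -- signed (two's complement) integer elements
  | IEEEFloat FWidth -- IEEE 754 binary floating point elements

-- | Tag number for a typed array, following the layout 0b010_f_s_e_ll.
-- For floats the s bit stays 0 and ll counts from 16 bits upwards.
typedArrayTag :: ElemType -> Order -> Word
typedArrayTag ty order = case ty of
    -- 8-bit elements have no byte order, so the e bit is forced to 0;
    -- setting it would select "uint8 clamped" (68) or a reserved tag (76).
    UInt W8     -> bits 0 0 0 0
    SInt W8     -> bits 0 1 0 0
    UInt w      -> bits 0 0 e (fromEnum w)
    SInt w      -> bits 0 1 e (fromEnum w)
    IEEEFloat w -> bits 1 0 e (fromEnum w)
  where
    e = fromEnum order
    bits :: Int -> Int -> Int -> Int -> Word
    bits f s e' ll =
      fromIntegral (0x40 .|. (f `shiftL` 4) .|. (s `shiftL` 3) .|. (e' `shiftL` 2) .|. ll)
```

With this sketch, `typedArrayTag (SInt W16) LittleEndian` gives 77 (little-endian sint16) and `typedArrayTag (IEEEFloat F64) LittleEndian` gives 86 (little-endian binary64).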

Having support for this RFC would allow efficient transfer of large numerical data without having to validate the encoding of every individual element, as an element-by-element encoding such as [Int16] requires, and would also reduce bandwidth and overhead.
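As a rough illustration of what ends up on the wire, the sketch below hand-rolls the bytes for a little-endian sint16 typed array (tag 77) using only `bytestring`. It is not how cborg would do it and it is not zero copy (it walks a list); `cborHead` and `encodeSInt16LE` are hypothetical names, and lengths are assumed to be non-negative.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int16)
import Data.Word (Word8)

-- | CBOR item head: major type in the top three bits, followed by the
-- argument in the shortest form (RFC 8949, section 3).
cborHead :: Word8 -> Int -> B.Builder
cborHead major n
  | n < 24          = B.word8 (m + fromIntegral n)
  | n < 0x100       = B.word8 (m + 24) <> B.word8 (fromIntegral n)
  | n < 0x10000     = B.word8 (m + 25) <> B.word16BE (fromIntegral n)
  | n < 0x100000000 = B.word8 (m + 26) <> B.word32BE (fromIntegral n)
  | otherwise       = B.word8 (m + 27) <> B.word64BE (fromIntegral n)
  where
    m = major * 32

-- | Tag 77, then one byte string holding every element as raw
-- little-endian bytes; there is no per-element CBOR framing.
encodeSInt16LE :: [Int16] -> BL.ByteString
encodeSInt16LE xs =
  B.toLazyByteString $
    cborHead 6 77                     -- major type 6: tag
      <> cborHead 2 (2 * length xs)   -- major type 2: byte string, length in bytes
      <> foldMap B.int16LE xs
```

For `[1, -2]` this produces the bytes `D8 4D 44 01 00 FE FF`: a two-byte tag head, a one-byte byte string head, and four payload bytes that a decoder only has to length-check.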

I'm not sure exactly what the interface for this should look like, which is why I'm making an issue to discuss it instead of a PR. It could be as simple as having a newtype with new instances:

-- Support explicit encoding to the given endianness, or use the native machine's encoding.
data Endianess
  = Big    -- ^ Encoded data will be big endian, converted to native if needed when decoding
  | Little -- ^ ditto for little
  | Native -- ^ Encoding will be tagged to match the machine's encoding, and converted on decoding if needed

data Signedness = Unsigned | Signed -- used for IntN and WordN types, always signed for Float/Double

-- Contains some kind of vector type; the `a` is always in native endianness,
-- but the tag states what to encode to.
-- /Possibly a second type where the data is stored as whatever was received
-- and performs the endianness conversion when indexing? Probably more of a
-- `vector` library question./
newtype TypedArray (e :: Endianess) v a = TypedArray (v a)

class TypedArrayEncodable a where
    -- maybe more type safe with: data SizeIndicator = Zero | One | Two | Three
    size :: a -> Int
    signedness :: a -> Signedness
    -- Probably a very bad idea, but some efficient function to swap endianess for that type
    -- ideally it could use https://gitlab.haskell.org/ghc/ghc/-/issues/25069 if I ever get around to writing it...
    swapEndian :: PrimMonad m => Ptr a -> Int -> m ()

instance TypedArrayEncodable {Int,Word}{8,16,32,64}
instance TypedArrayEncodable {Float,Double}

-- Would need instances for the combinations of contents and vector types,
-- but with the intent that what gets encoded is just the direct byte representation
--  contained in the vector.
-- decode would need to check that the tag matches, up to the endianness, and if
-- the endianness doesn't match, copy the data with a byte swap.
instance TypedArrayEncodable a => Serialise (TypedArray e v a)
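To give a feel for what `swapEndian` and the `Native` case might boil down to, here is a minimal sketch for 16-bit elements using only `base`. It runs in `IO` rather than a generic `PrimMonad`, and `needsSwap`, `swapInPlace16` and `swapList16` are hypothetical names; a real implementation would presumably swap while copying rather than element by element in place.

```haskell
import Data.Word (Word16, byteSwap16)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff, pokeElemOff)
import GHC.ByteOrder (ByteOrder (..), targetByteOrder)

-- | Does data tagged with the given byte order need swapping on this machine?
needsSwap :: ByteOrder -> Bool
needsSwap wire = wire /= targetByteOrder

-- | Byte-swap @n@ 16-bit elements in place.
swapInPlace16 :: Ptr Word16 -> Int -> IO ()
swapInPlace16 ptr n = mapM_ swapAt [0 .. n - 1]
  where
    swapAt i = peekElemOff ptr i >>= pokeElemOff ptr i . byteSwap16

-- | Small driver: marshal a list into a temporary buffer, swap it there,
-- and read the result back.
swapList16 :: [Word16] -> IO [Word16]
swapList16 xs = withArrayLen xs $ \n ptr -> do
  swapInPlace16 ptr n
  peekArray n ptr
```

For example, `swapList16 [0x0102, 0xFFFE]` yields `[0x0201, 0xFEFF]`, and `needsSwap targetByteOrder` is always `False`, which is the case where decoding can skip the swap entirely.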

The more of this interface I wrote, the more horrible it felt, so consider it an invocation of Cunningham's Law.

Whatever the interface is, the basic idea is to support an essentially zero-copy encoding of types like Data.Vector.Storable.Vector Int16: encoding produces a ByteString directly from the vector's internal buffer, and decoding copies the payload efficiently into a buffer allocated in ST.
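The decoding half of that idea can be sketched with `base` and `bytestring` alone: once the tag says the payload is already in native byte order, the whole array is one `copyBytes` into a freshly allocated buffer. This sketch uses `IO` and a plain `ForeignPtr` instead of ST and a `vector` type; `copyInt16Payload` and `readBack` are hypothetical names, and rejecting payloads whose length is not a multiple of the element size is a choice made here.

```haskell
import qualified Data.ByteString as BS
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Data.Int (Int16)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrArray, withForeignPtr)
import Foreign.Marshal.Array (peekArray)
import Foreign.Marshal.Utils (copyBytes)
import Foreign.Ptr (castPtr)

-- | Copy a payload of native-endian int16 values into a fresh buffer with a
-- single bulk copy. Returns Nothing if the byte length is odd.
copyInt16Payload :: BS.ByteString -> IO (Maybe (ForeignPtr Int16, Int))
copyInt16Payload bs
  | r /= 0 = pure Nothing
  | otherwise = do
      fp <- mallocForeignPtrArray n
      unsafeUseAsCStringLen bs $ \(src, len) ->
        withForeignPtr fp $ \dst -> copyBytes (castPtr dst) src len
      pure (Just (fp, n))
  where
    (n, r) = BS.length bs `divMod` 2

-- | Read the copied elements back as a list (for inspection only).
readBack :: (ForeignPtr Int16, Int) -> IO [Int16]
readBack (fp, n) = withForeignPtr fp (peekArray n)
```

No element is inspected or validated individually; the only check is on the total length.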

I've been thinking about this for a while and was surprised when I couldn't find any references to it anywhere in the repo or in any issues. The use case I have in mind is efficiently retrieving data from a high-performance embedded system which produces sample data as arrays of thousands of int16_t - not having to perform any per-element encoding at all would massively improve performance (at least, it would once I add support to it for producing CBOR).

Related: #62
