Binary array encoding (i.e. json+binref) not supported by Python SDK

### Summary

Currently, the Python SDK's `from_image` and `from_tesseract_api` methods accept `output_format="json+binref"`, but `_decode_array` does not support this encoding (once #422 is merged, the option will be explicitly removed for clarity). The current implementation of `_decode_array` only handles `json+base64` explicitly and otherwise falls back to raw list handling:

```python
def _decode_array(encoded_arr: dict) -> np.ndarray:
    if "data" in encoded_arr:
        if encoded_arr["data"]["encoding"] == "base64":
            data = base64.b64decode(encoded_arr["data"]["buffer"])
            arr = np.frombuffer(data, dtype=encoded_arr["dtype"])
        else:
            arr = np.array(encoded_arr["data"]["buffer"], dtype=encoded_arr["dtype"])
```
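To illustrate why the fallback branch cannot handle binref, here is a small reproduction. It is only a sketch: `decode_array_excerpt` mirrors the excerpt above with a `return` added so it can be called, and the payload layout (a `buffer` string of the form `"<file>:<offset>"`) is an assumption about what a binref-encoded array looks like, not something confirmed in this issue.

```python
import base64

import numpy as np


def decode_array_excerpt(encoded_arr: dict) -> np.ndarray:
    """Mirror of the excerpt above, with a return added so it can be called."""
    if "data" in encoded_arr:
        if encoded_arr["data"]["encoding"] == "base64":
            data = base64.b64decode(encoded_arr["data"]["buffer"])
            arr = np.frombuffer(data, dtype=encoded_arr["dtype"])
        else:
            # Every non-base64 encoding is treated as a list of numbers.
            arr = np.array(encoded_arr["data"]["buffer"], dtype=encoded_arr["dtype"])
        return arr
    # Not part of the excerpt; added only to make this sketch total.
    raise ValueError("payload has no 'data' field")


# Hypothetical binref payload: the buffer is a file reference, not numbers.
binref_payload = {
    "dtype": "float32",
    "data": {"encoding": "binref", "buffer": "outputs.bin:0"},
}

try:
    decode_array_excerpt(binref_payload)
except ValueError as exc:
    # NumPy cannot turn the file reference string into float32 values.
    print(f"decoding failed: {exc}")
```

A check like this could serve as the starting point for the failing tests mentioned in the list of required changes.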

To support `json+binref`, we need the following:

1. Add failing tests.
2. Allow `output_path` to be passed as an argument to `from_url` (or query it with a request).
3. Ensure `output_path` is available whenever `json+binref` encoding is used (querying it with a request if absent).
4. Add `fsspec` to the SDK dependencies so that files on cloud storage can be read (as an interim local-only solution, Python's built-in `open` is enough).
5. After safely reading the file, convert the bytes to a NumPy array in `_decode_array` using `np.frombuffer`.
6. Double-check that this works properly with `from_url` (it is unclear whether the binaries are downloaded with the request; if not, we may need to request them directly).
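Steps 3 to 5 could look roughly like the sketch below. This is not the SDK's implementation: `decode_array_sketch` is a hypothetical name, and the code assumes that the payload carries a `shape` field and that a binref `buffer` has the form `"<relative file>:<byte offset>"`. It uses the built-in `open`, i.e. the interim local-only variant from step 4.

```python
from __future__ import annotations

import base64
from pathlib import Path

import numpy as np


def decode_array_sketch(encoded_arr: dict, output_path: str | Path | None = None) -> np.ndarray:
    """Decode base64, binref, or raw-list array payloads (illustrative only)."""
    data = encoded_arr["data"]
    dtype = np.dtype(encoded_arr["dtype"])
    shape = tuple(encoded_arr["shape"])  # assumption: payload has a "shape" field

    if data["encoding"] == "base64":
        raw = base64.b64decode(data["buffer"])
        return np.frombuffer(raw, dtype=dtype).reshape(shape)

    if data["encoding"] == "binref":
        if output_path is None:
            raise ValueError("output_path is required to resolve binref buffers")
        # Assumption: buffer looks like "<relative file>:<byte offset>".
        filename, sep, offset = data["buffer"].rpartition(":")
        if not sep:  # no offset given, read from the start of the file
            filename, offset = data["buffer"], "0"
        n_bytes = int(np.prod(shape)) * dtype.itemsize
        with open(Path(output_path) / filename, "rb") as f:
            f.seek(int(offset))
            raw = f.read(n_bytes)
        return np.frombuffer(raw, dtype=dtype).reshape(shape)

    # Fallback: buffer is a (nested) list of numbers.
    return np.array(data["buffer"], dtype=dtype).reshape(shape)
```

Passing `output_path` explicitly keeps the decoder free of network calls; whether the path comes from the caller or from a request to the server (steps 2 and 3) is left open here. Note that `np.frombuffer` on `bytes` returns a read-only array, so callers that need to mutate the result would have to copy it.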

### Why is this needed?

The performance improvement of `json+binref` is unlikely to be material until output sizes reach somewhere between 100 MB and 5 GB (depending on network speed). At that scale, the

1. 25% lower memory footprint of pure binaries over base64 (base64 encodes every 3 bytes as 4 characters),
2. more manageable JSON file sizes,
3. minimal encode/decode time (the least material advantage, as base64 encoding/decoding is already highly optimized)

could add up to a material improvement in serialization plus transfer time.
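The 25% figure follows directly from how base64 works, and can be checked with the standard library alone. The snippet below uses random bytes as a stand-in for array data:

```python
import base64
import os

raw = os.urandom(3_000_000)  # stand-in for ~3 MB of array bytes
encoded = base64.b64encode(raw)

overhead = len(encoded) / len(raw) - 1  # extra size of base64 relative to raw
saving = 1 - len(raw) / len(encoded)    # size saved by raw relative to base64

print(f"raw: {len(raw)} B, base64: {len(encoded)} B")
# raw: 3000000 B, base64: 4000000 B
print(f"base64 overhead: {overhead:.1%}, binary saving: {saving:.1%}")
# base64 overhead: 33.3%, binary saving: 25.0%
```

This only covers the payload bytes themselves; JSON framing and any transport-level compression are not included.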

Happy to address this once we determine there is sufficient advantage, or to run minimal tests comparing base64 and binref encoding/decoding time.
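A minimal timing comparison could start from something like the following. It is a rough sketch rather than a proper benchmark: it runs once, uses random bytes instead of real Tesseract outputs, and models the binref path simply as writing raw bytes to a local file and reading them back. The 8 MiB payload size is arbitrary.

```python
import base64
import os
import tempfile
import time
from pathlib import Path


def time_call(fn):
    """Return (result, elapsed seconds) for a single call of fn."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


payload = os.urandom(8 * 1024 * 1024)  # 8 MiB of stand-in array bytes

# base64 path: encode to text, then decode back to bytes.
encoded, t_b64_enc = time_call(lambda: base64.b64encode(payload))
decoded, t_b64_dec = time_call(lambda: base64.b64decode(encoded))

# binref-like path: write raw bytes to a file, then read them back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "array.bin"
    _, t_bin_write = time_call(lambda: path.write_bytes(payload))
    read_back, t_bin_read = time_call(path.read_bytes)

print(f"base64 encode: {t_b64_enc:.4f}s, decode: {t_b64_dec:.4f}s")
print(f"binary write:  {t_bin_write:.4f}s, read:   {t_bin_read:.4f}s")
```

Timings from a single run are noisy and depend heavily on disk and OS caching, so any real comparison should repeat the measurement and include JSON parsing and network transfer.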

### Usage example

```python
from tesseract_core import Tesseract

with Tesseract.from_image(..., output_format="json+binref"):
    ...
```
