Commit c78fb2d

Add /zarr/v2 and /zarr/v3 Endpoints (#774)
* add zarr route
* ENH: basic zarr functionality
* ENH: map tiled chunks to zarr blocks
* MNT: Clean-up comments
* ENH: support tables
* add zarr route
* ENH: basic zarr functionality
* ENH: map tiled chunks to zarr blocks
* MNT: Clean-up comments
* ENH: support tables
* ENH: Add data type to sparse
* ENH: support units for numpy datetime types
* MNT: removed unnecessary imports
* ENH: add default value for units
* ENH: update BuiltinDtype in pydantic
* ENH: update BuiltinDtype in pydantic
* ENH: add default value for units
* TST: datetime dtypes in test_array
* MNT: Update changelog
* resolve conflict
  FIX: Recursion error when pickling with dill
  TST: Initial tests for zarr endpoints
  Clean, refactor, and lint
  TST: tests for arrays and tables
  TST: tests for arrays and tables
  ENH: restructure demo examples
  ENH: (partial) support for StructDtype
  TST: fix tests
  ENH: support for datetime types
* Clean-up
* MNT: gitignore alembic.ini
* FIX: typo in comment
* ENH: Add data type to sparse
* FIX: assignment error
* MNT: clean and lint
* MNT: update changelog
* MNT: fix changelog
* FIX: tests with COOStructure
* FIX: typing
* BLD: add aiohttp package to server requirements
* MNT: clean and lint
* FIX: default value of units to empty string.
* FIX: use None as the sentinel for the units kwarg
* ENH: use np.datetime_data to extract units
* TST: Fix failing authorization test -- empty password
* MNT: format and lint
* MNT: remove deprecated PatchedStreamingResponse
* MNT: lint
* TST: add authentication tests
* FIX: ensure support for py3.8
* MNT: lint
* MNT: add changelog entry
* MNT: moved aiohttp from required to dev dependencies
* ENH: add indx_data_type for sparse arrays
* MNT: rename indx_data_type to coord_data_type
* MNT: clean up
* MNT: lint
* TST: refactor ThreadedServer class for tests
* TST: test no writing in read-only mode
* MNT: resolve conflicts
* MNT: lint
* FIX: use default_factory for BuiltinDtype
* ENH: incorporate changes from David
* TST: fix existing tests after rebase
* TST: update tests
* FIX: do not specify media_type in zar group response
* TST: xfail dtype tests
* TST: bring back tests with complex and string types
* ENH: add zarr v3 endpoints
* FIX: variable names and default codec
* ENH: support struct dtypes natively
* ENH: remove zarr middleware
* FIX: zattrs and codecs for zarr v2
* DOC: add tutorial on zarr
* TST: use blosc lz4 codec for v2 and v3
* DOC: fix errors in building docs
* TST: replace default zarr metadata when exporting to HDF5
* Update docs/source/tutorials/zarr-integration.md
  Co-authored-by: Dan Allan <[email protected]>
* DOC: update docs
* MNT: update readme to match the new demo tree
* Satisfy linter

---------

Co-authored-by: Dan Allan <[email protected]>
Co-authored-by: Dan Allan <[email protected]>
1 parent a7ddd55 commit c78fb2d
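
The commit message mentions mapping Tiled chunks to zarr blocks, but the server code that does this is not part of this excerpt. As a rough illustration only, a zarr client asks for chunks by key: with the default separators, zarr v2 uses dot-separated indices such as `1.2`, and zarr v3 uses a `c/` prefix with slash-separated indices such as `c/1/2`. The helper name `parse_chunk_key` below is hypothetical and is not taken from Tiled.

```python
def parse_chunk_key(key, zarr_version):
    """Turn a zarr chunk key into a tuple of block indices.

    Hypothetical helper for illustration. Assumes the default key layouts:
    "1.2" for zarr v2 and "c/1/2" for zarr v3.
    """
    if zarr_version == 2:
        parts = key.split(".")
    elif zarr_version == 3:
        prefix, *parts = key.split("/")
        if prefix != "c":
            raise ValueError(f"not a default v3 chunk key: {key!r}")
    else:
        raise ValueError(f"unsupported zarr version: {zarr_version!r}")
    # Each remaining part is the block index along one dimension.
    return tuple(int(part) for part in parts)


print(parse_chunk_key("1.2", zarr_version=2))
print(parse_chunk_key("c/1/2", zarr_version=3))
```

Both calls name the same block, the second along the first axis and the third along the second, which is the kind of index a chunked array store can use to select one block.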

29 files changed: +1397 -370 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -69,6 +69,7 @@ config.yml
 prometheus_data
 grafana_data
 data
+alembic.ini
 
 tiled/_version.py

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -5,6 +5,12 @@ Write the date in place of the "Unreleased" in the case a new version is release
 
 ## v0.1.0-b33 (Unreleased)
 
+### Added
+
+- Endpoints for (read) data access with zarr v2 and v3 protocols.
+- `data_type` and `coord_data_type` properties for sparse arrays in `COOAdapter`
+  and `COOStructure`.
+
 ### Changed
 
 - Refactored internal server function ``get_root_tree()`` to not use FastAPI

README.md

Lines changed: 17 additions & 20 deletions
@@ -64,25 +64,22 @@ any HTTP client.
 >>> client = from_uri("http://localhost:8000")
 
 >>> client
-<Container {'short_table', 'long_table', 'structured_data', ...} ~10 entries>
+<Container {'scalars', 'nested', 'tables', 'structured_data', ...} ~8 entries>
 
 >>> list(client)
-'big_image',
- 'small_image',
- 'tiny_image',
- 'tiny_cube',
- 'tiny_hypercube',
+['scalars',
+ 'nested',
+ 'tables',
+ 'structured_data',
+ 'flat_array',
  'low_entropy',
  'high_entropy',
- 'short_table',
- 'long_table',
- 'labeled_data',
- 'structured_data']
+ 'dynamic']
 
->>> client['medium_image']
+>>> client['nested/images/medium_image']
 <ArrayClient>
 
->>> client['medium_image'][:]
+>>> client['nested/images/medium_image'][:]
 array([[0.49675483, 0.37832119, 0.59431287, ..., 0.16990737, 0.5396537 ,
         0.61913812],
        [0.97062498, 0.93776709, 0.81797714, ..., 0.96508877, 0.25208564,
@@ -97,10 +94,10 @@ array([[0.49675483, 0.37832119, 0.59431287, ..., 0.16990737, 0.5396537 ,
        [0.16567224, 0.1347261 , 0.48809697, ..., 0.55021249, 0.42324589,
         0.31440635]])
 
->>> client['long_table']
+>>> client['tables/long_table']
 <DataFrameClient ['A', 'B', 'C']>
 
->>> client['long_table'].read()
+>>> client['tables/long_table'].read()
 A B C
 index
 0 0.246920 0.493840 0.740759
@@ -117,7 +114,7 @@ index
 
 [100000 rows x 3 columns]
 
->>> client['long_table'].read(['A', 'B'])
+>>> client['tables/long_table'].read(['A', 'B'])
 A B
 index
 0 0.246920 0.493840
@@ -139,19 +136,19 @@ data in whole or in efficiently-chunked parts in the format of your choice:
 
 ```
 # Download tabular data as CSV
-http://localhost:8000/api/v1/table/full/long_table?format=csv
+http://localhost:8000/api/v1/table/full/tables/long_table?format=csv
 
 # or XLSX (Excel)
-http://localhost:8000/api/v1/table/full/long_table?format=xslx
+http://localhost:8000/api/v1/table/full/tables/long_table?format=xslx
 
 # and subselect columns.
-http://localhost:8000/api/v1/table/full/long_table?format=xslx&field=A&field=B
+http://localhost:8000/api/v1/table/full/tables/long_table?format=xslx&field=A&field=B
 
 # View or download (2D) array data as PNG
-http://localhost:8000/api/v1/array/full/medium_image?format=png
+http://localhost:8000/api/v1/array/full/nested/images/medium_image?format=png
 
 # and slice regions of interest.
-http://localhost:8000/api/v1/array/full/medium_image?format=png&slice=:50,100:200
+http://localhost:8000/api/v1/array/full/nested/images/medium_image?format=png&slice=:50,100:200
 ```
 
 Web-based data access usually involves downloading complete files, in the
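
The URLs changed in the README hunk above share one shape: an endpoint prefix, the node's path inside the tree (now including the `tables/` or `nested/images/` container), and query parameters. A minimal sketch of assembling them with the standard library follows. It assumes only the prefixes and the `format`, `field`, and `slice` parameters that appear in the README; the helper name `export_url` is hypothetical and not part of Tiled.

```python
from urllib.parse import quote, urlencode

BASE = "http://localhost:8000/api/v1"  # demo server address used in the README


def export_url(kind, path, **params):
    """Build a download URL; hypothetical helper, not a Tiled API.

    kind   -- "table" or "array", as in the README examples
    path   -- slash-separated location of the node, e.g. "tables/long_table"
    params -- query parameters; a list value repeats the key (field=A&field=B)
    """
    # Keep ":" and "," unescaped so slice expressions stay readable.
    query = urlencode(params, doseq=True, safe=":,")
    url = f"{BASE}/{kind}/full/{quote(path)}"
    return f"{url}?{query}" if query else url


print(export_url("table", "tables/long_table", format="csv"))
print(export_url("table", "tables/long_table", format="csv", field=["A", "B"]))
print(export_url("array", "nested/images/medium_image", format="png", slice=":50,100:200"))
```

Only the `path` argument differs between the old and new demo trees, which is why every URL in the hunk changes in the same place.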

docs/source/index.md

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ tutorials/search
 tutorials/writing
 tutorials/simple-server
 tutorials/plotly-integration
+tutorials/zarr-integration
 ```
 
 ```{toctree}

docs/source/tutorials/export.md

Lines changed: 18 additions & 15 deletions
@@ -15,54 +15,57 @@ Now, in a Python interpreter, connect, with the Python client.
 from tiled.client import from_uri
 
 client = from_uri("http://localhost:8000")
+
+tables = client["tables"] # Container of demo tables
+images = client["nested/images"] # Container of demo images
 ```
 
 The Tiled server can encode its structures in various formats.
 These are just a couple of the supported formats:
 
 ```python
 # Table
-client["short_table"].export("table.xlsx") # Excel
-client["short_table"].export("table.csv") # CSV
+tables["short_table"].export("table.xlsx") # Excel
+tables["short_table"].export("table.csv") # CSV
 
 # Array
-client["medium_image"].export("numbers.csv") # CSV
-client["medium_image"].export("image.png") # PNG image
-client["medium_image"].export("image.tiff") # TIFF image
+images["medium_image"].export("numbers.csv") # CSV
+images["medium_image"].export("image.png") # PNG image
+images["medium_image"].export("image.tiff") # TIFF image
 ```
 
 It's possible to select a subset of the data to only "pay" for what you need.
 
 ```python
 # Export just some of the columns...
-client["short_table"].export("table.csv", columns=["A", "B"])
+tables["short_table"].export("table.csv", columns=["A", "B"])
 
 # Export an N-dimensional slice...
-client["medium_image"].export("numbers.csv", slice=[0]) # like arr[0]
+images["medium_image"].export("numbers.csv", slice=[0]) # like arr[0]
 import numpy
-client["medium_image"].export("numbers.csv", slice=numpy.s_[:10, 100:200]) # like arr[:10, 100:200]
+images["medium_image"].export("numbers.csv", slice=numpy.s_[:10, 100:200]) # like arr[:10, 100:200]
 ```
 
 In the examples above, the desired format is automatically detected from the
 file extension (`table.csv` -> `csv`). It can also be specified explicitly.
 
 ```python
 # Format inferred from filename...
-client["short_table"].export("table.csv")
+tables["short_table"].export("table.csv")
 
 # Format given as a file extension...
-client["short_table"].export("table.csv", format="csv")
+tables["short_table"].export("table.csv", format="csv")
 
 # Format given as a media type (MIME)...
-client["short_table"].export("table.csv", format="text/csv")
+tables["short_table"].export("table.csv", format="text/csv")
 ```
 
 ## Supported Formats
 
 To list the supported formats for a given structure:
 
 ```py
-client["short_table"].formats
+tables["short_table"].formats
 ```
 
 **It is easy to add formats and customize the details of how they are exported,
@@ -116,13 +119,13 @@ buffer) in which case the format must be specified.
 ```python
 # Writing directly to an open file
 with open("table.csv", "wb") as file:
-    client["short_table"].export(file, format="csv")
+    tables["short_table"].export(file, format="csv")
 
 # Writing to a buffer
 from io import BytesIO
 
 buffer = BytesIO()
-client["short_table"].export(buffer, format="csv")
+tables["short_table"].export(buffer, format="csv")
 ```
 
 ## Limitations
@@ -136,7 +139,7 @@ data you want, not on formatting it "just so". To do more refined export, use
 standard Python tools, as in:
 
 ```python
-df = client["short_table"].read()
+df = tables["short_table"].read()
 # At this point we are done with Tiled. From here, we just use pandas,
 # or whatever we want.
 df.to_csv("table.csv", sep=";", header=["custom", "column", "headings"])
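
The tutorial text touched by this diff describes three ways a format gets chosen: inferred from the file extension, given as an extension, or given as a media type, with an explicit format required when writing to an open file or buffer. The sketch below restates that resolution order in plain Python. It is an illustration under those stated rules, not Tiled's implementation, and `resolve_format` is a hypothetical name.

```python
from io import BytesIO
from pathlib import Path


def resolve_format(filepath, format=None):
    """Pick an export format: an explicit `format` wins, else the file extension.

    Hypothetical helper for illustration; not Tiled's implementation.
    """
    if format is not None:
        return format  # e.g. "csv" or a media type such as "text/csv"
    if not isinstance(filepath, (str, Path)):
        # An open file or buffer has no file name to inspect.
        raise ValueError("format must be specified when writing to a buffer")
    suffix = Path(filepath).suffix  # "table.csv" -> ".csv"
    if not suffix:
        raise ValueError(f"cannot infer format from {filepath!r}")
    return suffix.lstrip(".")


print(resolve_format("table.csv"))
print(resolve_format("table.csv", format="text/csv"))
print(resolve_format(BytesIO(), format="csv"))
```

Calling `resolve_format(BytesIO())` without a format raises `ValueError`, mirroring the tutorial's statement that the format must be specified in that case.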

docs/source/tutorials/navigation.md

Lines changed: 79 additions & 46 deletions
@@ -25,20 +25,27 @@ Tiled provides a utility for visualizing a nested structure.
 ```python
 >>> from tiled.utils import tree
 >>> tree(client)
-├── big_image
-├── small_image
-├── medium_image
-├── sparse_image
-├── awkward_array
-├── tiny_image
-├── tiny_cube
-├── tiny_hypercube
-├── short_table
-├── long_table
-├── wide_table
-├── structured_data
-│   ├── pets
-│   └── xarray_dataset
+├── scalars
+│   ├── pi
+│   ├── e_arr
+│   ├── fsc
+│   └── fortytwo
+├── nested
+│   ├── images
+│   │   ├── tiny_image
+│   │   ├── small_image
+│   │   ├── medium_image
+│   │   └── big_image
+│   ├── cubes
+│   │   ├── tiny_cube
+│   │   └── tiny_hypercube
+│   ├── complex
+│   ├── sparse_image
+│   └── awkward_array
+├── tables
+│   ├── short_table
+│   ├── long_table
+<Output truncated at 20 lines. Adjust tree's max_lines parameter to see more.>
 ```
 
 Each (sub)tree displays the names of a couple of its entries---up to
@@ -47,41 +54,55 @@ however many fit on one line.
 
 ```python
 >>> client
-<Container {'big_image', 'small_image', 'medium_image', ...} ~16 entries>
+<Container {'scalars', 'nested', 'tables', 'structured_data', ...} ~8 entries>
 ```
 
 Containers act like (nested) mappings in Python. All the (read-only) methods
-that work on Python dictionaries work on Containers. We can lookup a specific
-value by its key
+that work on Python dictionaries work on Containers. We can
+
+* lookup a specific value by its key
 
 ```python
 >>> client['structured_data']
 <Container {'pets', 'xarray_dataset'}>
 ```
 
-list all the keys
+* easily access nested hierarchies
+
+```python
+>>> client['nested']['images']['tiny_image']
+<ArrayClient shape=(50, 50) chunks=((50,), (50,)) dtype=float64>
+```
+
+* or using a simplified syntax
+
+```python
+>>> client['nested', 'images', 'tiny_image']
+<ArrayClient shape=(50, 50) chunks=((50,), (50,)) dtype=float64>
+```
+
+* or even
+
+```python
+>>> client['nested/images/tiny_image']
+<ArrayClient shape=(50, 50) chunks=((50,), (50,)) dtype=float64>
+```
+
+* list all the keys
 
 ```python
 >>> list(client)
-['big_image',
- 'small_image',
- 'medium_image',
- 'sparse_image',
- 'awkward_array',
- 'tiny_image',
- 'tiny_cube',
- 'tiny_hypercube',
- 'short_table',
- 'long_table',
- 'wide_table',
+['scalars',
+ 'nested',
+ 'tables',
  'structured_data',
  'flat_array',
  'low_entropy',
  'high_entropy',
  'dynamic']
 ```
 
-and loop over keys, values, or ``(key, value)`` pairs.
+* and loop over keys, values, or ``(key, value)`` pairs
 
 ```python
 for key in client:
@@ -104,37 +125,49 @@ need to start from the middle.
 
 ```python
 >>> client.keys().first() # Access the first key.
-'big_image'
+'scalars'
 
 >>> client.keys().head() # Access the first several keys.
-['big_image',
- 'small_image',
- 'medium_image',
- 'sparse_image',
- 'awkward_array']
+['scalars',
+ 'nested',
+ 'tables',
+ 'structured_data',
+ 'flat_array']
 
 >>> client.keys().head(3) # Access the first N keys.
-['big_image',
- 'small_image',
- 'medium_image']
+['scalars',
+ 'nested',
+ 'tables']
 
 >>> client.keys()[1:3] # Access just the keys for entries 1:3.
-['small_image', 'medium_image']
+['nested', 'tables']
 ```
 
-All the same methods work for values
+All the same methods work for values, which return string representations of the
+container contents:
 
 ```python
 >>> client.values()[1:3] # Access the values (which may be more expensive).
-[<ArrayClient shape=(300, 300) chunks=((300,), (300,)) dtype=float64>, <ArrayClient shape=(1000, 1000) chunks=((1000,), (1000,)) dtype=float64>]
+[<Container {'images', 'cubes', 'complex', 'sparse_image', ...} ~5 entries>,
+ <Container {'short_table', 'long_table', 'wide_table'}>]
+```
+
+or
+
+```python
+>>> client['nested/images'].values()[:2] # Access the values of a nested container
+[<ArrayClient shape=(50, 50) chunks=((50,), (50,)) dtype=float64>,
+ <ArrayClient shape=(300, 300) chunks=((300,), (300,)) dtype=float64>]
 ```
 
 and `(key, value)` pairs ("items").
 
 ```python
->>> client.items()[1:3] # Access (key, value) pairs.
-[('small_image', <ArrayClient shape=(300, 300) chunks=((300,), (300,)) dtype=float64>),
- ('medium_image', <ArrayClient shape=(1000, 1000) chunks=((1000,), (1000,)) dtype=float64>)]
+>>> client['nested/images'].items()[:2] # Access (key, value) pairs.
+[('tiny_image',
+  <ArrayClient shape=(50, 50) chunks=((50,), (50,)) dtype=float64>),
+ ('small_image',
+  <ArrayClient shape=(300, 300) chunks=((300,), (300,)) dtype=float64>)]
 ```
 
 Each item has ``metadata``, which is a simple dict.
@@ -145,7 +178,7 @@ space to use or not.
 >>> client.metadata # happens to be empty
 DictView({})
 
->>> client['tables/short_table'].metadata # happens to have some stuff
+>>> client['tables/short_table'].metadata # happens to have some stuff
 DictView({'animal': 'dog', 'color': 'red'})
 ```
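
The navigation tutorial in this diff shows three equivalent spellings for reaching a nested entry: chained lookups, a tuple of keys, and a single slash-separated string. The toy mapping below is a sketch of how such equivalence can be modeled with a plain `dict` subclass. It is an assumption-laden illustration, not the Tiled client, and the class name `Node` is hypothetical.

```python
class Node(dict):
    """A dict whose keys may be given as 'a/b/c', ('a', 'b', 'c'), or chained.

    Hypothetical stand-in for illustration; not the Tiled Container client.
    """

    def __getitem__(self, key):
        # Normalize every accepted spelling into a list of path segments.
        segments = key.split("/") if isinstance(key, str) else list(key)
        node = self
        for segment in segments:
            # Descend one level per segment using the plain dict lookup.
            node = dict.__getitem__(node, segment)
        return node


demo = Node(nested=Node(images=Node(tiny_image="<array 50x50>")))

chained = demo["nested"]["images"]["tiny_image"]
by_tuple = demo["nested", "images", "tiny_image"]
by_path = demo["nested/images/tiny_image"]
print(chained == by_tuple == by_path)
```

A real client would likely resolve the whole path in one request rather than one lookup per level, which is one practical reason to prefer the slash-separated form for deep trees.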
