We proxy our Azure Blob Storage using flexify.io, having migrated away from MinIO gateway. This allows us to provide an S3-compatible interface for Azure Blob storage.
When Trino reads a Parquet file, it looks for the PAR1 magic number in the file footer. To fetch the footer, it makes a GET request with a partial range to the storage endpoint (some headers have been removed):
GET /bucket/01971b9f-7b05-73a2-8ebd-af8c31907960/01971ba0-58d8-7da0-ad29-f089ae4994dc/00000-0-b1313c67-9270-422f-9e37-620e1d9803ee.parquet HTTP/1.1
...
Range: bytes=-49152
...
i.e. the last 49152 bytes are requested. Azure does not honour this range: it returns the entire file (10050405 bytes) with a 200 response instead of a 206 Partial Content, and the read then fails with an error. Given Trino is requesting the last N bytes, I'm assuming it looks at specific byte offsets in the response for the magic number but can't find it there; otherwise it would find the magic bytes at the end of the file as normal.
This bytes=-N suffix-range format is not supported by Azure for some reason, although it works against AWS S3 storage directly.
Is there anything that can be done to get Trino to request an explicit range? Trino makes a HEAD request just prior to the GET, so the Content-Length is known by the time the GET is made; it should therefore be possible to request e.g. bytes=(length-N)-(length-1) instead of bytes=-N to get the last N bytes.
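For illustration, here's a minimal sketch of that approach using plain java.net.http (this is not Trino code; the endpoint and object name are made up): a HEAD to learn the object length, then a GET with an explicit range for the tail.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ExplicitTailRead
{
    public static void main(String[] args) throws Exception
    {
        // Hypothetical object URL, standing in for the proxied Azure blob
        URI object = URI.create("https://storage.example.com/bucket/data/file.parquet");
        int tailLength = 49152;

        HttpClient client = HttpClient.newHttpClient();

        // HEAD first: the object size comes back in Content-Length
        HttpRequest head = HttpRequest.newBuilder(object)
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
        long length = client.send(head, HttpResponse.BodyHandlers.discarding())
                .headers()
                .firstValueAsLong("Content-Length")
                .orElseThrow();

        // Explicit range for the last N bytes; ranges are inclusive, so the
        // tail spans (length - N) through (length - 1)
        long start = Math.max(0, length - tailLength);
        HttpRequest get = HttpRequest.newBuilder(object)
                .header("Range", "bytes=" + start + "-" + (length - 1))
                .build();
        byte[] tail = client.send(get, HttpResponse.BodyHandlers.ofByteArray()).body();

        System.out.printf("Fetched %d tail bytes of %d total%n", tail.length, length);
    }
}
```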
Currently we're resorting to using the legacy hive.s3 filesystem (doc), which sends a Range: bytes=0-9223372036854775806 header and works. However, that filesystem is deprecated and will be removed in a future release, so we don't want to rely on it.
The Azure client is doing what Azure supports in io.trino.filesystem.azure.AzureInput#readTail, and the S3 client is doing what S3 supports in io.trino.filesystem.s3.S3Input#readTail.
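For context, here is a rough sketch of what the suffix-range style looks like with the AWS SDK v2. This is not Trino's actual readTail, just an illustration of the S3-native form the issue is about: a single GET with bytes=-N, no prior length lookup needed.

```java
import software.amazon.awssdk.core.ResponseBytes;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;

class SuffixRangeTail
{
    // Read the last n bytes of an object with a single suffix-range GET,
    // the form S3 supports natively
    static byte[] readTail(S3Client s3, String bucket, String key, int n)
    {
        GetObjectRequest request = GetObjectRequest.builder()
                .bucket(bucket)
                .key(key)
                .range("bytes=-" + n)
                .build();
        ResponseBytes<GetObjectResponse> bytes = s3.getObjectAsBytes(request);
        return bytes.asByteArray();
    }
}
```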
It's possible to change the S3 client to do whatever is the lowest common denominator between S3 and Azure, but that feels brittle as we wouldn't be able to test any future changes against this scenario. Our S3 tests run only against S3 compatible implementations.
I think it is justified for the S3 client to expect an S3 compatible interface and not have to change its behaviour to adjust for the backend not supporting some aspects of S3 interface. I would expect the proxy layer that you're using to adapt to the mismatch in interfaces.
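Concretely, that adaptation could be as simple as the proxy rewriting an incoming S3-style suffix range into the explicit form Azure accepts, using the blob length it can obtain from blob properties. A rough sketch of the idea (hypothetical, not part of flexify.io or Trino):

```java
public class SuffixRangeRewriter
{
    /**
     * Converts an S3-style suffix range ("bytes=-N") into the explicit form
     * "bytes=(length-N)-(length-1)". Any other Range value passes through unchanged.
     */
    static String rewrite(String rangeHeader, long blobLength)
    {
        if (rangeHeader == null || !rangeHeader.startsWith("bytes=-")) {
            return rangeHeader;
        }
        long suffixLength = Long.parseLong(rangeHeader.substring("bytes=-".length()));
        long start = Math.max(0, blobLength - suffixLength);
        return "bytes=" + start + "-" + (blobLength - 1);
    }

    public static void main(String[] args)
    {
        // For a 10050405-byte blob, "bytes=-49152" becomes "bytes=10001253-10050404"
        System.out.println(rewrite("bytes=-49152", 10050405L));
    }
}
```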