
Using seek_points() to obtain valid decompression ranges. #114

Open
forrestfwilliams opened this issue Jan 27, 2023 · 28 comments

@forrestfwilliams

I would like to use indexed_gzip to download only the relevant portions of a .gz file via a ranged GET request. I understand that you cannot use indexed_gzip to download and decompress from an arbitrary point (see #112). However I am hoping that it is possible to use the index generated by indexed_gzip and made accessible via the seek_points method to download and decompress a small portion of the larger file that contains the data I'm interested in. This is what I have so far:

import zlib

import indexed_gzip as igzip
import numpy as np


def get_data(gz_path: str, index1: int, index2: int) -> bytes:
    with igzip.IndexedGzipFile(gz_path) as f:
        f.build_full_index()
        seek_points = list(f.seek_points())

    array = np.array(seek_points)
    start = array[index1, 1]
    stop = array[index2, 1]

    # --- stand-in for a ranged GET request --- #
    with open(gz_path, 'rb') as f:
        f.seek(start)
        compressed = f.read(stop - start)
    # ----------------------------------------- #

    decompressed = zlib.decompressobj(-zlib.MAX_WBITS).decompress(compressed)
    return decompressed

This function performs as expected when passing the arguments get_data(file_path, 0, 1), but when not starting from the first index location (e.g., get_data(file_path, 1, 2)) the function fails in the zlib decompression step with the message: zlib.error: Error -3 while decompressing data: invalid block type.

I'm guessing that the root of this issue is that I do not fully understand how zlib decompression works and what the required data formatting is. If you have any suggestion on how to modify this function to achieve my goal, I'd appreciate it!
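[Editor's note] A related failure can be reproduced with plain zlib, no indexed_gzip involved. This sketch creates a byte-aligned block boundary with Z_SYNC_FLUSH and then tries to inflate only the second half; it is illustrative rather than an exact reproduction of the error above, where the chosen offset may not even fall on a block boundary:

```python
import zlib

# Build one raw deflate stream with a byte-aligned block boundary in the
# middle, then attempt to inflate only the second half with no history.
data = b"The quick brown fox jumps over the lazy dog. " * 500
co = zlib.compressobj(wbits=-zlib.MAX_WBITS)
part1 = co.compress(data) + co.flush(zlib.Z_SYNC_FLUSH)  # boundary here
part2 = co.compress(data) + co.flush(zlib.Z_FINISH)

try:
    zlib.decompressobj(wbits=-zlib.MAX_WBITS).decompress(part2)
except zlib.error as exc:
    # fails: part2's back-references reach into part1's output
    print("decompression failed:", exc)
```

Starting mid-stream therefore fails even at a genuine boundary, because deflate back-references can reach up to 32 KiB into earlier output.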

@forrestfwilliams (Author)

@martindurant you may also be interested in this.

@martindurant

I think the point is that for gzip you must also have the compressor state at the seek point, which is why indexed_gzip stores the 32kB window along with every point.
However, if you open the file like

with fsspec.open("http://...") as f:

then the f file-like object can be used directly with indexed_gzip. You can export the index data to a file, and load it again with that file later to get the random access range requests you are after. If you have a .gz in mind, I can write down the whole workflow.

@forrestfwilliams (Author)

forrestfwilliams commented Jan 30, 2023

Ah, OK. Here is a link to a sample gzipped image in S3. Do you think it would be possible to obtain the compressor state on the fly, say by always downloading the 32KB of data directly preceding the seek_point and initializing indexed_gzip with it? My goal here is to reduce the amount of index information we need to store as much as possible.

In my specific use case, the uncompressed byte ranges I want to obtain data for are known ahead of time. In this case I'm hoping that I don't need to store full index, just the index information I need to get that byte range.

@martindurant

I have the same hope as you: that we can pick and choose which seek points get saved, immediately before points of interest, to keep the index file size down. I don't know if it's possible to simply read 32kB before the seek point and set the state like that, or if the 32kB saved in the index is somehow different. I suspect the latter, else why save it?

To demonstrate that indexing does work for the remote file right now:

import indexed_gzip as igzip
import fsspec

h = fsspec.filesystem("https")
u = "https://ffwilliams2-shenanigans.s3.us-west-2.amazonaws.com/bursts/s1a-iw2-slc-vv-20200604t022253-20200604t022318-032861-03ce65-005.tiff.gz"
f = h.open(u)
i = igzip.IndexedGzipFile(f)
i.build_full_index()  # reads whole file, I think, or at least chunks all the way through
i.export_index("tiff.iindex")

fsspec.utils.setup_logging(logger_name="fsspec.http")
f = h.open(u)
2023-01-30 10:21:01,665 - fsspec.http - DEBUG - _file_info -- Retrieve file size for https://ffwilliams2-shenanigans.s3.us-west-2.amazonaws.com/bursts/s1a-iw2-slc-vv-20200604t022253-20200604t022318-032861-03ce65-005.tiff.gz

i2 = igzip.IndexedGzipFile(f)
i2.import_index("tiff.iindex")
i2.seek(100_000_000)
i2.read(20)
2023-01-30 10:22:25,659 - fsspec.http - DEBUG - async_fetch_range -- Fetch range for <File-like object HTTPFileSystem, https://ffwilliams2-shenanigans.s3.us-west-2.amazonaws.com/bursts/s1a-iw2-slc-vv-20200604t022253-20200604t022318-032861-03ce65-005.tiff.gz>: 62814387-68057268
2023-01-30 10:22:25,659 - fsspec.http - DEBUG - async_fetch_range -- https://ffwilliams2-shenanigans.s3.us-west-2.amazonaws.com/bursts/s1a-iw2-slc-vv-20200604t022253-20200604t022318-032861-03ce65-005.tiff.gz : bytes=62814387-68057267
2023-01-30 10:22:27,095 - fsspec.http - DEBUG - async_fetch_range -- Fetch range for <File-like object HTTPFileSystem, https://ffwilliams2-shenanigans.s3.us-west-2.amazonaws.com/bursts/s1a-iw2-slc-vv-20200604t022253-20200604t022318-032861-03ce65-005.tiff.gz>: 68057268-73300148
2023-01-30 10:22:27,095 - fsspec.http - DEBUG - async_fetch_range -- https://ffwilliams2-shenanigans.s3.us-west-2.amazonaws.com/bursts/s1a-iw2-slc-vv-20200604t022253-20200604t022318-032861-03ce65-005.tiff.gz : bytes=68057268-73300147
b'\x00\x14\x00T\x00D\x00;\x00B\x00G\x00K\x007\x00>\x00\xb9'

So in this case it needed two range requests to get any data with the standard readahead buffer size at the default 5MB.

@pauldmccarthy (Owner)

pauldmccarthy commented Jan 30, 2023

Hi @forrestfwilliams @martindurant, this is one of the main usage scenarios - pre-generate an index, and then use that index on subsequent reads to improve access speed. The IndexedGzipFile and zran modules can be used with Python file-likes, although with some limitations.

The build_full_index method does a full pass through the data, creating seek points at intervals of approximately spacing bytes.

> I have the same hope as you: that we can pick and choose which seek points get saved, immediately before points of interest, to keep the index file size down.

This isn't possible at the moment, but should be easy to implement as I suggested in #112

> I don't know if it's possible to simply read 32kB before the seek point and set the state like that, or if the 32kB saved in the index is somehow different. I suspect the latter, else why save it?

Seek points can't be located just anywhere - they need to be located at deflate block boundaries, which are usually somewhat arbitrarily placed throughout a deflate stream (although I have to confess that I'm not familiar with the different ways in which deflate streams are generated).
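[Editor's note] A deflate block header is just three bits (BFINAL and BTYPE), packed LSB-first, which is why block boundaries can land at arbitrary bit offsets. A quick way to peek at the first block's header with plain zlib (a sketch; the `[2:]` slice assumes zlib-wrapped output, which carries a 2-byte header):

```python
import zlib

# Compress something small, strip the 2-byte zlib wrapper, and read the
# three header bits of the first deflate block.
raw = zlib.compress(b"x" * 1000)[2:]
bfinal = raw[0] & 1            # 1 = last block in the stream
btype = (raw[0] >> 1) & 0b11   # 0 = stored, 1 = fixed Huffman, 2 = dynamic Huffman
print("BFINAL:", bfinal, "BTYPE:", btype)
```

Subsequent block headers start wherever the previous block's bit stream happens to end, so locating them in general requires actually inflating the stream.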

@martindurant

> Seek points can't be located just anywhere

Yes, understood, but is it necessary to store the 32kB per point in the index file, or can that in theory be re-read from the file?

@pauldmccarthy (Owner)

> Yes, understood, but is it necessary to store the 32kB per point in the index file, or can that in theory be re-read from the file?

No - we need 32kB of uncompressed data to initialise inflation - this is passed to the zlib inflateSetDictionary function.
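[Editor's note] Python's zlib exposes inflateSetDictionary through the zdict argument of decompressobj, so the mechanism can be sketched without indexed_gzip (illustrative only, not indexed_gzip's actual code):

```python
import zlib

data1 = b"hello indexed world " * 2000    # > 32 KiB of history
data2 = b"goodbye indexed world " * 2000
co = zlib.compressobj(wbits=-zlib.MAX_WBITS)
part1 = co.compress(data1) + co.flush(zlib.Z_SYNC_FLUSH)  # byte-aligned boundary
part2 = co.compress(data2) + co.flush(zlib.Z_FINISH)

# Prime a raw inflater with the preceding 32 KiB of *uncompressed* output,
# which is what a seek point's stored window provides.
window = data1[-32768:]
do = zlib.decompressobj(wbits=-zlib.MAX_WBITS, zdict=window)
out = do.decompress(part2)
assert out == data2
```

For raw deflate streams (negative wbits), the dictionary is applied immediately, so back-references into the window resolve correctly from the very first symbol.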

@martindurant

I thought so, thanks for the clarification. That makes it all the more important to be able to pick seek points as well as possible. This is on my long-term roadmap, along with other block-based compression codecs for kerchunk, but gzip is by far the most common and therefore the most important.

@pauldmccarthy (Owner)

For more background (and for my own interest, as I only ever learned the bare minimum to get this library working), the details on why we need that 32kB are in the DEFLATE RFC section 3.2 - basically, the encoding dictionary used to compress a section of data is dynamically [re-]generated from the previous 32kB:

> As noted above, encoded data blocks in the "deflate" format consist of sequences of symbols drawn from three conceptually distinct alphabets: either literal bytes, from the alphabet of byte values (0..255), or <length, backward distance> pairs, where the length is drawn from (3..258) and the distance is drawn from (1..32,768).

There is also the possibility of coming across deflate streams which use a pre-set dictionary, although I'm assuming that these are not very common.

@pauldmccarthy (Owner)

pauldmccarthy commented Jan 30, 2023

I may be able to dedicate some time to this, and to other required changes (specifically this), at some point in the near future. But I can't make any guarantees I'm afraid - this library is low priority for me at the moment, as it covers my own usage scenarios just fine. I'm happy to consult/suggest/review PRs though 👍

@forrestfwilliams (Author)

Looking at the export_index format here, would it be valid to:

  1. build_full_index for a data file and export_index
  2. Parse this file as binary, remove undesired seek_points (and associated windows) and update header
  3. Save this altered index to a new file
  4. Import the altered index file and use it to read from the data file

I'm not sure if removing undesired seek_points would affect the validity of the index.
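[Editor's note] A sketch of steps 2-3 against a deliberately simplified, hypothetical layout (a point count, fixed-size point records, then one window per point). The real GZIDX header and field order differ, so consult indexed_gzip's format description before adapting this:

```python
import struct

# Illustrative layout only -- NOT the actual GZIDX file format.
# header: npoints (uint32); per point: cmp_offset, uncmp_offset (uint64);
# then one 32 KiB window per point.
WINDOW = 32768
HDR = struct.Struct('<I')
PT = struct.Struct('<QQ')

def subset_index(raw: bytes, keep: list) -> bytes:
    """Keep only the seek points (and their windows) listed in `keep`."""
    n, = HDR.unpack_from(raw, 0)
    pts = [PT.unpack_from(raw, HDR.size + i * PT.size) for i in range(n)]
    wbase = HDR.size + n * PT.size
    wins = [raw[wbase + i * WINDOW:wbase + (i + 1) * WINDOW] for i in range(n)]
    out = [HDR.pack(len(keep))]
    out += [PT.pack(*pts[i]) for i in keep]
    out += [wins[i] for i in keep]
    return b''.join(out)
```

Because each record and window is fixed-size, dropping points is pure slicing plus a header update, which is why the approach should be valid as long as the kept points' offsets are untouched.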

@pauldmccarthy (Owner)

@forrestfwilliams I think that would actually work just fine! Although as I mentioned in #112, I don't think it would be particularly difficult to enhance zran.c to build an index with pre-specified seek points.

@forrestfwilliams (Author)

forrestfwilliams commented Jan 30, 2023

@pauldmccarthy sounds good! I'll keep developing the approach I outlined, since I don't have the C skills to confidently implement the approach you described in #112.

Reading through DEFLATE RFC section 3.2 it appears that there are special cases where you wouldn't need window data because there are no back-references in the DEFLATE block. Are these types of special cases identifiable via indexed_gzip, say by looking at whether the seek points have a data flag value of 1 or 0?

@pauldmccarthy (Owner)

> Reading through DEFLATE RFC section 3.2 it appears that there are special cases where you wouldn't need window data because there are no back-references in the DEFLATE block. Are these types of special cases identifiable via indexed_gzip, say by looking at whether the seek points have a data flag value of 1 or 0?

@forrestfwilliams No, indexed_gzip is not privy to that information as it is not exposed by zlib; the seek point data flag in an exported index is only set to 0 for seek points at the beginning of streams. I would hazard a guess that the dynamic/adaptive format is probably much more common anyway.

@martindurant

It occurs to me that, since the window data we store is uncompressed and is the largest component of index files, index files should be very amenable to compression, at least as well as the original data was :)

> the dynamic/adaptive format is probably much more common

Wikipedia agrees on this. However, "stored" (uncompressed) blocks might be common for some data, like random floats, and I don't suppose those strictly need the uncompressed window either. Obviously, data which has a lot of those should not be compressed in the first place!

@forrestfwilliams (Author)

forrestfwilliams commented Jan 31, 2023

@martindurant I've created some utilities that allow you to directly parse the index files created by export_index and subset them to your desired seek_points. Here are two basic tests for these functions as well. The tests require you to have your own gz and gzidx files, but they should pass once you provide these.

I'm unsure if these utilities belong in indexed_gzip or in some external library. Thoughts?

@martindurant

That would be up to @pauldmccarthy - maybe a PR would be appropriate? I don't think there's any reason to fork the repo unless we need to make substantial changes to the code that @pauldmccarthy is not happy to oversee. Until we are happy with a full production-ready workflow it won't matter anyway; and that, for me, means use in conjunction with kerchunk (which will require more work yet).

@martindurant

I will of course try out your code when I have the time! Since zran explicitly exports/imports from a real local file, it makes sense to do the full scan and then grab the points we actually want. The initial point spacing can be quite small then. _zran_get_point_at does not seem to need the value of the spacing, only the list of points, so I expect it should work out well.

@forrestfwilliams (Author)

forrestfwilliams commented Jan 31, 2023

Sounds good. From my testing it looks like ~64KB (spacing = 2**16) is the smallest spacing achievable. Compared to a ~4MB spacing (spacing = 2**22), it takes ~20% longer to build the index for a 1.2 GB (uncompressed) file. This is a pretty small price to pay since you would only need to pay it once per file.

Index data size is the bigger issue. It balloons from 10MB to 563MB (330 to 17998 seek points) for this test case, so you definitely want to subset to a relevant set of points.
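[Editor's note] These figures are consistent with a back-of-envelope estimate, since index size is dominated by one 32 KiB window per seek point:

```python
# Rough estimate: index size ~= (file size / spacing) windows of 32 KiB each.
uncompressed_size = 1.2e9   # the 1.2 GB file above
window_size = 32768

for spacing in (2**22, 2**16):
    npoints = uncompressed_size / spacing
    size_mb = npoints * window_size / 1e6
    print(f"spacing 2**{spacing.bit_length() - 1}: "
          f"~{npoints:.0f} points, ~{size_mb:.0f} MB of windows")
```

This gives roughly 290 points / 9 MB at 4 MiB spacing and roughly 18,000 points / 600 MB at 64 KiB spacing, close to the 330 / 10 MB and 17,998 / 563 MB reported (per-point metadata and stream compressibility account for the remainder).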

@martindurant

Somewhere in the C code I thought I saw a check that the window size (32kB) should be smaller than the point spacing. In any case, my target is more like ~20-100MB buffers in >GB files (or much bigger in the tar.gz case), so 1MB spacing would be plenty good enough. As I said, the index file ought to be compressible too.

Before going too far down the road, we should also consider how to store the indexes of the multiple files in a tar.gz (one index, multiple files) or ZIP (index per member file).

I am wondering if the binary data might not store nicely in a zarr structure: record arrays for the point details and fixed-width bytes for the windows. Then you can chunk your data and apply one of several good compressors, and only read the index pieces that you need to access some part of the target file. Just a thought...
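[Editor's note] A minimal sketch of that layout idea using numpy (field names here are illustrative; a zarr store would hold these two arrays, with per-window chunking so a reader fetches only the windows it needs):

```python
import numpy as np

# Seek-point metadata as a record array...
point_dtype = np.dtype([('cmp_offset', '<u8'),
                        ('uncmp_offset', '<u8'),
                        ('bit_offset', 'u1')])
points = np.zeros(3, dtype=point_dtype)
points['cmp_offset'] = [0, 1_000_000, 2_000_000]

# ...and the windows as a fixed-width bytes array, one 32 KiB row per point.
windows = np.zeros((3, 32768), dtype='u1')
# e.g. zarr.array(windows, chunks=(1, 32768)) would then store one
# compressed chunk per window, readable independently.
```

Chunking at one window per chunk means random access to the target file costs one small metadata read plus exactly one window chunk.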

@forrestfwilliams (Author)

Hmm... I think one index per tar.gz and one index per zip member makes sense. Yeah zarr would be a good format for this type of data. How do you envision kerchunk interacting with this data?

@martindurant

cc @milesgranger thought you would find this interesting

@mxmlnkn (Contributor)

mxmlnkn commented Jun 27, 2023

Hi, I'm chiming in because I want to implement many of the stated index improvement ideas in mxmlnkn/rapidgzip#17.

> Reading through DEFLATE RFC section 3.2 it appears that there are special cases where you wouldn't need window data because there are no back-references in the DEFLATE block. Are these types of special cases identifiable via indexed_gzip, say by looking at whether the seek points have a data flag value of 1 or 0?

As rapidgzip is completely written from scratch and comes with its own inflate implementation, it is possible to implement this. I have already tried out the simpler idea of checking for the farthest back-reference in order to simply truncate windows. But checking each symbol in the window for actual usage should save even more space. And the idea would then be to replace all unused window bytes with zero and then compress the windows with deflate.

While masking window values would not require a new index file format, compressing each window would. I would be glad for any input or requests regarding the specification of such a format. I doubt that I could simply increment the version of the zran index, because I would have to add a "compressed window size" member, and probably also an uncompressed window size member for good measure. With compression, the window data would no longer be at offset N*W. But maybe there is no requirement for backward compatibility and I can change the index format completely in version 3?
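[Editor's note] The expected saving from zeroing unused window bytes can be sketched with plain zlib (the "every 5th byte used" pattern below is an arbitrary stand-in for a real usage mask):

```python
import os
import zlib

# A worst-case 32 KiB window of incompressible bytes...
window = bytearray(os.urandom(32768))

# ...with ~80% of its bytes zeroed, as if only ~20% of symbols were ever
# referenced by later back-references.
for i in range(len(window)):
    if i % 5 != 0:
        window[i] = 0

packed = zlib.compress(bytes(window), 9)
print(f"{len(packed)} bytes instead of {len(window)}")
```

The runs of zeros compress away almost entirely, so the stored size approaches the size of the genuinely-used bytes.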

@martindurant

I cannot speak for @pauldmccarthy on how widely adopted this code has been so far. Obviously, the more production use out there, the more you would want to care about maintaining compatibility.

However, from my point of view, kerchunk has not yet started work on generalised gzip/zip byte ranges. I am planning it for later this summer, when I get the time. Since we will typically be working with remote files and need to store the indexes elsewhere or inline with references, anything that can be done to reduce the amount of stored data will help greatly. Plus, we would not be particularly constrained by the index file format, but would be setting the index data directly at run time (this is the Python wrapper, after all!).

@mxmlnkn (Contributor)

mxmlnkn commented Jun 27, 2023

So if I understand this correctly, for your use case, it would already suffice to simply compress the whole index file (with gzip or zstd etc.) because you can decompress it at run time before importing it? That wouldn't even require any support from indexed_gzip. I think there are two downsides to importing the whole index uncompressed, though:

  1. Main memory usage is higher and might even be limiting for very large files (1+ TB uncompressed like this one)
  2. If you want to reduce data transfers for accessing only a small subrange, then you still have to load the whole index even though it might have sufficed to load the offsets and then only load that one window that is actually needed to start decompression from the desired offset.

The second point ties in closely with this feature idea: mxmlnkn/rapidgzip#10 Would that help your use case or am I misunderstanding?

@martindurant

The index points could be stored in a number of ways. For example, kerchunk now supports parquet storage, allowing the reference data to be partitioned into a fixed number of references per file. This was optimised for datasets with a very large number of references, rather than for big data blocks per reference point.

But no, mmap would not work for us, since the references are not normally stored on the local filesystem.

@pauldmccarthy (Owner)

pauldmccarthy commented Jun 29, 2023

Hi @mxmlnkn @martindurant, there are no promises of forward compatibility w.r.t. the index file format - in fact, there is a guard in the code which will cause it to abort if given a file newer than what it is able to support. So I'm personally not opposed to the index file format being changed. Unfortunately I don't have much time for this project these days, but I'm more than happy to advise and review PRs 👍

@mxmlnkn (Contributor)

mxmlnkn commented Jun 29, 2023

Thanks for your input. I also did a quick check that implements the "count the actually used window symbols" idea. Assuming there are no errors in the code, the results look very promising with respect to memory savings. Results for Silesia:

Used window symbols: 6276 (19.1528 %)
Used window symbols: 6455 (19.6991 %)
Used window symbols: 6789 (20.7184 %)
Used window symbols: 5923 (18.0756 %)
Used window symbols: 6615 (20.1874 %)
Used window symbols: 7015 (21.4081 %)
Used window symbols: 7760 (23.6816 %)
Used window symbols: 6687 (20.4071 %)
Used window symbols: 7089 (21.6339 %)
Used window symbols: 7103 (21.6766 %)
Used window symbols: 7356 (22.4487 %)
Used window symbols: 6723 (20.517 %)
Used window symbols: 7307 (22.2992 %)
Used window symbols: 7674 (23.4192 %)
Used window symbols: 7648 (23.3398 %)
Used window symbols: 6785 (20.7062 %)
Used window symbols: 6139 (18.7347 %)
Used window symbols: 6956 (21.228 %)
Used window symbols: 7293 (22.2565 %)
Used window symbols: 7395 (22.5677 %)
Used window symbols: 7065 (21.5607 %)
Used window symbols: 6860 (20.9351 %)
Used window symbols: 7605 (23.2086 %)
Used window symbols: 6938 (21.1731 %)
Used window symbols: 6488 (19.7998 %)
Used window symbols: 7816 (23.8525 %)
Used window symbols: 6816 (20.8008 %)
Used window symbols: 7083 (21.6156 %)
Used window symbols: 7354 (22.4426 %)
Used window symbols: 6823 (20.8221 %)
Used window symbols: 7016 (21.4111 %)
[...]
Used window symbols: 1759 (5.36804 %)
Used window symbols: 3749 (11.441 %)
Used window symbols: 465 (1.41907 %)
Used window symbols: 400 (1.2207 %)
Used window symbols: 1995 (6.08826 %)
Used window symbols: 2444 (7.4585 %)
Used window symbols: 1544 (4.71191 %)
Used window symbols: 2535 (7.73621 %)
Used window symbols: 4404 (13.4399 %)
Used window symbols: 551 (1.68152 %)
Used window symbols: 1533 (4.67834 %)
Used window symbols: 1031 (3.14636 %)
Used window symbols: 3219 (9.82361 %)
Used window symbols: 4365 (13.3209 %)
Used window symbols: 2529 (7.7179 %)
Used window symbols: 2144 (6.54297 %)
Used window symbols: 3118 (9.51538 %)
Used window symbols: 875 (2.67029 %)
Used window symbols: 656 (2.00195 %)
Used window symbols: 493 (1.50452 %)
Used window symbols: 3505 (10.6964 %)
Used window symbols: 8755 (26.7181 %)
[...]
Used window symbols: 4215 (12.8632 %)
Used window symbols: 4200 (12.8174 %)
Used window symbols: 4262 (13.0066 %)
Used window symbols: 4165 (12.7106 %)
Used window symbols: 4132 (12.6099 %)
Used window symbols: 4226 (12.8967 %)
Used window symbols: 4246 (12.9578 %)
Used window symbols: 4241 (12.9425 %)
Used window symbols: 4057 (12.381 %)
Used window symbols: 3881 (11.8439 %)
Used window symbols: 4084 (12.4634 %)
Used window symbols: 4151 (12.6678 %)
Used window symbols: 4213 (12.8571 %)
Used window symbols: 4208 (12.8418 %)
Used window symbols: 4127 (12.5946 %)
Used window symbols: 4236 (12.9272 %)
Used window symbols: 4102 (12.5183 %)
Used window symbols: 3987 (12.1674 %)
Used window symbols: 4158 (12.6892 %)
Used window symbols: 4194 (12.7991 %)
Used window symbols: 4062 (12.3962 %)
Used window symbols: 3834 (11.7004 %)
Used window symbols: 4305 (13.1378 %)
Used window symbols: 4341 (13.2477 %)
Used window symbols: 4213 (12.8571 %)
Used window symbols: 4315 (13.1683 %)
Used window symbols: 4026 (12.2864 %)
Used window symbols: 4321 (13.1866 %)
Used window symbols: 4275 (13.0463 %)
Used window symbols: 3517 (10.733 %)
Used window symbols: 4150 (12.6648 %)
Used window symbols: 4207 (12.8387 %)
Used window symbols: 4340 (13.2446 %)
Used window symbols: 4372 (13.3423 %)
Used window symbols: 4124 (12.5854 %)
Used window symbols: 4306 (13.1409 %)
Used window symbols: 4265 (13.0157 %)
Used window symbols: 3392 (10.3516 %)
Used window symbols: 4101 (12.5153 %)
Used window symbols: 4218 (12.8723 %)
Used window symbols: 4077 (12.442 %)
Used window symbols: 3469 (10.5865 %)
Validated CRC32 0x1bb20219 for gzip stream!

wikidata.json.gz

Used window symbols: 3563 (10.8734 %)
Used window symbols: 6877 (20.9869 %)
Used window symbols: 6368 (19.4336 %)
Used window symbols: 3953 (12.0636 %)
Used window symbols: 4578 (13.9709 %)
Used window symbols: 5168 (15.7715 %)
Used window symbols: 3739 (11.4105 %)
Used window symbols: 4666 (14.2395 %)
Used window symbols: 3550 (10.8337 %)
Used window symbols: 4737 (14.4562 %)
[...]
Used window symbols: 6895 (21.0419 %)
Used window symbols: 6308 (19.2505 %)
Used window symbols: 4892 (14.9292 %)
Used window symbols: 3910 (11.9324 %)
Used window symbols: 4179 (12.7533 %)
Used window symbols: 4240 (12.9395 %)
Used window symbols: 6461 (19.7174 %)
Used window symbols: 5772 (17.6147 %)
Used window symbols: 6434 (19.635 %)
Used window symbols: 6706 (20.4651 %)
Used window symbols: 5852 (17.8589 %)
Used window symbols: 7716 (23.5474 %)
Used window symbols: 4012 (12.2437 %)
Used window symbols: 5446 (16.6199 %)
Used window symbols: 7760 (23.6816 %)
Used window symbols: 3215 (9.8114 %)
[...]

It looks like this could save 80-98% of data.

But doing this kind of accounting will slow down decompression even further and adds more complexity. For example, the output above is only for blocks that decompress to more than 32 KiB so that I know that the window will not be used directly anymore. But there are also blocks that decompress to less than 32 KiB, which also need to be handled.

And then, some heuristic is necessary to find good seek points even if they aren't exactly distributed every 4 MiB. I guess in some approximation, seek points that do not require any window at all can always be created because they are basically free, e.g. before non-compressed blocks.
