Support on-the-fly HATS web-servers #480

Open
2 of 3 tasks
hombit opened this issue Mar 28, 2025 · 2 comments
Labels
enhancement New feature or request

Comments

@hombit (Contributor) commented Mar 28, 2025

Feature request

@fxpineau developed an on-the-fly HATS server for VizieR catalogs. Unfortunately, we currently cannot use it because of an fsspec limitation: the HTTP response is exposed as a streaming file that does not support seek(), while pyarrow must seek to the parquet footer:

import pyarrow as pa
import pyarrow.parquet  # read_table lives in the parquet submodule
from upath import UPath

with UPath(VIZIER_HATS_PARQUET_FILE_URL).open('rb') as fh:
    pa.parquet.read_table(fh)
File ~/.virtualenvs/lsdb/lib/python3.12/site-packages/pyarrow/parquet/core.py:1793, in read_table(source, columns, use_threads, schema, use_pandas_metadata, read_dictionary, memory_map, buffer_size, partitioning, filesystem, filters, use_legacy_dataset, ignore_prefixes, pre_buffer, coerce_int96_timestamp_unit, decryption_properties, thrift_string_size_limit, thrift_container_size_limit, page_checksum_verification)
   1787     warnings.warn(
   1788         "Passing 'use_legacy_dataset' is deprecated as of pyarrow 15.0.0 "
   1789         "and will be removed in a future version.",
   1790         FutureWarning, stacklevel=2)
   1792 try:
-> 1793     dataset = ParquetDataset(
   1794         source,
   1795         schema=schema,
   1796         filesystem=filesystem,
   1797         partitioning=partitioning,
   1798         memory_map=memory_map,
   1799         read_dictionary=read_dictionary,
   1800         buffer_size=buffer_size,
   1801         filters=filters,
   1802         ignore_prefixes=ignore_prefixes,
   1803         pre_buffer=pre_buffer,
   1804         coerce_int96_timestamp_unit=coerce_int96_timestamp_unit,
   1805         decryption_properties=decryption_properties,
   1806         thrift_string_size_limit=thrift_string_size_limit,
   1807         thrift_container_size_limit=thrift_container_size_limit,
   1808         page_checksum_verification=page_checksum_verification,
   1809     )
   1810 except ImportError:
   1811     # fall back on ParquetFile for simple cases when pyarrow.dataset
   1812     # module is not available
   1813     if filters is not None:

File ~/.virtualenvs/lsdb/lib/python3.12/site-packages/pyarrow/parquet/core.py:1360, in ParquetDataset.__init__(self, path_or_paths, filesystem, schema, filters, read_dictionary, memory_map, buffer_size, partitioning, ignore_prefixes, pre_buffer, coerce_int96_timestamp_unit, decryption_properties, thrift_string_size_limit, thrift_container_size_limit, page_checksum_verification, use_legacy_dataset)
   1356 if single_file is not None:
   1357     fragment = parquet_format.make_fragment(single_file, filesystem)
   1359     self._dataset = ds.FileSystemDataset(
-> 1360         [fragment], schema=schema or fragment.physical_schema,
   1361         format=parquet_format,
   1362         filesystem=fragment.filesystem
   1363     )
   1364     return
   1366 # check partitioning to enable dictionary encoding

File ~/.virtualenvs/lsdb/lib/python3.12/site-packages/pyarrow/_dataset.pyx:1443, in pyarrow._dataset.Fragment.physical_schema.__get__()

File ~/.virtualenvs/lsdb/lib/python3.12/site-packages/pyarrow/error.pxi:155, in pyarrow.lib.pyarrow_internal_check_status()

File ~/.virtualenvs/lsdb/lib/python3.12/site-packages/pyarrow/error.pxi:89, in pyarrow.lib.check_status()

File ~/.virtualenvs/lsdb/lib/python3.12/site-packages/fsspec/implementations/http.py:732, in HTTPStreamFile.seek(self, loc, whence)
    730 if loc == self.loc and whence == 0:
    731     return
--> 732 raise ValueError("Cannot seek streaming HTTP file")

ValueError: Cannot seek streaming HTTP file

Before submitting
Please check the following:

  • I have described the purpose of the suggested change, specifying what I need the enhancement to accomplish, i.e. what problem it solves.
  • I have included any relevant links, screenshots, environment information, and data relevant to implementing the requested feature, as well as pseudocode for how I want to access the new functionality.
  • If I have ideas for how the new feature could be implemented, I have provided explanations and/or pseudocode and/or task lists for the steps.
@hombit hombit added the enhancement New feature or request label Mar 28, 2025
@fxpineau

The URL of CDS on-the-fly HATS products is: https://vizcat.cds.unistra.fr/hats

@fxpineau

(Technically, it consists of a small CGI script plus Apache rewrite rules that transform parquet file paths into GET queries on the qat2s.cgi used in production for VizieR large tables.)
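The path-to-query rewrite described above can be sketched in Python. The `Norder=k/Dir=d/Npix=n.parquet` layout is the standard HATS partition path; the query parameter names below are purely hypothetical stand-ins for illustration, not the actual qat2s.cgi interface:

```python
import re

def hats_path_to_query(path: str) -> str:
    """Map a HATS parquet partition path to a CGI GET query string.

    The Norder/Dir/Npix layout follows the HATS specification; the
    returned parameter names are hypothetical, not the real qat2s.cgi
    interface.
    """
    m = re.fullmatch(r".*/Norder=(\d+)/Dir=(\d+)/Npix=(\d+)\.parquet", path)
    if m is None:
        raise ValueError(f"not a HATS partition path: {path}")
    norder, _dir, npix = m.groups()
    # Hypothetical parameter names, for illustration only.
    return f"/qat2s.cgi?order={norder}&pix={npix}"

print(hats_path_to_query("/hats/table/Norder=3/Dir=0/Npix=42.parquet"))
# → /qat2s.cgi?order=3&pix=42
```

On the server, the equivalent mapping would live in Apache rewrite rules rather than Python, so clients see ordinary static-looking parquet URLs.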
