Conversation

@cpard cpard commented Sep 18, 2025

Hey everyone,

fenic now has native support for reading datasets directly from the Hugging Face Hub using the hf:// protocol, documented at docs.fenic.ai. This PR adds the corresponding documentation to the Hugging Face docs.

Changes

  • New documentation page for fenic integration (docs/hub/datasets-fenic.md)
  • Updated the libraries table to add fenic
  • Added a navigation entry in _toctree.yml

Features documented

  • Read CSV and Parquet files directly from HF datasets using hf:// protocol
  • Support for dataset revisions and versions
  • Mix HF data sources with local files in a single read operation
  • Process data with PySpark-inspired DataFrame operations and AI-powered transformations
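
As an illustration of the features above, here is a minimal sketch of what the documented usage looks like. The Session/read method names follow my reading of the fenic docs and may differ from the exact API; the repo names and file paths are placeholders.

```python
# Sketch only: Session/read method names follow fenic's documented API as I
# understand it; repo names and file paths below are placeholders.
import fenic as fc

session = fc.Session.get_or_create(fc.SessionConfig(app_name="hf_demo"))

# Read Parquet files directly from a Hugging Face dataset repo via hf://
df = session.read.parquet("hf://datasets/username/my_dataset/data/*.parquet")

# Pin a specific revision (branch, tag, or commit) with the @revision syntax
df_v1 = session.read.csv("hf://datasets/username/my_dataset@v1.0/data.csv")

# Mix Hub and local files in a single read (assuming list inputs are accepted)
df_all = session.read.parquet([
    "hf://datasets/username/my_dataset/train.parquet",
    "local/extra.parquet",
])

df_all.show()
```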

I'll add the image in a separate PR; are there any specific instructions I should follow for this?

Happy to answer any questions and of course accommodate any changes required.

Thank you for the amazing work you've been doing with Hugging Face Datasets.

@cpard cpard (Author) commented Sep 18, 2025

@lhoestq

@lhoestq lhoestq (Member) commented Sep 22, 2025

I'm just discovering fenic, and the API looks great :) Btw, is there a way to use Hugging Face Inference Providers for the semantic / generative operations?

It's a unified API for many providers serving models on HF; you can find more info at https://huggingface.co/docs/inference-providers/en/index

@cpard cpard (Author) commented Sep 22, 2025

> I'm just discovering fenic, and the API looks great :) Btw, is there a way to use Hugging Face Inference Providers for the semantic / generative operations?
>
> It's a unified API for many providers serving models on HF; you can find more info at https://huggingface.co/docs/inference-providers/en/index

Thank you so much for the kind words. It's great to hear that the API looks good to you!

Regarding HF Inference Providers, we don't currently support them, but we will definitely add support, and the same goes for writing back to Datasets. Right now we only support reading from HF Datasets, but the goal is full support. The functionality HF Datasets offers is really important for the experience we want to deliver and for the features we are working on, e.g. hydrating MCP servers with precomputed datasets stored on HF.

For an example of that, check https://huggingface.co/datasets/typedef-ai/fenic-0.4.0-codebase. This dataset is generated with fenic over the fenic codebase and is then used to hydrate the MCP server behind our documentation tooling; see https://github.com/typedef-ai/fenic/tree/main/examples/mcp/docs-server for more information.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@davanstrien davanstrien (Member) left a comment

Looks very nice! Added a few small questions/suggestions.

@cpard cpard (Author) commented Oct 9, 2025

> Looks very nice! Added a few small questions/suggestions.

I tried to address all the questions/suggestions; please also check my latest commits. Let me know if there's anything else I can do to make this better.

I really appreciate the time you spent on this!

@cpard cpard requested a review from davanstrien October 9, 2025 06:03
@davanstrien davanstrien (Member) left a comment

Thanks for working on it! Will see if @lhoestq has any final comments, but otherwise I think we can merge (we can update later with Inference Providers examples!)

@lhoestq lhoestq (Member) left a comment

This looks good! The write-to-HF operation is not available because it's not yet implemented in DuckDB, right? (link to duckdb issue here)

Once this is merged/deployed, you should share it online with the community :) Many people will like the API, which is clean and convenient. Feel free to let us know about your posts here so we can amplify and re-share!

Also looking forward to seeing the integration with HF Inference Providers; have you already started looking into it by any chance?

@cpard cpard (Author) commented Oct 14, 2025

> This looks good! The write-to-HF operation is not available because it's not yet implemented in DuckDB, right? (link to duckdb issue here)

Yes, but we will most probably integrate the Datasets SDK directly into fenic. This will give us maximum flexibility for the things we want to do. As soon as the work is at a stage where it can be shared, I'll let you know so you can take a look.
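
For illustration only, here is a hypothetical sketch of the kind of write-back this could enable, using the Hugging Face datasets SDK around a fenic result. This is not current fenic functionality, and the to_pandas() conversion on the fenic DataFrame is an assumption on my part.

```python
# Hypothetical sketch: push a fenic result back to the Hub with the
# Hugging Face `datasets` SDK. Not supported by fenic today; to_pandas()
# on the fenic DataFrame is assumed here.
import fenic as fc
from datasets import Dataset

session = fc.Session.get_or_create(fc.SessionConfig(app_name="writeback_demo"))

# Read from the Hub and transform with fenic (placeholder repo and path).
df = session.read.parquet("hf://datasets/username/my_dataset/train.parquet")

# Convert to pandas and push the result back to a dataset repo on the Hub.
pdf = df.to_pandas()
Dataset.from_pandas(pdf).push_to_hub("username/my_dataset-processed")
```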

> Once this is merged/deployed, you should share it online with the community :) Many people will like the API, which is clean and convenient. Feel free to let us know about your posts here so we can amplify and re-share!

100%. I'm also planning to write some content about this; I believe Datasets has tremendous potential, and I'd like to share some of the stuff we find really neat by combining a processing engine with the SDK and the infrastructure HF provides.

> Also looking forward to seeing the integration with HF Inference Providers; have you already started looking into it by any chance?

We haven't yet but we will soon. I'll keep you posted!

Thank you so much for all the support, guys!
