Add safetensors dataset #221

Open
astrojuanlu opened this issue May 23, 2023 · 3 comments · May be fixed by #898
Labels: Hacktoberfest, help wanted (Contribution task, outside help would be appreciated!)

Comments

@astrojuanlu
Member

https://huggingface.co/blog/safetensors-security-audit

🐶Safetensors is a library for saving and loading tensors in the most common frameworks (including PyTorch, TensorFlow, JAX, PaddlePaddle, and NumPy).

import torch
from safetensors.torch import load_file, save_file

weights = {"embeddings": torch.zeros((10, 100))}
save_file(weights, "model.safetensors")
weights2 = load_file("model.safetensors")

Comparison with other formats: https://github.com/huggingface/safetensors#yet-another-format-
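
As a rough illustration of why the format compares favourably on safety: a safetensors file starts with an 8-byte little-endian unsigned integer giving the header length, followed by a JSON header and then the raw tensor bytes, so inspecting a file is plain parsing rather than code execution. The sketch below uses only the standard library; `write_minimal_safetensors` and `read_safetensors_header` are hypothetical helper names invented for this example and are not part of the safetensors library.

```python
import json
import struct


def write_minimal_safetensors(path, name, shape, raw_bytes):
    """Write a single float32 tensor in the safetensors layout (illustration only)."""
    header = {
        name: {
            "dtype": "F32",
            "shape": list(shape),
            # Byte range of this tensor inside the data section that follows the header.
            "data_offsets": [0, len(raw_bytes)],
        }
    }
    header_bytes = json.dumps(header, separators=(",", ":")).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte little-endian header size
        f.write(header_bytes)                          # JSON header
        f.write(raw_bytes)                             # raw tensor data


def read_safetensors_header(path):
    """Return the JSON header as a dict without reading any tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))
```

For real work the `load_file` and `save_file` functions shown above remain the right entry points; this only shows that the header can be read without deserializing arbitrary objects.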

For Hugging Face, EleutherAI, and Stability AI, the master plan is to shift to using this format by default.

@merelcht added the "Community" label (Issue/PR opened by the open-source community) on Dec 13, 2023
@merelcht changed the title from "Add safetensors dataset?" to "Add safetensors dataset" on Dec 13, 2023
@ankatiyar moved this to To Do in Kedro Framework on Sep 30, 2024
@MinuraPunchihewa
Contributor

Hey @astrojuanlu,
Can I give this a shot?
I had previously commented on this issue, but I am having trouble with my Hive setup and I would like to tackle it later.

@astrojuanlu
Member Author

Go ahead, @MinuraPunchihewa! Please add it as an experimental dataset.

@astrojuanlu
Member Author

Well, pressure is mounting to move away from pickle 😄 https://www.forbes.com/sites/iainmartin/2024/10/22/hackers-have-uploaded-thousands-of-malicious-models-to-ais-biggest-online-repository/

From @/adrinjalali on LinkedIn

The core of the problem being pickle format. This is nothing new, and the community has been working quite hard to mitigate these issues for a while at this point. However, those pickle files are still there and people still load them.
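
A minimal sketch of why the pickle format is the core of the problem: a pickle stream can name any importable callable and the arguments to call it with, and `pickle.loads` performs that call while loading. The `Payload` class here is a made-up stand-in for a malicious model file, and the evaluated expression is deliberately harmless.

```python
import pickle


class Payload:
    """Hypothetical stand-in for an object hidden inside a malicious model file."""

    def __reduce__(self):
        # Tell pickle to "rebuild" this object by calling eval("6 * 7").
        # A real attack would name a harmful callable here instead.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading the bytes runs the callable chosen by whoever wrote the file.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

The loaded value is whatever the embedded call returned, not a `Payload` instance, which is why loading untrusted pickle files is unsafe regardless of what the file claims to contain.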

There are tools which you can use to avoid loading pickle files:

@merelcht added the "help wanted" label (Contribution task, outside help would be appreciated!) and removed the "Community" label (Issue/PR opened by the open-source community) on Nov 1, 2024