@MarcRomeijn, @cwharris, @MarkMoTrin - Given your experience with the cuDF tokenizer, we'd value any feedback or suggestions for enhancing the Subword tokenizer API and features you would like.
Is your feature request related to a problem? Please describe.
We currently rely on the hashed vocab file generated by `cudf.utils.hash_vocab_utils.hash_vocab`. We should move to using the vocab file directly. This will be similar to the API we added here: #13930
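To illustrate what "using the vocab file directly" means, here is a minimal CPU-only sketch, assuming a BERT-style `vocab.txt` with one token per line where a token's id is its zero-based line number. The helpers `load_vocab` and `wordpiece` and the tiny demo vocabulary are hypothetical names made up for this example; they are not part of cuDF and do not reflect how the GPU tokenizer is implemented.

```python
import tempfile
from pathlib import Path


def load_vocab(vocab_path):
    """Map each token to its id; the id is the token's zero-based line number."""
    with open(vocab_path, encoding="utf-8") as f:
        return {line.rstrip("\n"): idx for idx, line in enumerate(f)}


def wordpiece(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of a single word into subword ids."""
    ids, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is found in the vocab.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the "##" prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No piece matched at this position: the whole word is unknown.
            return [vocab[unk_token]]
        ids.append(vocab[match])
        start = end
    return ids


# Tiny hypothetical vocabulary written to a temporary file for the demo.
demo_dir = tempfile.mkdtemp()
demo_vocab_path = Path(demo_dir) / "vocab.txt"
demo_vocab_path.write_text("[PAD]\n[UNK]\ntoken\n##izer\n##s\n", encoding="utf-8")

vocab = load_vocab(demo_vocab_path)
tokenizer_ids = wordpiece("tokenizers", vocab)
```

With a plain vocab file the token-to-id mapping can be read straight from disk, so no separate hashing step or intermediate hash file is needed before tokenizing.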
Describe the solution you'd like
Instead of earlier:
Additional context
This should make it easier to switch from a Hugging Face-like tokenizer.
CC: @davidwendt