System Info
unrelated
Who can help?
@muellerzr @SunMarc
(original tags, no longer valid)
@ArthurZucker
(re-tagging because I want to discuss a patch release)
Reproduction
Hi, thanks for the library! Consider this simple snippet:
import transformers
x = transformers.tokenization_utils_base.BatchEncoding({'a': ['x', 'y']})
x.to('cpu')  # or cuda or whatever
The column 'a' is then silently removed :(
This is annoying in the following scenario: for each of my training/eval samples, I have a string column that serves as a tag for that sample, and I want to use it when computing metrics and losses.
This does not work. After some debugging, I found the root cause: the column is silently removed by the to() call mentioned above.
It seems torch does not support tensors of dtype str, so it seems impossible to pass such data through.
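For illustration, here is a minimal sketch of a possible workaround. It assumes the goal is only to move tensor values to a device while leaving every other value (such as a list of strings) untouched. The helper name to_device_keep_non_tensors is hypothetical and not part of transformers. It treats any value with a callable .to method as tensor-like, which is an assumption that holds for torch.Tensor but may not fit every case.

```python
from typing import Any, Dict, Mapping


def to_device_keep_non_tensors(batch: Mapping[str, Any], device: str) -> Dict[str, Any]:
    """Move tensor-like values to `device` and keep all other values as-is.

    Hypothetical helper, not a transformers API. A value is treated as
    tensor-like if it exposes a callable `.to` method (as torch.Tensor does).
    """
    moved = {}
    for key, value in batch.items():
        to = getattr(value, "to", None)
        # Lists, strings, and other plain Python objects have no `.to`,
        # so they are copied over unchanged instead of being dropped.
        moved[key] = to(device) if callable(to) else value
    return moved
```

Since BatchEncoding behaves like a mapping, calling to_device_keep_non_tensors(x, 'cpu') on the x from the snippet above should return a plain dict that still contains the column 'a'. Note that the result is a dict rather than a BatchEncoding, so any BatchEncoding-specific methods are not available on it.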
Expected behavior
(see above)