`BatchEncoding.to` silently drops non-tensor columns, so there is no way to pass columns such as strings through to Trainer metric computation

### System Info

unrelated

### Who can help?

@muellerzr @SunMarc
(original tags, no longer valid)

@ArthurZucker 
(re-tagged to discuss a patch release)

### Information

- [ ] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

Hi, thanks for the library! Consider this simple snippet:

```
import transformers
x = transformers.tokenization_utils_base.BatchEncoding({'a': ['x', 'y']})
x.to('cpu')  # or cuda or whatever
print(x.keys())  # in the behavior reported here, 'a' is no longer present
```

The column `a` is then silently removed :(

This is a problem in the following scenario: each of my training/eval samples has a string column that serves as a tag for it, and I want to use that tag when computing metrics and losses.

That does not work. After some debugging, I found the root cause: the column is silently removed by the `to` call mentioned above.
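To illustrate the behavior I would expect, here is a minimal sketch of a device move that keeps non-tensor values instead of dropping them. This is not the library's implementation: `move_to_device` is a hypothetical helper operating on a plain dict, and it assumes that any value with a callable `.to` method is tensor-like.

```python
from typing import Any, Dict


def move_to_device(batch: Dict[str, Any], device: Any) -> Dict[str, Any]:
    """Hypothetical helper: move tensor-like values, keep everything else."""
    moved = {}
    for key, value in batch.items():
        to = getattr(value, "to", None)
        if callable(to):
            # Assumption: a callable `.to` means the value can be moved.
            moved[key] = to(device)
        else:
            # Non-tensor value (e.g. a list of strings): pass through unchanged.
            moved[key] = value
    return moved
```

With torch installed, `move_to_device({'a': ['x', 'y'], 'ids': torch.tensor([1, 2])}, 'cpu')` should return both keys, with `a` untouched.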

torch does not seem to support tensors of dtype `str`, so there appears to be no way to pass this data through.
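One possible workaround, sketched below, is to encode the string tags as integer IDs so that they can be stored in an integer tensor, and to decode them again when computing metrics. `encode_tags` is a hypothetical helper, not part of any library, and this sketch does not cover whether the extra column actually reaches the metric function in a given Trainer setup.

```python
from typing import Dict, List, Tuple


def encode_tags(tags: List[str]) -> Tuple[List[int], Dict[int, str]]:
    """Map each distinct tag to an integer ID, in order of first appearance."""
    tag_to_id: Dict[str, int] = {}
    # setdefault returns the existing ID, or assigns the next free one.
    ids = [tag_to_id.setdefault(tag, len(tag_to_id)) for tag in tags]
    id_to_tag = {i: tag for tag, i in tag_to_id.items()}
    return ids, id_to_tag


ids, id_to_tag = encode_tags(["x", "y", "x"])
print(ids)                          # [0, 1, 0]
print([id_to_tag[i] for i in ids])  # ['x', 'y', 'x']
```

The integer list can then be wrapped in a tensor, which survives the device move, while `id_to_tag` stays on the Python side for decoding.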

### Expected behavior

(see above)

BatchEncoding.to throws away columns silently, thus no way to pass non-tensor columns such as String in Trainer metric computation #34983
