
Cannot use eds.sentences results in deep-learning training #427

@theoimbert-aphp

Description


When training a TrainableComponent on GPU, using span.sent does not work, even with eds.sentences() in the nlp pipeline.
When setting context_getter=lambda span: span.sent in the trainable component, the training stops with the following error:

ValueError: [E030] Sentence boundaries unset. You can add the 'sentencizer' component to the pipeline with: 
`nlp.add_pipe('sentencizer')`. Alternatively, add the dependency parser or sentence recognizer, or set sentence 
boundaries by setting `doc[i].is_sent_start`.
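
For context, span.sent raises this error whenever no component has set is_sent_start on the document's tokens. A minimal standalone illustration (not part of the original report) with a bare pipeline:

import edsnlp

nlp = edsnlp.blank("eds")  # no sentencizer in this pipeline
doc = nlp("First sentence. Second sentence.")
doc[0:2].sent  # raises ValueError [E030]: sentence boundaries unset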

How to reproduce the bug

import torch

import edsnlp
import edsnlp.pipes as eds
from edsnlp.utils.batching import stat_batchify

# Pipeline definition
nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())
nlp.add_pipe(
    eds.span_classifier(
        embedding=eds.span_pooler(
            pooling_mode="mean",
            embedding=eds.transformer(
                model="prajjwal1/bert-tiny",
            ),
        ),
        span_getter=["ents", "sc"],
        attributes=[
            "_.negation",
        ],
        context_getter=lambda span: span.sent,
    ),
    name="span_classifier",
)

training_data = (
    ...
)

nlp.post_init(training_data)
device = "cuda" if torch.cuda.is_available() else "cpu"
batches = (
    training_data.loop()
    .shuffle("dataset")
    .map(nlp.preprocess, kwargs={"supervision": True})
    .batchify(batch_size=32 * 128, batch_by=stat_batchify("tokens"))
    .map(nlp.collate, kwargs={"device": device})
)
batches = batches.set_processing(num_cpu_workers=1, process_start_method="spawn")
# Move the model to the GPU
nlp.to(device)

optimizer = torch.optim.AdamW(
    params=nlp.parameters(),
    lr=3e-4,
)

iterator = iter(batches)

max_steps = 1000  # training step budget; value not specified in the original snippet

# Training loop: accumulate each trainable component's loss and backpropagate
for step in range(max_steps):
    batch = next(iterator)
    optimizer.zero_grad()
    with nlp.cache():
        loss = torch.zeros((), device=device)
        for name, component in nlp.torch_components():
            output = component(batch[name])
            if "loss" in output:
                loss += output["loss"]
        loss.backward()
        optimizer.step()

From my understanding, nlp.to(device) only sends the trainable components to the GPU, and because of that eds.sentences is never actually called on the training data. I imagine it would be possible to work around this by preparing training_data with sentence boundaries already set, i.e. by running another pipeline on it before using it in the training loop (see the sketch below). However, I have not really tried that yet, since issue #426 describes another problem with using functions as context_getter, for which I have not found a solution so far.
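
For what it's worth, a minimal sketch of that workaround, assuming the stream can be mapped through a second pipeline before preprocessing (untested; the map_pipeline placement here is my assumption, not a confirmed fix):

# Hypothetical workaround (untested): pre-annotate sentence boundaries on the
# stream with a separate rule-based pipeline, so that span.sent is already
# defined when nlp.preprocess runs.
sentencizer_nlp = edsnlp.blank("eds")
sentencizer_nlp.add_pipe(eds.sentences())

batches = (
    training_data.loop()
    .shuffle("dataset")
    .map_pipeline(sentencizer_nlp)  # set is_sent_start before preprocessing
    .map(nlp.preprocess, kwargs={"supervision": True})
    .batchify(batch_size=32 * 128, batch_by=stat_batchify("tokens"))
    .map(nlp.collate, kwargs={"device": device})
)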

Your Environment

  • Python Version Used: 3.7.16
  • EDS-NLP Version Used: 0.17.2
  • spaCy: 3.7.5
