Masking class and separation of tokenizer_masking

### Is your feature request related to a problem? Please describe.

Masking currently happens inside `tokenizer_masking`, in `batchify_source` and `batchify_target`, and there is no dedicated class for it. A separate masking class is needed to implement different masking strategies, vary them during training, and so on.

### Describe the solution you'd like

The first step is a bare-bones implementation: create a masker class that implements a couple of simple masking strategies and is instantiated in `multi_stream_data_sampler` before the tokenizer is instantiated. The class should take input data and return the masked data, for both source and target data.
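A minimal sketch of what such a class could look like is below. Everything in it is hypothetical and only illustrates the idea: the `Masker` name, the `"random"` and `"block"` strategies, the `masking_rate` parameter, and the method names are assumptions, not existing code. It also assumes that the source keeps the unmasked tokens and the target keeps the masked ones, which may not match the current behaviour of `batchify_source` and `batchify_target`.

```python
from __future__ import annotations

import random


class Masker:
    """Hypothetical masker: decides which tokens are hidden, independent of tokenisation."""

    STRATEGIES = ("random", "block")

    def __init__(self, strategy: str = "random", masking_rate: float = 0.5,
                 seed: int | None = None):
        self.rng = random.Random(seed)
        self.set_strategy(strategy, masking_rate)

    def set_strategy(self, strategy: str, masking_rate: float) -> None:
        """Switch strategy or rate, e.g. between epochs, to vary masking during training."""
        if strategy not in self.STRATEGIES:
            raise ValueError(f"unknown masking strategy: {strategy!r}")
        if not 0.0 <= masking_rate <= 1.0:
            raise ValueError("masking_rate must be in [0, 1]")
        self.strategy = strategy
        self.masking_rate = masking_rate

    def make_mask(self, num_tokens: int) -> list[bool]:
        """Return one flag per token; True means the token is masked."""
        num_masked = round(self.masking_rate * num_tokens)
        mask = [False] * num_tokens
        if self.strategy == "random":
            # Mask tokens at independently chosen positions.
            for i in self.rng.sample(range(num_tokens), num_masked):
                mask[i] = True
        else:
            # "block": mask one contiguous run of tokens.
            start = self.rng.randint(0, num_tokens - num_masked)
            for i in range(start, start + num_masked):
                mask[i] = True
        return mask

    def mask_source(self, tokens: list, mask: list[bool]) -> list:
        """Assumed convention: the source keeps only the unmasked tokens."""
        return [t for t, m in zip(tokens, mask) if not m]

    def mask_target(self, tokens: list, mask: list[bool]) -> list:
        """Assumed convention: the target keeps only the masked tokens."""
        return [t for t, m in zip(tokens, mask) if m]


# Usage: one mask is drawn per sample and applied to both source and target.
masker = Masker(strategy="block", masking_rate=0.25, seed=0)
tokens = list(range(8))
mask = masker.make_mask(len(tokens))
source = masker.mask_source(tokens, mask)
target = masker.mask_target(tokens, mask)
```

Drawing the mask once and passing it to both methods keeps source and target consistent, and `set_strategy` gives the sampler a single place to change the strategy over the course of training.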

`tokenizer_masking` should then be reduced to tokenisation only, with no masking logic left in it.

### Describe alternatives you've considered

_No response_

### Additional context

_No response_

### Organisation

_No response_

Masking class and separation of tokenizer_masking #380

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Organisation

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Masking class and separation of tokenizer_masking #380

Description

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

Describe alternatives you've considered

Additional context

Organisation

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions