Wrong TRF alignment indices #10794

May 12, 2022 · 1 comment · 3 replies

This is a little unexpected, but it's not a bug as such. A transformer token is allowed to align to more than one spaCy token.

To support both slow and fast tokenizers from transformers in the same way in spacy-transformers, we're using a generic alignment algorithm (from spacy-alignments) to align the transformer tokens with the spaCy tokens. For example, the tokens being aligned look like this:

  • spacy: ['He', 'is', 'the', 'recipient', 'of', 'multiple', 'accolades', ',', 'including', 'a', 'Golden', 'Globe', 'Award']
  • transformer: ['<s>', 'He', 'Ġis', 'Ġthe', 'Ġrecipient', 'Ġof', 'Ġmultiple', 'Ġaccol', 'ades', ',', 'Ġincluding', 'Ġa', 'ĠGolden', 'ĠGlobe', 'ĠAward', '</s>']
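
To make the idea of a tokenizer-agnostic alignment concrete, here is a minimal sketch in plain Python. It is not the spacy-alignments algorithm: it assumes the two token sequences spell out exactly the same characters once the `Ġ` prefix is stripped and the special tokens `<s>` and `</s>` are ignored, and it aligns tokens purely by overlapping character offsets. The helper names `char_spans` and `align` are hypothetical and exist only for this illustration.

```python
from typing import FrozenSet, List, Tuple

Span = Tuple[int, int]

# Assumption: RoBERTa-style special tokens and word-start prefix.
SPECIAL_TOKENS = frozenset({"<s>", "</s>"})
WORD_START = "Ġ"


def char_spans(
    tokens: List[str],
    prefix: str = "",
    skip: FrozenSet[str] = frozenset(),
) -> List[Span]:
    """Return (start, end) offsets of each token in the concatenated text.

    Tokens listed in `skip` get an empty span, so they never overlap anything.
    A leading `prefix` is removed before a token's length is counted.
    """
    spans: List[Span] = []
    pos = 0
    for tok in tokens:
        if tok in skip:
            spans.append((pos, pos))
            continue
        text = tok[len(prefix):] if prefix and tok.startswith(prefix) else tok
        spans.append((pos, pos + len(text)))
        pos += len(text)
    return spans


def align(a_spans: List[Span], b_spans: List[Span]) -> List[List[int]]:
    """For each span in a, list the indices of spans in b sharing a character."""
    return [
        [
            j
            for j, (b_start, b_end) in enumerate(b_spans)
            if max(a_start, b_start) < min(a_end, b_end)
        ]
        for (a_start, a_end) in a_spans
    ]


spacy_tokens = ["He", "is", "the", "recipient", "of", "multiple", "accolades",
                ",", "including", "a", "Golden", "Globe", "Award"]
trf_tokens = ["<s>", "He", "Ġis", "Ġthe", "Ġrecipient", "Ġof", "Ġmultiple",
              "Ġaccol", "ades", ",", "Ġincluding", "Ġa", "ĠGolden", "ĠGlobe",
              "ĠAward", "</s>"]

spacy_spans = char_spans(spacy_tokens)
trf_spans = char_spans(trf_tokens, prefix=WORD_START, skip=SPECIAL_TOKENS)

spacy2trf = align(spacy_spans, trf_spans)
trf2spacy = align(trf_spans, spacy_spans)

print(spacy2trf[6])  # transformer indices covering 'accolades'
print(trf2spacy[0])  # '<s>' has no spaCy counterpart
```

In this simplified setting each transformer token maps to at most one spaCy token, and 'accolades' maps to the two pieces 'Ġaccol' and 'ades'. A generic aligner that cannot rely on these assumptions may produce less tidy mappings.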

Unfortunat…

Answer selected by adrianeboyd
Labels
feat / transformer Feature: Transformer
3 participants

This discussion was converted from issue #10791 on May 13, 2022 07:12.