SparseSpacyFeaturizer #29

@koaning

spaCy generates many attributes for each token, and some of them could be useful features in machine learning pipelines. To name a few:

  • is_oov: is the token out of vocabulary, i.e. does it lack a word vector?
  • is_stop: is the token a stopword?
  • lemma_: what is the lemma of the token?
  • pos_/tag_: coarse-grained and fine-grained part-of-speech information
  • morph: morphological features
  • dep_: grammatical dependency relation

Each of these has a discrete representation, so they could be encoded as sparse features and added to a Rasa pipeline.
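
To make the idea concrete, here is a minimal sketch of how such discrete attributes could be turned into sparse one-hot features. It is not an existing Rasa or spaCy component: `SparseTokenFeaturizer`, `token_feature_names`, and `FakeToken` are hypothetical names invented for illustration. `FakeToken` only mimics the attribute names found on a spaCy `Token` so the example runs without spaCy or a downloaded model; real tokens from a spaCy `Doc` should work the same way, since the code only reads those attributes.

```python
from dataclasses import dataclass

# Attribute names as exposed on spaCy tokens; which ones to use is an assumption.
ATTRS = ("is_oov", "is_stop", "lemma_", "pos_", "tag_", "dep_")


@dataclass
class FakeToken:
    """Hypothetical stand-in for a spaCy Token (same attribute names)."""
    text: str
    is_oov: bool
    is_stop: bool
    lemma_: str
    pos_: str
    tag_: str
    dep_: str


def token_feature_names(token, attrs=ATTRS):
    """Turn each discrete attribute into an 'attr=value' feature name."""
    return [f"{attr}={getattr(token, attr)}" for attr in attrs]


class SparseTokenFeaturizer:
    """Hypothetical featurizer: maps 'attr=value' names to column indices."""

    def __init__(self, attrs=ATTRS):
        self.attrs = attrs
        self.vocab = {}  # feature name -> column index

    def fit(self, docs):
        # Assign a new column to every feature name seen in the training docs.
        for doc in docs:
            for token in doc:
                for name in token_feature_names(token, self.attrs):
                    self.vocab.setdefault(name, len(self.vocab))
        return self

    def transform(self, doc):
        # One row per token: the sorted indices of its active columns.
        # Feature names not seen during fit are silently dropped.
        return [
            sorted(
                self.vocab[name]
                for name in token_feature_names(token, self.attrs)
                if name in self.vocab
            )
            for token in doc
        ]


doc = [
    FakeToken("the", False, True, "the", "DET", "DT", "det"),
    FakeToken("cats", False, False, "cat", "NOUN", "NNS", "nsubj"),
]
featurizer = SparseTokenFeaturizer().fit([doc])
rows = featurizer.transform(doc)
```

Each row lists only the active columns, which is the usual sparse encoding of a one-hot vector; the indices could be handed to `scipy.sparse` or a similar structure if a matrix is needed. Morphological features are left out of the sketch because a token can carry several of them at once, so they would need one feature name per morphological key rather than a single value.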
