
Add punctuation in raw textual data #10420

spaCy doesn't have any kind of model that would add punctuation. In general, text generation / NLG is out of scope for spaCy.

You could use the parser / senter to get sentence boundaries pretty easily. You might be able to do something with the parse tree, which I think might work on unpunctuated text, but it's probably better and less work to use a specialized model.
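As a rough illustration of the sentence-boundary idea, here is a minimal sketch that turns per-token sentence-start flags into text with sentence-final periods. The helper name `add_periods` and the hard-coded flags are hypothetical, used only for illustration. It restores periods and sentence-initial capitals only, not commas or question marks.

```python
def add_periods(tokens, sent_starts):
    """Join tokens into text, closing each sentence with a period.

    tokens:      list of token strings (unpunctuated)
    sent_starts: list of booleans, True where a token begins a sentence
    """
    sentences, current = [], []
    for token, is_start in zip(tokens, sent_starts):
        # A new sentence starts here, so close the one collected so far.
        if is_start and current:
            sentences.append(current)
            current = []
        current.append(token)
    if current:
        sentences.append(current)

    out = []
    for words in sentences:
        text = " ".join(words)
        # Capitalize only the first character; leave the rest untouched.
        out.append(text[:1].upper() + text[1:] + ".")
    return " ".join(out)


# Hand-written flags stand in for model predictions in this sketch.
tokens = "i went home it was late".split()
sent_starts = [True, False, False, True, False, False]
print(add_periods(tokens, sent_starts))
```

With spaCy, the flags could presumably come from something like `[bool(t.is_sent_start) for t in doc]` after running a pipeline that includes the parser or senter. Those components are typically trained on punctuated text, so accuracy on unpunctuated input is not guaranteed.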

In general I would expect this to be modelled as a seq2seq problem. It looks like in academic literature this problem is sometimes referred to as "punctuation restoration".
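To make the seq2seq framing concrete, here is a minimal sketch of how training pairs might be derived from already-punctuated text: the source side is the text with punctuation and casing stripped, and the target side is the original. The helper `make_pair` and the punctuation set are assumptions for illustration, not part of spaCy or any particular punctuation restoration library.

```python
import re

# Punctuation marks to strip from the source side (an assumed, minimal set).
# Note: this naive pattern would also strip the dot in numbers like "3.5".
PUNCT = re.compile(r"[.,!?;:]")


def make_pair(punctuated):
    """Return (source, target) where source is the unpunctuated, lowercased text."""
    source = PUNCT.sub("", punctuated).lower()
    # Collapse any whitespace runs left behind after stripping.
    source = " ".join(source.split())
    return source, punctuated


src, tgt = make_pair("Hello, how are you? I am fine.")
print(src)
print(tgt)
```

Pairs like these could then be fed to whatever sequence-to-sequence or token-classification model you choose.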

Answer selected by adrianeboyd
Labels
training Training and updating models
3 participants