Add punctuation in raw textual data #10420
-
I have raw text data without any punctuation marks. Is there any way to add punctuation to the data? Nothing in the data marks where a sentence starts or ends.
Replies: 3 comments
-
spaCy doesn't have any kind of model that would add punctuation. In general, text generation / NLG is out of scope for spaCy.
You could use the parser / senter to get sentence boundaries pretty easily. You might be able to do something with the parse tree, which I think might work on unpunctuated text, but it's probably better and less work to use a specialized model.
In general I would expect this to be modelled as a seq2seq problem. It looks like in academic literature this problem is sometimes referred to as "punctuation restoration".
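For the sentence-boundary part, here is a minimal sketch of what using the senter could look like. It assumes the `en_core_web_sm` pipeline is installed; the helper names `load_senter_pipeline` and `split_sentences` are made up for this example. The trained pipelines were fit on punctuated text, so boundary quality on unpunctuated input may be noticeably worse and is worth checking on your own data.

```python
import spacy


def load_senter_pipeline(model_name="en_core_web_sm"):
    """Load a trained pipeline that uses the senter instead of the parser.

    Assumes the model is installed, e.g. via
    `python -m spacy download en_core_web_sm`.
    """
    # The senter ships disabled in the trained pipelines, so drop the parser
    # and enable the senter explicitly.
    nlp = spacy.load(model_name, exclude=["parser"])
    nlp.enable_pipe("senter")
    return nlp


def split_sentences(nlp, text):
    """Return the sentence strings found by the pipeline's boundary detector."""
    doc = nlp(text)
    return [sent.text for sent in doc.sents]
```

Calling `split_sentences(load_senter_pipeline(), your_text)` returns a list of sentence strings. This only gives you boundaries: deciding which mark to put at each boundary (period, question mark, and so on) still needs a separate model.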
-
OK, thanks for the response.
-
@kamrapooja If you still need punctuation on text, check out something like https://huggingface.co/felflare/bert-restore-punctuation
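Punctuation restoration models of this kind are often set up as token classifiers: for each input token the model predicts which mark, if any, follows it. As a rough sketch of the last step, this is how per-token predictions could be turned back into text. The label names (`O`, `COMMA`, `PERIOD`, `QUESTION`) and the function `apply_punctuation` are hypothetical and are not the label set of the model linked above.

```python
# Hypothetical label scheme: "O" means no mark, anything else is the mark
# to append directly after the token.
PUNCT_LABELS = {"O": "", "COMMA": ",", "PERIOD": ".", "QUESTION": "?"}
SENTENCE_END = {".", "?"}


def apply_punctuation(tokens, labels):
    """Rebuild text from tokens and one predicted punctuation label per token."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    pieces = []
    capitalize_next = True  # the first token starts a sentence
    for token, label in zip(tokens, labels):
        mark = PUNCT_LABELS[label]
        # Uppercase only the first character so the rest of the token is kept.
        word = token[:1].upper() + token[1:] if capitalize_next else token
        pieces.append(word + mark)
        capitalize_next = mark in SENTENCE_END
    return " ".join(pieces)


tokens = "how are you i am fine thanks".split()
labels = ["O", "O", "QUESTION", "O", "O", "COMMA", "PERIOD"]
print(apply_punctuation(tokens, labels))  # How are you? I am fine, thanks.
```

A real model also has to deal with subword tokens and long inputs, which this sketch leaves out.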