Creating vrt files for cqpweb #4704
tianshuo
started this conversation in
New Features & Project Ideas
Replies: 1 comment 2 replies
-
I'd recommend looking at how textacy implements |
-
Feature description
We want to process text into an annotated corpus and search it using POS tags plus regex. For example, if we want to search for "look|looking|looked for", we can use
"{look/V} for"
We have already been using spaCy's pattern matching, but it is a bit slow (because we don't index anything). We found that CQPweb is a great tool for both indexing and searching (it has a special query syntax that works for us), and it has a web UI. So the next step is to export everything into the XML-based .vrt files used by CQPweb. See the instructions here: http://cwb.sourceforge.net/files/CWB_Encoding_Tutorial.pdf
An example format:
Could the feature be a custom component or spaCy plugin?
Yes! This could be a spaCy plugin that exports our corpus as a .vrt file. Or we could even create a tool for importing, managing, browsing and searching text in our corpus.
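To make the export idea concrete, here is a minimal sketch of what writing a .vrt file could look like. It is not an existing spaCy API: `to_vrt` is a hypothetical helper, and the column order (word, POS, lemma) and the `<text>` / `<s>` structural tags are assumptions that would have to match however the corpus is later declared to the CWB encoder. It takes plain tuples so it runs without spaCy; with a spaCy `Doc`, the tuples could be built from `token.text`, `token.tag_` and `token.lemma_` for each sentence in `doc.sents`.

```python
from xml.sax.saxutils import escape, quoteattr


def to_vrt(text_id, sentences):
    """Render one text as verticalized text (one token per line).

    `sentences` is an iterable of sentences, each a list of
    (word, pos, lemma) tuples. Column order and tag names are
    assumptions for this sketch, not a fixed standard.
    """
    lines = [f"<text id={quoteattr(text_id)}>"]
    for sentence in sentences:
        lines.append("<s>")
        for word, pos, lemma in sentence:
            # Escape &, < and > so tokens cannot be confused with XML tags.
            lines.append("\t".join(escape(field) for field in (word, pos, lemma)))
        lines.append("</s>")
    lines.append("</text>")
    return "\n".join(lines) + "\n"


# Hypothetical input: one sentence, already tokenized and tagged.
example = [[
    ("She", "PRP", "she"),
    ("looked", "VBD", "look"),
    ("for", "IN", "for"),
    ("it", "PRP", "it"),
]]
vrt = to_vrt("doc1", example)
print(vrt)
```

Keeping the lemma in its own column is what would let a query like the one above match "look", "looking" and "looked" through a single lemma constraint once the corpus is indexed.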