Update wndb2lmf to build Pre-3.0 WordNets #42
Earlier versions of the Princeton WordNet did not include verb.Framestext, and the verb frames never changed across versions, so it's easier to hard-code them than to load them from a file. The only potential issue I see is that this content is copied from the copyrighted WordNet documentation and there might not be enough attribution. I do link back to the documentation, so hopefully we're good there.
build_senseidx.py will create exact replicas of the index.sense files for WordNet 1.7 and higher versions. For WordNet 1.6, you can get close with the --use-adjposition option, but the counts for any sense key with an adjposition (a) or (p) after the head word of satellite adjective sense keys would need to be reset to 0. WordNet 1.5 did not have an index.sense file distributed with it.
- word = "Original_word" as in the WNDB data files
- respaced = "Original word" with spaces instead of _
- lemma = "original_word" as in the WNDB index files
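The three spellings listed above can be derived mechanically from the data-file form. This is a minimal sketch, assuming the index-file lemma is just the lowercased word; the function name `word_forms` is hypothetical and is not taken from wndb.py. Syntactic markers such as (p), which can follow adjectives in the data files, are not handled here.

```python
def word_forms(word: str) -> dict[str, str]:
    """Return the three spellings of a WNDB word form.

    Assumption: the index-file lemma is the data-file word lowercased.
    """
    return {
        "word": word,                        # as in the WNDB data files
        "respaced": word.replace("_", " "),  # underscores become spaces
        "lemma": word.lower(),               # as in the WNDB index files
    }


forms = word_forms("Original_word")
```

Keeping all three forms together avoids recomputing them at each call site, which is one plausible reason to name them explicitly.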
The frames were being sent to Wn's LMF in the 1.0 format and weren't being written in the 'subcat' attribute on senses. This is now fixed.
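For illustration, the difference can be sketched with Python's standard `xml.etree.ElementTree`. This assumes the newer WN-LMF layout, in which a `SyntacticBehaviour` element carries an `id` and each `Sense` refers to it through `subcat`; all ids below are hypothetical and the snippet is not the code from this PR.

```python
import xml.etree.ElementTree as ET

# Hypothetical ids; only the element and attribute names matter here.
entry = ET.Element("LexicalEntry", id="ex-en-eat-v")
ET.SubElement(entry, "Lemma", writtenForm="eat", partOfSpeech="v")

# The sense points at its frame(s) through the 'subcat' attribute ...
sense = ET.SubElement(
    entry,
    "Sense",
    id="ex-en-eat-v-01",
    synset="ex-en-00000001-v",
    subcat="ex-en-frame-8",
)

# ... and the frame itself is declared once, with an id.
ET.SubElement(
    entry,
    "SyntacticBehaviour",
    id="ex-en-frame-8",
    subcategorizationFrame="Somebody ----s something",
)

xml_text = ET.tostring(entry, encoding="unicode")
```

In the 1.0 layout the link goes the other way: the `SyntacticBehaviour` element lists sense ids in a `senses` attribute and the `Sense` has no `subcat`.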
Also try to make it more robust for WN1.5
Not worth the trouble.
Part of #38
@fcbond Sorry for the large PR. All the lexicons (including the new ones) now pass validation:
$ ./validate.sh 1.5
build/omw-1.5/omw-arb/omw-arb.xml - valid
build/omw-1.5/omw-bg/omw-bg.xml - valid
build/omw-1.5/omw-ca/omw-ca.xml - valid
build/omw-1.5/omw-cmn/omw-cmn.xml - valid
build/omw-1.5/omw-da/omw-da.xml - valid
build/omw-1.5/omw-el/omw-el.xml - valid
build/omw-1.5/omw-en15/omw-en15.xml - valid
build/omw-1.5/omw-en16/omw-en16.xml - valid
build/omw-1.5/omw-en171/omw-en171.xml - valid
build/omw-1.5/omw-en17/omw-en17.xml - valid
build/omw-1.5/omw-en20/omw-en20.xml - valid
build/omw-1.5/omw-en21/omw-en21.xml - valid
build/omw-1.5/omw-en30/omw-en30.xml - valid
build/omw-1.5/omw-en31/omw-en31.xml - valid
build/omw-1.5/omw-es/omw-es.xml - valid
build/omw-1.5/omw-eu/omw-eu.xml - valid
build/omw-1.5/omw-fi/omw-fi.xml - valid
build/omw-1.5/omw-fr/omw-fr.xml - valid
build/omw-1.5/omw-gl/omw-gl.xml - valid
build/omw-1.5/omw-he/omw-he.xml - valid
build/omw-1.5/omw-hr/omw-hr.xml - valid
build/omw-1.5/omw-id/omw-id.xml - valid
build/omw-1.5/omw-is/omw-is.xml - valid
build/omw-1.5/omw-it/omw-it.xml - valid
build/omw-1.5/omw-iwn/omw-iwn.xml - valid
build/omw-1.5/omw-ja/omw-ja.xml - valid
build/omw-1.5/omw-lt/omw-lt.xml - valid
build/omw-1.5/omw-nb/omw-nb.xml - valid
build/omw-1.5/omw-nl/omw-nl.xml - valid
build/omw-1.5/omw-nn/omw-nn.xml - valid
build/omw-1.5/omw-pl/omw-pl.xml - valid
build/omw-1.5/omw-pt/omw-pt.xml - valid
build/omw-1.5/omw-ro/omw-ro.xml - valid
build/omw-1.5/omw-sk/omw-sk.xml - valid
build/omw-1.5/omw-sl/omw-sl.xml - valid
build/omw-1.5/omw-sq/omw-sq.xml - valid
build/omw-1.5/omw-sv/omw-sv.xml - valid
build/omw-1.5/omw-th/omw-th.xml - valid
build/omw-1.5/omw-zsm/omw-zsm.xml - valid
Feel free to review the whole thing if you have time, but otherwise please just pay attention to the changes to |
I forgot to have the non-English lexicons require Also note that I now have the |
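@fcbond merging so we can move ahead. If you have tsv2lmf.py changes please do see what has changed here (similarly if you have changes to wndb2lmf.py). Let me know if you need help with any conflicts.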
Thanks. I have some changes to tsv2lmf.py, but have not had a chance to give them a final check yet, sorry!
On Fri, 24 Jan 2025 at 00:35, Michael Wayne Goodman wrote:
@fcbond merging so we can move ahead. If you have tsv2lmf.py changes please do see what has changed here (similarly if you have changes to wndb2lmf.py). Let me know if you need help with any conflicts.
--
Francis Bond <https://fcbond.github.io/>
See #38
Summary of changes:
- wndb.py module
- build_senseidx.py to rebuild an index.sense file from WordNet data/index/cntlist files
- build.sh to build all versions of WordNet
- index.sense files as appropriate for each WordNet (as discussed here)
- omw-en* lexicon (summary of changes, include original README)
- wns/en30/, wns/en31/, and wns/pwn/
- <Requires> element on non-English lexicons to point to omw-en30:1.5
- index.toml