Update wndb2lmf to build Pre-3.0 WordNets #42
Open: goodmami wants to merge 21 commits into main from gh-38-older-pwn-versions
Earlier versions of the Princeton WordNet did not include verb.Framestext, and the frame texts never changed across versions, so it's easier to just hard-code them than to load them from a file. The only potential issue I see is that this content is copied from the copyrighted WordNet documentation and there might not be enough attribution. I do link back to the documentation, so hopefully we're good there.
build_senseidx.py will create exact replicas of the index.sense files for WordNet 1.7 and higher versions. For WordNet 1.6, you can get close with the --use-adjposition option, but the counts for any sense key with an adjposition (a) or (p) after the head word of satellite adjective sense keys would need to be reset to 0. WordNet 1.5 did not have an index.sense file distributed with it.
- word = "Original_word" as in the WNDB data files
- respaced = "Original word" with spaces instead of _
- lemma = "original_word" as in the WNDB index files
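The three forms listed above can be illustrated with a minimal sketch. The function name `word_forms` is hypothetical and is not taken from the PR. It assumes that `respaced` is obtained by replacing underscores with spaces and `lemma` by lowercasing, and it does not handle syntactic markers such as `(a)` or `(p)`.

```python
def word_forms(word: str) -> tuple[str, str, str]:
    """Return (word, respaced, lemma) for a word as written in a WNDB data file.

    Hypothetical helper for illustration only; the actual code in the PR
    may derive these forms differently.
    """
    # Form with underscores replaced by spaces
    respaced = word.replace("_", " ")
    # Lowercased form, as used for lookup in the WNDB index files (assumption)
    lemma = word.lower()
    return word, respaced, lemma


word, respaced, lemma = word_forms("Original_word")
print(word, respaced, lemma, sep=" | ")
```

Keeping all three forms side by side makes it clearer which one each output field should use.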
The frames were being sent to Wn's LMF in the 1.0 format and weren't being written in the 'subcat' attribute on senses. This is now fixed.
Also try to make it more robust for WN1.5
Not worth the trouble.
Part of #38
This was linked to issues Jan 18, 2025
@fcbond Sorry for the large PR. All the lexicons (including the new ones) now pass validation:
$ ./validate.sh 1.5
build/omw-1.5/omw-arb/omw-arb.xml - valid
build/omw-1.5/omw-bg/omw-bg.xml - valid
build/omw-1.5/omw-ca/omw-ca.xml - valid
build/omw-1.5/omw-cmn/omw-cmn.xml - valid
build/omw-1.5/omw-da/omw-da.xml - valid
build/omw-1.5/omw-el/omw-el.xml - valid
build/omw-1.5/omw-en15/omw-en15.xml - valid
build/omw-1.5/omw-en16/omw-en16.xml - valid
build/omw-1.5/omw-en171/omw-en171.xml - valid
build/omw-1.5/omw-en17/omw-en17.xml - valid
build/omw-1.5/omw-en20/omw-en20.xml - valid
build/omw-1.5/omw-en21/omw-en21.xml - valid
build/omw-1.5/omw-en30/omw-en30.xml - valid
build/omw-1.5/omw-en31/omw-en31.xml - valid
build/omw-1.5/omw-es/omw-es.xml - valid
build/omw-1.5/omw-eu/omw-eu.xml - valid
build/omw-1.5/omw-fi/omw-fi.xml - valid
build/omw-1.5/omw-fr/omw-fr.xml - valid
build/omw-1.5/omw-gl/omw-gl.xml - valid
build/omw-1.5/omw-he/omw-he.xml - valid
build/omw-1.5/omw-hr/omw-hr.xml - valid
build/omw-1.5/omw-id/omw-id.xml - valid
build/omw-1.5/omw-is/omw-is.xml - valid
build/omw-1.5/omw-it/omw-it.xml - valid
build/omw-1.5/omw-iwn/omw-iwn.xml - valid
build/omw-1.5/omw-ja/omw-ja.xml - valid
build/omw-1.5/omw-lt/omw-lt.xml - valid
build/omw-1.5/omw-nb/omw-nb.xml - valid
build/omw-1.5/omw-nl/omw-nl.xml - valid
build/omw-1.5/omw-nn/omw-nn.xml - valid
build/omw-1.5/omw-pl/omw-pl.xml - valid
build/omw-1.5/omw-pt/omw-pt.xml - valid
build/omw-1.5/omw-ro/omw-ro.xml - valid
build/omw-1.5/omw-sk/omw-sk.xml - valid
build/omw-1.5/omw-sl/omw-sl.xml - valid
build/omw-1.5/omw-sq/omw-sq.xml - valid
build/omw-1.5/omw-sv/omw-sv.xml - valid
build/omw-1.5/omw-th/omw-th.xml - valid
build/omw-1.5/omw-zsm/omw-zsm.xml - valid
Feel free to review the whole thing if you have time, but otherwise please just pay attention to the changes to |
See #38
Summary of changes:
- wndb.py module
- build_senseidx.py to rebuild an index.sense file from WordNet data/index/cntlist files
- build.sh to build all versions of WordNet
- index.sense files as appropriate for each WordNet (as discussed here)
- omw-en* lexicon (summary of changes, include original README)
- wns/en30/, wns/en31/, and wns/pwn/