I'm trying to rebuild your data and noticed that ALL.dict.xml (which, as I understand it, contains all of the lemmas, glosses, and word senses used across the SemEval data) has entries such as the following:
<lexelt item="climate#n" pos="n" sence_count_wn="2" sense_count_corpus="1" word_example_count="5">
<sense gloss="the weather in some location averaged over some long period of time" id="climate%1:26:00::" sense_example_count="5" sense_freq="5" synset="climate clime">
</sense>
</lexelt>
Here climate#n is the lemma and POS. The entry says sence_count_wn=2, but there is only one sense inside the lexelt. Shouldn't both sense entries be listed there? My assumption is that each lexelt should list every WordNet sense and gloss of the lemma given in item.
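A minimal sketch of one way to check for this mismatch, using only the standard library and the entry quoted above. The helper name find_sense_count_mismatches is hypothetical, and the sketch assumes every lexelt carries a sence_count_wn attribute as in the sample:

```python
import xml.etree.ElementTree as ET

# Entry copied verbatim from the ALL.dict.xml excerpt quoted in this issue.
SAMPLE = """<lexelt item="climate#n" pos="n" sence_count_wn="2" sense_count_corpus="1" word_example_count="5">
<sense gloss="the weather in some location averaged over some long period of time" id="climate%1:26:00::" sense_example_count="5" sense_freq="5" synset="climate clime">
</sense>
</lexelt>"""


def find_sense_count_mismatches(lexelts):
    """Yield (item, declared, listed) for each lexelt whose declared
    WordNet sense count differs from its number of <sense> children.

    Hypothetical helper; assumes the 'sence_count_wn' attribute
    (spelled as in the file) is present on every lexelt.
    """
    for lexelt in lexelts:
        declared = int(lexelt.get("sence_count_wn"))
        listed = len(lexelt.findall("sense"))
        if declared != listed:
            yield lexelt.get("item"), declared, listed


mismatches = list(find_sense_count_mismatches([ET.fromstring(SAMPLE)]))
print(mismatches)  # [('climate#n', 2, 1)]
```

To run the same check over the whole file, something like ET.parse("ALL.dict.xml").getroot().iter("lexelt") could be passed in instead of the single sample, assuming the file parses as well-formed XML.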
I also noticed that when I look up this word in NLTK's WordNet interface (which I see you also use), the definition I get for climate%1:26:00:: differs from your gloss:
In [1]: from nltk.corpus import wordnet as wn
In [2]: wn.synset_from_sense_key('climate%1:26:00::').definition()
Out[2]: 'the prevailing psychological state'
## whereas your sense gloss seems to correspond to climate%1:26:01::
In [11]: wn.synset_from_sense_key('climate%1:26:01::').definition()
Out[11]: 'the weather in some location averaged over some long period of time'
In [13]: wn.get_version()
Out[13]: '3.0'