full set of word senses missing in dictionary files?  #2

@yakazimir

I'm trying to rebuild your data and noticed that ALL.dict.xml (which, as I understand it, contains all of the lemmas, glosses, and word senses used across the SemEval data) has entries such as the following:

<lexelt item="climate#n" pos="n" sence_count_wn="2" sense_count_corpus="1" word_example_count="5">
 <sense gloss="the weather in some location averaged over some long period of time" id="climate%1:26:00::" sense_example_count="5" sense_freq="5" synset="climate clime">
 </sense>
</lexelt>

Here climate#n is the lemma and POS. The entry says sence_count_wn="2", yet there is only one sense inside the lexelt. Shouldn't both sense entries be listed? My assumption is that each lexelt should list every WordNet sense and gloss of the lemma given in item.
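To see how widespread this is, the declared count can be compared with the number of sense children per lexelt. The following is only a minimal sketch: it assumes the attribute and tag names are exactly as in the entry quoted above (including the sence_count_wn spelling), and count_mismatches is a hypothetical helper name, not something from the repository.

```python
import xml.etree.ElementTree as ET

# The entry quoted above, copied verbatim (including the "sence_count_wn" spelling).
SAMPLE = """<lexelt item="climate#n" pos="n" sence_count_wn="2" sense_count_corpus="1" word_example_count="5">
 <sense gloss="the weather in some location averaged over some long period of time" id="climate%1:26:00::" sense_example_count="5" sense_freq="5" synset="climate clime">
 </sense>
</lexelt>"""


def count_mismatches(root):
    """Return (item, declared_wn_count, listed_sense_count) for each
    lexelt whose declared WordNet sense count differs from the number
    of <sense> children it actually contains."""
    mismatches = []
    # iter() also visits the root itself, so a lone <lexelt> works too.
    for lexelt in root.iter("lexelt"):
        declared = int(lexelt.get("sence_count_wn"))
        listed = len(lexelt.findall("sense"))
        if declared != listed:
            mismatches.append((lexelt.get("item"), declared, listed))
    return mismatches


print(count_mismatches(ET.fromstring(SAMPLE)))
```

For the full dictionary, the same function could presumably be called on ET.parse("ALL.dict.xml").getroot(), assuming the file is well-formed XML with lexelt elements somewhere under a single root.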

I also noticed that when I look up this sense key in NLTK's WordNet (which I see you also use), I get a different definition for climate%1:26:00:: than the gloss in your file:

In [1]: from nltk.corpus import wordnet as wn 
In [2]: wn.synset_from_sense_key('climate%1:26:00::').definition()                         
'the prevailing psychological state'

## whereas your sense gloss seems to correspond to climate%1:26:01::
In [11]: wn.synset_from_sense_key('climate%1:26:01::').definition()                        
Out[11]: 'the weather in some location averaged over some long period of time'

In [13]: wn.get_version()                                                                   
Out[13]: '3.0'
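The same comparison could be run over every sense in the file. This is a rough sketch under stated assumptions: gloss_mismatches and wordnet_definition are hypothetical helper names, and the REPORTED dictionary is a stand-in that holds only the two definitions shown in the session above, so the example runs without NLTK. With NLTK and its wordnet corpus installed, the default lookup calls wn.synset_from_sense_key as in the session.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<lexelt item="climate#n" pos="n" sence_count_wn="2" sense_count_corpus="1" word_example_count="5">
 <sense gloss="the weather in some location averaged over some long period of time" id="climate%1:26:00::" sense_example_count="5" sense_freq="5" synset="climate clime">
 </sense>
</lexelt>"""

# Stand-in lookup table: only the two definitions reported in the session above.
REPORTED = {
    "climate%1:26:00::": "the prevailing psychological state",
    "climate%1:26:01::": "the weather in some location averaged over some long period of time",
}


def wordnet_definition(sense_key):
    """Look up a sense key in NLTK's WordNet (needs nltk and its wordnet corpus)."""
    from nltk.corpus import wordnet as wn
    return wn.synset_from_sense_key(sense_key).definition()


def gloss_mismatches(root, lookup=wordnet_definition):
    """Return (sense_key, file_gloss, looked_up_gloss) for each <sense>
    whose gloss attribute differs from what the lookup returns."""
    mismatches = []
    for sense in root.iter("sense"):
        key, gloss = sense.get("id"), sense.get("gloss")
        expected = lookup(key)
        if gloss != expected:
            mismatches.append((key, gloss, expected))
    return mismatches


# Uses the stand-in table here; drop the lookup argument to query NLTK instead.
print(gloss_mismatches(ET.fromstring(SAMPLE), lookup=REPORTED.get))
```

On the quoted entry this flags climate%1:26:00:: because the gloss in the file matches the definition reported for climate%1:26:01:: instead.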
