Missing relations #40
Thanks for reporting; I'm looking into this now. It has to do with the fix we settled on for long-distance relationships (i.e. Secondary Def --> Definition --> Term), which was to mark only the final tag in the relationship as the root, so that you would have relationships in the .deft files like this, where the Term is the root:
I take it back: on inspection, this is actually a problem with overlapping relationships. In this case, there was a referential definition ("this") that "refers-to" the definition ("there is a smallest unit that cannot be further subdivided") and also "indirect-defines" the term ("the atom"). Someone brought this up in the forums yesterday and we're aware of the problem. I'm working on a fix right now that handles this scenario without undermining our existing data format.
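The overlap arises because each token row carries a single ROOT_ID and a single RELATION, while "this" needs two links: Refers-To the definition and Indirect-Defines the term. One hypothetical way to keep that row format unchanged is to repeat the sentence once per overlapping link. The sketch below is an illustration under that assumption, not the project's actual fix: it assumes each token is held in memory as (TOKEN, TAG_ID, list of (ROOT_ID, RELATION) links), and `expand_overlaps` is an invented name.

```python
from typing import List, Tuple

Link = Tuple[str, str]           # (ROOT_ID, RELATION)
Row = Tuple[str, str, str, str]  # (TOKEN, ROOT_ID, TAG_ID, RELATION)


def expand_overlaps(tokens: List[Tuple[str, str, List[Link]]]) -> List[List[Row]]:
    """Split a sentence whose tokens carry several links into repeated copies.

    Hypothetical scheme: copy k gives every multi-link token its k-th link,
    and tokens with a single link keep that link in every copy.
    """
    n_copies = max(len(links) for _, _, links in tokens)
    copies: List[List[Row]] = []
    for k in range(n_copies):
        rows: List[Row] = []
        for text, tag_id, links in tokens:
            # Fall back to the first link when this token has fewer than k+1 links.
            root_id, relation = links[k] if k < len(links) else links[0]
            rows.append((text, root_id, tag_id, relation))
        copies.append(rows)
    return copies


if __name__ == "__main__":
    sentence = [
        ("Democritus", "-1", [("-1", "0")]),
        ("called", "-1", [("-1", "0")]),
        ("this", "T194", [("T106", "Refers-To"), ("T105", "Indirect-Defines")]),
        ("the", "T105", [("0", "0")]),
        ("atom", "T105", [("0", "0")]),
    ]
    for copy in expand_overlaps(sentence):
        print(copy[2])
```

Run on the "Democritus called this the atom" sentence, the first copy keeps the Refers-To link for "this" and the second copy carries the Indirect-Defines link, so neither copy needs more than one ROOT_ID per token.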
…rlaps are now in repeated sentences, following same format for overlapping token tags. #40
Hi, there are still problems with missing relations in the train and dev sets (I believe I have the current version of the data; please check):
And here are a few more of the remaining examples:
I found 266 examples (context windows) in the train and dev sets that contain tokens with ROOT_ID "0" and some TAG_ID, say TXXX, but no token in the same example has ROOT_ID TXXX.
For example, here is a window with such T105 tokens:
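The check described above can be sketched as follows. This assumes each context window is already loaded as a list of (TOKEN, ROOT_ID, TAG_ID, RELATION) string tuples, as in the table below; loading the windows from the .deft files is left out, and `orphan_roots` / `count_windows_with_orphans` are hypothetical names.

```python
from typing import Iterable, List, Set, Tuple

Row = Tuple[str, str, str, str]  # (TOKEN, ROOT_ID, TAG_ID, RELATION), all strings


def orphan_roots(window: Iterable[Row]) -> Set[str]:
    """Return TAG_IDs marked as roots (ROOT_ID "0") that no token in the window points to."""
    roots: Set[str] = set()
    referenced: Set[str] = set()
    for _token, root_id, tag_id, _relation in window:
        if root_id == "0" and tag_id != "-1":
            roots.add(tag_id)        # this tag is the root of a relation
        elif root_id not in ("0", "-1"):
            referenced.add(root_id)  # another tag points at root_id
    return roots - referenced


def count_windows_with_orphans(windows: Iterable[List[Row]]) -> int:
    """Count context windows containing at least one unreferenced root tag."""
    return sum(1 for window in windows if orphan_roots(window))
```

This only reproduces the criterion stated in the comment; whether every unreferenced root is an annotation error depends on the tagging scheme.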
data/source_txt/t3_physics_2_101.deft
TOKEN ROOT_ID TAG_ID RELATION
3161 -1 -1 0
. -1 -1 0
Another -1 -1 0
is -1 -1 0
what -1 -1 0
Democritus -1 -1 0
in -1 -1 0
particular -1 -1 0
believed -1 -1 0
— -1 -1 0
that -1 -1 0
there 0 T106 0
is 0 T106 0
a 0 T106 0
smallest 0 T106 0
unit 0 T106 0
that 0 T106 0
can 0 T106 0
not 0 T106 0
be 0 T106 0
further 0 T106 0
subdivided 0 T106 0
. -1 -1 0
Democritus -1 -1 0
called -1 -1 0
this T106 T194 Refers-To
the 0 T105 0
atom 0 T105 0
. -1 -1 0
We -1 -1 0
now -1 -1 0
know -1 -1 0
that -1 -1 0
atoms -1 -1 0
themselves -1 -1 0
can -1 -1 0
be -1 -1 0
subdivided -1 -1 0
, -1 -1 0
but -1 -1 0
their -1 -1 0
identity -1 -1 0
is -1 -1 0
destroyed -1 -1 0
in -1 -1 0
the -1 -1 0
process -1 -1 0
, -1 -1 0
so -1 -1 0
the -1 -1 0
Greeks -1 -1 0
were -1 -1 0
correct -1 -1 0
in -1 -1 0
a -1 -1 0
respect -1 -1 0
. -1 -1 0
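To make the missing link easier to see, the rows can be grouped into tagged spans and inverted into a map of incoming links. This hypothetical helper assumes whitespace-separated rows in the TOKEN/ROOT_ID/TAG_ID/RELATION layout shown in this comment (which may differ from the raw .deft column layout), and uses an excerpt of the rows above.

```python
from itertools import groupby
from typing import Dict, List, Tuple

Span = Tuple[str, str, str, str]  # (TAG_ID, ROOT_ID, RELATION, span text)

# Excerpt of the rows above, in the same whitespace-separated layout.
EXAMPLE = """TOKEN ROOT_ID TAG_ID RELATION
that -1 -1 0
there 0 T106 0
is 0 T106 0
a 0 T106 0
smallest 0 T106 0
unit 0 T106 0
that 0 T106 0
can 0 T106 0
not 0 T106 0
be 0 T106 0
further 0 T106 0
subdivided 0 T106 0
. -1 -1 0
Democritus -1 -1 0
called -1 -1 0
this T106 T194 Refers-To
the 0 T105 0
atom 0 T105 0
. -1 -1 0"""


def tagged_spans(table: str) -> List[Span]:
    """Group consecutive tokens sharing TAG_ID, ROOT_ID and RELATION; skip untagged (-1) tokens."""
    rows = [line.split() for line in table.strip().splitlines()[1:]]  # skip the header line
    spans: List[Span] = []
    for (tag_id, root_id, relation), group in groupby(rows, key=lambda r: (r[2], r[1], r[3])):
        if tag_id != "-1":
            spans.append((tag_id, root_id, relation, " ".join(r[0] for r in group)))
    return spans


def incoming_links(spans: List[Span]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each TAG_ID to the (source TAG_ID, RELATION) pairs that point at it."""
    links: Dict[str, List[Tuple[str, str]]] = {tag_id: [] for tag_id, _, _, _ in spans}
    for tag_id, root_id, relation, _text in spans:
        if root_id in links:
            links[root_id].append((tag_id, relation))
    return links


if __name__ == "__main__":
    for tag_id, sources in incoming_links(tagged_spans(EXAMPLE)).items():
        print(tag_id, sources)
```

Here T106 (the definition) receives the Refers-To link from T194 ("this"), while T105 ("the atom") has no incoming relation at all, which matches the missing indirect-defines link discussed earlier in the thread.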