Missing relations #40

Open
davletov-aa opened this issue Feb 19, 2020 · 4 comments

Comments

@davletov-aa

I found 266 examples (context windows) in the train and dev sets that contain tokens with ROOT_ID "0" and some TAG_ID, say TXXX, yet no token in the same example has ROOT_ID TXXX. In other words, TXXX is marked as a root, but nothing relates to it.

For example, see the T105 tokens below:

data/source_txt/t3_physics_2_101.deft
TOKEN ROOT_ID TAG_ID RELATION
3161 -1 -1 0
. -1 -1 0
Another -1 -1 0
is -1 -1 0
what -1 -1 0
Democritus -1 -1 0
in -1 -1 0
particular -1 -1 0
believed -1 -1 0
— -1 -1 0
that -1 -1 0
there 0 T106 0
is 0 T106 0
a 0 T106 0
smallest 0 T106 0
unit 0 T106 0
that 0 T106 0
can 0 T106 0
not 0 T106 0
be 0 T106 0
further 0 T106 0
subdivided 0 T106 0
. -1 -1 0
Democritus -1 -1 0
called -1 -1 0
this T106 T194 Refers-To
the 0 T105 0
atom 0 T105 0

. -1 -1 0
We -1 -1 0
now -1 -1 0
know -1 -1 0
that -1 -1 0
atoms -1 -1 0
themselves -1 -1 0
can -1 -1 0
be -1 -1 0
subdivided -1 -1 0
, -1 -1 0
but -1 -1 0
their -1 -1 0
identity -1 -1 0
is -1 -1 0
destroyed -1 -1 0
in -1 -1 0
the -1 -1 0
process -1 -1 0
, -1 -1 0
so -1 -1 0
the -1 -1 0
Greeks -1 -1 0
were -1 -1 0
correct -1 -1 0
in -1 -1 0
a -1 -1 0
respect -1 -1 0
. -1 -1 0
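A minimal sketch of the check described above, assuming the simplified four-column layout (TOKEN ROOT_ID TAG_ID RELATION) used in this excerpt rather than the full set of .deft columns. `parse_rows` and `find_missing_relations` are hypothetical helpers, and treating each example as an independent unit is an assumption about how the 266 cases were counted.

```python
from typing import List, Set, Tuple

# (token, root_id, tag_id, relation), mirroring the simplified excerpt above.
# Assumption: real .deft files carry more columns, so indices would need adjusting.
Row = Tuple[str, str, str, str]


def parse_rows(text: str) -> List[Row]:
    """Parse whitespace-separated four-column lines, skipping the header,
    blank lines, and anything that does not have exactly four fields."""
    rows: List[Row] = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4 or parts[0] == "TOKEN":
            continue
        rows.append((parts[0], parts[1], parts[2], parts[3]))
    return rows


def find_missing_relations(example: List[Row]) -> Set[str]:
    """Return tag IDs marked as roots (ROOT_ID "0") that no token in the
    same example points back to through its ROOT_ID."""
    roots = {tag_id for _tok, root_id, tag_id, _rel in example if root_id == "0"}
    referenced = {root_id for _tok, root_id, _tag, _rel in example
                  if root_id not in ("0", "-1")}
    return roots - referenced


# Abbreviated rows from the example above.
sample = """TOKEN ROOT_ID TAG_ID RELATION
there 0 T106 0
subdivided 0 T106 0
called -1 -1 0
this T106 T194 Refers-To
the 0 T105 0
atom 0 T105 0
"""
print(find_missing_relations(parse_rows(sample)))  # {'T105'}
```

T106 is reached through "this" (Refers-To), but nothing points at T105, so it is the one reported.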

@sashaspala
Collaborator

Thanks for reporting; I'm looking into this now. It has to do with the fix we settled on for long-distance relationships (i.e., Secondary Def --> Definition --> Term), which was to mark only the final tag in the chain as the root. The .deft files therefore contain relationships like the following, where the Term is the root:
(Secondary Def, T1, T2, Supplements)
(Definition, T2, T3, Direct Defines)
(Term, T3, 0, 0)
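Reading each tuple above as (tag, tag_id, root_id, relation), a chain can be resolved to its root by following ROOT_ID links until one is "0". This is a hedged sketch; `path_to_root` is a hypothetical helper, not part of the dataset's tooling.

```python
from typing import Dict, List

# The three relationships from the comment above, read as
# (tag, tag_id, root_id, relation).
chain = [
    ("Secondary Def", "T1", "T2", "Supplements"),
    ("Definition", "T2", "T3", "Direct Defines"),
    ("Term", "T3", "0", "0"),
]


def path_to_root(tag_id: str, parent: Dict[str, str]) -> List[str]:
    """Follow ROOT_ID links from `tag_id` until reaching a tag whose ROOT_ID
    is "0" (or a tag not present in `parent`, which also stops the walk)."""
    path = [tag_id]
    while parent.get(path[-1], "0") != "0":
        nxt = parent[path[-1]]
        if nxt in path:
            raise ValueError(f"cycle in ROOT_ID links at {nxt}")
        path.append(nxt)
    return path


parent = {tag_id: root_id for _tag, tag_id, root_id, _rel in chain}
print(path_to_root("T1", parent))  # ['T1', 'T2', 'T3']
```

Because only the Term has ROOT_ID "0", every tag in the chain resolves to T3.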

@sashaspala
Collaborator

I take it back: on inspection, this is actually a problem with overlapping relationships. In this case, a referential definition (this) "refers-to" the definition (there is a smallest unit that cannot be further subdivided) and also "indirect-defines" the term (the atom). Someone brought this up in the forums yesterday, and we're aware of the problem. I'm working on a fix that handles this scenario without undermining our existing data format.

sashaspala pushed a commit that referenced this issue Feb 29, 2020
…rlaps are now in repeated sentences, following same format for overlapping token tags. #40
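The commit message above is truncated, so the exact format isn't visible here. One possible reading of "are now in repeated sentences" is sketched below: the sentence is emitted once per relation, and in each copy the referential token carries a single ROOT_ID/RELATION pair. The helper name `expand_overlaps` and the "Indirect-Defines" label spelling are assumptions.

```python
from typing import List, Tuple

Row = Tuple[str, str, str, str]  # (token, root_id, tag_id, relation)


def expand_overlaps(rows: List[Row], tag_id: str,
                    relations: List[Tuple[str, str]]) -> List[List[Row]]:
    """Return one copy of the sentence per (root_id, relation) pair. Only the
    tokens tagged `tag_id` are rewritten; every other row is copied unchanged."""
    return [
        [(tok, root_id, tid, relation) if tid == tag_id else (tok, rid, tid, rel)
         for tok, rid, tid, rel in rows]
        for root_id, relation in relations
    ]


sentence = [
    ("called", "-1", "-1", "0"),
    ("this", "T106", "T194", "Refers-To"),
    ("the", "0", "T105", "0"),
    ("atom", "0", "T105", "0"),
]
# "this" (T194) relates to both the definition (T106) and the term (T105).
overlaps = [("T106", "Refers-To"), ("T105", "Indirect-Defines")]
copies = expand_overlaps(sentence, "T194", overlaps)
print(copies[0][1])  # ('this', 'T106', 'T194', 'Refers-To')
print(copies[1][1])  # ('this', 'T105', 'T194', 'Indirect-Defines')
```

One consequence worth keeping in mind: if the copies are checked independently and both target tags appear in each copy, whichever root is not targeted in a given copy will look unreferenced.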
@davletov-aa
Author

Hi, there are still missing relations in the train and dev sets (I believe I have the current version of the data; please check):
{'data/source_txt/t3_physics_2_101.deft': {'T105',
'T109',
'T134',
'T145',
'T31'},
'data/source_txt/t6_sociology_1_101.deft': {'T125',
'T142',
'T58'},
'data/source_txt/t1_biology_1_505.deft': {'T189',
'T195',
'T241',
'T246',
'T282',
'T283',
'T72',
'T74',
'T86'},
'data/source_txt/t2_history_0_0.deft': {'T151',
'T162',
'T47',
'T81',
'T95'},
'data/source_txt/t6_sociology_0_101.deft': {'T76', 'T98'},
'data/source_txt/t2_history_2_101.deft': {'T111', 'T131'},
'data/source_txt/t7_government_1_101.deft': {'T103', 'T116'},
'data/source_txt/t7_government_1_404.deft': {'T13'},
'data/source_txt/t1_biology_0_303.deft': {'T129',
'T131',
'T176',
'T26',
'T296',
'T79',
'T82',
'T9',
'T94'},
'data/source_txt/t1_biology_1_404.deft': {'T113',
'T173',
'T194',
'T195',
'T223',
'T231',
'T36',
'T7'},
'data/source_txt/t5_economic_1_0.deft': {'T103',
'T140',
'T154',
'T50',
'T73',
'T89',
'T95'},
'data/source_txt/t1_biology_2_404.deft': {'T113',
'T150',
'T167',
'T205',
'T228',
'T295',
'T299',
'T42'},
'data/source_txt/t4_psychology_2_0.deft': {'T127',
'T204',
'T209',
'T232',
'T38'},
'data/source_txt/t3_physics_0_101.deft': {'T157', 'T174', 'T39'},
'data/source_txt/t7_government_0_303.deft': {'T20'},
'data/source_txt/t5_economic_0_202.deft': {'T137'},
'data/source_txt/t5_economic_1_202.deft': {'T47'},
'data/source_txt/t4_psychology_0_303.deft': {'T17'},
'data/source_txt/t7_government_1_0.deft': {'T16'},
'data/source_txt/t1_biology_2_606.deft': {'T207',
'T259',
'T28',
'T37',
'T59',
'T83'},
'data/source_txt/t4_psychology_1_0.deft': {'T123',
'T165',
'T200',
'T216',
'T221',
'T32'},
'data/source_txt/t2_history_2_0.deft': {'T146',
'T151',
'T179',
'T25',
'T53',
'T76'},
'data/source_txt/t7_government_1_303.deft': {'T13'},
'data/source_txt/t1_biology_1_303.deft': {'T105', 'T15', 'T86'},
'data/source_txt/t7_government_0_202.deft': {'T31', 'T35'},
'data/source_txt/t1_biology_0_101.deft': {'T131', 'T261', 'T82'},
'data/source_txt/t4_psychology_2_101.deft': {'T198', 'T31', 'T7'},
'data/source_txt/t4_psychology_0_202.deft': {'T102',
'T21',
'T35',
'T36',
'T83'},
'data/source_txt/t5_economic_0_101.deft': {'T1',
'T180',
'T7',
'T86'},
'data/source_txt/t2_history_1_0.deft': {'T110',
'T158',
'T23',
'T51',
'T69',
'T7'},
'data/source_txt/t1_biology_2_505.deft': {'T204', 'T229', 'T36'},
'data/source_txt/t6_sociology_0_0.deft': {'T147',
'T40',
'T54',
'T82'},
'data/source_txt/t1_biology_2_303.deft': {'T227', 'T36', 'T61'},
'data/source_txt/t1_biology_1_0.deft': {'T143',
'T177',
'T238',
'T27',
'T47',
'T80'},
'data/source_txt/t1_biology_0_0.deft': {'T103',
'T105',
'T109',
'T139',
'T151',
'T193',
'T211'},
'data/source_txt/t7_government_1_202.deft': {'T88', 'T97'},
'data/source_txt/t1_biology_2_101.deft': {'T127',
'T236',
'T243',
'T257',
'T261'},
'data/source_txt/t2_history_0_101.deft': {'T9', 'T95'},
'data/source_txt/t4_psychology_0_101.deft': {'T228',
'T248',
'T272',
'T28'},
'data/source_txt/t3_physics_1_101.deft': {'T113',
'T143',
'T212',
'T31',
'T74',
'T98'},
'data/source_txt/t3_physics_1_0.deft': {'T123',
'T126',
'T135',
'T152',
'T34',
'T43'},
'data/source_txt/t1_biology_0_202.deft': {'T101',
'T120',
'T151',
'T159',
'T169',
'T281',
'T292',
'T298',
'T314',
'T51',
'T52',
'T56',
'T6',
'T64',
'T70',
'T85'},
'data/source_txt/t5_economic_2_0.deft': {'T105',
'T168',
'T171',
'T63',
'T77',
'T89'},
'data/source_txt/t7_government_2_0.deft': {'T20',
'T31',
'T36',
'T6'},
'data/source_txt/t1_biology_1_606.deft': {'T127',
'T136',
'T18',
'T213',
'T230',
'T28',
'T89',
'T94',
'T99'},
'data/source_txt/t4_psychology_2_202.deft': {'T38'},
'data/source_txt/t7_government_2_202.deft': {'T31'},
'data/source_txt/t5_economic_2_101.deft': {'T65'},
'data/source_txt/t7_government_0_404.deft': {'T32', 'T36', 'T43'},
'data/source_txt/t1_biology_1_101.deft': {'T100',
'T180',
'T188',
'T254',
'T54',
'T55'},
'data/source_txt/t6_sociology_2_101.deft': {'T31'},
'data/source_txt/t3_physics_2_0.deft': {'T135',
'T182',
'T19',
'T8',
'T96'},
'data/source_txt/t2_history_1_101.deft': {'T72', 'T81'},
'data/source_txt/t1_biology_0_606.deft': {'T253', 'T3', 'T85'},
'data/source_txt/t1_biology_0_404.deft': {'T15',
'T159',
'T232',
'T246',
'T288',
'T346',
'T38',
'T62',
'T77',
'T9'},
'data/source_txt/t5_economic_0_0.deft': {'T145'},
'data/source_txt/t5_economic_2_202.deft': {'T140', 'T2', 'T93'},
'data/source_txt/t4_psychology_0_0.deft': {'T212',
'T4',
'T72',
'T78',
'T82'},
'data/source_txt/t1_biology_2_0.deft': {'T39',
'T59',
'T72',
'T98'},
'data/source_txt/t4_psychology_1_101.deft': {'T157',
'T178',
'T179',
'T189',
'T210'},
'data/source_txt/t1_biology_1_202.deft': {'T116',
'T16',
'T163',
'T172',
'T271',
'T30',
'T40',
'T57'},
'data/source_txt/t4_psychology_1_202.deft': {'T113',
'T155',
'T28',
'T4',
'T44'},
'data/source_txt/t7_government_0_101.deft': {'T72'},
'data/source_txt/t1_biology_2_202.deft': {'T194',
'T203',
'T230',
'T263',
'T77'},
'data/source_txt/t3_physics_0_0.deft': {'T29'},
'data/source_txt/t7_government_2_101.deft': {'T31'},
'data/source_txt/t7_government_2_303.deft': {'T7', 'T9'}}
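A report shaped like the one above can be produced by running a per-example check over every file and collecting the unreferenced root IDs per path. This sketch assumes the files have already been split into examples of simplified (token, root_id, tag_id, relation) rows; `build_report` and the `toy.deft` entry are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Row = Tuple[str, str, str, str]  # (token, root_id, tag_id, relation)
Example = List[Row]


def missing_in_example(example: Example) -> Set[str]:
    """Root tag IDs (ROOT_ID "0") that nothing in the example points to."""
    roots = {tag for _tok, root, tag, _rel in example if root == "0"}
    referenced = {root for _tok, root, _tag, _rel in example
                  if root not in ("0", "-1")}
    return roots - referenced


def build_report(corpus: Dict[str, List[Example]]) -> Tuple[Dict[str, Set[str]], int]:
    """Return ({path: unreferenced root IDs}, number of affected examples).
    Files without problems are left out of the mapping."""
    report: Dict[str, Set[str]] = {}
    affected = 0
    for path, examples in corpus.items():
        for example in examples:
            missing = missing_in_example(example)
            if missing:
                affected += 1
                report.setdefault(path, set()).update(missing)
    return report, affected


corpus = {
    "data/source_txt/t3_physics_2_101.deft": [
        # Abbreviated rows from the T105 example earlier in this issue.
        [("this", "T106", "T194", "Refers-To"), ("there", "0", "T106", "0"),
         ("the", "0", "T105", "0"), ("atom", "0", "T105", "0")],
    ],
    # Hypothetical file whose only example is fully linked.
    "toy.deft": [[("x", "0", "T1", "0"), ("y", "T1", "T2", "Refers-To")]],
}
report, affected = build_report(corpus)
print(report, affected)  # {'data/source_txt/t3_physics_2_101.deft': {'T105'}} 1
```

Counting affected examples separately from tag IDs matters because one example can contribute several IDs and the same ID can be counted once per file.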

sashaspala pushed a commit that referenced this issue Mar 11, 2020
@davletov-aa
Author

davletov-aa commented Mar 11, 2020

And here are the few remaining examples:
{'data/source_txt/t1_biology_1_505.deft': {'T190',
'T195',
'T243',
'T246',
'T282',
'T283'},
'data/source_txt/t1_biology_0_303.deft': {'T129',
'T131',
'T176',
'T296',
'T78',
'T94'},
'data/source_txt/t1_biology_0_101.deft': {'T261'},
'data/source_txt/t4_psychology_0_101.deft': {'T228', 'T248'},
'data/source_txt/t5_economic_2_0.deft': {'T107', 'T78'}}
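To see what changed between the two reports, a per-file set difference is enough. `diff_reports` is a hypothetical helper. An ID that appears only in the later report could be a new case or a renumbered tag; the diff alone cannot tell which.

```python
from typing import Dict, Set

Report = Dict[str, Set[str]]


def diff_reports(before: Report, after: Report) -> Dict[str, Dict[str, Set[str]]]:
    """For each file, split tag IDs into those no longer reported ("gone"),
    those reported both times ("still"), and those only in the later run."""
    result: Dict[str, Dict[str, Set[str]]] = {}
    for path in sorted(before.keys() | after.keys()):
        old, new = before.get(path, set()), after.get(path, set())
        result[path] = {"gone": old - new, "still": old & new, "only_after": new - old}
    return result


# Two entries copied from the reports in this thread.
before = {
    "data/source_txt/t1_biology_1_505.deft": {"T189", "T195", "T241", "T246", "T282",
                                              "T283", "T72", "T74", "T86"},
    "data/source_txt/t3_physics_2_101.deft": {"T105", "T109", "T134", "T145", "T31"},
}
after = {
    "data/source_txt/t1_biology_1_505.deft": {"T190", "T195", "T243", "T246",
                                              "T282", "T283"},
}
for path, parts in diff_reports(before, after).items():
    print(path, {k: sorted(v) for k, v in parts.items()})
```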
