Schema: NER restriction #85

Open
matyaskopp opened this issue May 25, 2021 · 3 comments

@matyaskopp
Collaborator

The current schema allows this situation:

<name type="LOC">
  <kinesic type="applause">
    <desc>Oklaski</desc>
  </kinesic>
</name>

<define name="ner_tokens">
  <oneOrMore>
    <choice>
      <ref name="word"/>
      <ref name="punct"/>
      <ref name="ner"/>
      <ref name="comment"/>
    </choice>
  </oneOrMore>
</define>

The schema should be restricted in this way:

  • Every named entity should contain one or more (oneOrMore) named entities or words,
  • and zero or more (zeroOrMore) comments.

Related issue: #84

@matyaskopp matyaskopp added the enhancement New feature or request label May 25, 2021
@TomazErjavec
Collaborator

I agree that this should be restricted, but:

  • Did you actually find such cases in the corpora? At least for the example that you gave, as far as I can see, it doesn't exist in the PL corpus. I would be surprised if it did exist, as incidents were excluded from annotation, so the system would in fact be annotating an empty string as NER.
  • It will make the content model more complicated, in fact I'm not really sure how to implement such a restriction, would have to study RelaxNG first.

Not saying I won't do it, just maybe not straight away.

@matyaskopp
Collaborator Author

> Did you actually find such cases in the corpora?

No, I constructed it based on a misunderstood example from #84.

> It will make the content model more complicated, in fact I'm not really sure how to implement such a restriction, would have to study RelaxNG first.

I don't know either. (CZ NER has already made the schema quite complicated...)

> Not saying I won't do it, just maybe not straight away.

OK, let's keep this issue for the next releases.

@TomazErjavec TomazErjavec added this to the next milestone May 25, 2021
@TomazErjavec TomazErjavec modified the milestones: next, ParlaMint 3.1 release Jun 1, 2023
@TomazErjavec
Collaborator

This is obviously "future"....
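
For whenever this is picked up: one possible way to express the restriction proposed in the first comment is a RELAX NG interleave pattern. The following is an untested sketch, not the project's schema. It reuses the pattern names word, punct, ner and comment from the existing ner_tokens definition. Keeping punct as optional content is an assumption (the proposal does not mention it). The sketch also assumes that the elements matched by word and ner have names distinct from those matched by punct and comment, which RELAX NG requires of interleave operands.

```xml
<define name="ner_tokens">
  <interleave>
    <!-- Required content: at least one word or nested named entity -->
    <oneOrMore>
      <choice>
        <ref name="word"/>
        <ref name="ner"/>
      </choice>
    </oneOrMore>
    <!-- Optional content, allowed anywhere between the required items.
         Including punct here is an assumption, not part of the proposal. -->
    <zeroOrMore>
      <choice>
        <ref name="punct"/>
        <ref name="comment"/>
      </choice>
    </zeroOrMore>
  </interleave>
</define>
```

Under this content model, and assuming kinesic is matched by comment, the <name type="LOC"> example from the first comment would be rejected, because it contains only a <kinesic> and no word or nested named entity.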
