BE feedback #496
First the easy ones:
See the next comments for the more complex issues. |
I am not satisfied with this encoding, but I do not see a satisfactory alternative without altering the ParlaMint scheme. |
We have tried to do this as much as possible now. When categories we need are missing from the common taxonomy, we add a -BE file with the supplementary categories. In some cases, this could be merged into the common ontology.
|
Does this mean that you are unsure if it is an utterance
yes, taking CZ taxonomy is ok. But for UD-SYN taxonomy, it is better to use this: https://github.com/clarin-eric/ParlaMint/blob/main/Data/Taxonomies/ParlaMint-taxonomy-UD-SYN.ana.xml Lines 589 to 594 in 5deaeed
As for the
Yes, the taxonomy is limited, but it is as it is defined in ParlaMint.
If you want to extend this taxonomy, I guess you should create a new one, as you did. But if a minister is speaking, then you should use both taxonomies (I hope this will not break @TomazErjavec's script): <u ana="#regular #minister" ...> But remember that this is a categorization of speakers, so if someone holds a minister position, it does not necessarily mean that they are speaking as a minister rather than as a regular MP; in CZ, we are not able to distinguish this from the transcription.
The taxonomy at https://github.com/clarin-eric/ParlaMint/blob/main/Data/Taxonomies/ParlaMint-taxonomy-UD-SYN.ana.xml covers these situations: ParlaMint/Data/Taxonomies/ParlaMint-taxonomy-UD-SYN.ana.xml Lines 1210 to 1213 in 5deaeed
|
I agree,
I don't think so, if we are talking about ParlaMint-taxonomy-parla.legislature(.xml); that one contains much more than just plenary speech transcription classification. I would be in favour of adding BE categories to the common taxonomy, as long as they are nicely positioned in it. I haven't had a look at your taxonomy yet, though. @matyaskopp, do you see a problem here? |
I can imagine that we can extend
|
Multiple speaker types indeed break the validation:
We interpreted 'regular' as "speaking as member of parliament". If a person holds a minister post at the time of speaking, he/she is not speaking as a member of parliament. But I can map our current speaker types to ParlaMint, assuming that parliament members, ministers, prime ministers, and secretaries of state are "regulars", and the rest are "guests" (incidental speakers)? |
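The mapping proposed above could look roughly like the following Python sketch. The BE speaker-type labels used here are hypothetical placeholders, not the labels actually used in the BE source data, and only the "regular"/"guest" split discussed in this comment is covered (chairs are not handled).

```python
# Hypothetical BE speaker-type labels; the real labels in the BE data may differ.
REGULAR_TYPES = {"member", "minister", "prime_minister", "secretary_of_state"}


def to_parlamint_role(be_type: str) -> str:
    """Map a BE speaker type to exactly one ParlaMint speaker category.

    Everything not listed as a regular type is treated as an incidental
    speaker ("guest"), following the assumption stated in the comment.
    """
    return "#regular" if be_type in REGULAR_TYPES else "#guest"


def u_ana(be_type: str) -> str:
    """Build the ana attribute for <u>, with a single speaker type so that
    validation does not fail on multiple values."""
    return f'ana="{to_parlamint_role(be_type)}"'


if __name__ == "__main__":
    for be_type in ("member", "minister", "expert"):
        print(be_type, "->", u_ana(be_type))
```

Keeping a single value in `ana` sidesteps the validation failure reported for multiple speaker types, at the cost of dropping the finer BE distinctions from the utterance itself.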
Summarizing:
|
Yes, it will probably be the best. We need all corpora to be comparable... Thank |
I agree with @matyaskopp, all speakers are regular speakers (like MPs, ministers, prime minister), except invited guests, who are not affiliated with the parliament or government. Adding "#minister" would be redundant anyway: we know somebody is a minister from their affiliation, by checking its from and to dates against the time when the person is speaking.
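A minimal Python sketch of that lookup, assuming each affiliation has been read into a role plus optional start and end dates (corresponding to the from and to attributes). The `Affiliation` class and the sample dates are purely illustrative and are not taken from the corpus.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional, Set


@dataclass
class Affiliation:
    role: str                # e.g. "member" or "minister" (illustrative values)
    start: Optional[date]    # corresponds to @from; None means unknown/open
    end: Optional[date]      # corresponds to @to; None means still ongoing


def roles_on(affiliations: Iterable[Affiliation], day: date) -> Set[str]:
    """Return the roles whose date range covers the given sitting date."""
    return {
        a.role
        for a in affiliations
        if (a.start is None or a.start <= day) and (a.end is None or day <= a.end)
    }


if __name__ == "__main__":
    # Illustrative data only.
    person = [
        Affiliation("member", date(2019, 6, 20), None),
        Affiliation("minister", date(2020, 10, 1), date(2021, 12, 31)),
    ]
    print(roles_on(person, date(2021, 3, 30)))
```

With this kind of lookup, the minister status at the time of a speech is recoverable from the person's affiliations, so it does not need to be repeated on each utterance.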
I think As for the taxonomy, I would need to find some quality time to understand the whole thing, which I can't seem to find, sigh. Maybe the weekend... |
invalid url format
<idno type="URI">https://www.dekamer.be/kvvcr/showpage.cfm?section=/cricra&language=nl&cfm=dcricra.cfm?type=plen&cricra=cri&count=all</idno> |
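The comment does not say what exactly makes the URL invalid. Assuming the problem is unescaped ampersands or stray whitespace inside the element (an assumption, not something stated above), a small Python sketch for producing a well-formed idno could be the following. The helper name `idno_element` is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

URL = (
    "https://www.dekamer.be/kvvcr/showpage.cfm?section=/cricra"
    "&language=nl&cfm=dcricra.cfm?type=plen&cricra=cri&count=all"
)


def idno_element(url: str) -> str:
    """Serialize a URL as <idno type="URI">, removing whitespace and
    escaping '&' as '&amp;' so the result is well-formed XML."""
    clean = "".join(url.split())  # drop line breaks and spaces inside the URL
    return f'<idno type="URI">{escape(clean)}</idno>'


if __name__ == "__main__":
    xml_text = idno_element(URL)
    print(xml_text)
    # Parsing back is a quick well-formedness check; it raises ParseError on failure.
    print(ET.fromstring(xml_text).text == URL)
```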
speeches misclassification
I still don't understand why so many speeches are misclassified. From my point of view (without language knowledge), HTML classes, elements, and other attributes can be used. Describing this: https://github.com/JessedeDoes/ParlaMint/blob/32213d529bbbb2b28ced35d2a7bfb74c2ba9edd1/Data/ParlaMint-BE/ParlaMint-BE_2021-03-30-definitief-55-commissie-ic427x.xml#L174-L177
<p class=italNL><a name=T016></a><span lang=NL>Het incident is gesloten.</span></p>
<p class=italFR><span lang=FR-BE>L'incident est clos.</span></p>
<p class=MsoNormal>...</p>
<p class=Titre2NL>...
so only the beginning of the meeting and a new topic before the first speech can contain unclassified notes or you can classify them as
There are also chairman speeches that do not follow the rules above, but you have correctly identified them.
notes do not contain
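As a rough illustration of using the HTML class attribute for classification, here is a Python sketch built on the standard library's `html.parser`. The mapping from class names to kinds (ital* as notes, Titre* as headings, everything else as speech text) is an assumption made for this example and is not confirmed by the thread or by the BE conversion code.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


def kind_for_class(css_class: str) -> str:
    """Assumed mapping from paragraph class to segment kind (illustrative only)."""
    if css_class.startswith("ital"):
        return "note"
    if css_class.startswith("Titre"):
        return "head"
    return "speech"


class ParagraphClassifier(HTMLParser):
    """Collect (kind, text) pairs for every <p> element in the input."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: List[Tuple[str, str]] = []
        self._kind: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            css_class = dict(attrs).get("class") or ""
            self._kind = kind_for_class(css_class)
            self._text = []

    def handle_data(self, data):
        if self._kind is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._kind is not None:
            self.paragraphs.append((self._kind, "".join(self._text).strip()))
            self._kind = None


def classify(html: str) -> List[Tuple[str, str]]:
    parser = ParagraphClassifier()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


if __name__ == "__main__":
    sample = (
        "<p class=italNL><a name=T016></a><span lang=NL>Het incident is gesloten.</span></p>"
        "<p class=italFR><span lang=FR-BE>L'incident est clos.</span></p>"
    )
    for kind, text in classify(sample):
        print(kind, "|", text)
```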
|
|
@JessedeDoes, in 77e8d95 I've added parla.meeting.committee to the general taxonomy. I'm not absolutely sure the category belongs where I put it, but it might be good enough for now. So, could you copy the new category into your general ParlaMint-taxonomy-parla.legislature taxonomy and remove your additional taxonomy, please? ParlaMint/Data/Taxonomies/ParlaMint-taxonomy-parla.legislature.xml Lines 225 to 233 in 77e8d95
|
I have just a few observations:
Responsibility for linguistic annotations in TEI version
https://github.com/JessedeDoes/ParlaMint/blob/1f0a9d3ef52e8a2aad8b3733dc1cc742bce4f0fe/Data/ParlaMint-BE/ParlaMint-BE.xml#L17-L21
Wrong date
https://github.com/JessedeDoes/ParlaMint/blob/1f0a9d3ef52e8a2aad8b3733dc1cc742bce4f0fe/Data/ParlaMint-BE/ParlaMint-BE.xml#L70
Taxonomy fusion
You have invented some new taxonomies, and some common ones are modified. This needs to be unified in v3.1.
E.g., you used new categories in parla.legislature
You can check CZ folder for how common taxonomies should look.
wrong idno type
please follow the recommendation here: https://clarin-eric.github.io/ParlaMint/#TEI.idno
https://github.com/JessedeDoes/ParlaMint/blob/1f0a9d3ef52e8a2aad8b3733dc1cc742bce4f0fe/Data/ParlaMint-BE/ParlaMint-BE.xml#L408
should be
settingDesc date in corpus root files
ana="#parla.sitting" from corpus root files
The date should contain the full corpus period
https://github.com/JessedeDoes/ParlaMint/blob/1f0a9d3ef52e8a2aad8b3733dc1cc742bce4f0fe/Data/ParlaMint-BE/ParlaMint-BE.xml#L392
speaker note before speech
type="speaker"
It is common to have a speaker note before a speech - it is not a part of the speech.
https://github.com/JessedeDoes/ParlaMint/blob/1f0a9d3ef52e8a2aad8b3733dc1cc742bce4f0fe/Data/ParlaMint-BE/ParlaMint-BE_2021-09-22-definitief-55-commissie-ic577x.xml#L108
should be
missing parts of transcriptions
https://github.com/JessedeDoes/ParlaMint/blob/1f0a9d3ef52e8a2aad8b3733dc1cc742bce4f0fe/Data/ParlaMint-BE/ParlaMint-BE_2021-09-22-definitief-55-commissie-ic577x.xml#L494-L497
missing notes
There are a lot of notes like this:
These are missing from the component files.