Some questions about the dataset annotations #73

Open
PeihanDou opened this issue Dec 29, 2021 · 1 comment

Comments

@PeihanDou

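Looking through the annotations, I found that some samples look like this:
image
Here "叶老桂" is annotated only once; its second occurrence is left unannotated. Is this intentional? My understanding is that if an entity appears several times but only its first occurrence is annotated, the trained model will learn to attend only to the first occurrence, and will therefore fail to predict accurately on some sentences. Shouldn't repeated entities be annotated at every occurrence?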

@brightmart
Member

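In principle, every occurrence should be annotated. For now, this can only be treated as something the data annotation needs to improve. Alternatively, you could add a processing step in code on your side that also labels the unannotated occurrences of the same entity.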
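A minimal sketch of such a post-processing step is below. It assumes each sample is a text string plus a nested dict mapping label → entity text → list of inclusive `[start, end]` character spans; that layout, the function name `propagate_entity_labels`, and the example sentence are all illustrative assumptions, so adapt them to the actual file format.

```python
from typing import Dict, List, Tuple

# Assumed layout: {label: {entity_text: [[start, end], ...]}}, end index inclusive.
Labels = Dict[str, Dict[str, List[List[int]]]]


def _overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if two inclusive spans share at least one character."""
    return a[0] <= b[1] and b[0] <= a[1]


def propagate_entity_labels(text: str, labels: Labels) -> Labels:
    """Return a copy of `labels` in which every occurrence of each already
    annotated entity string in `text` carries that entity's label."""
    # Deep-copy so the input annotations are not modified.
    result: Labels = {
        label: {ent: [list(s) for s in spans] for ent, spans in ents.items()}
        for label, ents in labels.items()
    }
    # Spans that are already labeled; new spans must not overlap these.
    occupied = [
        (s[0], s[1])
        for ents in result.values()
        for spans in ents.values()
        for s in spans
    ]
    # Try longer entity strings first so they win over their substrings.
    candidates = sorted(
        ((label, ent) for label, ents in result.items() for ent in ents),
        key=lambda pair: -len(pair[1]),
    )
    for label, ent in candidates:
        if not ent:
            continue
        start = text.find(ent)
        while start != -1:
            span = (start, start + len(ent) - 1)
            if not any(_overlaps(span, o) for o in occupied):
                result[label][ent].append(list(span))
                occupied.append(span)
            start = text.find(ent, start + 1)
        result[label][ent].sort()
    return result


# Made-up example sentence: the entity occurs twice, only the first is labeled.
example_text = "叶老桂说,叶老桂来了"
example_labels = {"name": {"叶老桂": [[0, 2]]}}
fixed = propagate_entity_labels(example_text, example_labels)
print(fixed)  # {'name': {'叶老桂': [[0, 2], [5, 7]]}}
```

Pure string matching can over-label when the same characters are not an entity in a given context, so it is worth spot-checking the added spans before retraining.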
