About the "Chinese named entity recognition with CRF" example #490

Open
Lufffya opened this issue Aug 18, 2022 · 1 comment

Comments

Lufffya commented Aug 18, 2022

Regarding task_sequence_labeling_ner_crf.py, there is one part I don't understand and would like to ask the author (Su) about:

From what I can see, categories is a list of three classes: ['PER', 'LOC', 'ORG'].

I don't quite understand why the labels are designed this way:
labels[start] = categories.index(label) * 2 + 1
labels[start + 1:end + 1] = categories.index(label) * 2 + 2

Thank you!

Thove commented Aug 19, 2022

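I can try to answer this. In this example, the author labels the first token of each entity as index * 2 + 1, which corresponds to B in the BIO scheme, and labels all the remaining tokens of that entity as index * 2 + 2, which corresponds to I; see the code of class data_generator. So each category actually appears in two forms, B and I, which gives 3 * 2 = 6 labels. Adding the O label, for tokens that belong to no entity, makes 7 labels in total.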
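To make the mapping concrete, here is a minimal sketch of the encoding described above. It is not the code from task_sequence_labeling_ner_crf.py: the helper names `encode_labels` and `id_to_tag`, the entity spans, and the `(start, end, label)` format with an inclusive end index are assumptions for illustration, and a plain Python list stands in for whatever array type the script uses.

```python
categories = ['PER', 'LOC', 'ORG']  # order taken from the question above

# Each category gets a B id and an I id, plus id 0 for O.
num_labels = len(categories) * 2 + 1


def encode_labels(num_tokens, entities):
    """Hypothetical helper: turn entity spans into per-token label ids.

    entities is a list of (start, end, label) with an inclusive end
    index (assumed format).
    """
    labels = [0] * num_tokens  # 0 means O (not part of any entity)
    for start, end, label in entities:
        idx = categories.index(label)
        labels[start] = idx * 2 + 1          # B tag: 1, 3, 5
        for pos in range(start + 1, end + 1):
            labels[pos] = idx * 2 + 2        # I tag: 2, 4, 6
    return labels


def id_to_tag(label_id):
    """Hypothetical helper: map a label id back to a BIO tag string."""
    if label_id == 0:
        return 'O'
    idx, is_inside = divmod(label_id - 1, 2)
    return ('I-' if is_inside else 'B-') + categories[idx]


# Six tokens: a two-token PER entity, one O token, a three-token LOC entity.
labels = encode_labels(6, [(0, 1, 'PER'), (3, 5, 'LOC')])
tags = [id_to_tag(i) for i in labels]
print(num_labels)  # 7
print(labels)      # [1, 2, 0, 3, 4, 4]
print(tags)        # ['B-PER', 'I-PER', 'O', 'B-LOC', 'I-LOC', 'I-LOC']
```

Odd ids are B tags, even nonzero ids are I tags, and (id - 1) // 2 recovers the category index, so the scheme packs all 7 labels into the integers 0 to 6.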
