Dataset question #20

Open
young-nlp opened this issue Nov 5, 2018 · 2 comments

Comments

@young-nlp

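I have a question about the dataset. After preprocessing, the train set contains 577,088 sentences, but the paper reports 522,611. The difference arises because some entity pairs in the original dataset appear in both train and test. The PCNN+ATT paper filters these out, which gives the 522,611-sentence figure. However, both Feng's source code and your code seem to use the unfiltered data directly.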

@xuyanfu
Owner

xuyanfu commented Nov 5, 2018

Yes, that is correct. No filtering is applied, so some entity pairs are duplicated.
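
For readers who want to apply the filtering discussed in this thread, here is a minimal sketch. It is not the code from the PCNN+ATT paper or from this repository. It assumes each dataset line is tab-separated with the two entity identifiers in the first two columns, and it assumes "filtering" means dropping every train sentence whose entity pair also occurs in test. The names `entity_pair` and `filter_train` are hypothetical, and whether this exact rule reproduces the 522,611 figure has not been verified here.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def entity_pair(line: str) -> Pair:
    """Return the (head, tail) entity pair of one dataset line.

    Assumption: fields are tab-separated and the first two columns
    identify the two entities. Adjust the indices if your files differ.
    """
    fields = line.rstrip("\n").split("\t")
    return fields[0], fields[1]


def filter_train(train_lines: Iterable[str], test_lines: Iterable[str]) -> List[str]:
    """Keep only the train lines whose entity pair never occurs in test."""
    # Collect every entity pair seen in the test set, skipping blank lines.
    test_pairs: Set[Pair] = {
        entity_pair(line) for line in test_lines if line.strip()
    }
    # Drop blank lines and any train line whose pair overlaps with test.
    return [
        line
        for line in train_lines
        if line.strip() and entity_pair(line) not in test_pairs
    ]
```

To use it, read the train and test files into lists of lines, pass them to `filter_train`, and compare `len()` of the result with the sentence count you expect.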

@zwd13122889

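Hello, I have a question: after downloading the repository, why does train.txt in origin_data contain no data, only a URL for downloading some software? How can I find the training dataset?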
