Unedited dataset #12

Open
ypz-git opened this issue Feb 2, 2021 · 0 comments
The source of the Wikipedia dataset is somewhat vague.

Quoting directly from the paper:

Wikipedia edits: this public dataset is one month of edits made
by edits on Wikipedia pages [3]. We selected the 1,000 most edited
pages as items and editors who made at least 5 edits as users (a total
of 8,227 users). This generates 157,474 interactions. Similar to the
Reddit dataset, we convert the edit text into a LIWC-feature vector.

Where [3] is https://meta.wikimedia.org/wiki/Data_dumps.

However, I could not find the original Wikipedia dataset on that page. Could you please explain exactly how to obtain your unedited dataset?
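
For context, this is how I read the filtering step in the quoted paragraph, and what I would run once I have the raw edits. It is only a minimal sketch and not necessarily the authors' procedure: the `(user, page, timestamp)` tuple layout and the function name `filter_edits` are hypothetical, and I am assuming that the "at least 5 edits" threshold is counted after restricting to the most edited pages, which the paper does not state.

```python
from collections import Counter


def filter_edits(edits, top_pages=1000, min_user_edits=5):
    """Filter raw edits as I understand the paper's description.

    edits: iterable of (user, page, timestamp) tuples (hypothetical schema).
    """
    edits = list(edits)

    # Keep only the `top_pages` most edited pages.
    page_counts = Counter(page for _, page, _ in edits)
    kept_pages = {page for page, _ in page_counts.most_common(top_pages)}
    on_kept_pages = [e for e in edits if e[1] in kept_pages]

    # Assumption: editor activity is counted AFTER the page filter.
    user_counts = Counter(user for user, _, _ in on_kept_pages)
    return [e for e in on_kept_pages if user_counts[e[0]] >= min_user_edits]
```

If the threshold should instead be applied to the full month of edits before the page filter, the two counting steps would need to be swapped, which is part of why access to the unedited data matters.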
