Provide a mapping between original mentions and unified mentions #6

Open
sdruskat opened this issue Mar 31, 2021 · 0 comments
Labels
nice-to-have Something that would be nice to have by the time we finish, but that is not strictly required

Comments

@sdruskat
Collaborator

We inherently have to create some sort of mapping between what the mentions originally looked like in CORD-19 (e.g., ['Statistical Package for Social Sciences (SPSS)', 'SPSS', 'SPSS Statistics']) and what they look like in normalized form in our new dataset (e.g., SPSS).

It would probably be very useful for other projects that may reuse our dataset to also have access to the mapping. Therefore, it would be nice to provide this mapping in some consumable form.
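
For illustration, one possible shape for such a mapping is sketched here in Python. This is only a sketch under assumptions: the function name `build_mappings`, the JSON key names, and the choice of JSON as the "consumable form" are hypothetical and not something the project has decided. The sample pairs are just the SPSS example from above.

```python
import json
from collections import defaultdict

# Hypothetical (original mention, unified mention) pairs, taken from the
# SPSS example in this issue. Real pairs would come from the normalization step.
pairs = [
    ("Statistical Package for Social Sciences (SPSS)", "SPSS"),
    ("SPSS", "SPSS"),
    ("SPSS Statistics", "SPSS"),
]


def build_mappings(pairs):
    """Build a forward (original -> unified) and a reverse
    (unified -> list of originals) mapping from mention pairs."""
    forward = {}
    reverse = defaultdict(list)
    for original, unified in pairs:
        # An original mention must not map to two different unified mentions.
        if forward.get(original, unified) != unified:
            raise ValueError(f"Conflicting unified mentions for {original!r}")
        if original not in forward:
            forward[original] = unified
            reverse[unified].append(original)
    return forward, dict(reverse)


forward, reverse = build_mappings(pairs)

# Serialize both directions so downstream projects can look up either way.
# The key names below are assumptions made for this sketch.
mapping_json = json.dumps(
    {"original_to_unified": forward, "unified_to_original": reverse},
    indent=2,
    sort_keys=True,
)
```

Keeping both directions in one file would let a reusing project resolve a raw CORD-19 mention to its unified form, or list every original spelling behind a unified mention, without rebuilding the index.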

@sdruskat sdruskat added the nice-to-have Something that would be nice to have by the time we finish, but that is not strictly required label Mar 31, 2021
@sdruskat sdruskat added this to the Habeas beautiful corpus milestone Mar 31, 2021