@ulysses4ever
Contributor

This supersedes #87.

I took #87, rebased it on the current master, ran recode as suggested in the readme, and ran ./gradlew run. I can see new data through ui/index.html locally.

@msridhar msridhar merged commit 90d44b5 into pcminer-tools:master Sep 13, 2025
2 checks passed
@msridhar
Collaborator

Thanks!

@msridhar msridhar mentioned this pull request Sep 13, 2025
@robbertkrebbers
Contributor

Thank you very much for rebasing my MR. I totally forgot about doing that myself.

@ulysses4ever
Contributor Author

@robbertkrebbers no problem, thanks for preparing the initial PR! It's a pity that pcminer requires HTML entities instead of Unicode I think. It'd be great to teach it UTF-8 one day @msridhar...

@msridhar
Collaborator

> @robbertkrebbers no problem, thanks for preparing the initial PR! It's a pity that pcminer requires HTML entities instead of Unicode I think. It'd be great to teach it UTF-8 one day @msridhar...

PRs welcome! 🙂 I believe DBLP uses HTML encoding, so the code is just trying to make it easier to match that. Some kind of normalization of everything would of course be great.
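To make the normalization idea concrete, here is a minimal sketch of one way it could work: decode HTML entities to Unicode, then apply NFC so that precomposed and combining spellings of the same name compare equal. It is written in Python with only the standard library for brevity; `normalize_name` is a hypothetical helper, not something that exists in pcminer, and nothing here reflects how pcminer currently matches names.

```python
import html
import unicodedata


def normalize_name(raw: str) -> str:
    """Hypothetical helper: map an entity-encoded or Unicode name to one canonical form."""
    # Decode named and numeric HTML character references, e.g. "&aacute;" -> "á".
    decoded = html.unescape(raw)
    # NFC composes base letter + combining mark into a single code point where possible.
    return unicodedata.normalize("NFC", decoded)


# The entity-encoded form and the plain Unicode form normalize to the same string.
print(normalize_name("Lhot&aacute;k") == normalize_name("Lhoták"))  # prints True
```

With something like this applied to both the committee data and the DBLP side, the input files could be stored as plain UTF-8.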

@ulysses4ever
Contributor Author

ulysses4ever commented Sep 16, 2025

Right. I suspected that the dblp HTML pages were the root of this issue. But there is good news: unlike researchr, dblp has actually entered the 2020s and exposes machine-readable interfaces. E.g., the whole database is available as one XML file: https://dblp.org/faq/How+can+I+download+the+whole+dblp+dataset.html There's also a search API, but Google AI is telling me that it's "less reliable" (which makes sense for a search).
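For illustration, a rough sketch of what streaming through such an XML file could look like, in Python with the standard library only. The `SAMPLE` records, the venue name, and the `iter_papers` helper are all made up for this example. The element names (`inproceedings`, `author`, `booktitle`, and so on) follow the dblp record layout as I understand it. As far as I know, the real dump also relies on a companion DTD for character entities, which this sketch does not handle.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real dump; the actual file is far larger.
SAMPLE = b"""<dblp>
  <inproceedings key="conf/example/Doe25">
    <author>Jane Doe</author>
    <author>John Roe</author>
    <title>An Example Paper.</title>
    <year>2025</year>
    <booktitle>EXAMPLE</booktitle>
  </inproceedings>
  <article key="journals/example/Roe24">
    <author>John Roe</author>
    <title>Another Example.</title>
    <year>2024</year>
    <journal>Example Journal</journal>
  </article>
</dblp>"""

RECORD_TAGS = {"inproceedings", "article"}


def iter_papers(stream, venue):
    """Yield conference papers for one venue without loading the whole file at once."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag not in RECORD_TAGS:
            continue  # child elements (author, title, ...) are read via their parent
        if elem.tag == "inproceedings" and elem.findtext("booktitle") == venue:
            yield {
                "authors": [a.text for a in elem.findall("author")],
                "title": elem.findtext("title"),
                "year": elem.findtext("year"),
            }
        elem.clear()  # drop the finished record's children to keep memory use low


for paper in iter_papers(io.BytesIO(SAMPLE), "EXAMPLE"):
    print(paper["year"], paper["title"], paper["authors"])
```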
