Benchmarks for Danish document embeddings #196
KennethEnevoldsen
started this conversation in
Missing pieces for Danish NLP
Statement of need
Many tools, both new and old, seek to improve document representations so that documents can be retrieved more effectively, whether for search or for generative systems such as retrieval-augmented generation (RAG). It is currently unknown how well models perform on Danish for this specific task. Constructing a reasonable benchmark would allow for meaningful development and selection of document encoders for Danish (whether multilingual or not).
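To make the retrieval task concrete, here is a minimal sketch of the kind of score such a benchmark could report. It is not taken from the benchmark project: the function names `cosine` and `recall_at_k`, the choice of recall@k as the metric, and the toy two-dimensional vectors are all assumptions made for illustration. In practice the vectors would come from the document encoder being evaluated.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_at_k(query_vecs, doc_vecs, relevant, k=1):
    """Fraction of queries whose relevant document is ranked in the top k.

    relevant[i] is the index of the single relevant document for query i
    (a simplifying assumption; real datasets may have several per query).
    """
    hits = 0
    for q_idx, q in enumerate(query_vecs):
        # Rank all documents by similarity to the query, best first.
        ranked = sorted(
            range(len(doc_vecs)),
            key=lambda d: cosine(q, doc_vecs[d]),
            reverse=True,
        )
        if relevant[q_idx] in ranked[:k]:
            hits += 1
    return hits / len(query_vecs)


# Toy embeddings standing in for encoder output (hypothetical values).
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query_vecs = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.2]]
relevant = [0, 1, 2]

print(recall_at_k(query_vecs, doc_vecs, relevant, k=1))
print(recall_at_k(query_vecs, doc_vecs, relevant, k=2))
```

Comparing encoders then amounts to embedding the same Danish queries and documents with each model and comparing the resulting scores.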
Current state
A benchmark is currently being constructed at:
https://github.com/KennethEnevoldsen/scandinavian-embedding-benchmark
How to contribute:
If you wish to contribute to this task, please see the project's GitHub repository linked above.