Relation Extraction (RE) is the task of detecting relations between entities in unstructured or unlabelled text. RE is an actively researched field: recent years have produced many promising algorithms along with a multitude of high-quality datasets.
The goal of this repository is to provide an overview of the current research challenges and how they are addressed.
If a link appears broken, feel free to update it and open a pull request. (Or simply notify me, that's fine too.)
- (Bach and Badaskar, 2007) A Review of Relation Extraction
- (de Abreu et al., 2013) A review on Relation Extraction with an eye on Portuguese
- (Konstantinova, 2014) Review of Relation Extraction Methods: What is New Out There?
- (Asghar, 2016) Automatic Extraction of Causal Relations from Natural Language Texts: A Comprehensive Survey
- (Kumar, 2017) A Survey of Deep Learning Methods for Relation Extraction
- (Pawar et al., 2017) Relation extraction: A survey
- (Cui et al., 2017) A Survey on Relation Extraction
- (Chakraborty et al., 2019) Introduction to Neural Network based Approaches for Question Answering over Knowledge Graphs
- (Han et al., 2020) More Data, More Relations, More Context and More Openness: A Review and Outlook for Relation Extraction
- (Fu et al., 2020) A Survey on Complex Question Answering over Knowledge Base: Recent Advances and Challenges
- (Yang et al., 2021) A Survey on Extraction of Causal Relations from Natural Language Text
- (Nayak et al., 2021) Deep Neural Approaches to Relation Triplets Extraction: A Comprehensive Survey
- (Wang et al., 2021) Deep Neural Network Based Relation Extraction: An Overview
- (Aydar et al., 2021) Neural Relation Extraction: A Review
- (Pawar et al., 2021) Techniques for Jointly Extracting Entities and Relations: A Survey
- (Lan et al., 2021) A Survey on Complex Knowledge Base Question Answering: Methods, Challenges and Solutions
- DBpedia Website / GitHub / Paper
- Freebase Website / DEPRECATED / Paper
- YAGO Website / Latest Release / Paper
- Wikidata Website / Paper
Below is an overview of the most frequently used datasets, based on how often they appear across more than 550 papers.
If you have created a new dataset or find something missing, please don't hesitate to open a pull request to add it here.
- Datasets for Semantic Parsing
- Datasets for Information Retrieval
- SimpleQuestions Paper / Repository
- WebQuestions Paper / Website
- ComplexQuestions (note: two different datasets share the name ComplexQuestions)
- ComplexQuestions (sometimes referred to as CompQ) Paper / Repository
- ComplexQuestions Paper / Website – Note that the dataset was provided by a different author
- MetaQA Paper / Repository
- Datasets for Reinforcement Learning
- UMLS Paper / Repository (MINERVA Repository)
- NELL-995 Paper / Repository (MINERVA Repository)
- Kinship Paper / Repository (MINERVA Repository)
- FB15K-237 Paper (Original FB15K) / Paper (FB15K-237 Variant) / Download (FB15K-237)
- WN18RR Paper / Repository
- Countries Paper / Repository (MINERVA Repository)
- Datasets for Hybrid KGQA
When reporting the results of your approach, be as precise as possible; you would be surprised how many papers report ambiguous results. If your approach outperforms all others on a certain benchmark, make sure to mark it in bold.
You should be familiar with the following evaluation metrics, but if you are not, here is a short recap:
TP = True Positives
FP = False Positives
TN = True Negatives
FN = False Negatives
P = Precision
How many of the relations you predicted as positive are actually correct.
Formula: P = TP / (TP + FP)
R = Recall
How many of all existing positive relations you actually found.
Formula: R = TP / (TP + FN)
F = F1
Harmonic mean of precision and recall.
Formula: F1 = 2 * P * R / (P + R) = 2 * TP / (2 * TP + FP + FN)
RE = Relation Extraction Subtask
This metric refers solely to the RE subtask, i.e. how well the correct relations are identified. It is different from E2E.
E2E = End to End
This metric reports the result of running an approach end to end on the dataset's test set, i.e. the complete pipeline from input question to final answer.
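As a quick sanity check, here is a minimal Python sketch of the precision, recall and F1 formulas above; the counts are made-up example values:

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute P, R and F1 from raw true-positive/false-positive/false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Example with invented counts: 80 correct relations found, 20 spurious, 40 missed.
p, r, f = precision_recall_f1(tp=80, fp=20, fn=40)
print(f"P = {p:.3f}, R = {r:.3f}, F1 = {f:.3f}")  # P = 0.800, R = 0.667, F1 = 0.727
```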
1. QALD-Series
2. LC-QuAD
3. FreebaseQA
4. SimpleQuestions
5. SimpleQuestions-Balanced
6. WebQuestions
7. WebQuestionsSP
8. WebQuestionsSP-WD
9. Free917
10. ComplexQuestions
11. MetaQA
12. PathQuestion
13. MSF
14. NYT
15. ComplexWebQuestions
16. OpenBookQA
17. CommonsenseQA
18. Kinship
19. UMLS
20. Countries
21. WN18RR
22. FB15K-237
23. NELL-995
24. KBC
25. PQA
QALD-5 | |
---|---|
HCqa (Asadifar et al., 2019)* | P = 0.7 R = 1.0 F = 0.81 |
*) Tested only on 10 questions
QALD-6 | |
---|---|
HCqa (Asadifar et al., 2019)* | P = 0.42 R = 0.42 F = 0.52 |
*) Tested only on 25 questions
QALD-7 | |
---|---|
SLING (Mihindukulasooriya et al., 2020) | P = 0.57 R = 0.76 F = 0.65 |
EARL (Dubey et al., 2018) | RE = 0.47 |
GGNN (Sorokin and Gurevych, 2018) | P = 0.2686 R = 0.3179 F = 0.2588 |
QALD-9 | |
---|---|
SLING (Mihindukulasooriya et al., 2020) | P = 0.50 R = 0.64 F = 0.56 |
LC-QuAD 1 | |
---|---|
SLING (Mihindukulasooriya et al., 2020) | P = 0.41 R = 0.44 F = 0.48 |
EARL (Dubey et al., 2018) | RE = 0.36 |
FreebaseQA (Paper / Repository ) | |
---|---|
Retrieve and Re-rank (Wang et al., 2021) | E2E = 0.517 |
SimpleQuestions | |
---|---|
AdvT-MMRD (Zhang et al., 2020) | RE = 0.938 E2E = 0.790 |
MLTA (Wang et al., 2019) | RE = 0.824 |
Question Matching (Abolghasemi et al., 2020) | RE = 0.9341 |
Relation Splitting (Hsiao et al., 2017) | E2E = 0.767 |
KSA-BiGRU (Zhu et al., 2019) | P = 0.867 R = 0.848 F = 0.849 E2E = 0.731 |
Alias Matching (Buzaaba and Amagasa, 2021) | RE = 0.8288 E2E = 0.7464 |
Synthetic Data (Sidiropoulos et al., 2020) | RE* (unseen domain) = 0.7041 E2E (seen domain) = 0.77 E2E* (unseen domain) = 0.6657 |
Transfer Learning with BERT (Lukovnikov et al., 2020) | RE = 0.836 E2E = 0.773 |
Retrieve and Re-rank (Wang et al., 2021) | E2E = 0.797 |
HR-BiLSTM (Yu et al., 2017) | RE = 0.933 E2E = 0.787 |
Multi-View Matching (Yu et al., 2018) | RE = 0.9375 |
*) Average of Micro + Macro
SimpleQuestions-Balanced (Paper / Repository) | |
---|---|
HR-BiLSTM (Yu et al., 2017) | RE* (seen) = 0.891 RE*(unseen) = 0.412 RE*(seen+unseen avg.) = 0.673 |
Representation Adapter (Wu et al., 2019) | RE* (seen) = 0.8925 RE*(unseen) = 0.7515 RE*(seen+unseen avg.) = 0.83 |
*) Average of Micro + Macro
WebQuestions | |
---|---|
Support Sentences (Li et al., 2017) | P = 0.572 R = 0.396 F = 0.382 E2E = 0.423 |
QARDTE (Zheng et al., 2018) | P = 0.512 R = 0.613 F = 0.558 RE = 0.843 |
HybQA (Mohamed et al., 2017) | F = 0.57 |
WebQuestionsSP | |
---|---|
HR-BiLSTM (Yu et al., 2017) | RE = 0.8253 |
UHOP (Chen et al., 2019) (w/ HR-BiLSTM) | RE = 0.8260 |
OPQL (Sun et al., 2021) | RE = 0.8540 E2E = 0.519 |
Multi-View Matching (Yu et al., 2018) | RE = 0.8595 |
Masking Mechanism (Chen et al., 2018) | RE = 0.77 |
WebQuestionsSP-WD (Paper / Repository) | |
---|---|
GGNN (Sorokin and Gurevych, 2018) | P = 0.2686 R = 0.3179 F = 0.2588 |
Free917 (Original Paper / Data) | |
---|---|
QARDTE (Zheng et al., 2018) | P = 0.683 R = 0.679 F = 0.663 |
ComplexQuestions | |
---|---|
HCqa (Asadifar et al., 2019) | F = 0.536 |
MetaQA | |
---|---|
OPQL (Sun et al., 2021) | E2E (2-Hop) = 0.885 E2E (3-Hop) = 0.871 |
RDAS (Wang et al., 2021) | E2E (1-Hop) = 0.991 E2E (2-Hop) = 0.97 E2E (3-Hop) = 0.856 |
Incremental Sequence Matching (Lan et al., 2019) | F = 0.981 E2E (1-Hop) = 0.963 E2E (2-Hop) = 0.991 E2E (3-Hop) = 0.996 |
PathQuestion (Paper / Repository) | |
---|---|
Incremental Sequence Matching (Lan et al., 2019) | F = 0.96 E2E* = 0.967 |
RDAS (Wang et al., 2021) | E2E (2-Hop) = 0.736 E2E (3-Hop) = 0.910 |
*) 2-Hop and 3-Hop mixed
MSF (Paper / Repository) | |
---|---|
OPQL (Sun et al., 2021) | E2E (2-Hop) = 0.492 E2E (3-Hop) = 0.297 |
NYT (Paper / Data) | |
---|---|
Deep RL (Qin et al., 2018) | F* = 0.778 |
ReQuest (Wu et al., 2018) | P = 0.404 R = 0.48 F = 0.439 |
*) Average
ComplexWebQuestions | |
---|---|
OPQL (Sun et al., 2021) | E2E = 0.407 |
OpenBookQA | |
---|---|
MHGRN (Feng et al., 2020) | E2E = 0.806 |
QA-GNN (Yasunaga et al., 2021) | E2E = 0.828 |
CommonsenseQA | |
---|---|
MHGRN (Feng et al., 2020) | E2E = 0.765 |
QA-GNN (Yasunaga et al., 2021) | E2E = 0.761 |
Kinship | |
---|---|
MINERVA (Das et al., 2018) | E2E = 0.605 |
Reward Shaping (Lin et al., 2018) | E2E = 0.811 |
UMLS | |
---|---|
MINERVA (Das et al., 2018) | E2E = 0.728 |
Reward Shaping (Lin et al., 2018) | E2E = 0.902 |
Countries | |
---|---|
MINERVA (Das et al., 2018) | E2E* = 0.9582 |
*) Average of S1, S2 and S3
WN18RR | |
---|---|
MINERVA (Das et al., 2018) | E2E = 0.413 |
Reward Shaping (Lin et al., 2018) | E2E = 0.437 |
FB15K-237 | |
---|---|
MINERVA (Das et al., 2018) | E2E = 0.217 |
Reward Shaping (Lin et al., 2018) | E2E = 0.329 |
NELL-995 | |
---|---|
MINERVA (Das et al., 2018) | E2E = 0.663 |
Reward Shaping (Lin et al., 2018) | E2E = 0.656 |
KBC (Paper / Repository) | |
---|---|
ROP (Yin et al., 2018) | E2E* = 0.7616 |
*) Here: the mean average precision
PQA (Paper / Repository) | |
---|---|
ROP (Yin et al., 2018) | E2E = 0.907 |
For each challenge, the solutions that address it are listed with a short description. If you have written a paper that deals with one of these challenges, you can create a pull request and add a link to your paper together with a short description. If it does not fit any challenge listed here, you may create a new entry and add your paper there; make sure to include a short description of the new challenge as well.
1. Lexical Gap
The lexical gap refers to the situation in which the surface form of a relation in a question differs from how the relation is represented in a KB (this problem is closely related to relation linking). For the question "Where was Angela Merkel born?", the corresponding relation "birthPlace" does not appear in the question at all. Exact matching procedures therefore fail, and a softer matching mechanism is required.
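To make the failure of exact matching concrete, here is a purely illustrative Python sketch (not taken from any of the papers below) that compares exact token matching with a simple fuzzy character-level similarity between question words and the words of a KB relation label; real systems typically use learned embeddings instead:

```python
import re
from difflib import SequenceMatcher

def label_tokens(relation: str) -> list[str]:
    # Split a camelCase KB label such as "birthPlace" into ["birth", "place"].
    return [t.lower() for t in re.findall(r"[A-Za-z][a-z]*", relation)]

def exact_match(question: str, relation: str) -> float:
    q_tokens = set(re.findall(r"\w+", question.lower()))
    return 1.0 if q_tokens & set(label_tokens(relation)) else 0.0

def soft_match(question: str, relation: str) -> float:
    # Best character-level similarity between any question word and any label word.
    q_tokens = re.findall(r"\w+", question.lower())
    return max(SequenceMatcher(None, q, r).ratio()
               for q in q_tokens for r in label_tokens(relation))

question = "Where was Angela Merkel born?"
print(exact_match(question, "birthPlace"))  # 0.0 -> exact matching fails
print(soft_match(question, "birthPlace"))   # > 0 thanks to "born" ~ "birth"
```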
- SLING (Mihindukulasooriya et al., 2020)
- Integrate abstract meaning representation to increase question understanding
- AdvT-MMRD (Zhang et al., 2020)
- Use semantic and literal question-relation matching and incorporate entity type information with adversarial training
- MLTA (Wang et al., 2019)
- Similarity computation between the question and relation candidates on multiple levels using an attention mechanism
- Support Sentences (Li et al., 2017)
- Enrich candidate pairs with support sentences from an external source
- Question Matching (Abolghasemi et al., 2020)
- Find the most matching question to the input question
2. Incomplete Knowledge Graphs
One of the best-known problems in KGQA is that KGs are incomplete (Min et al., 2013), i.e. certain relations or entities are missing, which is natural considering how vast and complex the body of human knowledge is (and that it keeps growing daily). This problem is especially evident in highly technical and specialised areas.
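As a rough illustration of the idea behind some of the path-based approaches listed below (which learn such compositions with RL or RNNs rather than hand-written rules), here is a toy Python sketch that infers a missing fact from a two-hop path over an invented example KG:

```python
# Toy KG as a set of (head, relation, tail) triples.
kg = {
    ("Angela Merkel", "bornIn", "Hamburg"),
    ("Hamburg", "locatedIn", "Germany"),
}

def infer_two_hop(kg, first_rel, second_rel, inferred_rel):
    """Compose two relations along a path into a new (possibly missing) triple."""
    inferred = set()
    for h1, r1, t1 in kg:
        if r1 != first_rel:
            continue
        for h2, r2, t2 in kg:
            if h2 == t1 and r2 == second_rel:
                inferred.add((h1, inferred_rel, t2))
    return inferred

# bornIn(x, city) and locatedIn(city, country) suggest bornInCountry(x, country).
print(infer_two_hop(kg, "bornIn", "locatedIn", "bornInCountry"))
# {('Angela Merkel', 'bornInCountry', 'Germany')}
```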
- OPQL (Sun et al., 2021)
- Construct a virtual knowledge base
- MINERVA (Das et al., 2018)
- Infer missing knowledge using RL
- Reward Shaping (Lin et al., 2018)
- Improve reward mechanism of MINERVA
- ROP (Yin et al., 2018)
- Predict KG paths using an RNN to infer new information
3. Ambiguity
A difficult challenge for QA systems is the ambiguity of natural language: certain relations may share the same name but carry a different meaning depending on the context. An example on the KB level (taken from Hsiao et al., 2017) is the Freebase relation genre, which appears both as film.film.genre and as music.artist.genre.
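Here is a toy Python sketch of one way to resolve this kind of ambiguity, loosely in the spirit of the Relation Splitting entry below (not its actual implementation): the fully qualified Freebase relation id is split into its type and property parts, and candidates are filtered by the type of the linked entity; the entity-type map is a made-up example:

```python
def split_relation(relation_id: str) -> tuple[str, str]:
    """Split a Freebase-style id like 'film.film.genre' into (type, property)."""
    type_part, _, property_part = relation_id.rpartition(".")
    return type_part, property_part

# Made-up entity typing; a real system would look this up in the KB.
entity_types = {"Pulp Fiction": "film.film", "Miles Davis": "music.artist"}

def disambiguate(entity: str, candidates: list[str]) -> list[str]:
    """Keep only the candidate relations whose type matches the entity's type."""
    return [c for c in candidates if split_relation(c)[0] == entity_types[entity]]

candidates = ["film.film.genre", "music.artist.genre"]
print(disambiguate("Pulp Fiction", candidates))  # ['film.film.genre']
print(disambiguate("Miles Davis", candidates))   # ['music.artist.genre']
```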
- Relation Splitting (Hsiao et al., 2017)
- Further split a relation into its type and property
- KSA-BiGRU (Zhu et al., 2019)
- Computing a probability distribution for every relation
- Alias Matching (Buzaaba and Amagasa, 2021)
- Match alias from question with KB and pick most likely relation
- EARL (Dubey et al., 2018)
- Perform entity and relation linking jointly
- HR-BiLSTM (Yu et al., 2017)
- Use a hierarchical BiLSTM model and entity re-ranking
4. Noisy Training Data from Distant Supervision
In some domains, training data is sparse, and annotating it correctly typically requires manual human labour. This process is very time-consuming and therefore does not scale. To overcome this problem, distant supervision (DS) was proposed, which generates training data automatically. The downside of DS is that the resulting training data can be very noisy, which in turn degrades the performance of models trained on it.
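To see where the noise comes from, here is a toy Python sketch of the classic distant-supervision heuristic (any sentence mentioning both entities of a KB triple is labelled with that triple's relation); the triple and sentences are invented for illustration:

```python
# One KB fact and two sentences that mention both of its entities.
kb = {("Angela Merkel", "Hamburg"): "birthPlace"}
sentences = [
    "Angela Merkel was born in Hamburg.",       # label is correct
    "Angela Merkel gave a speech in Hamburg.",  # false positive -> label noise
]

training_examples = []
for (head, tail), relation in kb.items():
    for sentence in sentences:
        # Distant supervision: co-occurrence of both entities is treated as
        # evidence for the relation, even when the sentence does not express it.
        if head in sentence and tail in sentence:
            training_examples.append((sentence, head, tail, relation))

for example in training_examples:
    print(example)
```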
- ReQuest (Wu et al., 2018)
- Use indirect supervision from external QA corpus
- Deep RL (Qin et al., 2018)
- Use a policy-based RL agent to find false positives
5. Structural Information from Subgraphs
The main idea behind this research challenge is that subgraphs, either generated from the input query or extracted from a KB using the input query, contain useful structural information that can be leveraged to perform KGQA more accurately.
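For illustration, here is a small sketch of the common first step of such approaches: extracting the k-hop subgraph around the entity mentioned in the question, which the listed models then reason over. It uses the networkx library purely for convenience, and the KG is made up:

```python
import networkx as nx

# Tiny example KG; edge attributes play the role of relations.
kg = nx.DiGraph()
kg.add_edge("Angela Merkel", "Hamburg", relation="birthPlace")
kg.add_edge("Hamburg", "Germany", relation="locatedIn")
kg.add_edge("Germany", "Berlin", relation="capital")

# Extract the 2-hop neighbourhood around the entity linked in the question.
question_entity = "Angela Merkel"
subgraph = nx.ego_graph(kg, question_entity, radius=2)

print(list(subgraph.edges(data="relation")))
# [('Angela Merkel', 'Hamburg', 'birthPlace'), ('Hamburg', 'Germany', 'locatedIn')]
```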
- RDAS (Wang et al., 2021)
- Incorporate information direction within reasoning
- GGNN for SP (Sorokin and Gurevych, 2018)
- Integrate the structure of the semantic query
- MHGRN (Feng et al., 2020)
- Capture relations between entities using a Graph Relation Network
6. Hybrid Question Answering
The hybrid QA challenge involves answering questions by drawing not only on a KB but also on knowledge from external sources, often natural language text. This can be especially helpful in domains where knowledge is not readily available in triple form. This challenge overlaps with the Incomplete KG challenge.
- HCqa (Asadifar et al., 2019)
- Extract knowledge from text using linguistic patterns
- QARDTE (Zheng et al., 2018)
- NN with attention mechanism to extract features from unstructured text based on the input question to be used during candidate re-ranking
- HybQA (Mohamed et al., 2017)
- Filter answers using Wikipedia as external source
7. Unseen Domains
Sidiropoulos et al. (2020) define an unseen domain as a domain for which facts exist in a given KB/KG but are absent from the training data.
- Representation Adapter (Wu et al., 2019)
- Use an adapter to map from general purpose representations to task specific ones (model-centric)
- Synthetic Data (Sidiropoulos et al., 2020)
- Generation of synthetic training data (distant supervision) for new, unseen domains (data-centric)
8. Language Models
Pre-trained language models capture knowledge in a general sense, which means they can struggle in situations where structured or factual knowledge is required (Kassner and Schütze, 2020). Using language models alone for KGQA can therefore lead to poor performance. Combining language models with structural information from KGs, however, can lead to better question understanding and higher accuracy (Yasunaga et al., 2021).
- Transfer Learning with BERT (Lukovnikov et al., 2020)
- Use BERT to predict the relation of the input
- QA-GNN (Yasunaga et al., 2021)
- Integrate QA context with KG subgraphs
9. Relation Candidate Generation and Ranking
Generating a set of relation candidates for an input query is challenging in itself: the right candidates have to be found while keeping the candidate set small. Furthermore, the candidates must be ranked correctly in order to retrieve the correct answer. The following research addresses these problems.
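As a rough sketch of the retrieve-then-rank pattern (loosely in the spirit of the Retrieve and Re-rank entry below, but not its actual implementation), the example below scores a handful of made-up relation labels against a question with scikit-learn's TfidfVectorizer; a neural re-ranker such as BERT would then reorder the top candidates:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up candidate relation labels, written out as natural-language phrases.
candidates = ["place of birth", "place of death", "spouse", "alma mater"]
question = "what is the place of birth of angela merkel"

vectorizer = TfidfVectorizer()
candidate_vectors = vectorizer.fit_transform(candidates)
question_vector = vectorizer.transform([question])

# Retrieve: score every candidate against the question and sort descending.
scores = cosine_similarity(question_vector, candidate_vectors)[0]
ranking = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
for relation, score in ranking:
    print(f"{score:.3f}  {relation}")  # "place of birth" comes out on top
```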
- UHOP (Chen et al., 2019)
- Lifting the limit of hops without increasing the candidate set's size
- Incremental Sequence Matching (Lan et al., 2019)
- Iterative candidate path generation and pruning
- Retrieve and Re-rank (Wang et al., 2021)
- Build an inverted index, generate a candidate set using TF-IDF, and rank the candidates using BERT
10. Improving Relation Extraction Accuracy
The goal of the following research is to increase the accuracy of the RE step itself.
- Multi-View Matching (Yu et al., 2018)
- Match the input question to multiple views from the KG to capture more information
- Masking Mechanism (Chen et al., 2018)
- Set a hop limit of 2 to hide far away relations, which might be irrelevant