-In recent years, the dense retrievers based on pre-trained language models have achieved remarkable progress. To facilitate more developers using cutting edge technologies, this repository provides an easy-to-use toolkit for running and fine-tuning the state-of-the-art dense retrievers, namely **RocketQA**. This toolkit has the following advantages:
+In recent years, the dense retrievers based on pre-trained language models have achieved remarkable progress. To facilitate more developers using cutting edge technologies, this repository provides an easy-to-use toolkit for running and fine-tuning the state-of-the-art dense retrievers, namely **🚀RocketQA**. This toolkit has the following advantages:
 
 
-* ***State-of-the-art***: It provides well-trained RocketQA models, which achieve SOTA performance on many dense retrieval datasets. And it will continue to update the [latest models](https://github.com/PaddlePaddle/RocketQA#news).
-* ***First-Chinese-model***: It provides the first open source Chinese dense retrieval model, which is trained on millions of manual annotation data from [DuReader](https://github.com/baidu/DuReader).
-* ***Easy-to-use***: By integrating this toolkit with [JINA](https://jina.ai/), developers can build an end-to-end question answering system with several lines of code.
+* ***State-of-the-art***: 🚀RocketQA provides our well-trained models, which achieve SOTA performance on many dense retrieval datasets. And it will continue to update the [latest models](https://github.com/PaddlePaddle/RocketQA#news).
+* ***First-Chinese-model***: 🚀RocketQA provides the first open source Chinese dense retrieval model, which is trained on millions of manual annotation data from [DuReader](https://github.com/baidu/DuReader).
+* ***Easy-to-use***: By integrating this toolkit with [JINA](https://jina.ai/), 🚀RocketQA can help developers build an end-to-end question answering system with several lines of code. <img src="https://github.com/PaddlePaddle/RocketQA/blob/main/RocketQA_flow.png" alt="" align=center />
+
 
 ## Installation
 
@@ -39,7 +46,7 @@ docker run -it docker.io/rocketqa/rocketqa bash
 
 ## Getting Started
 
-Refer to the examples below, you can build your own Search Engine with several lines of code.
+Refer to the examples below, you can build and run your own Search Engine with several lines of code. We also provide a [Playground]() with JupyterNotebook. Try 🚀RocketQA straight away in your browser!
 
 ### Running with JINA
 [JINA](https://jina.ai/) is a cloud-native neural search framework to build SOTA and scalable deep learning search applications in minutes. Here is a simple example to build a Search Engine based on JINA and RocketQA.
@@ -48,10 +55,11 @@ Refer to the examples below, you can build your own Search Engine with several l
 cd examples/jina_example
 pip3 install -r requirements.txt
 
-# Index: Encodes and indexes text, then starts a searching service
+# Generate vector representations and build a library for your Documents
+# JINA will automatically start a web service for you
 python3 app.py index toy_data/test.tsv
 
-#Query: Encodes query and searches for answer, returns candidates ranked by relevance score
+# Try some questions related to the indexed Documents
 python3 app.py query_cli
 ```
 Please view [JINA example](https://github.com/PaddlePaddle/RocketQA/tree/main/examples/jina_example) to know more.
@@ -62,19 +70,19 @@ We also provide a simple example built on [Faiss](https://github.com/facebookres
 cd examples/faiss_example/
 pip3 install -r requirements.txt
 
-#Index: Encodes and indexes text
+# Generate vector representations and build a library for your Documents
 python3 index.py en ../marco.tp.1k marco_index
 
-# Start service
+# Start a web service on http://localhost:8888/rocketqa
 python3 rocketqa_service.py en ../marco.tp.1k marco_index
 
-#Request: Encodes query and searches for answer, returns candidates ranked by relevance score
+# Try some questions related to the indexed Documents
 python3 query.py
 ```
 
 
 ## API
-RocketQA provide two types of models, ERNIE-based dual encoder for answer retrieval and ERNIE-based cross encoder for answer re-ranking. For running RocketQA models and your own checkpoints, you can use the following functions.
+You can also easily integrate 🚀RocketQA into your own task. We provide two types of models, ERNIE-based dual encoder for answer retrieval and ERNIE-based cross encoder for answer re-ranking. For running our models, you can use the following functions.
 
 ### Load model
 
@@ -108,14 +116,13 @@ Cross-encoder returned by "load_model()" supports the following function:
 
 Given a list of queries and paragraphs (and titles), returns their matching scores (probability that the paragraph is the query's right answer).
 
-
 
-## Examples
+### Examples
 
-Following the examples below, you can run RocketQA models and your own checkpoints.
+Following the examples below, you can retrieve the vector representations of your documents and connect 🚀RocketQA to your own tasks.
 
-### Run RocketQA Model
-To run RocketQA models, you should set the parameter `model` in 'load_model()' with RocketQA model name return by 'available_models()'.
+#### Run RocketQA Model
+To run RocketQA models, you should set the parameter `model` in 'load_model()' with RocketQA model name returned by 'available_models()'.
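
The API text in this diff describes two model types: a dual encoder for answer retrieval and a cross encoder for answer re-ranking. The following is a minimal, dependency-free sketch of that two-stage flow, offered only as an illustration. It does not call RocketQA or ERNIE: `encode`, `retrieve`, and `rerank` are hypothetical stand-ins (bag-of-words vectors and token overlap) chosen so the example runs anywhere, and their names and signatures are assumptions rather than part of the toolkit.

```python
import math
from collections import Counter


def encode(text):
    """Toy stand-in for a dual encoder (assumption, not RocketQA's model):
    an L2-normalised bag-of-words vector stored as a dict."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()}


def dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * v.get(tok, 0.0) for tok, w in u.items())


def retrieve(query, paragraphs, top_k=2):
    """Stage 1 (retrieval): encode the query and each paragraph
    independently, then keep the top_k paragraphs by dot product."""
    q_vec = encode(query)
    scored = [(dot(q_vec, encode(p)), p) for p in paragraphs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]


def rerank(query, candidates):
    """Stage 2 (re-ranking): toy stand-in for a cross encoder that scores
    each (query, paragraph) pair jointly. Here the score is the fraction
    of query tokens that also appear in the paragraph."""
    q_tokens = set(query.lower().split())

    def score(paragraph):
        return len(q_tokens & set(paragraph.lower().split())) / len(q_tokens)

    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    docs = [
        "paris is the capital of france",
        "berlin is the capital of germany",
        "bananas are yellow",
    ]
    question = "capital of france"
    shortlist = retrieve(question, docs, top_k=2)
    print(rerank(question, shortlist))
```

The split mirrors the usual design trade-off: the first stage can precompute paragraph vectors and search them cheaply, while the second stage spends more computation on a short candidate list. Both functions assume non-empty input text.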