
Commit 9e890ae

Author: xingyiran
Commit message: update readme, add title image and flow illustration
Parent: c1f2530

3 files changed (+24, -54 lines)

README.md (+24, -54)
@@ -1,11 +1,18 @@
-# RocketQA
+<p align=center> <img src="https://github.com/PaddlePaddle/RocketQA/blob/main/RocketQA_title.png" /> </p>
 
-In recent years, the dense retrievers based on pre-trained language models have achieved remarkable progress. To facilitate more developers using cutting edge technologies, this repository provides an easy-to-use toolkit for running and fine-tuning the state-of-the-art dense retrievers, namely **RocketQA**. This toolkit has the following advantages:
+<div align=center>
+
+![](https://img.shields.io/badge/license-Apache%202-blue) ![](https://img.shields.io/badge/version-v1.0-green) ![](https://img.shields.io/badge/JupyterNotebook-Try%20%F0%9F%9A%80RocketQA%20Now!-orange) ![](https://img.shields.io/badge/requirements-up%20to%20date-brightgreen) ![](https://img.shields.io/badge/size-1.68MB-blue)
+
+</div>
+
+In recent years, the dense retrievers based on pre-trained language models have achieved remarkable progress. To facilitate more developers using cutting edge technologies, this repository provides an easy-to-use toolkit for running and fine-tuning the state-of-the-art dense retrievers, namely **🚀RocketQA**. This toolkit has the following advantages:
 
 
-* ***State-of-the-art***: It provides well-trained RocketQA models, which achieve SOTA performance on many dense retrieval datasets. And it will continue to update the [latest models](https://github.com/PaddlePaddle/RocketQA#news).
-* ***First-Chinese-model***: It provides the first open source Chinese dense retrieval model, which is trained on millions of manual annotation data from [DuReader](https://github.com/baidu/DuReader).
-* ***Easy-to-use***: By integrating this toolkit with [JINA](https://jina.ai/), developers can build an end-to-end question answering system with several lines of code.
+* ***State-of-the-art***: 🚀RocketQA provides our well-trained models, which achieve SOTA performance on many dense retrieval datasets. And it will continue to update the [latest models](https://github.com/PaddlePaddle/RocketQA#news).
+* ***First-Chinese-model***: 🚀RocketQA provides the first open source Chinese dense retrieval model, which is trained on millions of manual annotation data from [DuReader](https://github.com/baidu/DuReader).
+* ***Easy-to-use***: By integrating this toolkit with [JINA](https://jina.ai/), 🚀RocketQA can help developers build an end-to-end question answering system with several lines of code. <img src="https://github.com/PaddlePaddle/RocketQA/blob/main/RocketQA_flow.png" alt="" align=center />
+
 
 ## Installation
 
@@ -39,7 +46,7 @@ docker run -it docker.io/rocketqa/rocketqa bash
 
 ## Getting Started
 
-Refer to the examples below, you can build your own Search Engine with several lines of code.
+Refer to the examples below, you can build and run your own Search Engine with several lines of code. We also provide a [Playground]() with JupyterNotebook. Try 🚀RocketQA straight away in your browser!
 
 ### Running with JINA
 [JINA](https://jina.ai/) is a cloud-native neural search framework to build SOTA and scalable deep learning search applications in minutes. Here is a simple example to build a Search Engine based on JINA and RocketQA.
@@ -48,10 +55,11 @@ Refer to the examples below, you can build your own Search Engine with several l
 cd examples/jina_example
 pip3 install -r requirements.txt
 
-# Index: Encodes and indexes text, then starts a searching service
+# Generate vector representations and build a library for your Documents
+# JINA will automatically start a web service for you
 python3 app.py index toy_data/test.tsv
 
-# Query: Encodes query and searches for answer, returns candidates ranked by relevance score
+# Try some questions related to the indexed Documents
 python3 app.py query_cli
 ```
 Please view [JINA example](https://github.com/PaddlePaddle/RocketQA/tree/main/examples/jina_example) to know more.
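The JINA and Faiss examples both run the same two steps: encode the documents and add their vectors to an index, then score a question against that index. Below is a plain-Python sketch of the second step, written as exact (brute-force) inner-product search. This is the operation an exact Faiss index such as `IndexFlatIP` performs. It is an illustration under stated assumptions. The class `FlatIPIndex` and the toy 3-dimensional vectors are hypothetical. Real vectors would come from RocketQA's dual encoder, and this is not the code used in either example.

```python
from typing import List, Tuple


class FlatIPIndex:
    """Hypothetical exact inner-product index: stores every vector and
    scores a query against all of them (no approximation)."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.vectors: List[List[float]] = []
        self.docs: List[str] = []

    def add(self, vector: List[float], doc: str) -> None:
        # Keep each document's vector next to its text so search can return both.
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dims, got {len(vector)}")
        self.vectors.append(vector)
        self.docs.append(doc)

    def search(self, query: List[float], k: int) -> List[Tuple[str, float]]:
        # Dot product between the query and every stored vector.
        scores = [sum(q * v for q, v in zip(query, vec)) for vec in self.vectors]
        # Highest score first; keep only the top k documents.
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return [(self.docs[i], scores[i]) for i in order[:k]]


# Toy 3-dimensional vectors (hypothetical); real ones would come from a dual encoder.
index = FlatIPIndex(dim=3)
index.add([0.9, 0.1, 0.0], "passage about cross-validation")
index.add([0.1, 0.8, 0.2], "passage about dense retrieval")
index.add([0.0, 0.3, 0.9], "passage about question answering")

results = index.search([0.2, 0.9, 0.1], k=2)
for doc, score in results:
    print(f"{score:.2f}  {doc}")
# prints:
# 0.76  passage about dense retrieval
# 0.36  passage about question answering
```

Exact search scans every stored vector on each query. That is simple and precise for a small corpus like the toy data. Approximate Faiss indexes exist for larger collections.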
@@ -62,19 +70,19 @@ We also provide a simple example built on [Faiss](https://github.com/facebookres
 cd examples/faiss_example/
 pip3 install -r requirements.txt
 
-# Index: Encodes and indexes text
+# Generate vector representations and build a library for your Documents
 python3 index.py en ../marco.tp.1k marco_index
 
-# Start service
+# Start a web service on http://localhost:8888/rocketqa
 python3 rocketqa_service.py en ../marco.tp.1k marco_index
 
-# Request: Encodes query and searches for answer, returns candidates ranked by relevance score
+# Try some questions related to the indexed Documents
 python3 query.py
 ```
 
 
 ## API
-RocketQA provide two types of models, ERNIE-based dual encoder for answer retrieval and ERNIE-based cross encoder for answer re-ranking. For running RocketQA models and your own checkpoints, you can use the following functions.
+You can also easily integrate 🚀RocketQA into your own task. We provide two types of models, ERNIE-based dual encoder for answer retrieval and ERNIE-based cross encoder for answer re-ranking. For running our models, you can use the following functions.
 
 ### Load model
 
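The API paragraph in this hunk describes a two-stage design: a dual encoder retrieves candidate answers, and a cross encoder re-ranks them. The sketch below shows that pattern in plain Python under loudly stated assumptions. The embeddings are hand-written toy vectors. `overlap_score` is a hypothetical word-overlap placeholder for a cross-encoder score, not RocketQA's model. With the real toolkit, stage 1 scores would come from the dual encoder (the README shows `dual_encoder.matching` returning dot products) and stage 2 scores from `cross_encoder.matching`.

```python
from typing import Callable, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_then_rerank(
    query: str,
    query_emb: Sequence[float],
    paras: List[str],
    para_embs: List[Sequence[float]],
    rerank_score: Callable[[str, str], float],
    top_k: int,
) -> List[Tuple[str, float]]:
    # Stage 1 (dual-encoder style): rank all paragraphs by embedding dot
    # product and keep a short list of top_k candidates.
    ranked = sorted(
        range(len(paras)), key=lambda i: dot(query_emb, para_embs[i]), reverse=True
    )
    candidates = ranked[:top_k]
    # Stage 2 (cross-encoder style): score each (query, paragraph) pair
    # jointly and reorder only the short list by that score.
    rescored = [(paras[i], rerank_score(query, paras[i])) for i in candidates]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def overlap_score(query: str, para: str) -> float:
    """Hypothetical stand-in for a cross encoder: fraction of query words
    that also appear in the paragraph."""
    q_words = set(query.lower().split())
    p_words = set(para.lower().split())
    return len(q_words & p_words) / len(q_words) if q_words else 0.0


paras = [
    "dense retrieval uses dual encoders",
    "cross encoders rerank candidate answers",
    "faiss builds vector indexes",
]
para_embs = [[0.9, 0.1], [0.7, 0.3], [0.1, 0.9]]  # toy embeddings (hypothetical)
reranked = retrieve_then_rerank(
    query="how do cross encoders rerank answers",
    query_emb=[1.0, 0.0],
    paras=paras,
    para_embs=para_embs,
    rerank_score=overlap_score,
    top_k=2,
)
for para, score in reranked:
    print(f"{score:.2f}  {para}")
# prints:
# 0.67  cross encoders rerank candidate answers
# 0.17  dense retrieval uses dual encoders
```

Stage 1 puts "dense retrieval uses dual encoders" first. Stage 2 promotes the paragraph that actually answers the question. The third paragraph never reaches the more expensive scorer. That is the point of re-ranking only a short list.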
@@ -108,14 +116,13 @@ Cross-encoder returned by "load_model()" supports the following function:
 
 Given a list of queries and paragraphs (and titles), returns their matching scores (probability that the paragraph is the query's right answer).
 
-
 
-## Examples
+### Examples
 
-Following the examples below, you can run RocketQA models and your own checkpoints.
+Following the examples below, you can retrieve the vector representations of your documents and connect 🚀RocketQA to your own tasks.
 
-### Run RocketQA Model
-To run RocketQA models, you should set the parameter `model` in 'load_model()' with RocketQA model name return by 'available_models()'.
+#### Run RocketQA Model
+To run RocketQA models, you should set the parameter `model` in 'load_model()' with RocketQA model name returned by 'available_models()'.
 
 ```python
 import rocketqa
@@ -134,43 +141,6 @@ p_embs = dual_encoder.encode_para(para=para_list)
 dot_products = dual_encoder.matching(query=query_list, para=para_list)
 ```
 
-### Run Self-development Model
-To run your own checkpoints, you should write a config file, and set the parameter `model` in 'load_model()' with the path of the config file.
-
-```python
-import rocketqa
-
-query_list = ["交叉验证的作用"]
-title_list = ["交叉验证的介绍"]
-para_list = ["交叉验证(Cross-validation)主要用于建模应用中,例如PCR 、PLS回归建模中。在给定的建模样本中,拿出大部分样本进行建模型,留小部分样本用刚建立的模型进行预报,并求这小部分样本的预报误差,记录它们的平方加和。"]
-
-# conf
-ce_conf = {
-    "model": ${YOUR_CONFIG}, # path of config file
-    "use_cuda": True,
-    "device_id": 0,
-    "batch_size": 16
-}
-
-# init cross encoder
-cross_encoder = rocketqa.load_model(**ce_conf)
-
-# compute matching score of query and para
-ranking_score = cross_encoder.matching(query=query_list, para=para_list, title=title_list)
-```
-
-${YOUR_CONFIG} is a JSON format file.
-```bash
-{
-    "model_type": "cross_encoder",
-    "max_seq_len": 160,
-    "model_conf_path": "en_large_config.json", # path relative to config file
-    "model_vocab_path": "en_vocab.txt", # path relative to config file
-    "model_checkpoint_path": "marco_cross_encoder_large", # path relative to config file
-    "joint_training": 0
-}
-```
-
 ## News
 * August 26, 2021: [RocketQA v2](https://arxiv.org/pdf/2110.07367.pdf) was accepted by EMNLP 2021.
 * May 5, 2021: [PAIR](https://aclanthology.org/2021.findings-acl.191.pdf) was accepted by ACL 2021
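The "Run Self-development Model" section removed in this hunk passed a JSON config file to `load_model()` through the `model` parameter. The config's model-config, vocab, and checkpoint paths were described as relative to the config file. The removed JSON annotated its fields with `#` comments, which strict JSON does not allow and Python's `json` module rejects. Below is a minimal sketch of writing and reading such a file. It assumes the field names from the removed snippet. The helpers `write_config` and `resolve_paths` and the file name `my_cross_encoder.json` are hypothetical. The path resolution only illustrates the "relative to config file" rule; it is not RocketQA's own loader.

```python
import json
import os
import tempfile

# Field names and values copied from the removed README snippet; the "#"
# comments are dropped because strict JSON has no comment syntax.
CONFIG = {
    "model_type": "cross_encoder",
    "max_seq_len": 160,
    "model_conf_path": "en_large_config.json",
    "model_vocab_path": "en_vocab.txt",
    "model_checkpoint_path": "marco_cross_encoder_large",
    "joint_training": 0,
}

# Keys whose values are paths relative to the config file (per the removed snippet).
PATH_KEYS = ("model_conf_path", "model_vocab_path", "model_checkpoint_path")


def write_config(config_dir: str) -> str:
    """Hypothetical helper: write CONFIG as JSON and return the file path."""
    path = os.path.join(config_dir, "my_cross_encoder.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(CONFIG, f, indent=2)
    return path


def resolve_paths(config_path: str) -> dict:
    """Hypothetical helper: load the config and turn its relative paths into
    absolute ones anchored at the config file's directory."""
    with open(config_path, encoding="utf-8") as f:
        conf = json.load(f)
    base = os.path.dirname(os.path.abspath(config_path))
    for key in PATH_KEYS:
        conf[key] = os.path.join(base, conf[key])
    return conf


with tempfile.TemporaryDirectory() as tmp:
    config_path = write_config(tmp)
    resolved = resolve_paths(config_path)
    # The config path is passed as a plain string, not a bare ${YOUR_CONFIG}.
    ce_conf = {"model": config_path, "use_cuda": True, "device_id": 0, "batch_size": 16}
    # With rocketqa installed, the removed snippet passed this dict as
    # rocketqa.load_model(**ce_conf); that call is not made here.
    print(resolved["model_checkpoint_path"])
```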

RocketQA_flow.png (32.2 KB)

RocketQA_title.png (17.3 KB)
