
Commit

update link
SeaOfOcean committed Aug 28, 2024
1 parent 029f00a commit 7e82c63
Showing 4 changed files with 14 additions and 14 deletions.
12 changes: 6 additions & 6 deletions docs/en/programming/vllm.md
@@ -6,7 +6,7 @@ For now, we enable vLLM to accelerate policy generation.

## Model Definition

Similar to inheriting `MegatronModule` for implementing [PolicyInference Model](../../../examples/megatron/models/old_policy_inference.py), the vLLM backend can be enabled by inheriting the `VLLMModule` class and implementing the following key modules:
Similar to inheriting `MegatronModule` for implementing [PolicyInference Model](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/models/old_policy_inference.py), the vLLM backend can be enabled by inheriting the `VLLMModule` class and implementing the following key modules:
- model_provider: model definition function.
- setup: call model_provider to define the model. Optionally, call `load_checkpoint` or perform other initialization.
- build_dataset: Preprocess train/eval dataset with vLLM tokenizer.
@@ -48,9 +48,9 @@ class VLLMPolicyInference(VLLMModule):
pass
```
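
The method bodies of this skeleton are collapsed in this diff; as orientation, here is a minimal, hypothetical sketch of how the key modules listed above fit together. The `chatlearn` import location, attribute names, and argument names are assumptions for illustration, not the library's confirmed API.

```python
# Hypothetical sketch only: the import path, attribute names and arguments
# below are assumptions for illustration, not ChatLearn's exact API.
from chatlearn import VLLMModule  # assumed import location


class MyPolicyInference(VLLMModule):
    """Minimal skeleton mirroring the key modules described above."""

    def model_provider(self):
        # Model definition function: build and return the network object
        # that the vLLM backend will run.
        ...

    def setup(self):
        # Define the model via model_provider; optionally load a checkpoint
        # or perform other one-time initialization here.
        self.model = self.model_provider()

    def build_dataset(self, prompts, is_eval=False):
        # Preprocess the raw train/eval prompts with the vLLM tokenizer and
        # return a prompt dataset (see VLLMPromptPipeline below).
        ...
```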

You can refer to [vllm_policy_inference.py](../../../examples/megatron/models/vllm_policy_inference.py), in which build_dataset/_add_request/forward_step/decode_internal are clarified as follows:
You can refer to [vllm_policy_inference.py](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/models/vllm_policy_inference.py), in which build_dataset/_add_request/forward_step/decode_internal are clarified as follows:

- build_dataset: Using the `tokenizer`, you only need to return prompt_ids and the prompt string. The [VLLMPromptPipeline](../../../examples/megatron/data/prompt_dataset.py#141) used in `build_dataset` works as follows:
- build_dataset: Using the `tokenizer`, you only need to return prompt_ids and the prompt string. The [VLLMPromptPipeline](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/data/prompt_dataset.py#141) used in `build_dataset` works as follows:
```python
class VLLMPromptPipeline(PromptPipeline):
def __init__(self, prompts: List[str], max_prompt_length: int, tokenizer=None):
@@ -108,7 +108,7 @@ class VLLMPolicyInference(VLLMModule):
return self._forward_step(data, iteration, eval_mode=False)
```
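
The body of `VLLMPromptPipeline` is collapsed in this diff; the following is a rough, self-contained sketch of the logic it implements, assuming a Hugging Face-style `tokenizer.encode` and a plain `torch.utils.data.Dataset` base class. The authoritative version is the linked prompt_dataset.py.

```python
# Rough sketch of the collapsed VLLMPromptPipeline above; the Dataset base
# class and attribute names are assumptions, see prompt_dataset.py for the
# real implementation.
from typing import List

from torch.utils.data import Dataset


class VLLMPromptPipelineSketch(Dataset):
    def __init__(self, prompts: List[str], max_prompt_length: int, tokenizer=None):
        super().__init__()
        # Tokenize each prompt and keep only (prompt_ids, prompt_str) pairs,
        # truncating prompt_ids to max_prompt_length as build_dataset requires.
        self.prompts = [
            (tokenizer.encode(prompt)[:max_prompt_length], prompt)
            for prompt in prompts
        ]

    def __getitem__(self, index):
        prompt_ids, prompt_str = self.prompts[index]
        return {"input_ids": prompt_ids, "prompt": prompt_str}

    def __len__(self):
        return len(self.prompts)
```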

- decode_internal: Refer to the [examples](../../../examples/megatron/models/vllm_policy_inference.py#L119) for more details. The parameter `batched_outputs` is a List[RequestOutput], in which [RequestOutput](https://github.com/vllm-project/vllm/blob/v0.5.1/vllm/outputs.py#L67) includes the following key attributes:
- decode_internal: Refer to the [examples](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/models/vllm_policy_inference.py#L119) for more details. The parameter `batched_outputs` is a List[RequestOutput], in which [RequestOutput](https://github.com/vllm-project/vllm/blob/v0.5.1/vllm/outputs.py#L67) includes the following key attributes:

| Attribute | Type | Comment |
|:------:|:-----:|:-----:|
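
The attribute rows of this table are collapsed in the diff; as orientation, a hedged sketch of a decode_internal-style pass over `List[RequestOutput]` follows. The returned dictionary keys are illustrative assumptions, while the `RequestOutput`/`CompletionOutput` fields used come from vLLM v0.5.1.

```python
# Hedged sketch of iterating over List[RequestOutput]; the output dict keys
# are illustrative, not ChatLearn's exact decode_internal contract.
from typing import Dict, List

from vllm.outputs import RequestOutput


def decode_internal_sketch(batched_outputs: List[RequestOutput]) -> Dict[str, list]:
    prompt_token_ids, response_token_ids, response_texts = [], [], []
    for request_output in batched_outputs:
        # Prompt side: token ids the request was generated from.
        prompt_token_ids.append(list(request_output.prompt_token_ids))
        # Generation side: take the first completion produced for the request.
        completion = request_output.outputs[0]
        response_token_ids.append(list(completion.token_ids))
        response_texts.append(completion.text)
    return {
        "prompt_token_ids": prompt_token_ids,
        "response_token_ids": response_token_ids,
        "response_texts": response_texts,
    }
```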
@@ -140,7 +140,7 @@ policy:
...
```

Or you can refer to [llama2 model yaml](../../../examples/megatron/configs/llama2/vllm_rlhf.yaml).
Or you can refer to [llama2 model yaml](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/configs/llama2/vllm_rlhf.yaml).

## Hyperparameter Configuration YAML

@@ -186,4 +186,4 @@ Hyperparameters for vLLM can be divided into 5 parts:
- Others: `includes` specifies model structure.


You can refer to [vLLM Hyperparameter Configuration](../../../examples/megatron/configs/llama2/vllm_policy_inference.yaml) for details.
You can refer to [vLLM Hyperparameter Configuration](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/configs/llama2/vllm_policy_inference.yaml) for details.
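
For intuition, the generation-related entries in that YAML ultimately correspond to vLLM's `SamplingParams`; a small illustrative snippet follows. The values are arbitrary, and the exact mapping is defined by ChatLearn, not here.

```python
# Illustrative only: generation hyperparameters of the kind listed in the YAML
# map onto vLLM SamplingParams fields such as these (values are arbitrary).
from vllm import SamplingParams

sampling_params = SamplingParams(
    n=1,              # completions returned per prompt
    temperature=0.8,  # sampling temperature
    top_p=0.95,       # nucleus sampling threshold
    top_k=-1,         # -1 disables top-k filtering
    max_tokens=512,   # maximum number of new tokens to generate
)
print(sampling_params)
```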
2 changes: 1 addition & 1 deletion docs/en/tutorial/ems.md
@@ -26,4 +26,4 @@ Alternatively, it can also be configured in the training script using environment variables:
- PPO policy model: `export free_memory_ppo_policy=True`
- PPO value model: `export free_memory_ppo_value=True`

A complete example can be found in the [llama2 configuration](../../../examples/megatron/configs/llama2/rlhf.yaml).
A complete example can be found in the [llama2 configuration](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/configs/llama2/rlhf.yaml).
12 changes: 6 additions & 6 deletions docs/zh/programming/vllm.md
Expand Up @@ -6,7 +6,7 @@ ChatLearn中支持vLLM进行跨机分布式推理,支持vllm和training backen

## Model Definition

Similar to inheriting `MegatronModule` to implement the [PolicyInference model](../../../examples/megatron/models/old_policy_inference.py), a PolicyInference model that performs generation on the vLLM backend needs to inherit the `VLLMModule` parent class and implement the following key modules:
Similar to inheriting `MegatronModule` to implement the [PolicyInference model](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/models/old_policy_inference.py), a PolicyInference model that performs generation on the vLLM backend needs to inherit the `VLLMModule` parent class and implement the following key modules:
- model_provider: the model definition function.
- setup: call model_provider to define the model; optionally load_checkpoint and so on, as needed.
- build_dataset: process the data with the vLLM tokenizer and build the prompt dataset.
@@ -48,9 +48,9 @@ class VLLMPolicyInference(VLLMModule):
pass
```

For an example, refer to [vllm_policy_inference.py](../../../examples/megatron/models/vllm_policy_inference.py); build_dataset, _add_request, forward_step, and decode_internal are explained further below:
For an example, refer to [vllm_policy_inference.py](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/models/vllm_policy_inference.py); build_dataset, _add_request, forward_step, and decode_internal are explained further below:

- build_dataset: processing with the tokenizer only needs to return prompt_ids and the prompt str; the [VLLMPromptPipeline](../../../examples/megatron/data/prompt_dataset.py#141) used in build_dataset works as follows:
- build_dataset: processing with the tokenizer only needs to return prompt_ids and the prompt str; the [VLLMPromptPipeline](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/data/prompt_dataset.py#141) used in build_dataset works as follows:
```python
class VLLMPromptPipeline(PromptPipeline):
def __init__(self, prompts: List[str], max_prompt_length: int, tokenizer=None):
@@ -108,7 +108,7 @@ class VLLMPolicyInference(VLLMModule):
return self._forward_step(data, iteration, eval_mode=False)
```

- decode_internal: refer to the [examples](../../../examples/megatron/models/vllm_policy_inference.py#L119) for an implementation. The parameter batched_outputs has the format List[RequestOutput], in which [RequestOutput](https://github.com/vllm-project/vllm/blob/v0.5.1/vllm/outputs.py#L67) contains the following important attributes:
- decode_internal: refer to the [examples](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/models/vllm_policy_inference.py#L119) for an implementation. The parameter batched_outputs has the format List[RequestOutput], in which [RequestOutput](https://github.com/vllm-project/vllm/blob/v0.5.1/vllm/outputs.py#L67) contains the following important attributes:

| Attribute | Type | Comment |
|:------:|:-----:|:-----:|
@@ -138,7 +138,7 @@ policy:
model_config_file: vllm_policy_inference.yaml
...
```
You can also refer to the example [llama2 model configuration](../../../examples/megatron/configs/llama2/vllm_rlhf.yaml).
You can also refer to the example [llama2 model configuration](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/configs/llama2/vllm_rlhf.yaml).

## Hyperparameter Configuration

@@ -182,4 +182,4 @@ vLLM hyperparameters can be divided into five parts:
- tokenizer: the directory from which the vLLM tokenizer is loaded; see [LLama2-7B-hf](https://huggingface.co/meta-llama/Llama-2-7b) for reference (a quick check snippet follows this list).
- Others: `includes` specifies the model structure and the remaining parameters.
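
As a quick, optional sanity check that the configured tokenizer directory is a Hugging Face-format tokenizer (which is what vLLM loads), here is an illustrative snippet. The repo id below is only an example; a gated model may require authentication or a local path.

```python
# Illustrative check of the tokenizer directory; the repo id is an example
# (a gated model may require authentication or a local download path).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
print(tokenizer("hello chatlearn")["input_ids"])
```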

You can refer to the [vLLM hyperparameter configuration](../../../examples/megatron/configs/llama2/vllm_policy_inference.yaml) for details.
You can refer to the [vLLM hyperparameter configuration](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/configs/llama2/vllm_policy_inference.yaml) for details.
2 changes: 1 addition & 1 deletion docs/zh/tutorial/ems.md
@@ -29,4 +29,4 @@ policy:
- ppo_policy model: `export free_memory_ppo_policy=True`
- ppo_value model: `export free_memory_ppo_value=True`

A complete example can be found in the [llama2 configuration](../../../examples/megatron/configs/llama2/rlhf.yaml).
A complete example can be found in the [llama2 configuration](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/configs/llama2/rlhf.yaml).
