diff --git a/docs/en/programming/vllm.md b/docs/en/programming/vllm.md
index 20aa941d..bf1285ea 100644
--- a/docs/en/programming/vllm.md
+++ b/docs/en/programming/vllm.md
@@ -6,7 +6,7 @@ For now, we enable vLLM to accelerate policy generation.
 
 ## Model Definition
 
-Similar to inheriting `MegatronModule` for implementing [PolicyInference Model](../../../examples/megatron/models/old_policy_inference.py), the vLLM backend can be enabled by inheriting `VLLMModule` class and implementing the following key modules:
+Similar to inheriting `MegatronModule` for implementing [PolicyInference Model](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/models/old_policy_inference.py), the vLLM backend can be enabled by inheriting `VLLMModule` class and implementing the following key modules:
 - model_provider: model definition function.
 - setup: call model_provider to define model. Optionly, call `load_checkpoint` or others.
 - build_dataset: Preprocess train/eval dataset with vLLM tokenizer.
@@ -48,9 +48,9 @@ class VLLMPolicyInference(VLLMModule):
         pass
 ```
 
-You can refer to[vllm_policy_inference.py](../../../examples/megatron/models/vllm_policy_inference.py), in which build_dataset/_add_request/forward_step/decode_internal clarified as following:
+You can refer to [vllm_policy_inference.py](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/models/vllm_policy_inference.py), in which build_dataset/_add_request/forward_step/decode_internal are clarified as follows:
 
-- build_dataset: Use `tokenizer`, you only need to return prompt_ids and prompt string. In `build_dataset`, [VLLMPromptPipeline](../../../examples/megatron/data/prompt_dataset.py#141) shows as following:
+- build_dataset: Use `tokenizer`, you only need to return prompt_ids and prompt string. In `build_dataset`, [VLLMPromptPipeline](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/data/prompt_dataset.py#141) is shown below:
 ```python
 class VLLMPromptPipeline(PromptPipeline):
     def __init__(self, prompts: List[str], max_prompt_length: int, tokenizer=None):
@@ -108,7 +108,7 @@ class VLLMPolicyInference(VLLMModule):
         return self._forward_step(data, iteration, eval_mode=False)
 ```
 
-- decode_internal: Refer to [examples](../../../examples/megatron/models/vllm_policy_inference.py#L119) for more details. Format of param `batched_outputs` is List[RequestOutput], in which [RequestOutput](https://github.com/vllm-project/vllm/blob/v0.5.1/vllm/outputs.py#L67)includes the following key attributes:
+- decode_internal: Refer to [examples](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/models/vllm_policy_inference.py#L119) for more details. Format of param `batched_outputs` is List[RequestOutput], in which [RequestOutput](https://github.com/vllm-project/vllm/blob/v0.5.1/vllm/outputs.py#L67) includes the following key attributes:
 
 | Attibute |Type| Comment |
 |:------:|:-----:|:-----:|
@@ -140,7 +140,7 @@ policy:
     ...
 ```
 
-Or you can refer to [llama2 model yaml](../../../examples/megatron/configs/llama2/vllm_rlhf.yaml).
+Or you can refer to [llama2 model yaml](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/configs/llama2/vllm_rlhf.yaml).
 
 ## hyperparameter configuration yaml
 
@@ -186,4 +186,4 @@ Hyperparameter for vLLM can be divied into 5 parts:
 
 - Others: `includes` specifies model structure.
 
-You can refer to [vLLM Hyperparameter Configuration](../../../examples/megatron/configs/llama2/vllm_policy_inference.yaml) for details.
+You can refer to [vLLM Hyperparameter Configuration](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/configs/llama2/vllm_policy_inference.yaml) for details.
diff --git a/docs/en/tutorial/ems.md b/docs/en/tutorial/ems.md
index 7deeff21..da7b8d9c 100644
--- a/docs/en/tutorial/ems.md
+++ b/docs/en/tutorial/ems.md
@@ -26,4 +26,4 @@ Alternatively, it can also be configured in the training script using environmen
 - PPO policy model: `export free_memory_ppo_policy=True`
 - PPO value model: `export free_memory_ppo_value=True`
 
-A complete example can be found in the [llama2 configuration](../../../examples/megatron/configs/llama2/rlhf.yaml).
\ No newline at end of file
+A complete example can be found in the [llama2 configuration](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/configs/llama2/rlhf.yaml).
\ No newline at end of file
diff --git a/docs/zh/programming/vllm.md b/docs/zh/programming/vllm.md
index 22548ee0..b2fec2b2 100644
--- a/docs/zh/programming/vllm.md
+++ b/docs/zh/programming/vllm.md
@@ -6,7 +6,7 @@ ChatLearn中支持vLLM进行跨机分布式推理,支持vllm和training backen
 
 ## 模型定义
 
-类似于继承`MegatronModule`实现[PolicyInference模型](../../../examples/megatron/models/old_policy_inference.py),PolicyInference模型若想基于vLLM后端完成generation,需要继承`VLLMModule`父类,实现以下关键模块:
+类似于继承`MegatronModule`实现[PolicyInference模型](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/models/old_policy_inference.py),PolicyInference模型若想基于vLLM后端完成generation,需要继承`VLLMModule`父类,实现以下关键模块:
 - model_provider:模型定义函数。
 - setup:调用model_provider定义模型,可根据需要决定是否load_checkpoint等。
 - build_dataset:调用vLLM tokenizer处理数据,生成prompt dataset。
@@ -48,9 +48,9 @@ class VLLMPolicyInference(VLLMModule):
         pass
 ```
 
-示例可参考[vllm_policy_inference.py](../../../examples/megatron/models/vllm_policy_inference.py),补充说明build_dataset、_add_request、forward_step、decode_internal如下:
+示例可参考[vllm_policy_inference.py](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/models/vllm_policy_inference.py),补充说明build_dataset、_add_request、forward_step、decode_internal如下:
 
-- build_dataset:调用tokenizer处理只需要返回prompt_ids、prompt str,其中build_dataset的[VLLMPromptPipeline](../../../examples/megatron/data/prompt_dataset.py#141)具体逻辑如下:
+- build_dataset:调用tokenizer处理只需要返回prompt_ids、prompt str,其中build_dataset的[VLLMPromptPipeline](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/data/prompt_dataset.py#141)具体逻辑如下:
 ```python
 class VLLMPromptPipeline(PromptPipeline):
     def __init__(self, prompts: List[str], max_prompt_length: int, tokenizer=None):
@@ -108,7 +108,7 @@ class VLLMPolicyInference(VLLMModule):
         return self._forward_step(data, iteration, eval_mode=False)
 ```
 
-- decode_internal:可参考[examples](../../../examples/megatron/models/vllm_policy_inference.py#L119)实现。参数batched_outputs格式为List[RequestOutput],其中[RequestOutput](https://github.com/vllm-project/vllm/blob/v0.5.1/vllm/outputs.py#L67)包含以下重要attributes:
+- decode_internal:可参考[examples](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/models/vllm_policy_inference.py#L119)实现。参数batched_outputs格式为List[RequestOutput],其中[RequestOutput](https://github.com/vllm-project/vllm/blob/v0.5.1/vllm/outputs.py#L67)包含以下重要attributes:
 
 | 属性 |类型| 含义 |
 |:------:|:-----:|:-----:|
@@ -138,7 +138,7 @@ policy:
     model_config_file: vllm_policy_inference.yaml
     ...
 ```
 
-也可以参考示例 [llama2模型配置](../../../examples/megatron/configs/llama2/vllm_rlhf.yaml)。
+也可以参考示例 [llama2模型配置](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/configs/llama2/vllm_rlhf.yaml)。
 
 ## 超参配置
 
@@ -182,4 +182,4 @@ vLLM超参可分为五部分:
 - tokenizer:vLLM tokenizer读取目录,可参考[LLama2-7B-hf](https://huggingface.co/meta-llama/Llama-2-7b)
 - 其他:includes指定模型结构等其余参数;
 
-可以参考 [vLLM超参配置](../../../examples/megatron/configs/llama2/vllm_policy_inference.yaml)。
+可以参考 [vLLM超参配置](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/configs/llama2/vllm_policy_inference.yaml)。
diff --git a/docs/zh/tutorial/ems.md b/docs/zh/tutorial/ems.md
index dab68054..4cd552b8 100644
--- a/docs/zh/tutorial/ems.md
+++ b/docs/zh/tutorial/ems.md
@@ -29,4 +29,4 @@ policy:
 - ppo_policy 模型:`export free_memory_ppo_policy=True`
 - ppo_value 模型:`export free_memory_ppo_value=True`
 
-完整示例可以参考 [llama2 配置](../../../examples/megatron/configs/llama2/rlhf.yaml)。
+完整示例可以参考 [llama2 配置](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/configs/llama2/rlhf.yaml)。