
Commit

update link
SeaOfOcean committed Aug 28, 2024
1 parent 029f00a commit 7e82c63
Showing 4 changed files with 14 additions and 14 deletions.
12 changes: 6 additions & 6 deletions docs/en/programming/vllm.md
@@ -6,7 +6,7 @@ For now, we enable vLLM to accelerate policy generation.

## Model Definition

Similar to inheriting `MegatronModule` for implementing [PolicyInference Model](../../../examples/megatron/models/old_policy_inference.py), the vLLM backend can be enabled by inheriting the `VLLMModule` class and implementing the following key modules:
Similar to inheriting `MegatronModule` for implementing [PolicyInference Model](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/models/old_policy_inference.py), the vLLM backend can be enabled by inheriting the `VLLMModule` class and implementing the following key modules:
- model_provider: model definition function.
- setup: call model_provider to define the model. Optionally, call `load_checkpoint` or perform other initialization.
- build_dataset: Preprocess train/eval dataset with vLLM tokenizer.
@@ -48,9 +48,9 @@ class VLLMPolicyInference(VLLMModule):
pass
```
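
The method bodies of this skeleton are collapsed in this diff; as orientation, here is a minimal, hypothetical sketch of how the key modules listed above fit together. The `chatlearn` import location, attribute names, and argument names are assumptions for illustration, not the library's confirmed API.

```python
# Hypothetical sketch only: the import path, attribute names and arguments
# below are assumptions for illustration, not ChatLearn's exact API.
from chatlearn import VLLMModule  # assumed import location


class MyPolicyInference(VLLMModule):
    """Minimal skeleton mirroring the key modules described above."""

    def model_provider(self):
        # Model definition function: build and return the network object
        # that the vLLM backend will run.
        ...

    def setup(self):
        # Define the model via model_provider; optionally load a checkpoint
        # or perform other one-time initialization here.
        self.model = self.model_provider()

    def build_dataset(self, prompts, is_eval=False):
        # Preprocess the raw train/eval prompts with the vLLM tokenizer and
        # return a prompt dataset (see VLLMPromptPipeline below).
        ...
```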

You can refer to [vllm_policy_inference.py](../../../examples/megatron/models/vllm_policy_inference.py), in which build_dataset/_add_request/forward_step/decode_internal are clarified as follows:
You can refer to [vllm_policy_inference.py](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/models/vllm_policy_inference.py), in which build_dataset/_add_request/forward_step/decode_internal are clarified as follows:

- build_dataset: Using the `tokenizer`, you only need to return prompt_ids and the prompt string. The [VLLMPromptPipeline](../../../examples/megatron/data/prompt_dataset.py#141) used in `build_dataset` works as follows:
- build_dataset: Using the `tokenizer`, you only need to return prompt_ids and the prompt string. The [VLLMPromptPipeline](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/data/prompt_dataset.py#141) used in `build_dataset` works as follows:
```python
class VLLMPromptPipeline(PromptPipeline):
def __init__(self, prompts: List[str], max_prompt_length: int, tokenizer=None):
@@ -108,7 +108,7 @@ class VLLMPolicyInference(VLLMModule):
return self._forward_step(data, iteration, eval_mode=False)
```
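
The body of `VLLMPromptPipeline` is collapsed in this diff; the following is a rough, self-contained sketch of the logic it implements, assuming a Hugging Face-style `tokenizer.encode` and a plain `torch.utils.data.Dataset` base class. The authoritative version is the linked prompt_dataset.py.

```python
# Rough sketch of the collapsed VLLMPromptPipeline above; the Dataset base
# class and attribute names are assumptions, see prompt_dataset.py for the
# real implementation.
from typing import List

from torch.utils.data import Dataset


class VLLMPromptPipelineSketch(Dataset):
    def __init__(self, prompts: List[str], max_prompt_length: int, tokenizer=None):
        super().__init__()
        # Tokenize each prompt and keep only (prompt_ids, prompt_str) pairs,
        # truncating prompt_ids to max_prompt_length as build_dataset requires.
        self.prompts = [
            (tokenizer.encode(prompt)[:max_prompt_length], prompt)
            for prompt in prompts
        ]

    def __getitem__(self, index):
        prompt_ids, prompt_str = self.prompts[index]
        return {"input_ids": prompt_ids, "prompt": prompt_str}

    def __len__(self):
        return len(self.prompts)
```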

- decode_internal: Refer to the [examples](../../../examples/megatron/models/vllm_policy_inference.py#L119) for more details. The parameter `batched_outputs` is a List[RequestOutput], in which [RequestOutput](https://github.com/vllm-project/vllm/blob/v0.5.1/vllm/outputs.py#L67) includes the following key attributes:
- decode_internal: Refer to the [examples](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/models/vllm_policy_inference.py#L119) for more details. The parameter `batched_outputs` is a List[RequestOutput], in which [RequestOutput](https://github.com/vllm-project/vllm/blob/v0.5.1/vllm/outputs.py#L67) includes the following key attributes:

| Attribute | Type | Comment |
|:------:|:-----:|:-----:|
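
The attribute rows of this table are collapsed in the diff; as orientation, a hedged sketch of a decode_internal-style pass over `List[RequestOutput]` follows. The returned dictionary keys are illustrative assumptions, while the `RequestOutput`/`CompletionOutput` fields used come from vLLM v0.5.1.

```python
# Hedged sketch of iterating over List[RequestOutput]; the output dict keys
# are illustrative, not ChatLearn's exact decode_internal contract.
from typing import Dict, List

from vllm.outputs import RequestOutput


def decode_internal_sketch(batched_outputs: List[RequestOutput]) -> Dict[str, list]:
    prompt_token_ids, response_token_ids, response_texts = [], [], []
    for request_output in batched_outputs:
        # Prompt side: token ids the request was generated from.
        prompt_token_ids.append(list(request_output.prompt_token_ids))
        # Generation side: take the first completion produced for the request.
        completion = request_output.outputs[0]
        response_token_ids.append(list(completion.token_ids))
        response_texts.append(completion.text)
    return {
        "prompt_token_ids": prompt_token_ids,
        "response_token_ids": response_token_ids,
        "response_texts": response_texts,
    }
```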
@@ -140,7 +140,7 @@ policy:
...
```

Or you can refer to [llama2 model yaml](../../../examples/megatron/configs/llama2/vllm_rlhf.yaml).
Or you can refer to [llama2 model yaml](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/configs/llama2/vllm_rlhf.yaml).

## Hyperparameter Configuration YAML

@@ -186,4 +186,4 @@ Hyperparameters for vLLM can be divided into 5 parts:
- Others: `includes` specifies model structure.


You can refer to [vLLM Hyperparameter Configuration](../../../examples/megatron/configs/llama2/vllm_policy_inference.yaml) for details.
You can refer to [vLLM Hyperparameter Configuration](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/configs/llama2/vllm_policy_inference.yaml) for details.
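
For intuition, the generation-related entries in that YAML ultimately correspond to vLLM's `SamplingParams`; a small illustrative snippet follows. The values are arbitrary, and the exact mapping is defined by ChatLearn, not here.

```python
# Illustrative only: generation hyperparameters of the kind listed in the YAML
# map onto vLLM SamplingParams fields such as these (values are arbitrary).
from vllm import SamplingParams

sampling_params = SamplingParams(
    n=1,              # completions returned per prompt
    temperature=0.8,  # sampling temperature
    top_p=0.95,       # nucleus sampling threshold
    top_k=-1,         # -1 disables top-k filtering
    max_tokens=512,   # maximum number of new tokens to generate
)
print(sampling_params)
```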
2 changes: 1 addition & 1 deletion docs/en/tutorial/ems.md
@@ -26,4 +26,4 @@ Alternatively, it can also be configured in the training script using environment variables:
- PPO policy model: `export free_memory_ppo_policy=True`
- PPO value model: `export free_memory_ppo_value=True`

A complete example can be found in the [llama2 configuration](../../../examples/megatron/configs/llama2/rlhf.yaml).
A complete example can be found in the [llama2 configuration](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/configs/llama2/rlhf.yaml).
12 changes: 6 additions & 6 deletions docs/zh/programming/vllm.md
Expand Up @@ -6,7 +6,7 @@ ChatLearn中支持vLLM进行跨机分布式推理,支持vllm和training backen

## Model Definition

Similar to inheriting `MegatronModule` to implement the [PolicyInference model](../../../examples/megatron/models/old_policy_inference.py), a PolicyInference model that performs generation on the vLLM backend needs to inherit the `VLLMModule` parent class and implement the following key modules:
Similar to inheriting `MegatronModule` to implement the [PolicyInference model](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/models/old_policy_inference.py), a PolicyInference model that performs generation on the vLLM backend needs to inherit the `VLLMModule` parent class and implement the following key modules:
- model_provider: the model definition function.
- setup: call model_provider to define the model; optionally load_checkpoint and so on, as needed.
- build_dataset: process the data with the vLLM tokenizer and build the prompt dataset.
@@ -48,9 +48,9 @@ class VLLMPolicyInference(VLLMModule):
pass
```

For an example, refer to [vllm_policy_inference.py](../../../examples/megatron/models/vllm_policy_inference.py); build_dataset, _add_request, forward_step, and decode_internal are explained further below:
For an example, refer to [vllm_policy_inference.py](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/models/vllm_policy_inference.py); build_dataset, _add_request, forward_step, and decode_internal are explained further below:

- build_dataset: processing with the tokenizer only needs to return prompt_ids and the prompt str; the [VLLMPromptPipeline](../../../examples/megatron/data/prompt_dataset.py#141) used in build_dataset works as follows:
- build_dataset: processing with the tokenizer only needs to return prompt_ids and the prompt str; the [VLLMPromptPipeline](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/data/prompt_dataset.py#141) used in build_dataset works as follows:
```python
class VLLMPromptPipeline(PromptPipeline):
def __init__(self, prompts: List[str], max_prompt_length: int, tokenizer=None):
@@ -108,7 +108,7 @@ class VLLMPolicyInference(VLLMModule):
return self._forward_step(data, iteration, eval_mode=False)
```

- decode_internal: refer to the [examples](../../../examples/megatron/models/vllm_policy_inference.py#L119) for an implementation. The parameter batched_outputs has the format List[RequestOutput], in which [RequestOutput](https://github.com/vllm-project/vllm/blob/v0.5.1/vllm/outputs.py#L67) contains the following important attributes:
- decode_internal: refer to the [examples](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/models/vllm_policy_inference.py#L119) for an implementation. The parameter batched_outputs has the format List[RequestOutput], in which [RequestOutput](https://github.com/vllm-project/vllm/blob/v0.5.1/vllm/outputs.py#L67) contains the following important attributes:

| Attribute | Type | Comment |
|:------:|:-----:|:-----:|
@@ -138,7 +138,7 @@ policy:
model_config_file: vllm_policy_inference.yaml
...
```
You can also refer to the example [llama2 model configuration](../../../examples/megatron/configs/llama2/vllm_rlhf.yaml).
You can also refer to the example [llama2 model configuration](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/configs/llama2/vllm_rlhf.yaml).

## Hyperparameter Configuration

@@ -182,4 +182,4 @@ vLLM hyperparameters can be divided into five parts:
- tokenizer: the directory from which the vLLM tokenizer is loaded; see [LLama2-7B-hf](https://huggingface.co/meta-llama/Llama-2-7b) for reference (a quick check snippet follows this list).
- Others: `includes` specifies the model structure and the remaining parameters.
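
As a quick, optional sanity check that the configured tokenizer directory is a Hugging Face-format tokenizer (which is what vLLM loads), here is an illustrative snippet. The repo id below is only an example; a gated model may require authentication or a local path.

```python
# Illustrative check of the tokenizer directory; the repo id is an example
# (a gated model may require authentication or a local download path).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
print(tokenizer("hello chatlearn")["input_ids"])
```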

You can refer to the [vLLM hyperparameter configuration](../../../examples/megatron/configs/llama2/vllm_policy_inference.yaml) for details.
You can refer to the [vLLM hyperparameter configuration](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/configs/llama2/vllm_policy_inference.yaml) for details.
2 changes: 1 addition & 1 deletion docs/zh/tutorial/ems.md
@@ -29,4 +29,4 @@ policy:
- ppo_policy model: `export free_memory_ppo_policy=True`
- ppo_value model: `export free_memory_ppo_value=True`

A complete example can be found in the [llama2 configuration](../../../examples/megatron/configs/llama2/rlhf.yaml).
A complete example can be found in the [llama2 configuration](https://github.com/alibaba/ChatLearn/blob/main/examples/megatron/configs/llama2/rlhf.yaml).
