Minicpm enabling #1342
base: main
Conversation
Performance benchmark: `python run_generation.py --model_name_or_path openbmb/MiniCPM3-4B --max_new_tokens 512 --prompt "推荐5个北京的景点。" --batch_size 1 --temperature 0.7 --do_sample --top_p 0.7 --use_kv_cache --use_chat_template --use_hpu_graphs [--bf16]`
# See the License for the specific language governing permissions and
# limitations under the License.
"""
MiniCPM model configuration. Copied from https://huggingface.co/openbmb/MiniCPM3-4B/tree/6fcf8b4e629d01a435b96e898899e0b6d9bddb7a
Can you list the changes in each section?
The files marked "Copied from..." have no changes, while the files marked "Adapted from..." do. In modeling_minicpm.py, I added a docstring to each method I changed describing the modification. Is this sufficient, or should I also add a general summary to the file-level docstring?
@pi314ever, you need to add jsonschema and datamodel_code_generator to the requirements, or remove the custom tokenizer, since the model owner has already switched back to the Llama tokenizer, as shown here: https://huggingface.co/openbmb/MiniCPM3-4B/commit/e8a65f63cd4e4eff91571e603a2a34e50628ff67#d2h-846292
Signed-off-by: Daniel Huang <[email protected]>
@yao-matrix I have removed the tokenizer and validated that the model runs with the default requirements.txt. Basic testing shows no performance differences.
What does this PR do?
Enables the MiniCPM3 model for Causal LM. Follows #1133 in optimizing remote code. The following changes were added:
- `htcore.mark_step()` between each decoder layer to reduce graph size
- `token_idx` support passed through to the Attention implementations (currently incomplete; missing reset of cache)

Before submitting
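As context for the `htcore.mark_step()` change, here is a hypothetical sketch of inserting a graph boundary after each decoder layer so each layer compiles as a smaller HPU graph. The function name `run_decoder_layers` and the import fallback are illustrative, not the actual optimum-habana code:

```python
try:
    # Available only on Intel Gaudi (HPU) machines
    import habana_frameworks.torch.core as htcore
except ImportError:
    class _HTCoreStub:
        """Fallback so the sketch also runs off-device; mark_step is a no-op."""
        @staticmethod
        def mark_step():
            pass
    htcore = _HTCoreStub()


def run_decoder_layers(hidden_states, decoder_layers, token_idx=None):
    """Run each decoder layer, breaking the HPU graph between layers."""
    for layer in decoder_layers:
        # token_idx is passed through so the layer's attention can index
        # into the static KV cache at the current decode position
        hidden_states = layer(hidden_states, token_idx=token_idx)
        # Mark a graph boundary: subsequent ops start a new, smaller graph
        htcore.mark_step()
    return hidden_states
```

On non-HPU hardware `mark_step()` does nothing, so the forward pass is unchanged; on Gaudi it limits how much work is accumulated into a single lazy graph.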