Minicpm enabling #1342

pi314ever · 2024-09-19T01:04:31Z

What does this PR do?

Enables MiniCPM3 model for Causal LM. Follows #1133 in optimizing remote code. The following changes were added:

Add htcore.mark_step() between decoder layers to reduce graph size.
Pass token_idx through to the Attention implementations. (Currently incomplete: the cache reset is missing.)
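For readers unfamiliar with the pattern, here is a minimal sketch of the two changes listed above. It is not the code in this PR: `run_decoder_layers` and its arguments are hypothetical names, the layers are plain callables rather than real decoder modules, and a no-op stands in for `htcore.mark_step()` when the Habana bridge is not installed.

```python
try:
    # Available on Gaudi machines; in lazy mode mark_step() marks a graph
    # boundary so the ops queued so far are compiled and run as one graph.
    import habana_frameworks.torch.core as htcore

    mark_step = htcore.mark_step
except ImportError:

    def mark_step():
        """No-op fallback so the sketch also runs without Gaudi software."""


def run_decoder_layers(hidden, layers, token_idx=None, step=mark_step):
    """Hypothetical helper: apply each layer, then cut the graph.

    `layers` is any iterable of callables accepting (hidden, token_idx=...).
    `token_idx` is handed to every layer unchanged, which is all that
    "passthrough" means here; what a layer does with it is not modeled.
    """
    for layer in layers:
        hidden = layer(hidden, token_idx=token_idx)
        step()  # one graph per decoder layer instead of one large graph
    return hidden
```

Calling `step()` inside the loop trades a few extra graph launches for smaller graphs, which is the stated goal of the first change.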

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

pi314ever · 2024-09-20T21:23:17Z

Performance benchmark
Command (the prompt translates to "Recommend 5 attractions in Beijing."):

python run_generation.py --model_name_or_path openbmb/MiniCPM3-4B --max_new_tokens 512 --prompt "推荐5个北京的景点。" --batch_size 1 --temperature 0.7 --do_sample --top_p 0.7 --use_kv_cache --use_chat_template --use_hpu_graphs [--bf16]
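The bracketed `[--bf16]` reads as an optional flag. As a sketch, assuming it was passed for the BF16 run and omitted for the FP32 run, the two invocations can be generated like this (`build_command` is a hypothetical helper; every flag is copied from the command above):

```python
import shlex
from typing import List

# Arguments shared by both runs, copied from the benchmark command.
BASE_ARGS = [
    "python", "run_generation.py",
    "--model_name_or_path", "openbmb/MiniCPM3-4B",
    "--max_new_tokens", "512",
    "--prompt", "推荐5个北京的景点。",
    "--batch_size", "1",
    "--temperature", "0.7",
    "--do_sample",
    "--top_p", "0.7",
    "--use_kv_cache",
    "--use_chat_template",
    "--use_hpu_graphs",
]


def build_command(bf16: bool) -> List[str]:
    """Return the argv list, appending --bf16 only when requested."""
    return BASE_ARGS + (["--bf16"] if bf16 else [])


if __name__ == "__main__":
    for bf16 in (True, False):
        # shlex.join quotes the prompt so the line can be pasted into a shell.
        print(shlex.join(build_command(bf16)))
```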

Precision	A100 (tokens/s)	Gaudi 2, graph mode (tokens/s)
BF16	20.09	65.16
FP32	23.89	44.46
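To make the comparison easier to read, the ratio of Gaudi 2 to A100 throughput can be computed directly from the table. This is plain arithmetic on the reported numbers; the dictionary layout and the `speedup` helper are only for illustration.

```python
# Throughput in tokens/s, copied from the table above.
throughput = {
    "BF16": {"A100": 20.09, "Gaudi 2": 65.16},
    "FP32": {"A100": 23.89, "Gaudi 2": 44.46},
}


def speedup(row):
    """Gaudi 2 throughput divided by A100 throughput."""
    return row["Gaudi 2"] / row["A100"]


if __name__ == "__main__":
    for precision, row in throughput.items():
        print(f"{precision}: {speedup(row):.2f}x")
    # prints:
    # BF16: 3.24x
    # FP32: 1.86x
```

Note that the A100 column reports higher throughput in FP32 than in BF16; the table gives no explanation for this.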

libinta · 2024-11-12T05:34:03Z

optimum/habana/transformers/models/minicpm/configuration_minicpm.py

+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+MiniCPM model configuration. Copied from https://huggingface.co/openbmb/MiniCPM3-4B/tree/6fcf8b4e629d01a435b96e898899e0b6d9bddb7a


Can you list what changed in each section?

Files marked "Copied from..." are unchanged, while files marked "Adapted from..." contain changes. In modeling_minicpm.py, each method I changed has a docstring describing the modification. Is this sufficient, or should I also add a general summary to the file docstring?

yao-matrix · 2024-11-15T05:09:23Z

@pi314ever, you need to add jsonschema and datamodel_code_generator to the requirements, or remove the custom tokenizer, since the model owner has already switched back to the Llama tokenizer; see https://huggingface.co/openbmb/MiniCPM3-4B/commit/e8a65f63cd4e4eff91571e603a2a34e50628ff67#d2h-846292
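A quick way to see whether an environment would hit this problem is to check that both modules are importable. The sketch below assumes the import names match the names given in the comment above; `missing_modules` is a hypothetical helper, not part of the PR.

```python
import importlib.util
from typing import Iterable, List

# Modules the custom tokenizer is said to need (names taken from the comment).
TOKENIZER_DEPS = ("jsonschema", "datamodel_code_generator")


def missing_modules(names: Iterable[str] = TOKENIZER_DEPS) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing tokenizer dependencies:", ", ".join(missing))
    else:
        print("All tokenizer dependencies are importable.")
```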

pi314ever · 2024-11-15T19:25:02Z

@yao-matrix I have removed the tokenizer and validated that the model runs with the default requirements.txt. Basic testing shows no performance difference.

pi314ever marked this pull request as ready for review September 20, 2024 21:23

pi314ever requested review from ssarkar2, bhargaveede, vivekgoe and regisss as code owners September 20, 2024 21:23

pi314ever force-pushed the minicpm-enabling branch from b48031e to 0a7133e Compare September 25, 2024 00:24

pi314ever force-pushed the minicpm-enabling branch from 0a7133e to 7092e1c Compare October 9, 2024 23:13

libinta reviewed Nov 12, 2024

pi314ever force-pushed the minicpm-enabling branch from 7092e1c to af00294 Compare November 12, 2024 19:48

pi314ever added 10 commits November 12, 2024 13:15

2464cb5 Add model file snapshot
382c990 Register minicpm3 model type
e9ad881 Enable Gaudi and example
a5f2328 Refactor and comment
bc54495 Incomplete token_idx support
f22a4fd Make style
753e2b1 Completed token_idx support
756db84 Added chat template
d649026 Add CI tests
224f940 Make style

pi314ever force-pushed the minicpm-enabling branch from af00294 to 224f940 Compare November 12, 2024 21:16

Remove tokenizer

2044699

Signed-off-by: Daniel Huang <[email protected]>
