GLM-V update with new processor #42122
Merged
+3,928
−226
0428a86
init
zRzRzRzRzRzRzR 621de8d
update
zRzRzRzRzRzRzR bf860ed
add
zRzRzRzRzRzRzR 480caa4
Update video_processing_glm46v.py
zRzRzRzRzRzRzR 9a64f65
Merge branch 'huggingface:main' into glm-v
zRzRzRzRzRzRzR fd7a3f0
update doc
zRzRzRzRzRzRzR c137191
Merge branch 'glm-v' of github.com:zRzRzRzRzRzRzR/transformers into g…
zRzRzRzRzRzRzR 0e1c22e
Update modular_glm46v.py
zRzRzRzRzRzRzR 42d757f
2
zRzRzRzRzRzRzR 28bb922
Merge branch 'huggingface:main' into glm-v
zRzRzRzRzRzRzR db33336
Merge branch 'huggingface:main' into glm-v
zRzRzRzRzRzRzR 6f5aa1a
Update processing_glm46v.py
zRzRzRzRzRzRzR a43c606
21
zRzRzRzRzRzRzR 3e994f5
Merge branch 'huggingface:main' into glm-v
zRzRzRzRzRzRzR a242652
Update check_repo.py
zRzRzRzRzRzRzR 5225f53
Merge branch 'glm-v' of github.com:zRzRzRzRzRzRzR/transformers into g…
zRzRzRzRzRzRzR ce596eb
Update check_repo.py
zRzRzRzRzRzRzR 559fcf8
Update test_processor_glm46v.py
zRzRzRzRzRzRzR 513c2cc
Update modeling_auto.py
zRzRzRzRzRzRzR d6e966e
update
zRzRzRzRzRzRzR 275ebfe
Update glm46v.md
zRzRzRzRzRzRzR 6991564
Update configuration_auto.py
zRzRzRzRzRzRzR 3dff216
2
zRzRzRzRzRzRzR f9546bd
update with glm46v import
zRzRzRzRzRzRzR 58bed84
Merge branch 'huggingface:main' into glm-v
zRzRzRzRzRzRzR a5d95fc
Merge branch 'huggingface:main' into glm-v
zRzRzRzRzRzRzR 6e5ae03
uppercase
zRzRzRzRzRzRzR 1e2535d
upload
zRzRzRzRzRzRzR 8ae004f
upload
zRzRzRzRzRzRzR c1425c3
upload with modular
zRzRzRzRzRzRzR 9e184af
1
zRzRzRzRzRzRzR 0bd1a50
-
zRzRzRzRzRzRzR 1f74fa7
update
zRzRzRzRzRzRzR b9c8484
1
zRzRzRzRzRzRzR 7c92ad7
2
zRzRzRzRzRzRzR 5552ff2
1
zRzRzRzRzRzRzR f7bfc34
2
zRzRzRzRzRzRzR 0376e22
2
zRzRzRzRzRzRzR d03061d
1
zRzRzRzRzRzRzR c2033b4
update config
zRzRzRzRzRzRzR 79eef09
1
zRzRzRzRzRzRzR 514dec8
update as automoel
zRzRzRzRzRzRzR 3ceff09
1
zRzRzRzRzRzRzR 57b1b34
Merge branch 'huggingface:main' into glm-v
zRzRzRzRzRzRzR 63dafb1
try remove
zRzRzRzRzRzRzR 900c335
delete
zRzRzRzRzRzRzR 1c8905d
delete
zRzRzRzRzRzRzR 9c2d854
test
zRzRzRzRzRzRzR beedf50
update
zRzRzRzRzRzRzR efe6495
1
zRzRzRzRzRzRzR c811264
Update modular_glm46v.py
zRzRzRzRzRzRzR dd4dc1f
Update test_modeling_glm46v.py
zRzRzRzRzRzRzR b79487d
update 1513
zRzRzRzRzRzRzR e5b4a6d
1
zRzRzRzRzRzRzR 325216e
use PreTrainedConfig
zRzRzRzRzRzRzR b08437d
Update modular_glm46v.py
zRzRzRzRzRzRzR 87f1887
Update configuration_glm46v.py
zRzRzRzRzRzRzR ed88670
model_type = "glm46v"
zRzRzRzRzRzRzR 25b36eb
remove glm46v_text
zRzRzRzRzRzRzR 733d27b
Update image_processing_auto.py
zRzRzRzRzRzRzR ee29a2b
1
zRzRzRzRzRzRzR 6dac0c4
update readme
zRzRzRzRzRzRzR 3910b8a
GLM-4.6V
zRzRzRzRzRzRzR 7c160f0
update
zRzRzRzRzRzRzR b0654d9
update
zRzRzRzRzRzRzR b94cc13
Update __init__.py
zRzRzRzRzRzRzR ca310ee
update
zRzRzRzRzRzRzR 92fa57b
update doc
zRzRzRzRzRzRzR c9f260c
Update check_docstrings.py
zRzRzRzRzRzRzR 5ca3144
update doc
zRzRzRzRzRzRzR cdd9040
Merge branch 'main' into glm-v
zucchini-nlp 8b61f6b
Merge branch 'main' into glm-v
ArthurZucker 6d325c3
fix copies for tied weight keys!
ArthurZucker fef665e
more fixup
ArthurZucker ee546a1
Merge branch 'main' into glm-v
ArthurZucker e23d221
Merge branch 'main' into glm-v
ArthurZucker 0170755
fix copies?
ArthurZucker 54a2196
more fix copies
ArthurZucker 0fd529a
Up
ArthurZucker
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| # GLM-4.6V | ||
|
|
||
| ## Glm46VConfig | ||
|
|
||
| [[autodoc]] Glm46VConfig | ||
|
|
||
| ## Glm46VImageProcessor | ||
|
|
||
| [[autodoc]] Glm46VImageProcessor | ||
| - preprocess | ||
|
|
||
| ## Glm46VVideoProcessor | ||
|
|
||
| [[autodoc]] Glm46VVideoProcessor | ||
| - preprocess | ||
|
|
||
| ## Glm46VImageProcessorFast | ||
|
|
||
| [[autodoc]] Glm46VImageProcessorFast | ||
| - preprocess | ||
|
|
||
| ## Glm46VProcessor | ||
|
|
||
| [[autodoc]] Glm46VProcessor | ||
|
|
||
| ## Glm46VModel | ||
|
|
||
| [[autodoc]] Glm46VModel | ||
| - forward | ||
|
|
||
| ## Glm46VForConditionalGeneration | ||
|
|
||
| [[autodoc]] Glm46VForConditionalGeneration | ||
| - forward | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,31 @@ | ||
| # Copyright 2025 The HuggingFace Team. All rights reserved. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
| from typing import TYPE_CHECKING | ||
|
|
||
| from ...utils import _LazyModule | ||
| from ...utils.import_utils import define_import_structure | ||
|
|
||
|
|
||
| if TYPE_CHECKING: | ||
| from .configuration_glm46v import * | ||
| from .image_processing_glm46v import * | ||
| from .image_processing_glm46v_fast import * | ||
| from .modeling_glm46v import * | ||
| from .processing_glm46v import * | ||
|
||
| from .video_processing_glm46v import * | ||
| else: | ||
| import sys | ||
|
|
||
| _file = globals()["__file__"] | ||
| sys.modules[__name__] = _LazyModule(__name__, _file, define_import_structure(_file), module_spec=__spec__) | ||
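The `else` branch above replaces the package module with a lazy proxy, so submodules such as `modeling_glm46v` are only imported when one of their names is first accessed. The following is a minimal standard-library sketch of that idea, not the actual `_LazyModule` implementation; the `LazyModule` class, the `demo_lazy` name, and the attribute-to-module mapping are hypothetical and exist only for illustration.

```python
import importlib
import types


class LazyModule(types.ModuleType):
    """Module-like object that imports the defining module on first attribute access."""

    def __init__(self, name, attr_to_module):
        super().__init__(name)
        # Maps each public attribute name to the module that defines it (hypothetical layout).
        self._attr_to_module = dict(attr_to_module)

    def __getattr__(self, attr):
        # Only reached when normal lookup fails, i.e. before the attribute is cached.
        try:
            module_name = self._attr_to_module[attr]
        except KeyError:
            raise AttributeError(f"module {self.__name__!r} has no attribute {attr!r}") from None
        value = getattr(importlib.import_module(module_name), attr)
        setattr(self, attr, value)  # cache so later lookups skip __getattr__
        return value


# Stand-in mapping: stdlib modules play the role of the heavy submodules.
lazy = LazyModule("demo_lazy", {"dumps": "json", "sqrt": "math"})
```

Accessing `lazy.dumps` triggers the import of `json` and caches the function on the proxy; names that are not in the mapping raise `AttributeError` as they would on a regular module.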
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,106 @@ | ||
| # 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨 | ||
| # This file was automatically generated from src/transformers/models/glm46v/modular_glm46v.py. | ||
| # Do NOT edit this file manually as any edits will be overwritten by the generation of | ||
| # the file from the modular. If any change should be done, please apply the change to the | ||
| # modular_glm46v.py file directly. One of our CI enforces this. | ||
| # 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨 | ||
| # coding=utf-8 | ||
| # Copyright 2025 the HuggingFace Team. All rights reserved. | ||
| # | ||
| # Licensed under the Apache License, Version 2.0 (the "License"); | ||
| # you may not use this file except in compliance with the License. | ||
| # You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, software | ||
| # distributed under the License is distributed on an "AS IS" BASIS, | ||
| # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
|
|
||
| from ...configuration_utils import PreTrainedConfig | ||
| from ..auto import CONFIG_MAPPING, AutoConfig | ||
|
|
||
|
|
||
| class Glm46VConfig(PreTrainedConfig): | ||
| r""" | ||
| This is the configuration class to store the configuration of a [`Glm46VModel`]. It is used to instantiate a | ||
| GLM-4.6V model according to the specified arguments, defining the model architecture. Instantiating a | ||
| configuration with the defaults will yield a similar configuration to that of | ||
| GLM-4.1V-9B-Thinking [zai-org/GLM-4.1V-9B-Thinking](https://huggingface.co/zai-org/GLM-4.1V-9B-Thinking). | ||
|
|
||
| Configuration objects inherit from [`PreTrainedConfig`] and can be used to control the model outputs. Read the | ||
| documentation from [`PreTrainedConfig`] for more information. | ||
|
|
||
| Args: | ||
| text_config (`Union[PreTrainedConfig, dict]`, *optional*, defaults to `Glm4vTextConfig`): | ||
| The config object or dictionary of the text backbone. | ||
| vision_config (`Union[PreTrainedConfig, dict]`, *optional*, defaults to `Glm4vVisionConfig`): | ||
| The config object or dictionary of the vision backbone. | ||
| image_token_id (`int`, *optional*, defaults to 151343): | ||
| The image token index to encode the image prompt. | ||
| video_token_id (`int`, *optional*, defaults to 151344): | ||
| The video token index to encode the video prompt. | ||
| image_start_token_id (`int`, *optional*, defaults to 151339): | ||
| The image start token index to encode the start of image. | ||
| image_end_token_id (`int`, *optional*, defaults to 151340): | ||
| The image end token index to encode the end of image. | ||
| video_start_token_id (`int`, *optional*, defaults to 151361): | ||
| The video start token index to encode the start of video. | ||
| video_end_token_id (`int`, *optional*, defaults to 151362): | ||
| The video end token index to encode the end of video. | ||
|
|
||
| ```python | ||
| >>> from transformers import Glm46VForConditionalGeneration, Glm46VConfig | ||
|
|
||
| >>> # Initializing a GLM-4.6V style configuration | ||
| >>> configuration = Glm46VConfig() | ||
|
|
||
| >>> # Initializing a model from the GLM-4.6V style configuration | ||
| >>> model = Glm46VForConditionalGeneration(configuration) | ||
|
|
||
| >>> # Accessing the model configuration | ||
| >>> configuration = model.config | ||
| ```""" | ||
|
|
||
| model_type = "glm46v" | ||
| sub_configs = {"text_config": AutoConfig, "vision_config": AutoConfig} | ||
| keys_to_ignore_at_inference = ["past_key_values"] | ||
|
|
||
| def __init__( | ||
| self, | ||
| text_config=None, | ||
| vision_config=None, | ||
| image_token_id=151343, | ||
| video_token_id=151344, | ||
| image_start_token_id=151339, | ||
| image_end_token_id=151340, | ||
| video_start_token_id=151361, | ||
| video_end_token_id=151362, | ||
| **kwargs, | ||
| ): | ||
| if isinstance(vision_config, dict): | ||
| vision_config["model_type"] = vision_config.get("model_type", "glm4v_vision") | ||
| self.vision_config = CONFIG_MAPPING[vision_config["model_type"]](**vision_config) | ||
| elif vision_config is None: | ||
| self.vision_config = CONFIG_MAPPING["glm4v_vision"]() | ||
|
|
||
| if isinstance(text_config, dict): | ||
| text_config["model_type"] = text_config.get("model_type", "glm4v_text") | ||
| self.text_config = CONFIG_MAPPING[text_config["model_type"]](**text_config) | ||
| elif text_config is None: | ||
| self.text_config = CONFIG_MAPPING["glm4v_text"]() | ||
|
|
||
| self.image_token_id = image_token_id | ||
| self.video_token_id = video_token_id | ||
| self.video_start_token_id = video_start_token_id | ||
| self.video_end_token_id = video_end_token_id | ||
| self.image_start_token_id = image_start_token_id | ||
| self.image_end_token_id = image_end_token_id | ||
|
|
||
| super().__init__(**kwargs) | ||
|
|
||
|
|
||
| __all__ = ["Glm46VConfig"] |
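The `__init__` above resolves each sub-config from either a dict (dispatching on its `model_type` key) or `None` (falling back to a default type). Below is a minimal sketch of that dispatch pattern in plain Python, without `transformers`. `TextConfig`, `VisionConfig`, `DEMO_CONFIG_MAPPING`, `resolve_sub_config`, and `CompositeConfig` are hypothetical stand-ins, and passing an already-built config object through unchanged is an assumption that goes beyond what the diff shows.

```python
from dataclasses import dataclass


@dataclass
class TextConfig:
    # Hypothetical stand-in for the text backbone config.
    hidden_size: int = 4096
    model_type: str = "demo_text"


@dataclass
class VisionConfig:
    # Hypothetical stand-in for the vision backbone config.
    depth: int = 24
    model_type: str = "demo_vision"


# Registry keyed by model_type, playing the role of CONFIG_MAPPING.
DEMO_CONFIG_MAPPING = {"demo_text": TextConfig, "demo_vision": VisionConfig}


def resolve_sub_config(value, default_type):
    """Turn a dict, None, or a ready-made config object into a config object."""
    if value is None:
        return DEMO_CONFIG_MAPPING[default_type]()
    if isinstance(value, dict):
        params = dict(value)  # copy so the caller's dict is not mutated
        model_type = params.setdefault("model_type", default_type)
        return DEMO_CONFIG_MAPPING[model_type](**params)
    return value  # assumption: config objects are passed through as-is


class CompositeConfig:
    def __init__(self, text_config=None, vision_config=None, image_token_id=151343):
        self.text_config = resolve_sub_config(text_config, "demo_text")
        self.vision_config = resolve_sub_config(vision_config, "demo_vision")
        self.image_token_id = image_token_id


cfg = CompositeConfig(vision_config={"depth": 12})
```

One deliberate difference from the diff: the sketch copies the incoming dict before adding `model_type`, whereas the code above writes the default `model_type` into the caller's dict.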
Let's add something to the docs, for example usage examples and a model description. Or do you want to wait until the model is released first?
Yes, a technical report will be added here once the model is ready.