Fix model_input_names singleton issue causing shared state #42051

yashwantbezawada · 2025-11-06T00:29:13Z

What does this PR do?

Fixes #42024

I found this while testing tokenizers: modifying model_input_names on one tokenizer instance also changed it on all other instances of the same tokenizer class.

The problem is that model_input_names is defined as a class-level list, and in the __init__ method (line 1417), when no custom model_input_names is provided, it was just referencing the class attribute directly instead of making a copy.

So all instances were sharing the same list object. This is a classic Python gotcha with mutable class attributes.
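To make the shared-list behavior concrete, here is a minimal sketch using a hypothetical `ToyTokenizer` class. It is not the real transformers code. The class name, the default values, and the `"extra_field"` entry are assumptions for illustration; only the `kwargs.pop` line mirrors the pre-fix pattern quoted in this PR.

```python
class ToyTokenizer:
    # Hypothetical stand-in: a class-level mutable default, as described above.
    model_input_names = ["input_ids", "attention_mask"]

    def __init__(self, **kwargs):
        # Pre-fix pattern: when no override is passed, the fallback is the
        # class attribute itself, so no new list is created.
        self.model_input_names = kwargs.pop("model_input_names", self.model_input_names)


first = ToyTokenizer()
second = ToyTokenizer()

# Both instance attributes and the class attribute are one and the same object.
shared = first.model_input_names is second.model_input_names is ToyTokenizer.model_input_names

# An in-place change made through one instance is visible through the other.
first.model_input_names.append("extra_field")
leaked = "extra_field" in second.model_input_names
```

Both `shared` and `leaked` end up `True` here, which matches the symptom reported in the issue: the second instance sees a modification it never made.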

The fix is simple - wrap it in list() to create a new list for each instance:

Before:

self.model_input_names = kwargs.pop("model_input_names", self.model_input_names)

After:

self.model_input_names = list(kwargs.pop("model_input_names", self.model_input_names))

This ensures each tokenizer instance gets its own independent copy of the list. Now modifications to one instance won't affect others.

The reproduction from the issue shows the problem clearly - with the fix, the second tokenizer instance will have the original list values instead of inheriting the modifications from the first instance.
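As a rough illustration of the behavior after the change, the sketch below applies the same `list(...)` wrapping to a hypothetical `FixedToyTokenizer` class. The class, its default values, and the `"extra_field"` entry are assumptions and are not taken from the issue's reproduction script.

```python
class FixedToyTokenizer:
    # Hypothetical stand-in with the same class-level default list.
    model_input_names = ["input_ids", "attention_mask"]

    def __init__(self, **kwargs):
        # Post-fix pattern: list() builds a new list for every instance,
        # whether the value came from kwargs or from the class attribute.
        self.model_input_names = list(kwargs.pop("model_input_names", self.model_input_names))


first = FixedToyTokenizer()
second = FixedToyTokenizer()

# Mutating one instance no longer touches the other or the class default.
first.model_input_names.append("extra_field")
second_unchanged = second.model_input_names == ["input_ids", "attention_mask"]
class_default_unchanged = FixedToyTokenizer.model_input_names == ["input_ids", "attention_mask"]

# A list passed in by the caller is copied as well, so it stays untouched.
custom = ["input_ids"]
third = FixedToyTokenizer(model_input_names=custom)
third.model_input_names.append("extra_field")
caller_list_unchanged = custom == ["input_ids"]
```

Note that `list()` makes a shallow copy. That is sufficient in this sketch because the elements are immutable strings.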

Fixes huggingface#42024

The model_input_names attribute was defined as a class-level list, and when initializing tokenizer instances, they were all pointing to the same list object. This meant modifying model_input_names on one instance would affect all other instances.

The issue was in tokenization_utils_base.py line 1417:

```python
self.model_input_names = kwargs.pop("model_input_names", self.model_input_names)
```

When no model_input_names is passed in kwargs, it would use the class attribute directly (self.model_input_names), creating a reference to the shared list instead of creating a new list for the instance.

Fixed by wrapping it in list() to ensure each instance gets its own copy:

```python
self.model_input_names = list(kwargs.pop("model_input_names", self.model_input_names))
```

This is a standard pattern for handling mutable default values in Python.

Rocketknight1 · 2025-11-06T12:49:27Z

This solution LGTM but cc @ArthurZucker @itazap!

yashwantbezawada force-pushed the fix/model-input-names-singleton-42024 branch from 4d018bf to 6ba1ffb · November 6, 2025 02:17
