
Feat auto round #1064

Open · wants to merge 16 commits into dev
Conversation

@pablomlago (Contributor) commented Oct 20, 2024

Reason for this PR

Implement AutoRound within Brevitas (see https://github.com/intel/auto-round, https://arxiv.org/pdf/2309.05516)

Changes Made in this PR

Incorporate AutoRound and refactor the learned round methods into a single common interface.

Testing Summary

New tests for the learned round utilities; results replicated from the AutoRound repo.

Risk Highlight

  • This PR includes code from another work (please detail).
  • This PR contains API-breaking changes.
  • This PR depends on work in another PR (please provide links/details).
  • This PR introduces new dependencies (please detail).
  • There are coverage gaps not covered by tests.
  • Documentation updates required in subsequent PR.

Checklist

  • Code comments added to any hard-to-understand areas, if applicable.
  • Changes generate no new warnings.
  • Updated any relevant tests, if applicable.
  • No conflicts with destination dev branch.
  • I reviewed my own code changes.
  • Initial CI/CD passing.
  • 1+ reviews given, and any review issues addressed and approved.
  • Post-review full CI/CD passing.

@pablomlago changed the title from "[DRAFT, DO NOT MERGE] Feat auto round" to "Feat auto round" on Nov 4, 2024
@nickfraser (Collaborator) commented:

@pablomlago, can you switch this to target dev instead of master?

@pablomlago changed the base branch from master to dev on November 4, 2024, 15:59
@Giuseppe5 (Collaborator) left a comment:

Preliminary review; I'll play with the code a bit while you address these comments.

Resolved review threads:
src/brevitas/core/function_wrapper/auto_round.py (outdated)
src/brevitas/optim/sign_sgd.py (5 threads)
loss, rec_loss, round_loss, b)


class AdaRound(LearnedRound):
Collaborator:

Let's rename this, not sure to what

return "loss = {:.4f}".format(loss)


class AutoRound(LearnedRound):
Collaborator:

Same as above, rename to something that makes the difference between this FloatToIntImpl and the one above clearer.


learned_round_llm_utils = LearnedRoundLLMUtils()
learned_round = AutoRound()
learned_round_optimiser = LearnedRoundOptimizer(
Collaborator:

Not a fan of this entrypoint. I'd rather have meaningful strings that are interpreted within the class and mapped to the correct classes.

Contributor Author:

Agreed. I created a builder method. I still don't think it's optimal, and I'd be happy to iterate on it.
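For reference, a minimal sketch of such a string-driven builder, with stand-in classes and an assumed function name (`build_learned_round` is not the PR's actual entrypoint):

```python
from typing import Type


# Stand-in classes for the sketch only; the real AdaRound/AutoRound live in the PR.
class LearnedRound:
    pass


class AdaRound(LearnedRound):
    pass


class AutoRound(LearnedRound):
    pass


_LEARNED_ROUND_METHODS = {
    "ada_round": AdaRound,
    "auto_round": AutoRound,
}


def build_learned_round(method: str) -> LearnedRound:
    # Meaningful strings are resolved internally to the concrete classes,
    # so callers never instantiate AdaRound/AutoRound directly.
    try:
        cls: Type[LearnedRound] = _LEARNED_ROUND_METHODS[method]
    except KeyError:
        raise ValueError(f"Unknown learned round method: {method}")
    return cls()
```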

if args.learned_round:
print("Applying learned round...")

learned_round_llm_utils = LearnedRoundLLMUtils()
Collaborator:

Ideally, we shouldn't need this class; we should try to merge everything with the vision one.
Anything that can't be merged should be user defined.

Contributor Author:

My idea with this design is to decouple the logic of the PTQ algorithm, which is contained in LearnedRoundOptimizer, from anything that is model/dataloader-specific, e.g. capturing inputs/outputs for a block. This needs to be managed by specific utilities, which adhere to the interface LearnedRoundModelUtils (thus following the strategy pattern). By doing so, a potential change in the vision models would not break the LLM entrypoint, and vice versa.
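Roughly, the decoupling described above looks like the following simplified sketch; the actual LearnedRoundModelUtils interface in the PR defines more methods, and `apply_learned_round` is an assumed name:

```python
from abc import ABC, abstractmethod

from torch import nn


# Simplified sketch of the strategy pattern described above; the real interface
# in this PR also covers cache initialization and block input/output capture.
class LearnedRoundModelUtils(ABC):

    @abstractmethod
    def init_model_learned_round(self, model: nn.Module) -> None:
        ...

    @abstractmethod
    def finish_model_learned_round(self, model: nn.Module) -> None:
        ...


class LearnedRoundOptimizer:
    # The PTQ algorithm only depends on the interface, so vision- and
    # LLM-specific capture logic can evolve without touching this class.

    def __init__(self, learned_round_utils: LearnedRoundModelUtils) -> None:
        self.learned_round_utils = learned_round_utils

    def apply_learned_round(self, model: nn.Module) -> None:
        # `apply_learned_round` is a placeholder name for the sketch.
        self.learned_round_utils.init_model_learned_round(model)
        # ... per-block rounding optimization would run here ...
        self.learned_round_utils.finish_model_learned_round(model)
```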

@@ -0,0 +1,310 @@
"""
Collaborator:

Still confused about this license (not sure where my previous comment went)

return self.end_b + (self.start_b - self.end_b) * max(0.0, (1 - rel_t))


class AdaRoundLoss(LearnedRoundLoss):
Collaborator:

Change name to something more meaningful

@@ -52,6 +53,7 @@ def forward(self, x: torch.Tensor) -> torch.Tensor:
return p


# TODO: Change name to AdaRoundSte for consistency
Collaborator:

Nope don't change

Collaborator:

Remove comment

@@ -92,3 +94,36 @@ def _load_from_state_dict(
value_key = prefix + 'value'
if config.IGNORE_MISSING_KEYS and value_key in missing_keys:
missing_keys.remove(value_key)


class AutoRoundSte(brevitas.jit.ScriptModule):
Collaborator:

Is there a way to merge this into the previous class?
In general, it would be nice to have a single learned round class that is general enough to support the different types of learned round

Collaborator:

This should also simplify the rest of the work in the other files

Contributor Author:

Currently we have different float_to_int_impl implementations (round, ceil, floor, ...), each with its corresponding STE (RoundSte, FloorSte, ...). It therefore seems sensible to me to give the same treatment to those float_to_int_impl implementations that involve learned parameters, rather than aggregating them within a single general class, while keeping separate classes for the rounding methods that are not learnable. Moreover, the learned round methods we have right now only have a single parameter tensor, but this need not be the case for future methods, so by trying to aggregate the learned round methods under a common umbrella now, we might make it harder to integrate other methods in the future.
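For illustration, a minimal learned-rounding STE could look like the sketch below; `LearnedOffsetRoundSte` is an assumed name, and the actual AutoRoundSte in the PR is a brevitas.jit.ScriptModule whose parametrization may differ:

```python
import torch
from torch import nn


# Minimal sketch of a learned-rounding straight-through estimator (STE).
class LearnedOffsetRoundSte(nn.Module):

    def __init__(self, shape) -> None:
        super().__init__()
        # One learned offset per element, kept within [-0.5, 0.5] at forward time.
        self.value = nn.Parameter(torch.zeros(shape))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + torch.clamp(self.value, -0.5, 0.5)
        # Straight-through estimator: round() in the forward pass,
        # identity gradient in the backward pass.
        return x + (torch.round(x) - x).detach()
```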



class DataSaverHook:
class LearnedRoundVisionUtils(LearnedRoundModelUtils):
Collaborator:

As said somewhere else, it'd be nice to have a single class to handle both CNN/LLM or anything else really


self.llm_cache_state = model.config.use_cache
model.config.use_cache = False

def finish_model_learned_round(self, model: nn.Module) -> None:
Collaborator:

This would be part of some function implemented in the LLM utils file, not of the PTQ algorithm.

Collaborator:

This method is not needed in that case

disable_quant_class.enable_param_quantization(model, False)
restore_return_quant_tensor(model, return_quant_tensor_state)

def init_model_learned_round(self, model: nn.Module) -> None:
Collaborator:

Same as below

model.config.use_cache = self.llm_cache_state
self.llm_cache_state = None

def init_cache(self) -> Any:
Collaborator:

Same as above


return (args, kwargs), outs

def run_forward(
Collaborator:

This shouldn't be here. Rather, the user should be able to specify whatever interface they want for the input to the model.
We should provide the interface to accept a function with a certain signature, and the user decides what happens inside that function.

Possible signature

def model_forward(model, model_args, model_kwargs):

Contributor Author:

This signature is defined in LearnedRoundModelUtils.
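For illustration, a user-supplied callable matching that signature could be as simple as the following hypothetical example (not code from the PR):

```python
import torch
from torch import nn


# Hypothetical user-defined forward function matching the
# model_forward(model, model_args, model_kwargs) signature discussed above.
def model_forward(model: nn.Module, model_args: tuple, model_kwargs: dict) -> torch.Tensor:
    # The user decides how the captured (args, kwargs) pair is fed to the block.
    return model(*model_args, **model_kwargs)
```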

self,
loss: torch.Tensor,
) -> torch.Tensor:
return loss * 1000
Collaborator:

Hardcoded stuff, bad.
Why is this here?

Contributor Author:

This is intended to help prevent gradient underflow when training in float16 (without it, there's a +1.3 perplexity increase). Agreed that it definitely should not be hard-coded.
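One way to avoid the hard-coded factor, sketched here with assumed names (`LossScaler` and its arguments are not part of the PR), is to make the scale configurable and unscale gradients before the optimizer step; torch.cuda.amp.GradScaler offers similar functionality with dynamic scaling:

```python
from typing import Iterable

import torch


# Sketch with assumed names: a configurable loss scale to keep float16 gradients
# out of the underflow range, instead of a hard-coded `loss * 1000`.
class LossScaler:

    def __init__(self, loss_scaling_factor: float = 1000.0) -> None:
        self.loss_scaling_factor = loss_scaling_factor

    def scale(self, loss: torch.Tensor) -> torch.Tensor:
        # Scale the loss up before backward() so small gradients stay representable.
        return loss * self.loss_scaling_factor

    def unscale_(self, parameters: Iterable[torch.nn.Parameter]) -> None:
        # Undo the scaling on accumulated gradients before the optimizer step.
        for p in parameters:
            if p.grad is not None:
                p.grad.div_(self.loss_scaling_factor)
```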

def default_block_check_fn(self, module: nn.Module, module_name: str) -> bool:
return isinstance(module, LlamaDecoderLayer) or isinstance(module, OPTDecoderLayer)

class _DataSaverHookLLM:
Collaborator:

Is this different from the CNN version? How?


def solve_learned_round_method_cls(method_type) -> LearnedRound:
if method_type == "ada_round":
return AdaRound
Collaborator:

No AdaRound/AutoRound.

LearnedRound with options Sigmoid, HardSigmoid, Linear
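A sketch of what the suggested option-driven entrypoint could look like; the enum and resolver names below are assumptions, not the PR's API:

```python
from enum import Enum


# Assumed names for illustration: one set of options selecting the rounding
# parametrization, instead of separate AdaRound / AutoRound classes.
class LearnedRoundImplType(str, Enum):
    SIGMOID = "sigmoid"            # e.g. AdaRound-style soft rounding
    HARD_SIGMOID = "hard_sigmoid"
    LINEAR = "linear"              # e.g. AutoRound-style additive offset


def solve_learned_round_impl_type(method_type: str) -> LearnedRoundImplType:
    # Meaningful strings resolve to an option, which a single LearnedRound
    # implementation can then interpret internally.
    return LearnedRoundImplType(method_type.lower())
```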

@Giuseppe5 added the "next release" label (PRs which should be merged for the next release) on Nov 7, 2024
Labels: next release
Projects: none yet
3 participants