Replies: 2 comments
This is great. I think the etLLM flow can also leverage this when it's in good shape. The recipes can be called under the hood of export_llm.
What’s the problem we are trying to solve?
This is what the flow for lowering a model to ExecuTorch looks like today: it requires users to know the right parameters and configurations to achieve the best possible performance for their targeted backend.
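As a rough sketch, the lower-level flow today looks something like the following (the model, input shapes, and backend choice are placeholders; the imports inside the function assume an ExecuTorch installation):

```python
def lower_to_executorch(model, example_inputs):
    """Sketch of today's multi-stage lowering flow. At each stage the
    user must already know the right config/partitioner for their
    target backend."""
    import torch
    from executorch.exir import to_edge
    from executorch.backends.xnnpack.partition.xnnpack_partitioner import (
        XnnpackPartitioner,
    )

    # 1. Capture the graph with torch.export.
    exported = torch.export.export(model, example_inputs)
    # 2. Convert to the Edge dialect (user picks the compile config).
    edge = to_edge(exported)
    # 3. Delegate to the target backend (user must know which
    #    partitioner and which options give good performance).
    edge = edge.to_backend(XnnpackPartitioner())
    # 4. Serialize to an ExecuTorch program.
    return edge.to_executorch().buffer
```

Every one of these stages has knobs that affect on-device performance, which is exactly the knowledge burden the proposal below tries to remove.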
Important to note: this is not intended to be a replacement for the lower-level ExecuTorch APIs. Those will continue to exist and be supported by us as public interfaces. There will also be a small set of use cases for which this sort of interface is too restrictive, and that's completely fine. The aim is to enhance usability for the majority of ExecuTorch users, roughly 80-90%, who are common users and will greatly benefit from a simpler system.
How do we solve this?
The proposal is to introduce a higher-level API that abstracts away these details and lets users target a model for on-device deployment using a recipe whose structure is defined by us.
The ExecuTorch repo will contain a small set of recipes, maintained by us, that target the most commonly used backends available as part of the repo.
Proposed API:
Input class:
This will contain all the parameters needed to successfully export and lower a model. At a bare minimum, users will need a model and a set of example inputs to go with it. Beyond that, if a user requires additional features such as multi-method export, support will exist for that.
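A minimal sketch of what such an input container could look like; every name here is an illustrative placeholder, not the final API:

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple

# Hypothetical sketch of the proposed input container.
@dataclass
class ExportInput:
    model: Any                             # eager model to export
    example_inputs: Tuple[Any, ...]        # sample inputs to go with it
    dynamic_shapes: Optional[Dict] = None  # optional dynamic-shape specs
    method_name: str = "forward"           # multi-method export could pass
                                           # one ExportInput per method
```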
Recipe class:
The recipe class contains all the configuration details needed to successfully export and lower a model to ExecuTorch, such as the partitioner configuration for targeting a certain backend, the transformation passes required to get the optimal graph for that backend, and so on.
What's contained within a recipe, and what those details mean, should not matter to most users, except for relatively advanced ones. The users who create and maintain recipes will be those with relatively advanced knowledge of ExecuTorch, such as backend authors and core maintainers/contributors.
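A minimal sketch of what the recipe container could hold, again with hypothetical field names:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional

# Hypothetical sketch of the recipe container; fields mirror the
# configuration details described above.
@dataclass
class Recipe:
    name: str
    partitioners: List[Any] = field(default_factory=list)  # preconfigured backend partitioner(s)
    passes: List[Callable] = field(default_factory=list)   # backend-specific graph transforms
    quantizer: Optional[Any] = None                        # preconfigured quantizer, if any
    compile_config: Optional[Any] = None                   # edge/ExecuTorch compile options
```

To most users this object is opaque; only recipe authors ever populate or inspect these fields.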
Users will refer to recipes by simple names such as:
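For illustration, a toy registry keyed by such names (both the names and the get_recipe helper are hypothetical):

```python
# Illustrative registry; recipe names are made up for this sketch.
_RECIPES = {
    "xnnpack_fp32": "CPU via XNNPACK, fp32",
    "xnnpack_q8": "CPU via XNNPACK, 8-bit quantized",
    "qnn_htp": "Qualcomm HTP",
    "coreml": "Apple Core ML",
}

def get_recipe(name: str):
    # Users only see the simple name; the configuration details
    # behind it stay opaque.
    return _RECIPES[name]

recipe = get_recipe("xnnpack_fp32")
```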
To target different SoCs, we can have helper utilities on top of that:
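For example, a hypothetical helper that stamps out per-SoC variants of a base HTP recipe (the function name and SoC identifiers are illustrative):

```python
# Hypothetical per-SoC helper built on top of a base backend recipe.
def get_qnn_htp_recipe(soc_model: str = "SM8650"):
    recipe = {"backend": "qnn_htp", "passes": []}
    # Same base recipe, specialized with SoC-specific compiler options.
    recipe["soc_model"] = soc_model
    return recipe
```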
How are recipes going to be managed?
Recipes are generally intended to be model agnostic, and that has worked reasonably well for most use cases we've seen internally. The same recipes we use to lower vision models to HTP can also be leveraged to lower LLMs to HTP.
There will be cases where we have to specialize a recipe for a certain use case, to run an extra set of passes or to pass different configs to a partitioner, and that is completely fine. The main point of recipes is to abstract these inner details into an opaque object that most users don't care about and can simply leverage directly for their intended use case.
For example, we can have a base recipe for a certain backend that is then extended in a helper utility:
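A toy sketch of that extension pattern; the pass and config names below are made up for illustration:

```python
import copy

# Illustrative only: a generic backend recipe extended for a
# specific use case by layering on passes and partitioner config.
def base_htp_recipe():
    return {"backend": "qnn_htp",
            "passes": ["fuse_conv_bn"],
            "partitioner_config": {}}

def llm_htp_recipe():
    # Start from the generic backend recipe...
    recipe = copy.deepcopy(base_htp_recipe())
    # ...then add the extra passes / configs the LLM flow needs
    # (names here are hypothetical).
    recipe["passes"].append("replace_sdpa_with_custom_op")
    recipe["partitioner_config"]["use_kv_cache"] = True
    return recipe
```

The deep copy keeps the base recipe untouched, so specializations never leak back into the shared default.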
Recipes owned and managed by the ExecuTorch team:
We will have a set of recipes targeting all the primary backends hosted in the ExecuTorch repo. These recipes will be owned by us together with external partners.
Backward compatibility:
For the recipe structure and the recipes we own, we will follow the general BC policy that the rest of the ExecuTorch Python APIs follow. Beyond the structure of the Recipe interface and which attributes a recipe contains, we will not offer any additional BC guarantees; we will fall back to the guarantees that the lower-level APIs provide.
Quantization:
Quantization today broadly consists of two flows: PTQ and QAT. We can leverage recipes to offer users simplified interfaces for both.
Given that the recipes already contain quantizers (configured appropriately by the recipe-generation function), the quantization flows are fairly standard after that. QAT specifically may be a little more involved, but it's also assumed that users going through the QAT flow are advanced enough to work through any further issues.
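A sketch of what a recipe-driven PTQ helper could look like, assuming the recipe exposes its preconfigured quantizer (a hypothetical field) and building on torch's pt2e quantization entry points:

```python
def quantize_with_recipe(model, example_inputs, recipe, calibration_data):
    """Sketch of a recipe-driven PTQ flow; the helper name and the
    recipe.quantizer field are hypothetical. Imports assume a torch
    install with the pt2e quantization APIs."""
    import torch
    from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e

    # Capture the graph, then hand it to the recipe's quantizer.
    exported = torch.export.export(model, example_inputs).module()
    prepared = prepare_pt2e(exported, recipe.quantizer)  # insert observers
    for sample in calibration_data:                      # calibrate
        prepared(*sample)
    return convert_pt2e(prepared)                        # quantized graph
```

The user never configures the quantizer themselves; the recipe author already did that for the target backend.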
Before and After flows:
What the flow looks like today:
What the flow will look like in a recipe based system:
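Side by side, as a sketch in which every name is hypothetical:

```python
# Hypothetical before/after comparison of the two flows.

def export_today(model, example_inputs):
    # Before: the user wires up every stage by hand and must pick
    # the right config at each one (all helper names are made up).
    exported = capture(model, example_inputs)
    edge = to_edge(exported, compile_config=pick_edge_config())
    edge = delegate(edge, pick_partitioner_with_right_options())
    return serialize(edge, pick_backend_config())

def export_via_recipe(model, example_inputs, recipe="xnnpack_fp32"):
    # After: one call against a named recipe; all of the choices
    # above live inside the recipe (hypothetical entry point).
    return export_with(model, example_inputs, recipe)
```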