Replies: 2 comments
This is great. I think the etLLM flow can also leverage this when it's in good shape. The recipes can be called under the hood of export_llm.
What’s the problem we are trying to solve?
This is what the flow for lowering a model to ExecuTorch looks like today: it requires users to know the right parameters and configurations to achieve the best possible performance for their targeted backend.
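As a rough sketch, the lower-level flow today looks something like the following (the model, input shapes, and backend choice are placeholders; the imports inside the function assume an ExecuTorch installation):

```python
def lower_to_executorch(model, example_inputs):
    """Sketch of today's multi-stage lowering flow. At each stage the
    user must already know the right config/partitioner for their
    target backend."""
    import torch
    from executorch.exir import to_edge
    from executorch.backends.xnnpack.partition.xnnpack_partitioner import (
        XnnpackPartitioner,
    )

    # 1. Capture the graph with torch.export.
    exported = torch.export.export(model, example_inputs)
    # 2. Convert to the Edge dialect (user picks the compile config).
    edge = to_edge(exported)
    # 3. Delegate to the target backend (user must know which
    #    partitioner and which options give good performance).
    edge = edge.to_backend(XnnpackPartitioner())
    # 4. Serialize to an ExecuTorch program.
    return edge.to_executorch().buffer
```

Every one of these stages has knobs that affect on-device performance, which is exactly the knowledge burden the proposal below tries to remove.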
Important to note: this is not intended to be a replacement for the lower-level ExecuTorch APIs. Those will continue to exist and be supported by us as public interfaces. There will also be a small set of use cases for which this sort of interface is too restrictive, and that's completely fine. The aim is to enhance usability for the majority of ExecuTorch users, roughly 80-90%, who are common users and will greatly benefit from a simpler system.
How do we solve this?
The proposal is to introduce a higher-level API that abstracts away these details and lets users target a model for on-device deployment using a recipe whose structure is defined by us.
The ExecuTorch repo will contain a small set of recipes, maintained by us, that target the most commonly used backends available as part of the repo.
Proposed API:
Input class:
This will contain all the parameters needed to successfully export and lower a model. At a bare minimum, users will need a model and a set of example inputs to go with it. Beyond that, if a user requires additional features such as multi-method export, support will exist for that.
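A minimal sketch of what such an input container could look like; every name here is an illustrative placeholder, not the final API:

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple

# Hypothetical sketch of the proposed input container.
@dataclass
class ExportInput:
    model: Any                             # eager model to export
    example_inputs: Tuple[Any, ...]        # sample inputs to go with it
    dynamic_shapes: Optional[Dict] = None  # optional dynamic-shape specs
    method_name: str = "forward"           # multi-method export could pass
                                           # one ExportInput per method
```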
Recipe class:
The recipe class contains all the configuration details needed to successfully export and lower a model to ExecuTorch, such as the partitioner configuration for targeting a certain backend, the transformation passes required to get the optimal graph for that backend, and so on.
What's contained within a recipe, and what those details mean, should not matter to most users, except for relatively advanced ones. The users who create and maintain recipes will be those with relatively advanced knowledge of ExecuTorch, such as backend authors and core maintainers/contributors.
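A minimal sketch of what the recipe container could hold, again with hypothetical field names:

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional

# Hypothetical sketch of the recipe container; fields mirror the
# configuration details described above.
@dataclass
class Recipe:
    name: str
    partitioners: List[Any] = field(default_factory=list)  # preconfigured backend partitioner(s)
    passes: List[Callable] = field(default_factory=list)   # backend-specific graph transforms
    quantizer: Optional[Any] = None                        # preconfigured quantizer, if any
    compile_config: Optional[Any] = None                   # edge/ExecuTorch compile options
```

To most users this object is opaque; only recipe authors ever populate or inspect these fields.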
Users will refer to recipes by simple names such as:
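For illustration, a toy registry keyed by such names (both the names and the get_recipe helper are hypothetical):

```python
# Illustrative registry; recipe names are made up for this sketch.
_RECIPES = {
    "xnnpack_fp32": "CPU via XNNPACK, fp32",
    "xnnpack_q8": "CPU via XNNPACK, 8-bit quantized",
    "qnn_htp": "Qualcomm HTP",
    "coreml": "Apple Core ML",
}

def get_recipe(name: str):
    # Users only see the simple name; the configuration details
    # behind it stay opaque.
    return _RECIPES[name]

recipe = get_recipe("xnnpack_fp32")
```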
To target different SoCs, we can have helper utilities on top of that:
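For example, a hypothetical helper that stamps out per-SoC variants of a base HTP recipe (the function name and SoC identifiers are illustrative):

```python
# Hypothetical per-SoC helper built on top of a base backend recipe.
def get_qnn_htp_recipe(soc_model: str = "SM8650"):
    recipe = {"backend": "qnn_htp", "passes": []}
    # Same base recipe, specialized with SoC-specific compiler options.
    recipe["soc_model"] = soc_model
    return recipe
```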
How are recipes going to be managed?
Recipes are generally intended to be model agnostic, and that has worked reasonably well for most use cases we've seen internally. The same recipes we use to lower vision models to HTP can also be leveraged to lower LLMs to HTP.
There will be cases where we have to specialize a recipe for a certain use case, to run an extra set of passes or to pass different configs to a partitioner, and that is completely fine. The main point of recipes is to abstract these inner details into an opaque object that most users don't care about and can simply leverage directly for their intended use case.
For example, we can have a base recipe for a certain backend that is then extended in a helper utility:
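A toy sketch of that extension pattern; the pass and config names below are made up for illustration:

```python
import copy

# Illustrative only: a generic backend recipe extended for a
# specific use case by layering on passes and partitioner config.
def base_htp_recipe():
    return {"backend": "qnn_htp",
            "passes": ["fuse_conv_bn"],
            "partitioner_config": {}}

def llm_htp_recipe():
    # Start from the generic backend recipe...
    recipe = copy.deepcopy(base_htp_recipe())
    # ...then add the extra passes / configs the LLM flow needs
    # (names here are hypothetical).
    recipe["passes"].append("replace_sdpa_with_custom_op")
    recipe["partitioner_config"]["use_kv_cache"] = True
    return recipe
```

The deep copy keeps the base recipe untouched, so specializations never leak back into the shared default.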
Recipes owned and managed by the ExecuTorch team:
We will have a set of recipes targeting all the primary backends hosted in the ExecuTorch repo. These recipes will be owned by us together with external partners.
Backward compatibility:
For the recipe structure and the recipes we own, we will follow the general BC policy that the rest of the ExecuTorch Python APIs follow. Beyond the structure of the Recipe interface and which attributes a recipe contains, we will not offer any additional BC guarantees; we will fall back to the guarantees that the lower-level APIs provide.
Quantization:
Quantization today broadly consists of two flows: PTQ and QAT. We can leverage recipes to offer users simplified interfaces for both.
Given that the recipes already contain quantizers (configured appropriately by the recipe-generation function), the quantization flows are fairly standard after that. QAT specifically may be a little more involved, but it's also assumed that users going through the QAT flow are advanced enough to work through any further issues.
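A sketch of what a recipe-driven PTQ helper could look like, assuming the recipe exposes its preconfigured quantizer (a hypothetical field) and building on torch's pt2e quantization entry points:

```python
def quantize_with_recipe(model, example_inputs, recipe, calibration_data):
    """Sketch of a recipe-driven PTQ flow; the helper name and the
    recipe.quantizer field are hypothetical. Imports assume a torch
    install with the pt2e quantization APIs."""
    import torch
    from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e

    # Capture the graph, then hand it to the recipe's quantizer.
    exported = torch.export.export(model, example_inputs).module()
    prepared = prepare_pt2e(exported, recipe.quantizer)  # insert observers
    for sample in calibration_data:                      # calibrate
        prepared(*sample)
    return convert_pt2e(prepared)                        # quantized graph
```

The user never configures the quantizer themselves; the recipe author already did that for the target backend.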
Before and After flows:
What the flow looks like today:
What the flow will look like in a recipe based system:
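Side by side, as a sketch in which every name is hypothetical:

```python
# Hypothetical before/after comparison of the two flows.

def export_today(model, example_inputs):
    # Before: the user wires up every stage by hand and must pick
    # the right config at each one (all helper names are made up).
    exported = capture(model, example_inputs)
    edge = to_edge(exported, compile_config=pick_edge_config())
    edge = delegate(edge, pick_partitioner_with_right_options())
    return serialize(edge, pick_backend_config())

def export_via_recipe(model, example_inputs, recipe="xnnpack_fp32"):
    # After: one call against a named recipe; all of the choices
    # above live inside the recipe (hypothetical entry point).
    return export_with(model, example_inputs, recipe)
```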