-
@iseeyuan Do you think it would be possible to get to a point where we can meet model definitions where they are, even if we're maybe getting 50%-80% of theoretical peak performance? The reason I ask is that there is a significant burden involved in writing the model the way that ET expects. If we could provide a "peak performance path" and an "out of box / just works" path, that would be very nice. My experience with some internal teams is that they try to run existing language models on ET and drop it when the out-of-the-box performance is far behind what they expect. Aside from that, I think standardizing the language model APIs will be a big win for usability. Thanks for putting this together.
-
meta comment: you need to format the RFC a bit. The indentation and spacing are off enough that it makes for a hard read.
-
@iseeyuan I feel that you should split this RFC into two: one for the model architecture definition and one for the export_llama_lib refactoring.
-
To what extent is this true? If someone doesn't have permission to rewrite the modeling code, does that mean the model won't work for that backend at all, or will it still work but just not achieve the best performance? Maybe @kimishpatel @digantdesai @cccclai can comment on this?
I agree that we should provide tools to make source code rewriting easier. However, I don't think rewriting should always be done in the original modeling code, as this could impact dozens of models and make OSS contributions increasingly difficult as more models are covered. (This is how I interpret the proposal, given its goal of unifying code and reducing boilerplate.) For example, as a user, if the code I modify could affect the performance of numerous models and use cases, I'd be hesitant to make changes and would likely defer to ET developers instead. This not only places us at the center of enablement and improvement work but also risks making contributions more intimidating.
Many of the proposed ideas already exist in HF. For example, the ability to add and register different attention implementations is already supported (pointer). Additionally, the lifted cache is already exported as IO in Exported IR (example). My impression is that this proposal is leaning toward consolidating HF Transformers' definitions and owning them in our repo, aiming to support as many transformer models as possible, including text, audio, and vision transformers. Can this approach scale effectively? One of the core principles HF Transformers upholds is "single model, single file" (as mentioned at PTC 2024). I believe they are fully aware of the downside of this approach (namely, redundant code), but it provides significant flexibility in isolating performance impacts across models and reduces the complexity of performance testing. So far, this strategy has proven highly successful.
I want to second this. Some ML engineers who just want to prototype quickly in Python shouldn't need to be aware of the runtime code (C++). Take the HF workflow as an example: good UX means an ML engineer should be able to experiment with different models and recipes, validating end-to-end in Python, without needing any knowledge of the underlying runtime. This requires the interface to the runtime(s) to be not only backend-agnostic but also model-agnostic.
Back to the key problem highlighted in this proposal: having multiple modeling sources in our repo is indeed a challenge, but is having multiple modeling sources itself the problem? I see these as two distinct issues, and the latter doesn't seem avoidable; it will happen somewhere regardless.
You mean decoder-only transformers, right? What about encoder-only transformers (like BERT) and encoder-decoder transformers (like T5)? What's the plan for non-transformer models, such as diffusion models or timm models? If we're heading down this path, I think we need to consider the full picture.

Q: Should the ExecuTorch repo serve as a recipe repository? If so, how many recipes do you expect to host in the ExecuTorch repo? This proposal seems to imply that the ExecuTorch repo will also function as a recipe repository. I agree that providing a default recipe for each backend makes sense. However, that alone doesn't justify the need to host these recipes within ExecuTorch. Some of the proposed ideas, such as controlling recipes via a configuration file, are already well supported by Hugging Face, not just for eager mode but also for ONNX and TFLite. Why is it necessary to rebuild a similar mechanism and maintain it in our repo?

From the perspective of building a vibrant community, I think it is key that recipes are separated from the core. We can offer a default recipe for each backend as an option, but users shouldn't be limited to copying and customizing ours for their own needs. To encourage organic community growth, users should be able to create as many recipes as they want and make them shareable so that other OSS users can benefit. This level of openness wouldn't be possible if recipes were tightly coupled to our repo.
-
Thanks for the feedback! I'm planning to revamp the RFC to highlight:
-
Okay, updated the discussion to V2. Thanks again for your comments. Please take a look and let me know if you have further comments! cc @GregoryComer @kimishpatel @cccclai @digantdesai @guangy10 @larryliu0820 @jackzhxng
-
### tl;dr
The goal of this RFC is to streamline the end-to-end on-device LLM deployment flow via ExecuTorch. Potential users:
- An LLM developer or hobbyist: I want to quickly put an LLM on a device and tune its accuracy/performance.
- An app developer: I want to efficiently integrate an LLM into my Android or iOS app.

Some success metrics for this project:
- Time to deploy a new LLM is competitive with other LLM deployment frameworks.
- Users find the ExecuTorch stack/tools useful, along with the ability to run on different HW backends on Android and iOS.
- Usage and adoption of ExecuTorch increase:
  - LLM developers and hobbyists use ExecuTorch in their day-to-day development.
  - App developers integrate ExecuTorch into their production apps.

Non-goal: this project is not about deploying arbitrary non-LLM models via ExecuTorch. However, it can be a critical part of "ExecuTorch just works".
### Context
Popular LLMs share similar transformer-based architectures. The fixed architecture brings some convenience to deployment; llama.cpp is one example. However, when deploying to a variety of backends, the flows can differ due to backend-specific limitations, including:
- Static shapes vs. dynamic shapes (see the export sketch after this list)
- Static quantization vs. dynamic quantization
- Data types a backend supports
- Kernels available in a backend
- Different types of attention layers
- How the KV cache is handled
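As a concrete illustration of the static-vs-dynamic-shape difference, the sketch below exports a toy module twice with `torch.export`: once with a fixed sequence length and once with a dynamic one. Only public `torch.export` APIs are used; the module and dimension names are made up for illustration.

```python
import torch
from torch.export import export, Dim


class ToyBlock(torch.nn.Module):
    """Tiny stand-in for an LLM block; real models are far larger."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.proj = torch.nn.Linear(hidden, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)


m = ToyBlock()
example = (torch.randn(1, 16, 64),)  # (batch, seq_len, hidden)

# Static-shape export: the sequence length is burned into the graph
# (what backends like QNN and ANE typically require).
static_program = export(m, example)

# Dynamic-shape export: the sequence length can vary at runtime
# (what backends like XNNPACK can handle).
seq = Dim("seq_len", min=1, max=2048)
dynamic_program = export(m, example, dynamic_shapes={"x": {1: seq}})

print(static_program.graph_signature)
print(dynamic_program.range_constraints)
```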
#### Exploded code copies
Sometimes, updating the export recipe is not sufficient or efficient. Supporting a specific backend may require a separate copy of the model definition and a separate version of the runtime code. At the same time, there is a clear trend toward scale as more use cases and models need to be supported. Adding up those new models and multiplying them by the number of backends and use cases to support, the number of versions can explode.
Below is a table summarizing the existing code versions, their unique properties, and their use cases.
#### Slow deployment to devices
For each model, the journey from source code to on-device deployment is slow. Users may find it difficult to understand the details of each step: export, dynamic shapes, quantization schemes, partitioners in delegates, custom ops, runtime builds, runners on Android/iOS, the profiling process, etc.
#### Redundant work when enabling new models
As the number of new models scales, there can be redundant work in the ExecuTorch deployment flow for the same or similar architectures.
### RFC
#### Types of new LLM models
There are three types of "new" LLM models from the inference point of view:
1. Only the weights and configuration change. For example, from Llama 3.1 to Llama 3.2, or to other models like Qwen, Phi-3, etc.
2. There is a significant difference in some components. For example, the DeepSeek models use Mixture of Experts (MoE) in the FFN, and Multi-head Latent Attention (MLA) is used instead of Multi-Head Attention (MHA). However, the popular variants are limited in number.
3. A new model is composed of one or more transformer models plus other parts. For example, the vision encoder CLIP is composed of transformers plus additional embeddings and layer norms.
We are exploring efficient solutions for those types. The high-level design thoughts are:
- Reuse existing flows as much as possible.
- Hide the implementation details, but expose the necessary configurations to users.
#### Entry point of `export_llm`
The interface can be as simple as:
```bash
python -m export_llm --check_point="ckp.pt" --model_config="model_config.json" --export_config="export_config.yaml"

# Alternatively, directly downloading from Hugging Face
python -m export_llm --hf_id="meta-llama/Llama-3.2-1B" --export_config="export_config.yaml"
```
The `check_point` and `model_config` arguments are the same as in the existing `export_llama`. The differences are:
- It can be extended to other LLMs (beyond Llama).
- The export configs are aggregated into one file instead of a long list of arguments. The content of this YAML file should be user-facing: target backend, quantization scheme, etc. (see the sketch after this list).
- [To explore] The output artifacts should be both the .pte model file and, optionally, a runtime binary/app that can run directly on Android and iOS devices, plus cache artifacts.
- Hide all the CMake options related to the export options, like DEXECUTORCH_BUILD_XNNPACK, DEXECUTORCH_BUILD_KERNELS_CUSTOM, etc.
- The benchmark results can be obtained immediately with a flag in export_config.yaml.
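As a rough illustration, the hypothetical contents of such an `export_config.yaml` are sketched below as a Python dict. Every key here is made up for illustration; the RFC does not yet define a schema.

```python
# Hypothetical sketch of what export_config.yaml might contain, expressed as a
# Python dict. None of these keys are an existing ExecuTorch schema.
export_config = {
    "backend": "xnnpack",            # target backend: xnnpack, qnn, coreml, ...
    "quantization": {
        "mode": "8da4w",             # e.g., dynamic 8-bit activations, 4-bit weights
        "group_size": 128,
    },
    "max_seq_length": 2048,
    "use_kv_cache": True,
    "outputs": {
        "pte_file": "llama3_2_1b.pte",
        "build_runner": True,        # also produce an Android/iOS runner binary
    },
    "run_benchmark": True,           # surface benchmark results after export
}

# The CLI could load the real file with PyYAML, e.g.:
#   import yaml
#   with open("export_config.yaml") as f:
#       export_config = yaml.safe_load(f)
```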
Below are thoughts on implementation details that are not exposed directly to users.
#### Eager mode definition
Note: with improvements in export capabilities and backend support, ideally an arbitrary eager-mode definition could be exported and lowered to any backend. However, we don't see that being feasible in the near future.
To handle new models of types 1-3, there can be model definitions maintained by ExecuTorch:
- They provide a clean base for further export recipes.
- Variant implementations are hidden from users, for example the popular attention implementations. If necessary, we can also keep a copy of the definition for a class of backends, for example attention with static shapes for QNN and ANE.
- It's easy to compose a new model from the definition components (see the sketch after this list).
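A minimal sketch of what "hidden variant implementations" could look like: a registry of attention variants that a shared block builder picks from based on a config. The registry, class names, and config fields are hypothetical, not existing ExecuTorch APIs.

```python
import torch
from torch import nn

# Hypothetical registry of attention variants maintained by ExecuTorch.
ATTENTION_REGISTRY: dict[str, type[nn.Module]] = {}


def register_attention(name: str):
    def wrap(cls):
        ATTENTION_REGISTRY[name] = cls
        return cls
    return wrap


@register_attention("mha_static")  # e.g., static shapes for QNN / ANE
class StaticShapeMHA(nn.Module):
    def __init__(self, dim: int, n_heads: int, max_seq_len: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.max_seq_len = max_seq_len

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(x, x, x, need_weights=False)
        return out


def build_block(config: dict) -> nn.Module:
    """Compose a block from registered components based on a (hypothetical) model config."""
    attn_cls = ATTENTION_REGISTRY[config["attention_type"]]
    return attn_cls(config["dim"], config["n_heads"], config["max_seq_len"])


block = build_block({"attention_type": "mha_static", "dim": 64, "n_heads": 4, "max_seq_len": 128})
print(block(torch.randn(1, 16, 64)).shape)  # torch.Size([1, 16, 64])
```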
Provide tools to help with source code transforms. Options are listed below:
- Weight mapping. For example, using torchtune utils to convert HF safetensors weights or the torchtune format to a PyTorch checkpoint. Example here. (A minimal sketch follows this list.)
- Config conversion: Qwen, DeepSeek, or easy ways to build those models using the existing components.
- Source-level transforms. Good for code unification, but not straightforward for readability, and they cannot be used in all situations (e.g., different APIs).
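A minimal sketch of the weight-mapping idea: renaming keys from a Hugging Face-style state dict into a target naming scheme before saving a PyTorch checkpoint. The key patterns below are illustrative only; real converters (e.g., the torchtune utils mentioned above) handle many more cases.

```python
import re
import torch

# Illustrative mapping from HF-style parameter names to a hypothetical
# ExecuTorch/torchtune-style naming scheme. Real mappings are model-specific.
KEY_PATTERNS = [
    (r"^model\.embed_tokens\.weight$", "tok_embeddings.weight"),
    (r"^model\.layers\.(\d+)\.self_attn\.q_proj\.weight$", r"layers.\1.attention.wq.weight"),
    (r"^model\.layers\.(\d+)\.self_attn\.k_proj\.weight$", r"layers.\1.attention.wk.weight"),
    (r"^model\.layers\.(\d+)\.mlp\.gate_proj\.weight$", r"layers.\1.feed_forward.w1.weight"),
    (r"^lm_head\.weight$", "output.weight"),
]


def convert_state_dict(hf_state_dict: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
    """Rename keys that match a known pattern; pass through everything else."""
    converted = {}
    for key, tensor in hf_state_dict.items():
        new_key = key
        for pattern, replacement in KEY_PATTERNS:
            if re.match(pattern, key):
                new_key = re.sub(pattern, replacement, key)
                break
        converted[new_key] = tensor
    return converted


# Usage sketch: load HF safetensors, convert, and save a plain PyTorch checkpoint.
# from safetensors.torch import load_file
# hf_sd = load_file("model.safetensors")
# torch.save(convert_state_dict(hf_sd), "ckp.pt")
```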
#### Open questions
How is QAT handled?
- Users may want to do QAT on torchtune models, since the infra is set up there.
- If it's eager-mode QAT (weight-only), we can do a transform to the QAT submodule.
- PT2E QAT has to happen on the ET definition. Should we set up the QAT flow based on the ET transformer? (A rough sketch of a PT2E QAT flow follows this list.)
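The PT2E path mentioned above could look roughly like the sketch below, using the generic torch.ao PT2E QAT APIs with the XNNPACK quantizer on a toy module. Module paths and the graph-capture API differ across PyTorch/ExecuTorch versions, so treat this as an illustrative sketch rather than the flow the RFC will adopt.

```python
import torch
from torch.ao.quantization.quantize_pt2e import prepare_qat_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)


class TinyFFN(torch.nn.Module):
    """Toy stand-in for an LLM feed-forward block."""

    def __init__(self):
        super().__init__()
        self.w1 = torch.nn.Linear(64, 128)
        self.w2 = torch.nn.Linear(128, 64)

    def forward(self, x):
        return self.w2(torch.nn.functional.silu(self.w1(x)))


model = TinyFFN()
example_inputs = (torch.randn(1, 16, 64),)

# Capture a trainable graph. The capture API has changed names across PyTorch
# versions (older releases use torch._export.capture_pre_autograd_graph).
training_gm = torch.export.export_for_training(model, example_inputs).module()

quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config(is_qat=True))

prepared = prepare_qat_pt2e(training_gm, quantizer)

# The QAT fine-tuning loop would run here, calling prepared(*batch) and
# back-propagating as usual; one forward pass stands in for it.
prepared(*example_inputs)

quantized = convert_pt2e(prepared)
# `quantized` can then go through to_edge / backend partitioning as usual.
```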
#### Export recipes
`export_llama` looks over-complicated, with all the command-line options to handle different quantization schemes, different backends, etc. Inspired by our internal ModAI tool, as well as torchtune's hydra-based configuration structure:
- [RFC] Have one configuration file/recipe for each backend.
- What format should host this recipe: a Python script or a YAML file? A config file has the advantage of simplicity (users don't have to know the implementation details) and better version control, but may take more effort to maintain.
- What's the granularity of the recipes? If all configs are decoupled from the implementation, it may be more reasonable to have one implementation, like export_llama_cpu, but multiple config YAMLs for each target use case, like different quantization group sizes (see the sketch after this list).
- Modularize the code for checkpoints, quantization, etc.
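One possible shape for the "single implementation, per-backend recipe" idea: a registry that maps a backend name from the recipe to a lowering function. The function names and recipe fields below are hypothetical sketches, not existing export_llama internals.

```python
from typing import Callable

import torch

# Hypothetical recipe registry: backend name -> lowering function.
RECIPES: dict[str, Callable] = {}


def register_recipe(backend: str):
    def wrap(fn):
        RECIPES[backend] = fn
        return fn
    return wrap


@register_recipe("xnnpack")
def lower_for_xnnpack(exported_program, recipe: dict):
    # A real recipe would call to_edge(...) and the XNNPACK partitioner,
    # with quantization settings taken from `recipe`.
    print(f"Lowering for XNNPACK, group_size={recipe.get('group_size')}")
    return exported_program


@register_recipe("qnn")
def lower_for_qnn(exported_program, recipe: dict):
    # A real recipe would apply static-shape constraints and the QNN partitioner.
    print("Lowering for QNN with static shapes")
    return exported_program


def export_llm(model: torch.nn.Module, example_inputs, recipe: dict):
    """Single implementation; the recipe (loaded from YAML) picks the backend path."""
    ep = torch.export.export(model, example_inputs)
    return RECIPES[recipe["backend"]](ep, recipe)


# Usage sketch with a toy model and an in-memory recipe.
toy = torch.nn.Linear(8, 8)
export_llm(toy, (torch.randn(1, 8),), {"backend": "xnnpack", "group_size": 128})
```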
#### Runtime
The runtime is another user-facing entry point. It's deployed to Android or iOS, with the capability to load and run LLM models. The LLM model artifacts (.pte files) can come from:
- downloads from hf.com/executorch-community
- optimum-executorch
- the executorch.export API
The runtime code may need more simplification and unification, due to the complexity of maintaining and building the C++ code.
- [RFC] Runtime code should be as backend-agnostic as possible. Some features should be modularized, and a library of those features should be provided.
- Runtime code should be simple. Complex logic should be put into the model if possible, for the reasons below:
  - Scalability: C++ code is reused through operators; there is no need to maintain multiple C++ files for multiple models.
  - Portability: no need to sync C++ files across two repos during development (e.g., ExecuTorch and torchchat).
  - Better UX: it's easier and less error-prone for users to integrate model inference into their use cases.
To accelerate development, Python bindings for the runtime APIs would be provided. Users can call runtime components wherever Python is available, e.g., on Macs for development purposes.
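For example, running an exported .pte from Python during development might look roughly like the snippet below. It assumes the `portable_lib` pybindings that ship with ExecuTorch; module paths and method names have changed between releases, so check the installed version.

```python
import torch

# Assumed import path for the ExecuTorch pybindings; it has moved between
# releases, so adjust for your installed version.
from executorch.extension.pybindings.portable_lib import _load_for_executorch

# Load a previously exported program (the path is just an example).
module = _load_for_executorch("llama3_2_1b.pte")

# Run one step; the expected inputs depend entirely on how the model was
# exported (e.g., token ids plus a cache position for a KV-cache model).
tokens = torch.tensor([[1, 2, 3]], dtype=torch.long)
outputs = module.forward([tokens])
print(outputs[0].shape)
```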
There is a strong need for runtime components. For example:
- KV cache management. We should modularize it and provide APIs to access KV caches (a hypothetical API sketch follows this list).
- When the user logic gets more complicated, a local data container to efficiently and safely store/retrieve data would be necessary. Our existing MLDW may help here.
- Tokenizer.
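A hypothetical sketch of what a Python-facing KV cache management API could look like. None of these class or method names exist in ExecuTorch today; this is purely to illustrate the kind of modular component being asked for.

```python
import torch


class KVCacheManager:
    """Hypothetical KV cache container keyed by layer index."""

    def __init__(self, n_layers: int, max_seq_len: int, n_heads: int, head_dim: int):
        shape = (1, n_heads, max_seq_len, head_dim)
        self.k = [torch.zeros(shape) for _ in range(n_layers)]
        self.v = [torch.zeros(shape) for _ in range(n_layers)]
        self.pos = 0

    def update(self, layer: int, k_new: torch.Tensor, v_new: torch.Tensor) -> None:
        """Write new key/value states at the current position."""
        seq = k_new.shape[2]
        self.k[layer][:, :, self.pos:self.pos + seq] = k_new
        self.v[layer][:, :, self.pos:self.pos + seq] = v_new

    def advance(self, n_tokens: int) -> None:
        self.pos += n_tokens

    def reset(self) -> None:
        """Start a fresh conversation without reallocating buffers."""
        self.pos = 0


cache = KVCacheManager(n_layers=2, max_seq_len=128, n_heads=4, head_dim=16)
cache.update(0, torch.randn(1, 4, 3, 16), torch.randn(1, 4, 3, 16))
cache.advance(3)
print(cache.pos)  # 3
```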
#### On-device deployment
The on-device deployment (to Android and iOS) should be fast:
- A testing binary or iOS app can be built with minimal user interaction. (Can we do it from the same export_llm entry point?)
- It should be easy for users to quickly integrate ET with their own app or SDK.
- Hansong's Android Roadmap 25H1, the developer experience session, would help with LLM on-device deployment.
#### Evaluation and Benchmark
Nothing new here, but there's still a gap to achieving this:
- The benchmark information should be easy to obtain with a configuration.
- The benchmark data should be easy to understand, e.g., showing the performance bottlenecks and the hierarchical structure of the model.
- Hansong's OSS Android Benchmarking (minibench and microbench) covers ET in general; it would be great to have an LLM-specific benchmark, similar to the internal one.