Description
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or other comments that do not add relevant new information or questions, as they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
What is the outcome that you are trying to reach?
Agentic workflows are gaining traction, but they’re not easy to productionize. They often involve multiple LLM calls, external tools, and search APIs, which makes them:
- Hard to observe: it’s not clear what the agent is doing at each step.
- Fragile: if one step fails, the whole run usually has to restart.
- Expensive to scale: repeated retries and full restarts waste both time and money.
For teams that want to build serious deep research or multi-step reasoning agents, these gaps make it tough to move from prototype to production.
Describe the solution you would like
I’d like to contribute a deep research blueprint that runs on EKS, using Flyte 2.0 as the orchestration layer.
This blueprint would:
- Show why EKS is a great fit for scaling these workloads in a containerized environment.
- Demonstrate how Flyte 2.0 adds resilience with retries, recoverability from mid-run failures (e.g. failed tool calls), and built-in observability, so agentic workflows don't feel like black boxes.
- Highlight how users can author workflows in plain Python, with all the convenience of dynamic, imperative code, while still benefiting from production-grade guarantees.
- Use an open-source stack (Flyte 2.0, Together AI's Open Deep Research) to make it reproducible and easy for others to adopt.
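To make the authoring model concrete, here is a rough sketch of how a deep research workflow decomposes into independently retryable steps. This is plain Python, not actual Flyte 2.0 syntax: the `retryable` decorator and the `search`/`synthesize`/`deep_research` step names are hypothetical stand-ins for what Flyte provides declaratively (per-task retries, checkpointed recovery, and per-step logs), so only the failing step reruns rather than the whole agent run.

```python
import functools
import time


def retryable(retries: int = 3, delay: float = 0.0):
    """Hypothetical stand-in for Flyte's declarative per-task retries:
    when a step fails transiently, only that step reruns, not the run."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, retries + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == retries:
                        raise  # exhausted retries: surface the failure
                    time.sleep(delay)
        return wrapper
    return decorator


@retryable(retries=3)
def search(query: str) -> list[str]:
    # In the blueprint this would be a tool call to a search API;
    # a transient failure here retries just this step.
    return [f"result for {query!r}"]


@retryable(retries=2)
def synthesize(snippets: list[str]) -> str:
    # In the blueprint this would be an LLM call.
    return " | ".join(snippets)


def deep_research(topic: str) -> str:
    # The "workflow": each step is separately retryable and observable,
    # so the run does not restart from scratch on a single failure.
    return synthesize(search(topic))


print(deep_research("EKS autoscaling"))
```

In Flyte proper, each step would be a task with retries declared in its decorator, and the orchestrator (running on EKS) would record inputs, outputs, and logs for every step automatically.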
Users will be able to run a deep research agent on EKS that can scale, recover from failures without starting over, and expose rich logs/metrics for every step. It also broadens the scope of AI on EKS by showing how orchestration + agents fit into the story, which is highly relevant to how people are experimenting with LLMs today.