Description
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or other comments that do not add relevant new information or questions, as they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
What is the outcome that you are trying to reach?
Agentic workflows are gaining traction, but they’re not easy to productionize. They often involve multiple LLM calls, external tools, and search APIs, which makes them:
- Hard to observe: it’s not clear what the agent is doing at each step.
- Fragile: if one step fails, the whole run usually has to restart.
- Expensive to scale: repeated retries and full restarts waste both time and money.
For teams that want to build serious deep research or multi-step reasoning agents, these gaps make it tough to move from prototype to production.
Describe the solution you would like
I’d like to contribute a deep research blueprint that runs on EKS, using Flyte 2.0 as the orchestration layer.
This blueprint would:
- Show why EKS is a great fit for scaling these workloads in a containerized environment.
- Demonstrate how Flyte 2.0 adds resilience with retries, recoverability from mid-run failures (e.g. failed tool calls), and built-in observability, so agentic workflows don't feel like black boxes.
- Highlight how users can author workflows in plain Python, with all the convenience of dynamic, imperative code, while still benefiting from production-grade guarantees.
- Use an open-source stack (Flyte 2.0, Together AI's Open Deep Research) to make it reproducible and easy for others to adopt.
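To make the authoring model concrete, here is a rough sketch of how a deep research workflow decomposes into independently retryable steps. This is plain Python, not actual Flyte 2.0 syntax: the `retryable` decorator and the `search`/`synthesize`/`deep_research` step names are hypothetical stand-ins for what Flyte provides declaratively (per-task retries, checkpointed recovery, and per-step logs), so only the failing step reruns rather than the whole agent run.

```python
import functools
import time


def retryable(retries: int = 3, delay: float = 0.0):
    """Hypothetical stand-in for Flyte's declarative per-task retries:
    when a step fails transiently, only that step reruns, not the run."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, retries + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == retries:
                        raise  # exhausted retries: surface the failure
                    time.sleep(delay)
        return wrapper
    return decorator


@retryable(retries=3)
def search(query: str) -> list[str]:
    # In the blueprint this would be a tool call to a search API;
    # a transient failure here retries just this step.
    return [f"result for {query!r}"]


@retryable(retries=2)
def synthesize(snippets: list[str]) -> str:
    # In the blueprint this would be an LLM call.
    return " | ".join(snippets)


def deep_research(topic: str) -> str:
    # The "workflow": each step is separately retryable and observable,
    # so the run does not restart from scratch on a single failure.
    return synthesize(search(topic))


print(deep_research("EKS autoscaling"))
```

In Flyte proper, each step would be a task with retries declared in its decorator, and the orchestrator (running on EKS) would record inputs, outputs, and logs for every step automatically.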
Users will be able to run a deep research agent on EKS that can scale, recover from failures without starting over, and expose rich logs/metrics for every step. It also broadens the scope of AI on EKS by showing how orchestration + agents fit into the story, which is highly relevant to how people are experimenting with LLMs today.