
[Feature Request] Work Autonomously in a Sandbox #2336

@Bblumenberg


I would love to let Q work autonomously, without my having to monitor its progress, on certain tasks that already have a human-review step at the end. For example, I could give Q a task from an existing backlog that is well documented with a strategy and acceptance criteria, and ask it to analyze the problem, implement the solution, and raise a pull request for me to review. To do this autonomously, Q needs to take certain non-read-only actions on its own, such as creating directories for the workspace, checking out packages, modifying files, and making git commits. I'm unwilling to grant a session blanket permission to trust all tools, because a hallucination could lead Q to do something unexpected outside the scope of its task. I'm therefore requesting that the tool trust system be updated to allow broader tool-level trust, while applying a guardrail at the session level to create a "sandbox." A couple of ways to think about implementing this:

  • Granular trust for specific tools: Just as execute_bash is trusted by default for read-only commands, allow it to also be trusted for specific actions, such as making a git commit when the repository is under a certain file path. Similarly, trust fs_write, but only for files under a certain path. These changes are probably simpler to implement and help achieve the goal, but might not be fully "complete": I'd still have to check in on the session occasionally to make sure it isn't blocked by a tool I haven't granted permission for. This would be an acceptable tradeoff.
  • Automatically set up a chroot environment, virtual machine, or container (e.g. using Docker) in which Q can work. This way, I can trust everything and know that Q can't touch anything outside of its new sandbox environment. I imagine the challenge here would be getting the right tools installed in, or exposed to, that environment so Q can function properly and do what it needs to do, but it would likely provide a more complete solution for autonomous work.
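The first option could be prototyped as a path-scoped check that runs before a tool call is auto-approved. This is a minimal, hypothetical sketch: `SandboxPolicy`, `trust_fs_write`, and `trust_execute_bash` are illustrative names and are not part of Q's actual trust configuration, and `/srv/agent-workspace` is an assumed workspace root. Paths are resolved before comparison so `..` segments can't escape the root, and only a plain `git add` or `git commit` with no shell metacharacters is auto-approved, so a chained command like `git commit -m x; rm -rf /` would still fall through to a prompt.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical policy object: these names are illustrative and are NOT part of
# Amazon Q's real trust configuration.

# Characters that would let a "git commit" string chain or redirect to other commands.
SHELL_METACHARACTERS = set(";&|`$<>\n")


@dataclass
class SandboxPolicy:
    # Directories inside which file writes and git commits are auto-approved.
    allowed_roots: list[Path] = field(default_factory=list)

    def _inside(self, target: str | Path) -> bool:
        # resolve() collapses ".." and follows symlinks, so "root/../etc" cannot escape.
        resolved = Path(target).expanduser().resolve()
        return any(
            resolved.is_relative_to(root.expanduser().resolve())
            for root in self.allowed_roots
        )

    def trust_fs_write(self, path: str) -> bool:
        # Auto-approve a file write only when the target lies under an allowed root.
        return self._inside(path)

    def trust_execute_bash(self, command: str, cwd: str) -> bool:
        # Reject anything that could smuggle in a second shell command.
        if any(ch in SHELL_METACHARACTERS for ch in command):
            return False
        try:
            argv = shlex.split(command)
        except ValueError:
            # Unbalanced quotes: not a command we can reason about, so don't trust it.
            return False
        # Only "git add ..." and "git commit ..." are auto-approved; global options such
        # as "git -C /elsewhere commit" put a flag in argv[1] and are rejected.
        if len(argv) < 2 or argv[0] != "git" or argv[1] not in {"add", "commit"}:
            return False
        # The command must run from a working directory inside an allowed root.
        return self._inside(cwd)


policy = SandboxPolicy(allowed_roots=[Path("/srv/agent-workspace")])
print(policy.trust_fs_write("/srv/agent-workspace/pkg/src/main.py"))  # True
print(policy.trust_execute_bash("git push", "/srv/agent-workspace/pkg"))  # False
```

Anything the policy rejects would still fall back to the normal confirmation prompt, which matches the tradeoff in the first option: the session may occasionally block on an unapproved tool, but nothing outside the workspace is written without review.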
