feat(platform): add Human In The Loop block with review workflow #11380
base: dev
Conversation
This commit implements a comprehensive Human In The Loop (HITL) block that allows agents to pause execution and wait for human approval/modification of data before continuing.

Key features:
- Added WAITING_FOR_REVIEW status to AgentExecutionStatus enum
- Created PendingHumanReview database table for storing review requests
- Implemented HumanInTheLoopBlock that extracts input data and creates review entries
- Added API endpoints at /api/executions/review for fetching and reviewing pending data
- Updated execution manager to properly handle waiting status and resume after approval
- Created comprehensive frontend UI components:
  - PendingReviewCard for individual review handling
  - PendingReviewsList for multiple reviews
  - FloatingReviewsPanel for graph builder integration
- Integrated review UI into 3 locations: legacy library, new library, and graph builder
- Added proper type safety throughout with SafeJson handling
- Optimized database queries using count functions instead of full data fetching

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
✅ Deploy Preview for auto-gpt-docs-dev canceled.
Walkthrough: Adds a human-in-the-loop review flow: schema/model, data APIs, a HumanInTheLoop block that pauses execution to create/fetch reviews, executor integration to enqueue and resume executions, and REST endpoints and frontend components to list and act on pending reviews.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Graph as Graph Execution
    participant Block as HumanInTheLoopBlock
    participant DB as DatabaseManager / Data Layer
    participant API as Review API
    participant User as User (frontend)
    Graph->>Block: execute(node_exec_id, input_data)
    Block->>DB: get_or_create_human_review(user_id, node_exec_id, graph_exec_id, payload...)
    DB-->>Block: ReviewResult | None (None => pending)
    alt review pending
        Block->>Graph: yield pause / set node status WAITING_FOR_REVIEW
        Graph->>DB: has_pending_reviews_for_graph_exec(graph_exec_id)
        Note right of DB: Frontend polls/queries DB via API
        User->>API: GET /review/pending or /review/execution/{id}
        API->>DB: get_pending_reviews_for_user / get_pending_reviews_for_execution
        DB-->>API: PendingHumanReviewModel[]
        User->>API: POST /review/{node_exec_id}/action (approve/reject)
        API->>DB: update_review_action(...) (atomic)
        DB-->>API: updated review
        API->>DB: update_graph_execution_stats / resume helper
        DB-->>Graph: queue/resume node execution
    else review ready
        Block->>Graph: yield reviewed_data/status/review_message
    end
    Graph-->>User: execution updates / final status
```
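For readers who prefer code to diagrams, here is a minimal, hypothetical sketch of the pause/resume contract the diagram describes. The helper name `get_or_create_human_review` mirrors the walkthrough, but its signature and the `ReviewResult` shape are assumptions, not the PR's actual implementation:

```python
from dataclasses import dataclass
from typing import Any, AsyncGenerator, Optional


@dataclass
class ReviewResult:
    approved: bool
    data: Any
    message: Optional[str] = None


# Hypothetical data-layer call: returns None while the review is still pending.
async def get_or_create_human_review(node_exec_id: str, payload: Any) -> Optional[ReviewResult]:
    ...  # backed by the PendingHumanReview table in the real implementation


async def run_hitl_block(node_exec_id: str, input_data: Any) -> AsyncGenerator[tuple[str, Any], None]:
    review = await get_or_create_human_review(node_exec_id, input_data)
    if review is None:
        # Pending: the executor marks the node WAITING_FOR_REVIEW and the
        # graph pauses until the review API records an approve/reject.
        return
    if review.approved:
        yield "status", "approved"
        yield "reviewed_data", review.data
    else:
        yield "status", "rejected"
        yield "review_message", review.message or ""
```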
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
@coderabbitai full review
✅ Actions performed: Full review triggered.
✅ Deploy Preview for auto-gpt-docs canceled.
Thank you for implementing the Human In The Loop block with review workflow! The feature looks comprehensive and well-designed, with all the necessary components for both backend and frontend. The code follows good practices like proper typing and error handling. Before we can merge this PR, please complete the checklist from the PR template. Once the checklist is completed, we can proceed with the review of your implementation.
Thank you for this comprehensive PR implementing the Human In The Loop (HITL) block! This is an excellent addition that will enable more interactive workflows. Overall, this is a well-implemented feature; please complete the test plan before this can be approved for merging.
Actionable comments posted: 5
🧹 Nitpick comments (20)
autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/AgentRunsView/components/SelectedRunView/components/RunStatusBadge.tsx (1)

39-43: Consider a distinct icon for better visual differentiation. The `WAITING_FOR_REVIEW` status uses the same `PauseCircleIcon` as the `RUNNING` status (line 34), differing only in color (blue vs yellow). While the color distinction is present, a unique icon would improve visual scanning and reduce potential confusion between these two states. Consider alternatives like `ClockIcon` (already used for QUEUED, but it semantically fits "waiting"), `EyeIcon`, or `UserCircleIcon` to better convey human review.

autogpt_platform/backend/backend/blocks/human_in_the_loop.py (4)
17-51: Block docstring and IO schemas are clear; optional status typing refinement. The docstring and Input/Output schemas clearly express the HITL semantics and defaults. As a small type-safety improvement, you could narrow `status` from `str` to an enum/constant set (e.g., shared status constants or a `Literal["approved", "rejected"]`) so downstream logic can rely on the exact allowed values.
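A minimal sketch of that narrowing, written with plain Pydantic for illustration; the real block schemas use the platform's `SchemaField` helpers, so the exact field wiring here is an assumption:

```python
from typing import Any, Literal, Optional

from pydantic import BaseModel, Field

# Narrowed status type: type checkers and downstream branches can now
# rely on exactly these two values instead of an arbitrary str.
ReviewOutcome = Literal["approved", "rejected"]


class HITLOutput(BaseModel):  # hypothetical stand-in for the block's Output schema
    status: ReviewOutcome = Field(description="Outcome of the human review")
    reviewed_data: Optional[Any] = Field(default=None, description="Data after review")
    review_message: Optional[str] = Field(default=None, description="Reviewer's message")
```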
83-108: Approved-path logic matches spec; consider guarding `convert` failures. The APPROVED branch correctly:

- Retrieves the review for this node,
- Extracts the actual data from the stored JSON (with a reasonable fallback),
- Converts it back to the original input type, yields outputs, and then deletes the review.

Depending on how `convert` behaves when the reviewer significantly changes the shape/type of `data`, you may want to defensively handle conversion failures (e.g., wrap in try/except and either surface a clearer error or fall back to the raw `approved_data`) to avoid opaque runtime errors.
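A hedged sketch of that guard; `convert` and the variable names follow the review comment, and the fallback-to-raw-data policy is one option, not the block's current behavior:

```python
import logging

logger = logging.getLogger(__name__)


def safe_convert(convert, approved_data, target_type):
    """Convert reviewer-modified data back to the original input type,
    falling back to the raw approved data if the shape no longer fits."""
    try:
        return convert(approved_data, target_type)
    except Exception as exc:
        # Surface a clear warning instead of an opaque runtime error deeper
        # in the graph, then degrade gracefully to the unconverted payload.
        logger.warning("Review data conversion to %s failed: %s", target_type, exc)
        return approved_data
```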
110-117: Verify whether `reviewed_data` should be emitted on rejection. In the REJECTED path you only yield `status` and `review_message`, while the Output schema also defines `reviewed_data`. If the block framework or downstream consumers assume that all schema fields are present, this could be a subtle source of errors; if they treat missing fields as acceptable, it's fine.

Consider either (see the sketch below):

- Yielding a sentinel value (e.g., `None` or the original `input_data.data`) for `reviewed_data` on rejection, or
- Explicitly documenting that `reviewed_data` is only present when `status == "approved"`.
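If the first option is chosen, the rejected branch might look like this sketch, assuming the block yields `(name, value)` pairs as the other branches do:

```python
from typing import Any, Iterator, Optional


def rejected_outputs(review_message: Optional[str]) -> Iterator[tuple[str, Any]]:
    # Emit every schema field so downstream consumers that expect all
    # output keys still receive a well-defined value.
    yield "status", "rejected"
    yield "review_message", review_message or "Rejected by reviewer"
    yield "reviewed_data", None  # explicit sentinel: no approved data on rejection
```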
119-140: WAITING upsert behavior works but could be more status-aware. Creating/updating a `PendingHumanReview` with `status: "WAITING"` on first run matches the pause-and-review requirement. Two small refinements to consider (sketched below):

- If an existing review already has `status == "WAITING"`, you can early-return instead of upserting again, to avoid extra writes when the block is re-entered without any human action.
- If additional statuses are ever added on the Prisma side (e.g., "IN_REVIEW"), this unconditional upsert would overwrite them. Making the upsert conditional on `not existing_review` or `existing_review.status == "WAITING"`, and using the generated Prisma enum/typed status constants instead of raw strings, would reduce the chance of status drift or accidental overwrites.
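A minimal sketch of the guarded write, assuming prisma-client-py's generated `PendingHumanReview` model and abbreviating the create payload to the fields named in this PR (the real code would use an upsert or transaction to avoid the find/create race):

```python
from prisma.models import PendingHumanReview  # generated Prisma client model


async def create_waiting_review(node_exec_id: str, payload: dict) -> None:
    existing = await PendingHumanReview.prisma().find_unique(
        where={"nodeExecId": node_exec_id}
    )
    if existing is not None:
        # Already WAITING: re-entering the block needs no extra write.
        # Any other status (e.g., a future IN_REVIEW) must not be clobbered.
        return
    await PendingHumanReview.prisma().create(
        data={"nodeExecId": node_exec_id, "status": "WAITING", "data": payload}
    )
```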
autogpt_platform/frontend/src/app/(platform)/build/components/legacy-builder/Flow/Flow.tsx (1)

1012-1015: Redundant undefined coercion. Line 1013 uses `executionId={flowExecutionID || undefined}`, but `flowExecutionID` is already typed as `GraphExecutionID | undefined` (lines 110-112), so the `|| undefined` is redundant.

Apply this diff to simplify:

```diff
 <FloatingReviewsPanel
-  executionId={flowExecutionID || undefined}
+  executionId={flowExecutionID}
   className="fixed bottom-24 right-4"
 />
```
autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/OldAgentLibraryView/components/agent-run-status-chip.tsx (1)

26-26: Note: temporary status mapping causes label inconsistency. WAITING_FOR_REVIEW is mapped to "queued", which this component will display as "Queued", while other components (e.g., NodeExecutionBadge.tsx line 26) display "Waiting for Review". The TODO comment indicates this is temporary, but consider prioritizing a consistent label to avoid user confusion.

Consider adding a dedicated status entry for WAITING_FOR_REVIEW:

```diff
 export const agentRunStatusMap: Record<
   GraphExecutionMeta["status"],
   AgentRunStatus
 > = {
   INCOMPLETE: "draft",
   COMPLETED: "success",
   FAILED: "failed",
   QUEUED: "queued",
   RUNNING: "running",
   TERMINATED: "stopped",
-  WAITING_FOR_REVIEW: "queued", // Map to queued for now
+  WAITING_FOR_REVIEW: "review", // TODO: implement "draft" - https://github.com/Significant-Gravitas/AutoGPT/issues/9168
 };
```

And update the statusData record:

```ts
const statusData: Record<
  AgentRunStatus,
  { label: string; variant: keyof typeof statusStyles }
> = {
  // ... existing entries ...
  review: { label: "Waiting for Review", variant: "info" },
};
```
autogpt_platform/backend/backend/executor/manager.py (1)

577-591: Make the PendingHumanReview lookup more robust and observable. The PendingHumanReview check correctly drives `WAITING_FOR_REVIEW` vs `COMPLETED`, but a couple of improvements would make this safer (see the sketch below):

- The broad `except Exception` currently swallows all errors and silently treats the node as `COMPLETED`, which can mask schema or connectivity issues and effectively disable HITL without any signal. At minimum, log the exception (e.g., via `log_metadata.exception(...)`) before defaulting to `COMPLETED`.
- Importing and querying `PendingHumanReview` directly from Prisma here bypasses the existing database abstraction (`DatabaseManagerAsyncClient`). Consider exposing a small helper on the DB manager (e.g., `has_pending_human_review(node_exec_id)`) and calling that instead, so executor code stays decoupled from Prisma models and field names.

These changes preserve the graceful fallback to `COMPLETED` but improve debuggability and maintainability.
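A hedged sketch of what such a helper could look like, again assuming the prisma-client-py model from this PR; how it would be wired onto `DatabaseManagerAsyncClient` is left out:

```python
import logging

from prisma.models import PendingHumanReview  # generated Prisma client model

logger = logging.getLogger(__name__)


async def has_pending_human_review(node_exec_id: str) -> bool:
    """True if this node execution still has a WAITING review."""
    try:
        count = await PendingHumanReview.prisma().count(
            where={"nodeExecId": node_exec_id, "status": "WAITING"}
        )
        return count > 0
    except Exception:
        # Log loudly instead of silently disabling HITL; the caller can
        # still choose to fall back to COMPLETED on error.
        logger.exception("Pending-review check failed for %s", node_exec_id)
        return False
```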
autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/AgentRunsView/components/SelectedRunView/SelectedRunView.tsx (1)

18-20: Hook integration for execution-specific pending reviews is sound. Using `usePendingReviewsForExecution(runId)` here cleanly co-locates review data with the selected run view. As a small enhancement, consider surfacing `error` from the hook into the UI (e.g., a lightweight inline message in the Reviews tab) so users get feedback if the review API fails, instead of seeing an empty/unchanged state.

Also applies to: 39-43
autogpt_platform/frontend/src/components/organisms/FloatingReviewsPanel/FloatingReviewsPanel.tsx (1)

1-93: Tighten hook usage and close logic in FloatingReviewsPanel. Overall behavior is good, but a couple of small refinements would make this more robust:

- `usePendingReviewsForExecution(executionId || "")` will still instantiate the underlying query even when `executionId` is undefined. It would be cleaner if `usePendingReviewsForExecution` itself treated a falsy `graphExecId` as a no-op (returning an empty list and making no network call), so callers like this component don't need to pass an empty string.
- In `handleReviewComplete`, you close the panel if `pendingReviews.length <= 1` before the refetch result is applied. That's usually fine, but if new reviews arrive concurrently, you could close while there are still pending reviews. If that matters, consider basing the close decision on the refetch result (e.g., via a `then` callback) instead of the pre-refetch length.

These are minor and can be addressed later, but they'll make the panel's behavior more predictable around edge cases.
autogpt_platform/frontend/src/hooks/usePendingReviews.ts (1)
1-32: Hooks are clean; consider making executionId optional-friendlyBoth hooks nicely normalize the API response into
{ pendingReviews, isLoading, error, refetch }, which simplifies consumers. Given that some callers (e.g.,FloatingReviewsPanel) may not always have an execution ID, it could be useful to letusePendingReviewsForExecutionaccept an optionalgraphExecIdand short‑circuit to{ pendingReviews: [], isLoading: false, error: undefined }when it’s absent, instead of requiring callers to pass an empty string.autogpt_platform/frontend/src/components/organisms/PendingReviewCard/PendingReviewCard.tsx (2)
21-27: Clarify and centralize assumptions aboutreview.datashapeYou’re assuming
review.datais either a{ data, message }object or raw JSON, and duplicating shape checks in both the initialreviewDatastate and the “Instructions” section. This works, but it’s a bit fragile andany-heavy.Consider extracting a small helper/type guard (e.g.,
isStructuredReviewPayload(review.data)) that returns{ instructions, editableData }. That would:
- Avoid repeated
"data" in review.data/"message" in review.datachecks.- Make it easier to evolve the payload shape without touching multiple call sites.
Also applies to: 95-105
73-83: Default reject message may overwrite user intent. For rejects you always send a message, defaulting to `"Rejected by user"` when `reviewMessage` is empty. That's fine as a UX choice, but it does mean the backend can't distinguish "no comment" from a generic comment. If you want to preserve that distinction (for analytics or cleaner audit logs), consider sending `undefined` when the textarea is blank and having the backend fill in any default message.
autogpt_platform/backend/schema.prisma (1)

42-63: PendingHumanReview schema and relations look coherent; consider an enum for `status`. The new `PendingHumanReview` model and its relations to `User`, `AgentGraphExecution`, and `AgentNodeExecution` look consistent:

- `@@index([userId, status])` and `@@index([graphExecId, status])` align with the query patterns in the review routes.
- `@@unique([nodeExecId])` enforces the "one pending review per node execution" invariant nicely.
- Adding `WAITING_FOR_REVIEW` to `AgentExecutionStatus` fits the new execution flow.

One small improvement would be to model `PendingHumanReview.status` as an enum (e.g., `PendingHumanReviewStatus`) instead of a bare `String`, to avoid typos and keep it aligned with the literals used in the API layer (as sketched below). This is optional but would tighten type-safety.

Also applies to: 347-356, 359-406, 409-435, 473-495
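On the Python side, a shared enum could mirror that proposed Prisma enum so routes and blocks stop passing raw strings. The member set here is an assumption based on the statuses this PR uses:

```python
from enum import Enum


class ReviewStatus(str, Enum):
    """Mirror of the proposed PendingHumanReviewStatus Prisma enum."""
    WAITING = "WAITING"
    APPROVED = "APPROVED"
    REJECTED = "REJECTED"


# e.g. in a query filter, instead of a raw "WAITING" literal:
# where={"userId": user_id, "status": ReviewStatus.WAITING.value}
```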
autogpt_platform/backend/backend/server/v2/executions/review/routes.py (2)

22-57: Pending reviews listing endpoint matches schema and index usage. `/review/pending` filters on `{ userId, status: "WAITING" }` and orders by `createdAt DESC`, which lines up with the Prisma indices you added and the intended UX. The mapping into `PendingHumanReviewResponse` is direct and clear. If you find yourself adding more endpoints that return this shape, consider a small helper to avoid repeating the same field mapping.

59-95: Execution-scoped pending listing is consistent with the global listing. `/review/execution/{graph_exec_id}` mirrors the global listing logic while scoping by `graphExecId` and sorting ascending by `createdAt`, which makes sense for a per-run review timeline. The response mapping again matches `PendingHumanReviewResponse`. Same comment as above: you could factor the Prisma→response mapping into a shared function to reduce duplication (see the sketch below).
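The suggested shared mapper could be as simple as the sketch below; the response field names are illustrative guesses rather than the PR's exact model:

```python
from prisma.models import PendingHumanReview
from backend.server.v2.executions.review.model import PendingHumanReviewResponse


def to_response(review: PendingHumanReview) -> PendingHumanReviewResponse:
    """One place to map a Prisma row onto the API response shape,
    shared by /review/pending and /review/execution/{graph_exec_id}."""
    return PendingHumanReviewResponse(
        id=review.id,
        node_exec_id=review.nodeExecId,
        graph_exec_id=review.graphExecId,
        data=review.data,
        status=review.status,
        created_at=review.createdAt,
    )
```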
autogpt_platform/frontend/src/app/api/openapi.json (4)

4756-4786: Add pagination and clarify response guarantees for pending reviews.

- Consider `page` and `page_size` query params to avoid unbounded payloads when many reviews are queued.
- Keep the response typed, but document that it returns only `status=WAITING` items, to align with the path name.

Apply parameters:

```diff
 "get": {
   "tags": ["v2", "executions", "execution-review", "private"],
   "summary": "Get Pending Reviews",
   "description": "Get all pending reviews for the current user.",
   "operationId": "getV2Get pending reviews",
+  "parameters": [
+    {
+      "name": "page",
+      "in": "query",
+      "required": false,
+      "schema": { "type": "integer", "minimum": 1, "default": 1, "title": "Page" }
+    },
+    {
+      "name": "page_size",
+      "in": "query",
+      "required": false,
+      "schema": { "type": "integer", "minimum": 1, "maximum": 100, "default": 25, "title": "Page Size" }
+    }
+  ],
```

4788-4835: Constrain the path parameter type for the execution id. Add an explicit format/pattern for `graph_exec_id` to improve validation and client generation.

```diff
 {
   "name": "graph_exec_id",
   "in": "path",
   "required": true,
-  "schema": { "type": "string", "title": "Graph Exec Id" }
+  "schema": { "type": "string", "format": "uuid", "title": "Graph Exec Id" }
 }
```

7997-8020: Tighten ReviewActionRequest: SafeJson + conditional required fields.

- Use `SafeJson` for `reviewed_data` instead of the empty schema `{}`.
- Require `reviewed_data` when `action=approve` (and optionally allow `message` when rejecting).

```diff
 "ReviewActionRequest": {
   "properties": {
     "action": {
       "type": "string",
       "enum": ["approve", "reject"],
       "title": "Action",
       "description": "Action to take"
     },
     "reviewed_data": {
-      "anyOf": [{}, { "type": "null" }],
+      "anyOf": [
+        { "$ref": "#/components/schemas/SafeJson" },
+        { "type": "null" }
+      ],
       "title": "Reviewed Data",
       "description": "Modified data (only for approve action)"
     },
     "message": {
       "anyOf": [{ "type": "string" }, { "type": "null" }],
       "title": "Message",
       "description": "Optional message from the reviewer"
     }
   },
   "type": "object",
   "required": ["action"],
+  "allOf": [
+    {
+      "if": { "properties": { "action": { "const": "approve" } }, "required": ["action"] },
+      "then": { "required": ["action", "reviewed_data"] }
+    }
+  ],
   "title": "ReviewActionRequest",
   "description": "Request model for reviewing data."
 },
```

4756-4890: Normalize operationId casing across all endpoints, not just these three. Verification found 75+ operationIds with spaces throughout the entire spec, not just the three in this section. While the examples cited (`getV2Get pending reviews`, `getV2Get pending reviews for execution`, `postV2Review data`) are confirmed, this is a systematic issue affecting many v1 and v2 endpoints. Recommend a batch normalization pass across the entire openapi.json to convert all operationIds to camelCase or PascalCase without spaces (e.g., `getV1InitiateOAuthFlow`, `postV2ReviewData`), which will improve compatibility with code generators and Orval configurations.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (23)
- autogpt_platform/backend/backend/blocks/human_in_the_loop.py (1 hunk)
- autogpt_platform/backend/backend/data/execution.py (3 hunks)
- autogpt_platform/backend/backend/executor/database.py (4 hunks)
- autogpt_platform/backend/backend/executor/manager.py (3 hunks)
- autogpt_platform/backend/backend/server/rest_api.py (2 hunks)
- autogpt_platform/backend/backend/server/v2/executions/review/model.py (1 hunk)
- autogpt_platform/backend/backend/server/v2/executions/review/routes.py (1 hunk)
- autogpt_platform/backend/schema.prisma (5 hunks)
- autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/Flow/Flow.tsx (2 hunks)
- autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/CustomNode/components/NodeExecutionBadge.tsx (2 hunks)
- autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/CustomNode/helpers.ts (1 hunk)
- autogpt_platform/frontend/src/app/(platform)/build/components/legacy-builder/Flow/Flow.tsx (2 hunks)
- autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/AgentRunsView/components/RunsSidebar/components/RunListItem.tsx (1 hunk)
- autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/AgentRunsView/components/SelectedRunView/SelectedRunView.tsx (4 hunks)
- autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/AgentRunsView/components/SelectedRunView/components/RunStatusBadge.tsx (2 hunks)
- autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/OldAgentLibraryView/components/agent-run-status-chip.tsx (1 hunk)
- autogpt_platform/frontend/src/app/api/openapi.json (4 hunks)
- autogpt_platform/frontend/src/components/organisms/FloatingReviewsPanel/FloatingReviewsPanel.tsx (1 hunk)
- autogpt_platform/frontend/src/components/organisms/PendingReviewCard/PendingReviewCard.tsx (1 hunk)
- autogpt_platform/frontend/src/components/organisms/PendingReviewsList/PendingReviewsList.tsx (1 hunk)
- autogpt_platform/frontend/src/hooks/useAgentGraph.tsx (1 hunk)
- autogpt_platform/frontend/src/hooks/usePendingReviews.ts (1 hunk)
- autogpt_platform/frontend/src/lib/autogpt-server-api/types.ts (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: chromatic
- GitHub Check: test
- GitHub Check: types
- GitHub Check: test (3.11)
- GitHub Check: test (3.13)
- GitHub Check: test (3.12)
- GitHub Check: Check PR Status
🔇 Additional comments (21)
autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/AgentRunsView/components/SelectedRunView/components/RunStatusBadge.tsx (1)
80-82: LGTM! Appropriate text formatting for multi-word status. The conditional handling correctly displays "Waiting for Review" in a human-readable format, while preserving the existing lowercase transformation for other statuses. Combined with the `capitalize` class on line 78, this will render as "Waiting For Review", which is appropriate for a multi-word status label.
autogpt_platform/backend/backend/blocks/human_in_the_loop.py (2)

1-14: Imports and helpers look appropriate. The imported dependencies are minimal and aligned with the block's responsibilities (Prisma model, block base classes, `SchemaField`, `SafeJson`, and `convert`); nothing appears unused or missing for this implementation.

53-70: Constructor wiring and test metadata look consistent. The block id, description, category, schemas, and test_input/test_output all align with the documented behavior of an approval path and should integrate cleanly with the existing block infrastructure.
autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/CustomNode/components/NodeExecutionBadge.tsx (1)
10-10: LGTM! The status styling and text rendering for WAITING_FOR_REVIEW is consistent with the existing status handling pattern.
Also applies to: 26-26
autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/Flow/Flow.tsx (1)
16-21: LGTM! The integration of FloatingReviewsPanel is clean and follows React best practices. Reading flowExecutionID from URL params and passing it to the panel component is the correct approach for this feature.
Also applies to: 77-77
autogpt_platform/frontend/src/app/(platform)/build/components/FlowEditor/nodes/CustomNode/helpers.ts (1)
7-7: LGTM! The status styling for WAITING_FOR_REVIEW follows the existing pattern correctly.
autogpt_platform/frontend/src/hooks/useAgentGraph.tsx (1)
360-366: LGTM! Status ranking logic correctly prioritizes WAITING_FOR_REVIEW. The updated status ranking makes sense for the review workflow:
- WAITING_FOR_REVIEW at rank 1 ensures it's prominently displayed when any node awaits review
- Higher priority than QUEUED (rank 2) is appropriate - review state needs more immediate attention
- All other statuses shift down by 1 to accommodate the new status
The exclusion of WAITING_FOR_REVIEW from the terminal status check (lines 549-559) is also correct, as this status represents a paused state rather than a completed execution.
autogpt_platform/frontend/src/lib/autogpt-server-api/types.ts (1)
280-281: LGTM! The addition of WAITING_FOR_REVIEW to the status union types is straightforward and enables the new review workflow status across the frontend.
Also applies to: 418-419
autogpt_platform/backend/backend/server/rest_api.py (2)
32-32: Review router import wiring looks consistent. Importing `backend.server.v2.executions.review.routes` alongside other v2 routers keeps server wiring centralized and consistent; no issues here.

290-294: Execution review router mount is correct. Mounting the review router under `prefix="/api/executions"` with tags `["v2", "executions"]` matches existing naming and keeps review endpoints grouped logically under executions.
autogpt_platform/backend/backend/executor/manager.py (2)

677-686: WAITING_FOR_REVIEW → RUNNING resume flow looks correct. Handling `ExecutionStatus.WAITING_FOR_REVIEW` by switching the graph back to `RUNNING` and persisting the status via `update_graph_execution_state` is consistent with the new `VALID_STATUS_TRANSITIONS` (WAITING_FOR_REVIEW → RUNNING) and mirrors the existing resume paths for FAILED/TERMINATED.

1033-1041: Final graph status uses node-level WAITING_FOR_REVIEW appropriately. Deriving the final graph execution status by checking `get_node_executions_count(..., statuses=[ExecutionStatus.WAITING_FOR_REVIEW])` ensures that any node waiting for review pauses the entire graph (status `WAITING_FOR_REVIEW`), while still returning `COMPLETED` when no such nodes exist. This is a clear and efficient integration point for the HITL workflow.

autogpt_platform/frontend/src/app/(platform)/library/agents/[id]/components/AgentRunsView/components/SelectedRunView/SelectedRunView.tsx (1)

80-84: Conditional Reviews tab UX is reasonable. Gating both the "Reviews" tab trigger and its content on `pendingReviews.length > 0` avoids showing an empty tab and keeps the experience simple; wiring `onReviewComplete={refetchReviews}` ensures the list stays in sync after actions. No functional issues here.

Also applies to: 109-123
autogpt_platform/frontend/src/components/organisms/PendingReviewsList/PendingReviewsList.tsx (1)
1-50: PendingReviewsList component is well-scoped and typed. The list cleanly handles both the empty state and the populated state, uses `PendingHumanReviewResponse[]` for type safety, and forwards `onReviewComplete` to each `PendingReviewCard` with a stable `key={review.id}`. This is a solid reusable building block for review UIs.
autogpt_platform/backend/backend/data/execution.py (1)

95-122: WAITING_FOR_REVIEW transitions and node count helper align with executor logic. Adding `ExecutionStatus.WAITING_FOR_REVIEW` as an allowed source for `RUNNING`, and as a target from `RUNNING`, in `VALID_STATUS_TRANSITIONS` matches the executor's resume path and ensures `update_graph_execution_stats` enforces the intended RUNNING ↔ WAITING_FOR_REVIEW lifecycle. The new `get_node_executions_count` mirrors `get_node_executions` filter semantics (graph_exec_id/node_id/block_ids/statuses/time range) and is a good fit for cheap existence checks like the final graph status decision in `ExecutionProcessor._on_graph_execution`. The explicit "no user_id check" docstring is also consistent with other internal-only helpers.

Also applies to: 1009-1038
autogpt_platform/backend/backend/executor/database.py (4)
7-26: New `get_node_executions_count` import is consistent with data layer usage. Importing `get_node_executions_count` alongside the other execution helpers keeps this module aligned with `backend.data.execution`; no issues spotted.

125-143: Service exposure of `get_node_executions_count` matches the existing pattern. Wiring `get_node_executions_count = _(get_node_executions_count)` mirrors the other execution APIs and should integrate cleanly with the app service RPC mechanism.

200-210: Sync client exposure looks correct. `DatabaseManagerClient.get_node_executions_count = _(d.get_node_executions_count)` follows the same `endpoint_to_sync` pattern as neighboring methods; no concerns.

241-252: Async client passthrough is consistent. `DatabaseManagerAsyncClient.get_node_executions_count = d.get_node_executions_count` matches how other async methods are surfaced; this should be usable wherever async access is preferred.
autogpt_platform/backend/backend/server/v2/executions/review/model.py (1)

7-41: Review models align with the DB shape and API usage. `PendingHumanReviewResponse` and `ReviewActionRequest` cleanly mirror the Prisma model and the frontend expectations (ids, `data`, status, timestamps, and optional `reviewed_data`/`message`). Literal types for both `status` and `action` should help catch misuse at compile time. Looks good.
autogpt_platform/backend/backend/server/v2/executions/review/routes.py (1)
97-167: Review mutation logic is sound; data patch behavior is well-defined. The approve/reject endpoint correctly:

- Ensures the review exists and belongs to the current user.
- Rejects non-WAITING reviews with a clear 400.
- For approve, conditionally patches only the `data` field of a structured payload (when `review.data` is a dict with a `data` key), falling back to replacing the entire payload otherwise.
- Persists the updated status, data, message, and `reviewedAt` timestamp using `SafeJson`.

This gives a predictable "patch vs. replace" behavior (sketched below) and keeps the DB consistent with what the frontend sends. No blocking issues here.
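That patch-vs-replace rule is compact enough to capture in a few lines; this is a sketch of the semantics described above, independent of the endpoint's actual code:

```python
from typing import Any


def apply_approval(stored: Any, reviewed_data: Any) -> Any:
    """Patch vs. replace: if the stored payload is a structured dict with a
    'data' key, patch only that key; otherwise replace the whole payload."""
    if isinstance(stored, dict) and "data" in stored:
        return {**stored, "data": reviewed_data}
    return reviewed_data
```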
@ntindle thanks for the review. This PR is still a draft; I only vibecoded it and asked for a PR to be created so that CodeRabbit could start reviewing it.
Same page
…security and workflow improvements

## Summary
- Complete implementation of the Human In The Loop (HITL) block for pausing execution pending human review
- Fix all critical security vulnerabilities and architectural issues identified in PR reviews
- Refactor codebase to follow project patterns and improve maintainability

## Backend Changes
- Add `ReviewStatus` enum to Prisma schema with proper indexing for performance
- Implement comprehensive authorization checks with graph ownership verification
- Add atomic database transactions to prevent race conditions
- Fix critical `user_id=""` bug that prevented review resumption
- Add `wasEdited` field to track data modifications during review
- Implement proper input validation with size/depth limits to prevent DoS attacks
- Create service layer separation between business logic and database operations
- Fix Pydantic v2 validator compatibility issues
- Add proper error handling and remove silent failures
- Update execution status transitions to support the WAITING_FOR_REVIEW state

## Frontend Changes
- Fix WAITING_FOR_REVIEW color consistency across UI components (purple theme)
- Add missing WAITING_FOR_REVIEW status handling in ActivityItem.tsx
- Generate updated OpenAPI client with proper type safety
- Remove unsafe `as any` type casting with proper type guards

## API Improvements
- Add structured `ReviewActionResponse` model for better type generation
- Implement comprehensive request validation with security checks
- Add proper OpenAPI schema generation for a better developer experience
- Support both legacy and structured data formats for backward compatibility

## Security Enhancements
- Add authorization checks to verify graph ownership before review access
- Implement size limits (1MB) and nesting depth validation (10 levels)
- Add SQL injection protection and input sanitization
- Use atomic database operations to prevent concurrent modification issues

## Testing
- Add comprehensive unit tests covering security, validation, and business logic
- Test all edge cases including race conditions and data validation
- Verify API endpoints with proper error handling and status codes

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
…ph_version parameter

The HumanInTheLoopBlock test was failing because the test framework wasn't providing the required `graph_version` parameter that the block's run method expects.

Changes:
- Add `graph_version: 1` to the test framework's `extra_exec_kwargs` in backend/util/test.py
- Add test mocks for HumanInTheLoopBlock to avoid database/service dependencies during testing
- Add conditional logic to use mocks in the test environment while preserving production functionality

The block now passes all tests while maintaining full production functionality for the human review workflow.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.
Thanks for submitting this comprehensive PR implementing the Human In The Loop functionality! The feature looks well-designed, with consideration for both backend and frontend integration.

Missing required elements: before this PR can be approved, please add the standard PR checklist from the template. For a substantial code change like this, the checklist needs to be completed to ensure all quality steps have been followed.

Technical review: the implementation looks solid. I particularly like the attention to security in the review validation logic: the checks for JSON serialization, size limits, and nesting depth are excellent practices. Once you add the required PR checklist and mark the appropriate items as completed, this PR should be good to go!
This pull request has conflicts with the base branch, please resolve those so we can evaluate the pull request.
…at/human-in-the-loop-block
Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.
…e UI updates

## Backend Changes
- Fix the HITL block to use the proper `async_update_node_execution_status` wrapper for websocket events
- Update the human review data layer with batch processing for better performance
- Add comprehensive test coverage for human review operations
- Streamline the review processing workflow for execution resumption

## Frontend Changes
- Fix the review status icon to use eyes instead of pause for better UX
- Enable real-time execution status updates in both the new and legacy Flow components
- Pass execution status directly to FloatingReviewsPanel for immediate reactivity
- Fix tab switching and visibility issues in review interfaces
- Improve the review workflow with proper status propagation

## Key Improvements
- Real-time websocket updates ensure the UI reflects REVIEW status immediately
- Better separation between running and idle states (REVIEW is idle, not running)
- Enhanced error handling and validation in review processing
- Consistent execution status handling across different UI components

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Thank you for this comprehensive implementation of the Human In The Loop functionality! This is a valuable addition that will enable more flexible agent workflows with user interaction. The code implementation looks thorough and well-designed, with proper separation of concerns across the backend and frontend components.

Before merging, please complete the test plan checklist in the PR description to confirm that all the listed test scenarios have been verified. Once those boxes are checked to confirm testing has been completed, this PR will be ready to merge.
…tion

- Restore the store/routes.py delete endpoint to return a boolean instead of a StoreSubmission
- Restore the store/db.py delete function signature and error handling
- These changes were accidentally included in HITL feature development; only HITL-related functionality should be in this branch

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Thank you for this comprehensive implementation of the Human In The Loop block! The feature looks well-designed and thoroughly implemented across the backend, frontend, and database layers. However, before we can approve this PR, please add the standard PR checklist from the template to your description and complete it; the checklist is required for all PRs that contain code changes. Once you've added the checklist and checked off the relevant items, this PR should be ready for approval. The actual implementation looks solid and well-tested.
- Add the HITL block ID (8b2a7b3c-6e9d-4a5f-8c1b-2e3f4a5b6c7d) to the BETA_BLOCKS feature flag
- The block will be hidden from users by default unless the beta-blocks flag is configured
- This allows a controlled rollout of the Human-in-the-Loop functionality
- Beta users can access the block while it's being tested and refined

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Thank you for this comprehensive implementation of the Human In The Loop block! The code looks well-structured, with proper handling of user permissions and a good separation of concerns between frontend and backend components. Before this can be merged, please complete the test plan checklist in your PR description by checking off the test items; all checkboxes in the PR description need to be checked for PR approval according to our guidelines. Otherwise, the implementation is solid, with proper user_id validation in backend functions, clear UI components for handling reviews, and appropriate database schema changes.
…-Gravitas/AutoGPT into feat/human-in-the-loop-block
- Backend: fix message handling and unify the API structure
  - Fix the HITL block to properly yield review messages for both approved and rejected reviews
  - Unify the review API structure with a single `reviews` array using an `approved` boolean field
  - Remove separate approved_reviews/rejected_review_ids in favor of the cleaner unified approach
- Frontend: complete UI/UX overhaul for the review interface
  - Replace the plain JSON textarea with type-aware input components matching the run dialog styling
  - Add "Approve All" and "Reject All" buttons with smart disabled states
  - Show the rejection reason input only when excluding items (simplified UX)
  - Fix the Reviews tab auto-population when execution status changes to REVIEW
  - Add proper local state management for real-time input updates
  - Use design system Input components for consistent rounded styling

Key improvements:
- No more JSON syntax errors for string inputs
- Professional appearance matching platform standards
- Intuitive workflow with conditional UI elements
- Type-safe unified API structure
- Real-time input updates with proper state management

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Remove the reviewData prop from PendingReviewCard usage
- Fix the TypeScript error after the prop removal
- The component now extracts data directly from the review payload

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
…nents

Remove unnecessary comments and simplify code across HITL components:
- PendingReviewsList: remove verbose comments, simplify logic flow
- FloatingReviewsPanel: remove excessive commenting
- PendingReviewCard: clean up type guard comments
- usePendingReviews: remove redundant JSDoc comments

This improves code readability while maintaining all functionality.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
This PR implements a well-designed Human In The Loop block with comprehensive integration across the platform. The code quality looks good, and I particularly appreciate the thorough test coverage. However, before we can merge this PR, please update the description to include the completed checklist from our PR template; this is required for all significant code changes. Please ensure all items in the checklist are checked off to confirm you've verified each requirement. Once the checklist is added and completed, this PR should be ready for approval.
@majdyz We don’t need to make any changes in useFlow or useFlowRealtime — everything is already being saved in the node store automatically.
We just need to create a Review Panel component that imports nodeExecutionResult from the node store and checks whether it needs to be shown based on PendingReview boolean. Then, inside the same folder, we can use the autogenerated client to fetch the PendingHumanReview payloads using the node execution ID we get from the node store.
So apart from this component, we don’t need to change anything in the new builder’s code. The most important thing is that the architecture is modular — to add a new functionality, we only need to create a new file, and the existing infrastructure provides everything required. Since it uses autogenerated models, there’s no need to modify anything outside this file. If backend models change, the node store will store the updated model datatypes automatically.
Outside this new file, you only need to add two lines to add the design for the review status in the custom node.
Summary
This PR implements a comprehensive Human In The Loop (HITL) block that allows agents to pause execution and wait for human approval/modification of data before continuing.
Screen.Recording.Nov.20.2025.from.Online.Video.Cutter.mp4
Key Features
Frontend Components
Technical Implementation
Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit