Data analysis agent integration #21706

qchiujunhao · 2026-01-31T02:06:21Z

Integrate the “chat with your data” interactive tool into the agent framework, using Pyodide to run the generated code.

How to test the changes?

(Select all options that apply)

I've included appropriate automated tests.
This is a refactoring of components with existing test coverage.
Instructions for manual testing are as follows:
1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

- Updated agent import structure to allow optional dependencies without breaking imports.
- Introduced a new ChatExecutionService to handle Pyodide execution results, separating concerns from the ChatAPI.
- Enhanced metadata handling for Pyodide execution results, including improved artifact collection and follow-up response generation.
- Added new Pydantic schemas for agent responses and execution results.
- Implemented utility functions for inferring package requirements from Python code and merging execution metadata.
- Added unit tests for new functionalities and ensured existing tests cover the refactored code paths.

client/src/components/ChatGXY/pyodide.worker.ts

+        currentIndexUrl = targetUrl;
+        pyodidePromise = (async () => {
+            if (!(self as any).loadPyodide) {
+                self.importScripts(`${targetUrl}/pyodide.js`);


client/src/components/ChatGXY/pyodide.worker.ts

+            if (!(self as any).loadPyodide) {
+                self.importScripts(`${targetUrl}/pyodide.js`);
+            } else if (targetUrl !== DEFAULT_INDEX_URL) {
+                self.importScripts(`${targetUrl}/pyodide.js`);


guerler · 2026-01-31T08:10:57Z

This should be implemented as a visualization. As written, it duplicates visualization functionality, introduces a parallel Pyodide integration, and couples dataset metadata with agent logic by embedding execution and analysis directly. Galaxy already has a visualization framework that provides these capabilities with clear separation and versioning. Duplicating this in core is unnecessary.

qchiujunhao · 2026-02-02T15:57:24Z

> This should be implemented as a visualization. As written, it duplicates visualization functionality, introduces a parallel Pyodide integration, and couples dataset metadata with agent logic by embedding execution and analysis directly. Galaxy already has a visualization framework that provides these capabilities with clear separation and versioning. Duplicating this in core is unnecessary.

Thank you for your comment!

To clarify what this PR implements:

What This Is

The Data Analysis Agent is a general-purpose analytical assistant that generates and executes arbitrary Python code. It's part of the ChatGXY agent framework alongside ErrorAnalysisAgent, CustomToolAgent, GTNTrainingAgent, etc.

The fundamental difference here is one of intent. A visualization tool answers the question “what does this look like?” The Data Analysis Agent answers “what does this mean, and what should I do about it?”

When a researcher sits down with a dataset, they rarely know exactly what chart they want. They have questions: Is there a relationship between these variables? Are there outliers skewing my results? Which samples should I exclude? What preprocessing do I need before downstream analysis? This agent engages in that iterative, exploratory process — the same back-and-forth a researcher would have with a colleague or a statistician.

Concretely, the core requirements here are multi-turn context, an iterative generate → execute → observe → refine loop, and persisting results back to Galaxy history as first-class datasets/artifacts (and next, invoking Galaxy tools/workflows based on what’s discovered).

The agent handles the full spectrum of data analysis tasks — statistical analysis, data quality assessment, transformation/export, and ML preprocessing — through the same iterative loop and history-backed outputs.

The agent uses DSPy’s ReAct pattern for iterative reasoning: generate code → execute in Pyodide → observe stdout/stderr/artifacts → reason about results → generate follow-up code. This enables error recovery, result interpretation, and multi-step analysis workflows.
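The loop described above can be sketched roughly as follows. This is an illustration only, not the PR's code: `Observation`, `run_locally`, `analysis_loop`, and `toy_generate` are hypothetical names, the local `exec()` call merely stands in for Pyodide execution in the browser, and stopping at the first successful run is a simplification of the reasoning step that DSPy's ReAct would perform.

```python
import contextlib
import io
import traceback
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Observation:
    code: str
    stdout: str
    stderr: str
    success: bool


def run_locally(code: str) -> Observation:
    """Stand-in executor; the PR runs generated code in Pyodide instead."""
    out = io.StringIO()
    try:
        with contextlib.redirect_stdout(out):
            exec(code, {})
        return Observation(code, out.getvalue(), "", True)
    except Exception:
        return Observation(code, out.getvalue(), traceback.format_exc(), False)


def analysis_loop(
    question: str,
    generate: Callable[[str, List[Observation]], str],
    execute: Callable[[str], Observation] = run_locally,
    max_steps: int = 3,
) -> List[Observation]:
    """Generate code, execute it, record the observation, and retry on failure."""
    history: List[Observation] = []
    for _ in range(max_steps):
        code = generate(question, history)  # generate (sees earlier observations)
        observation = execute(code)         # execute
        history.append(observation)         # observe
        if observation.success:             # refine only while runs keep failing
            break
    return history


def toy_generate(question: str, history: List[Observation]) -> str:
    """Fake code generator: the first attempt is buggy, the retry is fixed."""
    if not history:
        return "print(undefined_name)"
    return "print(sum([1, 2, 3]))"
```

Calling `analysis_loop("What is the total?", toy_generate)` yields two observations: a failed run whose `stderr` carries the `NameError` traceback, followed by a successful run. The traceback in the history is what gives a real generator the chance to recover from its own errors.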

Galaxy Integration

Generated artifacts (plots, CSVs, processed data) are uploaded back to Galaxy history as datasets. Conversation context is preserved across exchanges, enabling multi-turn exploration:

“Load my RNA-seq data” → “Filter to significant genes” → “Show me the distribution” → “Export the filtered set”

The agent’s position within the ChatGXY framework is intentional. It shares infrastructure with agents that debug failed jobs, recommend tools, and find training materials. These agents are designed to work together — an analysis might surface a quality issue, which leads to a tool recommendation, which leads to a workflow invocation. That’s not a pipeline we can build with isolated plugins; it requires agents that understand Galaxy’s data model and can hand off context to each other.

Integration with the agent-operations branch will allow the agent to leverage Galaxy’s service layer directly:

“Run FastQC on this dataset” → agent invokes the tool
“This data has adapter contamination — clean it with Trimmomatic” → agent runs the workflow
“Upload these results to a new history” → agent creates history and datasets

This positions the Data Analysis Agent as a conversational interface to Galaxy’s full capabilities, not just an analysis endpoint.

On Pyodide

Pyodide here is a sandboxed runtime substrate for safely executing exploratory Python in-browser; it’s not intended to be a parallel visualization framework.

Both this PR and visualization plugins can use Pyodide. The Pyodide execution is a means to an end, not the end itself. Running Python in the browser gives us a sandboxed environment for exploratory code, but the real value is in the reasoning loop around it — and in the eventual ability to act on what’s discovered by invoking Galaxy’s actual tools and workflows. Shared initialization infrastructure could be explored later.

ChatGXY component refactoring is already underway to better separate concerns.

guerler · 2026-02-03T13:26:16Z

The agent’s reasoning, orchestration, and multi-turn context are well placed in core. My concern is the client-side execution. This PR duplicates a Pyodide execution surface that already exists in the visualization framework (demonstrated in Vintent in a reusable and fully versioned way) and embeds a parallel runtime directly into core. That duplication is not required by the agent model. The agent can own reasoning and coordination while delegating browser-side Pyodide execution to a visualization, and we should improve ChatGXY visualization integration rather than creating a second client execution surface in core.
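One way to picture the separation proposed in this comment is an executor interface that the agent depends on without knowing which surface runs the code. The sketch below is hypothetical throughout: `CodeExecutor`, `Coordinator`, `ExecutionResult`, and `FakeVisualizationExecutor` are illustrative names, not Galaxy or Vintent APIs.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class ExecutionResult:
    stdout: str
    success: bool


class CodeExecutor(Protocol):
    """Anything able to run generated code, e.g. a visualization hosting Pyodide."""

    def execute(self, code: str) -> ExecutionResult: ...


class Coordinator:
    """Owns reasoning and conversation state; never touches the runtime itself."""

    def __init__(self, executor: CodeExecutor) -> None:
        self.executor = executor
        self.turns: List[ExecutionResult] = []

    def run_step(self, code: str) -> ExecutionResult:
        result = self.executor.execute(code)  # execution is delegated
        self.turns.append(result)             # multi-turn context stays with the agent
        return result


class FakeVisualizationExecutor:
    """Test double standing in for a browser-side execution surface."""

    def execute(self, code: str) -> ExecutionResult:
        return ExecutionResult(stdout=f"ran {len(code)} chars", success=True)
```

With this shape, swapping the execution surface means passing a different executor to `Coordinator`, and the reasoning and history logic stays unchanged.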

qchiujunhao and others added 12 commits January 28, 2026 19:26

Data analysis and chat updates (squash)

4b61253

Some minor lint/typing fixes for ChatGXY component (several left)

c14fa61

initial implementation of FormData as the dataset selector for ChatGXY. Also some prettier fixes for ChatGXY.vue

a86b48c

Add DSPy optional dependency hook; chat dataset handling; backend linting

8f0352f

Sort ChatGXY.vue imports

d04bdce

Add missing import for AgentResponse in base.py

be2784c

add ensurePendingPyodideTasks method to ChatGXY

594df1f

Implement caching for DSPy examples to optimize loading in GalaxyDSPyPlanner

77da519

Enhance DataAnalysisAgent to support original and sanitized code handling; update GalaxyDSPyPlanner to provide execution success feedback.

6a221f1

update router prompt for data analysis agent

7fd838c

Enhance ActionCard styles for button visibility and text overflow handling

cd81338

github-actions bot added area/documentation area/UI-UX area/testing area/API area/util area/dependencies area/testing/integration labels Jan 31, 2026

github-actions bot added this to the 26.1 milestone Jan 31, 2026

github-advanced-security bot found potential problems Jan 31, 2026

qchiujunhao marked this pull request as draft January 31, 2026 02:55

ahmedhamidawan added the kind/feature label Jan 31, 2026

ahmedhamidawan self-assigned this Feb 3, 2026
