@qchiujunhao
Contributor

Integrate the “chat with your data” interactive tool into the agent framework, using Pyodide to run the generated code.

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

qchiujunhao and others added 12 commits January 28, 2026 19:26
…ling; update GalaxyDSPyPlanner to provide execution success feedback.
- Updated agent import structure to allow optional dependencies without breaking imports.
- Introduced a new ChatExecutionService to handle Pyodide execution results, separating concerns from the ChatAPI.
- Enhanced metadata handling for Pyodide execution results, including improved artifact collection and follow-up response generation.
- Added new Pydantic schemas for agent responses and execution results.
- Implemented utility functions for inferring package requirements from Python code and merging execution metadata.
- Added unit tests for new functionalities and ensured existing tests cover the refactored code paths.
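
As a rough illustration of what "inferring package requirements from Python code" can involve, here is a minimal TypeScript sketch that scans import statements line by line. It is not the PR's utility: the function name `inferRequirements`, the alias table, and the standard-library subset are all hypothetical, and a line-based regex will miss imports inside strings or multi-line statements.

```typescript
// Hypothetical mapping from import names to installable package names;
// not taken from the PR.
const IMPORT_TO_PACKAGE: Record<string, string> = {
    sklearn: "scikit-learn",
    PIL: "pillow",
};

// Small illustrative subset of the standard library that needs no install.
const STDLIB = new Set<string>(["os", "sys", "io", "json", "math", "re", "csv"]);

/** Return the sorted, de-duplicated packages that the imports appear to need. */
function inferRequirements(code: string): string[] {
    const found = new Set<string>();
    for (const line of code.split("\n")) {
        const roots: string[] = [];
        // "from a.b import c" -> root module "a" (relative imports do not match)
        const fromMatch = /^\s*from\s+([A-Za-z_]\w*)/.exec(line);
        // "import a.b as x, c" -> root modules "a" and "c"
        const importMatch = /^\s*import\s+(.+)$/.exec(line);
        if (fromMatch) {
            roots.push(fromMatch[1]);
        } else if (importMatch) {
            for (const part of importMatch[1].split(",")) {
                const name = part.trim().split(/\s+/)[0]; // drop "as alias"
                roots.push(name.split(".")[0]);
            }
        }
        for (const root of roots) {
            if (root && !STDLIB.has(root)) {
                found.add(IMPORT_TO_PACKAGE[root] ?? root);
            }
        }
    }
    return Array.from(found).sort();
}
```

A parser-based approach (for example Python's own `ast` module on the server side) would be more robust than this regex scan; the sketch only shows the shape of the mapping step.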
currentIndexUrl = targetUrl;
pyodidePromise = (async () => {
    if (!(self as any).loadPyodide) {
        self.importScripts(`${targetUrl}/pyodide.js`);

Check warning (Code scanning / CodeQL): Client-side URL redirect, severity Medium.
Untrusted URL redirection depends on a user-provided value.
if (!(self as any).loadPyodide) {
    self.importScripts(`${targetUrl}/pyodide.js`);
} else if (targetUrl !== DEFAULT_INDEX_URL) {
    self.importScripts(`${targetUrl}/pyodide.js`);

Check warning (Code scanning / CodeQL): Client-side URL redirect, severity Medium.
Untrusted URL redirection depends on a user-provided value.
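
One common way to address this class of warning is to validate the URL against an allowlist of origins before it reaches `importScripts`. The sketch below is an assumption about how that could look, not code from the PR: `resolveIndexUrl`, `ALLOWED_ORIGINS`, and the value given to `DEFAULT_INDEX_URL` are hypothetical, and whether the scanner accepts this pattern has not been checked.

```typescript
// Hypothetical value; the PR defines its own DEFAULT_INDEX_URL elsewhere.
const DEFAULT_INDEX_URL = "https://cdn.example.org/pyodide";

// Only origins listed here may serve pyodide.js (hypothetical allowlist).
const ALLOWED_ORIGINS = new Set<string>([new URL(DEFAULT_INDEX_URL).origin]);

/** Return a safe index URL, falling back to the default for anything untrusted. */
function resolveIndexUrl(candidate: string | undefined): string {
    if (!candidate) {
        return DEFAULT_INDEX_URL;
    }
    let parsed: URL;
    try {
        parsed = new URL(candidate);
    } catch {
        return DEFAULT_INDEX_URL; // not an absolute URL
    }
    if (parsed.protocol !== "https:" || !ALLOWED_ORIGINS.has(parsed.origin)) {
        return DEFAULT_INDEX_URL;
    }
    // Rebuild from parsed parts so query strings and fragments are dropped.
    return `${parsed.origin}${parsed.pathname.replace(/\/+$/, "")}`;
}
```

With something like this in place, the worker would pass `resolveIndexUrl(targetUrl)` to `importScripts` rather than the raw value.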
@qchiujunhao qchiujunhao marked this pull request as draft January 31, 2026 02:55
@guerler
Contributor

guerler commented Jan 31, 2026

This should be implemented as a visualization. As written, it duplicates visualization functionality, introduces a parallel Pyodide integration, and couples dataset metadata with agent logic by embedding execution and analysis directly. Galaxy already has a visualization framework that provides these capabilities with clear separation and versioning. Duplicating this in core is unnecessary.

@qchiujunhao
Contributor Author

Thank you for your comment!

To clarify what this PR implements:

What This Is

The Data Analysis Agent is a general-purpose analytical assistant that generates and executes arbitrary Python code. It's part of the ChatGXY agent framework alongside ErrorAnalysisAgent, CustomToolAgent, GTNTrainingAgent, etc.

The fundamental difference here is one of intent. A visualization tool answers the question “what does this look like?” The Data Analysis Agent answers “what does this mean, and what should I do about it?”

When a researcher sits down with a dataset, they rarely know exactly what chart they want. They have questions: Is there a relationship between these variables? Are there outliers skewing my results? Which samples should I exclude? What preprocessing do I need before downstream analysis? This agent engages in that iterative, exploratory process — the same back-and-forth a researcher would have with a colleague or a statistician.

Concretely, the core requirements are multi-turn context, an iterative generate → execute → observe → refine loop, and persisting results back to Galaxy history as first-class datasets/artifacts. The next step is invoking Galaxy tools and workflows based on what the analysis discovers.

The agent handles the full spectrum of data analysis tasks — statistical analysis, data quality assessment, transformation/export, and ML preprocessing — through the same iterative loop and history-backed outputs.

The agent uses DSPy’s ReAct pattern for iterative reasoning: generate code → execute in Pyodide → observe stdout/stderr/artifacts → reason about results → generate follow-up code. This enables error recovery, result interpretation, and multi-step analysis workflows.
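
The loop described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the PR's implementation: `analysisLoop`, `Generate`, `Execute`, and the result shape are hypothetical names, the generator stands in for the DSPy-backed planner, and the executor stands in for Pyodide.

```typescript
interface ExecutionResult {
    success: boolean;
    stdout: string;
    stderr: string;
}

interface Step {
    code: string;
    result: ExecutionResult;
}

// Placeholders for the planner and the runtime; neither real API is reproduced here.
// Returning null from the generator means "enough has been observed, stop".
type Generate = (question: string, history: Step[]) => Promise<string | null>;
type Execute = (code: string) => Promise<ExecutionResult>;

/** Generate code, execute it, record the observation, and repeat until done. */
async function analysisLoop(
    question: string,
    generate: Generate,
    execute: Execute,
    maxSteps = 5,
): Promise<Step[]> {
    const history: Step[] = [];
    for (let i = 0; i < maxSteps; i++) {
        // The generator sees every earlier step, including failures,
        // which is what makes error recovery possible.
        const code = await generate(question, history);
        if (code === null) {
            break;
        }
        const result = await execute(code);
        history.push({ code, result });
    }
    return history;
}
```

The `maxSteps` cap is a design choice in this sketch to keep a confused generator from looping indefinitely.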

Galaxy Integration

Generated artifacts (plots, CSVs, processed data) are uploaded back to Galaxy history as datasets. Conversation context is preserved across exchanges, enabling multi-turn exploration:

“Load my RNA-seq data” → “Filter to significant genes” → “Show me the distribution” → “Export the filtered set”

The agent’s position within the ChatGXY framework is intentional. It shares infrastructure with agents that debug failed jobs, recommend tools, and find training materials. These agents are designed to work together — an analysis might surface a quality issue, which leads to a tool recommendation, which leads to a workflow invocation. That’s not a pipeline we can build with isolated plugins; it requires agents that understand Galaxy’s data model and can hand off context to each other.

Integration with the agent-operations branch will allow the agent to leverage Galaxy’s service layer directly:

  • “Run FastQC on this dataset” → agent invokes the tool
  • “This data has adapter contamination — clean it with Trimmomatic” → agent runs the workflow
  • “Upload these results to a new history” → agent creates history and datasets

This positions the Data Analysis Agent as a conversational interface to Galaxy’s full capabilities, not just an analysis endpoint.

On Pyodide

Pyodide here is a sandboxed runtime substrate for safely executing exploratory Python in-browser; it’s not intended to be a parallel visualization framework.

Both this PR and visualization plugins can use Pyodide. The Pyodide execution is a means to an end, not the end itself. Running Python in the browser gives us a sandboxed environment for exploratory code, but the real value is in the reasoning loop around it — and in the eventual ability to act on what’s discovered by invoking Galaxy’s actual tools and workflows. Shared initialization infrastructure could be explored later.
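
To make "shared initialization infrastructure" concrete, here is one possible shape: a memoized loader that both the agent and a visualization plugin could call. This is a hypothetical sketch, not something either codebase provides today. `createSharedRuntime` and `ensureRuntime` are made-up names, and the injected `loader` is assumed to wrap Pyodide's `loadPyodide({ indexURL })`.

```typescript
// The loader is injected so the sketch does not depend on Pyodide itself.
type Loader<T> = (indexUrl: string) => Promise<T>;

/** Build an ensureRuntime function that loads once per index URL. */
function createSharedRuntime<T>(loader: Loader<T>): (indexUrl: string) => Promise<T> {
    let currentUrl: string | null = null;
    let pending: Promise<T> | null = null;

    return function ensureRuntime(indexUrl: string): Promise<T> {
        if (pending === null || currentUrl !== indexUrl) {
            const attempt = loader(indexUrl);
            currentUrl = indexUrl;
            pending = attempt;
            // Forget a failed load so the next caller can retry.
            attempt.catch(() => {
                if (pending === attempt) {
                    pending = null;
                    currentUrl = null;
                }
            });
        }
        return pending;
    };
}
```

Callers that request the same index URL share one in-flight promise, so the runtime is fetched and initialized only once per URL.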

ChatGXY component refactoring is already underway to better separate concerns.

@guerler
Contributor

guerler commented Feb 3, 2026

The agent’s reasoning, orchestration, and multi-turn context are well placed in core. My concern is the client-side execution. This PR embeds a parallel Pyodide runtime directly into core, duplicating an execution surface that already exists in the visualization framework and that Vintent demonstrates in a reusable, fully versioned way. The agent model does not require that duplication. The agent can own reasoning and coordination while delegating browser-side Pyodide execution to a visualization, and we should improve ChatGXY visualization integration rather than create a second client execution surface in core.

@ahmedhamidawan ahmedhamidawan self-assigned this Feb 3, 2026