Data analysis agent integration #21706
base: dev
Also some prettier fixes for ChatGXY.vue
…ling; update GalaxyDSPyPlanner to provide execution success feedback.
- Updated agent import structure to allow optional dependencies without breaking imports.
- Introduced a new ChatExecutionService to handle Pyodide execution results, separating concerns from the ChatAPI.
- Enhanced metadata handling for Pyodide execution results, including improved artifact collection and follow-up response generation.
- Added new Pydantic schemas for agent responses and execution results.
- Implemented utility functions for inferring package requirements from Python code and merging execution metadata.
- Added unit tests for new functionalities and ensured existing tests cover the refactored code paths.
    currentIndexUrl = targetUrl;
    pyodidePromise = (async () => {
        if (!(self as any).loadPyodide) {
            self.importScripts(`${targetUrl}/pyodide.js`);
Check warning: Code scanning / CodeQL, Client-side URL redirect (Medium), user-provided value
        if (!(self as any).loadPyodide) {
            self.importScripts(`${targetUrl}/pyodide.js`);
        } else if (targetUrl !== DEFAULT_INDEX_URL) {
            self.importScripts(`${targetUrl}/pyodide.js`);
Check warning: Code scanning / CodeQL, Client-side URL redirect (Medium), user-provided value
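One common way to address this kind of alert is to validate the index URL against an allow-list before it reaches `importScripts`. The sketch below is only an illustration of that idea, not the fix adopted in this PR: `resolveIndexUrl`, `TRUSTED_INDEX_ORIGINS`, and the `cdn.example.org` value used for `DEFAULT_INDEX_URL` are hypothetical names and placeholders, since the PR excerpt does not show the real default.

```typescript
// Placeholder value: the real DEFAULT_INDEX_URL is not shown in the PR excerpt.
const DEFAULT_INDEX_URL = "https://cdn.example.org/pyodide";

// Hypothetical allow-list of origins that may serve pyodide.js.
const TRUSTED_INDEX_ORIGINS: readonly string[] = ["https://cdn.example.org"];

/**
 * Return a normalized index URL when `candidate` is an absolute https URL
 * whose origin is on the allow-list; otherwise return `fallback`.
 */
function resolveIndexUrl(
    candidate: string,
    fallback: string = DEFAULT_INDEX_URL,
    trustedOrigins: readonly string[] = TRUSTED_INDEX_ORIGINS,
): string {
    let parsed: URL;
    try {
        parsed = new URL(candidate);
    } catch {
        return fallback; // not an absolute URL
    }
    if (parsed.protocol !== "https:" || !trustedOrigins.includes(parsed.origin)) {
        return fallback; // wrong scheme or untrusted origin
    }
    // Drop query, fragment, and trailing slashes so `${url}/pyodide.js` is well formed.
    return `${parsed.origin}${parsed.pathname.replace(/\/+$/, "")}`;
}
```

In the worker, the validated value would then be used in place of the raw one, for example `self.importScripts(`${resolveIndexUrl(targetUrl)}/pyodide.js`)`. Whether this satisfies the CodeQL query would still need to be checked against the actual scan.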
This should be implemented as a visualization. As written, it duplicates visualization functionality, introduces a parallel Pyodide integration, and couples dataset metadata with agent logic by embedding execution and analysis directly. Galaxy already has a visualization framework that provides these capabilities with clear separation and versioning. Duplicating this in core is unnecessary.
Thank you for your comment! To clarify what this PR implements:

What This Is

The Data Analysis Agent is a general-purpose analytical assistant that generates and executes arbitrary Python code. It's part of the ChatGXY agent framework alongside ErrorAnalysisAgent, CustomToolAgent, GTNTrainingAgent, etc.

The fundamental difference here is one of intent. A visualization tool answers the question “what does this look like?” The Data Analysis Agent answers “what does this mean, and what should I do about it?”

When a researcher sits down with a dataset, they rarely know exactly what chart they want. They have questions: Is there a relationship between these variables? Are there outliers skewing my results? Which samples should I exclude? What preprocessing do I need before downstream analysis? This agent engages in that iterative, exploratory process, the same back-and-forth a researcher would have with a colleague or a statistician.

Concretely, the core requirements here are multi-turn context, an iterative generate → execute → observe → refine loop, and persisting results back to Galaxy history as first-class datasets/artifacts (and next, invoking Galaxy tools/workflows based on what’s discovered). The agent handles the full spectrum of data analysis tasks (statistical analysis, data quality assessment, transformation/export, and ML preprocessing) through the same iterative loop and history-backed outputs.

The agent uses DSPy’s ReAct pattern for iterative reasoning: generate code → execute in Pyodide → observe stdout/stderr/artifacts → reason about results → generate follow-up code. This enables error recovery, result interpretation, and multi-step analysis workflows.

Galaxy Integration

Generated artifacts (plots, CSVs, processed data) are uploaded back to Galaxy history as datasets.
Conversation context is preserved across exchanges, enabling multi-turn exploration: “Load my RNA-seq data” → “Filter to significant genes” → “Show me the distribution” → “Export the filtered set”

The agent’s position within the ChatGXY framework is intentional. It shares infrastructure with agents that debug failed jobs, recommend tools, and find training materials. These agents are designed to work together: an analysis might surface a quality issue, which leads to a tool recommendation, which leads to a workflow invocation. That’s not a pipeline we can build with isolated plugins; it requires agents that understand Galaxy’s data model and can hand off context to each other.

Integration with the agent-operations branch will allow the agent to leverage Galaxy’s service layer directly:
This positions the Data Analysis Agent as a conversational interface to Galaxy’s full capabilities, not just an analysis endpoint.

On Pyodide

Pyodide here is a sandboxed runtime substrate for safely executing exploratory Python in-browser; it’s not intended to be a parallel visualization framework. Both this PR and visualization plugins can use Pyodide.

The Pyodide execution is a means to an end, not the end itself. Running Python in the browser gives us a sandboxed environment for exploratory code, but the real value is in the reasoning loop around it, and in the eventual ability to act on what’s discovered by invoking Galaxy’s actual tools and workflows.

Shared initialization infrastructure could be explored later. ChatGXY component refactoring is already underway to better separate concerns.
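The generate → execute → observe → refine loop described in the reply above can be illustrated with a small sketch. Everything here is hypothetical: `runAnalysisLoop`, `Planner`, `Executor`, and the toy stand-ins are names invented for illustration, not the PR's DSPy planner or its Pyodide worker. Execution is also modelled as a plain synchronous callback to keep the example short, whereas in the PR it crosses from the server to the browser.

```typescript
interface ExecutionResult {
    success: boolean;
    stdout: string;
    stderr: string;
}

interface Step {
    code: string;
    result: ExecutionResult;
}

// Returns the next code to run, or null once the question is considered answered.
type Planner = (question: string, history: readonly Step[]) => string | null;
type Executor = (code: string) => ExecutionResult;

function runAnalysisLoop(question: string, plan: Planner, execute: Executor, maxSteps = 5): Step[] {
    const history: Step[] = [];
    for (let i = 0; i < maxSteps; i++) {
        const code = plan(question, history); // generate, or refine using earlier results
        if (code === null) {
            break; // planner has nothing further to run
        }
        const result = execute(code); // execute
        history.push({ code, result }); // observe: later plan() calls see stdout/stderr
    }
    return history;
}

// Toy stand-ins: the planner retries with corrected code after a failure.
const toyPlanner: Planner = (_question, history) => {
    const last = history[history.length - 1];
    if (last === undefined) {
        return "print(undefined_name)";
    }
    return last.result.success ? null : "print(42)";
};

const toyExecutor: Executor = (code) =>
    code === "print(42)"
        ? { success: true, stdout: "42\n", stderr: "" }
        : { success: false, stdout: "", stderr: "NameError: name 'undefined_name' is not defined" };

const steps = runAnalysisLoop("What is the answer?", toyPlanner, toyExecutor);
```

With these stand-ins the first step fails, the planner sees the captured stderr in `history` and emits corrected code, and the loop stops after the second, successful step. The `maxSteps` bound is a design choice of this sketch that keeps a planner that never returns `null` from looping indefinitely.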
The agent’s reasoning, orchestration, and multi-turn context are well placed in core. My concern is the client-side execution. This PR introduces a duplicated Pyodide execution surface that already exists in the visualization framework, demonstrated in Vintent in a reusable and fully versioned way, and embeds a parallel runtime directly into core. That duplication is not required by the agent model. The agent can own reasoning and coordination while delegating browser-side Pyodide execution to a visualization, and we should improve ChatGXY visualization integration rather than creating a second client execution surface in core.
Integrate the “chat with your data” interactive tool into the agent framework, using Pyodide to run the generated code.