feat(mcp): MCP Service - Phase 1 #33976
base: master
Conversation
Based on your review schedule, I'll hold off on reviewing this PR until it's marked as ready for review. If you'd like me to take a look now, comment
"timestamp": datetime.now(timezone.utc) | ||
} | ||
serialized_response = serialize_mcp_response(response_data, MCPHealthResponseSchema) | ||
return jsonify(serialized_response), 503 |
Check warning (Code scanning / CodeQL): Information exposure through an exception
"timestamp": datetime.now(timezone.utc) | ||
} | ||
serialized_error = serialize_mcp_response(error_data, MCPErrorResponseSchema) | ||
return jsonify(serialized_error), 500 |
Check warning (Code scanning / CodeQL): Information exposure through an exception
"timestamp": datetime.now(timezone.utc) | ||
} | ||
serialized_error = serialize_mcp_response(error_data, MCPErrorResponseSchema) | ||
return jsonify(serialized_error), 500 |
Check warning (Code scanning / CodeQL): Information exposure through an exception
The Superset MCP (Model Context Protocol) service provides programmatic access to Superset dashboards through both REST API and FastMCP interfaces.

## Architecture Overview
I'd recommend `mermaid` here, it's becoming well supported in markdown and we have some in our docs. AI should be able to generate a diagram that's easier to edit/maintain.
asked gpt to translate to mermaid, not sure if it's 100% accurate:
flowchart TB
subgraph MCP_Service["MCP Service"]
direction TB
subgraph Flask_Stack[" "]
FS["Flask Server (Port 5008)"]
FRest["REST API Endpoints\n• /health\n• /list_dashboards\n• /dashboard/<id>"]
FAPI["API Layer (api/v1/)\n• Authentication\n• Request/Response\n• Error handling"]
FS --> FRest --> FAPI
end
subgraph FastMCP_Stack[" "]
FM["FastMCP Server (Port 5009)"]
FTools["FastMCP Tools\n• list_dashboards\n• get_dashboard\n• health_check"]
FClient["HTTP Client (requests)\n• Internal API calls\n• JSON parsing"]
FM --> FTools --> FClient
end
subgraph Proxy_Stack[" "]
PR["Proxy Scripts"]
PRRest["run_proxy.sh\n• Local proxy for free users"]
PRCore["simple_proxy.py\n• Background proxy process"]
PR --> PRRest --> PRCore
end
FAPI --> SupersetCore
FClient --> SupersetCore
PRCore --> SupersetCore
end
subgraph SupersetCore["Superset Core"]
DB["Database (SQLAlchemy)"]
Models["Models\n(Dashboard, Chart, etc.)"]
DAOs["DAOs"]
DB --> Models --> DAOs
end
done
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #33976 +/- ##
===========================================
+ Coverage 60.48% 71.23% +10.74%
===========================================
Files 1931 567 -1364
Lines 76236 41371 -34865
Branches 8568 4342 -4226
===========================================
- Hits 46114 29471 -16643
+ Misses 28017 10799 -17218
+ Partials 2105 1101 -1004
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
@@ -131,6 +131,7 @@ solr = ["sqlalchemy-solr >= 0.2.0"]
elasticsearch = ["elasticsearch-dbapi>=0.2.9, <0.3.0"]
exasol = ["sqlalchemy-exasol >= 2.4.0, <3.0"]
excel = ["xlrd>=1.2.0, <1.3"]
fastmcp = ["fastmcp>=2.8.1"]
Noting that we can probably add `fastmcp` to `requirements/development.in` -> https://github.com/apache/superset/blob/master/requirements/development.in#L19
Once you add it there, you have to run this script to pin it in `requirements/development.txt` -> https://github.com/apache/superset/blob/master/scripts/uv-pip-compile.sh
done
Is there any feature that we are planning to use from fastmcp that does not come with https://github.com/modelcontextprotocol/python-sdk out of the box? I think it's good to have it, just double checking if we are planning to use any of the additional features such as auth, clients, server proxying and composition, generating servers from REST APIs, dynamic tool rewriting, built-in testing tools, etc.?
We do have some requirements around logging, testing, and authentication, as described in SIP-171. Features like pluggable auth hooks, audit logging, and robust testing are important for our use case. That said, I'm open to other approaches; if the python-sdk evolves to support these needs, we could certainly consider it. For now, FastMCP (or a similarly feature-rich server) seems like the best fit, but I'm happy to revisit as things progress. Appreciate your thoughtfulness on this!
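For illustration, here is a minimal sketch of the kind of pluggable audit-logging hook mentioned above, written as a plain decorator that could sit alongside `@mcp.tool()` registration. The decorator name, logger name, and placement are hypothetical and not part of this PR:

```python
import functools
import logging
from typing import Any, Callable

audit_logger = logging.getLogger("superset.mcp_audit")  # hypothetical logger name


def mcp_audit_hook(tool_func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical audit hook: log every tool invocation and its outcome."""

    @functools.wraps(tool_func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        audit_logger.info("tool=%s kwargs=%s", tool_func.__name__, kwargs)
        try:
            return tool_func(*args, **kwargs)
        except Exception:
            audit_logger.exception("tool=%s failed", tool_func.__name__)
            raise

    return wrapper
```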
@mcp_api.route("/list_dashboards", methods=["GET", "POST"])
@requires_api_key
def list_dashboards():
here I'm wondering how we allow a handful of useful filter params so the LLM can apply filters in larger environments and/or page through things without blowing up the context window.
oh nevermind, I see lower in the code you parse the request object to get filter params. Now wondering how the LLM discovers the expected schema for the tool, guessing I'll bump into it as I read through this PR ...
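For context on schema discovery: FastMCP derives each tool's input JSON schema from the Python type hints / pydantic models on the tool signature, and that schema is what the MCP client exposes to the LLM. A minimal sketch, assuming a request model along the lines of the `ListDashboardsRequest` schema this PR later introduces; the field names here just mirror the example payloads further down the thread and may not match the final schema:

```python
from typing import Any, Dict, List, Optional

from pydantic import BaseModel, Field


class ListDashboardsRequest(BaseModel):
    """Illustrative request model; the generated JSON schema is what the LLM sees."""

    filters: Optional[List[Dict[str, Any]]] = Field(
        None, description="Filter objects with col, opr, value"
    )
    select_columns: Optional[List[str]] = Field(None, description="Columns to return")
    order_column: Optional[str] = Field(None, description="Column to sort by")
    order_direction: str = Field("asc", description="asc or desc")
    page: int = Field(1, description="Page number")
    page_size: int = Field(100, description="Items per page")
```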
So right now we have this double-API setup: REST and FastMCP both having similar functionality and running as two separate processes, even if we call one internally. I am thinking that if all the functionality/magic FAB is taking care of can be refactored into the DAOs, we can essentially get rid of these endpoints and just have that become the "core" module.
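A rough sketch of what that "core via DAOs" direction could look like, assuming the generic `list`/`count` methods this PR later adds to `BaseDAO`; the exact signatures and the helper name are guesses:

```python
from typing import Any, Dict, List, Optional, Tuple

from superset.daos.dashboard import DashboardDAO


def list_dashboards_core(  # hypothetical helper both REST and FastMCP could share
    filters: Optional[List[Dict[str, Any]]] = None,
    order_column: str = "dashboard_title",
    order_direction: str = "asc",
    page: int = 1,
    page_size: int = 100,
) -> Tuple[List[Any], int]:
    # Delegate filtering/pagination to the DAO instead of FAB API endpoints
    items = DashboardDAO.list(
        filters=filters,
        order_column=order_column,
        order_direction=order_direction,
        page=page,
        page_size=page_size,
    )
    total = DashboardDAO.count(filters=filters)
    return items, total
```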
@mcp.tool()
def list_dashboards(
    filters: Optional[List[Dict[str, Any]]] = None,
oh here it is :)
69ecee7 to 2705bfc (Compare)
"timestamp": datetime.now(timezone.utc) | ||
} | ||
serialized_error = serialize_mcp_response(error_data, MCPErrorResponseSchema) | ||
return jsonify(serialized_error), 500 |
Check warning (Code scanning / CodeQL): Information exposure through an exception
dff2f3a to ffcb221 (Compare)
## How to Add a New Tool

1. **Choose the Right Domain**
   - Place your tool in the appropriate subfolder under `tools/` (e.g., `tools/chart/`).
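To make the "Choose the Right Domain" step concrete, a hedged sketch of a new tool module dropped into a domain subfolder; the tool itself, the import path of the `mcp` instance, and the request model are illustrative rather than part of this PR:

```python
# e.g. superset/mcp_service/tools/chart/get_chart_owners.py (hypothetical module)
from pydantic import BaseModel, Field

from superset.daos.chart import ChartDAO
from superset.mcp_service.app import mcp  # assumed location of the FastMCP instance


class GetChartOwnersRequest(BaseModel):
    chart_id: int = Field(..., description="Chart ID")


@mcp.tool()
def get_chart_owners(request: GetChartOwnersRequest) -> dict:
    """Illustrative tool: return the owners of a chart."""
    chart = ChartDAO.find_by_id(request.chart_id)
    if chart is None:
        return {"error": f"Chart {request.chart_id} not found"}
    return {"owners": [owner.get_full_name() for owner in chart.owners]}
```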
What are the considerations about splitting into tools / resources / prompts? I guess if the need should arise to have that separation of concerns, then a structure like the following might help?
- `chart/tools`
- `chart/resources`
- `chart/prompts`
I have made the change in my working PR and will push changes up soon
done
### get_dashboard_info

**Inputs:**
- `dashboard_id`: `int` — Dashboard ID
LLMs are known to have some issues with handling IDs. Would using the slug instead of an ID be more LLM-friendly?
I think for dashboards it makes sense; would we need both id and slug? Although for dataset and chart there is no slug, right?
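A small sketch of how a single identifier argument could be disambiguated, in the spirit of the multi-identifier (ID/UUID/slug) support added later in this PR; the helper name and rules are illustrative:

```python
import uuid
from typing import Union


def classify_identifier(identifier: Union[int, str]) -> str:
    """Classify a dashboard identifier as 'id', 'uuid', or 'slug' (illustrative)."""
    if isinstance(identifier, int) or str(identifier).isdigit():
        return "id"
    try:
        uuid.UUID(str(identifier))
        return "uuid"
    except ValueError:
        return "slug"
```

Datasets and charts, which have no slug, would accept only the first two forms.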
**Inputs:**
- `filters`: `Optional[List[DashboardFilter]]` — List of filter objects
- `columns`: `Optional[List[str]]` — Columns to include in the response
Is the LLM good at choosing what columns will be required? Would returning a fixed set of columns that covers most relevant use cases be a better option here?
The current approach we are going with is that the default response is limited to a set of columns we define as defaults in the tool. Perhaps a prompt tool can help figure out how to augment these columns if needed, based on the user's need?
One thing I ran into is that it's not possible to get a pydantic schema used by an MCP server to only include the attributes that are non-null - at least I am still working on figuring that out. pydantic/pydantic#5461
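For the serialization side (as opposed to the generated JSON schema, which is what the linked pydantic issue is about), dropping null attributes from the response payload is straightforward in pydantic v2 via `exclude_none`. A minimal sketch with an illustrative field set:

```python
from typing import Optional

from pydantic import BaseModel


class DashboardInfo(BaseModel):
    id: int
    dashboard_title: str
    slug: Optional[str] = None
    uuid: Optional[str] = None


info = DashboardInfo(id=1, dashboard_title="Sales")
# exclude_none drops attributes that are None from the serialized payload,
# but the model's JSON schema still lists the optional fields.
payload = info.model_dump(exclude_none=True)
# {'id': 1, 'dashboard_title': 'Sales'}
```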
    datasource_type: Literal["table"] = Field("table", description="Datasource type (usually 'table')")
    metrics: List[str] = Field(..., description="List of metric names to display")
    dimensions: List[str] = Field(..., description="List of dimension (column) names to group by")
    filters: Optional[List[Dict[str, Any]]] = Field(None, description="List of filter objects (column, operator, value)")
These can get pretty complex. Do we have a sense of how well the LLM handles these without additional prompting techniques?
It seems that, with the pydantic schemas kept as static as possible (using concrete types instead of dicts and strings where possible), it does figure things out well, but I did notice a couple of instances where it added extra attributes that didn't exist, received an error, and then corrected it.
I gave it "list all dashboards with z at the end of the name" and it ran this:
{
  "filters": [
    {
      "col": "dashboard_title",
      "opr": "ilike",
      "value": "%z"
    }
  ],
  "select_columns": [
    "id",
    "dashboard_title"
  ],
  "order_column": "dashboard_title",
  "order_direction": "asc",
  "page": 1,
  "page_size": 100
}
"list all datasets related to population"
{
  "filters": [
    {
      "col": "table_name",
      "opr": "ilike",
      "value": "%population%"
    }
  ],
  "select_columns": [
    "id",
    "table_name"
  ],
  "order_column": "table_name",
  "order_direction": "asc",
  "page": 1,
  "page_size": 100
}


42f23de to ddf7f49 (Compare)
mcp = FastMCP(
    "Superset MCP Server",
    instructions="""
How can we hook auth, for example `BearerAuthProvider`? Is it possible to use `add_middleware` later on `init_fastmcp_server`?
Great question about auth integration!
Yes, our MCP service architecture is designed to support BearerAuthProvider integration. Looking at the FastMCP Bearer auth docs, we can integrate it in two ways:

Option 1: Server initialization (cleanest approach)

    auth = BearerAuthProvider(
        jwks_uri=os.getenv("MCP_JWKS_URI"),
        issuer=os.getenv("MCP_JWT_ISSUER"),
        audience="superset-mcp-server"
    )
    mcp = FastMCP("Superset MCP Server", auth=auth)

Option 2: Environment-based configuration
Since our server is already modular with middleware support, we can add auth as an optional feature controlled by environment variables, making it easy to enable/disable per deployment.

The BearerAuthProvider supports JWT validation via JWKS endpoints, which aligns well with enterprise SSO systems. We'd get access to user context via get_access_token() in our tools for fine-grained permissions.
I like option 1, can you make it configurable also? Similar to: https://github.com/apache/superset/blob/master/superset/config.py#L1526.
Actually, we can't make it exactly like that since `__init__` can take different parameters, WDYT about using a configurable factory function?

    def create_auth(app):
        jwks_uri = app.config["MCP_JWKS_URI"]
        issuer = app.config["MCP_JWT_ISSUER"]
        audience = app.config["MCP_JWT_AUDIENCE"]
        return BearerAuthProvider(
            jwks_uri=jwks_uri,
            issuer=issuer,
            audience=audience
        )

config.py:

    MCP_AUTH_FACTORY: Callable[[Flask], Any] = create_auth

Then:

    mcp = FastMCP("Superset MCP Server", auth=app.config["MCP_AUTH_FACTORY"](app))
@dpgaspar Thanks for the excellent suggestion! I've implemented the configurable factory pattern as you outlined.
The MCP service now uses `app.config["MCP_AUTH_FACTORY"](app)` to create the auth provider, following the same pattern as MACHINE_AUTH_PROVIDER_CLASS.
Implementation details:
- Default factory in superset/mcp_service/config.py:

      def create_default_mcp_auth_factory(app: Flask) -> Optional[Any]:
          """Default MCP auth factory that uses app.config values."""
          if not app.config.get("MCP_AUTH_ENABLED", False):
              return None
          jwks_uri = app.config.get("MCP_JWKS_URI")
          # ... create and return BearerAuthProvider

- Usage in the MCP service initialization:

      auth_factory = app.config.get("MCP_AUTH_FACTORY")
      if auth_factory and callable(auth_factory):
          return auth_factory(app)

- User configuration in superset_config.py:

      # Simple approach - just set values
      MCP_AUTH_ENABLED = True
      MCP_JWKS_URI = "https://your-provider.com/.well-known/jwks.json"

      # Or provide custom factory
      def create_auth(app):
          jwks_uri = app.config["MCP_JWKS_URI"]
          return BearerAuthProvider(jwks_uri=jwks_uri, ...)
      MCP_AUTH_FACTORY = create_auth
superset/mcp_service/auth.py (Outdated)
def wrapper(*args: Any, **kwargs: Any) -> Any:
    # --- Setup user context (was _setup_user_context) ---
    admin_username = current_app.config.get("MCP_ADMIN_USERNAME", "admin")
    admin_user = security_manager.get_user_by_username(admin_username)
To my understanding, by default full admin is given as a sudoer. Also, no auth process is in place, for example JWT verification or using JWT claims to get user info.
Can you give an example of how this can be implemented?
Excellent observation! You're absolutely right - the current implementation hardcodes admin access without proper JWT verification. Here's how we can implement proper JWT-based authentication with user claims:

Option 1: FastMCP Bearer Token Integration

    def get_user_from_request() -> Any:
        """Extract user from JWT token claims via FastMCP's BearerAuthProvider."""
        from fastmcp.auth import get_access_token
        from superset.extensions import security_manager

        try:
            # Get validated JWT token from FastMCP auth
            access_token = get_access_token()
            # Extract user identifier from JWT claims
            username = access_token.subject  # or access_token.client_id
            user_email = access_token.claims.get("email")
            # Look up actual Superset user
            user = security_manager.get_user_by_username(username)
            if not user and user_email:
                user = security_manager.get_user_by_email(user_email)
            return user or AnonymousUserMixin()
        except Exception:
            # Fallback to anonymous if no valid token
            return AnonymousUserMixin()

Option 2: Enhanced RBAC with Scope-Based Permissions

    def has_permission(user: Any, tool_func: Any) -> bool:
        """Check permissions using JWT scopes + Superset RBAC."""
        from fastmcp.auth import get_access_token

        try:
            access_token = get_access_token()
            user_scopes = access_token.scopes
            # Map tool functions to required scopes
            required_scopes = {
                'list_dashboards': ['dashboard:read'],
                'create_chart': ['chart:write'],
                'get_dataset_info': ['dataset:read'],
            }
            tool_name = tool_func.__name__
            if required := required_scopes.get(tool_name):
                if not any(scope in user_scopes for scope in required):
                    return False
            # Also check Superset's native RBAC
            return user and hasattr(user, 'is_active') and user.is_active
        except Exception:
            # No token = anonymous user, check Superset perms only
            return user and hasattr(user, 'is_active') and user.is_active

Updated Wrapper Implementation

    @functools.wraps(tool_func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # Get authenticated user from JWT (replaces hardcoded admin)
        user = get_user_from_request()
        # Set Flask context with actual authenticated user
        g.user = user
        # Apply impersonation if requested and allowed
        if run_as := kwargs.get("run_as"):
            user = impersonate_user(user, run_as)
        # Check both JWT scopes and Superset RBAC
        if not has_permission(user, tool_func):
            raise PermissionError(
                f"User {getattr(user, 'username', 'anonymous')} lacks permission for "
                f"{tool_func.__name__}"
            )
        # Enhanced audit logging with JWT context
        log_access(user, tool_func.__name__, args, kwargs)
        return tool_func(*args, **kwargs)

This approach removes the hardcoded admin escalation and uses actual JWT-validated user identity with proper scope-based authorization!
Both are valid approaches.
Superset auth is integrated with multiple providers and scenarios, so it's important to guarantee that this is configurable and flexible.
@dpgaspar Thanks for the feedback on authentication flexibility! I've already implemented the JWT-based authentication as discussed, and now made it fully configurable as you suggested.

What's implemented:
- JWT User Authentication - No more hardcoded admin:
  - Extracts user from JWT token claims via FastMCP's get_access_token()
  - Maps JWT identity to actual Superset users in the database
  - Configurable user resolver for different JWT claim structures
- Scope-Based Permissions:
  - JWT scopes mapped to tool permissions (e.g., dashboard:read, chart:write)
  - Falls back gracefully when no JWT is present (dev mode)
  - Enhanced audit logging with JWT context
- Full Configurability via superset_config.py:

      # Configure auth factory
      MCP_AUTH_FACTORY = my_custom_auth_factory

      # Configure how to extract username from JWT
      def custom_user_resolver(access_token):
          # Handle your specific JWT structure
          return access_token.payload.get('preferred_username')
      MCP_USER_RESOLVER = custom_user_resolver

      # Configure scopes, fallback users, etc.
      MCP_REQUIRED_SCOPES = ["superset:read", "superset:admin"]
      MCP_DEV_USERNAME = "dev_user"  # For local development

- Flexible Integration:
  - Works with any JWT provider (Auth0, Keycloak, Okta, etc.)
  - Supports both JWKS and direct public key validation
  - Compatible with Superset's existing auth providers
…full test coverage - Introduced generic `list` and `count` methods to BaseDAO for consistent querying and counting across all DAOs. - Both methods support filtering (including IN queries), ordering, pagination, search across columns, custom FAB-style filters, and always-on base filters. - Added comprehensive unit tests for `list` and `count` in `tests/unit_tests/dao/base_test.py`, covering: - Filtering (including boolean, None, and IN queries) - Ordering (asc/desc, multiple columns) - Pagination (including out-of-range) - Search across columns - Custom filter logic - Always-on base filter logic - Edge cases and skip_base_filter - Moved common test fixtures to `conftest.py` for reuse.
…st coverage - Improved the BaseDAO class to robustly handle column operator logic, ensuring all supported operators (eq, ne, sw, ew, in, nin, gt, gte, lt, lte, like, ilike, is_null, is_not_null) are consistently applied via ColumnOperatorEnum. - Refactored the apply_column_operators and list methods for clarity and reliability, including better handling of columns, relationships, and search. - Removed 1 index base page handing from list
…sts to use mcp client directly in the tests as recommended
… test coverage - Updated DatasetInfo schema to include columns and metrics fields, with new TableColumnInfo and SqlMetricInfo models. - Updated serialize_dataset_object to serialize columns and metrics for each dataset. - Modified list_datasets tool to use serialize_dataset_object and include columns/metrics by default. - Improved and fixed all related unit tests to use proper MagicMock objects for columns/metrics and to parse JSON responses. - Ensured LLM/OpenAPI compatibility for dataset listing and info tools.
…ocs and tests - Updates create_chart logic to automatically remove x_axis from groupby for ECharts timeseries charts, preventing duplicate dimension usage. - Updates and expands unit test to verify x_axis is excluded from groupby, using improved test mocks for accurate backend simulation. - Updates documentation (README.md, README_ARCHITECTURE.md, README_PHASE1_STATUS.md, README_SCHEMAS.md) to clarify create_chart tool behavior and schema, including new groupby/x_axis handling. - No breaking changes to tool signatures; behavior is now more robust and LLM-friendly.
… and table charts. example prompt: - can you use superset dataset 2 to plot popular baby names in 2024 - plot airline delay by day of week and group by airline use dataset 6
…nerate link to explore can use its code and just pass false
- Replace individual parameters with structured request schemas for list_datasets, list_charts, and list_dashboards - Fix validation issues where LLMs passed arrays/objects as strings - Add ListDatasetsRequest, ListChartsRequest, ListDashboardsRequest schemas - Fix object serialization (charts, dashboards, datasets) - Add validation to prevent search+filters conflicts - Update all tests and fix linting issues Resolves string/array validation ambiguity that caused tool failures when LLMs sent complex parameters incorrectly.
**Multi-Identifier Support:** - Enhance ModelGetInfoTool to support ID, UUID, and slug lookup - Add intelligent identifier detection for get_*_info tools - Dashboards: support ID, UUID, and slug via id_or_slug_filter - Datasets/Charts: support ID and UUID via direct database queries - Add GetDashboardInfoRequest, GetDatasetInfoRequest, GetChartInfoRequest schemas **Enhanced Default Columns & Metadata:** - Add uuid to default columns for all list tools (dashboards, datasets, charts) - Include uuid/slug in search columns for better discoverability - Fix columns_requested to accurately reflect user input vs defaults - Fix columns_loaded to show actual DAO columns requested vs serialized fields **Testing & Code Quality:** - Add multi-identifier tests for all get_*_info tools (ID/UUID/slug scenarios) - Remove unused serialized_columns variable in ModelListTool - Fix linting issues (line length, docstrings) This provides flexible identifier support while ensuring accurate metadata tracking for better LLM compatibility and debugging.
…UUID/slug functionality - Add UUID field to ChartInfo and DatasetInfo Pydantic schemas for complete serialization - Include UUID in chart and dataset serialization functions (serialize_chart_object, serialize_dataset_object) - UUID and slug are now included in default response columns for better discoverability: * Dashboards: UUID and slug in DEFAULT_DASHBOARD_COLUMNS and returned by default * Charts: UUID in DEFAULT_CHART_COLUMNS and returned by default * Datasets: UUID in DEFAULT_DATASET_COLUMNS and returned by default - Search functionality enhanced to include UUID/slug fields across all relevant tools - Add comprehensive test coverage for UUID/slug functionality: * Default column verification tests ensuring UUID/slug are in default responses * Response data verification tests confirming UUID/slug values are returned * Custom column selection tests for explicit UUID/slug requests * Metadata accuracy tests verifying columns_requested/columns_loaded tracking - Update documentation to reflect enhanced multi-identifier capabilities - All 132 tests pass with comprehensive verification of UUID/slug support
aa4d7d6 to 01a0f8b (Compare)
Implement comprehensive JWT authentication system for production deployments: • Add BearerAuthProvider integration with FastMCP for RS256 token validation • Support both static public keys and JWKS endpoints for enterprise key rotation • Implement scope-based authorization (dashboard:read, chart:write, etc.) • Extract user identity from JWT sub claim for proper audit trails • Maintain backward compatibility with admin fallback when auth disabled • Add comprehensive test coverage for authentication flows • Update documentation with human-friendly security setup guide Authentication is disabled by default for development convenience. Configure via environment variables for production use.
Replace MCPUser wrapper with actual Flask-AppBuilder User from database to enable proper RBAC permission filtering. Fixes MCP service returning 0 counts due to empty permission queries.
- Implement configurable auth factory pattern as suggested by @dpgaspar - Add unit test coverage - Refactor auth.py for better pythonic patterns and error handling - Add proper type annotations and logging throughout - Split complex functions into focused helper functions - Add detailed architecture documentation with Mermaid diagrams - Improve JWT token extraction with robust fallback logic
### SUMMARY
This PR implements Phase 1 of the Model Context Protocol (MCP) service for Apache Superset, as outlined in SIP-171. The MCP service provides a modular, schema-driven interface for programmatic access to Superset dashboards, charts, datasets, and instance metadata, designed for LLM agents and automation tools.
Key Features Implemented:
Core Infrastructure:
- `streamable-http` transport
- `superset mcp run` with host/port/debug options

Available Tools:
- `list_dashboards`, `get_dashboard_info`, `get_dashboard_available_filters`
- `list_datasets`, `get_dataset_info`, `get_dataset_available_filters` (includes columns and metrics)
- `list_charts`, `get_chart_info`, `get_chart_available_filters`, `create_chart`
- `get_superset_instance_info`, `generate_explore_link`

Advanced Features:
- `ModelListTool` abstraction
- Column and metric models (`TableColumnInfo`, `SqlMetricInfo`)
- `@mcp.tool` and `@mcp_auth_hook` decorators for registration and auth
- Logging (`LoggingMiddleware`) and access control (`PrivateToolMiddleware`)

Developer Experience:
Technical Details:
- `fastmcp>=2.8.1` dependency
- `pytest-asyncio` for async test support

### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
### TESTING INSTRUCTIONS

Install and Setup:
Run the MCP Service:
Run Tests:
Test Individual Tools:
- `list_dashboards`
- `get_dataset_info` with valid dataset ID
- `create_chart` with ECharts parameters
- `get_superset_instance_info`
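As a quick smoke test for the tools above, a hedged sketch using the FastMCP client against a locally running MCP server; the URL/port, path, and argument shape are assumptions rather than values taken from this PR:

```python
import asyncio

from fastmcp import Client


async def main() -> None:
    # Assumes the MCP server is running locally with the streamable-http transport
    async with Client("http://localhost:5009/mcp") as client:
        tools = await client.list_tools()
        print([tool.name for tool in tools])

        result = await client.call_tool(
            "list_dashboards",
            {"filters": [{"col": "dashboard_title", "opr": "ilike", "value": "%sales%"}]},
        )
        print(result)


asyncio.run(main())
```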
### ADDITIONAL INFORMATION