feat(mcp): MCP Service POC - Phase 1 #33976
aminghadersohi wants to merge 158 commits into apache:master from
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #33976 +/- ##
===========================================
+ Coverage 0 70.66% +70.66%
===========================================
Files 0 592 +592
Lines 0 42601 +42601
Branches 0 4446 +4446
===========================================
+ Hits 0 30106 +30106
- Misses 0 11351 +11351
- Partials 0 1144 +1144
|
|
||
| **Inputs:** | ||
| - `filters`: `Optional[List[DashboardFilter]]` — List of filter objects | ||
| - `columns`: `Optional[List[str]]` — Columns to include in the response |
There was a problem hiding this comment.
Is the LLM good at choosing which columns will be required? Would returning a fixed set of columns that covers the most relevant use cases be a better option here?
There was a problem hiding this comment.
Our current approach is that the default response is limited to a set of columns that we define as defaults in the tool. Perhaps a prompt tool could help augment these columns when the user's request needs more?
One thing I ran into is that it's not possible to get a pydantic schema used by an MCP server to include only the attributes that are non-null; at least, I am still working on figuring that out. See pydantic/pydantic#5461.
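A minimal sketch of the default-columns approach described above, using only the standard library. The names `DEFAULT_COLUMNS` and `project_columns` are hypothetical and do not come from the PR. Dropping nulls at serialization time, as done here, sidesteps the schema question raised in pydantic/pydantic#5461 rather than solving it.

```python
from typing import Any, Optional

# Hypothetical defaults; the PR's real default column set is not shown in this thread.
DEFAULT_COLUMNS = ["id", "dashboard_title", "published"]


def project_columns(
    record: dict[str, Any],
    columns: Optional[list[str]] = None,
    drop_nulls: bool = True,
) -> dict[str, Any]:
    """Return only the requested columns, falling back to the defaults."""
    wanted = columns or DEFAULT_COLUMNS
    projected = {name: record.get(name) for name in wanted}
    if drop_nulls:
        # Omit null values so the response stays small for the LLM.
        projected = {k: v for k, v in projected.items() if v is not None}
    return projected


row = {"id": 7, "dashboard_title": "Sales", "published": None, "slug": "sales"}
default_view = project_columns(row)                         # defaults, nulls dropped
custom_view = project_columns(row, columns=["id", "slug"])  # caller-chosen columns
```

With this shape, the LLM only needs to pass `columns` when the defaults are not enough, which keeps the common case free of column-selection mistakes.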
| datasource_type: Literal["table"] = Field("table", description="Datasource type (usually 'table')") | ||
| metrics: List[str] = Field(..., description="List of metric names to display") | ||
| dimensions: List[str] = Field(..., description="List of dimension (column) names to group by") | ||
| filters: Optional[List[Dict[str, Any]]] = Field(None, description="List of filter objects (column, operator, value)") |
There was a problem hiding this comment.
These can get pretty complex. Do we have a sense of how well the LLM handles these without additional prompting techniques?
There was a problem hiding this comment.
It seems that, with the pydantic schemas kept as static as possible (using concrete types instead of dicts and strings where possible), it figures things out well. I did notice a couple of instances where it added extra attributes that didn't exist, received an error, and then corrected itself.
I gave it "list all dashboards with z at the end of the name" and it ran this:
{
  "filters": [
    {
      "col": "dashboard_title",
      "opr": "ilike",
      "value": "%z"
    }
  ],
  "select_columns": [
    "id",
    "dashboard_title"
  ],
  "order_column": "dashboard_title",
  "order_direction": "asc",
  "page": 1,
  "page_size": 100
}
"list all datasets related to population"
{
  "filters": [
    {
      "col": "table_name",
      "opr": "ilike",
      "value": "%population%"
    }
  ],
  "select_columns": [
    "id",
    "table_name"
  ],
  "order_column": "table_name",
  "order_direction": "asc",
  "page": 1,
  "page_size": 100
}
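To make the semantics of these filter objects concrete, here is a small in-memory sketch of how a `col` / `opr` / `value` filter with the `ilike` operator could be evaluated, and how unknown attributes could be rejected with an error the LLM can act on. It is an illustration only: `ilike_to_regex`, `apply_filters`, and `ALLOWED_KEYS` are hypothetical names, the real service presumably delegates filtering to the database, and SQL escape characters are not handled.

```python
import re
from typing import Any

ALLOWED_KEYS = {"col", "opr", "value"}


def ilike_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a SQL ILIKE pattern (% and _ wildcards) into a case-insensitive regex."""
    parts = []
    for char in pattern:
        if char == "%":
            parts.append(".*")   # % matches any run of characters
        elif char == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def apply_filters(rows: list[dict[str, Any]], filters: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep rows that match every filter; only the `ilike` operator is sketched."""
    result = list(rows)
    for flt in filters:
        unknown = sorted(set(flt) - ALLOWED_KEYS)
        if unknown:
            # Naming the offending keys lets the caller retry with a valid payload.
            raise ValueError(f"Unknown filter attributes: {unknown}")
        if flt["opr"] != "ilike":
            raise ValueError(f"Unsupported operator in this sketch: {flt['opr']}")
        regex = ilike_to_regex(flt["value"])
        result = [r for r in result if regex.fullmatch(str(r.get(flt["col"], "")))]
    return result


dashboards = [
    {"id": 1, "dashboard_title": "Jazz"},
    {"id": 2, "dashboard_title": "Sales"},
    {"id": 3, "dashboard_title": "QUIZ"},
]
matches = apply_filters(
    dashboards, [{"col": "dashboard_title", "opr": "ilike", "value": "%z"}]
)

try:
    apply_filters(dashboards, [{"col": "id", "opr": "ilike", "value": "%", "exact": True}])
    rejected = False
except ValueError:
    rejected = True
```

The second call mirrors the behaviour reported above, where an invented attribute produced an error and the model corrected its request.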
superset/mcp_service/mcp_app.py
Outdated
|
|
||
| mcp = FastMCP( | ||
| "Superset MCP Server", | ||
| instructions=""" |
There was a problem hiding this comment.
How can we hook in auth, for example BearerAuthProvider? Is it possible to use add_middleware later, in init_fastmcp_server?
There was a problem hiding this comment.
Great question about auth integration!
Yes, our MCP service architecture is designed to support BearerAuthProvider integration. Looking at the FastMCP Bearer auth docs, we can integrate it in two ways:
Option 1: Server initialization (cleanest approach)
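import os
from fastmcp import FastMCP
# Import path assumed; it may differ between FastMCP versions.
from fastmcp.server.auth import BearerAuthProvider

auth = BearerAuthProvider(
    jwks_uri=os.getenv("MCP_JWKS_URI"),
    issuer=os.getenv("MCP_JWT_ISSUER"),
    audience="superset-mcp-server",
)
mcp = FastMCP("Superset MCP Server", auth=auth)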
Option 2: Environment-based configuration
Since our server is already modular with middleware support, we can add auth as an optional feature
controlled by environment variables, making it easy to enable/disable per deployment.
The BearerAuthProvider supports JWT validation via JWKS endpoints, which aligns well with enterprise
SSO systems. We'd get access to user context via get_access_token() in our tools for fine-grained
permissions.
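As a rough illustration of the claim checks that sit behind the `issuer` and `audience` settings above, the following standard-library sketch validates `iss`, `aud`, and `exp` on a JWT payload. It is an assumption-laden teaching aid, not how BearerAuthProvider is implemented: it deliberately does NOT verify the token signature against the JWKS keys, so it must never be used as an authentication check on its own. The helper names and the example issuer URL are hypothetical.

```python
import base64
import json
import time
from typing import Any, Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(data: dict[str, Any]) -> str:
    return base64.urlsafe_b64encode(json.dumps(data).encode()).rstrip(b"=").decode()


def check_claims(
    token: str, issuer: str, audience: str, now: Optional[float] = None
) -> dict[str, Any]:
    """Check iss, aud and exp only. WARNING: the signature is NOT verified here."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("Malformed token")
    claims = json.loads(_b64url_decode(parts[1]))
    if claims.get("iss") != issuer:
        raise ValueError("Unexpected issuer")
    aud = claims.get("aud")
    if audience not in (aud if isinstance(aud, list) else [aud]):
        raise ValueError("Unexpected audience")
    current = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= current:
        raise ValueError("Token expired or missing exp")
    return claims


ISSUER = "https://issuer.example"   # hypothetical issuer
AUDIENCE = "superset-mcp-server"    # audience value used in the thread
demo_token = ".".join([
    _b64url_encode({"alg": "none"}),
    _b64url_encode({"iss": ISSUER, "aud": AUDIENCE, "sub": "alice", "exp": 2_000_000_000}),
    "",  # empty signature segment: this demo token is unsigned
])

claims = check_claims(demo_token, ISSUER, AUDIENCE, now=1_700_000_000)
try:
    check_claims(demo_token, ISSUER, AUDIENCE, now=2_100_000_000)
    expired_rejected = False
except ValueError:
    expired_rejected = True
```

In a real deployment the signature check, key rotation, and algorithm allow-listing would be handled by the auth provider before any of these claims are trusted.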
There was a problem hiding this comment.
I like option 1, can you make it configurable also? similar to: https://github.com/apache/superset/blob/master/superset/config.py#L1526.
Actually, we can't make it exactly like that since __init__ can take different parameters, WDYT about using a configurable factory function?
def create_auth(app):
    jwks_uri = app.config["MCP_JWKS_URI"]
    issuer = app.config["MCP_JWT_ISSUER"]
    audience = app.config["MCP_JWT_AUDIENCE"]
    return BearerAuthProvider(
        jwks_uri=jwks_uri,
        issuer=issuer,
        audience=audience
    )
config.py
MCP_AUTH_FACTORY: Callable[[Flask], Any] = create_auth
Then:
mcp = FastMCP("Superset MCP Server", auth=app.config["MCP_AUTH_FACTORY"](app))
There was a problem hiding this comment.
@dpgaspar Thanks for the excellent suggestion! I've implemented the configurable factory pattern as you outlined.
The MCP service now uses app.config["MCP_AUTH_FACTORY"](app) to create the auth provider, following the same pattern as MACHINE_AUTH_PROVIDER_CLASS.
Implementation details:
- Default factory in superset/mcp_service/config.py:
def create_default_mcp_auth_factory(app: Flask) -> Optional[Any]:
    """Default MCP auth factory that uses app.config values."""
    if not app.config.get("MCP_AUTH_ENABLED", False):
        return None
    jwks_uri = app.config.get("MCP_JWKS_URI")
    # ... create and return BearerAuthProvider
- Usage in the MCP service initialization:
auth_factory = app.config.get("MCP_AUTH_FACTORY")
if auth_factory and callable(auth_factory):
    return auth_factory(app)
- User configuration in superset_config.py:
# Simple approach - just set values
MCP_AUTH_ENABLED = True
MCP_JWKS_URI = "https://your-provider.com/.well-known/jwks.json"
# Or provide custom factory
def create_auth(app):
    jwks_uri = app.config["MCP_JWKS_URI"]
    return BearerAuthProvider(jwks_uri=jwks_uri, ...)
MCP_AUTH_FACTORY = create_auth
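The factory pattern agreed on in this thread can be exercised without Flask or FastMCP. The sketch below is a stand-alone approximation under stated assumptions: a plain dict stands in for `app.config`, `StubAuthProvider` stands in for BearerAuthProvider, and `resolve_auth` is a hypothetical helper, not a function from the PR.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StubAuthProvider:
    """Stand-in for a real provider such as BearerAuthProvider (assumption)."""
    jwks_uri: str
    issuer: Optional[str] = None
    audience: Optional[str] = None


def default_auth_factory(config: dict[str, Any]) -> Optional[StubAuthProvider]:
    """Build a provider from config values, or return None when auth is disabled."""
    if not config.get("MCP_AUTH_ENABLED", False):
        return None
    return StubAuthProvider(
        jwks_uri=config["MCP_JWKS_URI"],
        issuer=config.get("MCP_JWT_ISSUER"),
        audience=config.get("MCP_JWT_AUDIENCE"),
    )


def resolve_auth(config: dict[str, Any]) -> Optional[StubAuthProvider]:
    """Look up the configured factory and call it, falling back to the default."""
    factory = config.get("MCP_AUTH_FACTORY", default_auth_factory)
    if not callable(factory):
        raise TypeError("MCP_AUTH_FACTORY must be callable")
    return factory(config)


disabled = resolve_auth({})
enabled = resolve_auth({
    "MCP_AUTH_ENABLED": True,
    "MCP_JWKS_URI": "https://your-provider.com/.well-known/jwks.json",
})
custom = resolve_auth({
    # A deployment can swap in its own factory without touching the service code.
    "MCP_AUTH_FACTORY": lambda cfg: StubAuthProvider(jwks_uri="https://custom.example/jwks"),
})
```

Passing the whole config (or app) to the factory is what lets each deployment construct a provider with different constructor parameters, which was the concern raised about a class-based setting.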
superset/mcp_service/auth.py
Outdated
| def wrapper(*args: Any, **kwargs: Any) -> Any: | ||
| # --- Setup user context (was _setup_user_context) --- | ||
| admin_username = current_app.config.get("MCP_ADMIN_USERNAME", "admin") | ||
| admin_user = security_manager.get_user_by_username(admin_username) |
There was a problem hiding this comment.
To my understanding, full admin access is given by default, as a sudoer. Also, no auth process is in place, for example JWT verification or using JWT claims to get user info.
Can you give an example of how this could be implemented?
There was a problem hiding this comment.
Excellent observation! You're absolutely right - the current implementation hardcodes admin access without proper JWT verification. Here's how we can implement proper JWT-based authentication with user claims:
Option 1: FastMCP Bearer Token Integration
def get_user_from_request() -> Any:
    """Extract user from JWT token claims via FastMCP's BearerAuthProvider."""
    from fastmcp.auth import get_access_token
    from flask_login import AnonymousUserMixin
    from superset.extensions import security_manager
    try:
        # Get validated JWT token from FastMCP auth
        access_token = get_access_token()
        # Extract user identifier from JWT claims
        username = access_token.subject  # or access_token.client_id
        user_email = access_token.claims.get("email")
        # Look up actual Superset user
        user = security_manager.get_user_by_username(username)
        if not user and user_email:
            user = security_manager.get_user_by_email(user_email)
        return user or AnonymousUserMixin()
    except Exception:
        # Fallback to anonymous if no valid token
        return AnonymousUserMixin()
Option 2: Enhanced RBAC with Scope-Based Permissions
def has_permission(user: Any, tool_func: Any) -> bool:
    """Check permissions using JWT scopes + Superset RBAC."""
    from fastmcp.auth import get_access_token
    try:
        access_token = get_access_token()
        user_scopes = access_token.scopes
        # Map tool functions to required scopes
        required_scopes = {
            'list_dashboards': ['dashboard:read'],
            'create_chart': ['chart:write'],
            'get_dataset_info': ['dataset:read']
        }
        tool_name = tool_func.__name__
        if required := required_scopes.get(tool_name):
            if not any(scope in user_scopes for scope in required):
                return False
        # Also check Superset's native RBAC
        return bool(user and hasattr(user, 'is_active') and user.is_active)
    except Exception:
        # No token = anonymous user, check Superset perms only
        return bool(user and hasattr(user, 'is_active') and user.is_active)
Updated Wrapper Implementation
@functools.wraps(tool_func)
def wrapper(*args: Any, **kwargs: Any) -> Any:
    # Get authenticated user from JWT (replaces hardcoded admin)
    user = get_user_from_request()
    # Set Flask context with actual authenticated user
    g.user = user
    # Apply impersonation if requested and allowed
    if run_as := kwargs.get("run_as"):
        user = impersonate_user(user, run_as)
    # Check both JWT scopes and Superset RBAC
    if not has_permission(user, tool_func):
        raise PermissionError(
            f"User {getattr(user, 'username', 'anonymous')} lacks permission for "
            f"{tool_func.__name__}"
        )
    # Enhanced audit logging with JWT context
    log_access(user, tool_func.__name__, args, kwargs)
    return tool_func(*args, **kwargs)
This approach removes the hardcoded admin escalation and uses actual JWT-validated user identity with
proper scope-based authorization!
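The scope check proposed above can be tried in isolation. This is a minimal sketch, assuming the validated token's scopes are available in a context variable; `current_scopes` and `require_scopes` are hypothetical names, and the real code would read scopes from the access token instead. Note one deliberate difference: this version requires all listed scopes, whereas the snippet in the thread accepts any one of them.

```python
import functools
from contextvars import ContextVar
from typing import Any, Callable

# Stand-in for the scopes of the validated token (assumption for this sketch).
current_scopes: ContextVar[frozenset[str]] = ContextVar(
    "current_scopes", default=frozenset()
)


def require_scopes(*required: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that rejects the call unless every required scope is present."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            missing = set(required) - current_scopes.get()
            if missing:
                raise PermissionError(
                    f"{func.__name__} requires scopes {sorted(missing)}"
                )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@require_scopes("dashboard:read")
def list_dashboards() -> list[str]:
    return ["Sales", "Jazz"]


# Simulate a request whose token carries the dashboard:read scope.
reset_token = current_scopes.set(frozenset({"dashboard:read"}))
allowed_result = list_dashboards()
current_scopes.reset(reset_token)

# Simulate a request with no scopes at all.
try:
    list_dashboards()
    denied = False
except PermissionError:
    denied = True
```

Declaring the required scopes next to each tool, rather than in one central dict keyed by function name, avoids the mapping silently going stale when a tool is renamed.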
There was a problem hiding this comment.
Both are valid approaches.
Superset auth is integrated with multiple providers and scenarios, so it's important to guarantee that this is configurable and flexible
There was a problem hiding this comment.
@dpgaspar Thanks for the feedback on authentication flexibility! I've already implemented the
JWT-based authentication as discussed, and now made it fully configurable as you suggested.
What's implemented:
- JWT User Authentication - No more hardcoded admin:
  - Extracts user from JWT token claims via FastMCP's get_access_token()
  - Maps JWT identity to actual Superset users in the database
  - Configurable user resolver for different JWT claim structures
- Scope-Based Permissions:
  - JWT scopes mapped to tool permissions (e.g., dashboard:read, chart:write)
  - Falls back gracefully when no JWT is present (dev mode)
  - Enhanced audit logging with JWT context
- Full Configurability via superset_config.py:
# Configure auth factory
MCP_AUTH_FACTORY = my_custom_auth_factory
# Configure how to extract username from JWT
def custom_user_resolver(access_token):
    # Handle your specific JWT structure
    return access_token.payload.get('preferred_username')
MCP_USER_RESOLVER = custom_user_resolver
# Configure scopes, fallback users, etc.
MCP_REQUIRED_SCOPES = ["superset:read", "superset:admin"]
MCP_DEV_USERNAME = "dev_user"  # For local development
- Flexible Integration:
  - Works with any JWT provider (Auth0, Keycloak, Okta, etc.)
  - Supports both JWKS and direct public key validation
  - Compatible with Superset's existing auth providers
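The configurable user resolver and the dev-mode fallback described above could fit together roughly as follows. This sketch makes several assumptions: claims are passed as a plain dict rather than an access token object, a dict stands in for the Superset config, and `default_user_resolver` and `resolve_username` are hypothetical names.

```python
from typing import Any, Callable, Optional

Claims = dict[str, Any]
Resolver = Callable[[Claims], Optional[str]]


def default_user_resolver(claims: Claims) -> Optional[str]:
    """Try a few common claim names in order and return the first non-empty one."""
    for key in ("preferred_username", "sub", "email"):
        value = claims.get(key)
        if value:
            return str(value)
    return None


def resolve_username(claims: Optional[Claims], config: dict[str, Any]) -> Optional[str]:
    """Resolve a username from token claims, or fall back to the dev user."""
    if claims is not None:
        resolver: Resolver = config.get("MCP_USER_RESOLVER", default_user_resolver)
        return resolver(claims)
    # No token present: only a configured dev username is accepted.
    return config.get("MCP_DEV_USERNAME")


from_default = resolve_username({"sub": "alice"}, {})
from_custom = resolve_username(
    {"upn": "bob@example.com"},
    {"MCP_USER_RESOLVER": lambda c: c.get("upn")},
)
dev_fallback = resolve_username(None, {"MCP_DEV_USERNAME": "dev_user"})
no_user = resolve_username(None, {})
```

Keeping the fallback explicit (`None` unless `MCP_DEV_USERNAME` is set) means a production deployment that forgets to configure auth ends up with no user, not an implicit admin.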
|
Would ❤️ instructions as to how to set this up locally to try it, maybe in the PR body, or better, in docs, maybe under |
|
oh also if/when you rebase I could add a new service in the new |
- Rename mcp_proxy.py → mcp_http_proxy.py for clarity
- Update all imports and blueprint references
- Simplify config_mcp.py for Preset-hosted environment:
  - Remove unnecessary settings handled by infrastructure
  - Keep only essential MCP connection settings
  - Remove CORS, auth, and advanced multi-tenant configs
- Update related shell scripts and proxy files
Co-authored-by: bito-code-review[bot] <188872107+bito-code-review[bot]@users.noreply.github.com>
- Bump fastmcp version to match working environment
- Add commented staging URL for future reference
- Downgrade fastmcp from >=2.12.3 to >=2.10.0 for compatibility
- Convert async resource to sync in chart_configs.py
- Update list_charts default page_size from 100 to 10
- Simplify chart generation docstrings and remove verbose documentation
- Replace structured logging with string formatting for better performance
- Add unit tests for list_charts, update_chart, and update_chart_preview
- Update test schemas for dataset listing
MCP service was not properly setting Flask-Login's current_user after JWT authentication, causing RBAC filters to use the wrong user context. This resulted in users with limited permissions seeing all datasets instead of only those they have access to. The fix calls login_user() after authentication and impersonation to ensure current_user matches g.user, allowing DatasourceFilter and other RBAC filters to work correctly with JWT-based authentication.
…icated user" This reverts commit dcfe536.
fix test script
|
@aminghadersohi |
…data chore: Add form data to generate_chart tool response
|
closing this pr in favor of #35877 |
PR Summary: MCP Service Implementation
SUMMARY
This PR implements Phase 1 of the Model Context Protocol (MCP) service for Apache Superset, as outlined in SIP-171. The MCP service provides programmatic access to Superset dashboards, charts, datasets, and instance metadata via standardized tools and schemas.
Available Tools (20 total):
- list_dashboards, get_dashboard_info, get_dashboard_available_filters, generate_dashboard, add_chart_to_existing_dashboard
- list_datasets, get_dataset_info, get_dataset_available_filters
- list_charts, get_chart_info, get_chart_available_filters, generate_chart, update_chart, update_chart_preview, get_chart_data, get_chart_preview
- generate_explore_link
- get_superset_instance_info
- open_sql_lab_with_context, execute_sql
Key Features:
- streamable-http transport
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
Development Environment Setup:
git checkout mcp_service_amin_dev
make mcp-setup # Complete setup (deps, DB, frontend)
Start Services:
make mcp-run # Starts Superset if needed, then MCP service
Health Check:
make mcp-check # Verify configuration and services
MCP Client Configuration:
Desktop App:
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "superset": {
      "command": "bash",
      "args": ["/path/to/superset/superset/mcp_service/scripts/run.sh"],
      "env": {"PYTHONPATH": "."}
    }
  }
}
Web Interface:
http://localhost:5008 (Streamable HTTP)
MCP Inspector:
Testing:
Cleanup:
make mcp-stop # Stop background processes
ADDITIONAL INFORMATION
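The Desktop App configuration step in the testing instructions can also be scripted. The following is a sketch under assumptions, not part of the PR: `add_mcp_server` is a hypothetical helper that merges the `superset` entry into an existing client config file without removing other servers. The demo writes to a temporary directory so that running it does not touch a real config.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def add_mcp_server(config_path: Path, name: str, entry: dict[str, Any]) -> dict[str, Any]:
    """Merge one server entry into a client config file, keeping existing servers."""
    config: dict[str, Any] = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    config.setdefault("mcpServers", {})[name] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


# Entry copied from the testing instructions; the script path is a placeholder.
superset_entry = {
    "command": "bash",
    "args": ["/path/to/superset/superset/mcp_service/scripts/run.sh"],
    "env": {"PYTHONPATH": "."},
}

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "claude_desktop_config.json"
    # Pretend another server is already configured.
    demo_path.write_text(
        json.dumps({"mcpServers": {"other": {"command": "true"}}}), encoding="utf-8"
    )
    merged = add_mcp_server(demo_path, "superset", superset_entry)
    written = json.loads(demo_path.read_text(encoding="utf-8"))
```

To apply it for real, the same function would be called with the config path given in the instructions, after backing up the existing file.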