feat(mcp): MCP Service - Phase 1 #33976


Draft · wants to merge 39 commits into master from mcp_service_amin_dev

Conversation

@aminghadersohi aminghadersohi commented Jun 30, 2025

SUMMARY

This PR implements Phase 1 of the Model Context Protocol (MCP) service for Apache Superset, as outlined in SIP-171. The MCP service provides a modular, schema-driven interface for programmatic access to Superset dashboards, charts, datasets, and instance metadata, designed for LLM agents and automation tools.

Key Features Implemented:

Core Infrastructure:

  • Standalone ASGI-based FastMCP server using streamable-http transport
  • CLI command superset mcp run with host/port/debug options
  • Modular tool architecture organized by domain (dashboard/, dataset/, chart/, system/)
  • Strong typing with Pydantic v2 schemas for all input/output
  • Auth/RBAC/logging hooks stubbed and ready for enterprise extension
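To make the schema-driven typing concrete, here is a minimal Pydantic v2 sketch of what a list-tool input could look like (`DashboardFilter` and `ListDashboardsRequest` are illustrative names only, not necessarily the PR's actual classes):

```python
from typing import List, Optional
from pydantic import BaseModel, Field

# Hypothetical shapes for illustration; the real schemas in the PR may differ.
class DashboardFilter(BaseModel):
    col: str = Field(..., description="Column to filter on")
    opr: str = Field(..., description="Operator name, e.g. 'eq' or 'ilike'")
    value: str = Field(..., description="Filter value")

class ListDashboardsRequest(BaseModel):
    filters: Optional[List[DashboardFilter]] = None
    select_columns: Optional[List[str]] = None
    page: int = 1
    page_size: int = 100

# Validate the kind of payload an LLM would send to the tool.
req = ListDashboardsRequest.model_validate(
    {"filters": [{"col": "dashboard_title", "opr": "ilike", "value": "%z"}]}
)
print(req.filters[0].opr)  # -> ilike
```

Strong typing like this is what lets the MCP server advertise a machine-readable input schema to the client.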

Available Tools:

  • Dashboards: list_dashboards, get_dashboard_info, get_dashboard_available_filters
  • Datasets: list_datasets, get_dataset_info, get_dataset_available_filters (includes columns and metrics)
  • Charts: list_charts, get_chart_info, get_chart_available_filters, create_chart
  • System: get_superset_instance_info, generate_explore_link

Advanced Features:

  • Complex filtering and search across all listing tools using ModelListTool abstraction
  • Chart creation still under construction (the initial input schema proved too complex; a simpler one is in the works)
  • Dataset responses now include full column and metric metadata (TableColumnInfo, SqlMetricInfo)
  • All tools use @mcp.tool and @mcp_auth_hook decorators for registration and auth
  • Middleware for logging (LoggingMiddleware) and access control (PrivateToolMiddleware)
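A rough sketch of how a registration/auth decorator in the spirit of @mcp_auth_hook might wrap a tool (names and behavior here are assumptions for illustration, not the PR's implementation):

```python
import functools
from typing import Any, Callable

# Hypothetical stand-ins for the PR's auth/logging hooks.
AUDIT_LOG: list[str] = []

def mcp_auth_hook(tool_func: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a tool so every call is auth-checked and audit-logged."""
    @functools.wraps(tool_func)
    def wrapper(*args: Any, user: str = "anonymous", **kwargs: Any) -> Any:
        if user == "anonymous":
            raise PermissionError(f"{tool_func.__name__} requires an authenticated user")
        AUDIT_LOG.append(f"{user} called {tool_func.__name__}")
        return tool_func(*args, **kwargs)
    return wrapper

@mcp_auth_hook
def list_dashboards(page: int = 1) -> dict:
    # Stand-in tool body; the real tool queries the DAO layer.
    return {"page": page, "result": []}

print(list_dashboards(user="admin", page=2))
```

The point of the hook pattern is that auth and audit logging stay orthogonal to tool logic, which is what makes the enterprise extension points possible.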

Developer Experience:

  • Comprehensive unit tests (2,395 lines across 7 test files)
  • Detailed documentation (architecture, schemas, dev guides)
  • Easy setup and running via CLI
  • Extension points ready for Preset/enterprise integration

Technical Details:

  • Added fastmcp>=2.8.1 dependency
  • Added pytest-asyncio for async test support
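With pytest-asyncio, test functions can themselves be coroutines. A minimal sketch (the tool body below is a hypothetical stand-in, not the PR's code):

```python
import asyncio

async def get_superset_instance_info() -> dict:
    # Stand-in for a real async MCP tool call.
    await asyncio.sleep(0)
    return {"version": "unknown", "tools": 12}

# Under pytest-asyncio this would be written as:
#
#   @pytest.mark.asyncio
#   async def test_instance_info():
#       info = await get_superset_instance_info()
#       assert "tools" in info
#
# Run directly here for illustration:
info = asyncio.run(get_superset_instance_info())
print(info["tools"])
```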

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF


TESTING INSTRUCTIONS

  1. Install and Setup:

    python -m venv venv
    source venv/bin/activate
    pip install -r requirements/development.txt
    pip install -e .
  2. Run the MCP Service:

    superset mcp run --port 5008 --debug --sql-debug
  3. Run Tests:

    pytest tests/unit_tests/mcp_service/ --maxfail=1 -v
  4. Test Individual Tools:

    • Test dashboard listing: list_dashboards
    • Test dataset info with columns/metrics: get_dataset_info with valid dataset ID
    • Test chart creation: create_chart with ECharts parameters
    • Test system info: get_superset_instance_info

ADDITIONAL INFORMATION

  • Has associated issue: SIP-171: MCP Service Proposal
  • Required feature flags: None (service is standalone)
  • Changes UI: No UI changes
  • Includes DB Migration: No database changes
  • Introduces new feature or API: New MCP service with CLI and programmatic API
  • Removes existing feature or API: No removals


korbit-ai bot commented Jun 30, 2025

Based on your review schedule, I'll hold off on reviewing this PR until it's marked as ready for review. If you'd like me to take a look now, comment /korbit-review.

Your admin can change your review schedule in the Korbit Console

@github-actions github-actions bot left a comment

Congrats on making your first PR and thank you for contributing to Superset! 🎉 ❤️

We hope to see you in our Slack community too! Not signed up? Use our Slack App to self-register.

"timestamp": datetime.now(timezone.utc)
}
serialized_response = serialize_mcp_response(response_data, MCPHealthResponseSchema)
return jsonify(serialized_response), 503

Check warning — Code scanning / CodeQL: Information exposure through an exception. Stack trace information flows to this location and may be exposed to an external user.

The Superset MCP (Model Context Protocol) service provides programmatic access to Superset dashboards through both REST API and FastMCP interfaces.

## Architecture Overview
Member:

I'd recommend mermaid here, it's becoming well supported in markdown and we have some in our docs. AI should be able to generate a diagram that's easier to edit/maintain.

@mistercrunch mistercrunch Jun 30, 2025

asked gpt to translate to mermaid, not sure if it's 100% accurate:

```mermaid
flowchart TB
  subgraph MCP_Service["MCP Service"]
    direction TB

    subgraph Flask_Stack[" "]
      FS["Flask Server (Port 5008)"]
      FRest["REST API Endpoints\n• /health\n• /list_dashboards\n• /dashboard/<id>"]
      FAPI["API Layer (api/v1/)\n• Authentication\n• Request/Response\n• Error handling"]
      FS --> FRest --> FAPI
    end

    subgraph FastMCP_Stack[" "]
      FM["FastMCP Server (Port 5009)"]
      FTools["FastMCP Tools\n• list_dashboards\n• get_dashboard\n• health_check"]
      FClient["HTTP Client (requests)\n• Internal API calls\n• JSON parsing"]
      FM --> FTools --> FClient
    end

    subgraph Proxy_Stack[" "]
      PR["Proxy Scripts"]
      PRRest["run_proxy.sh\n• Local proxy for free users"]
      PRCore["simple_proxy.py\n• Background proxy process"]
      PR --> PRRest --> PRCore
    end

    FAPI --> SupersetCore
    FClient --> SupersetCore
    PRCore --> SupersetCore
  end

  subgraph SupersetCore["Superset Core"]
    DB["Database (SQLAlchemy)"]
    Models["Models\n(Dashboard, Chart, etc.)"]
    DAOs["DAOs"]
    DB --> Models --> DAOs
  end
```

Author:

done


codecov bot commented Jun 30, 2025

Codecov Report

Attention: Patch coverage is 0.84388% with 940 lines in your changes missing coverage. Please review.

Project coverage is 71.23%. Comparing base (76d897e) to head (dff2f3a).
Report is 2052 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| superset/mcp_service/api/v1/endpoints.py | 0.00% | 320 Missing ⚠️ |
| superset/mcp_service/schemas.py | 0.00% | 237 Missing ⚠️ |
| superset/mcp_service/fastmcp_server.py | 0.00% | 223 Missing ⚠️ |
| superset/mcp_service/server.py | 0.00% | 60 Missing ⚠️ |
| superset/daos/base.py | 9.75% | 37 Missing ⚠️ |
| superset/mcp_service/simple_proxy.py | 0.00% | 36 Missing ⚠️ |
| superset/cli/mcp.py | 0.00% | 16 Missing ⚠️ |
| superset/mcp_service/api/__init__.py | 0.00% | 9 Missing ⚠️ |
| superset/daos/dashboard.py | 80.00% | 1 Missing ⚠️ |
| superset/mcp_service/__init__.py | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #33976       +/-   ##
===========================================
+ Coverage   60.48%   71.23%   +10.74%     
===========================================
  Files        1931      567     -1364     
  Lines       76236    41371    -34865     
  Branches     8568     4342     -4226     
===========================================
- Hits        46114    29471    -16643     
+ Misses      28017    10799    -17218     
+ Partials     2105     1101     -1004     
| Flag | Coverage Δ |
|---|---|
| hive | 46.14% <0.84%> (-3.02%) ⬇️ |
| javascript | ? |
| mysql | 70.24% <0.84%> (?) |
| postgres | 70.30% <0.84%> (?) |
| presto | 49.81% <0.84%> (-3.99%) ⬇️ |
| python | 71.19% <0.84%> (+7.69%) ⬆️ |
| sqlite | 69.83% <0.84%> (?) |
| unit | 100.00% <ø> (+42.36%) ⬆️ |

Flags with carried forward coverage won't be shown.

```diff
@@ -131,6 +131,7 @@ solr = ["sqlalchemy-solr >= 0.2.0"]
 elasticsearch = ["elasticsearch-dbapi>=0.2.9, <0.3.0"]
 exasol = ["sqlalchemy-exasol >= 2.4.0, <3.0"]
 excel = ["xlrd>=1.2.0, <1.3"]
+fastmcp = ["fastmcp>=2.8.1"]
```
Member:

Noting that we can probably add fastmcp to requirements/development.in -> https://github.com/apache/superset/blob/master/requirements/development.in#L19

Once you add it there, you have to run this script to pin it in requirements/development.txt -> https://github.com/apache/superset/blob/master/scripts/uv-pip-compile.sh

@aminghadersohi aminghadersohi Jul 2, 2025

done

Member:

Is there any feature that we are planning to use from fastmcp that does not come with https://github.com/modelcontextprotocol/python-sdk out of the box? I think it's good to have it, just double checking if we are planning to use any of the additional features such as auth, clients, server proxying and composition, generating servers from REST APIs, dynamic tool rewriting, built-in testing tools, etc.?

Author:

We do have some requirements around logging, testing, and authentication, as described in SIP-171. Features like pluggable auth hooks, audit logging, and robust testing are important for our use case. That said, I'm open to other approaches; if the python-sdk evolves to support these needs, we could certainly consider it. For now, FastMCP (or a similarly feature-rich server) seems like the best fit, but I'm happy to revisit as things progress. Appreciate your thoughtfulness on this!


@mcp_api.route("/list_dashboards", methods=["GET", "POST"])
@requires_api_key
def list_dashboards():
Member:

here I'm wondering how we allow a handful of useful filter params so the LLM can apply filters in larger environments and/or page through things without blowing up the context window.

Member:

oh nevermind, I see lower in the code you parse the request object to get filter params. Now wondering how the LLM discovers the expected schema for the tool, guessing I'll bump into it as I read through this PR ...

Author:

So right now we have this double-API situation: REST and FastMCP expose similar functionality and run as two separate processes, even though the REST side is only called internally. If all the functionality/magic that FAB takes care of can be refactored into the DAOs, we can essentially get rid of these endpoints and have the DAOs become the "core" module.


@mcp.tool()
def list_dashboards(
filters: Optional[List[Dict[str, Any]]] = None,
Member:

oh here it is :)

@aminghadersohi aminghadersohi changed the title Superset MCP Service - Initial Commit DRAFT feat(mcp): add initial MCP service scaffold with dashboard listing Jul 1, 2025
@aminghadersohi aminghadersohi force-pushed the mcp_service_amin_dev branch from 69ecee7 to 2705bfc Compare July 2, 2025 01:08
@aminghadersohi aminghadersohi force-pushed the mcp_service_amin_dev branch from dff2f3a to ffcb221 Compare July 7, 2025 11:48
## How to Add a New Tool

1. **Choose the Right Domain**
- Place your tool in the appropriate subfolder under `tools/` (e.g., `tools/chart/`).
Member:

What are the considerations about splitting into tools / resources / prompts? I guess if the need should arise for that separation of concerns, a structure like the following might help:

chart/tools
chart/resources
chart/prompts

Author:

i have made the change in my working PR and will push changes up soon

Author:

done

### get_dashboard_info

**Inputs:**
- `dashboard_id`: `int` — Dashboard ID
Member:

LLMs are known to have some issues with handling IDs. Would using the slug instead of an ID be more LLM-friendly?

Author:

I think for dashboards it makes sense; would we need both id and slug? Although for datasets and charts there is no slug, right?


**Inputs:**
- `filters`: `Optional[List[DashboardFilter]]` — List of filter objects
- `columns`: `Optional[List[str]]` — Columns to include in the response
Member:

Is the LLM good at choosing what columns will be required? Would returning a fixed set of columns that covers the most relevant use cases be a better option here?

Author:

The current approach we are taking: the default response is limited to a set of columns that we set as defaults in the tool. Perhaps a prompt tool can help figure out how to augment these columns based on the user's need?
One thing I ran into is that it's not possible to get a Pydantic schema used by an MCP server to only include the attributes that are non-null; at least, I am still working on figuring that out. pydantic/pydantic#5461
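For the serialization half of that problem, Pydantic v2's `model_dump(exclude_none=True)` does drop None-valued fields at dump time (sketch below with a hypothetical `DatasetInfo`); whether the MCP server's *declared* schema can reflect that is the open question in the linked issue.

```python
from typing import Optional
from pydantic import BaseModel

# Hypothetical, simplified model for illustration.
class DatasetInfo(BaseModel):
    id: int
    table_name: str
    description: Optional[str] = None  # often None in practice

info = DatasetInfo(id=2, table_name="population")
# None fields are omitted from the serialized payload:
print(info.model_dump(exclude_none=True))  # -> {'id': 2, 'table_name': 'population'}
```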


datasource_type: Literal["table"] = Field("table", description="Datasource type (usually 'table')")
metrics: List[str] = Field(..., description="List of metric names to display")
dimensions: List[str] = Field(..., description="List of dimension (column) names to group by")
filters: Optional[List[Dict[str, Any]]] = Field(None, description="List of filter objects (column, operator, value)")
Member:

These can get pretty complex. Do we have a sense of how well the LLM handles these without additional prompting techniques?

Author:

It seems that, with the Pydantic schemas kept as static as possible (using concrete types instead of dicts and strings where possible), it figures things out well, but I did notice a couple of instances where it added extra attributes that didn't exist, received an error, and then corrected itself.
I gave it "list all dashboards with z at the end of the name" and it ran this:

```json
{
  "filters": [
    {
      "col": "dashboard_title",
      "opr": "ilike",
      "value": "%z"
    }
  ],
  "select_columns": [
    "id",
    "dashboard_title"
  ],
  "order_column": "dashboard_title",
  "order_direction": "asc",
  "page": 1,
  "page_size": 100
}
```

"list all datasets related to population"

```json
{
  "filters": [
    {
      "col": "table_name",
      "opr": "ilike",
      "value": "%population%"
    }
  ],
  "select_columns": [
    "id",
    "table_name"
  ],
  "order_column": "table_name",
  "order_direction": "asc",
  "page": 1,
  "page_size": 100
}
```

@aminghadersohi aminghadersohi force-pushed the mcp_service_amin_dev branch 2 times, most recently from 42f23de to ddf7f49 Compare July 16, 2025 13:11

mcp = FastMCP(
"Superset MCP Server",
instructions="""
@dpgaspar dpgaspar Jul 23, 2025

How can we hook auth, for example BearerAuthProvider? Is it possible to use add_middleware later on, in init_fastmcp_server?

docs: https://gofastmcp.com/servers/auth/bearer

@aminghadersohi aminghadersohi Jul 24, 2025

Great question about auth integration!

Yes, our MCP service architecture is designed to support BearerAuthProvider integration. Looking at the FastMCP Bearer auth docs, we can integrate it in two ways:

Option 1: Server initialization (cleanest approach)

auth = BearerAuthProvider(
    jwks_uri=os.getenv("MCP_JWKS_URI"),
    issuer=os.getenv("MCP_JWT_ISSUER"),
    audience="superset-mcp-server"
)

mcp = FastMCP("Superset MCP Server", auth=auth)

Option 2: Environment-based configuration
Since our server is already modular with middleware support, we can add auth as an optional feature
controlled by environment variables, making it easy to enable/disable per deployment.

The BearerAuthProvider supports JWT validation via JWKS endpoints, which aligns well with enterprise
SSO systems. We'd get access to user context via get_access_token() in our tools for fine-grained
permissions.

@dpgaspar dpgaspar Jul 24, 2025

I like option 1, can you make it configurable also? similar to: https://github.com/apache/superset/blob/master/superset/config.py#L1526.
Actually, we can't make it exactly like that since __init__ can take different parameters, WDYT about using a configurable factory function?

def create_auth(app):
    jwks_uri = app.config["MCP_JWKS_URI"]
    issuer = app.config["MCP_JWT_ISSUER"]
    audience = app.config["MCP_JWT_AUDIENCE"]

    return BearerAuthProvider(
        jwks_uri=jwks_uri,
        issuer=issuer,
        audience=audience
    )

config.py

MCP_AUTH_FACTORY: Callable[[Flask], Any] = create_auth

Then:

mcp = FastMCP("Superset MCP Server", auth=app.config["MCP_AUTH_FACTORY"](app))

@aminghadersohi aminghadersohi Jul 25, 2025

@dpgaspar Thanks for the excellent suggestion! I've implemented the configurable factory pattern as you outlined.

The MCP service now uses app.config["MCP_AUTH_FACTORY"](app) to create the auth provider, following the same pattern as MACHINE_AUTH_PROVIDER_CLASS.

Implementation details:

  1. Default factory in superset/mcp_service/config.py:
  def create_default_mcp_auth_factory(app: Flask) -> Optional[Any]:
      """Default MCP auth factory that uses app.config values."""
      if not app.config.get("MCP_AUTH_ENABLED", False):
          return None

      jwks_uri = app.config.get("MCP_JWKS_URI")
      # ... create and return BearerAuthProvider
  2. Usage in the MCP service initialization:
  auth_factory = app.config.get("MCP_AUTH_FACTORY")
  if auth_factory and callable(auth_factory):
      return auth_factory(app)
  3. User configuration in superset_config.py:
  # Simple approach - just set values
  MCP_AUTH_ENABLED = True
  MCP_JWKS_URI = "https://your-provider.com/.well-known/jwks.json"

  # Or provide custom factory
  def create_auth(app):
      jwks_uri = app.config["MCP_JWKS_URI"]
      return BearerAuthProvider(jwks_uri=jwks_uri, ...)

  MCP_AUTH_FACTORY = create_auth

def wrapper(*args: Any, **kwargs: Any) -> Any:
# --- Setup user context (was _setup_user_context) ---
admin_username = current_app.config.get("MCP_ADMIN_USERNAME", "admin")
admin_user = security_manager.get_user_by_username(admin_username)
Member:

To my understanding, by default full admin is granted as a sudoer, and no auth process is in place (for example, JWT verification, or using JWT claims to get user info).

Can you give an example of how this can be implemented?

Author:

Excellent observation! You're absolutely right - the current implementation hardcodes admin access without proper JWT verification. Here's how we can implement proper JWT-based authentication with user claims:

Option 1: FastMCP Bearer Token Integration

  def get_user_from_request() -> Any:
      """Extract user from JWT token claims via FastMCP's BearerAuthProvider."""
      from fastmcp.auth import get_access_token
      from superset.extensions import security_manager

      try:
          # Get validated JWT token from FastMCP auth
          access_token = get_access_token()

          # Extract user identifier from JWT claims
          username = access_token.subject  # or access_token.client_id
          user_email = access_token.claims.get("email")

          # Look up actual Superset user
          user = security_manager.get_user_by_username(username)
          if not user and user_email:
              user = security_manager.get_user_by_email(user_email)

          return user or AnonymousUserMixin()

      except Exception:
          # Fallback to anonymous if no valid token
          return AnonymousUserMixin()

Option 2: Enhanced RBAC with Scope-Based Permissions

  def has_permission(user: Any, tool_func: Any) -> bool:
      """Check permissions using JWT scopes + Superset RBAC."""
      from fastmcp.auth import get_access_token

      try:
          access_token = get_access_token()
          user_scopes = access_token.scopes

          # Map tool functions to required scopes
          required_scopes = {
              'list_dashboards': ['dashboard:read'],
              'create_chart': ['chart:write'],
              'get_dataset_info': ['dataset:read']
          }

          tool_name = tool_func.__name__
          if required := required_scopes.get(tool_name):
              if not any(scope in user_scopes for scope in required):
                  return False

          # Also check Superset's native RBAC
          return user and hasattr(user, 'is_active') and user.is_active

      except Exception:
          # No token = anonymous user, check Superset perms only
          return user and hasattr(user, 'is_active') and user.is_active

Updated Wrapper Implementation

  @functools.wraps(tool_func)
  def wrapper(*args: Any, **kwargs: Any) -> Any:
      # Get authenticated user from JWT (replaces hardcoded admin)
      user = get_user_from_request()

      # Set Flask context with actual authenticated user
      g.user = user

      # Apply impersonation if requested and allowed
      if run_as := kwargs.get("run_as"):
          user = impersonate_user(user, run_as)

      # Check both JWT scopes and Superset RBAC
      if not has_permission(user, tool_func):
          raise PermissionError(
              f"User {getattr(user, 'username', 'anonymous')} lacks permission for {tool_func.__name__}"
          )

      # Enhanced audit logging with JWT context
      log_access(user, tool_func.__name__, args, kwargs)
      return tool_func(*args, **kwargs)

This approach removes the hardcoded admin escalation and uses actual JWT-validated user identity with
proper scope-based authorization!

Member:

Both are valid approaches.
Superset auth is integrated with multiple providers and scenarios, so it's important to guarantee that this is configurable and flexible.

Author:

@dpgaspar Thanks for the feedback on authentication flexibility! I've already implemented the
JWT-based authentication as discussed, and now made it fully configurable as you suggested.

What's implemented:

  1. JWT User Authentication - No more hardcoded admin:
    - Extracts user from JWT token claims via FastMCP's get_access_token()
    - Maps JWT identity to actual Superset users in the database
    - Configurable user resolver for different JWT claim structures
  2. Scope-Based Permissions:
    - JWT scopes mapped to tool permissions (e.g., dashboard:read, chart:write)
    - Falls back gracefully when no JWT is present (dev mode)
    - Enhanced audit logging with JWT context
  3. Full Configurability via superset_config.py:

  # Configure auth factory
  MCP_AUTH_FACTORY = my_custom_auth_factory

  # Configure how to extract username from JWT
  def custom_user_resolver(access_token):
      # Handle your specific JWT structure
      return access_token.payload.get('preferred_username')

  MCP_USER_RESOLVER = custom_user_resolver

  # Configure scopes, fallback users, etc.
  MCP_REQUIRED_SCOPES = ["superset:read", "superset:admin"]
  MCP_DEV_USERNAME = "dev_user"  # For local development
  4. Flexible Integration:
    - Works with any JWT provider (Auth0, Keycloak, Okta, etc.)
    - Supports both JWKS and direct public key validation
    - Compatible with Superset's existing auth providers

@aminghadersohi aminghadersohi changed the title feat(mcp): add initial MCP service scaffold with dashboard listing feat(mcp): MCP Service - Phase 1 Jul 23, 2025
…full test coverage

- Introduced generic `list` and `count` methods to BaseDAO for consistent querying and counting across all DAOs.
- Both methods support filtering (including IN queries), ordering, pagination, search across columns, custom FAB-style filters, and always-on base filters.
- Added comprehensive unit tests for `list` and `count` in `tests/unit_tests/dao/base_test.py`, covering:
  - Filtering (including boolean, None, and IN queries)
  - Ordering (asc/desc, multiple columns)
  - Pagination (including out-of-range)
  - Search across columns
  - Custom filter logic
  - Always-on base filter logic
  - Edge cases and skip_base_filter
- Moved common test fixtures to `conftest.py` for reuse.
…st coverage

- Improved the BaseDAO class to robustly handle column operator logic, ensuring all supported operators (eq, ne, sw, ew, in, nin, gt, gte, lt, lte, like, ilike, is_null, is_not_null) are consistently applied via ColumnOperatorEnum.
- Refactored the apply_column_operators and list methods for clarity and reliability, including better handling of columns, relationships, and search.
- Removed 1-indexed base page handling from list
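The operator handling described in this commit can be sketched as a small enum-to-predicate map (purely illustrative; the real ColumnOperatorEnum applies SQLAlchemy column expressions, not plain Python predicates):

```python
from enum import Enum
from typing import Any, Callable

class ColumnOperator(Enum):
    EQ = "eq"
    NE = "ne"
    IN = "in"
    ILIKE = "ilike"

# Plain-Python predicates standing in for SQLAlchemy expressions.
PREDICATES: dict[ColumnOperator, Callable[[Any, Any], bool]] = {
    ColumnOperator.EQ: lambda col, val: col == val,
    ColumnOperator.NE: lambda col, val: col != val,
    ColumnOperator.IN: lambda col, val: col in val,
    # Case-insensitive substring match, mimicking ilike with % wildcards.
    ColumnOperator.ILIKE: lambda col, val: val.strip("%").lower() in str(col).lower(),
}

def apply_operator(op: str, col: Any, val: Any) -> bool:
    """Dispatch an operator name ('eq', 'ilike', ...) to its predicate."""
    return PREDICATES[ColumnOperator(op)](col, val)

print(apply_operator("ilike", "World Population", "%population%"))
```

Centralizing the dispatch like this is what makes the DAO's filter handling uniform across operators.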
…sts to use mcp client directly in the tests as recommended
… test coverage

- Updated DatasetInfo schema to include columns and metrics fields, with new TableColumnInfo and SqlMetricInfo models.
- Updated serialize_dataset_object to serialize columns and metrics for each dataset.
- Modified list_datasets tool to use serialize_dataset_object and include columns/metrics by default.
- Improved and fixed all related unit tests to use proper MagicMock objects for columns/metrics and to parse JSON responses.
- Ensured LLM/OpenAPI compatibility for dataset listing and info tools.
…ocs and tests

- Updates create_chart logic to automatically remove x_axis from groupby for ECharts timeseries charts, preventing duplicate dimension usage.
- Updates and expands unit test to verify x_axis is excluded from groupby, using improved test mocks for accurate backend simulation.
- Updates documentation (README.md, README_ARCHITECTURE.md, README_PHASE1_STATUS.md, README_SCHEMAS.md) to clarify create_chart tool behavior and schema, including new groupby/x_axis handling.
- No breaking changes to tool signatures; behavior is now more robust and LLM-friendly.
… and table charts.

example prompt:
- can you use superset dataset 2 to plot popular baby names in 2024

- plot airline delay by day of week and group by airline use dataset 6
…nerate link to explore can use its code and just pass false
  - Replace individual parameters with structured request schemas for list_datasets, list_charts, and
  list_dashboards
  - Fix validation issues where LLMs passed arrays/objects as strings
  - Add ListDatasetsRequest, ListChartsRequest, ListDashboardsRequest schemas
  - Fix object serialization (charts, dashboards, datasets)
  - Add validation to prevent search+filters conflicts
  - Update all tests and fix linting issues

  Resolves string/array validation ambiguity that caused tool failures when LLMs sent complex
  parameters incorrectly.
  **Multi-Identifier Support:**
  - Enhance ModelGetInfoTool to support ID, UUID, and slug lookup
  - Add intelligent identifier detection for get_*_info tools
  - Dashboards: support ID, UUID, and slug via id_or_slug_filter
  - Datasets/Charts: support ID and UUID via direct database queries
  - Add GetDashboardInfoRequest, GetDatasetInfoRequest, GetChartInfoRequest schemas

  **Enhanced Default Columns & Metadata:**
  - Add uuid to default columns for all list tools (dashboards, datasets, charts)
  - Include uuid/slug in search columns for better discoverability
  - Fix columns_requested to accurately reflect user input vs defaults
  - Fix columns_loaded to show actual DAO columns requested vs serialized fields

  **Testing & Code Quality:**
  - Add multi-identifier tests for all get_*_info tools (ID/UUID/slug scenarios)
  - Remove unused serialized_columns variable in ModelListTool
  - Fix linting issues (line length, docstrings)

  This provides flexible identifier support while ensuring accurate metadata
  tracking for better LLM compatibility and debugging.
…UUID/slug functionality

  - Add UUID field to ChartInfo and DatasetInfo Pydantic schemas for complete serialization
  - Include UUID in chart and dataset serialization functions (serialize_chart_object,
  serialize_dataset_object)
  - UUID and slug are now included in default response columns for better discoverability:
    * Dashboards: UUID and slug in DEFAULT_DASHBOARD_COLUMNS and returned by default
    * Charts: UUID in DEFAULT_CHART_COLUMNS and returned by default
    * Datasets: UUID in DEFAULT_DATASET_COLUMNS and returned by default
  - Search functionality enhanced to include UUID/slug fields across all relevant tools
  - Add comprehensive test coverage for UUID/slug functionality:
    * Default column verification tests ensuring UUID/slug are in default responses
    * Response data verification tests confirming UUID/slug values are returned
    * Custom column selection tests for explicit UUID/slug requests
    * Metadata accuracy tests verifying columns_requested/columns_loaded tracking
  - Update documentation to reflect enhanced multi-identifier capabilities
  - All 132 tests pass with comprehensive verification of UUID/slug support
  Implement comprehensive JWT authentication system for production deployments:

  • Add BearerAuthProvider integration with FastMCP for RS256 token validation
  • Support both static public keys and JWKS endpoints for enterprise key rotation
  • Implement scope-based authorization (dashboard:read, chart:write, etc.)
  • Extract user identity from JWT sub claim for proper audit trails
  • Maintain backward compatibility with admin fallback when auth disabled
  • Add comprehensive test coverage for authentication flows
  • Update documentation with human-friendly security setup guide

  Authentication is disabled by default for development convenience.
  Configure via environment variables for production use.
  Replace MCPUser wrapper with actual Flask-AppBuilder User from database
  to enable proper RBAC permission filtering. Fixes MCP service returning
  0 counts due to empty permission queries.
  - Implement configurable auth factory pattern as suggested by @dpgaspar
  - Add unit test coverage
  - Refactor auth.py for better pythonic patterns and error handling
  - Add proper type annotations and logging throughout
  - Split complex functions into focused helper functions
  - Add detailed architecture documentation with Mermaid diagrams
  - Improve JWT token extraction with robust fallback logic