Skip to content

Convert IMcpEndpoint, IMcpClient, and IMcpServer types to classes. #355

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
eiriktsarpalis opened this issue Apr 25, 2025 · 7 comments
Open

Comments

@eiriktsarpalis
Copy link
Contributor

Interface types are generally more difficult to evolve since introducing new members is by definition a breaking change. We should consider converting these to classes so that we can incorporate new methods in the future without issue.

More concretely, I expect this would entail the following changes:

  1. Extension methods acting on IMcpEndpoint instances could be changed to be regular methods
  2. The methods on McpClientFactory could be replaced with static factories on the McpClient class directly.

@stephentoub @halter73 @PederHP thoughts?

@halter73
Copy link
Contributor

halter73 commented Apr 25, 2025

I agree that we should prefer classes over interfaces for what we call IMcpClient and IMcpServer today. Although we should rename those to say something about "connection" or "session" particularly in the client case where it's rarely just one server. We discussed this a bit in #314 (comment)

I think we should move away from a static McpClientFactory.CreateAsync method anywhere, even directly on an McpClient class, and instead take inspiration from HttpClient and have something like mcpClient.LaunchAsync(processOptions) and mcpClient.ConnectAsync(uri).

Before

So instead of this:

await using var sseClientTransport = new SseClientTransport(new()
{
    Endpoint = new("http://localhost:3001/sse"),
});
await using var client = await McpClientFactory.CreateAsync(sseClientTransport);

After

We'd have this:

using var mcpClient = new McpClient();
using var mcpConnection = await mcpClient.ConnectAsync("http://localhost:3001/sse");

Or this:

using var mcpConnection = await mcpClient.LaunchAsync("npx", "-y", "@modelcontextprotocol/server-everything");

In my opinion, "Factory" has no place in a high-level API like this. And I think we could take inspiration from minimal APIs for things like tool calls. We had a small discussion about that on this PR.

Before

Imagine instead of this:

builder.Services.AddMcpServer()
    .WithTools([
        McpServerTool.Create((string input) => $"1: {input}", new() { Name = "echo1" }),
        McpServerTool.Create((string input) => $"2: {input}", new() { Name = "echo2" }),
    ]);

Or this:

builder.Services.AddSingleton(McpServerTool.Create((string input) => $"1: {input}", new() { Name = "echo1" }));
builder.Services.AddSingleton(McpServerTool.Create((string input) => $"2: {input}", new() { Name = "echo2" }));

Or even this if you want to add a tool with a dynamic name based on a service:

builder.Services.AddOptions<McpServerOptions>()
    .Configure<MyToolService>((options, toolService) =>
    {
        options.Capabilities ??= new();
        options.Capabilities.Tools ??= new();
        options.Capabilities.Tools.ToolCollection = new();
        options.Capabilities.Tools.ToolCollection.Add(
            McpServerTool.Create(toolService.Handler, new() { Name = toolService.Name }));
    });

Regardless of what we think of "MapTool", we should make all these nested options non-nullable.

After

We'd have this:

var tools = app.MapMcp().WithTools();
// Or for stdio, var tools = app.GetRequiredService<ToolCollectionBuilder>();

tools.MapTool("echo1", (string input) => $"1: {input}");
tools.MapTool("echo2", (string input) => $"2: {input}");

Or this:

var tools = app.MapMcp().WithTools();
var myToolsService = app.GetRequiredService<MyToolService>();

tools.MapTool(myToolsService.Name, myToolsService.Handler);

I discussed that MapTool idea a bit before in #348 (comment).

I haven't started on these changes, because I want to make sure I have buy-in before potentially breaking a bunch of people. We could also consider trying to keep as much of the older API around as possible, but encouraging people to use the newer APIs inspired by HttpClient and Minimal APIs.

@PederHP
Copy link
Collaborator

PederHP commented Apr 26, 2025

Yes, I agree. I was never happy with the use of the Factory pattern even when I added it. That was just the easiest way for me to get things up and running quickly, as the first code I wrote was very much aimed at having some ready for use with a specific project as soon as possible. I agree with @halter73 that it doesn't really belong in this type of SDK.

McpEndpoint was a way to share JsonRpc specific code, but abstraction-wise it's weak, so it having an interface isn't great.

In general I think interfaces should be used where there can be multiple implementations or where it's meaningful for the library user to write a custom implementation. I don't think IMcpEndpoint, IMcpClient and IMcpServer fulfill those criteria, and the interfaces just serve to make testing easier, and there are other, better ways, to accomplish the same.

So, I'm all for this. Transports are really the only high-level abstraction in MCP that has a good use for interfaces (not considering integrations like using IChatClient with Sampling, etc.)

I think it's important that the rewrite makes low-level MCP use still valid (e.g. custom handlers), so that it's still possible to integrate with a service or application that isn't MEAI based (use of some abstractions is fine, but implementation-wise) or if there's a need to do something protocol-close.

Which also has the advantage of making it easy to work with the C# SDK for developers who have experience with the other MCP SDKs - whether to port code or just to be able transfer their experience directly.

So as long as it doesn't become too obfuscated what's happening, I am fully behind it. I don't think Connect as alias for SSE and Launch as alias for Stdio is a problem, but it's more a general worry that (valid) library design concerns end up moving the SDK too far away from the protocol and other SDKs.

@PederHP
Copy link
Collaborator

PederHP commented Apr 26, 2025

Just a small note on tools. One thing to mindful of is that production code almost never has this minimal tools without annotations. Using Descriptions for tools and for each parameter is extremely important for any non-trivial tool. So the normal use case, which one should make elegant and low-friction isn't something that looks like:

var tools = app.MapMcp().WithTools();
// Or for stdio, var tools = app.GetRequiredService<ToolCollectionBuilder>();

tools.MapTool("echo1", (string input) => $"1: {input}");
tools.MapTool("echo2", (string input) => $"2: {input}");

That's how it tends to look in examples, but actual tools can easily look like this in raw schema (Claude Computer Use):

{
    "properties": {
        "action": {
            "description": "The action to perform. The available actions are:\n"
            "* `key`: Press a key or key-combination on the keyboard.\n"
            "  - This supports xdotool's `key` syntax.\n"
            '  - Examples: "a", "Return", "alt+Tab", "ctrl+s", "Up", "KP_0" (for the numpad 0 key).\n'
            "* `hold_key`: Hold down a key or multiple keys for a specified duration (in seconds). Supports the same syntax as `key`.\n"
            "* `type`: Type a string of text on the keyboard.\n"
            "* `cursor_position`: Get the current (x, y) pixel coordinate of the cursor on the screen.\n"
            "* `mouse_move`: Move the cursor to a specified (x, y) pixel coordinate on the screen.\n"
            "* `left_mouse_down`: Press the left mouse button.\n"
            "* `left_mouse_up`: Release the left mouse button.\n"
            "* `left_click`: Click the left mouse button at the specified (x, y) pixel coordinate on the screen. You can also include a key combination to hold down while clicking using the `text` parameter.\n"
            "* `left_click_drag`: Click and drag the cursor from `start_coordinate` to a specified (x, y) pixel coordinate on the screen.\n"
            "* `right_click`: Click the right mouse button at the specified (x, y) pixel coordinate on the screen.\n"
            "* `middle_click`: Click the middle mouse button at the specified (x, y) pixel coordinate on the screen.\n"
            "* `double_click`: Double-click the left mouse button at the specified (x, y) pixel coordinate on the screen.\n"
            "* `triple_click`: Triple-click the left mouse button at the specified (x, y) pixel coordinate on the screen.\n"
            "* `scroll`: Scroll the screen in a specified direction by a specified amount of clicks of the scroll wheel, at the specified (x, y) pixel coordinate. DO NOT use PageUp/PageDown to scroll.\n"
            "* `wait`: Wait for a specified duration (in seconds).\n"
            "* `screenshot`: Take a screenshot of the screen.",
            "enum": [
                "key",
                "hold_key",
                "type",
                "cursor_position",
                "mouse_move",
                "left_mouse_down",
                "left_mouse_up",
                "left_click",
                "left_click_drag",
                "right_click",
                "middle_click",
                "double_click",
                "triple_click",
                "scroll",
                "wait",
                "screenshot",
            ],
            "type": "string",
        },
        "coordinate": {
            "description": "(x, y): The x (pixels from the left edge) and y (pixels from the top edge) coordinates to move the mouse to. Required only by `action=mouse_move` and `action=left_click_drag`.",
            "type": "array",
        },
        "duration": {
            "description": "The duration to hold the key down for. Required only by `action=hold_key` and `action=wait`.",
            "type": "integer",
        },
        "scroll_amount": {
            "description": "The number of 'clicks' to scroll. Required only by `action=scroll`.",
            "type": "integer",
        },
        "scroll_direction": {
            "description": "The direction to scroll the screen. Required only by `action=scroll`.",
            "enum": ["up", "down", "left", "right"],
            "type": "string",
        },
        "start_coordinate": {
            "description": "(x, y): The x (pixels from the left edge) and y (pixels from the top edge) coordinates to start the drag from. Required only by `action=left_click_drag`.",
            "type": "array",
        },
        "text": {
            "description": "Required only by `action=type`, `action=key`, and `action=hold_key`. Can also be used by click or scroll actions to hold down keys while clicking or scrolling.",
            "type": "string",
        },
    },
    "required": ["action"],
    "type": "object",
}

(and that's just the input schema, so the tool description itself is omitted, as it's really long at 1137 characters)

So really long descriptions and many parameters that are all annotated with descriptions.

That's somewhat different than web APIs, and is something to mindful of creating the best developer experience for defining server tools. Actual production code will be more about making it easy to iterate on and update the descriptions than mapping to the implementation.

Good tools are less about exposing existing code, and more about designing a good schema and writing good descriptions for the LLM on expects to use (as MCP sadly doesn't have a mechanism to vary descriptions based on model, which is actually a problem, but a different discussion).

@halter73
Copy link
Contributor

One thing to mindful of is that production code almost never has this minimal tools without annotations.

That's a very good point. I think the minimal API style has good metadata support using extension methods though, and I think we would emulate that. See MapMcp for a bit of what I'm talking about:

var mcpGroup = endpoints.MapGroup(pattern);
var streamableHttpGroup = mcpGroup.MapGroup("")
.WithDisplayName(b => $"MCP Streamable HTTP | {b.DisplayName}")
.WithMetadata(new ProducesResponseTypeMetadata(StatusCodes.Status404NotFound, typeof(JsonRpcError), contentTypes: ["application/json"]));
streamableHttpGroup.MapPost("", streamableHttpHandler.HandlePostRequestAsync)
.WithMetadata(new AcceptsMetadata(["application/json"]))
.WithMetadata(new ProducesResponseTypeMetadata(StatusCodes.Status200OK, contentTypes: ["text/event-stream"]))
.WithMetadata(new ProducesResponseTypeMetadata(StatusCodes.Status202Accepted));
streamableHttpGroup.MapGet("", streamableHttpHandler.HandleGetRequestAsync)
.WithMetadata(new ProducesResponseTypeMetadata(StatusCodes.Status200OK, contentTypes: ["text/event-stream"]));
streamableHttpGroup.MapDelete("", streamableHttpHandler.HandleDeleteRequestAsync);
// Map legacy HTTP with SSE endpoints.
var sseHandler = endpoints.ServiceProvider.GetRequiredService<SseHandler>();
var sseGroup = mcpGroup.MapGroup("")
.WithDisplayName(b => $"MCP HTTP with SSE | {b.DisplayName}");
sseGroup.MapGet("/sse", sseHandler.HandleSseRequestAsync)
.WithMetadata(new ProducesResponseTypeMetadata(StatusCodes.Status200OK, contentTypes: ["text/event-stream"]));
sseGroup.MapPost("/message", sseHandler.HandleMessageRequestAsync)
.WithMetadata(new AcceptsMetadata(["application/json"]))
.WithMetadata(new ProducesResponseTypeMetadata(StatusCodes.Status202Accepted));

It would look a lot cleaner if I didn't have to new up all the attributes for lack of higher-level APIs like .Accepts and .Produces since I'm using the RequestDelegate overload of MapGet/MapPost/etc...

C# has also supported attributes on lambdas since .NET 6 (C# 10) when we added the ability to infer the natural type of lambda expressions for use in minimal APIs. I'll admit that attributes on lambdas are typically a lot uglier than extension methods though.

You can see reference to these features https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/proposals/csharp-10.0/lambda-improvements. If you look closely, you'll see it shows minimal APIs how they were in early previews when I first named them "MapAction" (alluding to MVC action methods) 😆.

I'm definitely glad we changed it to overload the by then already-existing RequestDelegate-based MapGet/MapPost/etc. methods even if it does lead to weirdness like the previously-mentioned unavailability of .Accepts and .Produces on the builders returned by the RequestDelegate overload.

@eiriktsarpalis
Copy link
Contributor Author

eiriktsarpalis commented Apr 28, 2025

I think we should move away from a static McpClientFactory.CreateAsync method anywhere, even directly on an McpClient class, and instead take inspiration from HttpClient and have something like mcpClient.LaunchAsync(processOptions) and mcpClient.ConnectAsync(uri).

My only misgiving about this approach is that it introduces another state for instances that users need to contend with. The benefit of having a static McpClientFactory.CreateAsync (or McpClient.ConnectAsync if you prefer) is that having an instance of type McpClient in scope unambiguously means that it has been connected (or perhaps being in a disposed/disconnected state -- although this is less likely given how we generally handle disposable objects in the language).

@halter73
Copy link
Contributor

100% agreed. My proposal does not give you an mcpConnection before it is connected. Having a non-static ConnectAsync method is completely orthogonal to whether or not you want to add a preconnected state which I'm also against.

@stephentoub
Copy link
Contributor

@halter73, in your mock ups, where the mcpConnection is used presumably both client and server, how are APIs exposed that are only relevant to client or server? Are you envisioning an McpClientConnection and an McpServerConnection?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants