.NET LaunchDarkly SDK incompatible with Orleans or other single-threaded TaskSchedulers #32

@alexn-tinwell

Describe the bug

We've found that unless we call Events(Components.NoEvents) on the SDK, all Orleans mesh activity is halted at app startup around the same time that an LdClient singleton is first instantiated.

For context, Microsoft Orleans is a virtual actor / service mesh framework. Grain (actor) instances are executed with a custom task scheduler that is intentionally single-threaded so that the core framework managing the instance mesh can reason about the resource impact of each grain activation and balance them across nodes as required.

Unfortunately, turning off LaunchDarkly analytics events also appears to disable auto-creation of new ContextKinds. We can reference User contexts just fine, but when we create a custom context kind for backend services, it never shows up in the LD dashboard unless we turn events back on, and with events on the app hangs shortly after boot.

I theorise that it's related to the internal implementation of EventProcessorInternal, which starts the main message loop using Task.Factory.StartNew (which uses TaskScheduler.Current internally). In scenarios where LdClient is instantiated in the context of an Orleans Grain, this could block the only thread on the scheduler and prevent other work from ever occurring.
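
To illustrate the scheduler inheritance described above, here is a minimal sketch that does not touch the SDK at all. It assumes that `ConcurrentExclusiveSchedulerPair().ExclusiveScheduler` from the BCL is an acceptable stand-in for a single-threaded scheduler such as the Orleans grain scheduler; the class name `SchedulerInheritanceDemo` is hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class; not part of Orleans or the LaunchDarkly SDK.
public static class SchedulerInheritanceDemo
{
    public static (bool StartNewInherited, bool TaskRunUsedDefault) Run()
    {
        // Stand-in (assumption) for a single-threaded scheduler like the one
        // Orleans uses for grains: it runs at most one task at a time.
        TaskScheduler exclusive = new ConcurrentExclusiveSchedulerPair().ExclusiveScheduler;

        // The outer task plays the role of code running inside a grain.
        var outer = Task.Factory.StartNew(
            () =>
            {
                // No scheduler argument: StartNew targets TaskScheduler.Current,
                // which is the exclusive scheduler while this delegate runs.
                Task<TaskScheduler> viaStartNew =
                    Task.Factory.StartNew(() => TaskScheduler.Current);

                // Task.Run always targets TaskScheduler.Default (the thread pool).
                Task<TaskScheduler> viaTaskRun =
                    Task.Run(() => TaskScheduler.Current);

                // Return the tasks without waiting on them here, so the
                // exclusive scheduler is released before the inner task runs.
                return (viaStartNew, viaTaskRun);
            },
            CancellationToken.None,
            TaskCreationOptions.None,
            exclusive);

        var (startNewTask, taskRunTask) = outer.Result;

        return (
            ReferenceEquals(startNewTask.Result, exclusive),
            ReferenceEquals(taskRunTask.Result, TaskScheduler.Default));
    }
}
```

Calling `SchedulerInheritanceDemo.Run()` should report that the inner `StartNew` task observed the exclusive scheduler while the `Task.Run` task observed the default one. If the inner `StartNew` delegate were a loop that never returns, nothing else queued to the exclusive scheduler would get a chance to run, which is the failure mode theorised here.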

Furthermore, the implementation of RunMainEventLoop is inefficient because it pegs that thread with no slippage or sleeping between loop iterations whatsoever, so that thread is unlikely ever to relinquish context to other waiting tasks even when there is no work to perform.

For reasons I can't quite understand, EventProcessor implements no interfaces and is sealed, so we can't resolve this ourselves short of forking the SDK.

To reproduce
Create any sample Orleans application with a couple of grains, declare LdClient as a singleton for the Silo, and observe communication timeouts between grains.

While I couldn't prove my theory by modifying the SDK code or overriding the implementation, I did find a correlating fix.

Spot the difference:

Hangs any app using a single-threaded task scheduler:

    public static void AddLaunchDarklyClient(this IServiceCollection collection, ISecrets secrets)
    {
        var launchDarklySdkKey = secrets.LaunchDarklySdkKey();
        if (launchDarklySdkKey.IsBlank())
            throw new ArgumentException(
                $"LaunchDarkly SDK key is blank. Please set the environment variable {Constants.LaunchDarkly.Secrets.SdkKey}.");
        
        collection.AddSingleton<ILdClient>(x =>
        {
            var builder = LaunchDarkly.Sdk.Server.Configuration
                .Builder(launchDarklySdkKey)
                .DiagnosticOptOut(true)
                .Build();
            
            return new LdClient(builder);
        });
    }

Does not hang the app because the client is instantiated on a threadpool thread using Task.Run:

    public static void AddLaunchDarklyClient(this IServiceCollection collection, ISecrets secrets)
    {
        var launchDarklySdkKey = secrets.LaunchDarklySdkKey();
        if (launchDarklySdkKey.IsBlank())
            throw new ArgumentException(
                $"LaunchDarkly SDK key is blank. Please set the environment variable {Constants.LaunchDarkly.Secrets.SdkKey}.");
        
        LaunchDarkly.Sdk.Server.Configuration? builder = null;
        LdClient? client = null;

        // This is a hack to get around the fact that LaunchDarkly expects to peg a thread on the current TaskScheduler, meaning
        // it starves Orleans
        Task.Run(() =>
        {
            builder = LaunchDarkly.Sdk.Server.Configuration
                .Builder(launchDarklySdkKey)
                .DiagnosticOptOut(true)
                .Build();
            
            client = new LdClient(builder);
        }).GetAwaiter().GetResult();

        collection.AddSingleton<ILdClient>(x =>
        {
            return client;
        });
    }

Expected behavior
Merely instantiating the SDK should have no adverse effects on the hosting application, nor should it peg a CPU thread.

SDK version
8.0.0.0 but this was also present in 7.x. We ignored it until now because we weren't using events, but now want to reference custom ContextKinds and couldn't get them working until we turned events back on.

Language version, developer tools
C# / .NET 7

OS/platform
Any

Additional context
If you must start your own long-running threads inside an SDK instead of exposing an IHostedService, always pass in TaskScheduler.Default (and ideally use Task.Run instead of Task.Factory.StartNew).
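
As a rough sketch of that suggestion, a library-owned worker loop could look something like the following. `BackgroundLoop` and its members are hypothetical names and this is not the SDK's actual implementation; it only shows the pattern of passing `TaskScheduler.Default` explicitly and blocking while idle instead of spinning.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical worker; not part of the LaunchDarkly SDK.
public sealed class BackgroundLoop : IDisposable
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();
    private readonly Task _worker;

    public BackgroundLoop()
    {
        // Target TaskScheduler.Default explicitly so the loop never lands on
        // the caller's (possibly single-threaded) TaskScheduler.Current.
        // LongRunning hints that the loop should get a dedicated thread.
        _worker = Task.Factory.StartNew(
            RunLoop,
            CancellationToken.None,
            TaskCreationOptions.LongRunning,
            TaskScheduler.Default);
    }

    // Queue a unit of work for the background loop.
    public void Post(Action work) => _queue.Add(work);

    private void RunLoop()
    {
        // GetConsumingEnumerable blocks while the queue is empty, so an idle
        // loop waits instead of burning CPU.
        foreach (Action work in _queue.GetConsumingEnumerable())
        {
            work();
        }
    }

    public void Dispose()
    {
        // Let the loop drain remaining work and exit, then release resources.
        _queue.CompleteAdding();
        _worker.Wait();
        _queue.Dispose();
    }
}
```

With this shape, constructing a `BackgroundLoop` from code running on a custom scheduler should leave that scheduler free, because the loop itself runs on a thread owned by the default scheduler.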
