Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Postgres db command execution fails with npgsql and multihost (primary/replica) connection #1110

Open
rolyv opened this issue Oct 31, 2024 · 7 comments · Fixed by #1113
Open

Comments

@rolyv
Copy link

rolyv commented Oct 31, 2024

Describe the bug
We use multi host Postgres connection string that includes a primary and a read replica. We also have Marten ReadSessionPreference set to PreferStandby. Wolverine appears to have issues with this setup and we're seeing exceptions in the logs like this:

2024-10-31 12:52:07.773 EDT.    delete from public.wolverine_control_queue where expires < @p0;select body from public.wolverine_control_queue where node_id = @p1;
2024-10-31 12:52:07.773 EDT
---> Npgsql.PostgresException (0x80004005): 25006: cannot execute DELETE in a read-only transaction
at Npgsql.Internal.NpgsqlConnector.ReadMessageLong(Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage)
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
at Npgsql.NpgsqlCommand.ExecuteReader(Boolean async, CommandBehavior behavior, CancellationToken cancellationToken)
at Npgsql.NpgsqlCommand.ExecuteReader(Boolean async, CommandBehavior behavior, CancellationToken cancellationToken)
at Npgsql.NpgsqlCommand.ExecuteDbDataReaderAsync(CommandBehavior behavior, CancellationToken cancellationToken)
at Wolverine.RDBMS.Polling.DatabaseOperationBatch.ExecuteAsync(IWolverineRuntime runtime, CancellationToken cancellationToken) in /home/runner/work/wolverine/wolverine/src/Persistence/Wolverine.RDBMS/Polling/DatabaseOperationBatch.cs:line 68

To Reproduce
Steps to reproduce the behavior:

  1. Use multi host connection string like: Host=primary-ip,read-replica-ip;Username=...;Password=...;Database=...;Load Balance Hosts=true
  2. Set Marten ReadSessionPreference to PreferStandby
  3. Add Wolverine
  4. Watch go boom

Expected behavior
Wolverine should be able to execute commands against primary

Additional context
Using Marten 7.29.0
Wolverine 3.1.0
Npgsql 8.0.4
I'm registering NpgsqlDataSource via the Aspire.Npgsql nuget package 8.2.0 extension method

@rolyv
Copy link
Author

rolyv commented Nov 5, 2024

Thanks for the quick fix. I think there might still be an issue with the initial schema creation though. I'm still seeing this exception when I start up my project, but I can't tell if this is Wolverine, Marten, or both.

Npgsql.PostgresException (0x80004005): 25006: cannot execute CREATE FUNCTION in a read-only transaction
         at Npgsql.Internal.NpgsqlConnector.ReadMessageLong(Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage)
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
         at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
         at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
         at Npgsql.NpgsqlCommand.ExecuteReader(Boolean async, CommandBehavior behavior, CancellationToken cancellationToken)
         at Npgsql.NpgsqlCommand.ExecuteReader(Boolean async, CommandBehavior behavior, CancellationToken cancellationToken)
         at Npgsql.NpgsqlCommand.ExecuteNonQuery(Boolean async, CancellationToken cancellationToken)
         at Weasel.Postgresql.PostgresqlMigrator.executeDelta(SchemaMigration migration, DbConnection conn, AutoCreate autoCreate, IMigrationLogger logger, CancellationToken ct)
        Exception data:
          Severity: ERROR
          SqlState: 25006
          MessageText: cannot execute CREATE FUNCTION in a read-only transaction
          File: utility.c
          Line: 414
          Routine: PreventCommandIfReadOnly
         --- End of inner exception stack trace ---
         at Marten.StoreOptions.Weasel.Core.Migrations.IMigrationLogger.OnFailure(DbCommand command, Exception ex)
         at Weasel.Postgresql.PostgresqlMigrator.executeDelta(SchemaMigration migration, DbConnection conn, AutoCreate autoCreate, IMigrationLogger logger, CancellationToken ct)
         at Weasel.Core.Migrations.DatabaseBase`1.ApplyAllConfiguredChangesToDatabaseAsync(IGlobalLock`1 globalLock, Nullable`1 override, ReconnectionOptions reconnectionOptions, CancellationToken ct)
         at Weasel.Core.Migrations.DatabaseBase`1.ApplyAllConfiguredChangesToDatabaseAsync(IGlobalLock`1 globalLock, Nullable`1 override, ReconnectionOptions reconnectionOptions, CancellationToken ct)
         at Oakton.Resources.ResourceSetupHostService.<>c__DisplayClass5_0.<<StartAsync>g__execute|0>d.MoveNext()
         --- End of inner exception stack trace ---
         --- End of inner exception stack trace ---
         at Oakton.Resources.ResourceSetupHostService.StartAsync(CancellationToken cancellationToken)
         at Microsoft.Extensions.Hosting.Internal.Host.<StartAsync>b__15_1(IHostedService service, CancellationToken token)
         at Microsoft.Extensions.Hosting.Internal.Host.ForeachService[T](IEnumerable`1 services, CancellationToken token, Boolean concurrent, Boolean abortOnFirstException, List`1 exceptions, Func`3 operation)

It looks like both projects depend on Weasel for applying db changes, so perhaps there's something missing there? I was looking thru Weasel and thought maybe this file needs to specify the Target Session Attribute: https://github.com/JasperFx/weasel/blob/master/src/Weasel.Postgresql/Connections/DefaultNpgsqlDataSourceFactory.cs

If you agree, I'm happy to submit a PR.

@Hawxy Hawxy reopened this Nov 7, 2024
@Hawxy
Copy link
Contributor

Hawxy commented Nov 7, 2024

Damn, I thought we only had a single entry-point for npgsql DS usage. I'll properly test this end to end over the weekend.

@Hawxy
Copy link
Contributor

Hawxy commented Nov 7, 2024

Could you try using AddMultiHostNpgsqlDataSource from Npgsql.DependencyInjection instead of Aspire.Npgsql? I don't see anything in the Aspire implementation that points to a multi-host data source being supported.

@jeremydmiller
Copy link
Member

@rolyv I think we might need a little more information about how you're bootstrapping. Wolverine would be taking the NpgsqlDataSource from Marten's usage if you were using AddMarten()IntegrateWithWolverine()

@rolyv
Copy link
Author

rolyv commented Nov 7, 2024

I am using AddMarten().IntegrateWithWolverine(). I'm not sure why I didn't run into this before adding Wolverine.

I was able to confirm that the NpgsqlDataSourceBuilder returns an instance of NpgsqlMultiHostDataSource if the connection string has multiple hosts: https://github.com/npgsql/npgsql/blob/main/src/Npgsql/NpgsqlSlimDataSourceBuilder.cs#L645

Output from debug console when trying to get NpgsqlDataSource:

app.Services.GetService<NpgsqlDataSource>()
  {Npgsql.NpgsqlMultiHostDataSource}
    ConnectionString: "Host=localhost:5432,localhost:5433;Username=postgres;Database=test;Load Balance Hosts=True"

@Hawxy
Copy link
Contributor

Hawxy commented Nov 12, 2024

Would you be able to provide a reproduction? I've configured a wolverine + marten environment with a multi-host configuration and I'm unable to get it to error out. You might be using a feature I haven't accounted for.

@rolyv
Copy link
Author

rolyv commented Nov 14, 2024

Sorry for the delay. It took me a while to come up with a consistent repro. I created a simple app that uses a lot of the same components I'm using in my real app. I'm guessing there is some randomness in how the host is picked per command. When I had the primary first in the connection string, I wasn't encountering the exception. But when I switched the connection string to have the read replica first, it started happening consistently. But it's not consistent on which command is the one that fails. Sometimes it would be a drop command that would fail. Sometimes it's the create function. I created this repo with the app I used to repro the issue. Hope this helps: https://github.com/rolyv/marten-repro

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants