Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DatabaseClient write transactions timeout error after running job for a few hours #3315

Open
iceriver123 opened this issue Aug 30, 2024 · 1 comment
Assignees
Labels
api: spanner Issues related to the googleapis/java-spanner API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release.

Comments

@iceriver123
Copy link

iceriver123 commented Aug 30, 2024

Thanks for stopping by to let us know something could be better!

PLEASE READ: If you have a support contract with Google, please create an issue in the support console instead of filing on GitHub. This will ensure a timely response.

Please run down the following list and make sure you've tried the usual "quick fixes":

If you are still having issues, please include as much information as possible:

Environment details

  1. Specify the API at the beginning of the title. For example, "BigQuery: ...").
    General, Core, and Other are also allowed as types
  2. OS type and version: GCP kubernetes cluster
  3. Java version: 17
  4. version(s): 6.65.0

Steps to reproduce

When we upgrade spanner library from 6.57.0 to 6.65.0, the jobs start to have spanner write timeout error after running for 6 to 18 hours. This happens usually on massive writes with high writes-per-second (e.g., hundreds of writes per seconds). We don't know if this has anything to do with session leaks or not, but 1) We already enabled setWarnAndCloseIfInactiveTransactions on SessionPool options. 2) We don't see this problem on 6.57.0. Our guess is some code between 6.58.0 to 6.65.0 causing the issue.

Code example

For instance, the following code will trigger it

spannerClient
.readWriteTransaction(Options.tag("method=writeToTable"))
.run(
transaction -> {
transaction.buffer(
toInsertOrUpdateMutation(
deviceId,
uploadId,
orgId,
mergedMeasure,
measureQueryTimestamp,
/* dedupBucketSizeSeconds= / null,
/ randomNumber= */ null));
return null;
});

Stack trace

We have tried multiple versions: This stack trace could be from 6.72, 6.69, 6.66 or 6.65

com.google.cloud.spanner.SpannerException: DEADLINE_EXCEEDED: io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: CallOptions deadline exceeded after 29.999968223s. Name resolution delay 0.000000000 seconds. [closed=[], open=[[buffered_nanos=30000285535, waiting_for_connection]]]
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerExceptionPreformatted(SpannerExceptionFactory.java:291)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerExceptionPreformatted(SpannerExceptionFactory.java:297)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:170)
at com.google.cloud.spanner.SpannerExceptionFactory.newSpannerException(SpannerExceptionFactory.java:110)
at com.google.cloud.spanner.TransactionRunnerImpl$TransactionContextImpl.commit(TransactionRunnerImpl.java:327)
at com.google.cloud.spanner.TransactionRunnerImpl.lambda$runInternal$0(TransactionRunnerImpl.java:1167)
at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:102)
at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
at com.google.cloud.spanner.SpannerRetryHelper.runTxWithRetriesOnAborted(SpannerRetryHelper.java:79)
at com.google.cloud.spanner.SpannerRetryHelper.runTxWithRetriesOnAborted(SpannerRetryHelper.java:68)
at com.google.cloud.spanner.TransactionRunnerImpl.runInternal(TransactionRunnerImpl.java:1183)
at com.google.cloud.spanner.TransactionRunnerImpl.run(TransactionRunnerImpl.java:1096)
at com.google.cloud.spanner.SessionPool$SessionPoolTransactionRunner.run(SessionPool.java:1002)
...

External references such as API reference guides

  • ?

Any additional information below

Following these steps guarantees the quickest resolution possible.

Thanks!

@product-auto-label product-auto-label bot added the api: spanner Issues related to the googleapis/java-spanner API. label Aug 30, 2024
@rahul2393 rahul2393 added the priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. label Sep 9, 2024
@iceriver123
Copy link
Author

Some update: We started to notice this issue when upgrading spanner client from 6.57 to 6.58. Maybe it is related to the max_delay setting introduced in 6.58?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the googleapis/java-spanner API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release.
Projects
None yet
Development

No branches or pull requests

2 participants