11622 : OutlierDetection should use Ticker, not TimeProvider #12110

Open
wants to merge 7 commits into
base: master
Changes from 5 commits
3 changes: 2 additions & 1 deletion netty/src/main/java/io/grpc/netty/AbstractNettyHandler.java
@@ -199,7 +199,8 @@ public void updateWindow() throws Http2Exception {
pingReturn++;
setPinging(false);

- long elapsedTime = (ticker.read() - lastPingTime);
Member:

I believe this was fine as it was; it doesn't need any change.

Author:

Addressed the review comments

+ long currentTickerTimeNanos = ticker.read();
+ long elapsedTime = Math.max(0L,(currentTickerTimeNanos - lastPingTime));
if (elapsedTime == 0) {
elapsedTime = 1;
}
@@ -22,6 +22,7 @@
import static java.util.concurrent.TimeUnit.NANOSECONDS;

import com.google.common.annotations.VisibleForTesting;
+ import com.google.common.base.Ticker;
import com.google.common.collect.ForwardingMap;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableSet;
@@ -39,7 +40,6 @@
import io.grpc.Status;
import io.grpc.SynchronizationContext;
import io.grpc.SynchronizationContext.ScheduledHandle;
- import io.grpc.internal.TimeProvider;
import java.net.SocketAddress;
import java.util.ArrayList;
import java.util.Collection;
@@ -82,7 +82,7 @@ public final class OutlierDetectionLoadBalancer extends LoadBalancer {
private final SynchronizationContext syncContext;
private final Helper childHelper;
private final GracefulSwitchLoadBalancer switchLb;
- private TimeProvider timeProvider;
+ private Ticker ticker;
private final ScheduledExecutorService timeService;
private ScheduledHandle detectionTimerHandle;
private Long detectionTimerStartNanos;
@@ -95,14 +95,14 @@ public final class OutlierDetectionLoadBalancer extends LoadBalancer {
/**
* Creates a new instance of {@link OutlierDetectionLoadBalancer}.
*/
- public OutlierDetectionLoadBalancer(Helper helper, TimeProvider timeProvider) {
+ public OutlierDetectionLoadBalancer(Helper helper, Ticker ticker) {
logger = helper.getChannelLogger();
childHelper = new ChildHelper(checkNotNull(helper, "helper"));
switchLb = new GracefulSwitchLoadBalancer(childHelper);
endpointTrackerMap = new EndpointTrackerMap();
this.syncContext = checkNotNull(helper.getSynchronizationContext(), "syncContext");
this.timeService = checkNotNull(helper.getScheduledExecutorService(), "timeService");
- this.timeProvider = timeProvider;
+ this.ticker = ticker;
logger.log(ChannelLogLevel.DEBUG, "OutlierDetection lb created.");
}

@@ -148,10 +148,12 @@ public Status acceptResolvedAddresses(ResolvedAddresses resolvedAddresses) {
// On the first go we use the configured interval.
initialDelayNanos = config.intervalNanos;
} else {
+ long currentTickerTimeNanos = ticker.read();
+ long elapsedTimeNanos = Math.max(0L,(currentTickerTimeNanos - detectionTimerStartNanos));
// If a timer has started earlier we cancel it and use the difference between the start
// time and now as the interval.
initialDelayNanos = Math.max(0L,
- config.intervalNanos - (timeProvider.currentTimeNanos() - detectionTimerStartNanos));
+ config.intervalNanos - elapsedTimeNanos);
Member:

You can keep everything inline here and above to make it less confusing. Storing the value in a variable doesn't change anything here, but you do need Math.max(0L, ...) so that the result is never negative.
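
For illustration, the inline form being suggested can be sketched as a standalone helper. The class and method names below are hypothetical and not part of the PR; the expression mirrors the one in the diff, with `nowNanos` standing in for `ticker.read()`.

```java
/** Hypothetical helper; the names are illustrative and not taken from the PR. */
final class RemainingDelay {
  private RemainingDelay() {}

  /**
   * Returns how much of intervalNanos is left, given the timer start and the current
   * ticker reading. The inner subtraction stays correct even if the ticker value wraps
   * around; the outer clamp keeps the result non-negative once the interval has passed.
   */
  static long remainingDelayNanos(long intervalNanos, long startNanos, long nowNanos) {
    return Math.max(0L, intervalNanos - (nowNanos - startNanos));
  }
}
```

With an interval of 10 ns, a start of 100 and a current reading of 103, this yields 7; once more than 10 ns have elapsed it yields 0.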

Author:

Sure. I have addressed the review comments: I added the negative-value check and moved everything onto a single line. However, the build failed because the line exceeds 100 characters, so I plan to split it across two lines.

[Attached image: Screenshot 2025-06-06 8 03 45 PM]

Member:

The max is unnecessary. For it to overflow would require the process to be running for 292 years.
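
The 292-year figure can be checked with a little arithmetic: the difference of two signed 64-bit nanosecond readings only overflows once they are more than Long.MAX_VALUE nanoseconds apart. A minimal sketch (the class name is made up for this example, and a 365.25-day year is assumed):

```java
import java.util.concurrent.TimeUnit;

/** Back-of-the-envelope check of how long a signed 64-bit nanosecond span lasts. */
final class NanoOverflowHorizon {
  private NanoOverflowHorizon() {}

  /** Whole days that fit into Long.MAX_VALUE nanoseconds. */
  static long maxDays() {
    return TimeUnit.NANOSECONDS.toDays(Long.MAX_VALUE);
  }

  /** Approximate years, assuming 365.25 days per year. */
  static double maxYears() {
    return maxDays() / 365.25;
  }

  public static void main(String[] args) {
    System.out.printf("%d days, about %.1f years%n", maxDays(), maxYears());
  }
}
```

`maxDays()` comes out at 106751, which is a little over 292 years.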

Author:

Addressed the review comments

}

// If a timer has been previously created we need to cancel it and reset all the call counters
@@ -201,7 +203,7 @@ class DetectionTimer implements Runnable {

@Override
public void run() {
- detectionTimerStartNanos = timeProvider.currentTimeNanos();
+ detectionTimerStartNanos = ticker.read();
Member:

You need to audit this fully: check whether the value is used in an unwanted way somewhere in the inner methods, as described in the Javadoc. Comparisons should always be done by subtraction.
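
The rule comes from the System.nanoTime() Javadoc, which Ticker.systemTicker() reads from: two readings should be compared via their difference, not with < or > directly. The two helpers below are hypothetical and only contrast the two styles; nothing here is taken from the PR.

```java
/** Hypothetical helpers contrasting direct comparison with comparison by subtraction. */
final class NanoTimeCompare {
  private NanoTimeCompare() {}

  /** Fragile: gives the wrong answer once the reading has wrapped past Long.MAX_VALUE. */
  static boolean isAfterDirect(long nowNanos, long deadlineNanos) {
    return nowNanos > deadlineNanos;
  }

  /** Wrap-safe: the difference is correct as long as the two readings are close enough. */
  static boolean isAfterBySubtraction(long nowNanos, long deadlineNanos) {
    return nowNanos - deadlineNanos > 0;
  }

  public static void main(String[] args) {
    long deadline = Long.MAX_VALUE - 10;
    long now = deadline + 20; // wraps around to a negative value
    System.out.println(isAfterDirect(now, deadline));        // false, although 20 ns have passed
    System.out.println(isAfterBySubtraction(now, deadline)); // true
  }
}
```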

Author:

I ran grep and cross-checked with an IDE search for ticker.read(). I did not find any place where ticker.read() values are compared directly instead of by subtraction. ticker.read() is used frequently in the JUnit tests, but I did not see it used in an unwanted way there either.

ticker.read() appears in the implementation classes listed below, and none of them use it in an unwanted way. Please find the attached Audit_TR.txt for your reference.

Aduit_ticker_read.txt

AbstractNettyHandler.java
NettyServerHandler.java
CachingRlsLbClient.java
LinkedHashLruCache.java
AdaptiveThrottler.java
PingTracker.java
OutlierDetectionLoadBalancer.java

Member:

No need to audit all the files. This single file has such a bug. I'd outright tell you, but then I have to audit it for any other missed cases, because clearly you aren't finding them, and that is literally the only interesting part of this PR.

Author:

Thanks for the review and the guidance. I carefully audited the method calls in OutlierDetectionLoadBalancer and found that values from ticker.read() are used in two places:

  1. acceptResolvedAddresses(ResolvedAddresses resolvedAddresses)

We have already discussed this code, and it is fine as it stands: it computes the time difference by subtraction rather than comparing with > or < directly, which follows the System.nanoTime() documentation.

initialDelayNanos = Math.max(0L,
config.intervalNanos - (ticker.read() - detectionTimerStartNanos));

  2. maxEjectionTimeElapsed(long currentTimeNanos)

I believe this is the bug our discussion has been pointing at. run() calls maybeUnejectOutliers with detectionTimerStartNanos, which is assigned from ticker.read(). maxEjectionTimeElapsed then returns a boolean computed with the expression currentTimeNanos > maxEjectionTimeNanos, that is, a direct comparison to decide whether currentTimeNanos is after maxEjectionTimeNanos.

endpointTrackerMap.maybeUnejectOutliers(detectionTimerStartNanos);

I have addressed this issue in the latest commit. Please review it and let me know if I have missed any other bugs in OutlierDetectionLoadBalancer.
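
One way a check like maxEjectionTimeElapsed could be made wrap-safe is to compare an elapsed duration instead of two absolute readings. The sketch below is an assumption about the shape of such a fix, not the code from the latest commit; `EjectionWindow`, `ejectedAtNanos` and `ejectionDurationNanos` are hypothetical names.

```java
/** Hypothetical sketch; not the tracker class from OutlierDetectionLoadBalancer. */
final class EjectionWindow {
  private final long ejectedAtNanos;        // ticker reading taken at ejection time
  private final long ejectionDurationNanos; // how long the ejection should last

  EjectionWindow(long ejectedAtNanos, long ejectionDurationNanos) {
    this.ejectedAtNanos = ejectedAtNanos;
    this.ejectionDurationNanos = ejectionDurationNanos;
  }

  /** True once more than ejectionDurationNanos has passed since the ejection. */
  boolean maxEjectionTimeElapsed(long currentTimeNanos) {
    // Subtract first, then compare the elapsed time against the duration.
    return currentTimeNanos - ejectedAtNanos > ejectionDurationNanos;
  }
}
```

Because only the difference of the two readings is compared, the result does not depend on whether the ticker value is positive, negative, or has wrapped between the two reads.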


endpointTrackerMap.swapCounters();

@@ -16,14 +16,14 @@

package io.grpc.util;

+ import com.google.common.base.Ticker;
import io.grpc.Internal;
import io.grpc.LoadBalancer;
import io.grpc.LoadBalancer.Helper;
import io.grpc.LoadBalancerProvider;
import io.grpc.NameResolver.ConfigOrError;
import io.grpc.Status;
import io.grpc.internal.JsonUtil;
- import io.grpc.internal.TimeProvider;
import io.grpc.util.OutlierDetectionLoadBalancer.OutlierDetectionLoadBalancerConfig;
import io.grpc.util.OutlierDetectionLoadBalancer.OutlierDetectionLoadBalancerConfig.FailurePercentageEjection;
import io.grpc.util.OutlierDetectionLoadBalancer.OutlierDetectionLoadBalancerConfig.SuccessRateEjection;
@@ -34,7 +34,7 @@ public final class OutlierDetectionLoadBalancerProvider extends LoadBalancerProv

@Override
public LoadBalancer newLoadBalancer(Helper helper) {
- return new OutlierDetectionLoadBalancer(helper, TimeProvider.SYSTEM_TIME_PROVIDER);
+ return new OutlierDetectionLoadBalancer(helper, Ticker.systemTicker());
}

@Override
@@ -227,7 +227,7 @@ public Void answer(InvocationOnMock invocation) throws Throwable {
when(mockStreamTracerFactory.newClientStreamTracer(any(),
any())).thenReturn(mockStreamTracer);

- loadBalancer = new OutlierDetectionLoadBalancer(mockHelper, fakeClock.getTimeProvider());
+ loadBalancer = new OutlierDetectionLoadBalancer(mockHelper, fakeClock.getTicker());
}

@Test