Description
Describe the bug
When using Gateway mode, the Java SDK always returns a generic CosmosException regardless of the HTTP status code. In contrast, Direct mode correctly returns specific exception subclasses like NotFoundException, BadRequestException, ForbiddenException, etc., based on the status code.
This causes inconsistent behavior between connection modes and breaks code that relies on catching specific exception types:
try {
    container.readItem(id, new PartitionKey(pk), MyClass.class);
} catch (NotFoundException e) {
    // This works in Direct mode but NOT in Gateway mode
    return Optional.empty();
}
Root Cause Analysis
The issue is in the RxGatewayStoreModel.validateOrThrow() method (RxGatewayStoreModel.java#L663-L691):
private void validateOrThrow(RxDocumentServiceRequest request,
                             HttpResponseStatus status,
                             HttpHeaders headers,
                             ByteBuf retainedBodyAsByteBuf) {
    int statusCode = status.code();
    if (statusCode >= HttpConstants.StatusCodes.MINIMUM_STATUSCODE_AS_ERROR_GATEWAY) {
        // ... error parsing ...
        // Always creates generic CosmosException regardless of status code
        CosmosException dce = BridgeInternal.createCosmosException(
            request.requestContext.resourcePhysicalAddress,
            statusCode,
            cosmosError,
            headers.toLowerCaseMap());
        BridgeInternal.setRequestHeaders(dce, request.getHeaders());
        throw dce;
    }
}
In contrast, Direct mode (both TCP/RNTBD and HTTP) uses a switch statement to create the appropriate exception type:
- RNTBD (Direct TCP): RntbdRequestManager.java#L1050-L1150
- Direct HTTP: HttpTransportClient.java#L793-L940
Example from RntbdRequestManager:
switch (status.code()) {
    case StatusCodes.BADREQUEST:
        cause = new BadRequestException(error, lsn, partitionKeyRangeId, responseHeaders);
        break;
    case StatusCodes.NOTFOUND:
        cause = new NotFoundException(error, lsn, partitionKeyRangeId, responseHeaders);
        break;
    // ... other status codes properly mapped
}
Exception or Stack Trace
When reading a non-existent document in Gateway mode:
com.azure.cosmos.CosmosException: {"code":"NotFound","message":"Document does not exist..."}
statusCode = 404
When reading a non-existent document in Direct mode:
com.azure.cosmos.implementation.NotFoundException: {"Errors":["Resource Not Found..."]}
statusCode = 404
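Both traces carry status code 404, which suggests a mode-independent workaround until the mapping is fixed: catch the base exception and branch on the status code. The sketch below is only an illustration. `SketchCosmosException` and `ItemReader` are hypothetical stand-ins (not SDK types) so that it runs without the Azure SDK on the classpath; in real code the same pattern would use `CosmosException.getStatusCode()`.

```java
import java.util.Optional;

// Hypothetical stand-ins: nothing in this class is part of the Azure SDK.
final class StatusCodeHandlingSketch {

    /** Stand-in for com.azure.cosmos.CosmosException; only the status code is modeled. */
    static class SketchCosmosException extends RuntimeException {
        private final int statusCode;

        SketchCosmosException(int statusCode, String message) {
            super(message);
            this.statusCode = statusCode;
        }

        int getStatusCode() {
            return statusCode;
        }
    }

    /** Stand-in for a container read that may throw the generic exception. */
    interface ItemReader {
        String read(String id);
    }

    /** Treats 404 as "absent" and rethrows every other failure unchanged. */
    static Optional<String> findItem(ItemReader reader, String id) {
        try {
            return Optional.ofNullable(reader.read(id));
        } catch (SketchCosmosException e) {
            if (e.getStatusCode() == 404) {
                return Optional.empty();
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        // Simulates Gateway mode: a generic exception that carries only the status code.
        ItemReader missing = id -> {
            throw new SketchCosmosException(404, "NotFound");
        };
        ItemReader present = id -> "item-" + id;

        System.out.println("missing present? " + findItem(missing, "a").isPresent());
        System.out.println("present value: " + findItem(present, "a").get());
    }
}
```

Because the check does not depend on the exception subclass, it behaves the same whether the generic or the specific type is thrown.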
To Reproduce
- Create a CosmosClient with Gateway mode enabled:
CosmosClient client = new CosmosClientBuilder()
    .endpoint(endpoint)
    .key(key)
    .gatewayMode() // Force gateway mode
    .buildClient();
- Attempt to read a non-existent document:
try {
    container.readItem("non-existent-id", new PartitionKey("pk"), MyClass.class);
} catch (NotFoundException e) {
    // This catch block is NEVER entered in Gateway mode
    System.out.println("Caught NotFoundException");
} catch (CosmosException e) {
    // This is always caught instead
    System.out.println("Caught generic CosmosException: " + e.getClass().getSimpleName());
    System.out.println("Status code: " + e.getStatusCode());
}
Code Snippet
Minimal reproduction:
import com.azure.cosmos.*;
import com.azure.cosmos.implementation.NotFoundException;
import com.azure.cosmos.models.PartitionKey;

public class GatewayExceptionBugDemo {
    public static void main(String[] args) {
        CosmosClient client = new CosmosClientBuilder()
            .endpoint("<your-endpoint>")
            .key("<your-key>")
            .gatewayMode()
            .buildClient();
        CosmosContainer container = client
            .getDatabase("testdb")
            .getContainer("testcontainer");
        try {
            container.readItem("non-existent-id", new PartitionKey("pk"), Object.class);
        } catch (NotFoundException e) {
            System.out.println("SUCCESS: Caught NotFoundException");
        } catch (CosmosException e) {
            System.out.println("BUG: Caught " + e.getClass().getSimpleName() +
                " instead of NotFoundException. Status: " + e.getStatusCode());
        }
        client.close();
    }
}
Expected behavior
Gateway mode should return the same specific exception types as Direct mode based on HTTP status codes:
| Status Code | Expected Exception Type |
|---|---|
| 400 | BadRequestException |
| 401 | UnauthorizedException |
| 403 | ForbiddenException |
| 404 | NotFoundException |
| 405 | MethodNotAllowedException |
| 409 | ConflictException |
| 410 | GoneException (or subtypes based on sub-status) |
| 412 | PreconditionFailedException |
| 413 | RequestEntityTooLargeException |
| 423 | LockedException |
| 429 | RequestRateTooLargeException |
| 449 | RetryWithException |
| 500 | InternalServerErrorException |
| 503 | ServiceUnavailableException |
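The table can also be read as a lookup from status code to exception factory, with the generic exception as the fallback for unmapped codes. The following is a minimal, table-driven sketch of that idea covering only four of the rows. All `Sketch*` classes are hypothetical stand-ins named after the SDK types rather than the SDK classes themselves, and sub-status handling for 410 is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-ins: nothing in this class is part of the Azure SDK.
final class GatewayExceptionMappingSketch {

    static class SketchCosmosException extends RuntimeException {
        private final int statusCode;

        SketchCosmosException(int statusCode, String message) {
            super(message);
            this.statusCode = statusCode;
        }

        int getStatusCode() {
            return statusCode;
        }
    }

    static class SketchBadRequestException extends SketchCosmosException {
        SketchBadRequestException(String message) { super(400, message); }
    }

    static class SketchNotFoundException extends SketchCosmosException {
        SketchNotFoundException(String message) { super(404, message); }
    }

    static class SketchConflictException extends SketchCosmosException {
        SketchConflictException(String message) { super(409, message); }
    }

    static class SketchRequestRateTooLargeException extends SketchCosmosException {
        SketchRequestRateTooLargeException(String message) { super(429, message); }
    }

    // Status code -> factory for the matching subclass (subset of the table above).
    private static final Map<Integer, Function<String, SketchCosmosException>> BY_STATUS = new HashMap<>();

    static {
        BY_STATUS.put(400, SketchBadRequestException::new);
        BY_STATUS.put(404, SketchNotFoundException::new);
        BY_STATUS.put(409, SketchConflictException::new);
        BY_STATUS.put(429, SketchRequestRateTooLargeException::new);
    }

    /** Returns the specific subclass when one is registered, otherwise the generic type. */
    static SketchCosmosException fromStatus(int statusCode, String message) {
        Function<String, SketchCosmosException> factory = BY_STATUS.get(statusCode);
        return factory != null
            ? factory.apply(message)
            : new SketchCosmosException(statusCode, message);
    }

    public static void main(String[] args) {
        try {
            throw fromStatus(404, "Document does not exist");
        } catch (SketchNotFoundException e) {
            System.out.println("Caught specific type: " + e.getClass().getSimpleName());
        } catch (SketchCosmosException e) {
            System.out.println("Caught generic type, status " + e.getStatusCode());
        }
    }
}
```

Falling back to the generic type keeps existing `catch (CosmosException e)` handlers working for any code that has no dedicated subclass.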
Proposed Fix
Update RxGatewayStoreModel.validateOrThrow() to use a switch statement similar to RntbdRequestManager and HttpTransportClient:
private void validateOrThrow(RxDocumentServiceRequest request,
                             HttpResponseStatus status,
                             HttpHeaders headers,
                             ByteBuf retainedBodyAsByteBuf) {
    int statusCode = status.code();
    if (statusCode >= HttpConstants.StatusCodes.MINIMUM_STATUSCODE_AS_ERROR_GATEWAY) {
        // ... existing error parsing code ...
        CosmosException exception;
        Map<String, String> headersMap = headers.toLowerCaseMap();
        switch (statusCode) {
            case HttpConstants.StatusCodes.BADREQUEST:
                exception = new BadRequestException(cosmosError.getMessage(), null, headersMap, null);
                break;
            case HttpConstants.StatusCodes.UNAUTHORIZED:
                exception = new UnauthorizedException(cosmosError.getMessage(), null, headersMap, null);
                break;
            case HttpConstants.StatusCodes.FORBIDDEN:
                exception = new ForbiddenException(cosmosError.getMessage(), null, headersMap, null);
                break;
            case HttpConstants.StatusCodes.NOTFOUND:
                exception = new NotFoundException(cosmosError.getMessage(), null, headersMap, null);
                break;
            // ... other status codes ...
            default:
                exception = BridgeInternal.createCosmosException(
                    request.requestContext.resourcePhysicalAddress,
                    statusCode,
                    cosmosError,
                    headersMap);
                break;
        }
        BridgeInternal.setRequestHeaders(exception, request.getHeaders());
        throw exception;
    }
}
Setup:
- OS: Windows
- IDE: IntelliJ IDEA (JetBrains)
- Library/Libraries: com.azure:azure-cosmos:4.x.x (all versions affected)
- Java version: 8+
- App Server/Environment: N/A - Core SDK issue (reproducible in any environment)
- Frameworks: N/A - Core SDK issue (not framework-specific)
Additional context
This issue was originally surfaced by users of the vNext Cosmos DB Emulator (see azure-cosmos-db-emulator-docker#262), which only supports Gateway mode. However, this is actually an SDK bug that affects all Gateway mode usage, not just the emulator.
The bug has been "hidden" for most users because:
- Direct mode is the default connection mode
- Most production deployments use Direct mode for performance reasons
- The vNext emulator only supporting Gateway mode exposed this inconsistency
Impact
- Code that catches specific exception types (e.g., NotFoundException) will not work correctly in Gateway mode
- Users cannot write connection-mode-agnostic exception handling code
- Breaks behavioral compatibility between Direct and Gateway modes
Related Issues
Information Checklist
- Bug Description Added
- Repro Steps Added
- Setup information Added