Issues with base64 encoded protobuf descriptors as schemas #7066

@abhimoondra

Description

Registry Version: 3.1.6
Persistence type: postgres

Two things break when base64-encoded protobuf descriptors are stored in the schema registry:

  • Issue 1: If schema validation is enabled (i.e. the Validity rule is FULL), registering protobuf descriptors that have references fails, because the parsing logic breaks in several places.
  • Issue 2: The Apicurio SerDes libraries don't work with base64-encoded protobuf schemas, again because of parsing issues.
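
To make the two content shapes concrete, here is a minimal sketch of how text .proto content could be told apart from a base64-encoded descriptor. It is not the registry's or the SerDes libraries' actual logic. The function name `classify_protobuf_content` is hypothetical, and the check relies only on two facts: text .proto sources contain characters outside the base64 alphabet, and a serialized FileDescriptorProto or FileDescriptorSet normally starts with the tag byte 0x0A (field 1, length-delimited).

```python
import base64
import binascii


def classify_protobuf_content(content: str) -> str:
    """Heuristically classify schema content (hypothetical helper).

    Returns "text-proto", "base64-descriptor", or "unknown".
    """
    stripped = content.strip()
    try:
        # validate=True rejects any character outside the base64 alphabet,
        # which a text .proto file (spaces, quotes, braces) always contains.
        raw = base64.b64decode(stripped, validate=True)
    except (binascii.Error, ValueError):
        return "text-proto"
    # Field 1 with wire type 2 is encoded as the single tag byte 0x0A.
    if raw[:1] == b"\x0a":
        return "base64-descriptor"
    return "unknown"


text_schema = 'syntax = "proto3";\npackage test;\n\nmessage Dep {\n  string name = 1;\n}\n'
# First bytes of a descriptor for dep.proto: name = "dep.proto", package = "test".
descriptor_b64 = base64.b64encode(b"\n\tdep.proto\x12\x04test").decode("ascii")

print(classify_protobuf_content(text_schema))
print(classify_protobuf_content(descriptor_b64))
```

A parser that only handles the first shape fails on the second, which matches the symptom in both issues.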

Environment

Running Apicurio schema registry locally in a Docker container, using the Apicurio-provided UI (3.1.6) and SerDes libraries (3.1.6) to interact with the registry.
Validity Rule: FULL
Compatibility Rule: BACKWARD_TRANSITIVE
Integrity Rule: FULL

Steps to Reproduce : reproducer.txt

Issue 1: Schema syntax validation fails for schemas with references

dep.proto

syntax = "proto3";
package test;

message Dep {
  string name = 1;
}

root.proto

syntax = "proto3";
package test;
import "dep.proto";

message Root {
  Dep d = 1;
}

Execute the curl commands in the attached file, in the order they appear.
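
The exact requests are in reproducer.txt and are not reproduced here. As a rough sketch of what a registration with a reference might look like, the snippet below only builds the JSON body. The field names (`artifactId`, `artifactType`, `firstVersion`, `content`, `contentType`, `references`) are my assumption of the Registry v3 create-artifact shape and should be checked against the REST API docs. The group `default`, version `1`, the content type, and the placeholder content are hypothetical.

```python
import json


def artifact_reference(name: str, group_id: str, artifact_id: str, version: str) -> dict:
    """One reference entry; `name` is the import path used in root.proto."""
    return {
        "name": name,
        "groupId": group_id,
        "artifactId": artifact_id,
        "version": version,
    }


def create_artifact_body(artifact_id: str, content: str, content_type: str,
                         references=()) -> dict:
    """Assumed shape of a v3 create-artifact request body (not verified here)."""
    return {
        "artifactId": artifact_id,
        "artifactType": "PROTOBUF",
        "firstVersion": {
            "content": {
                "content": content,
                "contentType": content_type,
                "references": list(references),
            }
        },
    }


# Hypothetical values: dep.proto is assumed to be registered first as
# artifact "dep" in group "default" with version "1".
dep_ref = artifact_reference("dep.proto", "default", "dep", "1")
root_body = create_artifact_body(
    artifact_id="root",
    content="<base64-encoded descriptor of root.proto>",  # placeholder
    content_type="application/x-protobuf",  # assumption
    references=[dep_ref],
)

print(json.dumps(root_body, indent=2))
```

The point of the sketch is that the reference `name` has to match the import string in root.proto ("dep.proto") so the registry can resolve it during validation.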

Issue 2:
Register the schemas in the registry using the curl commands in the attached file, generate the Java stubs from the proto files, and use them in a producer and a consumer built on the Apicurio SerDes libraries.

Expected vs Actual Behaviour

Expected

Issue 1: Schema validation must pass for schemas with references when a base64-encoded protobuf schema descriptor is uploaded
Issue 2: SerDes libraries should work with base64-encoded protobuf schema descriptors

Actual

Issue 1: Schema validation fails for schemas with references when a base64-encoded protobuf schema descriptor is uploaded
Issue 2: SerDes libraries don't work with base64-encoded protobuf schema descriptors

Notes:

Background / Context

We are planning to use Apicurio Schema Registry for managing event schemas in our platform. Our ecosystem includes both Java and Go services, all of which interact with the registry via language-specific SerDes libraries.

I’m sharing some background on how we arrived at the issue described above.

Initial Approach

Initially, we stored text .proto files directly in the registry. Schemas were registered using the following flow:

GitHub repository (text .proto files)
  → middleware to fetch schemas and detect updates
  → sync to Apicurio Schema Registry

However, we encountered issues with the Apicurio SerDes libraries where compatibility checks for MESSAGE field types began to fail. (I can provide additional details if helpful.)

Descriptor-Based Approach

To work around this, we moved away from storing text .proto files and introduced an intermediate descriptor-based step:

GitHub repository (text .proto files)
  → generate protobuf descriptors
  → upload descriptors to S3
  → middleware fetches descriptors
  → convert descriptors back to text .proto using square-wireschema
  → register schemas in Apicurio

This resolved the compatibility issues and worked correctly for both Java and Go services initially.

Issue with map Types in Go

During further testing with a wider variety of schemas, we discovered that Go services failed for schemas containing map fields.

This turned out to be caused by:

  • How square-wireschema converts descriptors back into text .proto
  • Limitations in the Go-side parser we use (jhump/protoreflect), which was unable to parse the generated .proto files correctly. This same parser is used in confluent-kafka-go as well.

Current Direction

As a result, we decided to stop storing text .proto files altogether and instead store base64-encoded protobuf descriptors directly in the registry.

At this point, we started encountering the issues described above.
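
For readers unfamiliar with the format, the following sketch shows roughly what "a base64-encoded descriptor" means for dep.proto and root.proto above. It hand-encodes a minimal FileDescriptorProto for each file using the protobuf wire format and the field numbers from descriptor.proto, then base64-encodes the bytes. In practice the descriptors come from protoc, whose output carries more fields (for example json_name), so these bytes are illustrative and not identical to what we upload. All helper names are hypothetical.

```python
import base64


def varint(n: int) -> bytes:
    """Encode a non-negative integer as a protobuf varint."""
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def bytes_field(number: int, payload: bytes) -> bytes:
    """Length-delimited field (wire type 2): strings and nested messages."""
    return varint((number << 3) | 2) + varint(len(payload)) + payload


def int_field(number: int, value: int) -> bytes:
    """Varint field (wire type 0): int32 and enum values."""
    return varint(number << 3) + varint(value)


def file_descriptor(name: str, package: str, deps: list, message: bytes) -> bytes:
    """FileDescriptorProto: name=1, package=2, dependency=3, message_type=4, syntax=12."""
    out = bytes_field(1, name.encode()) + bytes_field(2, package.encode())
    for dep in deps:
        out += bytes_field(3, dep.encode())
    return out + bytes_field(4, message) + bytes_field(12, b"proto3")


# FieldDescriptorProto: name=1, number=3, label=4 (1 = optional),
# type=5 (9 = string, 11 = message), type_name=6.

# message Dep { string name = 1; }
dep_field = bytes_field(1, b"name") + int_field(3, 1) + int_field(4, 1) + int_field(5, 9)
dep_msg = bytes_field(1, b"Dep") + bytes_field(2, dep_field)
dep_file = file_descriptor("dep.proto", "test", [], dep_msg)

# message Root { Dep d = 1; }
root_field = (bytes_field(1, b"d") + int_field(3, 1) + int_field(4, 1)
              + int_field(5, 11) + bytes_field(6, b".test.Dep"))
root_msg = bytes_field(1, b"Root") + bytes_field(2, root_field)
root_file = file_descriptor("root.proto", "test", ["dep.proto"], root_msg)

dep_b64 = base64.b64encode(dep_file).decode("ascii")
root_b64 = base64.b64encode(root_file).decode("ascii")
print(dep_b64)
print(root_b64)
```

Note that the import in root.proto survives only as a `dependency` string ("dep.proto") and a fully qualified type name (".test.Dep"), so whatever validates root's descriptor has to resolve that name against the registered reference instead of parsing an import statement.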

Additional Context

I’ve reviewed the following PR and discussion:

#6833

Suggestion by @EricWittmann to convert descriptors to text .proto (e.g., via square-wireschema) before storing them

While this approach appears to work well for Java services, based on our experience there is a risk of breaking other language implementations, such as Go, due to parser differences.

Next Steps

I’m happy to contribute a PR to address this, potentially following a similar approach to:

#6833

Before proceeding, I’d appreciate guidance on the preferred direction and whether this aligns with the project’s expectations.
