Missing Field Validators for String-to-Enum Conversion in REST/JSON APIs #114

@aniketpalu

Describe the bug
When protobuf enums are serialized to JSON (common in REST APIs), they are typically represented as strings (e.g., "BATCH_FILE") rather than integers. The generated Pydantic models don't include field validators to handle this standard protobuf JSON serialization format, causing deserialization failures.
Dependencies

python version: sys.version_info(major=3, minor=11, micro=8, releaselevel='final', serial=0)

############# dependencies ############## 
    grpc:            1.62.3
    pydantic:        2.10.6

########## Expand dependencies ########## 
    mypy-protobuf:   3.3.0
    toml:            0.10.2

########## Format dependencies ########## 
    autoflake:       Not Install
    black:           Not Install
    isort:           Not Install

Protobuf File Content

Filename: feast/core/DataSource.proto

syntax = "proto3";
package feast.core;

message DataSource {
  enum SourceType {
    INVALID = 0;
    BATCH_FILE = 1;
    BATCH_BIGQUERY = 2;
    STREAM_KAFKA = 3;
  }

  SourceType type = 1;
  string name = 2;
}

CLI (if use plugin mode)

python -m grpc_tools.protoc \
-I. \
--protobuf-to-pydantic_out=. \
feast/core/DataSource.proto

Output content

Filename: feast/core/DataSource_p2p.py

from pydantic import BaseModel, Field
from enum import IntEnum

class DataSource(BaseModel):
  class SourceType(IntEnum):
    INVALID = 0
    BATCH_FILE = 1
    BATCH_BIGQUERY = 2
    STREAM_KAFKA = 3

  type: "DataSource.SourceType" = Field(default=0)
  name: str = Field(default="")

Expected behavior
The generated model should include a field validator to handle both integer and string representations:

from pydantic import BaseModel, Field, field_validator
from enum import IntEnum

class DataSource(BaseModel):
    class SourceType(IntEnum):
        INVALID = 0
        BATCH_FILE = 1
        BATCH_BIGQUERY = 2
        STREAM_KAFKA = 3
    
    type: "DataSource.SourceType" = Field(default=0)
    name: str = Field(default="")
    
    @field_validator('type', mode='before')
    @classmethod
    def validate_type(cls, v):
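        if isinstance(v, str):
            # Convert string enum names to members. Raise ValueError rather than
            # KeyError so pydantic reports unknown names as a ValidationError.
            try:
                return cls.SourceType[v]
            except KeyError:
                raise ValueError(f"unknown SourceType name: {v!r}") from None
        return v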

Reproduction:

# JSON from REST API (standard protobuf JSON format)
json_data = {"type": "BATCH_FILE", "name": "my_source"}

# Current behavior: FAILS
ds = DataSource(**json_data)
# ValidationError: Input should be a valid integer

# Expected behavior: SUCCEEDS
ds = DataSource(**json_data)
assert ds.type == DataSource.SourceType.BATCH_FILE
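
The failure above comes down to how Python enums are looked up: calling the enum class matches on a member's value, while subscripting matches on its name. The following is a minimal stdlib-only sketch of that difference. It re-declares `SourceType` from the generated model, and `lookup_by_value` / `lookup_by_name` are hypothetical helper names used only for illustration; pydantic is not involved, so this shows the mechanism and not pydantic's exact error path.

```python
from enum import IntEnum


class SourceType(IntEnum):
    # Same members as the generated DataSource.SourceType
    INVALID = 0
    BATCH_FILE = 1
    BATCH_BIGQUERY = 2
    STREAM_KAFKA = 3


def lookup_by_value(raw):
    """Call-style lookup: matches on the member's value (hypothetical helper)."""
    try:
        return SourceType(raw)
    except ValueError:
        return None


def lookup_by_name(raw):
    """Subscript-style lookup: matches on the member's name (hypothetical helper)."""
    try:
        return SourceType[raw]
    except KeyError:
        return None
```

An integer payload resolves through the value lookup, but the string name "BATCH_FILE" only resolves through the name lookup, which is the step the proposed validator adds.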

Additional context
According to the proto3 JSON mapping spec, enum values are serialized by name (as strings) in JSON, and parsers accept both enum names and integer values. This is the standard behavior for protobuf REST APIs, so generated Pydantic models should support this out of the box.
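
Since a JSON payload may carry either form, the conversion could be factored into one reusable function that generated validators delegate to. The sketch below is one possible shape, not the library's implementation: `coerce_proto_enum` is a hypothetical name, and `SourceType` is re-declared here only to keep the example self-contained.

```python
from enum import IntEnum
from typing import Type, TypeVar, Union

E = TypeVar("E", bound=IntEnum)


class SourceType(IntEnum):
    # Same members as the generated DataSource.SourceType
    INVALID = 0
    BATCH_FILE = 1
    BATCH_BIGQUERY = 2
    STREAM_KAFKA = 3


def coerce_proto_enum(enum_cls: Type[E], raw: Union[str, int]) -> E:
    """Hypothetical helper: map an enum name or number to an IntEnum member."""
    if isinstance(raw, enum_cls):
        # Already a member of the target enum; nothing to convert.
        return raw
    if isinstance(raw, str):
        try:
            # Name lookup, e.g. "BATCH_FILE" -> SourceType.BATCH_FILE
            return enum_cls[raw]
        except KeyError:
            raise ValueError(
                f"{raw!r} is not a valid {enum_cls.__name__} name"
            ) from None
    # Value lookup; the enum itself raises ValueError for unknown numbers.
    return enum_cls(raw)
```

A `mode='before'` field validator could then simply return `coerce_proto_enum(cls.SourceType, v)`. Because the helper raises `ValueError` for unknown names and numbers, pydantic would surface those as a `ValidationError` instead of an uncaught `KeyError`.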
