Contributor

@duckki duckki commented Oct 23, 2025

Motivation

This PR fixes an issue where the iter_origins() methods fail to return the Definition origin even when the element has a (non-extension) definition. This can happen with the schema definition and most type definitions.

Example

            type T # empty type definition

            extend type T { # an extension with a field
                field: Boolean
            }

The iter_origins() method on type T returns only an Extension origin, not Definition. That's because the definition of type T has neither fields nor directive applications, and the current implementation does not record an origin at all in this case.

Fix

This PR adds a definition_origin field to SchemaDefinition and to other type definition structs like Scalar and ObjectType. The definition_origin field is expected to hold Some(ComponentOrigin::Definition) if a schema element has a non-extension definition, and None otherwise.

The field holds a ComponentOrigin value rather than a bool because iter_origins() is expected to return a reference to a ComponentOrigin value, and storing the value in the struct lets the method return a reference to an object owned by self.
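To illustrate the reference-returning constraint, here is a minimal sketch with simplified stand-in types. It is not the actual apollo-compiler code: the `u32` extension id, the `field_origins` vector, and `example_type()` are assumptions made only for this illustration, and the real method may order or deduplicate origins differently.

```rust
// Simplified stand-ins; the real apollo-compiler types carry more data.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ComponentOrigin {
    Definition,
    Extension(u32), // hypothetical stand-in for ExtensionId
}

struct ObjectType {
    // Some(Definition) if a non-extension definition exists, None otherwise.
    definition_origin: Option<ComponentOrigin>,
    // Origins of the fields, in place of full field components.
    field_origins: Vec<ComponentOrigin>,
}

impl ObjectType {
    // Yields the definition origin (if any), then each field's origin.
    // Because the value lives in `self`, we can hand out `&ComponentOrigin`.
    fn iter_origins(&self) -> impl Iterator<Item = &ComponentOrigin> + '_ {
        self.definition_origin
            .iter()
            .chain(self.field_origins.iter())
    }
}

// Models `type T` followed by `extend type T { field: Boolean }`.
fn example_type() -> ObjectType {
    ObjectType {
        definition_origin: Some(ComponentOrigin::Definition),
        field_origins: vec![ComponentOrigin::Extension(0)],
    }
}
```

With a plain bool there would be no ComponentOrigin value inside the struct to borrow from, which is the reason given above for storing the origin itself.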

Note: Unfortunately, this PR would be a breaking change.

Downstream fix PR: apollographql/router#8475

pub directives: DirectiveList,
/// Non-extension definition origin, if exists.
/// - We hold the origin here, so its reference can be returned from `iter_origins()`.
pub definition_origin: Option<ComponentOrigin>,

Contributor

As you already called out, this will be a breaking change to the API. One thing I'd like to explore is if this is the right pattern for including this. Typically, we encode a Node plus a ComponentOrigin by wrapping a type in Component<T>, which is why we can get the origin for most of the children in the AST. Since the concept of origin is somewhat orthogonal to the data stored in each type, I think we should consider changing this to the following:

  • Schema::schema_definition should be a Component<SchemaDefinition> instead of Node<SchemaDefinition>
  • Similarly, we should update ExtendedType to hold Component<T> instead of Node<T> for each type
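A rough sketch of what the two bullets above could look like, using simplified stand-ins rather than the crate's real definitions. `Node<T>` is modeled here as a plain `Arc<T>`, the `u32` extension id is a placeholder, and `is_extension_only()` is a hypothetical helper that does not exist in the library.

```rust
use std::ops::Deref;
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq, Eq)]
enum ComponentOrigin {
    Definition,
    Extension(u32), // hypothetical stand-in for ExtensionId
}

// Stand-in for Node<T>; the real type also tracks source location.
type Node<T> = Arc<T>;

// A node paired with the origin it came from.
struct Component<T> {
    origin: ComponentOrigin,
    node: Node<T>,
}

impl<T> Deref for Component<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.node
    }
}

struct SchemaDefinition {
    description: Option<String>,
}

struct Schema {
    // Component<SchemaDefinition> instead of Node<SchemaDefinition>.
    schema_definition: Component<SchemaDefinition>,
}

impl Schema {
    // Hypothetical helper: true when only extensions contributed.
    fn is_extension_only(&self) -> bool {
        matches!(self.schema_definition.origin, ComponentOrigin::Extension(_))
    }
}
```

Since the wrapper dereferences to the inner type, existing field reads such as `schema.schema_definition.description` would keep compiling in this model, while the origin stays separate from the data.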

I think that would be better aligned with the current implementation of this library. I also have a couple other notes that I'm curious if you have an opinion on:

  1. Knowing how we're consuming this downstream, I think the actual thing we want to check is if the ExtendedType we get came from an extension or not (i.e. I think it's a logical bug, or at least a wasted check, to check the origins of the children in the AST). This may or may not inform the final solution here, but I just wanted to call that out.
  2. One unexpected thing I came across yesterday was that ExtensionId is actually an Arc-wrapped source span. I naively assumed it would be something like an AtomicUsize. This doesn't affect the current changes, but it does have bearing on my next comment.
  3. Since we're adding ComponentOrigin to more of the places where we are holding Node<T>, and whatever solution we come to will likely be a breaking API change, I'm wondering if this is the right time to consolidate Component<T> and Node<T>. I don't think naively adding the origin as-is to the current node header would be performant enough, since historically we've cared about how many bytes of overhead we add to each Node. However, I could see some AtomicUsize or even a simple extension/non-extension boolean flag giving us enough information for our use case.

I'll stop there. Curious to get your thoughts.

Contributor

There's also some context on a similar issue regarding DirectiveList in #851

Contributor Author

@duckki duckki Oct 25, 2025

Thanks for bringing up #851.

Schema elements combine an implicit definition, an explicit definition, and multiple extensions, so they can naturally have multiple origins, or none at all if everything is implicit. We could elect one origin to represent the Component, but unfortunately Component currently has no way to express "implicit".

If we used Component across the board, the origin would have to be optional or gain a new variant like Builtin and/or Implicit.

BTW, the origin election would follow this order: builtin/implicit < extension < definition. So, the highest-ranked origin present will represent the element. That's enough info to fix this PR's issue without adding the definition_origin field.
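The election order could be sketched as follows. `ElectedOrigin` and `elect_origin()` are hypothetical names for this illustration only; the current ComponentOrigin has no implicit or built-in variant.

```rust
// Variants are declared lowest to highest, so the derived `Ord`
// gives: Implicit < Extension < Definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ElectedOrigin {
    Implicit, // built-in or implicit (hypothetical variant)
    Extension,
    Definition,
}

// Picks the highest-ranked origin among everything that contributed
// to a schema element; falls back to Implicit when nothing did.
fn elect_origin(contributions: &[ElectedOrigin]) -> ElectedOrigin {
    contributions
        .iter()
        .copied()
        .max()
        .unwrap_or(ElectedOrigin::Implicit)
}
```

For the example in the PR description, an empty definition plus one extension would elect Definition, which is the origin that is currently missing.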

Contributor

I'm not entirely sure what you mean by implicit in this case. At least in terms of how ComponentOrigin is currently set up, the only options are Definition or Extension. In the usual case, a type's origin should always be Definition, but we run into the case where it may only have extension definitions when we use the adopt_orphan_extensions option.

What I'm proposing is that we use ComponentOrigin::Extension(_) to capture that case instead of definition_origin: None. I think that's more idiomatic with the current patterns in this crate.

Contributor Author

@duckki duckki Oct 27, 2025

A schema element such as a built-in type may have neither a definition nor extensions. In that case we won't even have an extension id to store.

schema_definition: Node::new(SchemaDefinition {

Alternatively, we can treat implicit definitions as "Definition" and distinguish them with Node::is_built_in(). But then we need to check that in iter_origins().

Contributor Author

Thinking more about it, it will work without adding built-in/implicit variants.

Previously, I thought a built-in type like String could be extended with or without a base definition, and that only the latter is an EXTENSION_WITH_NO_BASE error. That would mean we need to tell a built-in definition apart from an explicit base definition.

However, I realized that situation won't happen:

  • Schema definitions are always allowed to be extended without a base
    • Thus, (built-in def + extension) and (explicit def + extension) behave the same.
  • Built-in types are not allowed to be extended at all (at least JS federation prevents it).
    • So, built-in types won't cause EXTENSION_WITH_NO_BASE errors anyway.
    • Rust composition currently allows extending built-in scalar types, but built-in types are excluded from EXTENSION_WITH_NO_BASE checks.
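The reasoning in the list above can be restated as a small decision function. This is only a reading of the comment, not the actual composition code; `ElementKind` and `is_extension_with_no_base()` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ElementKind {
    SchemaDefinition,
    BuiltInType,
    UserType,
}

// Returns true when an element should be reported as
// EXTENSION_WITH_NO_BASE, following the cases listed above.
fn is_extension_with_no_base(
    kind: ElementKind,
    has_definition: bool,
    has_extension: bool,
) -> bool {
    match kind {
        // Schema definitions may always be extended without a base,
        // and built-in types are excluded from the check.
        ElementKind::SchemaDefinition | ElementKind::BuiltInType => false,
        // Other types need an explicit definition to extend.
        ElementKind::UserType => has_extension && !has_definition,
    }
}
```

Under this reading, the check never needs to know whether a definition was built-in or explicit, which is why no extra origin variant is required.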

Contributor Author

duckki commented Oct 29, 2025

#1012 is an alternative proposal that does not break API.

Labels

apollo-compiler-2.0 Potential breaking API changes
