@ElliottjPierce
Contributor

Objective

Works towards #18276.
This does not do system initialization with &World.

This makes WorldQuery::init_state only take &World, and by extension QueryState::new, etc.

Solution

  • Make Bundle::component_ids also work with queued component registration.
  • Change WorldQuery::init_state from &mut World to &World.
  • Change implementations to use queued component registration.
  • Initialize AssetChanges resource in init_asset instead of AssetChanged::init_state.

Testing

  • CI

@ElliottjPierce ElliottjPierce added A-ECS Entities, components, systems, and events C-Usability A targeted quality-of-life change that makes Bevy easier to use S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Jan 23, 2026
@hymm
Contributor

hymm commented Jan 23, 2026

Am I right that queueing the reservations is going to be slower? Do we have any idea how much slower this is for initializing the state?

@ElliottjPierce
Contributor Author

Am I right that queueing the reservations is going to be slower? Do we have any idea how much slower this is for initializing the state?

Regardless of using queued vs un-queued registration, it first looks to see if the component is already registered. In the un-queued case, if not, it starts the whole registration process, which could take a while. In the queued case, it acquires a lock on the queue, reserves an id, and drops the lock. The registration happens afterwards in World::flush.

So, I think the only way this could have perf problems is if a ton of query states are made about components that are not registered, and nothing flushes the world, spawns the component, or does anything in between to register it. And even if all that does happen, the only real perf cost is getting a RwLock, which is negligible since this isn't really ever happening in a tight loop IIRC.
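
The two registration paths described above can be sketched with the standard library alone. This is a minimal illustration, not Bevy's implementation: `Registry`, `queue_register`, `register`, `flush`, and `is_registered` are hypothetical names, and the real registration process does far more work than inserting into a map.

```rust
use std::any::TypeId;
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::RwLock;

/// Hypothetical stand-in for a component registry; not Bevy's real type.
#[derive(Default)]
pub struct Registry {
    /// Fully registered components; only changed with exclusive access.
    registered: HashMap<TypeId, usize>,
    /// Ids reserved through shared access, waiting for `flush`.
    queued: RwLock<HashMap<TypeId, usize>>,
    /// Next free id, shared by both paths.
    next_id: AtomicUsize,
}

impl Registry {
    /// Un-queued path: needs `&mut self` and registers immediately.
    pub fn register<T: 'static>(&mut self) -> usize {
        let key = TypeId::of::<T>();
        if let Some(&id) = self.registered.get(&key) {
            return id;
        }
        // Reuse an id that was already reserved through the queued path.
        let reserved = self.queued.get_mut().unwrap().remove(&key);
        let id = reserved.unwrap_or_else(|| self.next_id.fetch_add(1, Ordering::Relaxed));
        self.registered.insert(key, id);
        id
    }

    /// Queued path: only needs `&self`. Locks only for unseen types.
    pub fn queue_register<T: 'static>(&self) -> usize {
        let key = TypeId::of::<T>();
        // Already registered: no lock at all.
        if let Some(&id) = self.registered.get(&key) {
            return id;
        }
        // Already reserved: a read lock is enough.
        if let Some(&id) = self.queued.read().unwrap().get(&key) {
            return id;
        }
        // First sighting: take the write lock, reserve an id, drop the lock.
        let mut queued = self.queued.write().unwrap();
        let id = *queued
            .entry(key)
            .or_insert_with(|| self.next_id.fetch_add(1, Ordering::Relaxed));
        id
    }

    /// Exclusive step that turns reservations into registrations.
    pub fn flush(&mut self) {
        let queued = std::mem::take(self.queued.get_mut().unwrap());
        self.registered.extend(queued);
    }

    pub fn is_registered<T: 'static>(&self) -> bool {
        self.registered.contains_key(&TypeId::of::<T>())
    }
}
```

In this sketch, a type that is already registered never touches the lock, which matches the claim that the lock only matters the first time a type is seen.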

@ElliottjPierce
Contributor Author

When it comes to bevy_trait_query, I think they can work with this too. Looking at their impl, they only use mutable world access to initialize a resource and seal it. Sealing it is just setting a bool flag, so that can be done with atomics. If the resource doesn't exist, this initializes it, but it also warns because if the resource wasn't there, the trait query is useless anyway. IIUC, bevy_trait_query might not have as good warning and stuff, but it would still work without mutable world access I think.
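
As a rough illustration of "sealing with atomics", the sketch below flips a flag through a shared reference. `TraitRegistry` and its methods are hypothetical names chosen for this example; they are not taken from bevy_trait_query.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical registry resource; the name and field are illustrative only.
#[derive(Default)]
pub struct TraitRegistry {
    sealed: AtomicBool,
}

impl TraitRegistry {
    /// Seals through `&self`. Returns `true` if this call did the sealing.
    pub fn seal(&self) -> bool {
        // `swap` returns the previous value, so `false` means we sealed it.
        !self.sealed.swap(true, Ordering::AcqRel)
    }

    /// Reports whether the registry has been sealed.
    pub fn is_sealed(&self) -> bool {
        self.sealed.load(Ordering::Acquire)
    }
}
```

Because `seal` takes `&self`, it could be called from code that only has shared access to the resource.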

@alice-i-cecile alice-i-cecile added X-Uncontroversial This work is generally agreed upon D-Modest A "normal" level of difficulty; suitable for simple features or challenging fixes labels Jan 25, 2026
///
/// - Asset changes are registered in the [`AssetEventSystems`] system set.
/// - Removed assets are not detected.
/// - The asset must be initialized ([`App::init_asset`](crate::AssetApp::init_asset)).
Member

This is a totally fine limitation: this filter is useless without initialized assets.

@alice-i-cecile alice-i-cecile added the M-Release-Note Work that should be called out in the blog due to impact label Jan 25, 2026
@github-actions
Contributor

It looks like your PR has been selected for a highlight in the next release blog post, but you didn't provide a release note.

Please review the instructions for writing release notes, then expand or revise the content in the release notes directory to showcase your changes.

@alice-i-cecile alice-i-cecile added the M-Migration-Guide A breaking change to Bevy's public API that needs to be noted in a migration guide label Jan 25, 2026
@github-actions
Contributor

It looks like your PR is a breaking change, but you didn't provide a migration guide.

Please review the instructions for writing migration guides, then expand or revise the content in the migration guides directory to reflect your changes.

@alice-i-cecile
Member

The ability to do World::query with only &World is a major papercut resolved. I'd like a very brief release note celebrating this.

There's also some docs and deprecation work to do, but please ping me for a re-review when those comments are addressed.

@chengts95

chengts95 commented Jan 25, 2026

I have two concerns:

  1. RwLock is used for multi-thread synchronization; now it seems that it is used for only bypassing mutable reference. Using synchronization primitives just to turn a &self into a 'pseudo-mutable' state is questionable. If the World must eventually be mutated, the API should reflect that reality. It is uncertain if the API, now looking thread-safe, can ensure safety in multi-thread calls.

  2. Directly altering the World::query signature constitutes a significant breaking change. Deprecating try_query does not mitigate the fact that existing call sites for World::query will fail to compile. We should consider a phased migration or a parallel API to maintain backward compatibility.

My suggestion would be: keep World::query unchanged, deprecate try_query, and add read_only_query for this. If the query does need to mutate World, we should not hide it, and the &mut World will make the compiler check exclusive access as usual.

@alice-i-cecile
Member

RwLock is used for multi-thread synchronization; now it seems that it is used for only bypassing mutable reference. Using synchronization primitives just to turn a &self into a 'pseudo-mutable' state is questionable. If the World must eventually be mutated, the API should reflect that reality. It is uncertain if the API, now looking thread-safe, can ensure safety in multi-thread calls.

This is a sensible concern and should be discussed / debated. Tests to verify that this works during multithreaded operation would be appreciated at the least.

Directly altering the World::query signature constitutes a significant breaking change. Deprecating try_query does not mitigate the fact that existing call sites for World::query will fail to compile. We should consider a phased migration or a parallel API to maintain backward compatibility.

I am completely fine with this as a breaking change given Bevy's level of stability. The migration is very easy, and that's what migration guides are for.

@alice-i-cecile alice-i-cecile added D-Complex Quite challenging from either a design or technical perspective. Ask for help! and removed X-Uncontroversial This work is generally agreed upon D-Modest A "normal" level of difficulty; suitable for simple features or challenging fixes labels Jan 26, 2026
@alice-i-cecile alice-i-cecile added the X-Contentious There are nontrivial implications that should be thought through label Jan 26, 2026
@ElliottjPierce
Contributor Author

  1. RwLock is used for multi-thread synchronization; now it seems that it is used for only bypassing mutable reference. Using synchronization primitives just to turn a &self into a 'pseudo-mutable' state is questionable. If the World must eventually be mutated, the API should reflect that reality.

Well, this probably will be used for multi-thread synchronization since systems that take &World can run in parallel, but I get your point.

The short answer is that we consider things like component registrations, new tables, etc not as conceptual mutations. That is, even though they typically need &mut World, they are not conceptual changes in state. A conceptual state change is, for example, changing the component on an entity, despawning entities, etc.

From a user's perspective, a World is just a collection of entities with their components. It's easy to understand why interacting with those entities or component values mutably requires mutable world access. But when we require &mut World for things like "What components does component T require?" or "Which entities have component T?", users view the &mut World as a technical limitation since these questions are readonly in essence and only require mutations due to their implementation.

At least, this is what I gathered from discord conversations that ultimately led toward #18173.

It is uncertain if the API, now looking thread-safe, can ensure safety in multi-thread calls.

Multithreading does make things a bit more complex, but I don't think we have to worry about it here. The only RwLock this uses is in the component registration from #18173, which had lots of good eyes go over it. When the lock is used, nothing in the guard's path can re-lock or panic, aside from maybe an OOM, which would be a moot point. And the lock is private, so it is impossible to cause a deadlock or poisoning here. I could be wrong, but I think this is thread safe.

I'd be happy to write some tests to double check lock contention, but I don't think there's any way to create that situation in a test. Still, I understand your concern; I'm just not sure how to definitively prove its correctness here. Was there anything in particular in #18173 that looks suspicious to you?
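
A smoke test along these lines might look like the sketch below. It is only an illustration under stated assumptions: `ReservationQueue` and `reserve_in_parallel` are hypothetical std-only stand-ins, not Bevy types. Such a test cannot force a particular interleaving or prove correctness. It only checks that threads reserving the same names concurrently end up agreeing on the ids.

```rust
use std::collections::HashMap;
use std::sync::RwLock;
use std::thread;

/// Hypothetical queue of reserved ids, keyed by a component name.
#[derive(Default)]
pub struct ReservationQueue {
    reserved: RwLock<HashMap<&'static str, usize>>,
}

impl ReservationQueue {
    /// Returns the id for `name`, reserving a new one on first sight.
    pub fn reserve(&self, name: &'static str) -> usize {
        // Fast path: the name was reserved earlier, a read lock suffices.
        if let Some(&id) = self.reserved.read().unwrap().get(name) {
            return id;
        }
        // Slow path: re-check under the write lock, then insert.
        let mut reserved = self.reserved.write().unwrap();
        let next = reserved.len();
        let id = *reserved.entry(name).or_insert(next);
        id
    }

    /// Number of distinct names reserved so far.
    pub fn len(&self) -> usize {
        self.reserved.read().unwrap().len()
    }
}

/// Reserves the same names from several threads and returns each thread's ids.
pub fn reserve_in_parallel(
    queue: &ReservationQueue,
    names: &[&'static str],
    threads: usize,
) -> Vec<Vec<usize>> {
    thread::scope(|scope| {
        let handles: Vec<_> = (0..threads)
            .map(|_| {
                scope.spawn(|| names.iter().map(|&n| queue.reserve(n)).collect::<Vec<_>>())
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

The `entry` call under the write lock is what keeps two racing threads from handing out different ids for the same name.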

@chengts95

chengts95 commented Jan 26, 2026

  1. RwLock is used for multi-thread synchronization; now it seems that it is used for only bypassing mutable reference. Using synchronization primitives just to turn a &self into a 'pseudo-mutable' state is questionable. If the World must eventually be mutated, the API should reflect that reality.

Well, this probably will be used for multi-thread synchronization since systems that take &World can run in parallel, but I get your point.

The short answer is that we consider things like component registrations, new tables, etc not as conceptual mutations. That is, even though they typically need &mut World, they are not conceptual changes in state. A conceptual state change is, for example, changing the component on an entity, despawning entities, etc.

From a user's perspective, a World is just a collection of entities with their components. It's easy to understand why interacting with those entities or component values mutably requires mutable world access. But when we require &mut World for things like "What components does component T require?" or "Which entities have component T?", users view the &mut World as a technical limitation since these questions are readonly in essence and only require mutations due to their implementation.

At least, this is what I gathered from discord conversations that ultimately led toward #18173.

It is uncertain if the API, now looking thread-safe, can ensure safety in multi-thread calls.

Multithreading does make things a bit more complex, but I don't think we have to worry about it here. The only RwLock this uses is in the component registration from #18173, which had lots of good eyes go over it. When the lock is used, nothing in the guard's path can re-lock or panic, aside from maybe an OOM, which would be a moot point. And the lock is private, so it is impossible to cause a deadlock or poisoning here. I could be wrong, but I think this is thread safe.

I'd be happy to write some tests to double-check lock contention, but I don't think there's any way to create that situation in a test. Still, I understand your concern; I'm just not sure how to definitively prove its correctness here. Was there anything in particular in #18173 that looks suspicious to you?

What I was mentioning is that &mut is not only used for mutation; it also ensures the &mut World is accessed exclusively, so the compiler will not share it across multiple threads. I believe component registration is ultimately sequential anyway in this RwLock design, so this should be documented with warnings to users.

In fact, from my parallel computing experience, implicit locks and sync points are not desirable in a base framework. The query API does not imply locking semantics. Now the sync point is hard to identify because there are many layers. It seems that it will happen both when queueing new component IDs and in the flush stage. The original API is explicit and deterministic because it enforced exclusive access, but the new query behavior is quite different. That is one major reason I believe the original API should not be removed: to avoid unintended side effects. With the whole plan to make world access read-only, the locks on individual resources may cause further non-determinism in parallel execution, because these locks are implicitly injected into APIs that were originally lock-free.

In addition, the query may return & mut references to components in a read-only and immutable & context, which may cause confusion and should be mentioned in the documentation of the whole design intention (since not everybody knows the internals of an ECS). Therefore, I was in favor of a staged transition for new APIs instead of directly modifying old APIs.

If everyone is comfortable with this, then I consider the PR good to go.

@ElliottjPierce
Contributor Author

What I was mentioning is that &mut is not only used for mutation; it also ensures the &mut World is accessed exclusively, so the compiler will not share it across multiple threads. I believe component registration is ultimately sequential anyway in this RwLock design, so this should be documented with warnings to users.

In fact, from my parallel computing experience, implicit locks and sync points are not desirable in a base framework. The query API does not imply locking semantics. Now the sync point is hard to identify because there are many layers. It seems that it will happen both when queueing new component IDs and in the flush stage. The original API is explicit and deterministic because it enforced exclusive access, but the new query behavior is quite different. That is one major reason I believe the original API should not be removed: to avoid unintended side effects. With the whole plan to make world access read-only, the locks on individual resources may cause further non-determinism in parallel execution, because these locks are implicitly injected into APIs that were originally lock-free.

You're absolutely right about the compiler here, but I'm not sure what kind of warning would be appropriate. The only time this locks is the first time a world sees a component type. There's no locking during flushing or anything, and most queries would not lock besides those made during startup. This shouldn't cause any non-determinism because component registration is order independent.

I do get your concern though. Choosing which API is best is not up to me, but I think this is a win for most users. If anyone is worried about the order their components are registered in, they are probably registering them manually already.

In addition, the query may return & mut references to components in a read-only and immutable & context, which may cause confusion and should be mentioned in the documentation of the whole design intention (since not everybody knows the internals of an ECS). Therefore, I was in favor of a staged transition for new APIs instead of directly modifying old APIs. But if that what people desire,

Maybe I'm not understanding, but there is no way to get mutable world data from &World through a query without unsound unsafe code. I don't think there's anything to worry about here, but if I'm missing something, do let me know.

@chengts95

chengts95 commented Jan 26, 2026

Maybe I'm not understanding, but there is no way to get mutable world data from &World through a query without unsound unsafe code. I don't think there's anything to worry about here, but if I'm missing something, do let me know.

I had forgotten whether we could query &mut from &mut world.query; sorry, my bad. There is no such issue for query.

@chescock
Contributor

So, I think the only way this could have perf problems is if a ton of query states are made about components that are not registered, and nothing flushes the world, spawns the component, or does anything in between to register it.

I think the issue is that this is exactly what happens when initializing the schedule for the first time! Components that get spawned by the startup schedule will be registered on spawn, but most other components are registered for the first time when we initialize a system that has a Query parameter, and so will be affected by this change.

Contributor

@chescock chescock left a comment

Yay, I'm glad this is happening!

I think someone needs to check the performance implications, both of always initializing AssetChanges<A> and of the extra atomic RwLocks during system initialization. But I don't feel qualified to know what's acceptable there, so I get to click Approve :).

));
}
self.insert_resource(assets)
.init_resource::<AssetChanges<A>>()
Contributor

I think the reason this used to be initialized only in the AssetChanged filter is to avoid the overhead of updating the resource if nothing was using it. There's an asset_events system that takes Option<ResMut<AssetChanges<A>>> and only updates it if the resource exists:

asset_changes: Option<ResMut<AssetChanges<A>>>,

I don't know enough about assets to evaluate how important that is, though.


If we do need to preserve that behavior, I think there are still ways to do it with only &World. Maybe a resource with a ConcurrentQueue<fn(&mut World)>? AssetChanged::init_state would push |world| world.init_resource::<AssetChanges<A>>() and an exclusive system that runs before(AssetEventSystems) would drain the queue and run the function pointers? AssetChanged filters are rare enough that the extra atomics during initialization shouldn't be too expensive.
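
The deferred-initialization idea could be sketched roughly as follows. This is an assumption-laden illustration, not Bevy code: `World` here is a toy struct, `defer` and `run_deferred` are hypothetical names, and a `Mutex<Vec<_>>` stands in for the `ConcurrentQueue` mentioned above.

```rust
use std::sync::Mutex;

/// Queue of deferred initializers; a `Mutex<Vec<_>>` replaces `ConcurrentQueue` here.
#[derive(Default)]
pub struct DeferredQueue {
    pending: Mutex<Vec<fn(&mut World)>>,
}

/// Toy stand-in for `World`; it only tracks whether one resource exists.
#[derive(Default)]
pub struct World {
    pub asset_changes_initialized: bool,
    deferred: DeferredQueue,
}

impl World {
    /// Callable with `&World`, as `init_state` would be: it only queues work.
    pub fn defer(&self, f: fn(&mut World)) {
        self.deferred.pending.lock().unwrap().push(f);
    }

    /// Exclusive step: drains the queue and runs every queued function.
    /// Returns how many functions ran.
    pub fn run_deferred(&mut self) -> usize {
        let pending = std::mem::take(self.deferred.pending.get_mut().unwrap());
        let count = pending.len();
        for f in pending {
            f(self);
        }
        count
    }
}

/// Plays the role of `|world| world.init_resource::<AssetChanges<A>>()`.
pub fn init_asset_changes(world: &mut World) {
    world.asset_changes_initialized = true;
}
```

In this sketch the resource only comes into existence if something queued the initializer, which is the behavior the comment above wants to preserve.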

Contributor Author

Yeah, it's not ideal. I'd like more information about the perf implications here.

Contributor

I don't have a strong opinion here. It's quite unfortunate to have this running for every asset type though - some asset types may mutate frequently (e.g., Materials) but those asset types are also most likely to actually take advantage of this feature.

I wouldn't block on preserving this unless folks complain (which seems unlikely). With assets-as-entities unblocked, we may also throw out this type entirely, so putting the work in to fix it seems a little early.

  let event_key = world.register_event_key::<E>();
- let components = B::component_ids(&mut world.components_registrator());
+ let components = B::component_ids(&mut world.components_registrator())
+     .collect::<smallvec::SmallVec<[ComponentId; 16]>>();
Contributor

If I understand correctly, this collect is because component_ids captures the lifetime of the ComponentsRegistrator type now that it's a generic parameter, which means this now conflicts with the world borrow in world.get_mut::<Observer>. And the compiler forces it to capture that lifetime, even though the return types are always actually 'static and the use<> syntax looks like it would support leaving some parameters out.

But this only actually allocates when creating an observer with more than 16 components, which should basically never happen and which is already allocating to box the system, so the cost is pretty low.

Contributor Author

Exactly. We could monomorphize component_ids ourselves to fix this, but that didn't feel worth it. And this should be fixable once we can specify the use<> bounds. I also tried adding a 'static bound, but that didn't work either.

@alice-i-cecile
Member

@andriyDev, can I have your opinions here on the asset changes?

@ElliottjPierce
Contributor Author

I think at this point we only have three unresolved questions:

Performance impact of queued component registration

For components that have not been registered, this adds a RwLock lock. The only time this would affect performance in a meaningful way is during the app's startup. I don't see this being an issue, but who knows.

Is this area's performance important? If so, I'll add some benches for it. (Right now, nothing benches app startup costs like registering systems.)

We can also consider calling World::flush_components in Query's SystemParam::init_state, which would ensure the lock is only acquired once per component type. But I honestly would guess the lock would be faster than checking if the queue is empty.

Does app startup performance matter enough to create benchmarks for it?

Performance impact of AssetChanges resource

The resource now always exists, meaning asset changes are now always tracked even when nothing is using that information. Is this used often enough or is this expensive enough for this to impact performance? I would guess that the few kinds of assets being changed (that would have these events) would also use the query filter, but I really don't know.

If it is a performance problem, we could do a concurrent queue to queue a command that would add the resource like @chescock suggested. We could also make a helper in AssetApp and ask users to enable asset change tracking before using the query filter. There might also be a way to use atomics within the resource to enable or disable it, but I'd prefer to avoid that.

Are there enough frequently changed asset types that don't use AssetChanges for these changes to have an impact, and if so, what solution would we like?

Impact on bevy-trait-query

I've looked at their implementation, and I think it should be pretty easy for them to migrate that lib. But I'd appreciate a second opinion. That's an important crate, and I'd hate to break it here.

@alice-i-cecile
Member

Does app startup performance matter enough to create benchmarks for it?

For Bevy, IMO no. At least not unless we're talking about seconds. I also expect that windowing and rendering costs are going to absolutely dwarf micro-optimizations inside the ECS.

The resource now always exists, meaning asset changes are now always tracked even when nothing is using that information. Is this used often enough or is this expensive enough for this to impact performance? I would guess that the few kinds of assets being changed (that would have these events) would also use the query filter, but I really don't know.

My feeling is that this is in the noise. Virtually every asset type should be using AssetChanged, although not all of them currently are, leading to subtle bugs when they're updated.

If it is a performance problem, we could do a concurrent queue to queue a command that would add the resource like @chescock suggested. We could also make a helper in AssetApp and ask users to enable asset change tracking before using the query filter. There might also be a way to use atomics within the resource to enable or disable it, but I'd prefer to avoid that.

We should avoid adding complexity unless we have concrete evidence that this is a problem we need to solve. Even then, we should probably split that into a separate PR.

I've looked at their implementation, and I think it should be pretty easy for them to migrate that lib. But I'd appreciate a second opinion. That's an important crate, and I'd hate to break it here.

We could open a draft PR for the crate, targeting this branch, and see if we can get it to compile?

Labels

A-ECS Entities, components, systems, and events C-Usability A targeted quality-of-life change that makes Bevy easier to use D-Complex Quite challenging from either a design or technical perspective. Ask for help! M-Migration-Guide A breaking change to Bevy's public API that needs to be noted in a migration guide M-Release-Note Work that should be called out in the blog due to impact S-Needs-Review Needs reviewer attention (from anyone!) to move forward X-Contentious There are nontrivial implications that should be thought through

6 participants