@csyshing csyshing commented Jan 8, 2026

Hi,

Our production teams have recently reported performance issues when loading large scenes or activating/deactivating the parent prim of extensive environment sets.

While PR-4275 has provided significant improvement, the loading performance remains suboptimal from a user experience perspective, primarily because our production scenes often contain hundreds or thousands of instances and point instancers.

We believe that, in addition to PR-4275, further improvements can be made in two other places. This PR is intended as a starting point for review and discussion:

  • Confirming the validity and relevance of the problems outlined here.
  • Gathering feedback on the changes we made: potential risks, overlooked issues, anything missing, and how to continue refining the code.

The example scene in this PR is moderately sized and typical of our production assets: it features a significant number of instances and point instancers, and has over 50k prims in the hierarchy. We are not able to share the actual production scene, but the issues described here should be reproducible with any USD file with instances, such as ALab.

1. Unnecessary UFE notifications are being triggered by root prototypes like "/__Prototype_xxxx".

The Problem:

  • These root prototypes cause a performance hit: with around 1000+ of them in this example scene, they take about 2 seconds.
  • In typical production scenes with many more instances, this loading time will increase proportionally.
  • The issue is observed in the stageChanged() function:
01_BEFORE_root_prototypes_reloading_stage.mp4
  • The issue is also observed when activating the parent prim of two environment sets:
01_BEFORE_root_prototypes_activate_prims.mp4

Because these prototypes are implicitly managed by USD, are not user-editable, and are invisible in the Maya Outliner, it seems reasonable to ignore them to reduce the number of UFE notifications (commit b7838fe):

01_AFTER_ignore_root_prototypes_reloading_stage.mp4
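
The filtering idea above can be sketched with plain strings. This is a minimal illustration only, not the code in commit b7838fe: `isRootPrototypePath` and `pathsToNotify` are hypothetical names, and the check assumes root prototypes are always single-element paths starting with `/__Prototype_`. A real implementation would more likely rely on USD's own helpers such as `UsdPrim::IsPrototypePath()` rather than string matching.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: true when `path` names a root prototype such as
// "/__Prototype_12", i.e. a single path element with that prefix.
inline bool isRootPrototypePath(const std::string& path)
{
    static const std::string kPrefix = "/__Prototype_";
    if (path.compare(0, kPrefix.size(), kPrefix) != 0) {
        return false;
    }
    // A second '/' means the path points below the prototype root.
    return path.find('/', 1) == std::string::npos;
}

// Hypothetical helper: drop root prototype paths before any UFE
// notification would be sent for the remaining changed paths.
inline std::vector<std::string> pathsToNotify(const std::vector<std::string>& changed)
{
    std::vector<std::string> kept;
    kept.reserve(changed.size());
    for (const std::string& p : changed) {
        if (!isRootPrototypePath(p)) {
            kept.push_back(p);
        }
    }
    return kept;
}
```

Only the prototype roots are skipped in this sketch; paths below a prototype root are passed through unchanged.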

2. Inefficient UsdHierarchy::children() performance

Closer examination of sendObjectAdd() and sendSubtreeInvalidate() reveals that UsdHierarchy::children() is called frequently to traverse the hierarchy and gather parent-child relationship information. Notably, trace data indicates that this process currently runs single-threaded:

02_BEFORE_children_call.mp4

We attempted to investigate optimization options, but found that the direct caller of UsdHierarchy::children() originates from libShared.so. This makes it difficult to control or modify within the maya-usd codebase. Furthermore, there is insufficient information regarding the expected operation and return behavior of UsdHierarchy::children():

(gdb) bt
#0  MayaUsd_v0::ufe::UsdHierarchy::children[abi:cxx11]() const (this=0x7f3373930270) at maya-usd/lib/usdUfe/ufe/UsdHierarchy.cpp
#1  0x00007f397dc7212b in  () at /path/to/maya/lib/libShared.so
#2  0x00007f397dc74e47 in  () at /path/to/maya/lib/libShared.so
#3  0x00007f397284b226 in Ufe_v4::Subject::notify(Ufe_v4::Notification const&) (this=this@entry=0x7f3936313610, notification=...) at ufe/ufe-full-python3.10-rhel8-linux/ufe/src/subject.cpp
#4  0x00007f397283e336 in Ufe_v4::Scene::notify(Ufe_v4::Notification const&) (this=0x7f3936313610, notification=...) at ufe/ufe-full-python3.10-rhel8-linux/ufe/src/scene.cpp
#5  0x00007f38a36b56f2 in UsdUfe_v0::StagesSubject::sendObjectAdd(std::shared_ptr<Ufe_v4::SceneItem> const&) const (this=<optimized out>, sceneItem=std::shared_ptr<Ufe_v4::SceneItem> (use count 6, weak count 0) = {...}) at maya-usd/lib/usdUfe/ufe/StagesSubject.cpp
#6  0x00007f38a36b779f in UsdUfe_v0::StagesSubject::stageChanged(pxrInternal_v0_24__pxrReserved__::UsdNotice::ObjectsChanged const&, pxrInternal_v0_24__pxrReserved__::TfWeakPtr<pxrInternal_v0_24__pxrReserved__::UsdStage> const&) (this=<optimized out>, notice=..., sender=...) at maya-usd/lib/usdUfe/ufe/StagesSubject.cpp

To address the performance bottleneck in traversing the USD hierarchy, particularly within the opaque call sequence between Ufe_v4::Subject::notify() and MayaUsd_v0::ufe::UsdHierarchy::children(), we implemented a two-pronged workaround: parallel pre-caching and refining traversal filters.

- Parallel Hierarchy Pre-caching

We introduced an upfront parallel caching mechanism for the USD hierarchy to replace the single-threaded implementation:

  • Initial Call: On the first call to UsdHierarchy::children(), the proxy shape prim path is retrieved (it is assumed to be stable after onStageSet() or stageChanged()).
  • Parallel Build: The entire USD parent-child hierarchy is then pre-cached in parallel.
  • Subsequent Calls: All subsequent calls to UsdHierarchy::children() serve the requested data directly from this cache.
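
The three steps above can be sketched on a toy tree. This is only an illustration of the pattern under stated assumptions: `Node`, `ChildrenCache`, `cacheSubtree` and `buildCacheParallel` are hypothetical names, the tree stands in for USD prims, and `std::async` is used as a simple stand-in for whatever threading facility the actual commit uses.

```cpp
#include <future>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy stand-in for a prim; the real code would walk USD prim children.
struct Node {
    std::string       name;
    std::vector<Node> children;
};

using ChildrenCache = std::unordered_map<std::string, std::vector<std::string>>;

// Record the child paths of `node` and of every node below it.
inline void cacheSubtree(const Node& node, const std::string& path, ChildrenCache& out)
{
    std::vector<std::string> childPaths;
    childPaths.reserve(node.children.size());
    for (const Node& child : node.children) {
        std::string childPath = path + "/" + child.name;
        cacheSubtree(child, childPath, out);
        childPaths.push_back(std::move(childPath));
    }
    out.emplace(path, std::move(childPaths));
}

// Build the whole parent -> children map up front, one task per
// top-level subtree. `rootPath` plays the role of the proxy shape path.
inline ChildrenCache buildCacheParallel(const Node& root, const std::string& rootPath)
{
    std::vector<std::future<ChildrenCache>> tasks;
    std::vector<std::string>                topLevel;
    for (const Node& child : root.children) {
        const std::string childPath = rootPath + "/" + child.name;
        topLevel.push_back(childPath);
        tasks.push_back(std::async(std::launch::async, [&child, childPath]() -> ChildrenCache {
            ChildrenCache local; // private to this task, so no locking is needed
            cacheSubtree(child, childPath, local);
            return local;
        }));
    }

    ChildrenCache cache;
    cache.emplace(rootPath, std::move(topLevel));
    for (std::future<ChildrenCache>& task : tasks) {
        ChildrenCache local = task.get(); // wait for the task, then merge its result
        cache.insert(local.begin(), local.end());
    }
    return cache;
}
```

Each task writes to its own map and the results are merged on the calling thread, which keeps the sketch free of locks; later lookups are plain map reads.
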
- Tweak Traversal Filters

We modified two specific traversal filters to improve performance:

  • UsdGeomPointInstancer Check:
    • We simplified the check to a direct prim.IsA<PXR_NS::UsdGeomPointInstancer>(). The original logic in UsdSceneItem::isPointInstance() included a check against _instanceIndex (first commit here). We suspect this check is irrelevant outside of a Hydra/USD imaging context, as the _instanceIndex (defaulting to UsdImagingDelegate::ALL_INSTANCES, or -1) would cause UsdSceneItem::isPointInstance() to always return false in a USD stage context.
  • Instance Prims Traversal:
    • We removed UsdTraverseInstanceProxies() from the traversal logic, unlike the original implementation in getUSDFilteredChildren().
    • This UsdTraverseInstanceProxies() was initially added here to fix an outliner display issue 6 years ago, but we are not able to reproduce that specific problem after removing UsdTraverseInstanceProxies(), so we are skipping the traversal of instance proxies to improve traversal performance.
outliner_displaying_for_instances_and_noninstances
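
A toy model of the point-instancer check discussed above, included only to illustrate why a default index of -1 would make an index-based check return false. It assumes the original check requires a non-negative instance index; `ToySceneItem` and both function names are hypothetical and are not the maya-usd classes.

```cpp
// Mirrors the default described above (UsdImagingDelegate::ALL_INSTANCES, or -1).
constexpr int kAllInstances = -1;

// Toy stand-in for a scene item; not the real UsdSceneItem.
struct ToySceneItem {
    bool isPointInstancerPrim; // stands in for prim.IsA<UsdGeomPointInstancer>()
    int  instanceIndex;        // stands in for _instanceIndex
};

// Original-style check (assumed): needs a point instancer prim
// and a concrete, non-negative instance index.
inline bool isPointInstanceOriginal(const ToySceneItem& item)
{
    return item.isPointInstancerPrim && item.instanceIndex >= 0;
}

// Simplified check: looks at the prim type only.
inline bool isPointInstancerSimplified(const ToySceneItem& item)
{
    return item.isPointInstancerPrim;
}
```

With the index left at its default, the original-style check rejects every point instancer while the simplified one accepts them, which is the behavior difference the bullet above describes.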

The end result, as shown in commit 9011124, demonstrates a significant performance improvement. The hierarchy traversal time dropped sharply from approximately 11 seconds to less than 100ms, with the cache::getChildren() execution running in a multi-threaded environment:

02_AFTER_improve_children_onStageSet.mp4
02_AFTER_improve_children_stageChanged.mp4

We are seeking clarification on several points regarding the use of UsdHierarchy::children(), as information about its direct callers is quite limited.

Our specific questions are:

  • Caching Design:
    • We implemented begin/end guards around onStageSet() and stageChanged() to limit the scope of children caching to these two operations. This was done because we lack a precise way to determine when a sequence of UsdHierarchy::children() calls concludes. Ideally, this control would come from the caller side (likely within libShared.so). We would appreciate feedback on whether this begin/end caching design can be improved with more detailed knowledge of how UsdHierarchy::children() is used. We are also seeking confirmation of the expected return value of UsdHierarchy::children().
  • Observed Behavior After Multi-threading:
    • Theoretically, subsequent calls to UsdHierarchy::children() should still execute as before. However, after moving the traversal into a multi-threaded context, we observed a significant reduction in the number of UsdHierarchy::children() calls.
    • While the final result appears correct, we are unsure if our "fix" truly addressed the intended problem or if we have fundamentally misunderstood the role or mechanism of UsdHierarchy::children(). This is our main question for Autodesk.
    • We have been using the environment variable MAYAUSD_DEBUG_HIERARCHY_CHILDREN_CACHE to compare single-threaded vs. multi-threaded results and have not found discrepancies, with the exception of instances and point instancers.
    • All existing unit tests are passing, which suggests the change hasn't introduced regressions.
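
The begin/end guards mentioned under Caching Design could be sketched as an RAII scope. This is a minimal sketch under stated assumptions, not the code in this PR: `ChildrenCacheState` and `ScopedChildrenCache` are hypothetical names, the cache is only valid while at least one scope is open, and thread safety is deliberately left out.

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical cache that is only consulted between begin() and end().
class ChildrenCacheState {
public:
    void begin() { ++_depth; }
    void end()
    {
        // Clear the cache when the outermost scope closes.
        if (_depth > 0 && --_depth == 0) {
            _entries.clear();
        }
    }
    bool active() const { return _depth > 0; }

    // Entries are only accepted while a scope is open.
    void store(const std::string& path, std::vector<std::string> children)
    {
        if (active()) {
            _entries[path] = std::move(children);
        }
    }

    // Returns nullptr when no scope is open or the path is not cached.
    const std::vector<std::string>* find(const std::string& path) const
    {
        if (!active()) {
            return nullptr;
        }
        auto it = _entries.find(path);
        return it == _entries.end() ? nullptr : &it->second;
    }

private:
    int _depth = 0;
    std::unordered_map<std::string, std::vector<std::string>> _entries;
};

// RAII guard: the kind of object that would wrap onStageSet() or stageChanged().
class ScopedChildrenCache {
public:
    explicit ScopedChildrenCache(ChildrenCacheState& state) : _state(state) { _state.begin(); }
    ~ScopedChildrenCache() { _state.end(); }
    ScopedChildrenCache(const ScopedChildrenCache&) = delete;
    ScopedChildrenCache& operator=(const ScopedChildrenCache&) = delete;

private:
    ChildrenCacheState& _state;
};
```

Tying the cache lifetime to a scope means a stale cache cannot outlive the notification that built it, at the cost of rebuilding it for every guarded operation.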

Thanks and looking forward to receiving feedback!


Successfully merging this pull request may close these issues.

Maya UFE : Child Prims Not Displayed in Maya Outliner Under Instanceable Prim
