Editorial: Use the dfs stack instead of [[DFSAncestorIndex]] #3637

nicolo-ribaudo · 2025-07-04T14:01:38Z

This is an attempt at removing [[DFSAncestorIndex]], similarly to how #3625 removed [[DFSIndex]], by slightly modifying Tarjan's algorithm to use the stack depth instead of the module discovery index.

Even though they do not represent how I came up with this different approach, I tried to split the changes into 5 separate commits that can be reviewed one-by-one for correctness. The rest of the commits do the same change for InnerModuleEvaluation.

This changes the editorial aspects but does not simplify implementations: they'd still need to keep track of a per-traversal number (just now it's the stack length rather than the discovery length) and a per-module number (unless they prefer to do an O(number of nodes in a SCC) loop for each cycle detected while traversing the graph, to find an index in a list).

Again, like for #3635, this change is not objectively good and I'm curious to see what editors think about it.

You can find a demo of the updated algorithm at https://nicolo-ribaudo.github.io/es-module-evaluation/#s=ICBBCiBCIEMKICBECiBFIEkKRiAgSgpIIEc%3D&c=QSAtPiBCCkIgLT4gQwpCIC0%2BIEQKQyAtPiBBCkMgLT4gRApEIC0%2BIEUKRSAtPiBGCkYgLT4gRwpHIC0%2BIEgKRyAtPiBFCkggLT4gRgpEIC0%2BIEkKSSAtPiBKCkogLT4gSQ%3D%3D&a=&f=

Editorial: Use stack index instead of traverse index for SCC detection

To detect strongly connected components in module graphs, we need to
use a number that increases as we go further away from the graph root
on any branch of the DFS tree. This is so that when a node has a child
that (1) is still not finalized yet, and (2) has a lower number than
the node itself, we know that the child is an ancestor of the node
and thus we are in a strongly connected component. Strongly connected
component roots are thus nodes from which it is not possible to reach
a non-finalized node with a lower number.

For this, Tarjan's algorithm uses the discovery index of each node, but
other indexes that have the same property are:

- the depth of the node in the DFS tree
- the index of the node in the stack that contains yet-to-be-finalized
  nodes.

This commit updates the SCC discovery logic to use this last option,
rather than carrying around the discovery index.

(Note: this first commit is the most complex one. A proof of its correctness is in #3637 (comment))
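As a rough illustration of the idea in this commit message, here is a minimal Python sketch of Tarjan-style SCC detection where each node's number is its index in the stack of not-yet-finalized nodes rather than its discovery index. It is not the spec algorithm: the function name `sccs_by_stack_index`, the dictionary-based graph, and the `ancestor` map (standing in for [[DFSAncestorIndex]]) are assumptions made for the example.

```python
def sccs_by_stack_index(graph, root):
    """Return SCCs reachable from root, in the order they are completed.

    graph is a hypothetical adjacency mapping: node -> list of dependencies.
    """
    stack = []          # nodes that are discovered but not yet finalized
    stack_index = {}    # node -> its index in `stack` at the time it was pushed
    ancestor = {}       # node -> lowest stack index known to be reachable
    finalized = set()
    sccs = []

    def visit(node):
        stack_index[node] = ancestor[node] = len(stack)
        stack.append(node)
        for dep in graph.get(node, ()):
            if dep in finalized:
                continue  # finalized nodes no longer take part in cycle analysis
            if dep not in stack_index:
                visit(dep)
                ancestor[node] = min(ancestor[node], ancestor[dep])
            else:
                # dep is still on the stack, so this edge closes a loop
                ancestor[node] = min(ancestor[node], stack_index[dep])
        if ancestor[node] == stack_index[node]:
            # node is an SCC root: the SCC is node plus everything above it
            members = stack[stack_index[node]:]
            del stack[stack_index[node]:]
            finalized.update(members)
            sccs.append(members)

    visit(root)
    return sccs


# Hypothetical graph with two sibling cycles under a common root.
example = {"A": ["B", "D"], "B": ["C"], "C": ["B"], "D": ["E"], "E": ["D"]}
print(sccs_by_stack_index(example, "A"))  # [['B', 'C'], ['D', 'E'], ['A']]
```

In this sketch B and D both receive stack index 1, because the stack has shrunk back to `[A]` by the time D is discovered. That reuse is harmless here since B and C are already finalized and skipped.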

Editorial: Do not constantly update [[DFSAncestorIndex]] as we find lower values

Instead, only update it at the end of InnerModuleLinking if the current
module is a non-root node of a SCC. We do not need to update its value
in the loop through a module's dependencies because, even if there
was a loop leading back to the node currently being processed, the
[[DFSAncestorIndex]] value propagated back across the loop would not
cause the module's [[DFSAncestorIndex]] to decrease.

Editorial: return the SCC ancestor index from InnerModuleLinking

Rather than returning ~~unused~~ and then reading it from the
[[DFSAncestorIndex]] slot. Also return the index for modules that
are not participating in cycles (returning the module index itself),
to avoid having a separate path in the InnerModuleLinking loop.

Editorial: Do not update [[DFSAncestorIndex]] with the final low index

The lowest possible [[DFSAncestorIndex]] will already be propagated back
to the SCC root through one of the SCC branches. InnerModuleLinking
does not need modules to have their actually lowest possible
[[DFSAncestorIndex]] set to know that they are a non-root node of a SCC:
it's enough to know that from that node it's possible to reach one
other node with a lower index, and that's signaled by the return value.

Editorial: Do not use [[DFSAncestorIndex]] in InnerModuleLinking

[[DFSAncestorIndex]] is only set once per module during a given
traversal process, and it represents the module's index in stack.
Instead of explicitly storing it, we can compute it from stack when
needed (i.e. when we process an edge that "closes" a loop).
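A minimal Python sketch of where this series of commits ends up, under the assumption that the traversal can be modelled as a plain recursive function: nothing is stored per node, the recursive call returns the lowest reachable stack index, and the index of an on-stack node is looked up in the stack only when an edge closes a loop. The names `find_sccs` and `visit` are hypothetical and do not appear in the spec.

```python
def find_sccs(graph, root):
    """Return SCCs reachable from root, in completion order.

    graph is a hypothetical adjacency mapping: node -> list of dependencies.
    """
    stack = []         # discovered, not yet finalized
    finalized = set()
    sccs = []

    def visit(node):
        my_index = len(stack)   # the node's index in the stack
        stack.append(node)
        lowest = my_index       # lowest stack index reachable from node
        for dep in graph.get(node, ()):
            if dep in finalized:
                continue
            if dep in stack:
                # Edge that closes a loop: compute dep's index from the stack.
                lowest = min(lowest, stack.index(dep))
            else:
                # The return value replaces reading a per-node slot.
                lowest = min(lowest, visit(dep))
        if lowest == my_index:
            members = stack[my_index:]
            del stack[my_index:]
            finalized.update(members)
            sccs.append(members)
        # A node outside any cycle returns its own index, which can never
        # lower the caller's value, so the caller needs no special case.
        return lowest

    visit(root)
    return sccs


# Hypothetical graph: A, B, C form a cycle and D hangs off C.
print(find_sccs({"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}, "A"))
# [['D'], ['A', 'B', 'C']]
```

The `dep in stack` and `stack.index(dep)` lookups are linear in the stack length, which is the cost referred to in the PR description for implementations that choose not to keep a per-module number.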

guybedford · 2025-07-04T15:06:10Z

First some background - strongly connected components affect execution in the module system in only subtle ways:

1. Error state propagation together as a single unit
2. Cycle root handling in supporting TLA attachment when multiple top-level entry points try to execute the same strongly connected component, allowing them to coordinate execution state through the same shared component root module [[CycleRoot]].

That's it! Otherwise strongly connected components are indeed relatively unobservable to end users. But correctly classifying strongly connected components is a fundamental invariant.

Now for (1) we don't formally propagate errors through the entire strongly connected component, we actually only propagate errors up the current stack of a strongly connected component.

Your simplification based on this observation does seem to be unobservable so far as (1) is concerned.

But, as mentioned when we discussed this, your change very much does break the Tarjan algorithm's detection in being able to identify the strongly connected components correctly per Tarjan's algorithm.

Simple counter example where [1,1] indicates the final DFSIndex 1, DFSAncestorIndex 1 states.

A [0, 0]
A -> B [1,1]
A -> C [2, 1]
A -> D [3, 3]
D -> E [4, 3]

Under your scheme you would not correctly classify the two separate strongly connected components { B, C } and { D, E }, and instead classify both B and D as having the same ancestor index 1 since you are incorrectly conflating the component root index with the stack index.

That is you would transition all 4 modules together instead of handling them separately. This would also incorrectly set the cycle root in the evaluation code. The comparison between sccAncestorIndex and moduleIndex is also invalid as the stack indexing diverges from the module indexing for large parallel graphs.

As a side note - the onus is on you here to prove your modification of Tarjan's algorithm handles the edge cases, not on me to point out the counter examples.

nicolo-ribaudo · 2025-07-04T15:42:00Z

> As a side note - the onus is on you here to prove your modification of Tarjan's algorithm handles the edge cases, not on me to point out the counter examples.

I understand, that's why I split the change in small commits each of them trying to carefully explain why the change is correct 😅

Assuming that the example you mean is the following, which yields those [dfs index, dfs ancestor index] pairs and those two SCCs:

This is what happens with the updated algorithm:

InnerModuleEvaluation(A):
- stack depth: 0
- A's _sccAncestorIndex_: 0
- recursion (stack: [A]):
  - InnerModuleEvaluation(B):
    - stack depth: 1
    - B's _sccAncestorIndex_: 1
    - recursion (stack: [A, B]):
      - InnerModuleEvaluation(C):
        - stack depth: 2
        - C's _sccAncestorIndex_: 2
        - recursion (stack: [A, B, C]):
          - InnerModuleEvaluation(B):
            - returns B's stack depth: 1 → C's _sccAncestorIndex_ is updated to 1
        - returns C's _sccAncestorIndex_: 1 → B's _sccAncestorIndex_ is not updated
    - B's _sccAncestorIndex_ is the same as its stack depth (1): we detected a SCC! The members of the SCC are B and all the subsequent elements in the stack (so B,C). We reset the stack to [A].
    - returns B's _sccAncestorIndex_: 1 → A's _sccAncestorIndex_ is not updated
  - InnerModuleEvaluation(D):
    - stack depth: 1
    - D's _sccAncestorIndex_: 1
    - recursion (stack: [A, D]):
      - InnerModuleEvaluation(E):
        - stack depth: 2
        - E's _sccAncestorIndex_: 2
        - recursion (stack: [A, D, E]):
          - InnerModuleEvaluation(D):
            - returns D's stack depth: 1 → E's _sccAncestorIndex_ is updated to 1
        - returns E's _sccAncestorIndex_: 1 → D's _sccAncestorIndex_ is not updated
    - D's _sccAncestorIndex_ is the same as its stack depth (1): we detected a SCC! The members of the SCC are D and all the subsequent elements in the stack (so D,E). We reset the stack to [A].
    - returns D's _sccAncestorIndex_: 1 → A's _sccAncestorIndex_ is not updated
- A's _sccAncestorIndex_ is the same as its stack depth (0): we detected a SCC! The members of the SCC are A and all the subsequent elements in the stack (so just A). We reset the stack to [].

The updated algorithm correctly detects the three SCCs: BC, DE, and A.

Note that, both before and after this PR, the SCC detection rule is not "a set of modules with the same associated number is a SCC". Already before these changes, multiple modules in the same SCC can have different [[DFSAncestorIndex]] values. For example (interactive):

Here the [[DFSIndex]], [[DFSAncestorIndex]] values are

A: [0,0]
B: [1,1]
C: [2,1]
D: [3,2]
E: [4,1]

Even though D is in the same SCC as B/C/E.

Instead, the rule to tell that a module X is not a SCC root is that from X's outgoing edges you can reach a node with index lower than X. Then, if a module is a SCC root (so if the lowest index you can reach from X is X's index), the contents of the SCC is the list of modules in the stack starting from X.

> The comparison between sccAncestorIndex and moduleIndex is also invalid as the stack indexing diverges from the module indexing for large parallel graphs.

Yes, and that's fine. When you have large parallel graphs (that are not in the same SCC), one of them is going to be processed before the other. What happens in the first of them has no effect at all on what SCCs there are in the second one. We can forget that we actually already used some indexes for that first completed graph part, and we can reuse them.

Edit: I'll publish a forked version of https://nicolo-ribaudo.github.io/es-module-evaluation with these changes.

nicolo-ribaudo · 2025-07-09T09:23:12Z

I have a more precise proof that using the stack depth is equivalent to using the discovery index.

Consider a module graph that contains a module M, with the stack at the time M is first discovered (right after pushing M to it) being « s₀, s₁, ..., s_n-1, M = s_n ».

For a given module X, the discovery index of X is D(X) and the depth of the stack right before pushing it is S(X). Next(X) is the module that is discovered right after X (that is, D(Next(X)) = D(X)+1).

[*] There are two possible cases. One is the trivial one, when we are going down the left-most branch of the DFS tree, so D(s₀) = 0, D(s₁) = 1, ..., D(s_n) = n. In this case obviously D(s_i)=S(s_i) for each i.

If we are not in that situation, then there is a smallest j such that D(s_j) ≠ S(s_j) (that is, D(s_j) ≠ j). j > 0, because s₀ is the root module of the graph, which is pushed in the stack first and removed at the end of the whole process. Thus we have D(s₀)=S(s₀)=0, D(s₁)=S(s₁)=1, ..., D(s_j-1)=S(s_j-1)=j-1.

Now, we know that s_j ≠ Next(s_j-1) (otherwise D(s_j)=D(s_j-1)+1=(j-1)+1=j, which violates the condition on which we chose j). We also know that s_j is not reachable from Next(s_j-1), otherwise Next(s_j-1) would be on the stack between s_j-1 and s_j (which is impossible, since they are by definition consecutive in the stack).

Let { O₁...O_k } be the set of all k modules reachable from Next(s_j-1), Next(s_j-1) included, ordered such that D(O₁) < ... < D(O_k).

[1] s_j is not in that set, because it's not reachable from Next(s_j-1). Given that s_j is reachable from all of s₀,...,s_j-1, all of those modules are also not in that set.
[2] Given that j has been chosen to be minimal, s₀,s₁,...,s_j-1,O₁ are the beginning of the left-most branch of the DFS tree for s₀'s evaluation.

O₁...O_k are all discovered before s_j, so

D(s_j-1) = S(s_j-1) = j-1
D(O₁) = D(s_j-1) + 1 (by definition of O₁)
D(O₂) = D(s_j-1) + 2
...
D(O_k) = D(s_j-1) + k
D(s_j) > D(O_k)

Given [1] and [2], Evaluation(s₀) is equivalent to first doing Evaluation(O₁) (which evaluates all and only the modules { O₁...O_k }) followed then by Evaluation(s₀). This has no effect on evaluation order, SCC detection, or on the modules in the stack when discovering a module not in the { O₁...O_k } set.

The effect of Evaluation(O₁) is that when we will then do Evaluation(s₀) all the modules { O₁...O_k } will have already been evaluated, thus we skip them in the DFS tree. Let's call D'(...) the discovery indexes of this Evaluation(s₀) that happens after Evaluation(O₁).

We have that for each module P:

- if D(P) > D(O_k), then D'(P) = D(P) - k < D(P).
- if D(P) < D(O₁), then D'(P) = D(P).

Thus we have that:

D'(s₀)=D(s₀)=S(s₀), ..., D'(s_j-1)=D(s_j-1)=S(s_j-1)
D'(s_j-1) < D'(s_j) < D(s_j)
D'(s_j+1) < D(s_j+1)
...
D'(M) < D(M)

Now, we repeat from [*]:

- either D'(M)=S(M), in which case we are done because we reached an equivalent situation in which the discovery index and the stack depth index of M correspond
- or D'(M)>S(M), in which case we do the whole process again to pre-evaluate a subtree while keeping the equivalence, obtaining a D''(M)<D'(M)<D(M). Iterating, we keep decreasing the discovery index of M until it converges to S(M) (which must happen, because we are decreasing it by at least 1 at each iteration).

guybedford · 2025-07-17T01:07:10Z

Took a brief look at this today, didn't have time for a full review again but some initial thoughts:

> We also know that s_j is not reachable from Next(s_j-1), otherwise Next(s_j-1) would be on the stack between s_j-1 and s_j (which is impossible, since they are by definition consecutive in the stack).

This isn't true - in the simplest case if s_j has a dependency on s_j-1, then it cyclically can reach back to Next(s_j-1). Unless you have another definition of reachability in mind?

What would really help me is to see how your algorithm can classify strongly connected components such that it fully detects them all despite not tracking dfs ancestor index. It might also help here to implement the algorithm and have it spit out the list of strongly connected components, and we could then fuzz that on various edge cases. Specifically the problem is not the execution order - it is correctly identifying the strongly connected components. It does seem likely to me that execution order is unaffected here, while I'm still pretty sure strongly connected component detection is.

Otherwise happy to do it the slow way as well in carefully reviewing your approach, it just will take me more time to dig into this review thoroughly.
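A minimal sketch of what such a fuzz check could look like, in Python. It is an illustration only, not the demo's code: `classic_tarjan` is a textbook discovery-index implementation, `stack_variant` uses stack indices as proposed in this PR, and the random graph generator, sizes and seed are arbitrary assumptions.

```python
import random


def classic_tarjan(graph, root):
    """Textbook Tarjan: numbers are discovery indices."""
    index, low, stack, on_stack, sccs = {}, {}, [], set(), []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            members = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                members.append(w)
                if w == v:
                    break
            sccs.append(frozenset(members))

    visit(root)
    return sccs


def stack_variant(graph, root):
    """Variant under test: numbers are indices in the stack."""
    stack, finalized, sccs = [], set(), []

    def visit(v):
        my_index = lowest = len(stack)
        stack.append(v)
        for w in graph[v]:
            if w in finalized:
                continue
            if w in stack:
                lowest = min(lowest, stack.index(w))
            else:
                lowest = min(lowest, visit(w))
        if lowest == my_index:
            members = stack[my_index:]
            del stack[my_index:]
            finalized.update(members)
            sccs.append(frozenset(members))
        return lowest

    visit(root)
    return sccs


def random_graph(rng, n):
    """Each of the n nodes gets up to 3 distinct random dependencies."""
    return {v: rng.sample(range(n), rng.randint(0, min(3, n))) for v in range(n)}


def count_mismatches(trials=2000, seed=0):
    """Count random graphs on which the two algorithms disagree."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        g = random_graph(rng, rng.randint(1, 8))
        if classic_tarjan(g, 0) != stack_variant(g, 0):
            bad += 1
    return bad


if __name__ == "__main__":
    print("mismatches:", count_mismatches())
```

The comparison is on the ordered list of SCCs, so it checks both the membership of each component and the order in which components are completed. A run with zero mismatches is evidence, not a proof.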

guybedford · 2025-07-17T01:33:14Z

Thinking about this some more... the important point is that per your first counter example case, because we skip over modules already linked, any reaching back to formerly visited graphs is always excluded from cycle analysis, and correctly so, by strongly connected component isolation. That was the part I missed - that linked modules don't reflect their numberings anymore, since they are already discarded as we're done with their strongly connected component detection per your mathematical argument.

Definitely important to review further carefully, but this may well actually be correct. I don't mean to sound so surprised, but it would be a very impressive simplification...!


nicolo-ribaudo · 2025-07-17T13:22:59Z

@guybedford Thanks for taking a look again :)

I updated https://nicolo-ribaudo.github.io/es-module-evaluation to match this PR, and to list the SCCs as they are "completed". This is an example with some interesting cycles.

That demo uses the Evaluate() method rather than Link(), so it doesn't exactly match this PR and the algorithm step numbers it lists might be slightly off (they match the current status of the spec). You can read its code at https://github.com/nicolo-ribaudo/es-module-evaluation/blob/f072b67c076e6e26073c0cb4e8a21fb33cad746b/src/utils/ecma262/evaluation.js#L79.

The version in the main branch matches the current ECMA-262 version. If you want to try it locally, just serve the root of the repo. There are no dependencies or build steps.

> > We also know that s_j is not reachable from Next(s_j-1), otherwise Next(s_j-1) would be on the stack between s_j-1 and s_j (which is impossible, since they are by definition consecutive in the stack).

> This isn't true - in the simplest case if s_j has a dependency on s_j-1, then it cyclically can reach back to Next(s_j-1). Unless you have another definition of reachability in mind?

In that case we would have that Next(s_j-1) is potentially reachable from s_j, not that s_j is potentially reachable from Next(s_j-1):

In the flipped case, where s_j-1 depends on Next(s_j-1) and on s_j, and Next(s_j-1) depends back on s_j-1, we have the contradiction: when visiting s_j the stack would be « ..., s_j-1, Next(s_j-1), ..., s_j », because:

- we wouldn't pop Next(s_j-1) when we are "done" with it, because it has a dependency on one of its ancestors
- we would only pop s_j-1's SCC (which includes Next(s_j-1)) when we are done visiting all of s_j-1's dependencies, so only after visiting s_j and pushing it on the stack

guybedford · 2025-07-17T14:29:12Z

To clarify - I mean in the combined cycle cases where both cycles in your images appear.

Another interesting cycle case is when successive direct dependencies each import a higher stack item. So say A -> B -> C -> D, and then D -> E1, E2, E3, E4, where E1 -> D, E2 -> C, E3 -> B. The stack just continues to append along the dependencies so they still get transitioned together so I think that proves out the last of it?

nicolo-ribaudo · 2025-07-17T14:37:07Z

> To clarify - I mean in the combined cycle cases where both cycles in your images appear.

Even in this case, when we visit s_j we would still have Next(s_j-1) on the stack.

> Another interesting cycle case is when successive direct dependencies each import a higher stack item. So say A -> B -> C -> D, and then D -> E1, E2, E3, E4, where E1 -> D, E2 -> C, E3 -> B. The stack just continues to append along the dependencies so they still get transitioned together so I think that proves out the last of it?

Right, exactly. Demo: https://nicolo-ribaudo.github.io/es-module-evaluation/#s=IEEKIEIKIEMKICAgRTEgRTIgRTMKIEQ%3D&c=QSAtPiBCCkIgLT4gQwpDIC0%2BIEQKRCAtPiBFMQpEIC0%2BIEUyCkQgLT4gRTMKRTEgLT4gRApFMiAtPiBDCkUzIC0%2BIEI%3D&a=&f=
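For readers who cannot open the demo, here is a small Python sketch of the same case. It assumes the graph A -> B -> C -> D, D -> E1, E2, E3, E1 -> D, E2 -> C, E3 -> B (E4 is left out because no edges are given for it), and it uses a hypothetical helper `sccs_with_max_depth` that also records how deep the stack gets.

```python
def sccs_with_max_depth(graph, root):
    """Return (SCCs in completion order, largest stack length observed).

    graph is a hypothetical adjacency mapping: node -> list of dependencies.
    """
    stack, finalized, sccs = [], set(), []
    max_depth = 0

    def visit(node):
        nonlocal max_depth
        my_index = lowest = len(stack)
        stack.append(node)
        max_depth = max(max_depth, len(stack))
        for dep in graph.get(node, ()):
            if dep in finalized:
                continue
            if dep in stack:
                # Back edge: dep's number is its position in the stack.
                lowest = min(lowest, stack.index(dep))
            else:
                lowest = min(lowest, visit(dep))
        if lowest == my_index:
            # SCC root: pop node and everything pushed after it.
            members = stack[my_index:]
            del stack[my_index:]
            finalized.update(members)
            sccs.append(members)
        return lowest

    visit(root)
    return sccs, max_depth


graph = {
    "A": ["B"], "B": ["C"], "C": ["D"],
    "D": ["E1", "E2", "E3"],
    "E1": ["D"], "E2": ["C"], "E3": ["B"],
}
sccs, depth = sccs_with_max_depth(graph, "A")
print(sccs)   # [['B', 'C', 'D', 'E1', 'E2', 'E3'], ['A']]
print(depth)  # 7
```

None of E1, E2, E3 is an SCC root, so each stays on the stack after it returns and the stack keeps growing until B is recognised as the root of the whole component.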


guybedford reviewed Jul 17, 2025


nicolo-ribaudo changed the title ~~Editorial: Do not use [[DFSAncestorIndex]] in InnerModuleLinking~~ Editorial: Use the dfs stack instead of [[DFSAncestorIndex]] Jul 18, 2025

nicolo-ribaudo added 8 commits August 7, 2025 17:54

Same thing for InnerModuleEvaluation

6018e17

Remove [[DFSAncestorIndex]]

dbcac86

Add assertion per Guy's review

6649775

nicolo-ribaudo force-pushed the remove-dfsancestordepth branch from 347b9ca to 6649775 Compare August 7, 2025 15:55

nicolo-ribaudo marked this pull request as ready for review August 7, 2025 17:43

nicolo-ribaudo mentioned this pull request Aug 7, 2025

Editorial: dedupe and isolate modules traversal logic #3635

Draft
