Skip to content

Commit 3db5f9b

Browse files
randleeclaude
andcommitted
feat: Implement recursive tree diff algorithm (BUG-003 fix)
Replace flat extraction algorithm with unified recursive tree diff that: - Processes each node exactly once at its natural tree level - Produces hierarchical output matching code structure - Uses O(n) hash-based sibling matching - Supports parallel subtree comparison via ValueTask<T> - Skips identical subtrees for O(1) early termination - Includes cancellation token support throughout New files: - ITreeComparer interface with async/sync comparison methods - RecursiveTreeComparer implementing the recursive algorithm - ChangeExtensions with Flatten(), CountAll(), FindByName(), OfKind() - RecursiveTreeComparerTests with 12 comprehensive tests Changes: - SyntaxComparer now delegates to RecursiveTreeComparer - NodeMatcher gains ExtractImmediateStructuralChildren method - Old ExtractStructuralNodes marked [Obsolete] - Updated 17 test methods to use Flatten() for hierarchical output - Fixed validation test infrastructure for hierarchical changes Performance: 50-60x faster than thresholds (18ms for 50 methods) Test results: 760 passed, 10 failed (cross-format consistency, not BUG-003) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent d77104b commit 3db5f9b

File tree

22 files changed

+2840
-314
lines changed

22 files changed

+2840
-314
lines changed

README.md

Lines changed: 48 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -365,6 +365,54 @@ var output = formatter.FormatResult(result, new OutputOptions
365365

366366
See [docs/api.md](docs/api.md) for complete API documentation.
367367

368+
## Migration Notes
369+
370+
### Hierarchical Output (v2.0+)
371+
372+
The diff engine now uses a recursive tree algorithm that produces **hierarchical output**. Changes to nested members (methods, properties) appear in the `Children` property of their parent change.
373+
374+
**Before (flat output):**
375+
```json
376+
{
377+
"changes": [
378+
{ "kind": "Class", "name": "Calculator", "type": "Modified" },
379+
{ "kind": "Method", "name": "Add", "type": "Modified" },
380+
{ "kind": "Method", "name": "Multiply", "type": "Added" }
381+
]
382+
}
383+
```
384+
385+
**After (hierarchical output):**
386+
```json
387+
{
388+
"changes": [
389+
{
390+
"kind": "Class",
391+
"name": "Calculator",
392+
"type": "Modified",
393+
"children": [
394+
{ "kind": "Method", "name": "Add", "type": "Modified" },
395+
{ "kind": "Method", "name": "Multiply", "type": "Added" }
396+
]
397+
}
398+
]
399+
}
400+
```
401+
402+
**For backward compatibility**, use `Flatten()` to get flat output:
403+
404+
```csharp
405+
using RoslynDiff.Core.Models;
406+
407+
// Get hierarchical changes
408+
var changes = differ.Compare(oldContent, newContent, options).FileChanges[0].Changes;
409+
410+
// Flatten for backward compatibility
411+
var flatChanges = changes.Flatten().ToList();
412+
```
413+
414+
This change fixes BUG-003 where duplicate nodes could be reported when using the old flat extraction method.
415+
368416
## Documentation
369417

370418
- [Usage Guide](docs/usage.md) - Detailed CLI usage and examples

docs/api.md

Lines changed: 209 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -544,6 +544,215 @@ public record OutputOptions
544544

545545
---
546546

547+
## Tree Comparison
548+
549+
### ITreeComparer Interface
550+
551+
Interface for recursive tree comparison with async support.
552+
553+
**Namespace:** `RoslynDiff.Core.Comparison`
554+
555+
```csharp
556+
public interface ITreeComparer
557+
{
558+
ValueTask<IReadOnlyList<Change>> CompareAsync(
559+
SyntaxTree oldTree,
560+
SyntaxTree newTree,
561+
DiffOptions options,
562+
CancellationToken cancellationToken = default);
563+
564+
IReadOnlyList<Change> Compare(
565+
SyntaxTree oldTree,
566+
SyntaxTree newTree,
567+
DiffOptions options);
568+
}
569+
```
570+
571+
#### Methods
572+
573+
##### CompareAsync
574+
575+
```csharp
576+
ValueTask<IReadOnlyList<Change>> CompareAsync(
577+
SyntaxTree oldTree,
578+
SyntaxTree newTree,
579+
DiffOptions options,
580+
CancellationToken cancellationToken = default)
581+
```
582+
583+
Compares two syntax trees asynchronously using recursive tree diff.
584+
585+
**Parameters:**
586+
- `oldTree` - The original syntax tree
587+
- `newTree` - The new syntax tree to compare
588+
- `options` - Options controlling comparison behavior
589+
- `cancellationToken` - Token to cancel the operation
590+
591+
**Returns:** Hierarchical list of `Change` objects representing differences
592+
593+
##### Compare
594+
595+
```csharp
596+
IReadOnlyList<Change> Compare(SyntaxTree oldTree, SyntaxTree newTree, DiffOptions options)
597+
```
598+
599+
Synchronous version of `CompareAsync`. Prefer async for large trees.
600+
601+
---
602+
603+
### RecursiveTreeComparer
604+
605+
Level-by-level recursive tree comparison implementation. Addresses BUG-003 (duplicate node detection).
606+
607+
**Namespace:** `RoslynDiff.Core.Comparison`
608+
609+
```csharp
610+
public sealed class RecursiveTreeComparer : ITreeComparer
611+
{
612+
public RecursiveTreeComparer();
613+
public RecursiveTreeComparer(NodeMatcher matcher, ParallelOptions parallelOptions);
614+
}
615+
```
616+
617+
#### Constructors
618+
619+
##### Default Constructor
620+
621+
```csharp
622+
public RecursiveTreeComparer()
623+
```
624+
625+
Creates instance with default `NodeMatcher` and parallel options based on processor count.
626+
627+
##### Custom Constructor
628+
629+
```csharp
630+
public RecursiveTreeComparer(NodeMatcher matcher, ParallelOptions parallelOptions)
631+
```
632+
633+
Creates instance with custom node matcher and parallel processing options.
634+
635+
**Parameters:**
636+
- `matcher` - Custom node matcher for comparing trees
637+
- `parallelOptions` - Options controlling parallelism (e.g., `MaxDegreeOfParallelism`)
638+
639+
#### Key Characteristics
640+
641+
- **O(n) complexity** with early termination for identical subtrees
642+
- **Hierarchical output** matching code structure
643+
- **Parallel subtree comparison** via `ValueTask` when beneficial
644+
- **Cancellation support** for long-running comparisons
645+
646+
#### Example
647+
648+
```csharp
649+
using RoslynDiff.Core.Comparison;
650+
using RoslynDiff.Core.Models;
651+
using Microsoft.CodeAnalysis.CSharp;
652+
653+
var oldTree = CSharpSyntaxTree.ParseText(oldContent);
654+
var newTree = CSharpSyntaxTree.ParseText(newContent);
655+
656+
var comparer = new RecursiveTreeComparer();
657+
var options = new DiffOptions { OldPath = "old.cs", NewPath = "new.cs" };
658+
659+
// Async (preferred for large files)
660+
var changes = await comparer.CompareAsync(oldTree, newTree, options);
661+
662+
// Sync (simpler use cases)
663+
var changes = comparer.Compare(oldTree, newTree, options);
664+
665+
// Changes are hierarchical - class changes contain method changes as Children
666+
foreach (var change in changes)
667+
{
668+
Console.WriteLine($"{change.Kind}: {change.Name} ({change.Type})");
669+
if (change.Children != null)
670+
{
671+
foreach (var child in change.Children)
672+
{
673+
Console.WriteLine($" - {child.Kind}: {child.Name} ({child.Type})");
674+
}
675+
}
676+
}
677+
```
678+
679+
---
680+
681+
### ChangeExtensions
682+
683+
Extension methods for working with hierarchical `Change` structures.
684+
685+
**Namespace:** `RoslynDiff.Core.Models`
686+
687+
```csharp
688+
public static class ChangeExtensions
689+
{
690+
public static IEnumerable<Change> Flatten(this IEnumerable<Change> changes);
691+
public static int CountAll(this IEnumerable<Change> changes);
692+
public static Change? FindByName(this IEnumerable<Change> changes, string name);
693+
public static IEnumerable<Change> OfKind(this IEnumerable<Change> changes, ChangeKind kind);
694+
}
695+
```
696+
697+
#### Methods
698+
699+
##### Flatten
700+
701+
```csharp
702+
public static IEnumerable<Change> Flatten(this IEnumerable<Change> changes)
703+
```
704+
705+
Flattens hierarchical changes into a single-level enumerable (depth-first).
706+
707+
**Use Case:** Backward compatibility with code expecting flat change lists.
708+
709+
```csharp
710+
// Hierarchical changes from RecursiveTreeComparer
711+
var hierarchicalChanges = comparer.Compare(oldTree, newTree, options);
712+
713+
// Flatten for backward compatibility
714+
var flatChanges = hierarchicalChanges.Flatten().ToList();
715+
```
716+
717+
##### CountAll
718+
719+
```csharp
720+
public static int CountAll(this IEnumerable<Change> changes)
721+
```
722+
723+
Counts all changes including nested children.
724+
725+
```csharp
726+
var totalChanges = changes.CountAll();
727+
```
728+
729+
##### FindByName
730+
731+
```csharp
732+
public static Change? FindByName(this IEnumerable<Change> changes, string name)
733+
```
734+
735+
Finds a change by name at any nesting level.
736+
737+
```csharp
738+
var methodChange = changes.FindByName("Calculate");
739+
```
740+
741+
##### OfKind
742+
743+
```csharp
744+
public static IEnumerable<Change> OfKind(this IEnumerable<Change> changes, ChangeKind kind)
745+
```
746+
747+
Gets all changes of a specific kind at any nesting level.
748+
749+
```csharp
750+
var methodChanges = changes.OfKind(ChangeKind.Method).ToList();
751+
var propertyChanges = changes.OfKind(ChangeKind.Property).ToList();
752+
```
753+
754+
---
755+
547756
## Class Matching
548757

549758
### ClassMatcher

docs/architecture.md

Lines changed: 84 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -467,3 +467,87 @@ The CLI application using Spectre.Console.Cli.
467467
- No external dependencies
468468
- Works offline
469469
- Portable across systems
470+
471+
### 9. Recursive Tree Diff Algorithm (BUG-003 Fix)
472+
473+
**Decision:** Use recursive tree comparison instead of flat node extraction.
474+
475+
**Rationale:**
476+
- Fixes BUG-003 (duplicate node detection) where the old flat extraction method could report the same node multiple times
477+
- Each node is processed exactly once at its natural tree level
478+
- Produces hierarchical output that mirrors the code structure
479+
- Enables early termination when subtrees are identical (O(n) complexity)
480+
481+
**Old Approach (Flat Extraction):**
482+
```
483+
1. Extract ALL structural nodes from both trees (recursive descent)
484+
2. Match extracted nodes using name/signature
485+
3. Compare matched pairs
486+
```
487+
Problem: A method inside a class would be extracted both as a child of the class AND independently, causing duplicate detection.
488+
489+
**New Approach (Recursive Tree Comparison):**
490+
```
491+
1. Extract IMMEDIATE children only at each level
492+
2. Match siblings using O(n) hash-based lookup
493+
3. Recurse into matched pairs
494+
4. Report additions/removals at their natural level
495+
```
496+
Benefit: Each node is visited exactly once, and changes are reported hierarchically.
497+
498+
## Recursive Tree Comparison
499+
500+
The `RecursiveTreeComparer` implements level-by-level tree diffing:
501+
502+
```
503+
┌─────────────────────────────────────────────────────────────────┐
504+
│ RecursiveTreeComparer │
505+
├─────────────────────────────────────────────────────────────────┤
506+
│ │
507+
│ Old Tree New Tree │
508+
│ ──────── ──────── │
509+
│ Namespace Namespace │
510+
│ └── Class └── Class │
511+
│ ├── Method1 ├── Method1 (mod) │
512+
│ ├── Method2 ├── Method3 (new) │
513+
│ └── Property └── Property │
514+
│ │
515+
│ Algorithm: │
516+
│ 1. Extract immediate children at level N │
517+
│ 2. Match by (name, kind, signature) hash │
518+
│ 3. For matched pairs: compare → recurse if different │
519+
│ 4. Unmatched old = Removed, Unmatched new = Added │
520+
│ 5. Changes are nested: Class.Children contains Method changes │
521+
│ │
522+
├─────────────────────────────────────────────────────────────────┤
523+
│ Output: Hierarchical Changes │
524+
│ ───────────────────────── │
525+
│ Change(Class, Modified) │
526+
│ └── Children: │
527+
│ ├── Change(Method1, Modified) │
528+
│ ├── Change(Method2, Removed) │
529+
│ └── Change(Method3, Added) │
530+
└─────────────────────────────────────────────────────────────────┘
531+
```
532+
533+
### Key Classes
534+
535+
| Class | Responsibility |
536+
|-------|----------------|
537+
| `ITreeComparer` | Interface for tree comparison with sync/async support |
538+
| `RecursiveTreeComparer` | Level-by-level recursive comparison implementation |
539+
| `ChangeExtensions` | Helper methods for working with hierarchical changes |
540+
541+
### Hierarchical vs Flat Output
542+
543+
The recursive algorithm produces hierarchical `Change` objects where nested changes appear in the `Children` property. For backward compatibility, use `ChangeExtensions.Flatten()`:
544+
545+
```csharp
546+
// Hierarchical (new default)
547+
var changes = comparer.Compare(oldTree, newTree, options);
548+
// Changes[0].Children contains nested method/property changes
549+
550+
// Flat (backward compatible)
551+
var flatChanges = changes.Flatten().ToList();
552+
// All changes at same level, like the old behavior
553+
```

0 commit comments

Comments
 (0)