[Feature]: Matched file filtering via IN (...) doesn't scale for large file sets

### Is your feature request related to a problem?

PR #4111 adds `scan_files_where_matches()` to identify files containing matching rows for DML rewrites (UPDATE now, MERGE/DELETE next). The second scan is restricted to matched files via a plan-time `IN (...)` predicate:
```rust
let file_list = valid_files
    .iter()
    .flat_map(|arr| arr.iter().flatten().map(|v| v.to_string()))
    .map(|v| lit(wrap_partition_value_in_dict(ScalarValue::Utf8(Some(v)))))
    .collect_vec();

let plan = LogicalPlanBuilder::scan("source", table_source, None)?
    .filter(col(FILE_ID_COLUMN_DEFAULT).in_list(file_list, false))?
    // ...
```

This builds one `Expr::Literal` per matched file. For rewrites touching thousands of files, that means large expression trees, slow planning, and memory overhead. This is a file level restriction implemented as a DF expression over the synthetic file-id column, not a Parquet row predicate.

### Describe the solution you'd like

**Scan-planning allowlist:** 

Instead of constructing `Expr::InList` with one literal per matched file, plumb an allowlist of matched file ids into the scan planning path and filter the candidate files before building Parquet `FileGroups` / `PartitionedFiles`. This avoids plan-time bloat while preserving true file pruning (don't read Parquet for excluded files).

**Optional safety net:** 

exec-time membership filtering in `DeltaScanExec` (e.g., per run via `split_by_file_id_runs()` from #4112) to guard against mixed batch edge cases. However, this happens after Parquet reads and doesn't provide I/O savings.

**Something like:**

1. Add `file_id_allowlist: Option<Arc<HashSet<String>>>` to `DeltaScanConfig`
2. Add `with_file_id_allowlist()` builder method
3. Filter in scan planning when assembling file list / file groups (before Parquet read plan)
4. Optionally also guard in `DeltaScanExec` for mixed-batch correctness

### Describe alternatives you've considered

Alternatives considered:
- **Semi-join against MemTable** - avoids literal list, uses DF join machinery. 
- **IN (subquery)** - depends on DF planner behavior
- **Chunked IN (...)** - still builds large expressions, just delays the cliff

### Priority

Medium - Would be helpful

### Additional context

**Related:**
- #4111 
- #4112

### Contribution

- [x] I'm willing to submit a pull request for this feature
- [x] I can help with testing this feature
- [x] I can help with documentation for this feature

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature]: Matched file filtering via IN (...) doesn't scale for large file sets #4113

Is your feature request related to a problem?

Describe the solution you'd like

Describe alternatives you've considered

Priority

Additional context

Contribution

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Feature]: Matched file filtering via IN (...) doesn't scale for large file sets #4113

Description

Is your feature request related to a problem?

Describe the solution you'd like

Describe alternatives you've considered

Priority

Additional context

Contribution

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions