
FileAlreadyExistsException in CopyTable with Position Delete Files #14589

@krisnaru


Apache Iceberg version

1.5.0

Query engine

Spark

Please describe the bug 🐞

Problem
Copying an Iceberg table that has position delete files with SparkActions.copyTable() fails during parallel processing with a FileAlreadyExistsException:
org.apache.hadoop.fs.FileAlreadyExistsException: /staging/00001-deletes.parquet already exists

This issue is reproducible when either of the following is true:
  • Multiple manifests reference the same position delete file (e.g., after manifest compaction)
  • Different position delete files have the same file name in different directories
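To check whether a table is exposed to either condition, the delete-file paths referenced across its manifests can be grouped by their filename-only staging name. This is a standalone sketch, assuming the paths have already been collected (one entry per manifest reference). StagingCollisionCheck, findCollisions and its fileName helper are hypothetical names, not part of Iceberg. Each collision is labeled with the condition it falls under.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class StagingCollisionCheck {

  // Hypothetical helper: keeps only the last path segment, like the filename-only naming in this issue.
  static String fileName(String path) {
    return path.substring(path.lastIndexOf('/') + 1);
  }

  // Groups references by staging name and reports every name claimed more than once.
  static List<String> findCollisions(List<String> referencedPaths) {
    Map<String, List<String>> byName = new LinkedHashMap<>(); // preserves first-seen order
    for (String path : referencedPaths) {
      byName.computeIfAbsent(fileName(path), k -> new ArrayList<>()).add(path);
    }
    List<String> report = new ArrayList<>();
    for (Map.Entry<String, List<String>> e : byName.entrySet()) {
      List<String> refs = e.getValue();
      if (refs.size() < 2) {
        continue; // only one reference: no collision
      }
      Set<String> distinct = new LinkedHashSet<>(refs);
      String kind = distinct.size() == 1
          ? "same file referenced by multiple manifests" // first condition
          : "different files share a name";               // second condition
      report.add(e.getKey() + ": " + kind);
    }
    return report;
  }

  public static void main(String[] args) {
    List<String> refs = List.of(
        "/warehouse/tbl/data/dir1/00001-deletes.parquet",
        "/warehouse/tbl/data/dir2/00001-deletes.parquet",
        "/warehouse/tbl/data/dir3/00002-deletes.parquet",
        "/warehouse/tbl/data/dir3/00002-deletes.parquet");
    findCollisions(refs).forEach(System.out::println);
  }
}
```

Running main prints one line per colliding name: 00001-deletes.parquet is reported as two different files sharing a name, and 00002-deletes.parquet as one file referenced twice.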

Root Cause
The stagingPath() method builds the staging path from the file name alone, dropping the file's parent directory:

// OLD CODE
private static String stagingPath(String originalPath, String stagingLocation) {
  // Only the file name is kept; the parent directory of originalPath is discarded.
  return stagingLocation + fileName(originalPath);
}
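The root cause can be reproduced outside Spark with a plain-Java copy of the quoted method. In this sketch, fileName is a hypothetical stand-in for the action's helper (last path segment), and stagingPathByFullPath is one possible alternative, not the project's fix: it prefixes the name with a name-based UUID (UUID.nameUUIDFromBytes) of the full source path. Like the original, it assumes stagingLocation ends with a path separator.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

final class StagingPaths {

  // Hypothetical stand-in for the action's fileName(): last path segment only.
  static String fileName(String path) {
    return path.substring(path.lastIndexOf('/') + 1);
  }

  // Filename-only naming as quoted in this issue: the parent directory is discarded.
  static String stagingPathOld(String originalPath, String stagingLocation) {
    return stagingLocation + fileName(originalPath);
  }

  // Hypothetical alternative: prefix the name with a name-based UUID of the full source path,
  // so same-named files from different directories get distinct staging paths.
  // Deterministic: the same source path always yields the same staging path.
  static String stagingPathByFullPath(String originalPath, String stagingLocation) {
    UUID id = UUID.nameUUIDFromBytes(originalPath.getBytes(StandardCharsets.UTF_8));
    return stagingLocation + id + "-" + fileName(originalPath);
  }

  public static void main(String[] args) {
    String staging = "/staging/";
    String a = "/warehouse/tbl/data/dir1/00001-deletes.parquet";
    String b = "/warehouse/tbl/data/dir2/00001-deletes.parquet";
    System.out.println(stagingPathOld(a, staging).equals(stagingPathOld(b, staging)));               // true: collision
    System.out.println(stagingPathByFullPath(a, staging).equals(stagingPathByFullPath(b, staging))); // false
  }
}
```

Because the alternative is deterministic, it separates files that only share a name (the dir1/dir2 case), but a single file referenced from several manifests still maps to one staging path. That case would additionally need the paths deduplicated before copies are scheduled, or a write that tolerates an existing copy of the same file.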

Collision example (same file name, different directories):
  • dir1/00001-deletes.parquet → staging/00001-deletes.parquet
  • dir2/00001-deletes.parquet → staging/00001-deletes.parquet ❌ COLLISION
When Spark processes manifests in parallel (the mapPartitions call at line 784), several tasks can try to write the same staging path at the same time; the first write succeeds and the later ones fail with FileAlreadyExistsException.
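The parallel failure mode can be simulated with java.nio: several threads create the same filename-only staging path, and exactly one succeeds. This is an analogy, not Iceberg code. It uses java.nio.file.FileAlreadyExistsException rather than Hadoop's org.apache.hadoop.fs.FileAlreadyExistsException, and it assumes, as the exception in this issue suggests, that the staging write does not overwrite an existing file. ParallelStagingWrite and countFailures are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

final class ParallelStagingWrite {

  // Simulates `tasks` parallel tasks that all stage a file to the same filename-only path.
  // Files.createFile fails if the file exists (create-new semantics), standing in for a
  // non-overwriting create; this is the java.nio analog, not Hadoop's FileSystem API.
  static int countFailures(Path stagingDir, int tasks) throws Exception {
    Path target = stagingDir.resolve("00001-deletes.parquet");
    AtomicInteger failures = new AtomicInteger();
    ExecutorService pool = Executors.newFixedThreadPool(tasks);
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (int i = 0; i < tasks; i++) {
        futures.add(pool.submit(() -> {
          try {
            Files.createFile(target);
          } catch (FileAlreadyExistsException e) {
            failures.incrementAndGet(); // another task created the path first
          } catch (IOException e) {
            throw new UncheckedIOException(e);
          }
        }));
      }
      for (Future<?> f : futures) {
        f.get(); // wait for every task and surface unexpected errors
      }
    } finally {
      pool.shutdown();
    }
    return failures.get();
  }

  public static void main(String[] args) throws Exception {
    Path dir = Files.createTempDirectory("staging");
    // Exactly one task wins the create; every other task hits FileAlreadyExistsException.
    System.out.println(countFailures(dir, 4)); // 3
  }
}
```

The count does not depend on thread scheduling: whether the tasks overlap or run one after another, only the first create succeeds.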

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time

Metadata


Assignees

No one assigned

Labels

bug: Something isn't working
