
FileExistsError during extraction with Netgear firmware #754

Closed · AndrewFasano opened this issue Feb 11, 2024 · 4 comments · Fixed by #756

AndrewFasano commented Feb 11, 2024

Describe the bug
During extraction of at least 29 NETGEAR firmware images, unblob may try to create the same output file twice, which triggers an exception. As a result, some files that should be extracted are not.

To Reproduce
Steps to reproduce the behavior:

  1. Download the sample firmware that triggers the bug: wget https://www.downloads.netgear.com/files/GDC/M4100/M4100-V10.0.2.20.zip
  2. Launch unblob: unblob -v M4100-V10.0.2.20.zip
  3. See error:
2024-02-10 23:39.13 [error    ] Unknown error happened while extracting chunk pid=2295991
Traceback (most recent call last):
  File "/unblob/unblob/processing.py", line 607, in _extract_chunk
    if result := chunk.extract(inpath, extract_dir):
  File "/unblob/unblob/models.py", line 115, in extract
    return self.handler.extract(inpath, outdir)
  File "/unblob/unblob/models.py", line 452, in extract
    return self.EXTRACTOR.extract(inpath, outdir)
  File "/unblob/unblob/handlers/archive/cpio.py", line 384, in extract
    parser.dump_entries(fs)
  File "/unblob/unblob/handlers/archive/cpio.py", line 215, in dump_entries
    fs.carve(entry.path, self.file, entry.start_offset, entry.size, mode=entry.mode & 0o777)
  File "/unblob/unblob/file_utils.py", line 511, in carve
    carve(safe_path, file, start_offset, size, mode=mode)
  File "/unblob/unblob/file_utils.py", line 294, in carve
    with carve_path.open("xb") as f:
  File "/usr/lib/python3.10/pathlib.py", line 1119, in open
    return self._accessor.open(self, mode, buffering, encoding, errors,
FileExistsError: [Errno 17] File exists: '/tmp/tmp1151iav4/M4100_V10.0.2.20.zip_extract/m4100v10.0.2.20.stk_extract/1201148-2097967.lzma_extract/lzma.uncompressed_extract/lib/libthread_db-1.0.so'
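
The immediate cause is visible in the last frames of the traceback: carve() opens its target with mode "xb", Python's exclusive-create mode, which raises FileExistsError whenever the path already exists. A standalone sketch of that behavior, independent of unblob:

from pathlib import Path
import tempfile

target = Path(tempfile.mkdtemp()) / "demo.bin"

with target.open("xb") as f:  # the first exclusive create succeeds
    f.write(b"first entry")

try:
    with target.open("xb") as f:  # creating the same path again fails
        f.write(b"duplicate entry")
except FileExistsError as err:
    print(err)  # [Errno 17] File exists: '/tmp/.../demo.bin'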

Expected behavior
This error should not be raised; instead, the remaining files should be extracted. I made a simple change to file_utils.py's carve method (see the patch under Additional context below) to return early if the target file already exists; with this change, an extra 75 files are created in [extract_dir]/m4100v10.0.2.20.stk_extract/1201148-2097967.lzma_extract/lzma.uncompressed_extract. I doubt this is the right fix, but it shows that the bug prevents some files from being extracted.

Environment information:

  • OS: Ubuntu 22.04
  • Docker: Linux b4935d734f27 6.2.2 #3 SMP PREEMPT_DYNAMIC Wed Mar  8 12:03:22 EST 2023 x86_64 x86_64 x86_64 GNU/Linux

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"

The following executables found installed, which are needed by unblob:
    7z                          ✓
    debugfs                     ✓
    jefferson                   ✓
    lz4                         ✓
    lziprecover                 ✓
    lzop                        ✓
    sasquatch                   ✓
    sasquatch-v4be              ✓
    simg2img                    ✓
    ubireader_extract_files     ✓
    ubireader_extract_images    ✓
    unar                        ✓
    zstd                        ✓

Additional context
I found this bug while doing some large-scale evaluations of filesystems produced by binwalk and unblob using fw2tar.

My (likely-incorrect) patch that results in additional files being created:

diff --git a/unblob/file_utils.py b/unblob/file_utils.py
index 21e887b..3db4b98 100644
--- a/unblob/file_utils.py
+++ b/unblob/file_utils.py
@@ -291,6 +291,9 @@ def carve(carve_path: Path, file: File, start_offset: int, size: int):
     """Extract part of a file."""
     carve_path.parent.mkdir(parents=True, exist_ok=True)

+    if carve_path.exists():
+        print(f"Warning not replacing {carve_path}")
+        return
     with carve_path.open("xb") as f:
         for data in iterate_file(file, start_offset, size):
             f.write(data)

After fixing this, I got another error in the same vein in file_utils, which I patched with:

diff --git a/unblob/file_utils.py b/unblob/file_utils.py
index 21e887b..3db4b98 100644
--- a/unblob/file_utils.py
+++ b/unblob/file_utils.py

@@ -579,7 +582,8 @@ class FileSystem:
         if safe_link:
             dst = safe_link.dst.absolute_path
             self._ensure_parent_dir(dst)
-            dst.symlink_to(src)
+            if not dst.exists():
+                dst.symlink_to(src)

     def create_hardlink(self, src: Path, dst: Path):
         """Create a new hardlink dst to the existing file src."""
qkaiser self-assigned this and added the bug and format:archive labels on Feb 11, 2024

qkaiser commented Feb 11, 2024

Thank you for the very detailed report. I had a quick look and it's probably a bug in the CPIO extractor. I'll keep you posted.

qkaiser commented Feb 11, 2024

The bug is triggered by a CPIO archive with the same entry stored twice:

$ 7z l sample.cpio | grep thread_db-1.0
2012-01-11 23:17:40 .....        32220        32220  /lib/libthread_db-1.0.so
2012-01-11 23:17:40 .....        32220        32220  /lib/libthread_db-1.0.so
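
The same check can be scripted for any suspect archive; a minimal sketch assuming GNU cpio is on PATH (the archive filename is illustrative, and the next comment does the equivalent with a shell pipeline):

import subprocess
from collections import Counter

# List entry names without extracting anything (cpio -t reads the archive from stdin).
with open("sample.cpio", "rb") as archive:
    listing = subprocess.run(
        ["cpio", "-t"], stdin=archive, capture_output=True, text=True, check=True
    )

# Any path that occurs more than once is a duplicate entry.
for path, count in Counter(listing.stdout.splitlines()).items():
    if count > 1:
        print(count, path)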

qkaiser commented Feb 11, 2024

@AndrewFasano I opened a discussion on duplicate entries with a draft fix at #756; your feedback is very welcome :)

e3krisztian commented:

$ cpio -t < 'M4100-V10.0.2.20.zip_extract/m4100v10.0.2.20.stk_extract/1201148-2097967.lzma_extract/lzma.uncompressed'| sort | uniq -c | sort -n
...
      1 /var/run
      1 /var/run/utmp
      2 /lib/libthread_db-1.0.so
      2 /sbin/cfe_env
      2 /usr/bin/sort
      2 /usr/bin/tail
      2 /usr/bin/test
      2 /usr/bin/tftp
      2 /usr/bin/top
      2 /usr/bin/traceroute
      2 /usr/bin/uptime
      2 /usr/bin/wc
      2 /usr/bin/which
      2 /usr/bin/xargs
      2 /usr/bin/yes
      2 /usr/sbin/chroot

It looks like only a couple of binaries were patched during the build process (and maybe some previously non-existent ones were added), so I think the proper solution is to overwrite duplicate entries when extracting cpio archives.

I would also limit the solution to the cpio extractor and not make a general behavior change in FileSystem.
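
A minimal sketch of that idea against the carve() quoted earlier in the report (the overwrite parameter is hypothetical; the actual fix is discussed in #756):

def carve(carve_path: Path, file: File, start_offset: int, size: int,
          overwrite: bool = False):
    """Extract part of a file."""
    carve_path.parent.mkdir(parents=True, exist_ok=True)

    # "xb" fails if the path exists (the current, exclusive-create behavior);
    # "wb" truncates and rewrites it, so a later duplicate CPIO entry wins.
    open_mode = "wb" if overwrite else "xb"
    with carve_path.open(open_mode) as f:
        for data in iterate_file(file, start_offset, size):
            f.write(data)

The flag would still need to be plumbed through FileSystem.carve so that only the CPIO handler opts in, keeping exclusive-create semantics for every other extractor.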
