Skip to content

Commit

Permalink
Rewrite symlink sanitization logic
Browse files Browse the repository at this point in the history
We can rewrite symlinks to ensure they are always relative
and remain within the extraction directory.
  • Loading branch information
Andrew Fasano committed Feb 12, 2024
1 parent 49de1c6 commit 954c1cd
Showing 1 changed file with 61 additions and 56 deletions.
117 changes: 61 additions & 56 deletions unblob/extractor.py
Original file line number Diff line number Diff line change
Expand Up @@ -47,68 +47,73 @@ def is_recursive_link(path: Path) -> bool:
return False


def fix_symlink(path: Path, outdir: Path, task_result: TaskResult) -> Path:
"""Rewrites absolute symlinks to point within the extraction directory (outdir).
If it's not a relative symlink it is either removed it it attempts
to traverse outside of the extraction directory or rewritten to be
fully portable (no mention of the extraction directory in the link
value).
"""
if is_recursive_link(path):
logger.error("Symlink loop identified, removing", path=path)
error_report = MaliciousSymlinkRemoved(
link=path.as_posix(), target=os.readlink(path)
)
task_result.add_report(error_report)
path.unlink()
return path

raw_target = os.readlink(path)
if not raw_target:
logger.error("Symlink with empty target, removing.")
path.unlink()
return path

target = Path(raw_target)

if target.is_absolute():
target = Path(target.as_posix().lstrip("/"))
def sanitize_symlink_target(base_dir, current_dir, target):
# Normalize all paths to their absolute forms
base_dir_abs = os.path.abspath(base_dir)
current_dir_abs = os.path.abspath(current_dir)
target_abs = os.path.abspath(os.path.join(current_dir, target)) \
if not os.path.isabs(target) else os.path.abspath(target)

# Check if the target is absolute and within the base_dir
if os.path.isabs(target):
if target_abs.startswith(base_dir_abs):
return os.path.relpath(target_abs, current_dir_abs)
else:
# Target is absolute but outside base_dir - we'll pretend base_dir is our root
# and adjust the target to be within base_dir
abs = base_dir + "/" + os.path.relpath(target_abs, os.path.commonpath([target_abs, base_dir_abs]))
# We want to return the relative path from current_dir to the adjusted target
return os.path.relpath(abs, current_dir_abs)
else:
target = path.resolve()

safe = is_safe_path(outdir, target)

if not safe:
logger.error("Path traversal attempt through symlink, removing", target=target)
error_report = MaliciousSymlinkRemoved(
link=path.as_posix(), target=target.as_posix()
)
task_result.add_report(error_report)
path.unlink()
else:
relative_target = os.path.relpath(outdir.joinpath(target), start=path.parent)
path.unlink()
path.symlink_to(relative_target)
return path

# Target is relative
if target_abs.startswith(base_dir_abs):
# Target is relative and does not escape base_dir
return os.path.relpath(target_abs, current_dir_abs)
else:
# Target is relative and escapes base_dir
# Say we have foo/passwd -> ../../../etc/passwd with root at /host/test_archive
# from /host/test_archive/foo/passwd, we want to return ../etc/passwd which is the
# relative path from /host/test_archive/foo to /host/test_archive/etc/passwd
# without escaping /host/test_archive

for drop_count in range(0, len(target.split('/'))):
# We drop '..'s from the target by prepending placeholder directories until we get something valid
abs = current_dir + "/" + "/".join(["foo"] * drop_count) + target
resolved = os.path.abspath(abs)
if resolved.startswith(base_dir_abs):
break
else:
raise ValueError(f"Could not resolve symlink target {target} within base_dir {base_dir}")

# We need to add the /placeholder to the relative path because we need
# to act like a file within base_dir is our root (as opposed to base_dir itself)
return os.path.relpath(resolved, base_dir_abs + '/placeholder')

def fix_extracted_directory(outdir: Path, task_result: TaskResult):
def _fix_extracted_directory(directory: Path):
if not directory.exists():
return
for path in (directory / p for p in os.listdir(directory)):
try:
fix_permission(path)
if path.is_symlink():
fix_symlink(path, outdir, task_result)
continue
if path.is_dir():
_fix_extracted_directory(path)
except OSError as e:
if e.errno == errno.ENAMETOOLONG:
continue
raise e from None

base_dir = os.path.abspath(outdir)
for root, dirs, files in os.walk(base_dir, topdown=True):
fix_permission(Path(root))
for name in dirs+files:
try:
full_path = os.path.join(root, name)
if os.path.islink(full_path):
# Make symlinks relative and constrain them to the base_dir
target = os.readlink(full_path)
new_target = sanitize_symlink_target(base_dir, root, target)
if new_target != target:
os.remove(full_path)
os.symlink(new_target, full_path)
logger.info("Updated symlink", path=full_path, target=new_target)
else:
logger.debug("Symlink is already sanitized", path=full_path, target=new_target)
except OSError as e:
if e.errno == errno.ENAMETOOLONG:
continue
raise e from None

fix_permission(outdir)
_fix_extracted_directory(outdir)
Expand Down

2 comments on commit 954c1cd

@e3krisztian
Copy link
Contributor

@e3krisztian e3krisztian commented on 954c1cd Feb 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a good idea to rewrite offensive symlinks to point to a safe place instead of deleting them, as some heuristics could save them.
Cherry-picking this commit, and adding a simple fix_symlinks implementation just for the tests, it is quite close to pass existing test with only 2 test failures, and some strange warnings.

The implementation is also somewhat complex to validate due to the many cases, and is tricky to get right, as detailed below.
Tests could help to document what is happening with symlinks in certain cases (even if comments do help a lot), as it is quite complex code.
So this commit still needs some further work.


Unblob wants to ensure with fix_extracted_directory, that after unblob finished, no symlinks are pointing outside the directory.
This is a rather hard thing to ensure, as paths can have symlink "directory" paths, which could twist the string based logic, and can be exploited:

echo interesting > passwd
mkdir escaped-dir
(
  cd escaped-dir
  mkdir dir
  touch dir/passwd
  ln -s .. dir/root
  ln -s dir/root/../passwd passwd
  cat passwd
)

would create a tree

.
├── escaped-dir
│   ├── dir
│   │   ├── passwd
│   │   └── root -> ..
│   └── passwd -> dir/root/../passwd
└── passwd

would access ../passwd, while both of the symlinks looks valid and safe in isolation.
We try to catch things like above with the use of is_safe_path, which internally uses os.path.realpath and not abspath which uses normpath and is strings based, which can be tricked like above.

in escaped-dir, the difference is this:

>>> os.chdir("escaped-dir")
>>> Path("dir/root/../passwd").resolve()
PosixPath('/*/demo/passwd')
>>> os.path.realpath("dir/root/../passwd")
'/*/demo/passwd'
>>> os.path.abspath("dir/root/../passwd")
'/*/demo/escaped-dir/dir/passwd'

This is a very subtle and brittle difference, as realpath works like this only when there is a matching filesystem, as once the file path part no longer has a matching file-system object, it falls back to string manipulation.

Also, we had a lot of tests in tests/test_extractor.py that have checked problem cases with symlinks, these can be revived by adding the below to this file.

def fix_symlink(path: Path, outdir: Path, task_result: TaskResult) -> Path:
    # This is a temporary function for existing unit tests in tests/test_extractor.py
    fix_extracted_directory(outdir, task_result)
    return path

With the above fix_symlink, 2 tests are failing:

$ pytest
...
==================================================== short test summary info ====================================================
FAILED tests/test_extractor.py::test_fix_extracted_directory - AssertionError: assert (16512 & 511) == 509
FAILED tests/test_extractor.py::test_fix_symlink_chain_traversal - ValueError: Could not resolve symlink target .. within base...
=========================================== 2 failed, 874 passed in 80.65s (0:01:20) ============================================

However, there are new warnings reported, that pytest could not remove some directories. Looking them up, they have directories with strange permissions:

$ ls -ld /tmp/pytest-of-??/garbage-ef734404-d0cd-4d65-85e7-ec9c1bf5ba5e/test_fix_extracted_directory0/testdir2/
d-w------- 2 ?? ?? 4096 febr  14 15:45 /tmp/pytest-of-??/garbage-ef734404-d0cd-4d65-85e7-ec9c1bf5ba5e/test_fix_extracted_directory0/testdir2/

@e3krisztian
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please sign in to comment.