NAME

Overview of sandboxing with Syd

SANDBOXING

The list of available sandboxing categories is given below:

stat	Confine file metadata accesses. This sandboxing category may be used to effectively hide files and directories from the sandbox process. The filtered system calls are access(2), faccessat(2), faccessat2(2), getdents64(2), readlink(2), readlinkat(2), stat(2), fstat(2), lstat(2), statx(2), newfstatat(2), getxattr(2), getxattrat(2), lgetxattr(2), fgetxattr(2), listxattr(2), listxattrat(2), flistxattr(2), llistxattr(2), statfs(2), statfs64(2), fstatfs(2), fstatfs64(2), fanotify_mark(2), and inotify_add_watch(2). In addition, paths may be masked using the mask command, in which case all filtered system calls on the path are executed on the character device /dev/null instead. See the description of the mask command in the syd(2) manual page for more information.
walk	Confine path traversals. This category is checked during path canonicalization, so its arguments are not necessarily fully canonicalized paths, but they are guaranteed to be absolute paths without any . (dot) or .. (dotdot) components. It was split from the stat category in version 3.39.0. Together, the walk and stat categories provide path hiding that resists attempts to unhide otherwise hidden paths by passing through them during path canonicalization. Notably, OpenBSD's unveil(2) pioneered similar capabilities and remains a mature reference implementation.
read	Confine file reads. The filtered system calls are open(2), openat(2), and openat2(2) with the O_RDONLY or O_RDWR flags.
write	Confine file writes. The filtered system calls are open(2), openat(2), and openat2(2) with the O_WRONLY or O_RDWR flags.
exec	Confine binary execution and dynamic library loading. The filtered system calls are execve(2), execveat(2), mmap(2), mmap2(2), and memfd_create(2). For scripts, the access check is done for both the script and the interpreter binary. As of version 3.16.3, Syd also checks the paths of the dynamic libraries an executable is linked against for exec access. This only works for ELF binaries. As of version 3.21.2, Syd seals memory file descriptors as non-executable by default; therefore, memory file descriptors are not checked for exec access unless the option trace/allow_unsafe_memfd:1 is set to lift this restriction. As of version 3.21.3, Syd hooks into the mmap(2) and mmap2(2) system calls and checks the file descriptor for exec access when the memory protection mode includes PROT_EXEC and the flags do not include MAP_ANONYMOUS, which typically indicates dlopen(3). Therefore, libraries dynamically loaded at runtime are checked for exec access as well. In addition, SegvGuard is used to deny execution if a binary crashes repeatedly, similar to the implementations in Grsecurity and HardenedBSD. See the SegvGuard section for more information.
ioctl	Confine ioctl(2) requests. Use lock/ioctl to confine the ioctl(2) system call for filesystem access. This feature may be used to safely access GPU, PTY, DRM, KVM, and similar devices. ioctl(2) requests may be allowed or denied by adding them to the respective list using the options allow/ioctl+ and deny/ioctl+. As of version 3.38.0, architecture-agnostic ioctl(2) decoding was introduced, allowing ioctls to be specified by name in addition to numeric values. See the syd(2) manual page for more information.
create	Confine creation of regular files and memory file descriptors. The filtered system calls are creat(2), mknod(2), mknodat(2), and memfd_create(2). In addition, the open system calls open(2), openat(2), and openat2(2) are filtered if O_CREAT is set and O_TMPFILE is not set in the flags. The name argument of memfd_create(2) is prefixed with !memfd: before the access check. Use e.g. deny/create+!memfd:** to deny access to memory file descriptors regardless of name. As of version 3.37.0, the name argument of memfd_create(2) is prefixed with !memfd-hugetlb: before the access check when the flags include MFD_HUGETLB.
delete	Confine file deletions. The filtered system calls are unlink(2) and unlinkat(2). As of version 3.33.0, unlinkat(2) is confined by this category if and only if AT_REMOVEDIR is not set in the flags; otherwise, it is confined by the rmdir category.
rename	Confine file renames and hard links. The filtered system calls are rename(2), renameat(2), renameat2(2), link(2), and linkat(2).
symlink	Confine creation of symbolic links. The filtered system calls are symlink(2) and symlinkat(2).
truncate	Confine file truncations. The filtered system calls are truncate(2), truncate64(2), ftruncate(2), ftruncate64(2), and fallocate(2). In addition, the open system calls open(2), openat(2), and openat2(2) are filtered if O_TRUNC is set and neither O_TMPFILE nor O_CREAT is set in the flags.
chdir	Confine directory changes. The filtered system calls are chdir(2) and fchdir(2). Additional hardening may be achieved using the trace/deny_dotdot:1 option to deny parent directory traversals. This option may be set at runtime before the sandbox is locked, which allows for incremental confinement. See the Path Resolution Restriction For Chdir and Open Calls section for more information.
readdir	Confine directory listings. The filtered system calls are open(2), openat(2), and openat2(2) when called on an existing directory, regardless of the O_DIRECTORY flag.
mkdir	Confine creation of directories. The filtered system calls are mkdir(2), mkdirat(2), mknod(2), and mknodat(2).
rmdir	Confine deletion of directories. The filtered system calls are rmdir(2) and unlinkat(2). Note that unlinkat(2) is confined by this category if and only if AT_REMOVEDIR is set in the flags; otherwise, it is confined by the delete category. This category was split from the delete category in version 3.33.0.
chown, chgrp	Confine owner and group changes on files. The filtered system calls are chown(2), chown32(2), fchown(2), fchown32(2), lchown(2), lchown32(2), and fchownat(2).
chmod	Confine mode changes on files. The filtered system calls are chmod(2), fchmod(2), fchmodat(2), and fchmodat2(2). In addition, a umask(2) value may be set using the trace/force_umask option, which is enforced at the chmod(2) boundary as well as during regular file creation. For example, setting trace/force_umask:7177 effectively disallows the setuid, setgid, and sticky bits, all group and other permission bits, and the owner execute bit. This feature is useful for setting up a W^X (Write XOR Execute) configuration for the sandbox.
chattr	Confine extended attribute changes on files. The filtered system calls are setxattr(2), setxattrat(2), fsetxattr(2), lsetxattr(2), removexattr(2), removexattrat(2), fremovexattr(2), and lremovexattr(2). In addition, Syd ensures that extended attributes whose names start with one of the prefixes security., trusted., or user.syd. cannot be listed or tampered with by the sandbox process unless the sandbox lock is off for that process. This access can be permitted to the initial sandbox process with lock:exec or to all sandbox processes with lock:off. As of version 3.37.0, this restriction may be lifted with trace/allow_unsafe_xattr:1.
chroot	Confine changes of the root directory using the chroot(2) system call. This sandboxing category can be disabled with trace/allow_unsafe_chroot:1 at startup, in which case the chroot(2) system call becomes a no-op. Similarly, the pivot_root(2) system call is denied with the errno(3) EPERM by default unless trace/allow_unsafe_pivot_root:1 is set at startup, in which case it becomes a no-op like chroot(2). No actual change of root directory takes place either way, because Syd must share the root directory with the sandbox process to work correctly. Instead, Syd prevents all filesystem access after the first allowed chroot(2) attempt, regardless of the root directory argument. The only exception is the chdir(2) system call with the argument / (the root directory), which remains allowed. This provides a TOCTOU-free way to support the use case, common among UNIX daemons, of cutting off all filesystem access by calling chroot(2) on /var/empty. This sandboxing category does not depend on the Linux capability CAP_SYS_CHROOT and therefore can be used in an unprivileged context. Syd drops the CAP_SYS_CHROOT Linux capability by default unless trace/allow_unsafe_caps:1 is passed at startup.
utime	Confine changes to the last access and modification times of files. The filtered system calls are utime(2), utimes(2), futimesat(2), utimensat(2), and utimensat_time64(2).
mkbdev	Confine block device creation. The filtered system calls are mknod(2) and mknodat(2). Following the principle of secure defaults, block device creation is disabled by default with a kernel-level seccomp-bpf filter that terminates the process on violation. This filter also applies to the Syd process, so a compromised Syd process cannot create block devices either. Therefore, the user must opt in at startup using the trace/allow_unsafe_mkbdev:1 option to use this category for path-based access checks on block devices.
mkcdev	Confine character device creation. The filtered system calls are mknod(2) and mknodat(2). Following the principle of secure defaults, character device creation is disabled by default with a kernel-level seccomp-bpf filter that terminates the process on violation. This filter also applies to the Syd process, so a compromised Syd process cannot create character devices either. Therefore, the user must opt in at startup using the trace/allow_unsafe_mkcdev:1 option to use this category for path-based access checks on character devices.
mkfifo	Confine named pipe (FIFO) creation. The filtered system calls are mknod(2) and mknodat(2).
mktemp	Confine temporary file creation. The filtered system calls are open(2), openat(2), and openat2(2) with the O_TMPFILE flag. A rule such as allow/mktemp+/tmp permits the sandbox process to create anonymous temporary files under the directory /tmp. The creation of regular files of a temporary nature is confined by the create category instead.
net	Confine network access. The socket types UNIX, IPv4, IPv6, NetLink, and KCAPI are supported; use the option trace/allow_unsupp_socket:1 to pass through sockets of unsupported types. UNIX domain sockets are always matched on their absolute path and therefore always start with the character /. UNIX abstract sockets are prefixed with the @ character before the access check. Similarly, unnamed UNIX sockets use the dummy path !unnamed for the access check. Finally, network sandboxing focuses on confining the initial connection and leaves the system calls recvfrom(2), recvmsg(2), and recvmmsg(2) out of scope, both for performance reasons and because they have no security implications: recv* system calls cannot specify target addresses.
net/bind	Confine binding network access. This category confines the bind(2) system call, UNIX domain socket file creation using the mknod(2) and mknodat(2) system calls, and UNIX socket-pair creation using the socketpair(2) system call. socketpair(2) is checked against the dummy path !unnamed, the same path used for unnamed UNIX sockets.
net/connect	Confine connecting network access. The filtered system calls are connect(2), sendto(2), sendmsg(2), and sendmmsg(2). For IPv4 and IPv6 sockets, the target address of these system calls is also checked against the IP blocklist; see the description of the block command in the syd(2) manual page for more information.
net/sendfd	Confine sending of file descriptors. The filtered system calls are sendmsg(2) and sendmmsg(2). As of version 3.31.0, file descriptors referring to block devices, directories, and symbolic links may not be passed. The restriction on block devices can be lifted with trace/allow_unsafe_mkbdev:1. UNIX domain sockets are always matched on their absolute path and therefore always start with the character /. UNIX abstract sockets are prefixed with the @ (at sign) character before the access check. Similarly, unnamed UNIX sockets use the dummy path !unnamed for the access check.
net/link	Confine netlink(7) sockets used in communication between kernel and user space. This sandboxing category may be used to specify a list of netlink(7) families to allow for the sandbox process. Use e.g. allow/net/link+route to allow the NETLINK_ROUTE family. See the syd(2) manual page for more information.
lock/read	Use landlock(7) to confine file read access. This category corresponds to the landlock(7) access right LANDLOCK_ACCESS_FS_READ_FILE and applies only to the contents of a directory, not to the directory itself. As of version 3.33.0, the lock/exec and lock/readdir access rights are confined by their respective categories; previously, this category included the access rights LANDLOCK_ACCESS_FS_EXECUTE and LANDLOCK_ACCESS_FS_READ_DIR as well. This category is enforced completely in kernel space, so it can be used to construct a multi-layered sandbox. See the Lock Sandboxing section for more information.
lock/write	Use landlock(7) to confine file write access. This category corresponds to the landlock(7) access right LANDLOCK_ACCESS_FS_WRITE_FILE and applies only to the contents of a directory, not to the directory itself. This category is enforced completely in kernel space, so it can be used to construct a multi-layered sandbox. See the Lock Sandboxing section for more information.
lock/exec	Use landlock(7) to confine file execution. This category corresponds to the landlock(7) access right LANDLOCK_ACCESS_FS_EXECUTE and applies only to the contents of a directory, not to the directory itself. This category is enforced completely in kernel space, so it can be used to construct a multi-layered sandbox. See the Lock Sandboxing section for more information.
lock/ioctl	Use landlock(7) to confine ioctl(2) operations. This category corresponds to the landlock(7) access right LANDLOCK_ACCESS_FS_IOCTL_DEV and applies only to the contents of a directory, not to the directory itself. This access right is supported as of Landlock ABI version 5, which was introduced with Linux-6.10. This category has no effect on older Linux kernels. Use syd-lock(1) to check the latest Landlock ABI supported by the running Linux kernel. This category is enforced completely in kernel space, so it can be used to construct a multi-layered sandbox. See the Lock Sandboxing section for more information.
lock/create	Use landlock(7) to confine file creation, renames and links. This category corresponds to the landlock(7) access right LANDLOCK_ACCESS_FS_MAKE_REG and applies only to the contents of a directory, not to the directory itself. This category is enforced completely in kernel space, so it can be used to construct a multi-layered sandbox. See the Lock Sandboxing section for more information.
lock/delete	Use landlock(7) to confine file unlinking, renames and links. This category corresponds to the landlock(7) access right LANDLOCK_ACCESS_FS_REMOVE_FILE and applies only to the contents of a directory, not to the directory itself. This category is enforced completely in kernel space, so it can be used to construct a multi-layered sandbox. See the Lock Sandboxing section for more information.
lock/rename	Use landlock(7) to confine linking or renaming a file from or to a different directory (i.e. reparenting a file hierarchy). This category corresponds to the landlock(7) access right LANDLOCK_ACCESS_FS_REFER and applies only to the contents of a directory, not to the directory itself. This access right is supported as of Landlock ABI version 2, which was introduced with Linux-5.19. This category has no effect on older Linux kernels. Use syd-lock(1) to check the latest Landlock ABI supported by the running Linux kernel. This category is enforced completely in kernel space, so it can be used to construct a multi-layered sandbox. See the Lock Sandboxing section for more information.
lock/symlink	Use Landlock LSM to confine symbolic link creation, renames and links. This category corresponds to the landlock(7) access right LANDLOCK_ACCESS_FS_MAKE_SYM and applies only to the contents of a directory, not to the directory itself. This category is enforced completely in kernel space, so it can be used to construct a multi-layered sandbox. See the Lock Sandboxing section for more information.
lock/truncate	Use Landlock LSM to confine file truncation with truncate(2), ftruncate(2), creat(2), or open(2) with O_TRUNC. This category corresponds to the landlock(7) access right LANDLOCK_ACCESS_FS_TRUNCATE and applies only to the contents of a directory, not to the directory itself. This access right is supported as of Landlock ABI version 3, which was introduced with Linux-6.2. This category has no effect on older Linux kernels. Use syd-lock(1) to check the latest Landlock ABI supported by the running Linux kernel. This category is enforced completely in kernel space, so it can be used to construct a multi-layered sandbox. See the Lock Sandboxing section for more information.
lock/readdir	Use Landlock LSM to confine directory listings. This category corresponds to the landlock(7) access right LANDLOCK_ACCESS_FS_READ_DIR and applies to the given directory and the directories beneath it. This category is enforced completely in kernel space, so it can be used to construct a multi-layered sandbox. See the Lock Sandboxing section for more information.
lock/mkdir	Use Landlock LSM to confine directory creation and renames. This category corresponds to the landlock(7) access right LANDLOCK_ACCESS_FS_MAKE_DIR and applies only to the contents of a directory, not to the directory itself. This category is enforced completely in kernel space, so it can be used to construct a multi-layered sandbox. See the Lock Sandboxing section for more information.
lock/rmdir	Use Landlock LSM to confine directory deletion and renames. This category corresponds to the landlock(7) access right LANDLOCK_ACCESS_FS_REMOVE_DIR and applies only to the contents of a directory, not to the directory itself. This category is enforced completely in kernel space, so it can be used to construct a multi-layered sandbox. See the Lock Sandboxing section for more information.
lock/mkbdev	Use Landlock LSM to confine block device creation, renames and links. This category corresponds to the landlock(7) access right LANDLOCK_ACCESS_FS_MAKE_BLOCK. This category is enforced completely in kernel space, so it can be used to construct a multi-layered sandbox. See the Lock Sandboxing section for more information.
lock/mkcdev	Use Landlock LSM to confine character device creation, renames and links. This category corresponds to the landlock(7) access right LANDLOCK_ACCESS_FS_MAKE_CHAR. This category is enforced completely in kernel space, so it can be used to construct a multi-layered sandbox. See the Lock Sandboxing section for more information.
lock/mkfifo	Use Landlock LSM to confine named pipe (FIFO) creation, renames and links. This category corresponds to the landlock(7) access right LANDLOCK_ACCESS_FS_MAKE_FIFO. This category is enforced completely in kernel space, so it can be used to construct a multi-layered sandbox. See the Lock Sandboxing section for more information.
lock/bind	Use Landlock LSM to confine network ports for bind(2), as well as UNIX domain socket creation, renames and links. This category corresponds to the Landlock access right LANDLOCK_ACCESS_NET_BIND_TCP for network ports and LANDLOCK_ACCESS_FS_MAKE_SOCK for UNIX domain sockets. The latter access right applies only to the contents of a directory, not to the directory itself. The access right LANDLOCK_ACCESS_NET_BIND_TCP is supported as of Landlock ABI version 4, which was introduced with Linux-6.7, and has no effect on older Linux kernels. Use syd-lock(1) to check the latest Landlock ABI supported by the running Linux kernel. This category is enforced completely in kernel space, so it can be used to construct a multi-layered sandbox. See the Lock Sandboxing section for more information.
lock/connect	Use Landlock LSM to confine network ports for connect(2). This category corresponds to the Landlock access right LANDLOCK_ACCESS_NET_CONNECT_TCP. This access right is supported as of Landlock ABI version 4, which was introduced with Linux-6.7. This category has no effect on older Linux kernels. Use syd-lock(1) to check the latest Landlock ABI supported by the running Linux kernel. This category is enforced completely in kernel space, so it can be used to construct a multi-layered sandbox. See the Lock Sandboxing section for more information.
block	Application firewall with the capability to include ipset and netset files. The filtered system calls are accept(2), accept4(2), connect(2), sendto(2), sendmsg(2), and sendmmsg(2). IPv4 and IPv6 family sockets are supported. Source and target addresses are checked against the IP blocklist. Refer to the description of the block command in the syd(2) manual page for more information.
fs	Confine file opens based on filesystem type. By default, no filesystem types are allowed. To make this sandboxing practical, the fs profile included by the linux profile allows all filesystem types except aafs, bpf_fs, securityfs, selinux, smack, debugfs, pstorefs, tracefs, cgroup, cgroup2, nsfs, pid_fd, rdtgroup, devmem, efivarfs, hostfs, mtd_inode_fs, openprom, daxfs, secretmem, bdevfs, binderfs, usbdevice, xenfs, and zonefs. Use allow/fs+<fstype> to allow a filesystem type.
force	Verified Execution: Verify binary/library integrity at exec(3)/mmap(2) time which is similar to Veriexec (NetBSD) & IntegriForce (HardenedBSD). See the Force Sandboxing section for more information.
tpe	Trusted Path Execution: execution is only allowed for trusted files in trusted directories, i.e. those that are not writable by group or others and are optionally owned by root or the current user. This feature is similar to the implementations in Grsecurity and HardenedBSD. See the TPE Sandboxing section for more information.
crypt	Transparent File Encryption with AES-CTR and HMAC-SHA256. See the Crypt Sandboxing section for more information.
proxy	SOCKS5 proxy forwarding with network namespace isolation. Defaults to TOR. See the Proxy Sandboxing section for more information.
pty	Run sandbox process inside a new pseudoterminal. See the PTY Sandboxing section for more information.
mem, pid	Memory and PID sandboxing: Simple, unprivileged alternatives to Control Groups. See the Memory Sandboxing and PID Sandboxing sections for more information.
SafeSetID	Safe user/group switching with predefined UID/GID transitions like SafeSetID of the Linux kernel. See the SafeSetID section for more information.
Ghost mode	Detach Syd from the sandbox process, similar to seccomp(2) Level 1, aka "Strict Mode". See the Ghost mode section for more information.
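Several of the category descriptions above depend on system call arguments: which open(2) flags select which category, how unlinkat(2) is split between delete and rmdir, which names memory file descriptors and UNIX sockets are matched against, and which mode bits trace/force_umask clears. The sketch below condenses those rules into helper functions. It is hypothetical: the function names are made up for illustration, the flag constants are generic Linux values (as on x86-64) that differ on some architectures, treating an O_TMPFILE open as checked by mktemp only and a hugetlb memfd as using only the !memfd-hugetlb: prefix are assumptions, and forced_mode merely illustrates the bit arithmetic of trace/force_umask assuming umask(2) semantics.

```python
# Hypothetical sketch: map a few system call arguments to the categories and
# names described above. Constants are generic Linux values (as on x86-64).
O_ACCMODE = 0o3
O_RDONLY, O_WRONLY, O_RDWR = 0o0, 0o1, 0o2
O_CREAT = 0o100
O_TRUNC = 0o1000
O_DIRECTORY = 0o200000
O_TMPFILE = 0o20000000 | O_DIRECTORY  # O_TMPFILE always includes O_DIRECTORY
AT_REMOVEDIR = 0x200
MFD_HUGETLB = 0x4


def open_categories(flags, is_existing_dir=False):
    """Categories checked for open(2), openat(2) and openat2(2)."""
    if (flags & O_TMPFILE) == O_TMPFILE:
        return ["mktemp"]  # assumption: no other category is checked here
    if is_existing_dir:
        return ["readdir"]  # a listing, with or without O_DIRECTORY
    cats = []
    acc = flags & O_ACCMODE
    if acc in (O_RDONLY, O_RDWR):
        cats.append("read")
    if acc in (O_WRONLY, O_RDWR):
        cats.append("write")
    if flags & O_CREAT:
        cats.append("create")
    elif flags & O_TRUNC:  # O_TRUNC only counts when O_CREAT is absent
        cats.append("truncate")
    return cats


def unlinkat_category(flags):
    """unlinkat(2) is split between rmdir and delete by AT_REMOVEDIR."""
    return "rmdir" if flags & AT_REMOVEDIR else "delete"


def memfd_check_name(name, flags):
    """Name a memfd_create(2) call is matched against in the create category."""
    prefix = "!memfd-hugetlb:" if flags & MFD_HUGETLB else "!memfd:"
    return prefix + name


def unix_check_name(sun_path):
    """Name a UNIX socket address is matched against; sun_path is raw bytes."""
    if not sun_path:
        return "!unnamed"
    if sun_path[0] == 0:  # abstract socket: leading NUL byte
        return "@" + sun_path[1:].decode(errors="surrogateescape")
    # Pathname socket; assumed to be canonicalized to an absolute path already.
    return sun_path.split(b"\0", 1)[0].decode(errors="surrogateescape")


def forced_mode(requested, force_umask=0o7177):
    """Mode left after trace/force_umask clears its bits (umask(2) semantics)."""
    return requested & ~force_umask & 0o7777


print(open_categories(O_WRONLY | O_CREAT | O_TRUNC))  # ['write', 'create']
print(f"{forced_mode(0o4755):04o}")                    # 0600
```

With trace/force_umask:7177, any requested mode is reduced to at most owner read and write (0600), which is why the option lends itself to a W^X setup.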

Sandboxing for a category may be on or off. If sandboxing is off, none of the relevant system calls are checked and all access is granted. If sandboxing is on, the action defaults to deny, and allowlists and denylists can be used to refine access rights, e.g. allow/read+/etc/passwd. The default action for a sandboxing category may be changed with the respective option, e.g. default/force:kill. See the syd(2) manual page for more information on how to configure Syd sandbox policies. If the sandbox process invokes a system call that violates access, the attempt is reported in the system log and the system call is denied. There are two ways to customise this behaviour. First, Syd may be configured with lists of glob(3p) patterns: if the path argument of the system call matches a pattern in the list of allowed glob(3p) patterns, the attempt is not denied; if it matches a pattern in the list of denied glob(3p) patterns, the attempt is denied. If several rules match the same path or address, the last matching pattern wins. Second, the actions exit, kill, abort, stop, panic, and warn may be used instead of the allow and deny actions. The list of available sandboxing actions is given below:

allow	Allow system call.
warn	Allow system call and warn.
filter	Deny system call silently.
deny	Deny system call and warn. This is the default.
panic	Deny system call, warn and panic the current Syd thread.
stop	Deny system call, warn and stop offending process.
abort	Deny system call, warn and abort offending process.
kill	Deny system call, warn and kill offending process.
exit	Warn and exit Syd immediately, with the deny errno(3) as the exit value.

deny is the default unless another default action is set using one of the default/<category>:<action> options. See the syd(2) manual page for more information.

exit causes Syd to exit immediately along with all the sandbox processes running under it. kill makes Syd send the offending process a SIGKILL signal and deny the system call. stop makes Syd send the offending process a SIGSTOP signal and deny the system call. abort makes Syd send the offending process a SIGABRT signal and deny the system call. Unlike SIGKILL and SIGSTOP, SIGABRT can be caught by sandbox processes; therefore, the abort action should only be used for debugging in trusted environments where a core(5) dump file may provide invaluable information. panic causes the respective Syd emulator thread to panic, in which case the system call is denied by an RAII guard. This behaviour of the panic action is currently functionally equivalent to the deny action; however, it may be extended in the future so that Syd emulator processes are fork+exec'ed and their address space is rerandomized by ASLR on each access violation. warn makes Syd allow the system call and print a warning about it, which pandora(1) uses for learning mode.

Additionally, Syd may be configured to filter some glob(3p) patterns. In this case, a match prevents Syd from reporting a warning about the access violation, but the system call is still denied. For lock/* categories, the only available action is allow, and these categories accept path names rather than glob(3p) patterns as arguments. Relative paths are permitted for all lock/* categories except lock/bind, which requires either an absolute UNIX domain socket path or a port range as its argument.
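The "last matching pattern wins" rule can be illustrated with a small evaluator. This is a simplified sketch, not Syd's matcher: rules are assumed to be an ordered list of (action, pattern) pairs, a trailing /*** is taken to mean "this path and everything beneath it", and all other patterns go through Python's fnmatch, whose * also matches /, unlike glob(3p). The function names are hypothetical.

```python
from fnmatch import fnmatchcase

# Actions after which the system call still proceeds.
ALLOWING = {"allow", "warn"}


def matches(pattern, path):
    """Simplified matcher: "/***" suffix covers a path and its descendants."""
    if pattern.endswith("/***"):
        base = pattern[:-4]
        return path == base or path.startswith(base + "/")
    return fnmatchcase(path, pattern)  # note: "*" here also crosses "/"


def decide(rules, path, default="deny"):
    """Return the action of the last matching rule, or the default action."""
    action = default
    for rule_action, pattern in rules:
        if matches(pattern, path):
            action = rule_action  # later matches override earlier ones
    return action


rules = [
    ("allow", "/etc/***"),
    ("deny", "/etc/shadow"),
    ("filter", "/dev/mem"),
]
for path in ("/etc/passwd", "/etc/shadow", "/dev/mem", "/root/.bashrc"):
    action = decide(rules, path)
    print(path, action, "proceeds" if action in ALLOWING else "denied")
```

Because deny for /etc/shadow comes after the broader allow for /etc/***, it overrides it; reversing the two rules would allow /etc/shadow.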

SANDBOX CATEGORY SETS

As of version 3.38.0, multiple categories may be specified separated by commas, and the following sets are defined to streamline sandbox profile composition. The names are intentionally chosen to be consistent with OpenBSD's pledge(2) and FreeBSD's capsicum rights(4freebsd):

all	All categories
all-x	All categories except exec
lock/all	All landlock(7) access rights
lpath	walk, stat, chdir
rpath	read, readdir
lock/rpath	lock/read, lock/readdir
wpath	write, truncate
lock/wpath	lock/write, lock/truncate
cpath	create, delete, rename
lock/cpath	lock/create, lock/delete, lock/rename
dpath	mkbdev, mkcdev
lock/dpath	lock/mkbdev, lock/mkcdev
spath	mkfifo, symlink
lock/spath	lock/mkfifo, lock/symlink
tpath	mkdir, rmdir
lock/tpath	lock/mkdir, lock/rmdir
fown	chown, chgrp
fattr	chmod, chattr, utime
net	net/bind, net/connect, net/sendfd
lock/net	lock/bind, lock/connect
inet	net/bind, net/connect
lock/inet	lock/bind, lock/connect
bnet	net/bind
lock/bnet	lock/bind
cnet	net/connect
lock/cnet	lock/connect
snet	net/sendfd

Some examples are given below:

default/all:kill
sandbox/inet:off
deny/cpath,rpath,wpath+${HOME}/.ssh/***
kill/spath+/tmp/***
allow/inet+loopback!1024-65535
kill/unix+/dev/log
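A category set is shorthand for its members. The sketch below expands a comma-separated list such as cpath,rpath,wpath from the deny example above into individual categories. The SETS table transcribes the table above, leaving out all, all-x, and lock/all, whose members are not listed explicitly; the expand helper is hypothetical and is not Syd's parser.

```python
# Transcribed from the set table above (all, all-x and lock/all omitted).
SETS = {
    "lpath": ["walk", "stat", "chdir"],
    "rpath": ["read", "readdir"],
    "lock/rpath": ["lock/read", "lock/readdir"],
    "wpath": ["write", "truncate"],
    "lock/wpath": ["lock/write", "lock/truncate"],
    "cpath": ["create", "delete", "rename"],
    "lock/cpath": ["lock/create", "lock/delete", "lock/rename"],
    "dpath": ["mkbdev", "mkcdev"],
    "lock/dpath": ["lock/mkbdev", "lock/mkcdev"],
    "spath": ["mkfifo", "symlink"],
    "lock/spath": ["lock/mkfifo", "lock/symlink"],
    "tpath": ["mkdir", "rmdir"],
    "lock/tpath": ["lock/mkdir", "lock/rmdir"],
    "fown": ["chown", "chgrp"],
    "fattr": ["chmod", "chattr", "utime"],
    "net": ["net/bind", "net/connect", "net/sendfd"],
    "lock/net": ["lock/bind", "lock/connect"],
    "inet": ["net/bind", "net/connect"],
    "lock/inet": ["lock/bind", "lock/connect"],
    "bnet": ["net/bind"],
    "lock/bnet": ["lock/bind"],
    "cnet": ["net/connect"],
    "lock/cnet": ["lock/connect"],
    "snet": ["net/sendfd"],
}


def expand(spec):
    """Expand a comma-separated list of sets and categories, keeping order."""
    result = []
    for name in spec.split(","):
        for category in SETS.get(name, [name]):  # plain categories pass through
            if category not in result:
                result.append(category)
    return result


print(expand("cpath,rpath,wpath"))
# ['create', 'delete', 'rename', 'read', 'readdir', 'write', 'truncate']
```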

SANDBOX RULE SHORTCUTS

Sandbox capabilities may be passed to sandbox actions either as a single unit or as a comma-delimited list, e.g.:

allow/read,write,stat,exec+/***
allow/read,write,stat-/***
deny/read,write,stat+/***
deny/read,write-/***
filter/read,write,stat+/dev/mem
filter/read,write-/dev/mem

As of version 3.18.14, sandboxing modes may be specified as a single unit or as a comma-delimited list, e.g.:

sandbox/read,write,stat,exec:on
sandbox/net,lock:off

As of version 3.19.0, namespace types may be specified as a single unit or as a comma-delimited list, e.g.:

unshare/user,pid,mount:on
unshare/net,cgroup:off

As of version 3.35.0, default modes may be specified as a single unit or as a comma-delimited list, e.g.:

default/write,truncate:kill
default/read,stat:allow

SegvGuard

As of version 3.16.3, Syd has a simple implementation of SegvGuard. The implementation is inspired by that of HardenedBSD, with identical defaults: if a sandbox process receives a signal that may produce a core(5) dump file segvguard/maxcrashes times (defaults to 5) within a period of segvguard/expiry seconds (defaults to 2 minutes), subsequent attempts to execute the same executable are denied for segvguard/suspension seconds (defaults to 10 minutes). SegvGuard can be disabled by setting segvguard/expiry:0. SegvGuard support depends on ptrace(2); therefore, it may also be disabled by setting trace/allow_unsafe_ptrace:1 at startup. The trigger signals for SegvGuard are SIGABRT, SIGBUS, SIGFPE, SIGILL, SIGIOT, SIGKILL, SIGQUIT, SIGSEGV, SIGSYS, SIGTRAP, SIGXCPU, and SIGXFSZ. SIGKILL is intentionally included in the list, even though it does not generate a core(5) dump file, so that kill rules trigger SegvGuard, a design later mirrored in HardenedBSD's work on PaX SEGVGUARD and Capsicum integration.
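The counting rule can be pictured as a sliding window of crash timestamps per executable. The SegvGuard class below is a hypothetical model using the defaults quoted above; it does not distinguish between trigger signals, and keying executables by their path string is an assumption of this sketch.

```python
from collections import defaultdict, deque


class SegvGuard:
    """Hypothetical per-executable crash counter with a sliding expiry window."""

    def __init__(self, maxcrashes=5, expiry=120, suspension=600):
        self.maxcrashes = maxcrashes  # segvguard/maxcrashes
        self.expiry = expiry          # segvguard/expiry in seconds; 0 disables
        self.suspension = suspension  # segvguard/suspension in seconds
        self.crashes = defaultdict(deque)
        self.suspended_until = {}

    def record_crash(self, path, now):
        if self.expiry == 0:
            return
        window = self.crashes[path]
        window.append(now)
        # Forget crashes that fell out of the expiry window.
        while now - window[0] > self.expiry:
            window.popleft()
        if len(window) >= self.maxcrashes:
            self.suspended_until[path] = now + self.suspension
            window.clear()

    def may_exec(self, path, now):
        until = self.suspended_until.get(path)
        return until is None or now >= until


guard = SegvGuard()
for t in range(5):  # five crashes within two minutes
    guard.record_crash("/usr/bin/app", now=t)
print(guard.may_exec("/usr/bin/app", now=10))   # False: suspended
print(guard.may_exec("/usr/bin/app", now=604))  # True: 600 s after the 5th crash
```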

Check out the following links for further information on SegvGuard:

http://en.wikibooks.org/wiki/Grsecurity/Appendix/Grsecurity_and_PaX_Configuration_Options#Deter_exploit_bruteforcing
http://en.wikibooks.org/wiki/Grsecurity/Appendix/Grsecurity_and_PaX_Configuration_Options#Active_kernel_exploit_response
http://phrack.org/archives/issues/59/9.txt
http://phrack.org/archives/issues/58/4.txt
https://github.com/HardenedBSD/hardenedBSD/wiki/segvguard2-ideas---brainstorm
https://hardenedbsd.org/article/shawn-webb/2025-03-01/hardenedbsd-february-2025-status-report

Force Sandboxing

Force Sandboxing enhances system security by scrutinizing the path provided to the execve(2) and execveat(2) system calls and comparing it against a predefined Integrity Force map, a registry of path-to-checksum correlations. Upon invocation of these calls, the sandbox computes the checksum of the target binary and cross-references it with the map. Discrepancies trigger rule-defined actions: execution might proceed with a logged warning, or the offending process might be terminated. This mechanism allows for rigorous enforcement of binary integrity, echoing the preventative ethos of HardenedBSD's Integriforce and NetBSD's Veriexec by proactively mitigating unauthorised code execution, albeit with an emphasis on flexible, user-defined consequences ranging from permissive alerts to strict execution blocks.

Force Sandboxing lets administrators tailor the sandbox's response to checksum mismatches (kill, deny, or warn), balancing security needs with operational flexibility. Tools such as syd-sha(1) for checksum calculation and syd-path(1) for rule creation help maintain the Integrity Force map. See the force command in the syd(2) manual page for how to add entries to and remove entries from the Integrity Force map.
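The lookup-and-compare step can be sketched as follows. This is an illustration under stated assumptions, not Syd's implementation: the map is modelled as a dictionary from path to (expected hex digest, action on mismatch), SHA-256 is an assumed hash algorithm, and returning a default action for paths without an entry is an assumption. The helpers file_digest and force_check are hypothetical.

```python
import hashlib


def file_digest(path, algorithm="sha256"):
    """Hex digest of a file, read in 64 KiB chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def force_check(force_map, path, default_action="deny"):
    """Return "allow" on a checksum match, otherwise the configured action.

    force_map maps a path to (expected_hex_digest, action_on_mismatch).
    Paths without an entry get default_action (an assumption of this sketch).
    """
    entry = force_map.get(path)
    if entry is None:
        return default_action
    expected, action = entry
    return "allow" if file_digest(path) == expected else action
```

In practice, the expected checksums would come from syd-sha(1) rather than from a helper like file_digest.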

As of version 3.16.3, Syd also checks the paths of the dynamic libraries an executable is linked against for Force access. This only works for ELF files.

As of version 3.21.3, Syd hooks into the mmap(2) and mmap2(2) system calls and checks the file descriptor for Force access when the memory protection mode includes PROT_EXEC and the flags do not include MAP_ANONYMOUS, which typically indicates dlopen(3). Therefore, libraries dynamically loaded at runtime are checked for Force access as well.
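The trigger condition above reduces to a bit test on the prot and flags arguments of mmap(2). The hypothetical helper below only illustrates that predicate; it does not show how Syd obtains or verifies the file behind the descriptor. The constants are Linux values as on x86-64; MAP_ANONYMOUS differs on a few architectures.

```python
# Linux values as on x86-64; MAP_ANONYMOUS differs on a few architectures.
PROT_READ = 0x1
PROT_EXEC = 0x4
MAP_PRIVATE = 0x02
MAP_ANONYMOUS = 0x20


def mmap_needs_check(prot, flags):
    """True when an executable mapping is backed by a file descriptor."""
    return bool(prot & PROT_EXEC) and not (flags & MAP_ANONYMOUS)


# A dynamic loader mapping a library's code segment (file-backed, executable):
print(mmap_needs_check(PROT_READ | PROT_EXEC, MAP_PRIVATE))                  # True
# An anonymous executable mapping, e.g. for JIT code, has no file to check:
print(mmap_needs_check(PROT_READ | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS))  # False
```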

TPE Sandboxing

As of version 3.21.0, Syd introduces Trusted Path Execution (TPE) sandboxing, which restricts the execution of binaries to ensure they come from trusted directories. As of version 3.37.2, the binary file itself must be trusted as well as its parent directory. The intention is to make privilege escalation harder when an account restricted by TPE is compromised, as the attacker will not be able to execute custom binaries outside the trusted path. A binary is trusted if both the file and its parent directory meet the following criteria:

Not writable by group or others.
Optionally owned by root, controlled by the tpe/root_owned option.
Optionally owned by the current user or root, controlled by the tpe/user_owned option.
Optionally part of the root filesystem, controlled by the tpe/root_mount option.

If these criteria are not met, the execution is denied with an EACCES errno(3), and optionally, the offending process can be terminated with the SIGKILL signal using the default/tpe:kill option. This mechanism ensures that only binaries from secure, trusted paths can be executed, enhancing security by preventing unauthorized code execution. TPE sandboxing checks the executables at three stages:

During the system call entry of execve(2) and execveat(2) to check scripts.
On ptrace(2) exec event to check the ELF executable and dynamic loader.
On mmap(2) when dynamic libraries are mapped to memory, typically with dlopen(3).
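A minimal sketch of the trust criteria, assuming the file and its parent directory are examined with stat(2) and that "part of the root filesystem" means residing on the same device as /. The function names and keyword arguments are hypothetical stand-ins for tpe/root_owned, tpe/user_owned and tpe/root_mount, and the three interception stages are not modelled.

```python
import os
import stat


def tpe_trusted_entry(path: str, uid: int, root_owned: bool = False,
                      user_owned: bool = False, root_mount: bool = False) -> bool:
    st = os.stat(path)
    # Always required: not writable by group or others.
    if st.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
        return False
    if root_owned and st.st_uid != 0:
        return False
    if user_owned and st.st_uid not in (0, uid):
        return False
    # Assumption: "root filesystem" is approximated by the device of "/".
    if root_mount and st.st_dev != os.stat("/").st_dev:
        return False
    return True


def tpe_trusted(binary: str, uid: int, **opts) -> bool:
    """Both the binary and its parent directory must pass the checks."""
    parent = os.path.dirname(os.path.abspath(binary))
    return tpe_trusted_entry(binary, uid, **opts) and tpe_trusted_entry(parent, uid, **opts)
```

With no options set, only the group and other write bits are checked, so a 0755 binary in a private 0700 directory passes, while a group-writable binary or a world-writable parent directory fails.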

TPE can be configured to apply to a specific user group. By default, TPE applies to all users. However, administrators can specify an untrusted GID with the tpe/gid setting, restricting TPE to users in that group only. Additionally, the tpe/negate setting inverts the GID logic, making the specified group trusted and exempt from TPE.
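The group selection logic above reduces to a few lines. In this sketch the function name tpe_applies and its parameters are hypothetical; user_gids would be the caller's real, effective and supplementary group IDs.

```python
def tpe_applies(user_gids: set, tpe_gid=None, negate: bool = False) -> bool:
    """Decide whether TPE restrictions apply to a user with these groups."""
    if tpe_gid is None:
        return True                      # default: TPE applies to all users
    in_group = tpe_gid in user_gids
    # tpe/negate turns the group into a trusted, exempt group.
    return not in_group if negate else in_group
```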

Syd's TPE implementation is based on HardenedBSD's, which is inspired by Grsecurity's TPE. Refer to the following links for more information:

http://phrack.org/issues/52/6.html#article
http://phrack.org/issues/53/8.html#article
https://wiki.gentoo.org/wiki/Hardened/Grsecurity_Trusted_Path_Execution

Lock Sandboxing

Lock sandboxing utilises the Landlock Linux Security Module for simple unprivileged access control. It is enforced entirely in kernel space, and the policy is also applied to the Syd process itself, so a compromised Syd process remains confined to the landlock(7) sandbox. Lock sandboxing can therefore be used to construct a multi-layered sandbox for added security. Lock sandboxing may be turned on with the sandbox/lock:on sandbox command at startup. Paths to files and file hierarchies should be populated using the lock/* categories, either one at a time, e.g. allow/lock/read+/usr, allow/lock/write+/dev/null, or as a comma-delimited list, e.g. allow/lock/read,write,ioctl+/dev/null. The shorthand lock/all is provided to ease configuration; it stands for the union of the categories lock/read, lock/write, lock/exec, lock/ioctl, lock/create, lock/delete, lock/rename, lock/symlink, lock/truncate, lock/readdir, lock/mkdir, lock/rmdir, lock/mkdev, lock/mkfifo, and lock/bind.

As of version 3.29.0, network confinement is supported: allowlisted bind(2) and connect(2) ports can be specified using the commands allow/lock/bind+port and allow/lock/connect+port. A closed range in the format port1-port2 may also be specified instead of a single port number. Use the lock/bind category with an absolute path to confine UNIX domain socket creation, renames, and links, e.g. allow/lock/bind+/run/user/${SYD_UID}.

As of version 3.35.0, the default compatibility level has been changed to Hard Requirement. Compared to the old default, Best Effort, this level ensures the sandbox is fully enforced. Moreover, ENOENT ("No such file or directory") errors are made fatal at this level. The compatibility level may be changed at startup using the command default/lock. See the syd(2) manual page for more information.
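The rule syntax above can be illustrated with a small parser that expands the lock/all shorthand and accepts either an absolute path or a port range. The function parse_lock_rule is hypothetical and covers only the allow/lock/...+target forms shown here; Syd's own parser and its validation (for example, restricting port ranges to bind and connect) may differ.

```python
# Categories covered by the lock/all shorthand, as listed above.
LOCK_ALL = (
    "read", "write", "exec", "ioctl", "create", "delete", "rename", "symlink",
    "truncate", "readdir", "mkdir", "rmdir", "mkdev", "mkfifo", "bind",
)


def parse_lock_rule(rule: str):
    """Split a hypothetical allow/lock rule into (categories, target).

    The target is either an absolute path (str) or a range of ports.
    """
    prefix = "allow/lock/"
    if not rule.startswith(prefix) or "+" not in rule:
        raise ValueError(f"not an allow/lock rule: {rule!r}")
    cats_part, target = rule[len(prefix):].split("+", 1)
    cats = []
    for cat in cats_part.split(","):
        cats.extend(LOCK_ALL if cat == "all" else [cat])
    if target.startswith("/"):
        return cats, target
    # Single port "N" or closed range "port1-port2".
    lo_text, _, hi_text = target.partition("-")
    lo = int(lo_text)
    hi = int(hi_text) if hi_text else lo
    if not 0 <= lo <= hi <= 65535:
        raise ValueError(f"invalid port range: {target!r}")
    return cats, range(lo, hi + 1)
```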

Crypt Sandboxing

This sandboxing category provides transparent file encryption using AES-CTR, with HMAC-SHA256 ensuring secure data handling without manual encryption steps. When sandbox/crypt:on is set, files matching the glob(3) patterns specified by crypt+ are encrypted on write and decrypted on read. Configuration includes specifying a 32-bit decimal encryption key serial ID for the keyrings(7) interface using crypt/key/main, and a 32-bit decimal authentication key serial ID for the keyrings(7) interface using crypt/key/auth. Specifying the same key serial ID for both options is permitted, and the option crypt/key may be used as a shorthand to set both key serial IDs. The specified key serial IDs are used with the ALG_SET_KEY_BY_KEY_SERIAL setsockopt(2) operation, which was introduced in Linux-6.2; therefore Crypt sandboxing requires Linux-6.2 or newer. The keys must have search permission, i.e. have the KEY_(POS|USR|GRP|OTH)_SEARCH permission bit(s) set, so that the kernel can locate and copy the key data into the crypto API; otherwise the operation is denied with EPERM ("Operation not permitted"). Refer to the following link for more information: https://lkml.org/lkml/2022/10/4/1014.

The utility syd-key(1) may be used to generate encryption keys and save them to keyrings(7) for use with Crypt sandboxing. To avoid including the key serial IDs in the configuration file, the user may pass them through an environment variable and reference that variable instead, e.g. crypt/key:${SYD_KEY_ID}. The variable name must start with the prefix SYD_ but must not start with the prefix SYD_TEST_, because SYD_-prefixed variables other than SYD_TEST_ ones do not leak into the sandbox process. Similarly, the user must refrain from using any environment variable listed under the ENVIRONMENT section of the syd(1) manual page.
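A sketch of the naming rule for the key environment variable, plus a parser for the 32-bit decimal key serial ID it carries. The helper names are hypothetical, the uppercase character set is an assumption, and a real check would also reject the names listed in the ENVIRONMENT section of syd(1), which this sketch does not know about. Key serials are assumed to be positive values of the kernel's signed 32-bit key_serial_t.

```python
import re


def is_usable_key_env_name(name: str) -> bool:
    """SYD_-prefixed but not SYD_TEST_-prefixed (charset is an assumption)."""
    return (
        name.startswith("SYD_")
        and not name.startswith("SYD_TEST_")
        and re.fullmatch(r"[A-Z0-9_]+", name) is not None
    )


def parse_key_serial(text: str) -> int:
    """Parse a decimal key serial ID, assuming a positive 32-bit key_serial_t."""
    if re.fullmatch(r"[0-9]+", text) is None:
        raise ValueError(f"not a decimal key serial: {text!r}")
    serial = int(text)
    if not 0 < serial < 2**31:
        raise ValueError(f"key serial out of range: {serial}")
    return serial
```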

Encryption operates via Linux kernel cryptography API sockets, using zero-copy techniques with splice(2) and tee(2) to avoid unencrypted data in memory. To guarantee that only zero-copy transfers are used, and to respect the user's privacy by never reading plaintext into memory, the syd_aes threads responsible for encryption are confined with a seccomp(2) filter that denies the read(2), open(2), and socket(2) system calls (among many others) and allows the write(2) system call only for up to 32 bytes, which is required to write the HMAC tag and the random IV to the file. The setup sockets are created at startup, and the key is selected using the keyrings(7) interface without copying the key material into userspace. IV uniqueness is ensured by generating a random IV per file using getrandom(2). If retrieving entropy via getrandom(2) fails, the random bytes in AT_RANDOM are used instead. The per-file IV is stored in the file header, ahead of the ciphertext, which prevents IV reuse. Syd ensures that per-file IVs are securely zeroized on drop.

A 32-byte HMAC (SHA256) message authentication tag is stored between the file magic header and the IV and is authenticated on decryption, following the Encrypt-then-MAC approach. This provides integrity checking and resistance against bit-flip attacks. By default, decryption occurs in a memory file descriptor to prevent tampering, which limits practicality for large files due to memory constraints. The user may specify a secure temporary backing directory with crypt/tmp to work around this. Ideally this directory should be on encrypted storage, as Syd writes plaintext there. File locks are set before attempting to encrypt files to ensure security and safe concurrent access; Linux OFD locks are used for locking. Encrypted data is flushed to disk only after all file descriptors that point to the encrypted open file description are closed, enabling safe and performant concurrent access. File appends are handled efficiently by re-encrypting only the last block. Only regular files are encrypted. The file format header \x7fSYD3 identifies encrypted files, and the version in the header must match the current Syd API, which at the moment is 3. Compared to GSWTK's dbfencrypt, Crypt sandboxing avoids TOCTOU vulnerabilities and encryption weaknesses by utilizing AES-CTR with HMAC-SHA256 and robust setup steps, providing a more secure and streamlined encryption process.

Crypt sandboxing employs the AES-CTR algorithm, a secure and efficient symmetric key encryption method suitable for various applications. It operates as a stream cipher (skcipher) with a block size of 1 byte, allowing data to be encrypted byte by byte. The algorithm uses a fixed key size of 32 bytes (256 bits) by default, providing robust security, and a fixed initialization vector (IV) size of 16 bytes; combined with per-file random generation, this keeps IVs unique across encryption operations. While processing data in byte-sized chunks, the algorithm maintains a consistent walk size of 16 bytes, ensuring seamless encryption and decryption. This configuration, with its secure default key size, prevents common encryption weaknesses and supports efficient, transparent file encryption within the sandbox environment. The inclusion of HMAC-SHA256 for integrity checking further enhances security by detecting any unauthorized modification or corruption of data. CTR mode is highly parallelizable because each block of the keystream can be computed independently. This allows encryption and decryption to be split across multiple processors, significantly increasing throughput. With hardware support such as AES-NI CPU instructions, throughput can exceed a gigabyte per second.
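Random access into a CTR stream is what makes both parallel processing and last-block re-encryption on append cheap: the counter block for any byte offset can be computed directly. The sketch below assumes the 16-byte IV is used as a 128-bit big-endian counter incremented once per block, as in the Linux ctr template; it computes counter positions only and performs no encryption. The function name ctr_position is hypothetical.

```python
BLOCK = 16  # AES block size in bytes


def ctr_position(iv: bytes, offset: int):
    """Counter block and intra-block skip for byte `offset` of the stream."""
    if len(iv) != BLOCK:
        raise ValueError("IV must be 16 bytes")
    index, skip = divmod(offset, BLOCK)
    # 128-bit big-endian counter arithmetic, wrapping modulo 2**128.
    counter = (int.from_bytes(iv, "big") + index) % (1 << 128)
    return counter.to_bytes(BLOCK, "big"), skip
```

Appending to a file whose length is not a multiple of 16 only needs the keystream of the final partial block and onwards, which is why re-encrypting the last block suffices.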

As of version 3.21.2, Syd opens memory file descriptors with the flag MFD_NOEXEC_SEAL during transparent decryption to ensure the memfds are non-executable and can never be marked executable. This matters because transparent decryption could otherwise be used to bypass Exec, Force and TPE sandboxing. Notably, this flag requires Linux-6.3 or newer. On older kernels, a backing directory must be specified with crypt/tmp for transparent decryption to work. Attempts to use transparent decryption without a backing directory on older kernels fail with the errno(3) EOPNOTSUPP ("Operation not supported on transport endpoint"). As of version 3.28.0, Syd allows this restriction to be lifted with the option trace/allow_unsafe_memfd:1.

As of version 3.39.0, the keyrings(7) interface is used for key management, and specifying keys as raw payload is no longer permitted. Moving key material into the kernel keyrings(7) interface substantially reduces the exposure of raw keys to userland, narrowing the attack surface for memory-disclosure, core-dump, and accidental-persistence vulnerabilities while enabling cryptographic operations to be performed without copying key bytes into process memory. Because keyrings(7) enforce kernel-side permissions and lifecycle semantics (search/view/revoke, expiries, etc.), they provide a principled provenance and access-control model that simplifies secure rotation, auditing, and least-privilege enforcement. Together, these properties both harden the runtime security posture and facilitate integration with hardware-backed or sealed key types, improving operational compliance and reducing the likelihood of application-level key-management errors.

File Format: Each file encrypted within the Crypt sandboxing framework follows a structured format to ensure consistency, secure handling, and clear identification. Each encrypted file starts with a five-byte magic header, \x7fSYD3, where \x7fSYD indicates that the file is encrypted by Syd, and 3 denotes the current API version. This header is followed by a 32-byte HMAC (SHA256) message authentication tag, providing integrity checking by authenticating the encrypted content. Next comes a 16-byte initialization vector (IV), which is unique per file, ensuring strong cryptographic security. The AES-CTR-encrypted ciphertext follows the IV, providing the file's protected content. Syd will only process files that match this format and have a compatible version; if a file does not have the correct file format header or API version, or if it exists unencrypted, Syd leaves it untouched. This approach prevents unintended operations on incompatible or unencrypted files.

+----------------+-------------------------+-----------------------+--------------------+
| Magic Header   | HMAC Tag                | Initialization Vector | Encrypted Content  |
| "\x7fSYD3"     | 32 bytes (SHA256 HMAC)  | 16 bytes              | AES-CTR Ciphertext |
+----------------+-------------------------+-----------------------+--------------------+
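Given the layout above, the header is 5 + 32 + 16 = 53 bytes long. The following sketch, with the hypothetical name split_syd3, splits a file's contents into its parts and returns None for anything that does not start with the expected magic and version, mirroring the leave-it-untouched rule. It does not verify the HMAC, since the exact bytes covered by the tag are not specified here.

```python
MAGIC = b"\x7fSYD3"   # "\x7fSYD" followed by API version 3
HMAC_LEN = 32         # HMAC-SHA256 tag
IV_LEN = 16           # AES-CTR initialization vector
HEADER_LEN = len(MAGIC) + HMAC_LEN + IV_LEN  # 53 bytes before the ciphertext


def split_syd3(blob: bytes):
    """Return (hmac_tag, iv, ciphertext), or None if blob is not in this format."""
    if len(blob) < HEADER_LEN or not blob.startswith(MAGIC):
        return None  # unencrypted or incompatible version: leave untouched
    tag_end = len(MAGIC) + HMAC_LEN
    return blob[len(MAGIC):tag_end], blob[tag_end:HEADER_LEN], blob[HEADER_LEN:]
```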

Limitations:

Large Files: By default, large files are not handled efficiently during decryption because in-memory files are used; specify a secure temporary backing directory with crypt/tmp:/path to work around this. Ideally this directory should be on encrypted storage, as Syd writes plaintext there.
Concurrent Access: Encrypted file access utilises Linux OFD locks, which are now standardized in POSIX 2024. Ensure that the underlying filesystem fully supports OFD locks to enable effective advisory file locking. Modern filesystems and NFS implementations compliant with POSIX 2024 typically provide this support, mitigating issues present in older versions. The multithreaded architecture of Syd relies on OFD locks to ensure safe and efficient concurrent access, eliminating the need for alternative locking mechanisms such as POSIX advisory locks. For further details, refer to the fcntl_locking(2) manual page.
Crash Safety: Currently, encrypted data is flushed to disk only after all file descriptors are closed. In the event of a system or sandbox crash, this may result in incomplete writes or potential data loss, as in-flight data might not be persisted. Future enhancements will focus on implementing transactional flush mechanisms and crash recovery procedures to ensure atomicity and integrity of encrypted data, thereby improving resilience against unexpected terminations.
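OFD locks differ from classic POSIX record locks in that they belong to the open file description rather than the process, so two independent open(2) calls in the same process conflict with each other. The sketch below takes and releases such a lock with Python's fcntl module (Python 3.9 or newer). The struct flock layout it packs assumes 64-bit Linux, and the helper names are hypothetical.

```python
import fcntl
import os
import struct


def _flock(lock_type: int) -> bytes:
    # struct flock {short l_type; short l_whence; off_t l_start; off_t l_len; pid_t l_pid;}
    # Native alignment on 64-bit Linux is assumed; l_len = 0 covers the whole
    # file and l_pid must be 0 for OFD locks.
    return struct.pack("hhqqi", lock_type, os.SEEK_SET, 0, 0, 0)


def ofd_lock(fd: int, exclusive: bool = True, wait: bool = True) -> None:
    cmd = fcntl.F_OFD_SETLKW if wait else fcntl.F_OFD_SETLK
    fcntl.fcntl(fd, cmd, _flock(fcntl.F_WRLCK if exclusive else fcntl.F_RDLCK))


def ofd_unlock(fd: int) -> None:
    fcntl.fcntl(fd, fcntl.F_OFD_SETLK, _flock(fcntl.F_UNLCK))
```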

Utilities:

syd-aes(1): Encrypt/decrypt files akin to openssl-enc(1ssl).
syd-key(1)
- Generate random AES-CTR keys using getrandom(2), and save to keyrings(7).
- Read passphrases from TTY or STDIN, hash with SHA3-256, and save to keyrings(7).

Proxy Sandboxing

As of version 3.22.0, Proxy sandboxing in Syd confines network communication exclusively through a designated SOCKS proxy, enforced by the helper utility syd-tor(1). Configured at startup with sandbox/proxy:on, this type implies the use of unshare/net:1, isolating network namespaces to prevent direct network access. Traffic is forwarded from a specified local port (proxy/port:9050) to an external address and port (proxy/ext/host:127.0.0.1, proxy/ext/port:9050). As of version 3.34.1, you may also specify an external UNIX domain socket using e.g. proxy/ext/unix:/path/socks5.sock. This setup ensures all network interactions route through the proxy, leveraging zero-copy data transfers and edge-triggered epoll(7) for efficient event handling. The implementation enhances security by employing seccomp and Landlock for additional confinement, preventing unauthorized network access and ensuring strict adherence to the defined network path. This approach minimizes the risk of proxy bypasses and maintains the integrity of the network isolation.

PTY Sandboxing

As of version 3.36.0, PTY Sandboxing runs the target process inside a dedicated pseudoterminal managed by the syd-pty(1) helper, isolating all terminal I/O from the host TTY and preventing direct ioctl(2) or control-sequence escapes. The PTY main side is proxied via an edge-triggered epoll(7) loop with non-blocking zero-copy splice(2), so terminal data is not copied through user-space buffers. A minimal seccomp(2) filter permits only the essential PTY ioctl(2) requests (e.g. TIOCGWINSZ, TIOCSWINSZ) and denies all others, including injection via TIOCSTI, while Landlock locks down access to the PTY device, filesystem, and network. Combined with no-exec memory seals and namespace isolation, this approach hardens against terminal-based attacks and preserves the confidentiality and integrity of the sandboxed session.

Memory Sandboxing

This sandboxing category handles the system calls brk(2), mmap(2), mmap2(2), and mremap(2) and checks the per-process memory usage on each memory allocation request. If the memory usage reaches the maximum value defined by mem/max, the system call is denied with ENOMEM. Moreover, the virtual memory size can be limited using mem/vm_max. If the limit is reached on entry to any of these system calls, the system call is denied with ENOMEM and the signal SIGKILL is delivered to the offending process. After the signal is delivered, Syd calls the process_mrelease(2) system call on the process to release its memory immediately. The default action may be changed using the default/mem option. The per-process memory usage is a fair estimate calculated from proc_pid_smaps(5) by summing the following fields:

Pss (Proportional Set Size) is similar to Rss, but accounts for shared memory more accurately by dividing it among the processes that share it. Rss (Resident Set Size) is the portion of memory occupied by a process that is held in RAM.
Private_Dirty represents the private memory that has been modified (dirty).
Shared_Dirty represents the shared memory that has been modified.

As of version 3.43.1, the memory sandboxing system has been updated to improve memory usage tracking. Syd now enforces a strict memory limit based on allocation granularity, meaning that programs cannot exceed the defined memory limits, even by the amount they allocate at once. This change aligns the limit with the allocation size rather than allowing any overflow beyond the limit. Additionally, memory tracking has been optimized by switching from iterating over proc_pid_smaps(5) to using the more efficient /proc/pid/smaps_rollup, which consolidates memory usage information for better performance and more accurate enforcement of memory constraints.
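The estimate described above can be sketched as a parser over smaps_rollup text plus an allocation check. The field list comes from this section; the function names, and the usage + request <= limit comparison used to model allocation granularity, are assumptions.

```python
# Fields summed for the per-process estimate, as listed above.
FIELDS = ("Pss", "Private_Dirty", "Shared_Dirty")


def memory_usage_kib(rollup: str) -> int:
    """Sum the selected fields of /proc/<pid>/smaps_rollup text (values in kB)."""
    total = 0
    for line in rollup.splitlines():
        key, sep, rest = line.partition(":")
        # Exact key match, so Pss_Anon, Pss_File, etc. are not counted.
        if sep and key in FIELDS:
            total += int(rest.split()[0])
    return total


def read_memory_usage_kib(pid: int) -> int:
    with open(f"/proc/{pid}/smaps_rollup") as f:
        return memory_usage_kib(f.read())


def allocation_allowed(usage: int, request: int, limit: int) -> bool:
    """Deny any request that would take usage past the limit, even partially."""
    return usage + request <= limit
```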

Memory sandboxing is not an alternative to cgroups(7)! You should use cgroups(7) when you can instead. This sandboxing category is meant for more constrained environments where cgroups(7) is not supported or not available due to missing permissions or other similar restrictions.

PID sandboxing

This sandboxing category handles the system calls fork(2), vfork(2), clone(2), and clone3(2) and checks the total number of tasks running on the system on each process creation request. If the count reaches the maximum value defined by pid/max, the system call is denied with EAGAIN. If pid/kill is set to true, the signal SIGKILL is delivered to the offending process. This sandboxing category is best coupled with a pid namespace using unshare/pid. In this mode, Syd will check the number of running tasks in the current namespace only.

As of version 3.40.0, with unshare/pid:1 the limit and accounting apply per PID namespace; on Linux 6.14 and newer the namespaced kernel.pid_max sysctl(8) is set to max(pid/max, 301) so the kernel's 300 reserved PIDs do not reduce the configured headroom, and on older kernels kernel.pid_max sysctl(8) is not modified.
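The sysctl adjustment and the limit check can be sketched as follows. The helper names are hypothetical; the sketch only parses the major and minor version from a release string such as the one returned by os.uname().release, and treats a request as denied once the task count has reached pid/max.

```python
import re

RESERVED_PIDS = 300  # PIDs the kernel reserves, per the text above


def namespaced_pid_max(pid_max_cfg: int, kernel_release: str):
    """Value for the namespaced kernel.pid_max, or None if left unmodified."""
    m = re.match(r"(\d+)\.(\d+)", kernel_release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {kernel_release!r}")
    if (int(m.group(1)), int(m.group(2))) < (6, 14):
        return None  # older kernels: kernel.pid_max is not modified
    return max(pid_max_cfg, RESERVED_PIDS + 1)


def task_creation_allowed(task_count: int, pid_max_cfg: int) -> bool:
    """fork/clone is denied with EAGAIN once the count reaches pid/max."""
    return task_count < pid_max_cfg
```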

PID sandboxing is not an alternative to cgroups(7)! You should use cgroups(7) when you can instead. This is meant for more constrained environments where cgroups(7) is not supported or not available due to missing permissions or other similar restrictions.

SafeSetID

SafeSetID, introduced in version 3.16.8, enhances the management of UID/GID transitions. It enables finer-grained control by allowing administrators to explicitly specify permissible UID and GID changes, tightening security constraints around process privilege management. The permitted transitions are predefined using the setuid+<source_uid>:<target_uid> and setgid+<source_gid>:<target_gid> commands in the Syd configuration. This ensures that transitions can only occur between the specified user and group IDs and that unauthorised privilege escalations are blocked. For instance, a transition might be allowed from a higher-privileged user to a less-privileged user but not vice versa, thereby preventing any escalation of privileges through these system calls.

As of version 3.24.5, Syd applies a kernel-level seccomp(2) filter by default to deny all set*uid system calls with a UID less than or equal to 11, which is typically the operator user, and all set*gid system calls with a GID less than or equal to 14, which is typically the uucp group. This means even a compromised Syd process cannot elevate privileges using these system calls. Refer to the output of the command syd-ls setid to see the full list of system calls in this group.

When a UID or GID transition is defined, Syd keeps the CAP_SETUID or CAP_SETGID capability respectively, and the sandbox process inherits these capabilities from Syd. Since version 3.24.6, Syd drops the CAP_SETUID capability after the first successful UID transition, and similarly the CAP_SETGID capability after the first successful GID transition. This means Syd can only ever change its UID and GID once in its lifetime. However, this does not completely lock the setid system calls in the sandbox process: transitions to Syd's current UID and GID remain permitted in the sandbox process, which means the first successful UID and GID transitions continue to function as long as the sandbox process keeps the respective CAP_SETUID and CAP_SETGID capabilities. This allows containing daemons, such as nginx(1), which spawn multiple unprivileged worker processes from a single privileged main process.
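The combined rules (explicit source:target pairs, the seccomp floor, and the single transition) can be modelled as a small state machine. The class SetIdPolicy is hypothetical and simplifies matters: it checks a single target ID per call, whereas setresuid(2) and friends take several.

```python
class SetIdPolicy:
    """Hypothetical model of setuid+src:dst (or setgid+src:dst) rules."""

    def __init__(self, rules, floor: int):
        self.rules = set(rules)  # {(source, target), ...}
        self.floor = floor       # 11 for UIDs, 14 for GIDs, per the seccomp filter
        self.chosen = None       # target of the first successful transition

    def allow(self, source: int, target: int) -> bool:
        if target <= self.floor:
            return False         # denied at kernel level regardless of rules
        if (source, target) not in self.rules:
            return False
        if self.chosen is not None:
            # The capability is gone after the first transition; only the
            # already-chosen ID keeps working (e.g. for worker processes).
            return target == self.chosen
        self.chosen = target
        return True
```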

Ghost mode

Ghost Mode, introduced in Syd version 3.20.0, is a one-way sandboxing mode, closely resembling seccomp(2) Level 1, also known as Strict Mode. This mode enhances security by allowing a process to transition to a highly restrictive state after completing its initial setup. When a sandboxed process is ready for this higher level of confinement, it invokes Ghost Mode by executing the stat(2) system call with the virtual path /dev/syd/ghost. Upon receiving this command, Syd closes the seccomp_unotify(2) file descriptor. This action elevates all previously hooked system calls to a kernel-level deny with the ENOSYS ("Function not implemented") errno(3), effectively making them unavailable. The transition to Ghost Mode is irreversible; once the file descriptor is closed, the process is locked into this restricted state. This mechanism ensures that the sandboxed process can only perform a very limited set of operations, akin to those allowed in Seccomp Level 1, thus significantly reducing its potential attack surface. Ghost Mode provides a robust security measure by denying all but the most essential system calls, which is crucial for applications that require maximum isolation and security after their initial configuration phase.

The mode is aptly named ghost because, upon closing the seccomp_unotify(2) file descriptor, the sandboxed process effectively detaches from Syd and becomes independent, much like a ghost. Entering ghost mode subsequently causes the syd_mon monitor thread and all syd_emu emulator threads to exit, and the remaining syd_main thread merely waits for the sandbox process to exit without any further intervention. This detachment underscores the finality and isolation of the Ghost Mode, ensuring that the process operates in a secure, tightly confined environment without further interaction from Syd. This mechanism is particularly useful for processes that require maximum security and minimal system call exposure after their initial configuration phase, providing a robust layer of protection against various exploits and vulnerabilities.
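From inside the sandbox, the request is a single stat(2) of the virtual path. A minimal Python sketch follows; it assumes os.stat reaches Syd through one of the stat-family system calls it hooks and that an accepted request reports success. The function name is hypothetical. After a successful call, every previously hooked system call fails with ENOSYS, so the program should already have opened everything it needs.

```python
import os

GHOST_PATH = "/dev/syd/ghost"  # virtual path from the text above


def enter_ghost_mode() -> bool:
    """Request Ghost mode; True if the stat(2) request succeeded."""
    try:
        os.stat(GHOST_PATH)
    except OSError:
        # Not running under Syd, or the request was refused
        # (for example because the sandbox lock is already set).
        return False
    return True
```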

A process cannot enter Ghost mode once the sandbox lock is set. Alternatively, a process can set its dumpable attribute to zero using the PR_SET_DUMPABLE prctl(2) operation. Under Syd, this achieves almost the same effect, because Syd cannot emulate system calls once the process's per-process /proc directory is inaccessible. This provides an unprivileged way to enter Ghost mode.
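The unprivileged alternative is a single prctl(2) call. This sketch calls prctl through ctypes with the PR_SET_DUMPABLE (4) and PR_GET_DUMPABLE (3) values from sys/prctl.h; the helper names are hypothetical. Clearing the attribute also disables core dumps and makes the process's /proc entries owned by root, which can affect the process itself.

```python
import ctypes
import os

PR_GET_DUMPABLE = 3
PR_SET_DUMPABLE = 4

_libc = ctypes.CDLL(None, use_errno=True)


def clear_dumpable() -> None:
    """Set the calling process's dumpable attribute to zero."""
    if _libc.prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def is_dumpable() -> bool:
    return _libc.prctl(PR_GET_DUMPABLE, 0, 0, 0, 0) == 1
```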

SECURITY

Syd stands out for its ability to operate without requiring elevated privileges, eliminating the need for root access. This feature significantly simplifies setup and usage. Users benefit from the capability to dynamically configure the sandbox from within, with options to secure it further as needed. Tip: To take a quick peek at the seccomp filters applied by Syd under various configurations, use syd <flags...> -Epfc, where PFC stands for Pseudo Filter Code; this yields a human-readable textual dump of Syd's seccomp(2) filters, which Syd further enriches with # comments.

Threat Model

Syd strictly adheres to the current threat model of seccomp(2). The goal is to restrict how untrusted userspace applications interact with the shared OS kernel through system calls to protect the kernel from userspace exploits (e.g., shellcode or ROP payload). The kernel is trusted. Syd's threat model delineates the sandbox as the trusted interceptor of system calls, while all user applications running within the sandbox are considered untrusted. These untrusted applications can manipulate their execution environment through syscalls, and attackers are assumed to have the capability to execute arbitrary code within these applications. Syd uses several mechanisms, including seccomp(2) and ptrace(2) for syscall filtering, landlock(7) for filesystem access restrictions, and namespaces(7) for process and device isolation, to limit the impact of these potential attacks. The threat model assumes that attackers have control over the untrusted user space and may attempt reads, writes, or arbitrary code execution that could influence the behavior of the trusted sandbox or exploit syscall handling. The security of Syd relies on the correctness of its implementation and the underlying Linux kernel features it utilises. It is assumed that there are no vulnerabilities in Syd's interception and handling of syscalls, nor in the enforcement mechanisms provided by landlock(7) and namespaces(7). External attacks via network vectors or physical access to hardware are considered out of scope for this threat model.

"The sandbox lock" is an integral component of Syd's security architecture, which governs the configurability and integrity of the sandbox environment. By default, the sandbox lock is set to on, effectively preventing any further sandbox commands after the initial setup, thereby ensuring that once the sandbox is configured and the primary process is executed, the security policies remain unaltered by any untrusted processes within the sandbox. When the lock is set to exec, only the initial sandbox process retains the authority to access and modify the sandbox configuration, enabling a trusted process to securely establish the sandbox parameters while Syd maintains a pidfd (process ID file descriptor) to the initial process to safeguard against PID recycling attacks. Conversely, if the lock is set to off, all sandbox processes are permitted to access and modify the sandbox configuration, allowing for broader configurability during the setup phase. However, this state persists only until the sandbox is explicitly locked, after which the lock becomes immutable and the sandbox policies are fixed, preventing any subsequent processes from altering the configuration. This layered locking mechanism, reinforced by the use of pidfd in exec mode, effectively safeguards against untrusted processes attempting to modify sandbox settings to escalate privileges or circumvent restrictions, thereby maintaining a robust and secure execution environment within Syd's framework. In ipc mode, the sandbox configuration is accessible through a UNIX socket, which may or may not be accessible from within the sandbox depending on sandbox ACL rules. In read mode, the sandbox configuration is accessible only for reads, NOT for edits. Transitions from lock modes off, exec, and ipc into either read or on are one-way and idempotent: they result in the sandbox policy being sealed in memory using the mseal(2) system call, either immediately or at sandbox process startup. Transitions between lock modes read and on are not permitted.
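The transition rules above can be summarised as a small function. The function name is hypothetical, and since the text does not describe switching among off, exec and ipc, the sketch assumes such switches are allowed; it also ignores who may issue the command (for example, only the initial process in exec mode).

```python
SEALED = frozenset({"read", "on"})
UNSEALED = frozenset({"off", "exec", "ipc"})


def lock_transition_allowed(current: str, requested: str) -> bool:
    if current not in SEALED | UNSEALED or requested not in SEALED | UNSEALED:
        raise ValueError(f"unknown lock mode: {current!r} -> {requested!r}")
    if current in SEALED:
        # One-way and idempotent: re-setting the same mode is fine,
        # anything else (including read <-> on) is refused.
        return requested == current
    if requested in SEALED:
        return True  # sealing transition; the policy gets mseal(2)ed
    return True      # assumption: off/exec/ipc switches are not restricted here
```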

"Crypt Sandboxing" in Syd ensures the confidentiality and integrity of specified files by transparently encrypting them using AES-CTR with HMAC-SHA256, even when adversaries fully control processes within the sandbox (i.e., attackers can execute arbitrary code and perform any allowed system calls). In this extended threat model, it is acknowledged that while attackers may access plaintext data within the sandbox's memory during process execution, they cannot extract encryption keys or plaintext data from outside the controlled environment, nor can they interfere with the encryption process to leak keys or plaintext to persistent storage or external channels. Cryptographic operations are performed via kernel-level cryptography API sockets using zero-copy techniques to prevent plaintext from residing in user-space memory buffers accessible to attackers. The syd_aes threads responsible for encryption are confined with strict seccomp(2) filters, denying them critical system calls like read(2), open(2), and socket(2), and allowing only the minimal write(2) operations required for encryption metadata (e.g., writing the HMAC tag and random IV to the file). This confinement prevents exploitation that could leak sensitive data. Encryption keys are handled using the kernel keyrings(7) interface and the ALG_SET_KEY_BY_KEY_SERIAL setsockopt(2) option. The threat model trusts the kernel and Syd's implementation, assuming attackers cannot exploit kernel vulnerabilities to access keys or plaintext within kernel memory or cryptographic operations. Additionally, file locks are employed before attempting to encrypt files to ensure safe concurrent access. In contrast to the general threat model, Crypt Sandboxing acknowledges that untrusted processes within the sandbox have access to plaintext data in memory during normal operation, as they need to read or write the plaintext files. However, the goal is to prevent attackers from accessing the plaintext outside the controlled environment or tampering with the encryption process to compromise confidentiality and integrity. This is achieved by ensuring that the encryption keys remain secure and that the encryption and decryption processes are tightly controlled and isolated from untrusted code.

Accessing remote process memory

Syd denies various system calls which can access remote process memory, such as ptrace(2) and process_vm_writev(2), and common sandboxing profiles such as paludis and user disallow write access to the /proc/pid/mem file. This makes TOCTOU attack vectors harder to realise. Refer to the output of the command syd-ls deny to see the full list of denied system calls.

Enhanced Handling of PTRACE_TRACEME

As of version 3.16.3, Syd introduced a new feature for managing the PTRACE_TRACEME operation, aimed at improving stealth against detection. Traditionally, PTRACE_TRACEME is the only ptrace(2) operation allowed by a tracee, which makes it a common target for detection of ptracers. By converting PTRACE_TRACEME into a no-operation (no-op) that always succeeds, Syd aims to subtly prevent simple detection methods that rely on this operation. Additionally, other ptrace(2) operations are modified to return an EPERM ("Operation not permitted") errno(3) instead of ENOSYS ("Function not implemented"), which helps reduce the likelihood of the sandbox being detected through these errors. This approach enhances the discreetness of Syd's operation by mitigating straightforward detection tactics used by monitored processes.
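For context, the detection that this no-op defeats typically looks like the classic check below: a process calls PTRACE_TRACEME and treats failure as evidence of a tracer. The sketch runs the probe in a forked child so the caller itself never becomes traced; the constant 0 for PTRACE_TRACEME comes from sys/ptrace.h, and the function name is hypothetical. Under Syd, the probe succeeds and therefore reports no tracer.

```python
import ctypes
import os

PTRACE_TRACEME = 0  # value from <sys/ptrace.h>

_libc = ctypes.CDLL(None, use_errno=True)
_libc.ptrace.restype = ctypes.c_long


def traceme_reports_tracer() -> bool:
    """Classic check: PTRACE_TRACEME fails if the process is already traced."""
    pid = os.fork()
    if pid == 0:
        # Child: try to become traced by the parent. Failure suggests an
        # existing ptracer (or a policy that refuses ptrace(2) outright).
        rc = _libc.ptrace(PTRACE_TRACEME, 0, None, None)
        os._exit(1 if rc == -1 else 0)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status) == 1
```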

As of version 3.19.0, Syd extends this mitigation and turns the system call ptrace(2) into a no-op. Again, this provides a best-effort mitigation against using requests such as PTRACE_ATTACH or PTRACE_SEIZE to detect a ptracer.

As of version 3.47.0, Syd improves this mitigation and turns the prctl(2) calls with PR_SET_PTRACER argument into a no-op.

As of version 3.47.0, Syd also defends against intelligent ptrace(2) detectors, which issue multiple ptrace(2) requests to detect a ptracer. Refer to the following links for more information on intelligent ptrace(2) detection:

https://arxiv.org/pdf/2109.06127
https://seblau.github.io/posts/linux-anti-debugging
https://docs.rs/debugoff

Hardened procfs and devfs

To enhance system security and mitigate potential attack vectors, Syd enforces several restrictions on the procfs(5) and devfs file systems. First, it denies both the listing and the opening of block devices and files of unknown type: entries of these types (identified by DT_BLK and DT_UNKNOWN) are omitted from directory listings, and open(2) operations on them are rejected. This prevents unauthorized enumeration of and access to system storage devices, thereby mitigating information disclosure and potential tampering.

Syd also restricts visibility within the /proc directory so that processes can only see their own process IDs, effectively preventing discovery and potential interaction with other running processes, which reduces risks of information leakage, privilege escalation, and process manipulation. Access to the /proc entries of the Syd process itself is explicitly denied, safeguarding the sandbox manager from inspection or interference and preventing access to sensitive information about the sandboxing mechanism that could be exploited to bypass security controls or escape the sandbox.
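
The two listing rules above can be modelled as a filter over getdents64(2) results. This is a simplified Python sketch, not Syd's code: the function name filter_listing is hypothetical, entries are assumed to arrive as (name, d_type) pairs, and the DT_* values are copied from <dirent.h> because Python's os module does not export them.

```python
from typing import Iterable, List, Tuple

# d_type values from <dirent.h>.
DT_UNKNOWN = 0
DT_BLK = 6
HIDDEN_TYPES = {DT_UNKNOWN, DT_BLK}


def filter_listing(entries: Iterable[Tuple[str, int]], dirpath: str,
                   own_pid: int) -> List[str]:
    """Return the entry names a sandboxed process would be shown."""
    visible = []
    for name, d_type in entries:
        # Hide block devices and entries of unknown type everywhere.
        if d_type in HIDDEN_TYPES:
            continue
        # In /proc, hide every numeric (PID) directory except our own.
        if dirpath == "/proc" and name.isdigit() and int(name) != own_pid:
            continue
        visible.append(name)
    return visible
```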

Additionally, Syd addresses risks associated with magic symbolic links in /proc -- such as /proc/[pid]/exe and /proc/[pid]/fd/* -- by denying access to these links when they refer to processes other than the calling process, thus preventing exposure of sensitive file descriptors or executable paths of other processes and mitigating unauthorized access or container escape scenarios; this mitigation can be disabled with the trace/allow_unsafe_magiclinks:1 option if necessary, though doing so is not recommended.

Collectively, these hardened controls over procfs and devfs significantly reduce the attack surface by preventing information disclosure, unauthorized access, and potential privilege escalations, ensuring that sandboxed applications operate within a tightly controlled and secure environment that adheres to the principle of least privilege and maintains system integrity. Refer to the following links for more information:

https://forums.whonix.org/t/proc-pid-sched-spy-on-keystrokes-proof-of-concept-spy-gksu/8225
https://homes.luddy.indiana.edu/xw7/papers/zhou2013identity.pdf
https://petsymposium.org/2016/files/papers/Don%E2%80%99t_Interrupt_Me_While_I_Type__Inferring_Text_Entered_Through_Gesture_Typing_on_Android_Keyboards.pdf
https://staff.ie.cuhk.edu.hk/~khzhang/my-papers/2016-oakland-interrupt.pdf
https://www.cs.ucr.edu/~zhiyunq/pub/sec14_android_activity_inference.pdf
https://www.gruss.cc/files/procharvester.pdf
https://www.kicksecure.com/wiki/Dev/Strong_Linux_User_Account_Isolation#/proc/pid/sched_spy_on_keystrokes
https://www.openwall.com/lists/oss-security/2011/11/05/3
https://www.usenix.org/legacy/event/sec09/tech/full_papers/zhang.pdf
https://www.openwall.com/lists/oss-security/2025/11/05/3

Hardened proc_pid_status(5)

As of version 3.38.0, Syd filters proc_pid_status(5) at the open(2) boundary to defeat common sandbox-fingerprinting heuristics while preserving compatibility with ordinary tooling. When a process (or any of its threads) reads /proc/<pid>/status or /proc/<pid>/task/<tid>/status, Syd normalizes only the security-critical fields -- zeroing TracerPid, NoNewPrivs, Seccomp, and Seccomp_filters, and rewriting the sandbox-revealing phrases in Speculation_Store_Bypass and SpeculationIndirectBranch. This targeted normalization breaks trivial anti-analysis checks (ptracer presence, seccomp/no_new_privs probes, speculative mitigation fingerprints) without altering process state.

The security impact is twofold: untrusted code loses a low-cost oracle for environment discovery, reducing the likelihood of logic bombs or capability gating based on sandbox detection, and defenders retain observability because the kernel's real enforcement still applies -- only the user-space view of these select fields is masked. For forensic and debugging workflows that explicitly need the unfiltered view, this mitigation can be relaxed temporarily by setting trace/allow_unsafe_proc_pid_status:1; setting it back to :0 restores the hardened, stealth-preserving default.
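
The field normalization can be sketched as a text transformation over a status dump. This Python sketch is an illustration under assumptions, not Syd's implementation: the function name normalize_status is hypothetical, and the Speculation_* rewrites are left out because their replacement wording is not specified here.

```python
# Fields this section says are zeroed; Speculation_* rewrites are not
# modelled because their replacement wording is not specified.
ZEROED_FIELDS = ("TracerPid", "NoNewPrivs", "Seccomp", "Seccomp_filters")


def normalize_status(text: str) -> str:
    """Zero the sandbox-revealing fields of a proc_pid_status(5) dump."""
    out = []
    for line in text.splitlines(keepends=True):
        key, sep, _ = line.partition(":")
        if sep and key in ZEROED_FIELDS:
            newline = "\n" if line.endswith("\n") else ""
            line = f"{key}:\t0{newline}"
        out.append(line)
    return "".join(out)
```

All other lines pass through byte for byte, which is what keeps ordinary tools such as ps(1) working.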

Hardened uname(2)

As of version 3.15.1, Syd mediates uname(2) and returns a policy-governed utsname that suppresses host identification and constrains kernel disclosure. The release string is synthesized to expose only the Linux major and minor version, either as observed on the host or, as of version 3.36.1, as supplied via SYD_ASSUME_KERNEL for controlled feature detection, while the micro component is randomized per Syd run to limit patch-level fingerprinting; reads of /proc/version and /proc/sys/kernel/osrelease are hardened to present the same masked view. As of version 3.40.0, the nodename, domainname, and version fields are sourced from the options uts/host, uts/domain, and uts/version, with defaults localhost, (none), and a random value chosen at startup. As of version 3.44.2, this restriction may be relaxed at startup with the option trace/allow_unsafe_uname:1. Practical effects include disrupting exploit and loader selection that depends on exact release matching, reducing cross-host correlation via stable node and domain labels, neutralizing sandbox and VM fingerprinting heuristics that key off uname(2) and the corresponding proc(5) paths, and keeping build and compatibility probes functional by retaining major.minor semantics while allowing explicit control through SYD_ASSUME_KERNEL. Workloads that tie licensing, clustering, telemetry, or feature gates to the precise host release or to the original nodename should use the uts options to supply the required identity or opt out with the relaxation flag.
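
The release masking can be sketched as keeping major.minor and drawing a random micro component. This Python sketch rests on stated assumptions: the function name masked_release is hypothetical, the assume_kernel argument stands in for SYD_ASSUME_KERNEL and is assumed to begin with "major.minor", and the 0 to 255 micro range is illustrative only.

```python
import random
import re
from typing import Optional


def masked_release(host_release: str, rng: random.Random,
                   assume_kernel: Optional[str] = None) -> str:
    """Build a release string exposing only major.minor (sketch)."""
    # assume_kernel plays the role of SYD_ASSUME_KERNEL (format assumed).
    source = assume_kernel if assume_kernel else host_release
    m = re.match(r"(\d+)\.(\d+)", source)
    if m is None:
        raise ValueError(f"cannot parse kernel release: {source!r}")
    major, minor = m.groups()
    # Hypothetical range; Syd's actual randomization scheme may differ.
    micro = rng.randrange(0, 256)
    return f"{major}.{minor}.{micro}"
```

For example, a host release of 6.8.0-45-generic would surface as 6.8.N, so distribution suffixes and patch levels never reach the sandbox.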

Denying TIOCLINUX ioctl

Syd denies the TIOCLINUX ioctl(2), an essential security measure that addresses vulnerabilities specific to Linux terminal operations. The TIOCLINUX ioctl(2) command offers various functionalities, including but not limited to manipulating console settings, changing keyboard modes, and controlling screen output. While these capabilities can be leveraged for legitimate system management tasks, they also introduce potential security risks, particularly in multi-user environments or in the context of sandboxed applications.

The security concerns surrounding TIOCLINUX stem from its ability to alter terminal behaviors and settings in ways that could be exploited for unauthorised information disclosure, terminal hijacking, or privilege escalation. For instance, manipulating the console display could mislead users about the true nature of the operations being executed, or altering keyboard settings could capture or inject keystrokes.

In summary, the restriction on TIOCLINUX within secure environments is a vital security strategy, addressing the complex risks associated with direct terminal manipulation capabilities. This precaution is in keeping with the broader security community's efforts to mitigate known vulnerabilities and enhance the security posture of systems handling sensitive processes and data.

Denying TIOCSTI ioctl

The restriction on the use of the TIOCSTI ioctl(2) within the Syd sandbox addresses a significant security vulnerability associated with terminal input injection. The TIOCSTI ioctl(2) allows a byte to be inserted into the terminal input queue, effectively simulating keyboard input. This capability, while potentially useful for legitimate purposes, poses a substantial security risk, especially in scenarios where a process might retain access to a terminal beyond its intended lifespan. Malicious use of this ioctl(2) can lead to the injection of commands that execute with the privileges of the terminal's owning process, thereby breaching the security boundaries intended by user permissions and process isolation mechanisms. The concern over TIOCSTI is well-documented in the security community. For example, OpenBSD has taken measures to mitigate the risk by disabling the TIOCSTI ioctl(2), reflecting its stance on the ioctl(2) as one of the most dangerous due to its potential for abuse in command injection attacks. The decision to disable or restrict TIOCSTI in various Unix-like operating systems underscores the ioctl(2)'s inherent security implications, particularly in the context of privilege escalation and the execution of unauthorised commands within a secured environment.
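
The injection primitive itself is a one-byte ioctl(2), sketched below in Python for illustration. The helper name inject_input is hypothetical; the 0x5412 fallback for TIOCSTI is the x86/ARM value and is used only if the termios module lacks the constant. Independently of Syd, the kernel already refuses callers for which the terminal is not their controlling terminal unless they hold CAP_SYS_ADMIN (EPERM), and recent kernels may disable TIOCSTI entirely via dev.tty.legacy_tiocsti; under Syd the request is denied outright.

```python
import fcntl
import os
import termios

# Fallback is the x86/ARM request number; other architectures differ.
TIOCSTI = getattr(termios, "TIOCSTI", 0x5412)


def inject_input(tty_fd: int, data: bytes) -> int:
    """Push data into a terminal's input queue, one byte per ioctl.

    Returns 0 on success, otherwise the errno of the first failure.
    """
    for byte in data:
        try:
            fcntl.ioctl(tty_fd, TIOCSTI, bytes([byte]))
        except OSError as exc:
            return exc.errno
    return 0
```

A process that still holds a terminal descriptor could queue a command such as b"rm -rf ~\n" for the shell to read as if typed, which is why the request is blocked rather than filtered.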

In summary, the restriction on TIOCSTI within Syd is a critical security measure that prevents a class of vulnerabilities centered around terminal input injection, safeguarding against unauthorised command execution and privilege escalation. This precaution aligns with broader security best practices and mitigations adopted by the security community to address known risks associated with terminal handling and process isolation.

Denying FS_IOC_SETFLAGS ioctl

As of version 3.24.2, Syd denies the FS_IOC_SETFLAGS ioctl(2) request by default, a critical security measure to ensure that once file flags are set, they remain unchanged throughout the runtime of the sandbox. This policy is particularly focused on the immutable and append-only flags, which need to be configured by an administrator at the start of the Syd process. Once these attributes are set on crucial system and log files -- marking them either as immutable to prevent any modification, or append-only to ensure that existing data cannot be erased -- they are frozen. This means that no subsequent modifications can be made to these attributes, effectively locking down the security settings of the files against any changes. This approach prevents scenarios where, even after a potential security breach, malicious entities are unable to alter or delete important files, thus maintaining the integrity and reliability of the system against tampering and ensuring that audit trails are preserved.
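
The attributes in question are bits in the inode flag word that lsattr(1) displays. The Python sketch below reads and decodes that word; it is an illustration only, the helper names are hypothetical, and the ioctl numbers assume a 64-bit ABI (the kernel copies an int for these requests). Setting the immutable or append-only bit requires FS_IOC_SETFLAGS (and CAP_LINUX_IMMUTABLE), which is the request Syd denies.

```python
import fcntl
import os
import struct

# Values from <linux/fs.h>; request numbers assume a 64-bit ABI.
FS_IOC_GETFLAGS = 0x80086601  # _IOR('f', 1, long)
FS_IOC_SETFLAGS = 0x40086602  # _IOW('f', 2, long), denied by Syd
FS_IMMUTABLE_FL = 0x00000010
FS_APPEND_FL = 0x00000020


def describe_flags(flags: int) -> dict:
    """Decode the two attributes this section is concerned with."""
    return {"immutable": bool(flags & FS_IMMUTABLE_FL),
            "append_only": bool(flags & FS_APPEND_FL)}


def get_flags(path: str) -> int:
    """Read the inode flags of path (the kernel copies an int)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        buf = fcntl.ioctl(fd, FS_IOC_GETFLAGS, b"\0" * 4)
        return struct.unpack("i", buf)[0]
    finally:
        os.close(fd)
```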

Denying PR_SET_MM prctl

The PR_SET_MM prctl(2) call allows processes with the CAP_SYS_RESOURCE capability to adjust their memory map descriptors, facilitating operations like self-modifying code by enabling dynamic changes to the process's memory layout. For enhanced security, especially in constrained environments like Syd, this capability is restricted to prevent unauthorised memory manipulations that could lead to vulnerabilities such as code injection or unauthorised code execution. Notably, Syd proactively drops CAP_SYS_RESOURCE among other capabilities at startup to minimise security risks. This action is part of Syd's broader security strategy to limit potential attack vectors by restricting process capabilities.

Restricting prctl option space and trace/allow_unsafe_prctl

Syd meticulously confines the scope of permissible prctl(2) operations to enhance security within its sandbox environment. By limiting available prctl(2) options to a specific set, including but not limited to PR_SET_PDEATHSIG, PR_GET_DUMPABLE, PR_SET_NO_NEW_PRIVS, and PR_SET_SECCOMP, Syd ensures that only necessary process control functionalities are accessible, thereby reducing the risk of exploitation through less scrutinised prctl(2) calls. This constraint is pivotal in preventing potential security vulnerabilities associated with broader prctl(2) access, such as unauthorised privilege escalations or manipulations of process execution states. However, recognizing the need for flexibility in certain scenarios, Syd offers the option to lift these restrictions through the trace/allow_unsafe_prctl:1 setting. This capability allows for a tailored security posture, where users can opt for a more permissive prctl(2) environment if required by their specific use case, while still maintaining awareness of the increased security risks involved.
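
Conceptually this is an allowlist keyed on the prctl(2) option number. The Python sketch below models it using only the options named in this section; Syd's real allowlist is larger, the errno it returns is not modelled, and the function name prctl_permitted is hypothetical. The allow_unsafe_prctl argument mirrors trace/allow_unsafe_prctl:1.

```python
# Option numbers from <linux/prctl.h>.
PR_SET_PDEATHSIG = 1
PR_GET_DUMPABLE = 3
PR_SET_SECCOMP = 22
PR_SET_MM = 35
PR_SET_NO_NEW_PRIVS = 38

# Only the options named in this section; Syd's real allowlist is larger.
ALLOWED = frozenset({PR_SET_PDEATHSIG, PR_GET_DUMPABLE,
                     PR_SET_SECCOMP, PR_SET_NO_NEW_PRIVS})


def prctl_permitted(option: int, allow_unsafe_prctl: bool = False) -> bool:
    """Model of the allowlist check (not Syd's implementation)."""
    return allow_unsafe_prctl or option in ALLOWED
```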

Restricting io_uring interface and trace/allow_unsafe_uring

The io_uring(7) interface can be used to bypass path sandboxing. By default, Syd restricts io_uring(7) operations due to their ability to perform system calls that could undermine the sandbox's security controls, particularly those designed to limit file access and modify file permissions. The setting, trace/allow_unsafe_uring, when enabled, relaxes these restrictions, allowing io_uring(7) operations to proceed unimpeded. While this can significantly enhance I/O performance for applications that rely on io_uring(7) for efficient asynchronous operations, it requires careful consideration of the security implications, ensuring that its use does not inadvertently compromise the sandboxed application's security posture. Refer to the output of the command syd-ls uring to see the full list of system calls that belong to the io_uring(7) interface.

Restricting creation of device special files

Since version 3.1.12, Syd has enhanced its security model by disallowing the creation of device special files through the mknod(2) and mknodat(2) system calls. This decision is rooted in mitigating potential security vulnerabilities, as device special files could be exploited to circumvent established path-based access controls within the sandbox environment. These files, which include character and block devices, can provide direct access to hardware components or facilitate interactions with kernel modules that could lead to unauthorised actions or data exposure. By restricting their creation, Syd significantly reduces the risk of such exploit paths, reinforcing the integrity and security of the sandboxed applications. This measure ensures that only predefined types of files -- such as FIFOs, regular files, and sockets -- are permissible, aligning with the principle of least privilege by limiting file system operations to those deemed safe within the sandbox's context.
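
The rule reduces to a check on the file-type bits of the mknod(2) mode argument. In this minimal Python sketch (function name hypothetical), the permitted types are taken from the text, and a zero file type is treated as a regular file, as mknod(2) itself does.

```python
import stat

# File types this section lists as permissible.
ALLOWED_TYPES = {stat.S_IFREG, stat.S_IFIFO, stat.S_IFSOCK}


def mknod_permitted(mode: int) -> bool:
    """Reject character and block devices; allow FIFOs, files, sockets."""
    fmt = stat.S_IFMT(mode)
    # mknod(2) treats a zero file type as S_IFREG.
    if fmt == 0:
        fmt = stat.S_IFREG
    return fmt in ALLOWED_TYPES
```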

Sharing Pid namespace with signal protections

Since version 3.6.7, Syd has introduced a crucial security feature that prevents sandboxed processes from sending signals to the Syd process or any of its threads. This protection is implemented by hooking and monitoring system calls related to signal operations, including kill(2), tkill(2), tgkill(2), and pidfd_open(2). When a sandboxed process attempts to send a signal to Syd or its threads, these system calls are intercepted, and the operation is denied at the seccomp level with an EACCES ("Permission denied") errno(3). This measure ensures that Syd maintains control over the execution and management of sandboxed processes, safeguarding against interruptions or unauthorised interactions that could compromise the security or stability of the sandbox environment. This security mechanism is part of Syd's broader strategy to share the same root, private proc, and mount namespaces with the sandboxed process, facilitating secure and simple system call emulation. By making Syd and its threads immune to signals from sandboxed processes, the integrity and isolation of the sandboxed environment are significantly enhanced, preventing potential exploitation scenarios where sandboxed processes could disrupt the operation of the sandbox manager or interfere with other sandboxed processes.
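
The decision applied to each intercepted call can be modelled as follows. This is a simplified Python sketch, not Syd's code: the function name check_signal is hypothetical, the protected set stands for Syd's PID and thread IDs, and kill(2) targets of zero or less (process groups, all processes) are not modelled. The EACCES value matches the errno stated above.

```python
import errno
from typing import FrozenSet, Tuple


def check_signal(syscall: str, args: Tuple[int, ...],
                 protected: FrozenSet[int]) -> int:
    """Return 0 to allow the call, or errno.EACCES to deny it."""
    if syscall in ("kill", "tkill", "pidfd_open"):
        targets = args[:1]  # kill(pid, sig), tkill(tid, sig), pidfd_open(pid, flags)
    elif syscall == "tgkill":
        targets = args[:2]  # tgkill(tgid, tid, sig)
    else:
        return 0
    return errno.EACCES if any(t in protected for t in targets) else 0
```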

As of version 3.35.2, Syd puts itself in a new process group using setpgid(2) and releases the controlling terminal using the TIOCNOTTY ioctl(2) request. Moreover, a scope-only Landlock sandbox is installed unconditionally to further isolate the sandbox process from the Syd process. This ensures that terminal-generated signals and I/O remain confined to the sandbox's process group and cannot affect Syd or any other processes, further strengthening the sandbox's isolation guarantees alongside the existing seccomp-based PID namespace protections.
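
The two steps can be sketched with the Python standard library. This is an illustration under assumptions, not Syd's code: the helper name detach_from_terminal is hypothetical, the caller is assumed not to be a session leader (setpgid(2) would fail with EPERM, and TIOCNOTTY from a session leader also hangs up the foreground group), and 0x5422 is the x86/ARM fallback for TIOCNOTTY.

```python
import fcntl
import os
import termios

# Fallback is the x86/ARM request number; other architectures differ.
TIOCNOTTY = getattr(termios, "TIOCNOTTY", 0x5422)


def detach_from_terminal() -> bool:
    """Start a new process group and drop the controlling terminal.

    Returns False when there is no controlling terminal to release.
    """
    os.setpgid(0, 0)  # become leader of a new process group
    try:
        # /dev/tty refers to the controlling terminal, if any.
        fd = os.open("/dev/tty", os.O_RDWR | os.O_NOCTTY)
    except OSError:
        return False
    try:
        fcntl.ioctl(fd, TIOCNOTTY)
    finally:
        os.close(fd)
    return True
```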

Process Priority and Resource Management

Since version 3.8.1, Syd takes steps to keep the host system running smoothly while it enforces security through its sandboxing mechanism. It sets the nice(2) value of its system call handler threads to 19, so these threads run at the lowest priority and minimise CPU starvation of other critical processes. This approach prioritises system stability and fair CPU resource distribution, enabling Syd to handle numerous system calls without compromising the host's performance and responsiveness.

Enhancing this strategy, Syd introduced further adjustments in versions 3.8.6 and 3.9.7 to address I/O and CPU resource management more comprehensively. From version 3.8.6, it sets the I/O priority of the system call handler threads to idle, ensuring that I/O operations do not monopolise resources and lead to I/O starvation for other processes. Similarly, from version 3.9.7, it adjusts the CPU scheduling priority of these threads to idle, further safeguarding against CPU starvation. These measures collectively ensure that Syd maintains optimal performance and system responsiveness while securely sandboxing applications, striking a balance between security enforcement and efficient system resource utilization.
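
The CPU side of these adjustments can be reproduced for a single worker thread with the Python standard library. This is a sketch under assumptions, not Syd's code: it relies on the Linux-specific behaviour that setpriority(2) with a thread ID, and sched_setscheduler(2) with pid 0, act on one thread only. The idle I/O class is set with ioprio_set(2), which the standard library does not wrap, so it is omitted here.

```python
import os
import threading


def deprioritize_current_thread() -> None:
    """Drop the calling thread to nice 19 and SCHED_IDLE (Linux only)."""
    tid = threading.get_native_id()
    # On Linux, PRIO_PROCESS with a thread ID affects only that thread.
    os.setpriority(os.PRIO_PROCESS, tid, 19)
    # pid 0 means the calling thread for sched_setscheduler(2).
    os.sched_setscheduler(0, os.SCHED_IDLE, os.sched_param(0))
    # ioprio_set(2) with the idle class is not wrapped by the stdlib
    # and is therefore left out of this sketch.
```

Running the function in a dedicated thread leaves the rest of the process at its normal priority, mirroring how Syd confines these changes to its system call handler threads.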

As of version 3.30.0, changes in process and I/O priorities are inherited by sandbox processes as well, and sandbox processes are prevented from making any further changes. Moreover, the option trace/allow_unsafe_nice may be set at startup to prevent Syd from making any changes and to allow sandbox processes access to the system calls that are used to change process and I/O priorities.

Streamlining File Synchronization Calls

As of version 3.8.8, Syd has rendered the sync(2) and syncfs(2) system calls as no-operations (no-ops), ensuring they report success without executing any underlying functionality. This adjustment is designed to streamline operations within the sandboxed environment, bypassing the need for these file synchronization actions that could otherwise impact performance or complicate the sandbox's control over file system interactions. By adopting this approach, Syd enhances its compatibility with applications that issue these calls, without altering the sandboxed process's behavior or the integrity of file system management. As of version 3.28.0, this restriction can be disabled at startup with the option trace/allow_unsafe_sync:1. This is useful in scenarios where sync is actually expected to work such as when sandboxing databases.

Restricting Resource Limits, Core Dumps, and trace/allow_unsafe_prlimit

Since version 3.9.6, Syd has implemented restrictions on setting process resource limits and generating core dumps for the sandboxed process, enhancing the sandbox's security posture. This measure prevents the sandboxed process from altering its own resource consumption boundaries or producing core dumps, which could potentially leak sensitive information or be exploited for bypassing sandbox restrictions. However, recognizing the need for flexibility in certain use cases, Syd provides the option to disable these restrictions at startup through the trace/allow_unsafe_prlimit:1 setting. This allows administrators to tailor the sandbox's behavior to specific requirements, balancing security considerations with functional needs.

Enhancing Sandbox Security with Landlock

Since version 3.0.1, Syd leverages landlock(7) to enforce advanced filesystem sandboxing, significantly bolstering the security framework within which sandboxed processes operate. By integrating Landlock, Syd empowers even unprivileged processes to create secure sandboxes, enabling fine-grained access control over filesystem operations without requiring elevated permissions. This approach is instrumental in mitigating the risk of security breaches stemming from bugs or malicious behaviors in applications, offering a robust layer of protection by restricting ambient rights, such as global filesystem or network access. Landlock operates by allowing processes to self-impose restrictions on their access to system resources, effectively creating a secure environment that limits their operation to a specified set of files and directories. This mechanism is particularly useful for running legacy daemons or applications that require specific environmental setups, as it allows for the precise tailoring of access rights, ensuring processes can only interact with designated parts of the filesystem. For instance, by setting Landlock rules, Syd can confine a process's filesystem interactions to read-only or read-write operations on explicitly allowed paths, thus preventing unauthorised access to sensitive areas of the system.
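
Whether a kernel offers Landlock at all, and which ABI version it implements, can be probed with a single system call. The Python sketch below is an illustration only: it assumes an architecture where landlock_create_ruleset(2) is syscall 444 (x86_64, aarch64 and others sharing the unified table), and treating EPERM as "unavailable" is an assumption to cope with container seccomp filters.

```python
import ctypes
import errno
import os

# Syscall 444 on x86_64, aarch64 and other unified-table architectures
# (assumption: the sketch runs on one of them).
SYS_LANDLOCK_CREATE_RULESET = 444
LANDLOCK_CREATE_RULESET_VERSION = 1 << 0

_libc = ctypes.CDLL(None, use_errno=True)
_libc.syscall.restype = ctypes.c_long


def landlock_abi_version() -> int:
    """Return the kernel's Landlock ABI version, or 0 if unavailable."""
    abi = _libc.syscall(ctypes.c_long(SYS_LANDLOCK_CREATE_RULESET), None,
                        ctypes.c_size_t(0),
                        ctypes.c_uint32(LANDLOCK_CREATE_RULESET_VERSION))
    if abi >= 0:
        return abi
    err = ctypes.get_errno()
    # ENOSYS: not built in; EOPNOTSUPP: disabled at boot; EPERM: e.g. a
    # container seccomp filter rejecting unknown system calls.
    if err in (errno.ENOSYS, errno.EOPNOTSUPP, errno.EPERM):
        return 0
    raise OSError(err, os.strerror(err))
```

Syd's own use of Landlock goes much further (rulesets, path rules, self-restriction); this probe only shows how the feature is detected.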

Furthermore, the inclusion of the Syd process itself within the Landlock-enforced sandbox adds an additional layer of security. This design choice ensures that even if the Syd process were compromised, the attacker's ability to manipulate the sandboxed environment or access unauthorised resources would be significantly constrained. This self-sandboxing feature underscores Syd's commitment to maintaining a high security standard, offering peace of mind to users by ensuring comprehensive containment of sandboxed processes.

Namespace Isolation in Syd

Syd enhances sandbox isolation through meticulous namespace use, starting from version 3.0.2. Version 3.9.10 marks a pivotal enhancement by restricting user subnamespace creation, addressing a key path sandboxing bypass vulnerability. This strategic limitation thwarts sandboxed processes from altering their namespace environment to access restricted filesystem areas. Furthermore, since version 3.11.2, Syd maintains process capabilities within user namespaces, mirroring the unshare(1) command's --keep-caps behavior. This ensures sandboxed processes retain necessary operational capabilities, enhancing security without compromising functionality. Additionally, Syd utilises the powerful bind command within the mount namespace to create secure, isolated environments by allowing specific filesystem locations to be remounted with custom attributes, such as ro, noexec, nosuid, nodev, or nosymfollow, providing a flexible tool for further restricting sandboxed processes' access to the filesystem.

Syd also introduces enhanced isolation within the mount namespace by offering options to bind mount temporary directories over /dev/shm and /tmp, ensuring that sandboxed processes have private instances of these directories. This prevents inter-process communication through shared memory and mitigates the risk of temporary file-based attacks, further solidifying the sandbox's defence mechanisms. As of version 3.35.2, an empty mount namespace may be built from scratch starting with the root:tmpfs command. As of version 3.11.2, Syd mounts the procfs(5) filesystem privately with the hidepid=2 option, enhancing privacy by concealing process information from unauthorised users. As of version 3.37.2, this option is changed to hidepid=4, which is new in Linux 5.8, for added hardening. As of version 3.39.0, the option subset=pid is also supplied to the private procfs(5) mount for added hardening; this option is also new in Linux 5.8.
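
Whether these options took effect can be checked from inside the sandbox by reading the mount table. This Python sketch parses /proc/self/mounts-style text; the function name is hypothetical, and the hidepid value may be shown either numerically or by name (for example ptraceable for 4), depending on the kernel.

```python
from typing import Dict, Optional


def proc_mount_options(mounts: str,
                       target: str = "/proc") -> Optional[Dict[str, str]]:
    """Return the options of the procfs mount at target, or None."""
    result = None
    for line in mounts.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[1] != target or fields[2] != "proc":
            continue
        opts = {}
        for opt in fields[3].split(","):
            key, _, value = opt.partition("=")
            opts[key] = value
        result = opts  # a later (overmounting) entry wins
    return result
```

Typical use is proc_mount_options(open("/proc/self/mounts").read()).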

Syd's container and immutable profiles exemplify its adaptability, offering environments that range from isolated to highly restrictive. The container profile provides a general-purpose sandbox, while the immutable profile enforces stricter controls, such as making essential system directories read-only, to prevent tampering. This comprehensive approach underlines Syd's adept use of kernel features for robust sandbox security, ensuring a secure and controlled execution environment for sandboxed applications. Run syd-cat -pcontainer and syd-cat -pimmutable to list the rules in these sandboxing profiles.

As of version 3.23.0, Syd further strengthens its security with a time namespace, enabled by the unshare/time:1 option, which allows Syd to reset the boot-time clock so that the uptime(1) command reports container uptime instead of host uptime. Moreover, the creation of namespaces, including mount, UTS, IPC, user, PID, net, cgroup, and time, is denied by default to prevent unauthorized namespace manipulation that could undermine path sandboxing security. To allow specific namespace types, administrators must explicitly enable them via the trace/allow_unsafe_namespace setting. Another restriction to note is that the system calls mount(2), mount_setattr(2), umount(2), and umount2(2) are denied by default unless the mount namespace is allowed. This change ensures tighter control over process capabilities and isolation, reinforcing the defense mechanisms against potential security breaches.

Restricting environment and trace/allow_unsafe_env

As of version 3.11.1, Syd has implemented measures to clear unsafe environment variables, such as LD_PRELOAD, enhancing security by preventing the manipulation of dynamic linker behavior by sandboxed processes. This action mitigates risks associated with dynamic linker hijacking, where adversaries may load malicious shared libraries to execute unauthorised code, potentially leading to privilege escalation, persistence, or defence evasion. Variables like LD_PRELOAD allow specifying additional shared objects to be loaded before any others, which could be exploited to override legitimate functions with malicious ones, thus hijacking the execution flow of a program. To accommodate scenarios where developers might need to use these variables for legitimate purposes, Syd allows this security feature to be disabled at startup with trace/allow_unsafe_env:1, offering flexibility while maintaining a strong security posture. This careful balance ensures that sandboxed applications operate within a tightly controlled environment, significantly reducing the attack surface and enhancing the overall security framework within which these applications run. Refer to the output of the command syd-ls env to see the full list of environment variables that Syd clears from the environment of the sandbox process. As of version 3.39.0, Syd additionally clears LANG and the full set of LC_* locale variables (e.g. LC_CTYPE, LC_TIME, LC_ALL, etc.) to avoid leaking locale settings into the sandboxed process -- preventing subtle behavior differences or information disclosure that could be abused. Similarly, the TZ variable is cleared to prevent leaking timezone settings to the sandbox process. The builtin linux profile masks the file /etc/localtime and the glob(3p) pattern /usr/share/zoneinfo/** with the file /usr/share/zoneinfo/UTC preventing another vector of timezone settings leaking into the environment of the sandbox process. 
For controlled exceptions, the CLI -e flag provides fine-grained control: -e var=val injects var=val into the child environment, -e var removes var from the child environment, and -e var= explicitly passes through an otherwise unsafe variable; any of these forms may be repeated as needed.
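
The clearing rules and the three -e forms can be modelled as a pure function over the parent environment. This Python sketch is illustrative: the unsafe set holds only the variables named above (syd-ls env prints the authoritative list), the function name child_environment is hypothetical, and how Syd resolves conflicting -e forms for the same variable is an assumption here.

```python
from typing import Dict, List

# Illustrative subset only; `syd-ls env` prints the authoritative list.
UNSAFE_NAMES = {"LD_PRELOAD", "LANG", "TZ"}


def is_unsafe(name: str) -> bool:
    return name in UNSAFE_NAMES or name.startswith("LC_")


def child_environment(parent: Dict[str, str],
                      e_args: List[str]) -> Dict[str, str]:
    """Apply unsafe-variable clearing plus the -e forms (sketch)."""
    # "-e var=" passes an otherwise unsafe variable through.
    passthrough = set()
    for arg in e_args:
        name, sep, value = arg.partition("=")
        if sep and not value:
            passthrough.add(name)
    env = {k: v for k, v in parent.items()
           if k in passthrough or not is_unsafe(k)}
    for arg in e_args:
        name, sep, value = arg.partition("=")
        if not sep:
            env.pop(name, None)  # "-e var" removes var
        elif value:
            env[name] = value  # "-e var=val" injects var=val
    return env
```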

Managing Linux Capabilities for Enhanced Security

Since its 3.0.17 release, Syd strategically curtails specific Linux capabilities(7) for sandboxed processes to bolster security. By revoking privileges such as CAP_SYS_ADMIN among others, Syd significantly reduces the risk of privilege escalation and system compromise. This proactive measure ensures that even if a sandboxed process is compromised, its ability to perform sensitive operations is severely limited. The comprehensive list of dropped capabilities, including but not limited to CAP_NET_ADMIN, CAP_SYS_MODULE, and CAP_SYS_RAWIO, reflects a meticulous approach to minimizing the attack surface. Refer to the output of the command syd-ls drop to see the full list of capabilities(7) that Syd drops at startup.

Exceptions to this stringent policy, introduced in version 3.11.1, such as retaining CAP_NET_BIND_SERVICE with trace/allow_unsafe_bind:1, CAP_NET_RAW with trace/allow_unsafe_socket:1, CAP_SYSLOG with trace/allow_unsafe_syslog:1 and CAP_SYS_TIME with trace/allow_unsafe_time:1, offer a nuanced security model. These exceptions allow for necessary network, syslog and time adjustments within the sandbox, providing flexibility without significantly compromising security.
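
The effect of dropping capabilities is visible in the CapEff mask of proc_pid_status(5). The Python sketch below decodes that mask for the capabilities mentioned in this section; the function name is hypothetical and the bit numbers are taken from <linux/capability.h>.

```python
# Bit numbers from <linux/capability.h>.
CAP_BITS = {
    "CAP_NET_BIND_SERVICE": 10,
    "CAP_NET_ADMIN": 12,
    "CAP_NET_RAW": 13,
    "CAP_SYS_MODULE": 16,
    "CAP_SYS_RAWIO": 17,
    "CAP_SYS_ADMIN": 21,
    "CAP_SYS_RESOURCE": 24,
    "CAP_SYS_TIME": 25,
    "CAP_SYSLOG": 34,
}


def effective_caps(status_text: str) -> set:
    """Return which of the listed capabilities are in CapEff."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)
            return {name for name, bit in CAP_BITS.items()
                    if mask >> bit & 1}
    raise ValueError("no CapEff line in status text")
```

For example, a sandbox started with trace/allow_unsafe_bind:1 would be expected to show CAP_NET_BIND_SERVICE in effective_caps(open("/proc/self/status").read()) while CAP_SYS_ADMIN stays absent.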

Since version 3.12.5, Syd allows the user to prevent dropping capabilities at startup using the command trace/allow_unsafe_caps:1. This command may be used to construct privileged containers with Syd.

This balanced strategy of restricting capabilities(7), coupled with selective permissions, exemplifies Syd's commitment to crafting a secure yet functional sandbox environment. By leveraging the granularity of Linux capabilities(7), Syd offers a robust framework for safeguarding applications against a variety of threats, underscoring its role as a pivotal tool in the security arsenal of Linux environments.

Path Resolution Restriction For Chdir and Open Calls

As of version 3.15.1, Syd provides a configurable security feature that addresses the risk of directory traversal attacks by restricting the use of .. components in path arguments for chdir(2), open(2), openat(2), openat2(2), and creat(2) system calls. This feature is off by default, ensuring broad compatibility and operational flexibility for a range of applications. When enabled with the trace/deny_dotdot:1 command, Syd strengthens its defence mechanisms against unauthorised directory access, echoing the flexibility seen in FreeBSD's vfs.lookup_cap_dotdot sysctl. This allows for a nuanced approach to filesystem security, where administrators can tailor the sandbox's behavior to match specific security requirements or operational contexts. By drawing on the security insights of FreeBSD and HardenedBSD, Syd provides a versatile toolset for managing path traversal security, adaptable to the unique demands of various application environments. See the following links for more information:

https://man.freebsd.org/cgi/man.cgi?open(2)
https://cgit.freebsd.org/src/tree/sys/kern/vfs_lookup.c#n351
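
The condition that trace/deny_dotdot:1 tests for can be sketched as a component check. In this minimal Python sketch (function name hypothetical), a path is flagged only if one of its components is exactly "..", so names such as "..." or "a..b" are not affected; the errno Syd returns is not modelled.

```python
def has_dotdot(path: str) -> bool:
    """True if any component of path is exactly ".."."""
    return ".." in path.split("/")
```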

Enhanced Symbolic Link Validation

As of version 3.13.0, Syd enhances security by enforcing stricter validation on symbolic links within /proc/pid/fd, /proc/pid/cwd, /proc/pid/exe, and /proc/pid/root, addressing potential misuse in container escape scenarios. Specifically, Syd returns an EACCES ("Permission denied") errno(3) for attempts to resolve these symbolic links if they do not pertain to the current process, akin to implementing RESOLVE_NO_MAGICLINKS behavior of the openat2(2) system call. This measure effectively hardens the sandbox against attacks exploiting these links to access resources outside the intended confinement, bolstering the isolation provided by Syd and mitigating common vectors for privilege escalation and sandbox escape. As of version 3.14.5, Syd keeps intercepting path system calls even if sandboxing is off making this protection unconditional.
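
The decision can be modelled as a pattern match on the four magic link kinds. This Python sketch is a simplification: the function name is hypothetical, paths are assumed to be already normalized, links under /proc/<pid>/task/<tid>/ are not modelled, and the caller's process ID (thread group ID) is passed in explicitly.

```python
import re

# The magic links named above, under a numeric PID or self/thread-self.
MAGIC_LINK = re.compile(r"/proc/(\d+|self|thread-self)/(fd/\d+|cwd|exe|root)")


def magiclink_allowed(path: str, caller_pid: int) -> bool:
    """Allow a magic link only if it belongs to the calling process."""
    m = MAGIC_LINK.fullmatch(path)
    if m is None:
        return True  # not one of the guarded links
    who = m.group(1)
    return who in ("self", "thread-self") or int(who) == caller_pid
```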

Trusted Symbolic Links

As of version 3.37.2, Syd implements a robust symbolic-link hardening mechanism that intercepts every symlink(7) resolution within untrusted directories -- those marked world-writable, group-writable, or carrying the sticky bit -- and denies any follow operation, returning EACCES ("Permission denied"); this behavior mirrors GrSecurity's CONFIG_GRKERNSEC_LINK and guarantees that symlink chains in shared or temporary locations cannot be weaponized for TOCTOU or link-trick exploits. Under the default policy, neither direct nor nested symlinks in untrusted paths will be traversed, and the check is applied at the seccomp(2) interception layer prior to any mutable state changes, ensuring early, fail-closed enforcement. Administrators may relax this restriction at startup or runtime by enabling the trace/allow_unsafe_symlinks:1 option, which restores legacy symlink behavior for compatibility at the cost of re-exposing potential link-based race vulnerabilities. Refer to the following links for more information:

https://wiki.gentoo.org/wiki/Hardened/Grsecurity2_Quickstart
https://en.wikibooks.org/wiki/Grsecurity/Appendix/Grsecurity_and_PaX_Configuration_Options#Linking_restrictions
https://xorl.wordpress.com/2010/11/11/grkernsec_link-linking-restrictions/
https://man7.org/linux/man-pages/man5/proc_sys_fs.5.html
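
The trust test on a directory is a check of three mode bits. This Python sketch shows it for the parent of a single link; the helper names are hypothetical, and unlike Syd it does not apply the check to every symlink encountered during path resolution.

```python
import os
import stat


def is_untrusted_dir(mode: int) -> bool:
    """World-writable, group-writable or sticky directories are untrusted."""
    return bool(mode & (stat.S_IWOTH | stat.S_IWGRP | stat.S_ISVTX))


def may_follow(link_path: str) -> bool:
    """Refuse to follow a symlink whose parent directory is untrusted."""
    parent = os.path.dirname(os.path.abspath(link_path))
    return not is_untrusted_dir(os.stat(parent).st_mode)
```

A typical /tmp (mode 1777) is untrusted under this rule, so a symlink planted there by another user is never followed.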

Trusted Hardlinks

As of version 3.37.4, Syd introduces a comprehensive Trusted Hardlinks policy to mitigate a class of vulnerabilities stemming from unsafe hardlink creation, particularly those enabling time-of-check-to-time-of-use (TOCTOU) exploitation and privilege escalation in shared filesystem environments. This mitigation enforces strict constraints on which files may be linked, based on their visibility, mutability, and privilege-related attributes. A file is permitted as a hardlink target only if it is accessible for both reading and writing by the caller, ensuring that immutable or opaque targets cannot be leveraged in multi-stage attack chains. Furthermore, the file must be a regular file and must not possess privilege-escalation enablers such as the set-user-ID bit or a combination of set-group-ID and group-executable permissions. These checks are performed preemptively and unconditionally during syscall handling to eliminate reliance on ambient filesystem state and to maintain integrity under adversarial conditions. Administrators may relax this policy for compatibility purposes using the trace/allow_unsafe_hardlinks:1 option, though doing so reintroduces well-documented attack surfaces and undermines the guarantees provided by Syd's secure execution model. Refer to the following links for more information:

https://wiki.gentoo.org/wiki/Hardened/Grsecurity2_Quickstart
https://en.wikibooks.org/wiki/Grsecurity/Appendix/Grsecurity_and_PaX_Configuration_Options#Linking_restrictions
https://xorl.wordpress.com/2010/11/11/grkernsec_link-linking-restrictions/
https://man7.org/linux/man-pages/man5/proc_sys_fs.5.html
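The hardlink source rules above can be summarized as a single predicate. The sketch below is a hypothetical model, not Syd's code: hardlink_target_allowed and hardlink_allowed_for are illustrative names, and os.access (which checks the real UID) is only an approximation of how the caller's read and write access might be determined.

```python
import os
import stat


def hardlink_target_allowed(st_mode: int, can_read: bool, can_write: bool,
                            allow_unsafe_hardlinks: bool = False) -> bool:
    """Hypothetical model of the Trusted Hardlinks rules for a link(2) source."""
    if allow_unsafe_hardlinks:  # models trace/allow_unsafe_hardlinks:1
        return True
    if not stat.S_ISREG(st_mode):  # only regular files may be linked
        return False
    if st_mode & stat.S_ISUID:  # set-user-ID files are refused
        return False
    if (st_mode & stat.S_ISGID) and (st_mode & stat.S_IXGRP):  # setgid + group-exec
        return False
    return can_read and can_write  # caller must be able to both read and write it


def hardlink_allowed_for(path: str) -> bool:
    """Evaluate the rules for an existing path; os.access uses the real UID (approximation)."""
    st = os.lstat(path)
    return hardlink_target_allowed(st.st_mode,
                                   os.access(path, os.R_OK),
                                   os.access(path, os.W_OK))
```

Note that a set-group-ID file without group-execute permission passes the privilege check, matching the "combination of set-group-ID and group-executable" wording above.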

Trusted File Creation

As of version 3.37.4, Syd enforces a strict Trusted File Creation policy designed to mitigate longstanding race-condition vulnerabilities associated with unprivileged use of O_CREAT in shared or adversarial environments. Building upon the Linux kernel's protected_fifos and protected_regular sysctls -- as well as the stricter semantics of grsecurity's CONFIG_GRKERNSEC_FIFO -- this mitigation blocks all O_CREAT operations targeting pre-existing FIFOs or regular files unless the calling process is the file's owner and the file is neither group-writable nor world-writable, irrespective of the parent directory's ownership or permissions. Unlike upstream Linux, which allows certain accesses if the file resides in a directory owned by the caller, Syd eliminates this dependency to close subtle privilege boundary gaps and ensure consistent, capability-centric enforcement even in nested namespace or idmapped mount scenarios. This policy guarantees that users cannot preempt or hijack file-based IPC or partial writes via shared directories, while maintaining usability through precise capability trimming. For compatibility with legacy workloads or permissive setups, this restriction may be selectively disabled by setting the trace/allow_unsafe_create:1 option, though doing so reintroduces exposure to well-documented filesystem race attacks.
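The O_CREAT rule for pre-existing files can be expressed as a small predicate. This is a hypothetical sketch: ocreat_on_existing_allowed is an illustrative name, and caller_uid stands for the caller's identity; which UID Syd actually compares (real, effective, or filesystem) is not stated here.

```python
import stat


def ocreat_on_existing_allowed(st_mode: int, st_uid: int, caller_uid: int,
                               allow_unsafe_create: bool = False) -> bool:
    """Hypothetical model: may an O_CREAT open proceed on this pre-existing file?"""
    if allow_unsafe_create:  # models trace/allow_unsafe_create:1
        return True
    if not (stat.S_ISFIFO(st_mode) or stat.S_ISREG(st_mode)):
        return True  # the policy only covers FIFOs and regular files
    if st_uid != caller_uid:  # the caller must own the file ...
        return False
    # ... and the file must be neither group- nor world-writable.
    # Unlike protected_fifos/protected_regular, the parent directory is not consulted.
    return not (st_mode & (stat.S_IWGRP | stat.S_IWOTH))
```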

As of version 3.45.0, Syd extends this policy to deny file creation through dangling symbolic links as part of its filesystem race hardening. At the open(2) boundary, the presence of O_CREAT implicitly adds O_NOFOLLOW unless O_EXCL is also specified, so attempts to create or truncate a path whose final component is a symlink will fail rather than resolving the link target. This behaviour directly addresses classes of vulnerabilities where privileged components are tricked into creating or modifying files behind attacker-controlled symlinks, such as CVE-2021-28153 in GLib (file creation via dangling symlink replacement) and repeated symlink- or mount-race attacks in container runtimes: CVE-2018-15664 (docker cp path traversal via symlink and mount races), CVE-2019-16884 (runc bind-mount escape through user-controlled symlinked host paths), CVE-2021-30465 (runc container escape via crafted /proc and mount races), CVE-2025-31133 (runc maskedPath abuse to obtain writable procfs bindings), CVE-2025-52565 (runc /dev/console bind-mount symlink races leading to writable procfs targets), and CVE-2025-52881 (runc redirected writes bypassing LSM enforcement to arbitrary procfs files). By enforcing fail-closed semantics for all O_CREAT operations that encounter symlinks, Syd reduces the attack surface for these patterns even when higher-level code assumes symbolic links cannot influence file creation. Refer to the following links for more information:

https://wiki.gentoo.org/wiki/Hardened/Grsecurity2_Quickstart
https://en.wikibooks.org/wiki/Grsecurity/Appendix/Grsecurity_and_PaX_Configuration_Options#FIFO_restrictions
https://xorl.wordpress.com/2010/11/24/grkernsec_fifo-named-pipe-restrictions/
https://man7.org/linux/man-pages/man5/proc_sys_fs.5.html
https://nvd.nist.gov/vuln/detail/CVE-2021-28153
https://github.com/advisories/GHSA-9hh6-p5c5-mmmf
https://nvd.nist.gov/vuln/detail/CVE-2018-15664
https://nvd.nist.gov/vuln/detail/CVE-2019-16884
https://nvd.nist.gov/vuln/detail/CVE-2021-30465
https://nvd.nist.gov/vuln/detail/CVE-2025-31133
https://nvd.nist.gov/vuln/detail/CVE-2025-52565
https://nvd.nist.gov/vuln/detail/CVE-2025-52881
https://www.openwall.com/lists/oss-security/2025/11/05/3
https://github.com/opencontainers/runc/security
https://www.starlab.io/blog/linux-symbolic-links-convenient-useful-and-a-whole-lot-of-trouble
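The flag adjustment at the open(2) boundary can be sketched in Python. harden_open_flags is a hypothetical name; the demo only shows how the kernel reacts to the adjusted flags when the final component is a dangling symlink, not Syd's interception path.

```python
import errno
import os


def harden_open_flags(flags: int) -> int:
    """Add O_NOFOLLOW to O_CREAT opens unless O_EXCL is present.

    O_CREAT|O_EXCL already refuses to follow a symlink in the final
    component (open(2) fails with EEXIST), so it needs no adjustment.
    """
    if flags & os.O_CREAT and not flags & os.O_EXCL:
        flags |= os.O_NOFOLLOW
    return flags


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        # A dangling symlink an attacker could plant in a shared directory.
        target = os.path.join(d, "attacker-chosen-target")
        link = os.path.join(d, "victim")
        os.symlink(target, link)
        try:
            fd = os.open(link, harden_open_flags(os.O_WRONLY | os.O_CREAT), 0o600)
        except OSError as exc:
            print("refused:", errno.errorcode[exc.errno])  # ELOOP on Linux
        else:
            os.close(fd)
        print("target created:", os.path.exists(target))
```

Excluding O_EXCL is a design choice rather than a gap: exclusive creation already fails on any existing path, symlinks included.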

Memory-Deny-Write-Execute Protections

Syd version 3.14.1 enhances its security framework by implementing Memory-Deny-Write-Execute (MDWE) protections, aligning with the PR_SET_MDWE and PR_MDWE_REFUSE_EXEC_GAIN functionality introduced in Linux kernel 6.3. This feature establishes a strict policy against creating memory mappings that are simultaneously writable and executable, closely following the executable space protection mechanisms inspired by the PaX project. In addition, Syd fortifies these MDWE protections with kernel-level seccomp filters on critical system calls, including mmap(2), mmap2(2), mprotect(2), pkey_mprotect(2), and shmat(2). These filters intercept and restrict operations that could contravene the MDWE policy, such as attempts to make non-executable memory mappings executable or to map shared memory segments with executable permissions. By combining PR_SET_MDWE for preemptive kernel enforcement with seccomp filters for granular, kernel-level control over system call execution, Syd provides a robust defence against techniques that exploit memory vulnerabilities. This restriction may be relaxed using the trace/allow_unsafe_exec_memory:1 sandbox command at startup. Even with this restriction relaxed, Syd still calls PR_SET_MDWE, but it uses the PR_MDWE_NO_INHERIT flag to prevent propagation of the MDWE protection to child processes on fork(2).
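The two rules behind PR_MDWE_REFUSE_EXEC_GAIN (no writable+executable mapping, and no mapping that gains execute permission later) can be sketched as a predicate. This is an illustrative model with a hypothetical name, mdwe_allows; old_prot is None for a fresh mapping and the previous protection for mprotect(2). Syd's additional seccomp rules are not modeled.

```python
from typing import Optional

# Protection bits as defined by Linux (asm-generic/mman-common.h).
PROT_READ = 0x1
PROT_WRITE = 0x2
PROT_EXEC = 0x4


def mdwe_allows(new_prot: int, old_prot: Optional[int] = None) -> bool:
    """Hypothetical model of MDWE with PR_MDWE_REFUSE_EXEC_GAIN."""
    # Rule 1: a mapping may never be writable and executable at once.
    if new_prot & PROT_WRITE and new_prot & PROT_EXEC:
        return False
    # Rule 2 (mprotect): a non-executable mapping may not become executable.
    if old_prot is not None and new_prot & PROT_EXEC and not old_prot & PROT_EXEC:
        return False
    return True
```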

As of version 3.25.0, Syd kills the process on violations of these memory protections rather than denying the system calls with EACCES ("Permission denied"). This ensures the system administrator is notified via dmesg(1) and can react promptly to investigate potentially malicious activity. In addition, repeated failures trigger SegvGuard.

As of version 3.37.0, Syd closes a gap in the Linux kernel's Memory-Deny-Write-Execute (MDWE) implementation by checking whether a file descriptor is writable before allowing it to be mapped executable. This mitigates Linux kernel bug 219227, a W^X enforcement bypass that exploits the disconnect between file-backed memory mappings and their underlying file descriptors. The bypass works by mapping a region with PROT_READ|PROT_EXEC from a file descriptor that retains write access: the executable memory can then be modified after mapping through standard file I/O operations, turning a nominally read-only executable mapping into mutable code and violating the W^X invariant. By validating writability before permitting any file-backed executable mapping, Syd ensures that executable mappings are never backed by a descriptor through which their contents could be rewritten, which would otherwise allow code injection through seemingly benign file operations. The check operates at the syscall interception layer. This restriction may be relaxed using the trace/allow_unsafe_exec_memory:1 sandbox command at startup.
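A minimal sketch of the writability check, assuming it inspects only the access mode of the descriptor passed to mmap(2) as reported by fcntl(2) F_GETFL; whether Syd also accounts for other descriptors or paths to the same file is not described here. exec_mapping_allowed is a hypothetical name.

```python
import fcntl
import os

PROT_EXEC = 0x4  # Linux value
O_ACCMODE = 0o3  # mask selecting O_RDONLY / O_WRONLY / O_RDWR from the open flags


def exec_mapping_allowed(fd: int, prot: int) -> bool:
    """Refuse PROT_EXEC mappings backed by a descriptor opened for writing."""
    if not prot & PROT_EXEC:
        return True  # non-executable mappings are not affected by this check
    accmode = fcntl.fcntl(fd, fcntl.F_GETFL) & O_ACCMODE
    return accmode == os.O_RDONLY
```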

Advanced Memory Protection Mechanisms

Syd version 3.15.1 enhances its security framework by integrating a seccomp BPF hook that blocks executable+shared memory mappings, targeting a critical exploitation pathway. As of version 3.21.3, Syd also blocks executable+anonymous memory. These updates strengthen the sandbox's defence against unauthorised memory access and arbitrary code execution by filtering system calls, notably mmap(2) and mmap2(2), to refuse dangerous combinations of memory mapping flags. While this measure significantly reduces the attack surface for exploits such as buffer overflows and code injection, some legitimate use cases, such as Just-In-Time (JIT) compilation and plugin architectures, may require exceptions. For these, Syd allows the restrictions to be relaxed explicitly with the trace/allow_unsafe_exec_memory:1 command, so users can balance security and functionality according to their requirements while relaxed security settings are kept from propagating to child processes.
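The flag combinations refused above can be modeled as follows. This is a hypothetical sketch (exec_mmap_allowed is an illustrative name); the MAP_* values are the Linux x86-64 ones, and MAP_ANONYMOUS differs on a few architectures.

```python
PROT_READ = 0x1
PROT_WRITE = 0x2
PROT_EXEC = 0x4
MAP_SHARED = 0x01      # also set in MAP_SHARED_VALIDATE (0x03)
MAP_PRIVATE = 0x02
MAP_ANONYMOUS = 0x20   # x86-64 value


def exec_mmap_allowed(prot: int, flags: int, allow_unsafe_exec_memory: bool = False) -> bool:
    """Hypothetical model of the executable+shared / executable+anonymous filter."""
    if allow_unsafe_exec_memory or not prot & PROT_EXEC:
        return True
    if flags & MAP_SHARED:     # executable + shared (since 3.15.1)
        return False
    if flags & MAP_ANONYMOUS:  # executable + anonymous (since 3.21.3)
        return False
    return True                # private, file-backed executable mappings remain possible
```

A private, file-backed executable mapping, which is how the dynamic loader maps shared libraries, remains allowed by this filter.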

Null Address Mapping Prevention

As of version 3.15.1, inspired by the practices of HardenedBSD, Syd hardens the sandbox against null pointer dereference vulnerabilities by prohibiting memory mappings at the NULL address. Requests made through the mmap(2) and mmap2(2) system calls with the MAP_FIXED or MAP_FIXED_NOREPLACE flags are blocked by seccomp filter rules when the first argument (addr) is zero, and the respective system call is denied with EACCES ("Permission denied"). Because an attacker can no longer place code or data at the NULL address, null pointer dereferences cannot be turned into arbitrary code execution this way, which significantly reduces the attack surface associated with such vulnerabilities.

Linux already guards against this with vm/mmap_min_addr, so this acts as a second layer of defense. Unlike Syd, Linux allows processes with the CAP_SYS_RAWIO capability to edit or override this value. As of version 3.37.0, Syd caps this value at page size, like OpenBSD does, for added hardening against such edits.

As of version 3.25.0, all addresses lower than the value of vm/mmap_min_addr at Syd startup are included in the seccomp filter, and the action of the filter is set to kill the process rather than deny with EACCES. This ensures the system administrator is notified via dmesg(1) and can react promptly to investigate potentially malicious activity. In addition, repeated failures trigger SegvGuard.
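The filter condition can be sketched as a small classifier. This is a hypothetical model: low_fixed_mapping_action and read_mmap_min_addr are illustrative names, the threshold is the vm/mmap_min_addr value read once at startup, and how Syd combines it with its page-size floor is not modeled. Without MAP_FIXED or MAP_FIXED_NOREPLACE, an address of zero is only a hint and the kernel chooses the address itself.

```python
MAP_FIXED = 0x10
MAP_FIXED_NOREPLACE = 0x100000


def read_mmap_min_addr(path: str = "/proc/sys/vm/mmap_min_addr") -> int:
    """Read the sysctl once, as at startup."""
    with open(path) as f:
        return int(f.read())


def low_fixed_mapping_action(addr: int, flags: int, mmap_min_addr: int) -> str:
    """Return "kill" for fixed mappings below the threshold, else "allow"."""
    if flags & (MAP_FIXED | MAP_FIXED_NOREPLACE) and addr < mmap_min_addr:
        return "kill"
    return "allow"


if __name__ == "__main__":
    print(low_fixed_mapping_action(0, MAP_FIXED, read_mmap_min_addr()))
```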

Default Memory Allocator Security Enhancement

As of version 3.46.0, Syd has transitioned to using the GrapheneOS allocator as its default memory allocator. This new allocator leverages modern hardware capabilities to provide substantial defenses against common vulnerabilities like heap memory corruption, while reducing the lifetime of sensitive data in memory. While the previously used mimalloc with the secure option offered notable security improvements, the GrapheneOS allocator goes further with features like out-of-line metadata protection, fine-grained randomization, and aggressive consistency checks. It incorporates advanced techniques such as hardware memory tagging for probabilistic detection of use-after-free errors, zero-on-free with write-after-free detection, and randomized quarantines to mitigate use-after-free vulnerabilities. The allocator is designed to prevent traditional exploitation methods by introducing high entropy, random base allocations across multiple memory regions, and offers a portable solution being adopted by other security-focused operating systems like Secureblue. It also heavily influenced the next-generation musl malloc implementation, improving security with minimal memory usage. Refer to the following links for more information:

https://grapheneos.org/features#exploit-mitigations
https://github.com/GrapheneOS/hardened_malloc

Enhanced Security for Memory File Descriptors

In version 3.21.1, Syd significantly enhanced its security posture by introducing restrictions on memory file descriptors (memfds). The memfd_create(2) system call is now sandboxed under Create sandboxing, with the name argument prepended with !memfd: before access checks. This allows administrators to globally deny access to memfds using rules like deny/create+!memfd:*. Additionally, the memfd_secret(2) system call, which requires the secretmem.enable=1 boot option and is seldom used, was denied to prevent potential exploits. Despite file I/O being restricted on secret memfds, they could be abused by attackers to write payloads and map them as executable, thus bypassing denylisted code execution controls.

Building on these changes, version 3.21.2 further fortifies security by making memfds non-executable by default. This is achieved by removing the MFD_EXEC flag and adding the MFD_NOEXEC_SEAL flag to memfd_create(2), ensuring memfds cannot be made executable. Notably, the MFD_NOEXEC_SEAL flag requires Linux-6.3 or newer to function. These measures collectively mitigate the risk of memfd abuse, which can involve executing malicious code within a sandbox, circumventing security mechanisms like Exec, Force, and TPE sandboxing. For scenarios where executable or secret memfds are genuinely required, the trace/allow_unsafe_memfd:1 option allows for relaxing these restrictions, though it introduces increased security risks. By default, these enhancements enforce a robust security posture, preventing attackers from leveraging memfds as a vector for unauthorized code execution.
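The two memfd measures above, name prefixing for access checks and flag rewriting, can be sketched together. The function names are hypothetical, the MFD_* values come from <linux/memfd.h>, and fnmatch only approximates glob(3p) matching (for example, its * also matches /).

```python
import fnmatch

MFD_CLOEXEC = 0x0001
MFD_ALLOW_SEALING = 0x0002
MFD_NOEXEC_SEAL = 0x0008
MFD_EXEC = 0x0010


def memfd_access_name(name: str) -> str:
    """Name used for Create sandboxing access checks."""
    return "!memfd:" + name


def memfd_create_denied(name: str, deny_patterns: list) -> bool:
    """True if any deny pattern (e.g. from deny/create+!memfd:*) matches."""
    path = memfd_access_name(name)
    return any(fnmatch.fnmatchcase(path, pat) for pat in deny_patterns)


def harden_memfd_flags(flags: int, allow_unsafe_memfd: bool = False) -> int:
    """Drop MFD_EXEC and add MFD_NOEXEC_SEAL unless trace/allow_unsafe_memfd:1."""
    if allow_unsafe_memfd:
        return flags
    return (flags & ~MFD_EXEC) | MFD_NOEXEC_SEAL
```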

Path Masking

Introduced in version 3.16.7, the Path Masking feature in Syd enhances security by enabling the obfuscation of file contents without denying access to the file itself. This functionality is critical in scenarios where compatibility requires file presence, but not file readability. Path Masking works by redirecting any attempt to open(2) a specified file to the character device /dev/null, effectively presenting an empty file to the sandboxed process. The original file metadata remains unchanged, which is essential for applications that perform operations based on this data. Moreover, masked files can still be executed, providing a seamless integration where executability is required but content confidentiality must be preserved.

This feature leverages glob(3p) patterns to specify which files to mask, allowing for flexible configuration tailored to diverse security needs. By default, Syd masks sensitive paths such as /proc/cmdline to prevent the leakage of potentially sensitive boot parameters, aligning with Syd's security-first design philosophy. Path Masking is a robust security enhancement that minimises the risk of sensitive data exposure while maintaining necessary system functionality and compliance with expected application behaviors.
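The redirection step of Path Masking can be sketched as a lookup. open_target is a hypothetical name, and fnmatch stands in for glob(3p) (its * also crosses /). Only the open(2) target is modeled; metadata and execution handling are not.

```python
import fnmatch

MASK_TARGET = "/dev/null"


def open_target(path: str, mask_patterns: list) -> str:
    """Return the path an open(2) of `path` would actually use."""
    for pat in mask_patterns:
        if fnmatch.fnmatchcase(path, pat):
            return MASK_TARGET  # reads from here return end-of-file immediately
    return path
```

Reading the redirected target yields no data, which is how the sandboxed process ends up seeing an empty file.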

Refined Socket System Call Enforcement

In Syd version 3.16.12, we have strengthened the enforcement of socket system call restrictions within the sandbox using kernel-level BPF filters. This enhancement builds upon existing features by embedding these controls directly into the Syd process, ensuring that even if Syd is compromised, it cannot utilise or manipulate denied socket domains. This proactive measure restricts socket creation strictly to permitted domains such as UNIX (AF_UNIX), IPv4 (AF_INET), and IPv6 (AF_INET6), significantly reducing the network attack surface. The trace/allow_unsupp_socket:1 option allows for the extension of permissible socket domains, catering to specific needs but potentially increasing exposure risks. Additionally, trace/allow_safe_kcapi:1 enables access to the Kernel Crypto API, facilitating necessary cryptographic operations directly at the kernel level. These enhancements provide a more secure and configurable environment, allowing administrators precise control over network interactions and improving the overall security posture of the sandbox.
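The domain check can be sketched as follows, using Linux address family numbers. This is a hypothetical model: the default set contains only the three domains named above (Syd's real list may contain more), and it assumes that trace/allow_unsupp_socket:1 does not by itself unlock AF_ALG, the socket family of the Kernel Crypto API.

```python
AF_UNIX = 1
AF_INET = 2
AF_INET6 = 10
AF_ALG = 38  # Linux Kernel Crypto API sockets

DEFAULT_DOMAINS = frozenset({AF_UNIX, AF_INET, AF_INET6})


def socket_domain_allowed(domain: int, allow_unsupp_socket: bool = False,
                          allow_safe_kcapi: bool = False) -> bool:
    """Hypothetical model of the socket(2) domain filter."""
    if domain in DEFAULT_DOMAINS:
        return True
    if domain == AF_ALG:
        return allow_safe_kcapi  # trace/allow_safe_kcapi:1
    return allow_unsupp_socket   # trace/allow_unsupp_socket:1
```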

Enhanced Execution Control (EEC)

The Enhanced Execution Control (EEC) feature, introduced in Syd version 3.17.0, represents a significant advancement in the sandbox's defence mechanisms. This feature strategically disables the execve(2) and execveat(2) system calls for the Syd process after they are no longer required for executing the sandbox process, thus safeguarding against their potential abuse by a compromised Syd process. The prohibition of these critical system calls adds a robust layer to the existing Memory-Deny-Write-Execute (MDWE) protections, intensifying the system's defences against exploit techniques such as code injection or return-oriented programming (ROP). Concurrently, EEC ensures that the ptrace(2) syscall is limited following the initial use of the PTRACE_SEIZE call for execution-related mitigations. This action effectively prevents subsequent system trace operations, barring unauthorised process attachments and further securing the system against manipulation. Together, these measures enhance Syd's security architecture, reflecting an ongoing commitment to implement rigorous, state-of-the-art safeguards within the execution environment.

As of version 3.17.1, the Enhanced Execution Control (EEC) has been further strengthened by integrating mprotect(2) hardening mechanisms specifically targeting the prevention of the ret2mprotect exploitation technique. This enhancement blocks attempts to alter memory protections to executable (using the PROT_EXEC flag) via the mprotect(2) and pkey_mprotect(2) system calls. By adding these checks, EEC mitigates the risk associated with compromised Syd processes by enforcing stringent memory operation policies that prevent unauthorised memory from becoming executable, thereby countering sophisticated memory corruption attacks such as return-oriented programming (ROP) and other code injection strategies. This proactive security measure is crucial for maintaining the integrity of the sandbox environment, ensuring that Syd continues to offer robust protection against evolving exploit techniques.

As of version 3.23.9, the Enhanced Execution Control (EEC) feature has been expanded to mitigate Sigreturn Oriented Programming (SROP) attacks by denying access to the system calls sigreturn(2) and rt_sigreturn(2) for syd(1), syd-oci(1), and syd-tor(1). Given the lack of signal handlers, these system calls have no legitimate use. By preventing these calls, the system is better protected against SROP attacks, which involve manipulating signal handler frames to control program state, thus significantly enhancing the security of the execution environment. For further reading, refer to section 2.4.4 Sigreturn-oriented programming in the Low-Level Software Security book (URL: https://llsoftsec.github.io/llsoftsecbook/#sigreturn-oriented-programming ). SROP (Bosman and Bos 2014) is a special case of ROP where the attacker creates a fake signal handler frame and calls sigreturn(2), a system call on many UNIX-type systems normally called upon return from a signal handler, which restores the state of the process based on the state saved on the signal handler's stack by the kernel previously. The ability to fake a signal handler frame and call sigreturn gives an attacker a simple way to control the state of the program.
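The EEC decisions described in this section can be summarized as a lookup. Real enforcement is a seccomp BPF filter installed in the Syd process itself; the sketch below, with the hypothetical name eec_allows, models only the outcome once the sandbox process is running, and does not model the ptrace(2) restriction.

```python
PROT_EXEC = 0x4

# System calls the Syd process gives up once they are no longer required.
DENIED_AFTER_STARTUP = frozenset({"execve", "execveat", "sigreturn", "rt_sigreturn"})
PROTECTION_CHANGERS = frozenset({"mprotect", "pkey_mprotect"})


def eec_allows(syscall: str, prot: int = 0) -> bool:
    """Hypothetical model of EEC decisions for the Syd process."""
    if syscall in DENIED_AFTER_STARTUP:
        return False
    if syscall in PROTECTION_CHANGERS and prot & PROT_EXEC:  # ret2mprotect
        return False
    return True
```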

Enhanced execve and execveat Syscall Validation

As of version 3.24.2, security enhancements to execve(2) and execveat(2) syscalls have been introduced to thwart simple Return-Oriented Programming (ROP) attacks. Per the Linux execve(2) manpage: "On Linux, argv and envp can be specified as NULL. In both cases, this has the same effect as specifying the argument as a pointer to a list containing a single null pointer. Do not take advantage of this nonstandard and nonportable misfeature! On many other UNIX systems, specifying argv as NULL will result in an error (EFAULT: "Bad address"). Some other UNIX systems treat the envp==NULL case the same as Linux." Based on this guidance, Syd now rejects execve(2) and execveat(2) with EFAULT when one of the pathname, argv and envp arguments is NULL. This mitigation targets basic ROP chains where NULL pointers are used as placeholders to bypass argument validation checks, a common tactic in exploiting buffer overflow vulnerabilities. For example, a typical ROP chain trying to execute execve(2) with argv and envp set to NULL would be intercepted and denied under these rules:

0x0000:         0x40ee2b pop rdx; ret
0x0008:              0x0 [arg2] rdx = 0
0x0010:         0x402885 pop rsi; ret
0x0018:              0x0 [arg1] rsi = 0
0x0020:         0x4013cc pop rdi; ret
0x0028:         0x460000 [arg0] rdi = 4587520
0x0030:         0x438780 execve

An attacker might circumvent this mitigation by ensuring that none of the critical syscall arguments are NULL. This requires a more sophisticated setup in the ROP chain, potentially increasing the complexity of the exploit and reducing the number of vulnerable targets. This focused security measure enhances system resilience against simple ROP exploits while maintaining compliance with POSIX standards, promoting robustness and cross-platform security.

As of version 3.25.0, Syd terminates the process upon entering these system calls with NULL arguments rather than denying them with EFAULT. This ensures the system administrator is notified via the kernel audit log, i.e. dmesg(1), about potentially malicious activity. In addition, repeated failures trigger SegvGuard.
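The NULL-argument rule can be modeled as follows, with None standing for a NULL pointer and an empty list for a valid pointer to an empty vector. In the gadget chain above, rsi = 0 and rdx = 0 correspond to argv and envp being NULL. execve_null_action is a hypothetical name. The trace/allow_unsafe_exec_null:1 option (described in the next subsection) is assumed here to relax only argv and envp, not pathname.

```python
def execve_null_action(pathname, argv, envp, allow_unsafe_exec_null=False) -> str:
    """Hypothetical model: "kill" if a checked execve(2) argument is NULL."""
    if pathname is None:
        return "kill"
    if argv is None or envp is None:
        # Assumption: the opt-out only restores legacy NULL argv/envp behavior.
        return "allow" if allow_unsafe_exec_null else "kill"
    return "allow"
```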

We have verified the same issue is also present on HardenedBSD and notified upstream:

Issue: https://git.hardenedbsd.org/hardenedbsd/HardenedBSD/-/issues/106
Fix: https://git.hardenedbsd.org/hardenedbsd/HardenedBSD/-/commit/cd93be7afbcfd134b45b52961fc9c6907984c85f

Securebits and Kernel-Assisted Executability

As of version 3.41.0, Syd initializes the per-thread securebits in a kernel-cooperative manner. Linux 6.14 and newer provide an executability-check interface (execveat(2) with AT_EXECVE_CHECK) and the corresponding interpreter self-restriction securebits. On such kernels, Syd first attempts to install a comprehensive securebits configuration (with locks) that hardens capability semantics and execution constraints. If the kernel refuses the change for lack of privilege (e.g., CAP_SETPCAP is not present) and returns EPERM ("Operation not permitted"), Syd deterministically degrades to the unprivileged, interpreter-facing policy only: it enables and locks the file-descriptor-based executability check and prohibits interactive snippet execution unless the same kernel probe passes. On older kernels, the secure-exec policy setup is treated as a no-op and startup proceeds without altering executability behavior. This initialization is inherited across forks and execs (subject to the kernel rule that the keep-capabilities base flag is cleared on exec) and is orthogonal to the no_new_privs attribute. It is designed to be monotonic and predictable under mixed-privilege and mixed-kernel deployments: unsupported features are ignored, permission failures do not abort startup, and the resulting state is the strongest policy the kernel will accept. Users may opt out of these defaults per deployment by setting trace/allow_unsafe_exec_script:1 to skip the script/file vetting policy, trace/allow_unsafe_exec_interactive:1 to allow interactive interpreter inputs again, trace/allow_unsafe_exec_null:1 to permit legacy exec with NULL argv/envp as described in the previous subsection, or trace/allow_unsafe_cap_fixup:1 to preserve traditional UID/capability-fixup semantics. Refer to the following links for more information:

https://docs.kernel.org/userspace-api/check_exec.html
https://man7.org/linux/man-pages/man2/execveat.2.html
https://man7.org/linux/man-pages/man7/capabilities.7.html
https://man7.org/linux/man-pages/man2/prctl.2.html
https://man7.org/linux/man-pages/man2/pr_set_securebits.2const.html
https://www.man7.org/linux/man-pages/man2/PR_SET_KEEPCAPS.2const.html
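The fallback logic above can be sketched in Python. set_securebits is a hypothetical callback standing in for prctl(2) PR_SET_SECUREBITS, so the logic can be exercised without changing the calling process. The bit values follow <linux/securebits.h>. The composition of the FULL mask is an assumption: this section only implies the UID-fixup bits through trace/allow_unsafe_cap_fixup, and Syd may install more.

```python
import errno
from typing import Callable

# Bit positions from <linux/securebits.h>; each *_LOCKED bit follows its base bit.
SECBIT_NO_SETUID_FIXUP = 1 << 2
SECBIT_NO_SETUID_FIXUP_LOCKED = 1 << 3
SECBIT_EXEC_RESTRICT_FILE = 1 << 8
SECBIT_EXEC_RESTRICT_FILE_LOCKED = 1 << 9
SECBIT_EXEC_DENY_INTERACTIVE = 1 << 10
SECBIT_EXEC_DENY_INTERACTIVE_LOCKED = 1 << 11

# Unprivileged, interpreter-facing policy: enable and lock both exec bits.
EXEC_ONLY = (SECBIT_EXEC_RESTRICT_FILE | SECBIT_EXEC_RESTRICT_FILE_LOCKED
             | SECBIT_EXEC_DENY_INTERACTIVE | SECBIT_EXEC_DENY_INTERACTIVE_LOCKED)
# ASSUMPTION: stand-in for the "comprehensive" configuration.
FULL = EXEC_ONLY | SECBIT_NO_SETUID_FIXUP | SECBIT_NO_SETUID_FIXUP_LOCKED


def init_securebits(kernel_has_exec_check: bool,
                    set_securebits: Callable[[int], int]) -> int:
    """Return the securebits installed; set_securebits returns 0 or an errno."""
    if not kernel_has_exec_check:  # pre-6.14 kernels: no-op
        return 0
    err = set_securebits(FULL)
    if err == 0:
        return FULL
    if err == errno.EPERM and set_securebits(EXEC_ONLY) == 0:  # e.g. no CAP_SETPCAP
        return EXEC_ONLY
    return 0  # failures never abort startup
```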

Enhanced Path Integrity Measures

As of version 3.17.4, Syd incorporates crucial enhancements to maintain the integrity of file system paths by systematically denying and masking paths that contain control characters. These modifications are essential for preventing the exploitation of terminal-based vulnerabilities and for maintaining robustness in logging activities. Paths identified with control characters are not only denied during sandbox access check but are also sanitized when logged to ensure that potentially harmful data does not compromise log integrity or facilitate inadvertent security breaches. Such measures underscore Syd's ongoing commitment to fortifying security by adhering to rigorous, up-to-date standards for handling untrusted input efficiently.

As of version 3.18.6, this restriction can be relaxed by using the setting trace/allow_unsafe_filename:1. This setting may be toggled from within the sandbox during runtime prior to locking the sandbox.

As of version 3.28.0, Syd has enhanced its path integrity measures by incorporating an implementation based on David A. Wheeler's Safename Linux Security Module (LSM) patches. This update not only prevents the creation of filenames containing potentially harmful characters but also hides existing files with such names. Invalid filenames are denied with an EILSEQ ("Illegal byte sequence") errno(3) when necessary (EINVAL before version 3.48.0). In alignment with Wheeler's recommendations on restricting dangerous filenames, the validation enforces the following stricter rules:

Control Characters: Filenames containing control characters (bytes 0x00–0x1F and 0x7F) are denied.
UTF-8 Encoding: Filenames must be valid UTF-8 sequences.
Forbidden Characters: The following characters are disallowed in filenames as they may interfere with shell operations or be misinterpreted by programs: *, ?, [, ], ", <, >, |, (, ), &, ', !, \, ;, $, and `.
Leading Characters: Filenames cannot start with a space ( ), dash (-), or tilde (~).
Trailing Characters: Filenames cannot end with a space ( ).

As of version 3.37.9, space checks have been extended to cover UTF-8 whitespace, thanks to an idea by Jacob Bachmeyer; see https://seclists.org/oss-sec/2025/q3/123 for more information.

As of version 3.38.0, the characters :, {, and } have been removed from the forbidden set to improve usability and reduce false positives: the colon is commonly used across /dev and /proc, and braces are used by firefox(1) for filenames under the profile directory.

As of version 3.48.0, the errno(3) returned on denial has been changed from EINVAL ("Invalid argument") to EILSEQ ("Illegal byte sequence") to match ZFS behaviour.

These measures mitigate security risks associated with malicious filenames by ensuring that both new and existing filenames adhere to stringent validation rules. This enhancement strengthens overall system robustness by preventing potential exploitation through untrusted input in file operations. For more information, refer to the following links:

https://dwheeler.com/essays/fixing-unix-linux-filenames.html
https://lwn.net/Articles/686021/
https://lwn.net/Articles/686789/
https://lwn.net/Articles/686792/
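The filename rules listed above can be sketched as a validator for a single path component. This is a minimal sketch, not Syd's implementation: check_filename is a hypothetical name, and Python's str.isspace() is used as an approximation of "UTF-8 whitespace".

```python
import errno

# Characters rejected anywhere in a filename (":", "{" and "}" are allowed since 3.38.0).
FORBIDDEN = frozenset('*?[]"<>|()&\'!\\;$`')


def check_filename(name: bytes) -> int:
    """Return 0 if one path component passes the rules, else errno.EILSEQ."""
    if any(b < 0x20 or b == 0x7F for b in name):  # control characters
        return errno.EILSEQ
    try:
        text = name.decode("utf-8")  # strict decoding: invalid UTF-8 raises
    except UnicodeDecodeError:
        return errno.EILSEQ
    if any(ch in FORBIDDEN for ch in text):
        return errno.EILSEQ
    if text and (text[0] in "-~" or text[0].isspace()):  # leading dash, tilde, whitespace
        return errno.EILSEQ
    if text and text[-1].isspace():  # trailing whitespace
        return errno.EILSEQ
    return 0
```

Interior spaces remain valid under these rules; only leading and trailing whitespace is rejected.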

Device Sidechannel Mitigations

As of Syd version 3.21.0, Syd's device sidechannel mitigations align closely with GRKERNSEC_DEVICE_SIDECHANNEL in Grsecurity, aiming to prevent timing analyses on block or character devices via stat(2) or inotify(7)/fanotify(7). For stat-family system calls, Syd, like Grsecurity, sets the last access and modification times of devices to their creation time, thwarting unprivileged user timing attacks. Instead of dropping events, Syd strips the access and modify fanotify(7)/inotify(7) flags at syscall entry, preventing unsafe fanotify(7)/inotify(7) event generation. This approach ensures unauthorized users cannot determine sensitive information, such as the length of the administrator password. Because the flags are stripped dynamically, Syd protects against these sidechannel attacks without compromising functionality. As of version 3.40.0, these mitigations can be disabled separately for block and character devices using the options trace/allow_unsafe_stat_bdev and trace/allow_unsafe_stat_cdev (stat-family timestamps) and trace/allow_unsafe_notify_bdev and trace/allow_unsafe_notify_cdev (notification flags). Refer to the following links for more information:

https://web.archive.org/web/20130111093624/http://vladz.devzero.fr/013_ptmx-timing.php
https://en.wikibooks.org/wiki/Grsecurity/Appendix/Grsecurity_and_PaX_Configuration_Options#Eliminate_stat/notify-based_device_sidechannels
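The flag stripping at syscall entry can be sketched as a mask transformation. strip_device_events is a hypothetical name; the event bits are the Linux values, which are identical for inotify(7) and fanotify(7) (IN_ACCESS equals FAN_ACCESS, IN_MODIFY equals FAN_MODIFY). The stat timestamp adjustment is not modeled.

```python
import stat

ACCESS = 0x00000001   # IN_ACCESS / FAN_ACCESS
MODIFY = 0x00000002   # IN_MODIFY / FAN_MODIFY
IN_OPEN = 0x00000020  # example of an event that is left untouched


def strip_device_events(mask: int, st_mode: int,
                        allow_bdev: bool = False, allow_cdev: bool = False) -> int:
    """Remove access/modify events for block and character devices."""
    blocked_bdev = stat.S_ISBLK(st_mode) and not allow_bdev  # trace/allow_unsafe_notify_bdev
    blocked_cdev = stat.S_ISCHR(st_mode) and not allow_cdev  # trace/allow_unsafe_notify_cdev
    if blocked_bdev or blocked_cdev:
        return mask & ~(ACCESS | MODIFY)
    return mask
```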

Restricting CPU Emulation System Calls

As of version 3.22.1, Syd denies the modify_ldt(2), subpage_prot(2), switch_endian(2), vm86(2), and vm86old(2) system calls by default, which are associated with CPU emulation functionalities. These calls can only be allowed if the trace/allow_unsafe_cpu option is explicitly set. This restriction helps mitigate potential vulnerabilities and unauthorized access that can arise from modifying CPU state or memory protections, thus strengthening the overall security posture of the sandbox environment.

Kernel Keyring Access Restriction

To enhance system security, access to the kernel's key management facility via the add_key(2), keyctl(2), and request_key(2) system calls is restricted by default as of version 3.22.1. These calls are crucial for managing keys within the kernel, enabling operations such as adding keys, manipulating keyrings, and requesting keys. The restriction aims to prevent unauthorized or potentially harmful modifications to keyrings, ensuring that only safe, controlled access is permitted. However, administrators can relax this restriction by enabling the "trace/allow_unsafe_keyring" option, allowing these system calls to be executed when necessary for legitimate purposes.

Because of this restriction, Syd is not affected by CVE-2024-42318 although we use Landlock. See here for more information: https://www.openwall.com/lists/oss-security/2024/08/17/2

Restricting Memory Protection Keys System Calls

As of version 3.22.1, Syd denies the system calls pkey_alloc(2), pkey_free(2), and pkey_mprotect(2) by default. These system calls are associated with managing memory protection keys, a feature that can be leveraged to control memory access permissions dynamically. To allow these system calls, administrators can enable the trace/allow_unsafe_pkey option. This restriction enhances security by preventing unauthorized or potentially harmful manipulations of memory access permissions within the sandbox environment, ensuring stricter control over memory protection mechanisms.

Restricting vmsplice System Call

As of version 3.23.5, Syd disables the vmsplice(2) system call by default to enhance security. This syscall, identified as a potential vector for memory corruption and privilege escalation, poses significant risks in sandboxed environments. By default, disabling vmsplice(2) reduces the attack surface, aligning with security practices in other systems like Podman. Refer to the following links for more information:

https://lore.kernel.org/linux-mm/X+PoXCizo392PBX7@redhat.com/
https://lwn.net/Articles/268783/

As of version 3.41.3, the vmsplice(2) system call may be permitted at startup using the trace/allow_unsafe_vmsplice:1 option.
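The deny-by-default system calls from the CPU emulation, kernel keyring, memory protection key, and vmsplice(2) subsections, together with the option that lifts each restriction, can be collected in one table. This is a hypothetical illustration (blocked_by_default is an illustrative name): a system call absent from the table may still be restricted by other Syd policies, and pkey_mprotect(2) is also subject to the MDWE seccomp filters.

```python
UNSAFE_SYSCALL_OPTIONS = {
    "modify_ldt": "trace/allow_unsafe_cpu",
    "subpage_prot": "trace/allow_unsafe_cpu",
    "switch_endian": "trace/allow_unsafe_cpu",
    "vm86": "trace/allow_unsafe_cpu",
    "vm86old": "trace/allow_unsafe_cpu",
    "add_key": "trace/allow_unsafe_keyring",
    "keyctl": "trace/allow_unsafe_keyring",
    "request_key": "trace/allow_unsafe_keyring",
    "pkey_alloc": "trace/allow_unsafe_pkey",
    "pkey_free": "trace/allow_unsafe_pkey",
    "pkey_mprotect": "trace/allow_unsafe_pkey",
    "vmsplice": "trace/allow_unsafe_vmsplice",
}


def blocked_by_default(syscall: str, enabled_options=frozenset()) -> bool:
    """True if one of these restrictions still blocks `syscall`."""
    option = UNSAFE_SYSCALL_OPTIONS.get(syscall)
    return option is not None and option not in enabled_options
```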

Enforcing Position-Independent Executables (PIE)

As of version 3.23.9, Syd mandates that all executables must be Position-Independent Executables (PIE) to leverage Address Space Layout Randomization (ASLR). PIE allows executables to be loaded at random memory addresses, significantly enhancing security by making it more difficult for attackers to predict the location of executable code. This randomization thwarts various types of exploits, such as buffer overflow attacks, which rely on predictable memory addresses to execute malicious code. To accommodate scenarios where PIE is not feasible, users can relax this restriction using the trace/allow_unsafe_exec_nopie:1 option. This ensures compatibility while maintaining a robust security posture by default, aligning with Syd's overarching strategy of employing advanced security measures to mitigate potential attack vectors.
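A simple PIE heuristic reads the ELF header's e_type field and checks for ET_DYN. This is a sketch, not Syd's check: is_pie is a hypothetical name, and ET_DYN also covers ordinary shared libraries, so a stricter implementation might inspect the program headers as well.

```python
ET_DYN = 3  # e_type value for position-independent objects


def is_pie(header: bytes) -> bool:
    """Return True if the ELF header declares a position-independent object."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    byteorder = "little" if header[5] == 1 else "big"  # EI_DATA
    e_type = int.from_bytes(header[16:18], byteorder)
    return e_type == ET_DYN


if __name__ == "__main__":
    import sys

    if sys.platform.startswith("linux"):
        with open(sys.executable, "rb") as f:
            print(sys.executable, "is PIE:", is_pie(f.read(64)))
```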

Enforcing Non-Executable Stack

As of version 3.23.16, Syd requires all executables to have a non-executable stack. A non-executable stack helps prevent various types of exploits, such as stack-based buffer overflow attacks, by making it more difficult for attackers to execute malicious code from the stack. This measure complements the enforcement of Position-Independent Executables (PIE) and is a crucial part of Syd's security strategy. To accommodate scenarios where a non-executable stack is not feasible, administrators can relax this restriction using the trace/allow_unsafe_exec_stack:1 option. This preserves compatibility while keeping a robust security posture by default.

As of version 3.23.19, Syd enforces this restriction at the mmap(2) boundary as well, so it is no longer possible to dlopen(3) a library with an executable stack in order to change the stack permissions of the process to executable. This is useful in mitigating attacks such as CVE-2023-38408. Refer to the URL https://www.qualys.com/2023/07/19/cve-2023-38408/rce-openssh-forwarded-ssh-agent.txt for more information. As of version 3.25.0, Syd kills the process in this case rather than denying the system call, to be consistent with other memory-related seccomp filters. This ensures that the system administrator gets a notification via the audit log and is more likely to investigate potentially malicious activity promptly. In addition, repeated failures trigger SegvGuard.
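Whether an ELF object requests an executable stack is recorded in its PT_GNU_STACK program header. The following minimal Python sketch reads that header for 32- and 64-bit ELF files. It is an illustration, not Syd's code, and it assumes conservatively that a missing PT_GNU_STACK header means an executable stack (the real kernel default is architecture-dependent).

```python
import struct

PT_GNU_STACK = 0x6474E551
PF_X = 0x1  # segment execute permission


def stack_is_executable(elf: bytes) -> bool:
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = elf[4] == 2                # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    e = "<" if elf[5] == 1 else ">"   # EI_DATA: byte order
    if is64:
        (phoff,) = struct.unpack_from(e + "Q", elf, 32)
        phentsize, phnum = struct.unpack_from(e + "HH", elf, 54)
        flags_off = 4                 # p_flags follows p_type in Elf64_Phdr
    else:
        (phoff,) = struct.unpack_from(e + "I", elf, 28)
        phentsize, phnum = struct.unpack_from(e + "HH", elf, 42)
        flags_off = 24                # p_flags is the 7th field in Elf32_Phdr
    for i in range(phnum):
        base = phoff + i * phentsize
        (p_type,) = struct.unpack_from(e + "I", elf, base)
        if p_type == PT_GNU_STACK:
            (p_flags,) = struct.unpack_from(e + "I", elf, base + flags_off)
            return bool(p_flags & PF_X)
    # Assumption: no PT_GNU_STACK header is treated as an executable stack.
    return True
```

The same check applies equally to a main executable and to a library about to be mapped with dlopen(3), which is why enforcing it at the mmap(2) boundary closes the gap described above.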

Mitigation against Page Cache Attacks

As of version 3.25.0, Syd denies the mincore(2) system call by default. This system call is typically not needed during normal operation and has been successfully (ab)used for page cache attacks: https://arxiv.org/pdf/1901.01161

To quote the Countermeasures section of the article:

Our side-channel attack targets the operating system page cache via operating system interfaces and behavior. Hence, it clearly can be mitigated by modifying the operating system implementation. Privileged Access. The QueryWorkingSetEx and mincore system calls are the core of our side-channel attack. Requiring a higher privilege level for these system calls stops our attack. The downside of restricting access to these system calls is that existing programs which currently make use of these system calls might break. Hence, we analyzed how frequently mincore is called by any of the software running on a typical Linux installation. We used the Linux perf tools to measure over a 5 hour period whenever the sys_enter_mincore system call is called by any application. During these 5 hours a user performed regular operations on the system, i.e., running various work-related tools like Libre Office, gcc, Clion, Thunderbird, Firefox, Nautilus, and Evince, but also non-work-related tools like Spotify. The system was also running regular background tasks during this time frame. Surprisingly, the sys_enter_mincore system call was not called a single time. This indicates that making the mincore system call privileged is feasible and would mitigate our attack at a very low implementation cost.

As of version 3.35.2, the newer cachestat(2) system call is also denied for the same reason, as it is a scalable version of the mincore(2) system call. The same version added the trace/allow_unsafe_page_cache option to relax this restriction at startup. This may be needed to make direct rendering work with Firefox family browsers.

Enforcing AT_SECURE and UID/GID Verification

As of version 3.27.0, Syd enforces the AT_SECURE flag in the auxiliary vector of executables at the ptrace(2) boundary, upon receiving the PTRACE_EVENT_EXEC event, to enforce secure-execution mode. This event occurs after the executable binary is loaded into memory but before it starts executing. This enforcement ensures that the C library operates in a secure mode, disabling unsafe behaviors like loading untrusted dynamic libraries or honoring insecure environment variables. Additionally, Syd performs strict UID and GID verification to confirm that the process's user and group IDs match the expected values, preventing unauthorized privilege escalation. If the verification fails or the AT_SECURE flag cannot be set, Syd terminates the process to prevent potential security breaches. This mitigation can be relaxed at startup with the option trace/allow_unsafe_exec_libc:1, though doing so is not recommended as it reduces the effectiveness of the sandbox. Notably, secure-execution mode is enforced by apparmor(7) too, and it may also be enforced by other LSMs and eBPF. Some implications of secure-execution mode are summarized below. Refer to the ld.so(8) and getauxval(3) manual pages for the implications of secure-execution mode on your system.

The glibc dynamic linker strips or ignores dangerous LD_* variables in secure-execution mode, including LD_LIBRARY_PATH, LD_PRELOAD (only standard directories are searched and pathnames containing slashes are ignored), LD_AUDIT, LD_DEBUG, LD_DEBUG_OUTPUT, LD_DYNAMIC_WEAK, LD_HWCAP_MASK, LD_ORIGIN_PATH, LD_PROFILE, LD_SHOW_AUXV, and LD_USE_LOAD_BIAS. glibc also treats some non-LD_* variables as unsafe in secure-execution mode and strips or ignores them: GCONV_PATH, GETCONF_DIR, HOSTALIASES, LOCALDOMAIN, LOCPATH, MALLOC_TRACE, NIS_PATH, NLSPATH, RESOLV_HOST_CONF, RES_OPTIONS, TMPDIR, and TZDIR. Refer to the ld.so(8) manual page for more information. As of version 3.11.1, Syd also strips unsafe environment variables before executing the sandbox process by default. This can be disabled altogether with trace/allow_unsafe_env:1, or unsafe environment variables can be selectively allowed using the -e var= format, e.g. -eLD_PRELOAD=. See the Restricting environment and trace/allow_unsafe_env section of this manual page for more information.
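The environment stripping described above amounts to filtering a denylist with explicit exceptions. The Python sketch below shows that idea only; UNSAFE_ENV is a hypothetical, abbreviated list taken from the variables named in the text, not Syd's actual list, and the allow parameter loosely mirrors the -eVAR= exception.

```python
# Hypothetical, abbreviated denylist for illustration; Syd's real list differs.
UNSAFE_ENV = frozenset({
    "LD_PRELOAD", "LD_LIBRARY_PATH", "LD_AUDIT", "LD_DEBUG", "LD_DEBUG_OUTPUT",
    "LD_PROFILE", "LD_SHOW_AUXV", "GCONV_PATH", "GETCONF_DIR", "HOSTALIASES",
    "LOCALDOMAIN", "LOCPATH", "MALLOC_TRACE", "NIS_PATH", "NLSPATH",
    "RESOLV_HOST_CONF", "RES_OPTIONS", "TMPDIR", "TZDIR",
})


def sanitize_env(env, allow=frozenset()):
    """Drop unsafe variables unless they are explicitly allowed."""
    return {k: v for k, v in env.items() if k not in UNSAFE_ENV or k in allow}
```

A launcher would pass the result to the child, e.g. subprocess.run(argv, env=sanitize_env(os.environ)). Stripping in the parent complements secure-execution mode, which only takes effect in C libraries that honor AT_SECURE.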

glibc's LD_PREFER_MAP_32BIT_EXEC is always disabled in secure-execution mode because it weakens ASLR. Past bugs such as CVE-2019-19126 involved cases where this variable was not ignored after a security transition. Refer to the ld.so(8) manual page and the following links for more information:

https://lists.gnu.org/archive/html/info-gnu/2020-02/msg00001.html
https://alas.aws.amazon.com/ALAS-2021-1511.html

In secure-execution mode glibc also restricts the GLIBC_TUNABLES environment variable: each tunable carries a security level (SXID_ERASE or SXID_IGNORE), so tunables are erased or ignored in secure-execution mode. Hardening added after CVE-2023-4911 ensures that secure-execution mode invocations with a hostile GLIBC_TUNABLES value are blocked or terminated. Refer to the following links for more information:

https://lwn.net/Articles/947736/
https://access.redhat.com/security/cve/cve-2023-4911
https://nvd.nist.gov/vuln/detail/CVE-2023-4911

glibc's secure_getenv(3) returns NULL when AT_SECURE is set, so any glibc subsystem that uses secure_getenv(3) (e.g., timezone, locale, iconv, and resolver paths) ignores environment overrides in secure-execution mode. Similarly, getauxval(3) called with AT_SECURE returns a nonzero value in secure-execution mode.
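A process can query its own secure-execution status through getauxval(3). The Python sketch below calls it through ctypes, assuming a Linux C library that exports getauxval (glibc and musl both do), and models the secure_getenv(3) semantics on top of it. secure_getenv_like is a hypothetical helper, not a libc function.

```python
import ctypes
import os

AT_SECURE = 23  # from <linux/auxvec.h>

_libc = ctypes.CDLL(None)  # symbols already loaded into the process, incl. libc
_libc.getauxval.restype = ctypes.c_ulong
_libc.getauxval.argtypes = [ctypes.c_ulong]


def in_secure_execution_mode() -> bool:
    return _libc.getauxval(AT_SECURE) != 0


def secure_getenv_like(name):
    """Mimic secure_getenv(3): ignore the environment in secure-execution mode."""
    if in_secure_execution_mode():
        return None
    return os.environ.get(name)
```

Run inside a Syd sandbox with the default settings, in_secure_execution_mode() would be expected to return True for a freshly executed process.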

musl libc honors AT_SECURE and likewise ignores preload/library/locale environment knobs in secure-execution mode; examples include LD_PRELOAD, LD_LIBRARY_PATH, and MUSL_LOCPATH. Refer to the following links for more information:

https://musl.libc.org/manual.html
https://wiki.musl-libc.org/environment-variables

Because the Linux host kernel is not aware that Syd sets the AT_SECURE bit, the proc_pid_auxv(5) file reports the bit as unset. However, when verbose logging is turned on using the log/verbose:1 option, Syd correctly logs this bit as set after parsing the proc_pid_auxv(5) file of the sandbox process.
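The proc_pid_auxv(5) file is a flat array of (type, value) pairs of native machine words terminated by AT_NULL. A minimal Python parser is sketched below; it assumes a 64-bit process by default (word "Q"), and the function names are illustrative.

```python
import struct

AT_NULL = 0
AT_SECURE = 23


def parse_auxv(raw: bytes, word: str = "Q") -> dict:
    """Parse /proc/PID/auxv into {a_type: a_val}.

    `word` is "Q" for 64-bit and "I" for 32-bit processes; "=" selects
    native byte order with standard sizes and no padding.
    """
    fmt = "=" + word * 2
    size = struct.calcsize(fmt)
    out = {}
    for off in range(0, len(raw) - size + 1, size):
        a_type, a_val = struct.unpack_from(fmt, raw, off)
        if a_type == AT_NULL:
            break
        out[a_type] = a_val
    return out


def read_at_secure(pid="self") -> int:
    with open(f"/proc/{pid}/auxv", "rb") as f:
        return parse_auxv(f.read()).get(AT_SECURE, 0)
```

Per the paragraph above, read_at_secure() on a sandboxed process would still report 0, since the file reflects the kernel's view rather than the value Syd wrote.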

Process Name Modification Restriction

As of version 3.28.0, Syd logs and denies attempts to set a process's name using the PR_SET_NAME prctl(2) request. This mitigation prevents malicious software from disguising itself under legitimate process names such as apache or other system daemons, thwarting attempts to evade detection and maintain stealth within the system. By default, any invocation of PR_SET_NAME within the sandbox is intercepted: the action is logged for audit purposes if verbose logging is on, and the system call returns success without taking effect, turning it into a no-op. If there is a legitimate need to permit process name changes within the sandbox, this restriction can be overridden by enabling the trace/allow_unsafe_prctl:1 option, which allows PR_SET_NAME requests to succeed without logging.
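The effect is observable from inside the sandbox: PR_SET_NAME reports success, but reading the name back with PR_GET_NAME shows it unchanged. The Python sketch below calls prctl(2) through ctypes on Linux; set_thread_name and get_thread_name are hypothetical helper names.

```python
import ctypes
import os

PR_SET_NAME = 15  # from <linux/prctl.h>
PR_GET_NAME = 16

_libc = ctypes.CDLL(None, use_errno=True)
_ZERO = ctypes.c_ulong(0)


def _check(res):
    if res != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def set_thread_name(name: bytes) -> None:
    # The kernel keeps at most 15 bytes plus a terminating NUL.
    _check(_libc.prctl(PR_SET_NAME, ctypes.c_char_p(name), _ZERO, _ZERO, _ZERO))


def get_thread_name() -> bytes:
    buf = ctypes.create_string_buffer(16)
    _check(_libc.prctl(PR_GET_NAME, buf, _ZERO, _ZERO, _ZERO))
    return buf.value
```

Outside a sandbox, set_thread_name(b"apache2") followed by get_thread_name() returns b"apache2". Under Syd's default policy the set call still succeeds, but the read-back returns the previous name.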

Mitigation against Sigreturn Oriented Programming (SROP)

As of version 3.30.0, Syd employs a multi-layered mitigation strategy against Sigreturn Oriented Programming (SROP), an exploit technique that abuses the state restoration behavior of the sigreturn(2) system call to hijack process execution. SROP sets up a fake signal frame on the stack so that control flow is redirected upon signal return, which allows it to bypass memory protections such as ASLR, NX, and partial RELRO. Inspired by Erik Bosman's proposal in May 2014 (LKML PATCH 3/4), Syd incorporates a signal counting mechanism that tracks the number of signals delivered to a thread group, ensuring that each sigreturn(2) invocation corresponds to an actual, in-progress signal handler. A stray sigreturn(2) call violating this rule causes the process to be terminated with the signal SIGKILL. This method provides more precise protection than sigreturn(2) frame canaries, which can be circumvented under certain conditions, and effectively blocks a critical class of attacks against sandboxed processes. Administrators can disable these mitigations via the trace/allow_unsafe_sigreturn:1 option, though doing so exposes systems to exploitation and undermines security. For more information, refer to the following links:

http://www.cs.vu.nl/~herbertb/papers/srop_sp14.pdf
https://web.archive.org/web/20221002135950/https://lkml.org/lkml/2014/5/15/660
https://web.archive.org/web/20221002123657/https://lkml.org/lkml/2014/5/15/661
https://web.archive.org/web/20221002130349/https://lkml.org/lkml/2014/5/15/657
https://web.archive.org/web/20221002135459/https://lkml.org/lkml/2014/5/15/858
https://lwn.net/Articles/674861
https://lore.kernel.org/all/1454801964-50385-1-git-send-email-sbauer@eng.utah.edu/
https://lore.kernel.org/all/1454801964-50385-2-git-send-email-sbauer@eng.utah.edu/
https://lore.kernel.org/all/1454801964-50385-3-git-send-email-sbauer@eng.utah.edu/
https://marc.info/?l=openbsd-tech&m=146281531025185
https://isopenbsdsecu.re/mitigations/srop/
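The signal counting rule can be expressed as a small state machine. The Python model below is a toy illustration of that invariant, not Syd's implementation; the class and method names are hypothetical, and the string results stand in for "let the call proceed" and "terminate with SIGKILL".

```python
class SigreturnGuard:
    """Toy model of per-thread-group signal counting."""

    def __init__(self):
        self.pending = {}  # tgid -> number of in-progress signal handlers

    def on_signal_delivered(self, tgid: int) -> None:
        self.pending[tgid] = self.pending.get(tgid, 0) + 1

    def on_sigreturn(self, tgid: int) -> str:
        count = self.pending.get(tgid, 0)
        if count == 0:
            # Stray sigreturn(2): no handler is running, so this is a forged frame.
            return "SIGKILL"
        self.pending[tgid] = count - 1
        return "allow"
```

Because the counter is incremented on every delivery, nested handlers are handled naturally: two deliveries permit exactly two sigreturn(2) calls, and a third one is treated as a forged frame.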

Speculative Execution Mitigation

As of version 3.30.0, Syd uses the prctl(2) system call to enforce speculative execution controls, hardening the sandbox against speculative execution vulnerabilities such as Spectre and related side-channel attacks. Upon initialization, Syd attempts to apply the PR_SPEC_FORCE_DISABLE setting to the speculative execution features PR_SPEC_STORE_BYPASS, PR_SPEC_INDIRECT_BRANCH, and PR_SPEC_L1D_FLUSH, irrevocably disabling these CPU-level misfeatures where permitted. Where supported by the underlying kernel and hardware, this constrains speculative execution to eliminate potential avenues for data leakage and privilege escalation across privilege domains. The mitigation is applied only where per-task control via prctl(2) is available; if these settings cannot be applied due to architectural constraints or insufficient permissions, Syd logs informational messages and continues without disrupting sandbox operation. Administrators can override this behavior with the trace/allow_unsafe_exec_speculative:1 option in environments where speculative execution controls must be relaxed for compatibility or performance reasons. By disabling these speculative execution features at the kernel interface level, Syd reduces the risk of side-channel exploits and protects the integrity and confidentiality of sandboxed applications. Refer to the links below for more information:

https://docs.kernel.org/admin-guide/hw-vuln/spectre.html
https://docs.kernel.org/userspace-api/spec_ctrl.html
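The per-task state can be inspected with PR_GET_SPECULATION_CTRL. The following Python sketch queries and decodes it through ctypes, using constant values from <linux/prctl.h>. force_disable shows the shape of the call described above but is deliberately not invoked, because the setting cannot be undone for the calling task. The helper names are hypothetical, and kernels that do not support a feature make the query fail (for example with EINVAL), which report() records instead of raising.

```python
import ctypes
import os

PR_GET_SPECULATION_CTRL = 52
PR_SET_SPECULATION_CTRL = 53

# Features named in the text above, with their <linux/prctl.h> values.
FEATURES = {
    "PR_SPEC_STORE_BYPASS": 0,
    "PR_SPEC_INDIRECT_BRANCH": 1,
    "PR_SPEC_L1D_FLUSH": 2,
}

PR_SPEC_FORCE_DISABLE = 1 << 3
STATE_BITS = [
    (1 << 0, "PR_SPEC_PRCTL"),
    (1 << 1, "PR_SPEC_ENABLE"),
    (1 << 2, "PR_SPEC_DISABLE"),
    (PR_SPEC_FORCE_DISABLE, "PR_SPEC_FORCE_DISABLE"),
    (1 << 4, "PR_SPEC_DISABLE_NOEXEC"),
]

_libc = ctypes.CDLL(None, use_errno=True)


def _prctl(option, arg2, arg3=0):
    res = _libc.prctl(option, ctypes.c_ulong(arg2), ctypes.c_ulong(arg3),
                      ctypes.c_ulong(0), ctypes.c_ulong(0))
    if res < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return res


def decode(state):
    """Turn a PR_GET_SPECULATION_CTRL result into flag names."""
    if state == 0:
        return ["PR_SPEC_NOT_AFFECTED"]
    return [name for bit, name in STATE_BITS if state & bit]


def report():
    """Query each feature, recording errors instead of raising."""
    out = {}
    for name, feature in FEATURES.items():
        try:
            out[name] = decode(_prctl(PR_GET_SPECULATION_CTRL, feature))
        except OSError as exc:
            out[name] = f"error: {exc.strerror}"
    return out


def force_disable(feature):
    """Irrevocable for the calling task; shown for illustration only."""
    _prctl(PR_SET_SPECULATION_CTRL, feature, PR_SPEC_FORCE_DISABLE)
```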

As of version 3.35.2, Syd disables Speculative Store Bypass mitigations for seccomp(2) filters when trace/allow_unsafe_exec_speculative:1 is set at startup.

Cryptographically Randomized Sysinfo

Since Syd 3.28.0, the sysinfo(2) system call has been cryptographically obfuscated: high-entropy offsets are applied to memory fields (e.g., total RAM, free RAM), which are then constrained to plausible power-of-two boundaries, frustrating trivial attempts at system fingerprinting. The uptime and idle counters each incorporate a distinct offset of up to 0xFF_FFFF seconds (~194 days), unless unshare/time:1 is set, in which case time starts from zero. Load averages are randomized in fixed-point format and clamped to realistic upper limits. Administrators seeking genuine system metrics may disable these transformations via trace/allow_unsafe_sysinfo:1, at the cost of enabling straightforward correlation and potential data leakage.
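The arithmetic behind these transformations can be sketched in Python. This is a hypothetical scheme, not Syd's algorithm. It assumes that the uptime and idle offsets are drawn once per instance, so that reported uptime still advances monotonically, and that memory values get a random offset before being rounded up to the next power of two. Only the 0xFF_FFFF bound comes from the text above.

```python
import secrets

# 0xFF_FFFF seconds is 16_777_215 s, i.e. about 194.2 days.
MAX_UPTIME_OFFSET = 0xFF_FFFF


def round_up_pow2(n: int) -> int:
    """Smallest power of two that is >= n (1 for n <= 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


class SysinfoMask:
    def __init__(self):
        # Assumption: distinct offsets drawn once per sandbox instance.
        self.uptime_offset = secrets.randbelow(MAX_UPTIME_OFFSET + 1)
        self.idle_offset = secrets.randbelow(MAX_UPTIME_OFFSET + 1)

    def uptime(self, real_seconds: int) -> int:
        return real_seconds + self.uptime_offset

    def ram(self, real_bytes: int) -> int:
        # Hypothetical: add noise of up to 100%, then snap to a power of two.
        noisy = real_bytes + secrets.randbelow(real_bytes + 1)
        return round_up_pow2(noisy)
```

With this scheme, a host with 3 GiB of RAM would report 4 GiB or 8 GiB, and two sandboxes on the same host would see unrelated uptimes.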

Memory Sealing of Sandbox Policy Regions on Lock

Beginning with version 3.33.1, Syd applies Linux's mseal(2) syscall to enforce immutability of policy-critical memory regions at the moment the sandbox is locked with lock:on. At this point, all mutable structures influencing access control -- such as ACLs, action filters, and syscall mediation rules -- are sealed at the virtual memory level. Unlike traditional permission schemes (e.g., W^X or mprotect(2)), mseal(2) protects against structural manipulation of memory mappings themselves, preventing mmap(2), mremap(2), mprotect(2), munmap(2), and destructive madvise(2) operations from altering sealed VMAs. This eliminates attacker primitives that rely on reclaiming, remapping, or changing permissions on enforcement data, thereby closing off advanced data-oriented exploitation paths such as policy subversion through remapped ACLs or revocation of constraints via memory permission resets. Syd permits legitimate late-stage policy configuration during startup and defers sealing until lock:on is called, after which mutation of enforcement state is structurally frozen. The process is one-way and idempotent; sealed memory cannot be unsealed, ensuring strong guarantees once lockdown is complete. For diagnostic or non-hardened environments, this mechanism may be disabled explicitly via the startup toggle trace/allow_unsafe_nomseal:1, which should only be used with full awareness of the resulting relaxation in protection. When enabled, sealing substantially raises the integrity threshold of the sandbox, ensuring that post-lock policy enforcement is immune to both direct and indirect memory-level tampering.

Force Close-on-Exec File Descriptors

The trace/force_cloexec option, introduced in Syd version 3.35.2, ensures that all creat(2), open(2), openat(2), openat2(2), memfd_create(2), socket(2), accept(2), and accept4(2) system calls made by the sandbox process include the O_CLOEXEC flag (or its per-call equivalent, such as MFD_CLOEXEC or SOCK_CLOEXEC). This feature can be toggled at runtime via Syd's virtual stat API, enabling dynamic adjustment of confinement levels as needed. A file descriptor with the close-on-exec flag set is automatically closed when a new program is executed via execve(2) or similar system calls. This prevents file descriptors from being unintentionally inherited by newly executed programs, which could otherwise lead to unauthorized access to sensitive files or resources. By enforcing close-on-exec across all of these system calls, Syd mitigates the risk of file descriptor leakage and ensures a clean execution context for newly spawned processes.
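Conceptually, forcing close-on-exec means OR-ing O_CLOEXEC into the open flags and verifying FD_CLOEXEC on the resulting descriptor. The Python sketch below shows both steps; force_cloexec and has_cloexec are hypothetical names. Note that Python itself already creates non-inheritable descriptors by default (PEP 446), so the example uses os.set_inheritable to reproduce the behavior of a C program that omits the flag.

```python
import fcntl
import os


def force_cloexec(flags: int) -> int:
    """Return open(2) flags with O_CLOEXEC added."""
    return flags | os.O_CLOEXEC


def has_cloexec(fd: int) -> bool:
    """True if FD_CLOEXEC is set, i.e. the fd is closed across execve(2)."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)
```

For example, os.open("/dev/null", force_cloexec(os.O_RDONLY)) yields a descriptor for which has_cloexec returns True; calling os.set_inheritable(fd, True) clears the flag, and the descriptor would then leak into any program started with execve(2).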

Force Randomized File Descriptors

The trace/force_rand_fd option, introduced in Syd version 3.35.2, ensures that all creat(2), open(2), openat(2), openat2(2), memfd_create(2), socket(2), accept(2), and accept4(2) system calls made by the sandbox process allocate file descriptors at random available slots rather than the lowest-numbered one. When this feature is enabled, Syd passes a random available slot to the SECCOMP_IOCTL_NOTIF_ADDFD operation, which is used to install a file descriptor into the sandbox process. Randomizing file descriptor numbers makes it significantly harder for an attacker to predict or deliberately reuse critical descriptors, raising the bar against file-descriptor reuse and collision attacks. Note that enabling this may break programs which rely on the POSIX guarantee that open(2) returns the lowest available descriptor. This behavior can be toggled at runtime via Syd's virtual stat API, allowing operators to enable or disable descriptor randomization without restarting or recompiling the sandboxed process. We're also cooperating with the HardenedBSD project to implement a similar feature in the BSD kernel. Refer to the following link for more information: https://git.hardenedbsd.org/hardenedbsd/HardenedBSD/-/issues/117
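The idea can be approximated in user space by choosing a random unused slot and moving a descriptor there with dup2(2). The Python sketch below does this for the current process only. It is not how Syd installs descriptors, and all names are hypothetical. The check-then-dup2 sequence is racy in multithreaded programs, and the 4096 upper bound on candidate slots is an arbitrary assumption.

```python
import errno
import fcntl
import os
import resource
import secrets


def fd_is_free(fd: int) -> bool:
    try:
        fcntl.fcntl(fd, fcntl.F_GETFD)
    except OSError as e:
        if e.errno == errno.EBADF:
            return True
        raise
    return False


def random_free_slot(low: int = 3, tries: int = 128) -> int:
    soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    high = 4096 if soft == resource.RLIM_INFINITY else min(soft, 4096)
    if high <= low:
        raise OSError(errno.EMFILE, "descriptor limit too low")
    for _ in range(tries):
        candidate = low + secrets.randbelow(high - low)
        if fd_is_free(candidate):
            return candidate
    raise OSError(errno.EMFILE, "no free descriptor slot found")


def move_to_random_slot(fd: int) -> int:
    """Duplicate fd onto a random free slot and close the original."""
    new_fd = os.dup2(fd, random_free_slot(), inheritable=False)
    os.close(fd)
    return new_fd
```

After os.open returns the lowest free descriptor, move_to_random_slot relocates it, so a later open(2) can no longer be expected to reuse a predictable number.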

Syscall Argument Cookies

To further harden the seccomp(2) boundary, as of version 3.35.2 Syd embeds cryptographically strong, per-instance "cookies" into unused architecture-defined syscall argument slots (e.g., the 5th and 6th arguments of openat2(2), which takes only four arguments). These cookies are generated at startup from the OS random number generator using getrandom(2) and are checked in the BPF filter, so only calls bearing the correct 32- or 64-bit values are allowed. By requiring this unpredictable token, Syd raises the bar against arbitrary or forged syscalls: attackers must first discover or leak the randomized cookies, despite Address Space Layout Randomization (ASLR), before mounting a successful path or network operation. This approach effectively turns unused syscall parameters into an application-level authorization mechanism, preventing trivial reuse of legitimate code paths and mitigating time-of-check-to-time-of-use (TOCTTOU) and ROP payloads that rely on guessing or omitting optional arguments. In combination with absolute path enforcement and the denial of negative directory file descriptors such as AT_FDCWD, syscall argument cookies form a lightweight, zero-cost integrity check that hardens system calls without kernel modifications or performance penalties. As an example, here is how the filters look in pseudo filter code for the system calls openat2(2) and socket(2) on x86-64. openat2(2) uses two unused arguments as cookies and socket(2) uses three. In addition, openat2(2) denies negative file descriptor arguments such as AT_FDCWD:

# filter for syscall "openat2" (437) [priority: 65528]
if ($syscall == 437)
	if ($a0.hi32 > 0)
	else
		if ($a0.hi32 == 0)
			if ($a0.lo32 > 2147483647)
			else
				if ($a4.hi32 == 2047080271)
					if ($a4.lo32 == 419766579)
						if ($a5.hi32 == 2863373132)
							if ($a5.lo32 == 396738706)
								action ALLOW;
		else
			if ($a4.hi32 == 2047080271)
				if ($a4.lo32 == 419766579)
					if ($a5.hi32 == 2863373132)
						if ($a5.lo32 == 396738706)
							action ALLOW;
# filter for syscall "socket" (41) [priority: 65529]
if ($syscall == 41)
	if ($a3.hi32 == 3378530982)
		if ($a3.lo32 == 4160747949)
			if ($a4.hi32 == 2899982880)
				if ($a4.lo32 == 990920938)
					if ($a5.hi32 == 3611760485)
						if ($a5.lo32 == 1163305215)
							action ALLOW;
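The openat2(2) rule in the filter above can be restated in Python to make its logic explicit. This is a model of the BPF comparisons only, not code that runs in the kernel, and the function names are hypothetical. Classic BPF compares 32-bit halves, and requiring both halves of a 64-bit argument to match is the same as a full 64-bit equality test.

```python
import secrets

AT_FDCWD = -100
U64 = 0xFFFF_FFFF_FFFF_FFFF


def make_cookies(n: int) -> tuple:
    """Per-instance random 64-bit cookies (Syd uses getrandom(2))."""
    return tuple(secrets.randbits(64) for _ in range(n))


def openat2_allowed(args, cookies) -> bool:
    """Model of the openat2(2) rule: non-negative dirfd, a4/a5 == cookies."""
    dirfd = args[0] & U64
    hi, lo = dirfd >> 32, dirfd & 0xFFFF_FFFF
    if hi != 0 or lo > 0x7FFF_FFFF:
        return False  # negative fd such as AT_FDCWD: default action applies
    return (args[4] & U64) == cookies[0] and (args[5] & U64) == cookies[1]
```

A negative dirfd fails the check whether the register holds it sign-extended (nonzero upper half) or zero-extended (lower half above 0x7FFFFFFF). A forged call therefore needs both a valid descriptor and both cookie values.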

Another example is how the critical seccomp(2) notify ioctl(2) requests SECCOMP_IOCTL_NOTIF_SEND and SECCOMP_IOCTL_NOTIF_ADDFD are confined for the Syd emulator threads. SECCOMP_IOCTL_NOTIF_SEND is critical because it allows pass-through of system calls to the host Linux kernel with the SECCOMP_USER_NOTIF_FLAG_CONTINUE flag in the seccomp(2) response data structure. This flag must be used with the utmost care; in the hands of an attacker it can be a tool for further exploitation. SECCOMP_IOCTL_NOTIF_ADDFD is critical because it allows file descriptor transfer between the Syd process and the sandbox process; in the hands of an attacker it can be a tool for file descriptor stealing. As part of this mitigation, three syscall cookies are enforced for ioctl(2) system calls with the SECCOMP_IOCTL_NOTIF_SEND and SECCOMP_IOCTL_NOTIF_ADDFD requests. Coupled with the startup randomization of the seccomp(2) notify file descriptor, this mitigation raises the bar for an attacker trying to call arbitrary or forged syscalls within a compromised Syd emulator thread. An excerpt from the seccomp filter in pseudo filter code is given below:

# Syd monitor rules with seccomp fd 626
#
# pseudo filter code start
#
# filter for arch x86_64 (3221225534)
...
# filter for syscall "ioctl" (16) [priority: 65497]
if ($syscall == 16)
	if ($a0.hi32 == 0)
		if ($a0.lo32 == 626)
			if ($a1.hi32 == 4294967295)
				if ($a1.lo32 == SECCOMP_IOCTL_NOTIF_RECV)
					action ALLOW;
				if ($a1.lo32 == SECCOMP_IOCTL_NOTIF_SEND)
					if ($a3.hi32 == 4195042482)
						if ($a3.lo32 == 329284685)
							if ($a4.hi32 == 3163914537)
								if ($a4.lo32 == 2000745976)
									if ($a5.hi32 == 3932715328)
										if ($a5.lo32 == 2409429749)
											action ALLOW;
				if ($a1.lo32 == SECCOMP_IOCTL_NOTIF_ADDFD)
					if ($a3.hi32 == 2387882717)
						if ($a3.lo32 == 529632567)
							if ($a4.hi32 == 2017338540)
								if ($a4.lo32 == 3732042218)
									if ($a5.hi32 == 4202049614)
										if ($a5.lo32 == 546113052)
											action ALLOW;
				if ($a1.lo32 == SECCOMP_IOCTL_NOTIF_SET_FLAGS)
					action ALLOW;
				if ($a1.lo32 == SECCOMP_IOCTL_NOTIF_ID_VALID)
					action ALLOW;
			if ($a1.hi32 == 0)
				if ($a1.lo32 == SECCOMP_IOCTL_NOTIF_RECV)
					action ALLOW;
				if ($a1.lo32 == SECCOMP_IOCTL_NOTIF_SEND)
					if ($a3.hi32 == 4195042482)
						if ($a3.lo32 == 329284685)
							if ($a4.hi32 == 3163914537)
								if ($a4.lo32 == 2000745976)
									if ($a5.hi32 == 3932715328)
										if ($a5.lo32 == 2409429749)
											action ALLOW;
				if ($a1.lo32 == SECCOMP_IOCTL_NOTIF_ADDFD)
					if ($a3.hi32 == 2387882717)
						if ($a3.lo32 == 529632567)
							if ($a4.hi32 == 2017338540)
								if ($a4.lo32 == 3732042218)
									if ($a5.hi32 == 4202049614)
										if ($a5.lo32 == 546113052)
											action ALLOW;
				if ($a1.lo32 == SECCOMP_IOCTL_NOTIF_SET_FLAGS)
					action ALLOW;
				if ($a1.lo32 == SECCOMP_IOCTL_NOTIF_ID_VALID)
					action ALLOW;
...
	# default action
	action KILL_PROCESS;
# invalid architecture action
action KILL_PROCESS;

The list of system calls protected by cookies is given below. The list may be extended in the future to cover more system calls used by Syd:

ioctl(2)
- PROCMAP_QUERY
- SECCOMP_IOCTL_NOTIF_SEND
- SECCOMP_IOCTL_NOTIF_ADDFD
linkat(2), renameat2(2), unlinkat(2)
memfd_create(2)
openat2(2)
pipe2(2)
socket(2), bind(2), connect(2), accept4(2) (64-bit only)
truncate(2), truncate64(2), ftruncate(2)
uname(2)
fchdir(2), umask(2)

As of version 3.36.0, this mitigation may be disabled at startup using the trace/allow_unsafe_nocookie:1 option.

Shared Memory Hardening

As of version 3.48.0, Syd denies access to the System V IPC (sysvipc(7)) and POSIX message queue (mq_overview(7)) system calls by default to enforce a strict shared-nothing architecture. This hardening eliminates an entire class of inter-process communication (IPC) vulnerabilities, including "memory squatting" attacks, in which malicious actors preemptively allocate shared memory keys to hijack or disrupt legitimate applications, as detailed in the research by Portcullis. By blocking the creation and use of System V shared memory, semaphores, message queues, and POSIX message queues, Syd closes complex kernel attack surfaces that have historically harbored privilege escalation and information leakage bugs. This strict isolation aligns with modern container security best practices, ensuring that sandboxed processes cannot interfere with the host or other containers via shared global namespaces. If legacy application compatibility is required, these subsystems can be selectively re-enabled using the trace/allow_unsafe_shm:1 and trace/allow_unsafe_msgqueue:1 options, partially exposing the sandbox to the aforementioned risks. Refer to the following links for more information:

https://man7.org/linux/man-pages/man7/sysvipc.7.html
https://man7.org/linux/man-pages/man7/mq_overview.7.html
https://labs.portcullis.co.uk/whitepapers/memory-squatting-attacks-on-system-v-shared-memory/
https://labs.portcullis.co.uk/presentations/i-miss-lsd/
https://www.cve.org/CVERecord?id=CVE-2013-0254

Shared Memory Permissions Hardening

As of version 3.37.0, Syd introduces a kernel-enforced mitigation against System V shared memory squatting by conditioning allow rules on strict permission masks. By inspecting the mode bits passed to the shmget(2), msgget(2), semget(2), and mq_open(2) system calls, the sandbox permits creation only when no bit in the mask 0o177 is set, i.e., the user-execute bit and every group and other permission bit must be clear. This measure prevents untrusted processes from elevating permissions after creation or exploiting legacy IPC segments with permissive ACLs, which could lead to disclosure or corruption of shared pages. Based on the attack taxonomy described in Memory Squatting: Attacks on System V Shared Memory (Portcullis, 2013), mode checks take place within the seccomp(2) BPF filter before any mapping. The IPC_SET operations of the shmctl(2), msgctl(2), and semctl(2) system calls are also denied, preventing permission changes after creation. Additionally, any attempt to attach a shared memory segment with the SHM_EXEC flag via shmat(2) is denied to enforce W^X policies, blocking executable mappings through shared memory. The seccomp(2) filter also blocks the MSG_STAT_ANY, SEM_STAT_ANY, and SHM_STAT_ANY operations (Linux 4.17+), which would otherwise return segment metadata without verifying its mode, mitigating unintended information leaks. This mitigation is applied in the parent seccomp(2) filter, ensuring that the Syd process itself is subject to these restrictions. Administrators may relax this policy at startup using the trace/allow_unsafe_perm_msgqueue:1 and trace/allow_unsafe_perm_shm:1 options, but doing so reintroduces the classic squatting vulnerabilities documented in CVE-2013-0254 and related research. For more information refer to the following links:

https://labs.portcullis.co.uk/whitepapers/memory-squatting-attacks-on-system-v-shared-memory/
https://labs.portcullis.co.uk/presentations/i-miss-lsd/
https://www.cve.org/CVERecord?id=CVE-2013-0254
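The mode and flag checks reduce to simple bitmask tests. The Python model below restates them with the constant values from the System V IPC headers. The function names are hypothetical, and the model applies the mode check regardless of IPC_CREAT, which may differ from the real filter.

```python
UNSAFE_MODE_MASK = 0o177  # user-execute plus every group/other permission bit
IPC_CREAT = 0o1000        # creation flag; lies outside the mode mask
SHM_EXEC = 0o100000       # shmat(2) flag requesting an executable mapping


def create_mode_allowed(flags: int) -> bool:
    """Permission bits of shmget/msgget/semget flags or the mq_open mode."""
    return (flags & UNSAFE_MODE_MASK) == 0


def shmat_allowed(shmflg: int) -> bool:
    """W^X: refuse executable attachments of shared memory."""
    return not (shmflg & SHM_EXEC)
```

Under this rule, only owner-private modes such as 0o600 or 0o400 are accepted. A typical world-readable 0o644 segment would be refused, because any group or other permission lets another local user reach the object.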

Mitigation Against Heap Spraying

As of version 3.23.18, Syd mitigates kernel heap-spraying attacks by restricting the msgsnd(2) system call. This call is integral to System V message queues, which allow processes to send and receive messages asynchronously for inter-process communication (IPC) in Unix-like operating systems. However, it is also frequently exploited for heap spraying, a technique that places specific byte sequences at predictable memory locations, increasing the reliability of exploiting memory corruption vulnerabilities and facilitating arbitrary code execution. Notably, exploits such as CVE-2016-6187, CVE-2021-22555, and CVE-2021-26708 have leveraged this system call for kernel heap spraying to achieve privilege escalation and kernel code execution. To counter this, Syd disables the msgsnd(2) system call by default, since its ability to allocate large, contiguous blocks of memory in the kernel heap makes it a common heap-spraying primitive. This reduces the attack surface and prevents attackers from leveraging this system call to bypass security mitigations and achieve kernel code execution. Administrators can re-enable this call using the trace/allow_unsafe_shm:1 option if required for legitimate inter-process communication needs, while the default configuration prioritizes security against such advanced exploitation techniques. Refer to the following links for more information:

https://en.wikipedia.org/wiki/Heap_spraying
https://grsecurity.net/how_autoslab_changes_the_memory_unsafety_game
https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
https://google.github.io/security-research/pocs/linux/cve-2021-22555/writeup.html
https://a13xp0p0v.github.io/2021/02/09/CVE-2021-26708.html

Denying Restartable Sequences

As of version 3.37.0, Syd denies access to the restartable sequences with the rseq(2) system call by default, substantially elevating the security baseline of the sandbox. The restartable sequences interface enables user space to register per-thread critical regions with kernel-enforced atomicity guarantees, but critically, also exposes a user-controlled abort handler address. In adversarial scenarios, this facility can be abused: attackers with the ability to manipulate process memory or rseq(2) registration can redirect execution to arbitrary, attacker-chosen code locations on preemption or CPU migration, bypassing intra-process isolation boundaries and subverting mechanisms such as memory protection keys or control-flow integrity. By prohibiting rseq(2), Syd eliminates this kernel-facilitated control-flow transfer primitive, foreclosing a sophisticated class of attacks that leverage restartable sequence state for privilege escalation, sandbox escape, or bypass of compartmentalization. This mitigation exemplifies a least-privilege syscall surface and strong adherence to modern threat models, allowing only strictly necessary system calls and neutralizing emergent attack vectors rooted in nuanced kernel-user collaboration. Administrators may explicitly re-enable this system call if required for compatibility using the trace/allow_unsafe_rseq:1 startup option, with the understanding that doing so weakens this critical security boundary. For more information, refer to the following links:

https://arxiv.org/abs/2108.03705
https://arxiv.org/abs/2406.07429
https://www.usenix.org/system/files/usenixsecurity24-yang-fangfei.pdf

Personality Syscall Restrictions

As of version 3.37.0, Syd implements comprehensive restrictions on the personality(2) system call to mitigate security vulnerabilities associated with unsafe personality(2) flags. Of particular concern are the ADDR_NO_RANDOMIZE flag, which disables Address Space Layout Randomization (ASLR), a fundamental memory protection mechanism that prevents reliable exploitation of memory corruption vulnerabilities by randomizing the memory layout, and the READ_IMPLIES_EXEC flag, which can bypass the memory protections provided by Memory-Deny-Write-Execute, aka W^X. This aligns Syd with industry-standard container runtimes including Docker and Podman, which employ identical restrictions to balance security with application compatibility by maintaining an allowlist of safe personality values: PER_LINUX for the standard Linux execution domain, PER_LINUX32 for 32-bit compatibility, UNAME26 for legacy kernel version reporting, PER_LINUX32|UNAME26 for combined 32-bit and legacy compatibility, and GET_PERSONALITY for querying the current personality(2) without modification. The implementation follows the principle of least privilege by denying all potentially dangerous personality(2) modifications while permitting only essential compatibility requirements, thereby preventing malicious actors from leveraging personality(2) flags to make exploits more predictable and reliable, a behavior specifically monitored by security detection systems. Administrators requiring unrestricted personality(2) access can disable these restrictions using trace/allow_unsafe_personality:1, though this should be done only after careful consideration, as it may expose the sandbox to personality-based security bypasses that compromise the isolation guarantees of Syd's broader hardening strategy.

As of version 3.47.0, Syd extends these protections by adding two more flags to the personality(2) "kill list": ADDR_COMPAT_LAYOUT, which forces a legacy, more predictable memory layout, and MMAP_PAGE_ZERO, which allows mapping page 0 and can turn NULL-pointer dereferences into code execution. Any attempt within the sandbox to enable READ_IMPLIES_EXEC, ADDR_NO_RANDOMIZE, ADDR_COMPAT_LAYOUT, or MMAP_PAGE_ZERO results in immediate termination of the offending process. During sandbox setup, Syd also proactively clears all four of these flags from the inherited personality(2), so that untrusted workloads always start with randomized address layouts and can neither rely on legacy low-entropy layouts nor exploit NULL-pointer mappings.
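
Both halves of this behavior reduce to a bit mask over the four flags. This Python sketch assumes the flag values from <linux/personality.h>; the helper names should_kill and sanitize_inherited are hypothetical, and the sketch only models the decisions, not how Syd enforces the termination. In a real setup step the current value would be read with the query argument 0xffffffff and the sanitized value written back with personality(2).

```python
# Flag values copied from the Linux uapi header <linux/personality.h>.
ADDR_NO_RANDOMIZE = 0x0040000
MMAP_PAGE_ZERO = 0x0100000
ADDR_COMPAT_LAYOUT = 0x0200000
READ_IMPLIES_EXEC = 0x0400000
# personality(0xffffffff) only queries, so it never enables a flag.
QUERY_ONLY = 0xFFFFFFFF

KILL_MASK = READ_IMPLIES_EXEC | ADDR_NO_RANDOMIZE | ADDR_COMPAT_LAYOUT | MMAP_PAGE_ZERO


def should_kill(persona: int) -> bool:
    """True if a personality(2) request would enable a kill-listed flag."""
    if persona == QUERY_ONLY:
        return False
    return (persona & KILL_MASK) != 0


def sanitize_inherited(persona: int) -> int:
    """Clear the kill-listed flags from an inherited personality value."""
    return persona & ~KILL_MASK


if __name__ == "__main__":
    inherited = 0x0008 | ADDR_NO_RANDOMIZE     # PER_LINUX32 plus a kill-listed flag
    print(hex(sanitize_inherited(inherited)))  # 0x8
    print(should_kill(inherited))              # True
```

The query value must be special-cased because 0xffffffff has every bit set and would otherwise match the mask.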

Thread-Level Filesystem and File-Descriptor Namespace Isolation

As of version 3.37.2, each of Syd's interrupt, IPC, and emulator worker threads calls unshare(2) with both CLONE_FS and CLONE_FILES, which gives it its own filesystem context (root, working directory, and umask) and its own file-descriptor table. This per-thread isolation ensures that changes to the working directory, umask(2), or open-file table in one thread can neither leak into nor be influenced by any other thread, closing subtle attack vectors such as TOCTOU races on shared procfs(5) or file descriptor entries, descriptor reuse across threads, and cwd-based side channels. By scoping filesystem state and descriptor tables to each thread, this enhancement hardens Syd's sandbox manager against advanced multithreading exploits and preserves strict separation between the monitoring and emulation components.
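
The effect of unsharing CLONE_FS can be observed from ordinary user code. The following Python sketch is not Syd's implementation; it assumes Python 3.12 or later for os.unshare, and the helper name chdir_in_isolated_thread is hypothetical. It unshares CLONE_FS | CLONE_FILES in a worker thread, changes directory there, and reports the main thread's working directory afterwards. Neither flag requires privileges, but a container's seccomp policy may still refuse unshare(2), in which case the helper returns None.

```python
import os
import tempfile
import threading


def chdir_in_isolated_thread(target: str):
    """Unshare CLONE_FS | CLONE_FILES in a worker thread, then chdir there.

    Returns (worker_cwd, main_cwd_afterwards), or None when os.unshare is
    missing (Python < 3.12) or the kernel/seccomp policy refuses the call.
    """
    if not hasattr(os, "unshare"):
        return None
    outcome = {}

    def worker() -> None:
        try:
            # Private copies of this thread's fs_struct (root, cwd, umask)
            # and of its file-descriptor table; other threads keep theirs.
            os.unshare(os.CLONE_FS | os.CLONE_FILES)
        except OSError:
            return
        os.chdir(target)  # affects only this thread from now on
        outcome["worker_cwd"] = os.getcwd()

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    if "worker_cwd" not in outcome:
        return None
    return outcome["worker_cwd"], os.getcwd()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(os.getcwd())
        print(chdir_in_isolated_thread(scratch))
```

When the unshare succeeds, the returned tuple shows the scratch directory for the worker and the original directory for the main thread, which is exactly the cross-thread leak that per-thread isolation rules out.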

Denying MSG_OOB Flag in send/recv System Calls

As of version 3.37.5, Syd by default denies the use of the MSG_OOB flag in all send(2), sendto(2), sendmsg(2), and sendmmsg(2) calls, regardless of socket family, by returning the EOPNOTSUPP ("Operation not supported on transport endpoint") errno(3). As of version 3.41.1, the restriction also covers the system calls recv(2), recvfrom(2), recvmsg(2), and recvmmsg(2). This measure addresses long-standing security concerns with out-of-band messaging semantics in stream sockets, where urgent data bypasses the normal in-order delivery rules and is handled via separate kernel paths. Such semantics are rarely required by modern software, yet they introduce complexity and subtle state transitions into the kernel's networking stack, which have historically led to memory safety bugs and race conditions exploitable from unprivileged code. Removing MSG_OOB support therefore reduces the kernel attack surface for sandboxed processes without affecting typical application behavior. For controlled environments where MSG_OOB is explicitly required, Syd provides the opt-in trace/allow_unsafe_oob:1 option to restore the legacy behavior, although enabling it reintroduces the risks inherent in out-of-band data handling. This mitigation is enabled by default on all architectures without the socketcall(2) multiplexer, namely aarch64, arm, loongarch64, mips64, mipsel64, parisc, parisc64, riscv64, x32, and x86_64. It is not supported on the architectures x86, m68k, mips, mipsel, ppc, ppc64, ppc64le, s390, s390x, sheb, and sh. For more information refer to the following links:

https://googleprojectzero.blogspot.com/2025/08/from-chrome-renderer-code-exec-to-kernel.html
https://chromium-review.googlesource.com/c/chromium/src/+/6711812
https://u1f383.github.io/linux/2025/10/03/analyze-linux-kernel-1-day-0aeb54ac.html
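
The check reduces to a single bit test on the flags argument, whose position differs between the covered calls. This is a hedged Python model of that decision, not Syd's BPF filter; the names FLAGS_ARG_INDEX and check_oob are hypothetical, and the argument positions follow the kernel prototypes of the listed calls.

```python
import errno
import socket

# Position of the flags argument for each covered call, following the
# kernel prototypes, e.g. sendmmsg(fd, msgvec, vlen, flags).
FLAGS_ARG_INDEX = {
    "send": 3, "sendto": 3, "sendmsg": 2, "sendmmsg": 3,
    "recv": 3, "recvfrom": 3, "recvmsg": 2, "recvmmsg": 3,
}


def check_oob(syscall: str, args: tuple, allow_unsafe_oob: bool = False) -> int:
    """Return 0 to let the call proceed or -EOPNOTSUPP to reject MSG_OOB."""
    index = FLAGS_ARG_INDEX.get(syscall)
    if allow_unsafe_oob or index is None:
        return 0
    if args[index] & socket.MSG_OOB:
        return -errno.EOPNOTSUPP
    return 0


if __name__ == "__main__":
    # sendto(fd, buf, len, flags, dest_addr, addrlen) with urgent data.
    print(check_oob("sendto", (3, 0, 1, socket.MSG_OOB, 0, 0)) == -errno.EOPNOTSUPP)  # True
```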

Denying O_NOTIFICATION_PIPE Flag in pipe2

As of version 3.37.5, Syd denies the use of the O_NOTIFICATION_PIPE flag in pipe2(2) by returning the ENOPKG ("Package not installed") errno(3), unless the trace/allow_unsafe_pipe:1 option is provided at startup. This restriction addresses the security risks associated with notification pipes, a specialized and seldom-used mechanism for delivering kernel event notifications (currently only from the keys subsystem) to userspace when the kernel is built with CONFIG_WATCH_QUEUE. Unlike normal pipes, notification pipes have distinct semantics and are tightly integrated with kernel internals, which makes for a more complex and less widely audited code path. Past vulnerabilities in notification pipe handling have shown that exposing this functionality to unprivileged, sandboxed code creates exploitable kernel attack surface. Because typical sandboxed applications, including high-risk workloads such as browser renderers, have no legitimate need for notification pipes, Syd disables this flag by default, thereby eliminating a low-value yet high-risk kernel interface. The trace/allow_unsafe_pipe:1 option can be used to re-enable this capability for controlled testing or compatibility purposes, but doing so reintroduces the underlying security concerns. Refer to the following links for more information:

https://chromium-review.googlesource.com/c/chromium/src/+/4128252
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?qt=grep&q=watch_queue
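
The kernel's <linux/watch_queue.h> defines O_NOTIFICATION_PIPE as an alias of O_EXCL, so detecting a notification pipe is a single flag test on the pipe2(2) argument. This Python sketch models the default policy and the opt-out; check_pipe2 is a hypothetical name and the code is an illustration, not Syd's filter.

```python
import errno
import os

# <linux/watch_queue.h> defines O_NOTIFICATION_PIPE as O_EXCL, which
# pipe2(2) does not otherwise accept.
O_NOTIFICATION_PIPE = os.O_EXCL


def check_pipe2(flags: int, allow_unsafe_pipe: bool = False) -> int:
    """Return 0 to allow pipe2(2) or -ENOPKG to refuse a notification pipe."""
    if not allow_unsafe_pipe and (flags & O_NOTIFICATION_PIPE):
        return -errno.ENOPKG
    return 0


if __name__ == "__main__":
    print(check_pipe2(os.O_CLOEXEC | os.O_NONBLOCK))                         # 0
    print(check_pipe2(os.O_CLOEXEC | O_NOTIFICATION_PIPE) == -errno.ENOPKG)  # True
```

Ordinary pipes created with O_CLOEXEC, O_NONBLOCK, or O_DIRECT are unaffected, which is why the mitigation costs typical applications nothing.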

madvise(2) Hardening

As of version 3.41.3, Syd tightens its seccomp(2) BPF policy by argument-filtering madvise(2) against an allow-list of advice that is safe for untrusted workloads and has well-understood locality: MADV_SEQUENTIAL, MADV_DONTNEED, MADV_REMOVE, MADV_HUGEPAGE, MADV_NOHUGEPAGE, MADV_DONTDUMP, MADV_COLLAPSE, MADV_POPULATE_READ, MADV_POPULATE_WRITE, and (since Linux 6.13) the lightweight guard operations MADV_GUARD_INSTALL and MADV_GUARD_REMOVE, which install page-table-level red zones that fault on access without VMA churn. MADV_HWPOISON is denied, and all other advice is treated as a no-op, because these values enable cross-domain information leaks or system-wide pressure channels with no isolation benefit. For example, MADV_MERGEABLE drives KSM deduplication, which has repeatedly been shown to enable cross-VM and cross-process side channels and targeted bit-flip exploitation (Flip Feng Shui), as well as newer remote and timing channels. MADV_WILLNEED and MADV_RANDOM manipulate page-cache residency and prefetch behavior, which underpin page-cache side-channel attacks. Reclaim steering such as MADV_FREE, MADV_COLD, and MADV_PAGEOUT introduces externally observable memory-pressure and timing signals, as well as accounting ambiguity, that sandboxes should not expose. Privileged page-state changes such as MADV_SOFT_OFFLINE and MADV_HWPOISON are unnecessary in least-authority contexts and remain outside the sandbox contract even if capability checks would reject them. This design follows the strict syscall-and-argument allow-listing discipline also employed by Google's Sandbox2 and Sandboxed API, while remaining specific to Syd's threat model. To relax this mitigation temporarily for tracing or compatibility, set trace/allow_unsafe_madvise:1 at startup; otherwise unsafe advice remains blocked by default. Refer to the following links for more information:

https://www.usenix.org/system/files/conference/usenixsecurity16/sec16_paper_razavi.pdf
https://www.ndss-symposium.org/wp-content/uploads/2022-81-paper.pdf
https://svs.informatik.uni-hamburg.de/publications/2024/Lindemann_ACSAC2024_FakeDD.pdf
https://arxiv.org/pdf/1901.01161
https://lwn.net/Articles/790123/
https://lwn.net/Articles/1011366/
https://developers.google.com/code-sandboxing/sandbox2/explained
https://developers.google.com/code-sandboxing/sandboxed-api/explained
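
The policy has three outcomes: forward the call, deny it, or turn it into a no-op. The following Python lookup sketches that classification under the assumption of the asm-generic advice numbers from <asm-generic/mman-common.h> (values may differ on some architectures); madvise_action is a hypothetical name, and the sketch does not model trace/allow_unsafe_madvise:1.

```python
# Advice numbers from <asm-generic/mman-common.h>; values may differ on
# some architectures.
MADV_SEQUENTIAL = 2
MADV_WILLNEED = 3
MADV_DONTNEED = 4
MADV_REMOVE = 9
MADV_MERGEABLE = 12
MADV_HUGEPAGE = 14
MADV_NOHUGEPAGE = 15
MADV_DONTDUMP = 16
MADV_POPULATE_READ = 22
MADV_POPULATE_WRITE = 23
MADV_COLLAPSE = 25
MADV_HWPOISON = 100
MADV_GUARD_INSTALL = 102
MADV_GUARD_REMOVE = 103

ALLOWED_ADVICE = frozenset({
    MADV_SEQUENTIAL, MADV_DONTNEED, MADV_REMOVE, MADV_HUGEPAGE,
    MADV_NOHUGEPAGE, MADV_DONTDUMP, MADV_COLLAPSE, MADV_POPULATE_READ,
    MADV_POPULATE_WRITE, MADV_GUARD_INSTALL, MADV_GUARD_REMOVE,
})


def madvise_action(advice: int) -> str:
    """Classify madvise(2) advice under the default policy described above."""
    if advice in ALLOWED_ADVICE:
        return "allow"  # forwarded to the kernel
    if advice == MADV_HWPOISON:
        return "deny"   # rejected with an error
    return "noop"       # treated as a no-op


if __name__ == "__main__":
    for name, value in [("MADV_DONTNEED", MADV_DONTNEED),
                        ("MADV_MERGEABLE", MADV_MERGEABLE),
                        ("MADV_HWPOISON", MADV_HWPOISON)]:
        print(name, madvise_action(value))
```

Turning unknown advice into a no-op rather than an error keeps applications that issue purely advisory hints working, since madvise(2) advice of that kind carries no semantic guarantee.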

setsockopt(2) Hardening

As of version 3.46.1, Syd introduces a fine-grained setsockopt(2) hardening layer that denies a curated set of historically fragile or highly privileged socket(2) options by matching on the (level, optname) pair in a dedicated seccomp(2) filter. The set covers netfilter rule programming (iptables, ip6tables, arptables, ebtables), multicast routing control, IPv4/IPv6 multicast group management, IPv6 header manipulation, TCP repair and upper-layer protocol (ULP) hooks, congestion control selection, UDP corking, AF_PACKET ring and fanout configuration, BPF-based socket filters, and VSOCK buffer sizing. Syd turns these calls into no-ops: setsockopt(2) reports success, but the request is silently discarded. This preserves compatibility with applications that merely probe for these features without relying on their semantics, while removing a substantial kernel attack surface reachable from unprivileged code. This mitigation is enabled by default on all architectures without the socketcall(2) multiplexer, namely aarch64, arm, loongarch64, mips64, mipsel64, parisc, parisc64, riscv64, x32, and x86_64. It is not supported on the architectures x86, m68k, mips, mipsel, ppc, ppc64, ppc64le, s390, s390x, sheb, and sh. The mitigation may be relaxed at startup using the option trace/allow_unsafe_setsockopt:1. Refer to the following links for more information:

https://nvd.nist.gov/vuln/detail/CVE-2016-9793
https://www.cve.org/CVERecord?id=CVE-2016-9793
https://security-tracker.debian.org/tracker/CVE-2016-9793
https://ubuntu.com/security/CVE-2016-9793
https://www.exploit-db.com/exploits/41995
https://nvd.nist.gov/vuln/detail/CVE-2017-6346
https://www.cve.org/CVERecord?id=CVE-2017-6346
https://security-tracker.debian.org/tracker/CVE-2017-6346
https://ubuntu.com/security/CVE-2017-6346
https://www.cvedetails.com/cve/CVE-2017-6346/
https://nvd.nist.gov/vuln/detail/CVE-2018-18559
https://www.cve.org/CVERecord?id=CVE-2018-18559
https://security-tracker.debian.org/tracker/CVE-2018-18559
https://ubuntu.com/security/CVE-2018-18559
https://www.cvedetails.com/cve/CVE-2018-18559/
https://nvd.nist.gov/vuln/detail/CVE-2020-14386
https://www.openwall.com/lists/oss-security/2020/09/03/3
https://unit42.paloaltonetworks.com/cve-2020-14386/
https://sysdig.com/blog/cve-2020-14386-falco
https://gvisor.dev/blog/2020/09/18/containing-a-real-vulnerability/
https://www.cve.org/CVERecord?id=CVE-2007-1353
https://nvd.nist.gov/vuln/detail/CVE-2007-1353
https://security-tracker.debian.org/tracker/CVE-2007-1353
https://ubuntu.com/security/CVE-2007-1353
https://bugzilla.redhat.com/show_bug.cgi?id=CVE-2007-1353
https://ssd-disclosure.com/ssd-advisory-linux-kernel-af_packet-use-after-free-2/
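
Matching on (level, optname) pairs is a set-membership test. This Python sketch uses a small illustrative subset of the categories listed above, not Syd's full list, with x86_64/asm-generic constant values (SOL_SOCKET and some SO_* values differ on other architectures); setsockopt_action is a hypothetical name. One way a seccomp(2) filter can express such a no-op is SECCOMP_RET_ERRNO with an errno of 0, which skips the call and returns 0; the sketch does not claim Syd does exactly this.

```python
# x86_64 / asm-generic values; SOL_SOCKET and some SO_* values differ on
# other architectures.
IPPROTO_IP = 0
SOL_SOCKET = 1
IPPROTO_TCP = 6
IPPROTO_UDP = 17
SOL_PACKET = 263

IPT_SO_SET_REPLACE = 64  # iptables rule programming
IP_ADD_MEMBERSHIP = 35   # IPv4 multicast group management
SO_ATTACH_FILTER = 26    # classic BPF socket filter
TCP_CONGESTION = 13
TCP_REPAIR = 19
TCP_ULP = 31
UDP_CORK = 1
PACKET_RX_RING = 5
PACKET_FANOUT = 18

# Illustrative subset of (level, optname) pairs turned into no-ops.
NOOP_PAIRS = frozenset({
    (IPPROTO_IP, IPT_SO_SET_REPLACE),
    (IPPROTO_IP, IP_ADD_MEMBERSHIP),
    (SOL_SOCKET, SO_ATTACH_FILTER),
    (IPPROTO_TCP, TCP_REPAIR),
    (IPPROTO_TCP, TCP_ULP),
    (IPPROTO_TCP, TCP_CONGESTION),
    (IPPROTO_UDP, UDP_CORK),
    (SOL_PACKET, PACKET_RX_RING),
    (SOL_PACKET, PACKET_FANOUT),
})


def setsockopt_action(level: int, optname: int) -> str:
    """Return "noop" (report success, discard) or "pass" (forward to kernel)."""
    return "noop" if (level, optname) in NOOP_PAIRS else "pass"


if __name__ == "__main__":
    print(setsockopt_action(IPPROTO_TCP, TCP_REPAIR))  # noop
    print(setsockopt_action(IPPROTO_TCP, 1))           # pass (TCP_NODELAY)
```

Matching the pair rather than the option name alone matters because optname values are only unique within a level: for example, UDP_CORK and TCP_NODELAY share the value 1.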

Hardening Against Kernel Pointer Misuse

As of version 3.48.0, Syd hardens against kernel pointer misuse by default. This mitigation deploys a seccomp(2) BPF filter that inspects the system call arguments known to accept pointers. If a user-supplied argument points into kernel memory, the filter fails the system call with EFAULT ("Bad address") before it reaches the host kernel. This defense-in-depth measure effectively neutralizes a class of critical vulnerabilities in which the kernel fails to validate that a user-supplied pointer resides in user-space memory (e.g. missing access_ok() checks), typically leading to arbitrary kernel memory corruption. A seminal example of such a vulnerability is CVE-2017-5123, where the waitid(2) system call failed to validate its infop argument, allowing unprivileged users to trigger arbitrary kernel writes. To disable this mitigation, set the option trace/allow_unsafe_kptr:1 at startup. Refer to the following links for more information:

https://lwn.net/Articles/736348/
https://www.cvedetails.com/cve/CVE-2017-5123/
https://salls.github.io/Linux-Kernel-CVE-2017-5123/
https://github.com/salls/kernel-exploits/blob/master/CVE-2017-5123/exploit_smap_bypass.c
https://www.cvedetails.com/cve/CVE-2018-1000199
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f67b15037a7a
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=27747f8bc355
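
The check compares each pointer argument against the start of kernel address space. This Python sketch assumes x86_64, where user-space addresses have bit 63 clear and kernel addresses have it set; other architectures split the address space differently. The POINTER_ARGS table and check_pointer_args helper are hypothetical illustrations, not the set of system calls Syd actually filters.

```python
import errno

# Assumption: x86_64, where user-space addresses have bit 63 clear and
# kernel addresses have it set. A real filter needs a per-arch bound.
KERNEL_SPACE_START = 1 << 63

# Hypothetical table of pointer arguments, indexed by argument position.
POINTER_ARGS = {
    # waitid(which, upid, infop, options, ru): infop and ru are pointers.
    "waitid": (2, 4),
    # read(fd, buf, count) and write(fd, buf, count).
    "read": (1,),
    "write": (1,),
}


def check_pointer_args(syscall: str, args: tuple) -> int:
    """Return -EFAULT if any pointer argument points into kernel space."""
    for index in POINTER_ARGS.get(syscall, ()):
        if args[index] >= KERNEL_SPACE_START:
            return -errno.EFAULT
    return 0


if __name__ == "__main__":
    # A CVE-2017-5123 style call: infop aimed at a kernel address.
    print(check_pointer_args("waitid", (0, 0, 0xFFFF888000000000, 4, 0)) == -errno.EFAULT)  # True
```

NULL and other low addresses pass this test by design: they are user-space values, and the kernel's own validation reports them as faults where appropriate.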

Hardening Executable Mappings

As of version 3.48.0, Syd performs self-hardening by enforcing immutability and Execute-Only Memory (XOM) protections on its own executable mappings during initialization. This mitigation, inspired by the OpenBSD mimmutable(2) system call introduced by Theo de Raadt, aims to protect the sandbox monitor itself from compromise and from code-reuse attacks such as Return-Oriented Programming (ROP). Syd iterates over its executable Virtual Memory Areas (VMAs), applies mprotect(2) to limit their permissions to PROT_EXEC (removing PROT_READ), and applies mseal(2) to render them immutable. Together these operations prevent attackers from scanning the text segment for gadgets or remapping memory to bypass W^X (Write XOR Execute) policies. Note that this hardening is applied on a best-effort basis: mseal(2) is only available on 64-bit Linux kernels (version 6.10+), and execute-only support in mprotect(2) depends on the underlying architecture and kernel configuration. The hardening may be disabled at startup using the option trace/allow_unsafe_noxom:1. Refer to the following links for more information:

https://lwn.net/Articles/779478/
https://lwn.net/Articles/948129/
https://lwn.net/Articles/958438/
https://lwn.net/Articles/978010/
https://lwn.net/Articles/1006375/
https://man.openbsd.org/mimmutable.2
https://www.openbsd.org/papers/csw2023.pdf
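
Finding the ranges to protect starts with the process's own mapping list. This Python sketch parses the /proc/<pid>/maps format and lists the executable VMAs; the helper name executable_mappings is hypothetical, and the sketch only marks in comments where mprotect(2) and mseal(2) would be applied rather than calling them.

```python
def executable_mappings(maps_text: str) -> list:
    """Return (start, length, path) for every executable VMA in maps text.

    Each line of /proc/<pid>/maps has the form
    "start-end perms offset dev inode [path]".
    """
    found = []
    for line in maps_text.splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) < 5 or "x" not in fields[1]:
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        path = fields[5].rstrip() if len(fields) == 6 else ""
        found.append((start, end - start, path))
    return found


if __name__ == "__main__":
    with open("/proc/self/maps") as maps:
        for start, length, path in executable_mappings(maps.read()):
            # A hardening step would now call mprotect(start, length,
            # PROT_EXEC) and then mseal(start, length, 0) on this range.
            print(f"{start:#x} +{length:#x} {path or '[anon]'}")
```

The order matters: permissions have to be reduced before sealing, because once a range is sealed with mseal(2) further mprotect(2) calls on it fail.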

Stack Pivot Detection

As of version 3.48.0, Syd detects and blocks "stack pivot" attacks during process execution via execve(2) or execveat(2). Stack pivoting is a primitive often used in Return-Oriented Programming (ROP): the attacker points the stack pointer (SP) at a controlled memory region (e.g., the heap or BSS) from which a ROP chain is then executed. Inspired by the MAP_STACK protection in OpenBSD, Syd enforces stack integrity by verifying that the stack pointer at the time of the exec call resides within the legitimate [stack] Virtual Memory Area (VMA). If the stack pointer lies outside this region, the process is immediately terminated with the SIGKILL signal, stopping the attack before the exec can take effect. This check acts as a robust safeguard against ROP and Jump-Oriented Programming (JOP) exploits that hijack execution flow by pivoting the stack. The feature is enabled by default and can be disabled if necessary using the trace/allow_unsafe_pivot_stack:1 option, although doing so drastically reduces the resilience of the sandbox against memory corruption exploits. Refer to the following links for more information:

http://phrack.org/issues/58/4.html
https://dl.acm.org/doi/10.1145/1315245.1315313
https://man.openbsd.org/mmap.2
https://www.openbsd.org/papers/hackfest2015-pledge/mgp00001.html
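
At its core the check is a range test against the [stack] entry of /proc/<pid>/maps. This Python sketch is a simplified model, not Syd's implementation: the helper names are hypothetical, the stack pointer is passed in directly (a monitor could obtain it, for example, from /proc/<pid>/syscall of a task blocked in a system call), and treating a missing [stack] VMA as a pivot is an assumption.

```python
def stack_vma(maps_text: str):
    """Return (start, end) of the [stack] VMA in maps text, or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[5] == "[stack]":
            start, end = (int(part, 16) for part in fields[0].split("-"))
            return start, end
    return None


def is_stack_pivoted(sp: int, maps_text: str) -> bool:
    """True if the stack pointer lies outside the [stack] VMA."""
    vma = stack_vma(maps_text)
    if vma is None:
        return True  # assumption: fail closed if no [stack] VMA is found
    start, end = vma
    return not (start <= sp < end)


if __name__ == "__main__":
    maps = "7ffd2a000000-7ffd2a021000 rw-p 00000000 00:00 0 [stack]\n"
    print(is_stack_pivoted(0x7FFD2A020F00, maps))  # False: SP inside [stack]
    print(is_stack_pivoted(0x55D0C1000100, maps))  # True: SP points elsewhere
```

The upper bound is exclusive because the end address printed in maps is one past the last byte of the mapping.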

HISTORY & DESIGN

sydbox-0 https://git.sr.ht/~alip/syd/tree/sydbox-0 is a ptrace(2) based sandbox.
sydbox-1 https://git.sr.ht/~alip/syd/tree/sydbox-1 is a ptrace(2) and seccomp(2) based sandbox.
sydbox-2 https://git.sr.ht/~alip/syd/tree/sydbox-1 is a seccomp(2) and seccomp-notify based sandbox.
sydbox-3 is a rewrite of sydbox-2 in Rust and it's what you are looking at.

This codebase has a history of a bit over 15 years, and up to this point we have used C11 as our implementation language for various reasons. With sydbox-3 we are taking a step forward and writing the sandbox from scratch in the Rust programming language, with libseccomp as the only non-Rust dependency. Although we inherit many ideas and design decisions from the old codebase, we do not shy away from radically changing the internal implementation to make it much simpler, more idiomatic, and less prone to bugs. We have had proper multiarch support since release 3.0.11, e.g. on x86-64 you can run your x32 or x86 binaries just fine under Syd.

This version takes advantage of multithreading: it handles system calls using a thread pool whose size equals the number of CPUs on the running machine, and it utilises globsets to match a list of patterns at once, so it continues to perform reasonably well even with very long rulesets. This version also comes with five new sandboxing categories: Lock Sandboxing, Memory Sandboxing, PID Sandboxing, Stat Sandboxing, and Force Sandboxing. Lock Sandboxing utilises the Landlock Linux Security Module (LSM), Memory Sandboxing allows the user to define a per-process memory limit, PID Sandboxing allows the user to limit the maximum number of running tasks under the sandbox, Stat Sandboxing can be used to effectively hide files and directories from the sandboxed process, and Force Sandboxing can be used to verify file checksums prior to exec, similar to HardenedBSD's Integriforce and NetBSD's Veriexec.

Finally, the new Syd has support for namespaces. Use e.g. syd -munshare/user:1 to create a user namespace. You may use mount, uts, ipc, pid, net, and cgroup instead of user to create various namespaces. You may use the container profile as a shorthand to create namespaces with syd -pcontainer.

You may use Syd as your login shell, which is a practical way to set up a restricted user. To do this, add /path/to/syd to the file /etc/shells and run chsh -s /path/to/syd username as root. In this mode the sandbox may be configured using the files /etc/user.syd-3 and ~/.user.syd-3. To prevent users from changing the sandbox configuration, lock the sandbox by adding lock:on at the end of the site-wide configuration file.

EXHERBO

Syd is the default sandbox of Exherbo Linux. We use it to provide a restricted environment under which package builds run with controlled access to file system and network resources. exheres-0 has a function called esandbox to interact with Syd.

AUTHORS

Maintained by Ali Polatel. Up-to-date sources can be found at https://gitlab.exherbo.org/sydbox/sydbox.git and bugs/patches can be submitted to https://gitlab.exherbo.org/groups/sydbox/-/issues. Discuss in #sydbox on Libera Chat or in #sydbox:mailstation.de on Matrix.