FreeBSD capsicum - practice
Modern OSes provide different mechanisms to isolate userland applications from each other. This is important because the CPU provides only limited protection, which mostly guards against basic improper access. The recent vulnerabilities in CPUs from Intel and other vendors and architectures clearly showed that security was not, and never will be, the first priority. Performance comes first, but it drops because we have to apply more and more security layers and countermeasures, i.e. isolation and other techniques, to make sure that everything is limited to its role on the system.
The following mechanisms are available on these OSes:
- Windows [Windows ACLs and SIDs]
- Linux [chroot, SELinux, seccomp, AppArmor]
- FreeBSD [jail, chroot, capsicum, sandbox(developed by me)]
- OpenBSD [chroot, pledge]
- Mac OS X [seatbelts-sandbox, chroot]
Linux has more programs for supporting access control security policies and other mechanisms. OpenBSD is secure by design, i.e. through auditing and removing unnecessary code. The OpenBSD 5.9 release added the pledge(2) mechanism for putting a process into a “restricted-service operating mode”. Windows itself is not really secure, no comments. Mac OS X has seatbelts-sandbox, which is path-based and is also used to isolate applications on iOS. FreeBSD has jails. This is actually OS-level virtualization and does not necessarily isolate running programs from each other or apply any policies. The sandbox developed by me is based on the MAC Framework and functions like the Mac OS X seatbelts, but the sandbox for FreeBSD is less functional. Finally, there is Capsicum, which FreeBSD has supported since the release of version 10. There are plenty of articles describing how Capsicum works, so there is no need to describe it once again. In this article, practical usage of the Capsicum sandbox will be demonstrated.
The main drawback of using Capsicum, in my opinion, is the need to build it into your program. Also, at the moment Capsicum in capability mode has one limitation concerning fexecve(). When a program that is already in capability mode tries to execute another program with fexecve(), the following message is displayed:
ELF interpreter /libexec/ld-elf.so.1 not found, error 94
This happens because normal access to the filesystem is prevented in capability mode (error 94 is ECAPMODE). A comment in the FreeBSD kernel sources says:
"While capability mode can't reach this point via direct path arguments to execve(), we also don't allow interpreters to be used in capability mode (for now). Catch indirect lookups and return a permissions error."
For now, a sandboxed program cannot execute any other programs, so anything that has to be executed must be run before entering capability mode.
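The ordering constraint described above can be sketched as follows. This is a minimal illustration, not code from the original program: run_helper() and enter_sandbox() are hypothetical names, and the `#ifdef __FreeBSD__` guard is an assumption added only so that the sketch also builds on systems without Capsicum, where enter_sandbox() does nothing.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/capsicum.h>
#endif

/*
 * Hypothetical helper: run argv[0] to completion and return its exit
 * status, or -1 on failure. It relies on execv(), so it has to be
 * called BEFORE the process enters capability mode.
 */
int run_helper(char *const argv[])
{
    pid_t pid;
    int status;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(argv[0], argv);
        _exit(127);    /* reached only if execv() failed */
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/*
 * Hypothetical helper: enter capability mode. ENOSYS (a kernel built
 * without Capsicum support) is treated as non-fatal.
 */
int enter_sandbox(void)
{
#ifdef __FreeBSD__
    if (cap_enter() < 0 && errno != ENOSYS)
        return -1;
#endif
    return 0;
}
```

A caller would invoke run_helper() for every external program it needs and only then call enter_sandbox(); with the two calls swapped, the exec is expected to fail.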
Practice?
In order to properly sandbox your program, it is necessary to know which resources the program needs access to. If your program is large, figuring everything out will take some time. The available rights are defined in the sys/sys/capsicum.h header file, under the comment "Possible rights on capabilities".
First, a cap_rights_t structure, which holds the rights, has to be initialized.
struct cap_rights
{
    uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];
};
typedef struct cap_rights cap_rights_t;
The cap_rights_t structure is initialized with cap_rights_init(). The required capability rights are passed as the second and following arguments of the function. For instance:
// Limit the rights on standard input (fd 0). This step is optional.
cap_rights_t self_rights;
cap_rights_init(&self_rights, CAP_FCNTL, CAP_FSTAT, CAP_IOCTL, CAP_READ);
// cap_rights_limit() applies to a single descriptor, here fd 0
if (cap_rights_limit(0, &self_rights) < 0)
{
    err(1, "cap_rights_limit() failed, could not restrict capabilities");
}
Or limit the specific file descriptors the program works with, as cmp(1) does:
//usr.bin/cmp/cmp.c
cap_rights_init(&rights, CAP_FCNTL, CAP_FSTAT, CAP_MMAP_R);
if (cap_rights_limit(fd1, &rights) < 0 && errno != ENOSYS)
    err(ERR_EXIT, "unable to limit rights for %s", file1);
if (cap_rights_limit(fd2, &rights) < 0 && errno != ENOSYS)
    err(ERR_EXIT, "unable to limit rights for %s", file2);
The ioctls and fcntls are limited separately:
// allow selected ioctls (nitems() comes from <sys/param.h>)
unsigned long cmds[] = { TIOCGETA, TIOCGWINSZ };
if (cap_ioctls_limit(0, cmds, nitems(cmds)) < 0)
{
    err(1, "cap_ioctls_limit() failed, could not restrict capabilities");
}
// allow selected fcntls
if (cap_fcntls_limit(0, CAP_FCNTL_GETFL) < 0)
{
    err(1, "cap_fcntls_limit() failed, could not restrict capabilities");
}
The return value of the *_limit functions should always be checked to verify that the operations actually succeeded, just as for setgroups(2) and setgid(2).
And finally, the mysterious function that triggers the magic of isolation! Like any "magic", it works only as precisely as you cast it, lol.
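One way to make that check hard to forget is to wrap it in a small helper. The sketch below is not taken from the original program: limit_fd_read_only() is a hypothetical name, the choice of CAP_READ and CAP_FSTAT is only illustrative, and tolerating ENOSYS follows the cmp.c pattern shown earlier. The `#ifdef __FreeBSD__` guard is an assumption that lets the sketch compile on other systems, where the helper does nothing.

```c
#include <errno.h>
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/capsicum.h>
#endif

/*
 * Hypothetical helper: restrict fd to reading and fstat().
 * Returns 0 on success or when Capsicum is unavailable (ENOSYS),
 * and -1 on any other failure, so the caller has one value to check.
 */
int limit_fd_read_only(int fd)
{
#ifdef __FreeBSD__
    cap_rights_t rights;

    cap_rights_init(&rights, CAP_READ, CAP_FSTAT);
    if (cap_rights_limit(fd, &rights) < 0 && errno != ENOSYS)
        return -1;
#else
    (void)fd;    /* no Capsicum on this system: nothing to limit */
#endif
    return 0;
}
```

The caller still has to act on the result, for example by calling err() when the helper returns -1. Reads on the limited descriptor keep working, while operations outside the granted rights are expected to be refused by the kernel.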
printf("entering cap mode\n");
// entering cap mode
if (cap_enter() < 0)
{
    // Failed to sandbox. ENOSYS means the kernel was built without
    // capability mode support; any other error is fatal.
    if (errno != ENOSYS)
    {
        err(1, "failed to enter security sandbox");
    }
}
else
{
    // errno is not meaningful here, so use errx() instead of err()
    if (!cap_sandboxed())
    {
        errx(1, "we are not in sandbox");
    }
}
After calling cap_enter(), the program becomes sandboxed. cap_sandboxed() is used to confirm that the program was sandboxed.
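To see what "sandboxed" means in practice, the effect can be probed from a child process so that the parent stays unrestricted. The following is a minimal sketch under stated assumptions: probe_capability_mode() and the probe_result values are hypothetical names, and the expectation that open() on a path fails with ECAPMODE reflects the filesystem restriction described earlier. On systems without Capsicum the sketch only reports that the feature is unsupported.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/capsicum.h>
#endif

/* Hypothetical result codes, passed back as the child's exit status. */
enum probe_result {
    PROBE_UNSUPPORTED,  /* no Capsicum on this system or kernel */
    PROBE_DENIED,       /* open() was refused with ECAPMODE */
    PROBE_LEAKED,       /* open() succeeded inside the sandbox */
    PROBE_ERROR         /* anything else went wrong */
};

/* Fork, enter capability mode in the child, and try to open path. */
enum probe_result probe_capability_mode(const char *path)
{
#ifdef __FreeBSD__
    pid_t pid;
    int status;

    pid = fork();
    if (pid < 0)
        return PROBE_ERROR;
    if (pid == 0) {
        int fd;

        if (cap_enter() < 0)
            _exit(errno == ENOSYS ? PROBE_UNSUPPORTED : PROBE_ERROR);
        fd = open(path, O_RDONLY);
        if (fd < 0 && errno == ECAPMODE)
            _exit(PROBE_DENIED);
        _exit(fd >= 0 ? PROBE_LEAKED : PROBE_ERROR);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return PROBE_ERROR;
    return (enum probe_result)WEXITSTATUS(status);
#else
    (void)path;
    return PROBE_UNSUPPORTED;
#endif
}
```

Running the probe in a forked child is a design choice: capability mode cannot be left once entered, so the parent process would otherwise lose access to the global namespaces as well.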
Example
In this example, we are going to sandbox a program that works with sockets and the network. The original code is available at https://www.nixd.org/en/freebsd/freebsd-capsicum-practice because I cannot publish code here.
What is going on in the listing? The cap_rights_t socket_rights is initialized with CAP_SOCK_SERVER, which is a predefined set of socket rights.
Each socket fd is limited to the prepared cap_rights with the following line of code:
cap_rights_limit(sockfd, sock_right)
The messages mean the following: if bind() fails, "TEST FAILED" is printed; on success, "TEST SUCCESS" is printed.
After entering capability mode, we repeat the attempt to bind a socket, this time sockfd4, which will fail.
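The steps described above can be sketched roughly as follows. This is not the original listing, only a reconstruction of the idea under assumptions: socket_bind_test(), bind_loopback() and the result bits are hypothetical names, the sockets are bound to the loopback address on a kernel-chosen port, and the whole test runs in a forked child. That the second bind() fails after cap_enter() is the behaviour reported in this article, not something the sketch guarantees. Without Capsicum the guard compiles the sandboxing out and both binds are simply attempted.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/capsicum.h>
#endif

/* Hypothetical bits of the value returned by socket_bind_test(). */
#define BIND_BEFORE_OK 0x01  /* bind() before cap_enter() succeeded */
#define BIND_AFTER_OK  0x02  /* bind() after cap_enter() succeeded */
#define IN_SANDBOX     0x04  /* cap_enter() succeeded */
#define SETUP_FAILED   0x80  /* socket or rights setup failed */

/* Bind sockfd to 127.0.0.1 on a port chosen by the kernel. */
static int bind_loopback(int sockfd)
{
    struct sockaddr_in sin;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = htons(0);
    return bind(sockfd, (struct sockaddr *)&sin, sizeof(sin));
}

/* Returns a combination of the bits above, or -1 if fork/wait failed. */
int socket_bind_test(void)
{
    pid_t pid;
    int status;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int result = 0;
        int sock_before = socket(AF_INET, SOCK_STREAM, 0);
        int sock_after = socket(AF_INET, SOCK_STREAM, 0);

        if (sock_before < 0 || sock_after < 0)
            _exit(SETUP_FAILED);
#ifdef __FreeBSD__
        cap_rights_t sock_rights;

        /* Both sockets get the same server-side socket rights. */
        cap_rights_init(&sock_rights, CAP_SOCK_SERVER);
        if ((cap_rights_limit(sock_before, &sock_rights) < 0 ||
            cap_rights_limit(sock_after, &sock_rights) < 0) &&
            errno != ENOSYS)
            _exit(SETUP_FAILED);
#endif
        if (bind_loopback(sock_before) == 0)
            result |= BIND_BEFORE_OK;
#ifdef __FreeBSD__
        if (cap_enter() == 0)
            result |= IN_SANDBOX;
        else if (errno != ENOSYS)
            _exit(SETUP_FAILED);
#endif
        if (bind_loopback(sock_after) == 0)
            result |= BIND_AFTER_OK;
        _exit(result);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A caller can then print "TEST SUCCESS" or "TEST FAILED" for each bind by checking BIND_BEFORE_OK and BIND_AFTER_OK in the returned value.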
Results:
I think this is all I wanted to say about Capsicum for the moment.
Comparing with MAC sandbox
Unfortunately, the sandbox which I am currently developing is still unstable for the following reasons:
- vnode-to-path conversion failures caused by deadlocks, namecache mismatches, etc.
- it was decided to introduce some improvements to the schema
Anyway, this is what the sandbox scheme would look like when sandboxing with the MAC sandbox:
As in the Capsicum model of sandboxing, here we also allow network accept and network send/receive, and allow creating a socket of a specific type. Read access is granted to only one sysctl OID out of the many available, and the application's privileges are limited.
Thank you for reading!