"Sandboxing" in FreeBSD #1
Introduction
FreeBSD 10.0 introduced into the kernel a new technology called 'Capsicum', by Robert Watson and Jonathan Anderson, which is based on a "hybrid capability and UNIX access control mode" approach. It is fast and reliable software used to apply additional constraints to userland software. However, this approach to "sandboxing" is complex and not really flexible: Capsicum has to be introduced into a program's code before its functionality can be used. In addition, Casper is a userland daemon of the Capsicum system which provides various services to sandboxed processes. One example is 'Casper DNS', which is invoked when a sandboxed application (a process in capability mode) is not allowed to access DNS services directly. But what if we want to apply additional constraints to a specific program without modifying its code, and specify exactly which resources on the system it can access, regardless of the user credentials (root, operator and so on) it was executed with? In other words, we want to limit a program globally, ignoring the credentials it is running under.
I think I will write another article on how Capsicum enhances the security of the host system and makes it easier to separate privileges within a program, in some cases avoiding complicated privilege separation code built on fork(2) and setgid(2)/setuid(2). For now, this article is about the alternative method of applying additional constraints to a program that I would like to offer.
What I am currently working on is an approach where a userland program is sandboxed using features provided by the kernel, through a kernel module based on the MAC Framework in the FreeBSD kernel, together with a scheme which describes which resources the program can or cannot access. This includes IPC, the file system, the network, system actions, sysctl modification and others. This approach is not new. To "sandbox" userland programs from each other and from critical system resources that should not be accessed, Apple's iOS and OS X use sandbox(7) (aka seatbelts).
The sandbox isolates the application according to the supplied scheme. The scheme is written in the 'tinyscheme' language. The sandbox provides security procedures, with or without filtering keys, which isolate the program from a specific place on the file system or from OS functionality. To isolate a program, it has to be started through a special loader utility in userland, which loads the compiled scheme ('binary scheme') into the sandbox in the kernel.
This method does not require modifying the code of the sandboxed application, but it does not offer the functionality that Capsicum provides. It also has two drawbacks: a performance decrease and the complexity of composing schemes. How much the performance is affected depends on how complex the scheme is. The other problem with composing schemes is obscurity: some programs are closed source, so it is not possible to say which resources the program should normally have access to (a black box).
Implementation
After one week of brainstorming, an initial prototype of the sandbox utility for FreeBSD 11 was created. The idea was to use mandatory automatic sandboxing, which also allowed tracking the integrity of programs, but the project became so complicated that this approach was abandoned. To track integrity, an integrity data table had to be kept securely somewhere and somehow loaded before /sbin/init runs. The second prototype is based on the first one, but the way a program is sandboxed was changed.
The second prototype uses the following model:
- sandbox.ko is a kernel module which is the logic core of the sandbox; it makes decisions based on the loaded scheme.
- sndboxif.ko is a virtual network interface which the sandbox uses to send logs to the userland daemon.
The userland software consists of two programs:
- sandbox-exec is used to execute a program in a sandbox; it also compiles the sandbox scheme.
- sandboxd is a daemon which receives logs and stores them on disk in a binary format. In the future, the functionality of sandboxd will be expanded.
Both the first and the second models use 'tinyscheme' as a scripting language for creating sandbox schemes. The scheme is 'compiled' and 'linked' into a single file. In other words, it is converted to a binary blob which has a strict structure and is optimized to avoid data duplication. The scheme consists of 4 regions/sections: .export, .data, .nodes, .extra. The scheme logic is built from node-to-node logic, where each node has a type which defines its behavior. The compiled scheme is passed to the kernel module through the syscall with the assigned identification number. It is stored in the cache. When the program is executed, a labelling process assigns the scheme to the program. Labelling is a MAC Framework technology which assigns a 'label' to an asset. In our case it is performed during mac_execv, which assigns the label, fetched from the cache, to the process itself.
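To make the idea of a deduplicated data section and node-to-node logic more concrete, here is a minimal Python sketch. It is not the real binary format of the project: the node types, the field widths, the header and all names (SchemeBuilder, intern, add_node, link, walk) are assumptions made up for illustration, and only two of the four sections are modelled.

```python
import struct

# Hypothetical node types; the real sandbox defines its own set.
NODE_ALLOW, NODE_DENY, NODE_LITERAL = 0, 1, 2


class SchemeBuilder:
    """Toy 'compiler' that packs nodes and a deduplicated string pool."""

    def __init__(self):
        self.data = bytearray()  # stand-in for the .data section
        self.offsets = {}        # string -> offset of its single stored copy
        self.nodes = []          # stand-in for the .nodes section

    def intern(self, text):
        """Store text only once and return its offset in the data pool."""
        if text not in self.offsets:
            raw = text.encode("utf-8")
            self.offsets[text] = len(self.data)
            self.data += struct.pack("<H", len(raw)) + raw
        return self.offsets[text]

    def add_node(self, node_type, text="", on_match=0, on_miss=0):
        """Append a node and return its index in the node table."""
        arg = self.intern(text) if node_type == NODE_LITERAL else 0
        self.nodes.append((node_type, arg, on_match, on_miss))
        return len(self.nodes) - 1

    def link(self):
        """Concatenate a small header, the node table and the data pool."""
        body = b"".join(struct.pack("<BHHH", *node) for node in self.nodes)
        header = struct.pack("<HH", len(self.nodes), len(self.data))
        return header + body + bytes(self.data)


def walk(builder, start, path):
    """Follow nodes from start until a terminal allow/deny node is reached."""
    index = start
    while True:
        node_type, arg, on_match, on_miss = builder.nodes[index]
        if node_type == NODE_ALLOW:
            return True
        if node_type == NODE_DENY:
            return False
        # NODE_LITERAL: compare the input with the string stored in the pool.
        size = struct.unpack_from("<H", builder.data, arg)[0]
        literal = bytes(builder.data[arg + 2:arg + 2 + size]).decode("utf-8")
        index = on_match if literal == path else on_miss


builder = SchemeBuilder()
allow = builder.add_node(NODE_ALLOW)
deny = builder.add_node(NODE_DENY)
# Two entry points share one copy of the path in the data pool.
read_entry = builder.add_node(NODE_LITERAL, "/etc/rc.conf", allow, deny)
write_entry = builder.add_node(NODE_LITERAL, "/etc/rc.conf", allow, deny)
blob = builder.link()
```

In this toy layout both literal nodes point at the same offset, so adding more rules for the same path grows only the node table, not the data pool.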
Sandbox schemes are composed using tinyscheme, as mentioned several times above. TinyScheme "is a free software implementation of the Scheme programming language with a lightweight Scheme interpreter of a subset of the R5RS standard" (wiki). There are two actions available: allow or deny. A global action must be set. If the global action is allow, then the rest of the scheme should define what is not allowed. If the global action is deny, then the rest of the scheme should define what is allowed to be accessed. An action is applied to a security procedure. The security procedure defines which OS feature is controlled. The filters are different for each security procedure, and some security procedures do not have filters at all. When filters are defined, the input value is compared against the filter values. Otherwise, if no filter values were set, the security procedure executes the action regardless of the input value. For example:
(deny default)
(allow ipc*) ; allow IPC globally (all sub sec. proc.)
(allow file* (literal "/etc/rc.conf") ; literally allow all file operations on rc.conf
)
In the example above, there are two security procedures, ipc* and file*, which allow:
- all types of inter-process communication
- all file operations on the file whose path is 'literally' specified
When a sandboxed program attempts, for instance, to read the metadata of the file located at /etc/sysctl.conf, the call fails with the error code EACCES, because this file was not listed as allowed. But if the script is modified the other way around
(deny default)
(allow ipc*) ; allow IPC globally (all sub sec. proc.)
(allow file* (literal "/etc/rc.conf") ; literally allow all file operations
)
(allow file-read-meta) ; allow reading metadata anywhere on the FS
the program will be allowed to read the metadata of any file on the disk.
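The decision logic behind the two schemes can be modelled in a few lines of Python. This is only a sketch of the semantics described in this article, not the kernel module's code: the check function, the tuple representation of rules, the use of shell-style wildcard matching for names such as file*, the "last matching rule wins" order and the operation name file-write-data are all assumptions made for the example.

```python
import errno
from fnmatch import fnmatchcase


def check(rules, default, operation, path=None):
    """Return 0 if the operation is allowed, errno.EACCES otherwise.

    rules is a list of (action, procedure, literal) tuples; literal is None
    when the rule has no filter. Assumption: the last matching rule wins.
    """
    decision = default
    for action, procedure, literal in rules:
        if not fnmatchcase(operation, procedure):
            continue  # the rule is for another security procedure
        if literal is not None and literal != path:
            continue  # a filter is set but does not match the input value
        decision = action
    return 0 if decision == "allow" else errno.EACCES


# First scheme: IPC everywhere, file operations only on /etc/rc.conf.
first = [
    ("allow", "ipc*", None),
    ("allow", "file*", "/etc/rc.conf"),
]
# Second scheme: the same plus unfiltered file-read-meta.
second = first + [("allow", "file-read-meta", None)]

denied = check(first, "deny", "file-read-meta", "/etc/sysctl.conf")
allowed = check(second, "deny", "file-read-meta", "/etc/sysctl.conf")
```

With the first scheme the metadata read on /etc/sysctl.conf is refused with EACCES; with the second scheme it is allowed, while any other file operation on that path (for example the hypothetical file-write-data) is still refused.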
The structure is described in the documentation, which is available on the project page, Sandbox for FreeBSD.
The good thing about this approach is that even if the program is executed and runs as the root user, it will not be able to perform operations which were not allowed. But this is not the UNIX way of doing access control and privilege separation.
At the moment, the list of supported security procedures has been extended and some functionality has been added. The project got a new implementation of the regex library, small-regex, so it got rid of the external dependency on the tiny-regex-c library. Also, if a program uses fork(2) to change the uid/gid of the child, the child process inherits the scheme, because the label is copied from the parent. At the moment, this is also not supported.
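The way a scheme travels from the cache to a process and then to its children can be pictured with a small Python model. It is a simplification with invented names (scheme_cache, Process, load_scheme, exec_labelled, fork), and it assumes that the identification number is the key under which the scheme is cached; the real work is done by the kernel module and the MAC Framework.

```python
from dataclasses import dataclass
from typing import Optional

scheme_cache = {}  # identification number -> compiled scheme blob


@dataclass
class Process:
    pid: int
    label: Optional[bytes] = None  # the scheme attached to this process


def load_scheme(scheme_id, blob):
    """Model of passing a compiled scheme to the kernel module."""
    scheme_cache[scheme_id] = blob


def exec_labelled(proc, scheme_id):
    """Model of labelling at execution: the label is fetched from the cache."""
    proc.label = scheme_cache[scheme_id]


def fork(parent, child_pid):
    """Model of fork(2): the child's label is copied from the parent."""
    return Process(pid=child_pid, label=parent.label)


load_scheme(7, b"compiled-scheme")
parent = Process(pid=100)
exec_labelled(parent, 7)
child = fork(parent, 101)
```

In this model the child ends up with the same label as the parent, which is why a forked child that changes its uid/gid stays under the same scheme.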