. This is the file where the
system call,
appropriate sysctls, and networking functions are defined.
In
kern_jail.c, the following sysctls are defined:
/usr/src/sys/kern/kern_jail.c:
int jail_set_hostname_allowed = 1;
SYSCTL_INT(_security_jail, OID_AUTO, set_hostname_allowed, CTLFLAG_RW,
&jail_set_hostname_allowed, 0,
"Processes in jail can set their hostnames");
int jail_socket_unixiproute_only = 1;
SYSCTL_INT(_security_jail, OID_AUTO, socket_unixiproute_only, CTLFLAG_RW,
&jail_socket_unixiproute_only, 0,
"Processes in jail are limited to creating UNIX/IPv4/route sockets only");
int jail_sysvipc_allowed = 0;
SYSCTL_INT(_security_jail, OID_AUTO, sysvipc_allowed, CTLFLAG_RW,
&jail_sysvipc_allowed, 0,
"Processes in jail can use System V IPC primitives");
static int jail_enforce_statfs = 2;
SYSCTL_INT(_security_jail, OID_AUTO, enforce_statfs, CTLFLAG_RW,
&jail_enforce_statfs, 0,
"Processes in jail cannot see all mounted file systems");
int jail_allow_raw_sockets = 0;
SYSCTL_INT(_security_jail, OID_AUTO, allow_raw_sockets, CTLFLAG_RW,
&jail_allow_raw_sockets, 0,
"Prison root can create raw sockets");
int jail_chflags_allowed = 0;
SYSCTL_INT(_security_jail, OID_AUTO, chflags_allowed, CTLFLAG_RW,
&jail_chflags_allowed, 0,
"Processes in jail can alter system file flags");
int jail_mount_allowed = 0;
SYSCTL_INT(_security_jail, OID_AUTO, mount_allowed, CTLFLAG_RW,
&jail_mount_allowed, 0,
"Processes in jail can mount/unmount jail-friendly file systems");
Each of these sysctls can be accessed by the user through the
sysctl(8) program.
Throughout the kernel, these specific sysctls are recognized by their name. For example,
the name of the first sysctl is
security.jail.set_hostname_allowed.
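To illustrate the idea of a dotted name bound to an integer variable, here is a minimal userland sketch. It is not how the kernel implements sysctls: the table, the struct knob type, the model_ variables, and knob_lookup() are all hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userland stand-ins for two of the kernel variables above. */
static int model_set_hostname_allowed = 1;
static int model_sysvipc_allowed = 0;

/* One table entry: a dotted name bound to the int it controls. */
struct knob {
    const char *name;
    int *value;
};

static struct knob knobs[] = {
    { "security.jail.set_hostname_allowed", &model_set_hostname_allowed },
    { "security.jail.sysvipc_allowed", &model_sysvipc_allowed },
};

/* Return the int bound to `name`, or NULL if no such knob exists. */
int *
knob_lookup(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(knobs) / sizeof(knobs[0]); i++) {
        if (strcmp(knobs[i].name, name) == 0)
            return (knobs[i].value);
    }
    return (NULL);
}
```

Writing through the returned pointer changes the underlying variable, which is roughly the effect a CTLFLAG_RW integer sysctl offers to a user of sysctl(8).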
Like all system calls, the jail(2) system call takes two arguments, struct thread *td and struct jail_args *uap. td is a pointer to the thread structure that describes the calling thread. uap is a pointer to a structure that contains a pointer to the jail structure passed in by the userland jail.c. As the earlier description of the userland program showed, the jail(2) system call is given a jail structure as its argument.
/usr/src/sys/kern/kern_jail.c:
/*
* struct jail_args {
* struct jail *jail;
* };
*/
int
jail(struct thread *td, struct jail_args *uap)
Therefore, uap->jail can be used to access the jail structure that was passed to the system call. Next, the system call copies the jail structure into kernel space using the copyin(9) function. copyin(9) takes three arguments: the address of the data to be copied into kernel space (uap->jail), the address where it is to be stored (&j), and the size of the storage. The jail structure pointed to by uap->jail is copied into kernel space and stored in another jail structure, j.
/usr/src/sys/kern/kern_jail.c:
error = copyin(uap->jail, &j, sizeof(j));
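The argument-unwrapping step can be sketched in userland as follows. This is only a model under stated assumptions: copyin_model() stands in for copyin(9) using memcpy(), and struct jail_model, struct jail_args_model, and jail_entry_model() are simplified, hypothetical names rather than the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the userland jail structure (illustrative fields). */
struct jail_model {
    int version;
    unsigned int ip_number;
};

/* Same shape as struct jail_args: one pointer to the caller's structure. */
struct jail_args_model {
    struct jail_model *jail;
};

/*
 * Hypothetical stand-in for copyin(9): source, destination, length.
 * Returns 0 on success and EFAULT for an unusable source address.
 */
int
copyin_model(const void *uaddr, void *kaddr, size_t len)
{
    assert(kaddr != NULL);
    if (uaddr == NULL)
        return (EFAULT);
    memcpy(kaddr, uaddr, len);
    return (0);
}

/* Copy the caller's structure into a private copy before using it. */
int
jail_entry_model(struct jail_args_model *uap, struct jail_model *out)
{
    struct jail_model j;
    int error;

    error = copyin_model(uap->jail, &j, sizeof(j));
    if (error)
        return (error);
    *out = j;
    return (0);
}
```

The point of the private copy is that later changes to the caller's structure no longer affect the values the callee works with.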
Another important structure defined in jail.h is the prison structure, which is used exclusively within kernel space. Here is the definition of the prison structure.
/usr/include/sys/jail.h:
struct prison {
LIST_ENTRY(prison) pr_list; /* (a) all prisons */
int pr_id; /* (c) prison id */
int pr_ref; /* (p) refcount */
char pr_path[MAXPATHLEN]; /* (c) chroot path */
struct vnode *pr_root; /* (c) vnode to rdir */
char pr_host[MAXHOSTNAMELEN]; /* (p) jail hostname */
u_int32_t pr_ip; /* (c) ip addr host */
void *pr_linux; /* (p) linux abi */
int pr_securelevel; /* (p) securelevel */
struct task pr_task; /* (d) destroy task */
struct mtx pr_mtx;
void **pr_slots; /* (p) additional data */
};
The jail(2) system call then allocates memory for a prison structure and copies data between the jail and prison structures.
/usr/src/sys/kern/kern_jail.c:
MALLOC(pr, struct prison *, sizeof(*pr), M_PRISON, M_WAITOK | M_ZERO);
...
error = copyinstr(j.path, &pr->pr_path, sizeof(pr->pr_path), 0);
if (error)
goto e_killmtx;
...
error = copyinstr(j.hostname, &pr->pr_host, sizeof(pr->pr_host), 0);
if (error)
goto e_dropvnref;
pr->pr_ip = j.ip_number;
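The bounded string copies above can be modelled in userland as shown below. This is a sketch, not the kernel routine: copyinstr_model() imitates the documented behaviour of copyinstr(9) (copy up to len bytes including the terminating NUL, fail with ENAMETOOLONG if the string does not fit), and struct prison_model with its small buffer sizes is a hypothetical stand-in for struct prison.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_PATHLEN 16    /* stand-in for MAXPATHLEN */
#define MODEL_HOSTLEN 8     /* stand-in for MAXHOSTNAMELEN */

struct prison_model {
    char pr_path[MODEL_PATHLEN];
    char pr_host[MODEL_HOSTLEN];
    unsigned int pr_ip;
};

/* Copy a NUL-terminated string into a buffer of `len` bytes. */
int
copyinstr_model(const char *uaddr, char *kaddr, size_t len, size_t *done)
{
    size_t i;

    assert(kaddr != NULL);
    if (uaddr == NULL)
        return (EFAULT);
    for (i = 0; i < len; i++) {
        kaddr[i] = uaddr[i];
        if (uaddr[i] == '\0') {
            if (done != NULL)
                *done = i + 1;  /* bytes copied, NUL included */
            return (0);
        }
    }
    if (done != NULL)
        *done = len;
    return (ENAMETOOLONG);      /* no terminator within len bytes */
}

/* Fill the prison fields in the same order as the excerpt above. */
int
prison_fill_model(struct prison_model *pr, const char *path,
    const char *hostname, unsigned int ip)
{
    int error;

    error = copyinstr_model(path, pr->pr_path, sizeof(pr->pr_path), NULL);
    if (error)
        return (error);
    error = copyinstr_model(hostname, pr->pr_host, sizeof(pr->pr_host), NULL);
    if (error)
        return (error);
    pr->pr_ip = ip;
    return (0);
}
```

Each copy is checked before the next one runs, mirroring the goto-based error exits in the kernel excerpt.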
Next, we will discuss another important system call, jail_attach(2), which puts a process into the jail.
/usr/src/sys/kern/kern_jail.c:
/*
* struct jail_attach_args {
* int jid;
* };
*/
int
jail_attach(struct thread *td, struct jail_attach_args *uap)
This system call makes the changes that distinguish a jailed process from an unjailed one. To understand what jail_attach(2) does, some background information is needed.
On FreeBSD, each kernel-visible thread is identified by its thread structure, while processes are described by their proc structures. The definitions of the thread and proc structures are in /usr/include/sys/proc.h. For example, the td argument of any system call is a pointer to the calling thread's thread structure, as stated before. The td_proc member of the thread structure pointed to by td is a pointer to the proc structure representing the process that contains that thread. The proc structure contains members that describe the owner's identity (p_ucred), the process resource limits (p_limit), and so on. The ucred structure pointed to by the p_ucred member of the proc structure contains a pointer to the prison structure (cr_prison).
/usr/include/sys/proc.h:
struct thread {
...
struct proc *td_proc;
...
};
struct proc {
...
struct ucred *p_ucred;
...
};
/usr/include/sys/ucred.h:
struct ucred {
...
struct prison *cr_prison;
...
};
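The chain of pointers from a thread to its prison can be followed as in this userland sketch. The _model structures keep only the members shown above, and thread_prison_model() is a hypothetical helper. Treating a NULL cr_prison as "not jailed" is an assumption of this model.

```c
#include <assert.h>
#include <stddef.h>

/* Stripped-down stand-ins that keep only the members discussed above. */
struct prison_model {
    int pr_id;
};

struct ucred_model {
    struct prison_model *cr_prison;
};

struct proc_model {
    struct ucred_model *p_ucred;
};

struct thread_model {
    struct proc_model *td_proc;
};

/*
 * Follow td -> td_proc -> p_ucred -> cr_prison.
 * In this model a NULL result means the thread's process is not jailed.
 */
struct prison_model *
thread_prison_model(const struct thread_model *td)
{
    assert(td != NULL);
    assert(td->td_proc != NULL);
    assert(td->td_proc->p_ucred != NULL);
    return (td->td_proc->p_ucred->cr_prison);
}
```

Because the prison hangs off the credential rather than the thread or process directly, swapping the credential is enough to change the answer.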
In kern_jail.c, the function jail() then calls jail_attach() with a given jid. jail_attach() in turn calls change_root() to change the root directory of the calling process. jail_attach() then creates a new ucred structure, attaches the prison structure to it, and, once that has succeeded, attaches the new ucred structure to the calling process. From then on, the calling process is recognized as jailed. When the kernel routine jailed() is called with the newly created ucred structure as its argument, it returns 1 to indicate that the credential is associated with a jail. The common ancestor of all processes forked within the jail is the process that runs jail(8), since it is the one that calls the jail(2) system call. When a program is executed through execve(2), it inherits the jailed property of its parent's ucred structure, and therefore has a jailed ucred structure.
/usr/src/sys/kern/kern_jail.c:
int
jail(struct thread *td, struct jail_args *uap)
{
...
struct jail_attach_args jaa;
...
error = jail_attach(td, &jaa);
if (error)
goto e_dropprref;
...
}
int
jail_attach(struct thread *td, struct jail_attach_args *uap)
{
struct proc *p;
struct ucred *newcred, *oldcred;
struct prison *pr;
...
p = td->td_proc;
...
pr = prison_find(uap->jid);
...
change_root(pr->pr_root, td);
...
newcred->cr_prison = pr;
p->p_ucred = newcred;
...
}
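The credential swap at the heart of jail_attach() can be sketched in userland as follows. The sketch leaves out locking, reference counting, and the change of root directory. All _model names are hypothetical, the caller supplies storage for the new credential (the kernel allocates it), and using NULL for "no prison" and EINVAL for a missing prison are assumptions of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct prison_model {
    int pr_id;
};

struct ucred_model {
    unsigned int cr_uid;
    struct prison_model *cr_prison;
};

struct proc_model {
    struct ucred_model *p_ucred;
};

/* Returns 1 when the credential is attached to a prison, else 0. */
int
jailed_model(const struct ucred_model *cred)
{
    assert(cred != NULL);
    return (cred->cr_prison != NULL);
}

/*
 * Build a new credential from the old one, point it at the prison,
 * then install it on the process. The old credential is not modified.
 */
int
jail_attach_model(struct proc_model *p, struct prison_model *pr,
    struct ucred_model *newcred)
{
    struct ucred_model *oldcred;

    assert(p != NULL && p->p_ucred != NULL && newcred != NULL);
    if (pr == NULL)
        return (EINVAL);        /* models a failed prison lookup */
    oldcred = p->p_ucred;
    *newcred = *oldcred;        /* keep the owner's identity */
    newcred->cr_prison = pr;    /* attach the prison first */
    p->p_ucred = newcred;       /* then install the credential */
    return (0);
}
```

The ordering matters: the prison is attached to the new credential before the process ever sees that credential, so the process never runs with a half-initialized one.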
When a process is forked from its parent process, the fork(2) system call uses crhold() to maintain the credential for the newly forked process. This keeps the child's credential consistent with its parent's, so the child process is also jailed.
/usr/src/sys/kern/kern_fork.c:
p2->p_ucred = crhold(td->td_ucred);
...
td2->td_ucred = crhold(p2->p_ucred);
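The effect of crhold() on a forked child can be modelled in userland as below. This is a simplified sketch: crhold_model() only increments a plain counter and returns the same pointer, fork_model() copies a single member, and all _model names are hypothetical. The kernel code additionally handles per-thread credentials and atomic reference counting.

```c
#include <assert.h>
#include <stddef.h>

struct prison_model {
    int pr_id;
};

struct ucred_model {
    unsigned int cr_ref;            /* number of holders */
    struct prison_model *cr_prison;
};

struct proc_model {
    struct ucred_model *p_ucred;
};

/* Take another reference and hand back the same credential. */
struct ucred_model *
crhold_model(struct ucred_model *cr)
{
    assert(cr != NULL);
    cr->cr_ref++;
    return (cr);
}

/* The child shares the parent's credential instead of copying it. */
void
fork_model(const struct proc_model *p1, struct proc_model *p2)
{
    assert(p1 != NULL && p2 != NULL);
    p2->p_ucred = crhold_model(p1->p_ucred);
}
```

Since parent and child point at the very same credential, the child sees the same cr_prison and is jailed exactly when the parent is.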