Hexley, the mascot for OSX/Darwin

Suppose we want to monitor memory and time usage of a child process. Frequently. On MacOS X. With an unprivileged user. I did not find anywhere a description of how to achieve this. This articles pulls together some of the needed information.

MacOS X's Mach kernel surely does not make it straight-forward, but the biggest showstopper is the lack of documentation on how the low-level operations are supposed to be used together. Apple's stance appears to be that they do not like these interfaces to be used. It was even suggested to parse the output of the ps utility which is not such a hot idea if the monitoring has to be done frequently enough. Adequate high-level interfaces are not provided either.

The Big Picture

Note: much of the following was learned by reading random (undocumented) source code and guess work. Please let me know about any inaccuracies.

Not very long ago, Tim Becker wrote an article on Determining memory usage in process on OSX. The function to use is task_info(), which takes a task port as parameter. We can obtain a task port for our own process via mach_task_self(). For arbitrary processes, the first function that comes to mind is task_for_pid(), which, given a pid_t, returns its task port. But, since we can do a lot of nasty things with task ports, it is a privileged operation, at least on newer versions of OS X (for Intel Macs).

So, we need a different way to get at the task port of a process. In Mach, we have to ask the other process nicely whether it grants us this right. For fork()ed processes, we can establish a communication channel between parent and child, on which the child sends its own task port back to the parent. The parent process is then free to use it in the call to task_info() to obtain information about its child.

Fork(1) It!

There is one small problem, though. In contrast to, e.g., Unix file descriptors, task ports are not inherited across a fork(). Hence the parent process must set up a special bootstrap port to get things off the ground. Since the parent is at the receiving end, the port must be created with MACH_PORT_RIGHT_RECEIVE.

Furthermore, the parent process must explicitly grant its child the right to send back messages via this port. This is done in function setup_recv_port() below.

static int
setup_recv_port (mach_port_t *recv_port)
{
    kern_return_t       err;
    mach_port_t         port = MACH_PORT_NULL;
    err = mach_port_allocate (mach_task_self (),
                              MACH_PORT_RIGHT_RECEIVE, &port);
    CHECK_MACH_ERROR (err, "mach_port_allocate failed:");

    err = mach_port_insert_right (mach_task_self (),
                                  port,
                                  port,
                                  MACH_MSG_TYPE_MAKE_SEND);
    CHECK_MACH_ERROR (err, "mach_port_insert_right failed:");

    *recv_port = port;
    return 0;
}

Subsequently, the newly created port is set as bootstrap port in the parent. It is the only port that is inherited by child processes.

static int
setup_recv_port (mach_port_t *recv_port)
{
    kern_return_t       err;
    mach_port_t         port = MACH_PORT_NULL;
    err = mach_port_allocate (mach_task_self (),
                              MACH_PORT_RIGHT_RECEIVE, &port);
    CHECK_MACH_ERROR (err, "mach_port_allocate failed:");

    err = mach_port_insert_right (mach_task_self (),
                                  port,
                                  port,
                                  MACH_MSG_TYPE_MAKE_SEND);
    CHECK_MACH_ERROR (err, "mach_port_insert_right failed:");

    *recv_port = port;
    return 0;
}

For the child process to send a message back to its parent process it first asks for the bootstrap port. Then we create a message which contains the task port we want to send back (the magic constants may or may not be completely correct, but they appear to work) and send it (function send_port()):

static int
send_port (mach_port_t remote_port, mach_port_t port)
{
    kern_return_t       err;

    struct {
        mach_msg_header_t          header;
        mach_msg_body_t            body;
        mach_msg_port_descriptor_t task_port;
    } msg;

    msg.header.msgh_remote_port = remote_port;
    msg.header.msgh_local_port = MACH_PORT_NULL;
    msg.header.msgh_bits = MACH_MSGH_BITS (MACH_MSG_TYPE_COPY_SEND, 0) |
        MACH_MSGH_BITS_COMPLEX;
    msg.header.msgh_size = sizeof msg;

    msg.body.msgh_descriptor_count = 1;
    msg.task_port.name = port;
    msg.task_port.disposition = MACH_MSG_TYPE_COPY_SEND;
    msg.task_port.type = MACH_MSG_PORT_DESCRIPTOR;

    err = mach_msg_send (&msg.header);
    CHECK_MACH_ERROR (err, "mach_msg_send failed:");

    return 0;
}

On the parent side, we prepare to receive the same message (with a some trailing information tacked on for whatever reason). And voilà, we have the child's task port. It can later be used as first parameter to a task_info() call.

static int
recv_port (mach_port_t recv_port, mach_port_t *port)
{
    kern_return_t       err;
    struct {
        mach_msg_header_t          header;
        mach_msg_body_t            body;
        mach_msg_port_descriptor_t task_port;
        mach_msg_trailer_t         trailer;
    } msg;

    err = mach_msg (&msg.header, MACH_RCV_MSG,
                    0, sizeof msg, recv_port,
                    MACH_MSG_TIMEOUT_NONE, MACH_PORT_NULL);
    CHECK_MACH_ERROR (err, "mach_msg failed:");

    *port = msg.task_port.name;
    return 0;
}

The parent process now has access to the child's task. Now we need to do some cleanup work. We need to restore the original bootstrap port both in parent and child process. This is so that both have access to the Mach bootstrap daemon again. This can be done by having the child process create another port with the above functions, and send it to the parent. Through this, the parent process then sends back its original bootstrap port to the child. (There is probably a simpler way to achieve this. I would gladly like to know.)

Here's the code:

static task_t       child_task = MACH_PORT_NULL;

pid_t
sampling_fork ()
{
    kern_return_t       err;
    mach_port_t         parent_recv_port = MACH_PORT_NULL;
    mach_port_t         child_recv_port = MACH_PORT_NULL;

    if (setup_recv_port (&parent_recv_port) != 0)
        return -1;
    err = task_set_bootstrap_port (mach_task_self (), parent_recv_port);
    CHECK_MACH_ERROR (err, "task_set_bootstrap_port failed:");

    pid_t               pid;
    switch (pid = fork ()) {
    case -1:
        err = mach_port_deallocate (mach_task_self(), parent_recv_port);
        CHECK_MACH_ERROR (err, "mach_port_deallocate failed:");
        return pid;
    case 0: /* child */
        err = task_get_bootstrap_port (mach_task_self (), &parent_recv_port);
        CHECK_MACH_ERROR (err, "task_get_bootstrap_port failed:");
        if (setup_recv_port (&child_recv_port) != 0)
            return -1;
        if (send_port (parent_recv_port, mach_task_self ()) != 0)
            return -1;
        if (send_port (parent_recv_port, child_recv_port) != 0)
            return -1;
        if (recv_port (child_recv_port, &bootstrap_port) != 0)
            return -1;
        err = task_set_bootstrap_port (mach_task_self (), bootstrap_port);
        CHECK_MACH_ERROR (err, "task_set_bootstrap_port failed:");
        break;
    default: /* parent */
        err = task_set_bootstrap_port (mach_task_self (), bootstrap_port);
        CHECK_MACH_ERROR (err, "task_set_bootstrap_port failed:");
        if (recv_port (parent_recv_port, &child_task) != 0)
            return -1;
        if (recv_port (parent_recv_port, &child_recv_port) != 0)
            return -1;
        if (send_port (child_recv_port, bootstrap_port) != 0)
            return -1;
        err = mach_port_deallocate (mach_task_self(), parent_recv_port);
        CHECK_MACH_ERROR (err, "mach_port_deallocate failed:");
        break;
    }

    return pid;
}

Documentation about the Mach IPC Interface is available. In principle, I like how port rights (capabilities) are handled in Mach, compared to the Unix way of sharing almost everything across a fork(). If only it would have been made easier to discover the details.

(On Linux, all this would have been done by reading the information we are interested in from the /proc or /sys file systems. A MacFUSE based procfs filesystem for OS X exists as well, but that is unlikely to be installed, and perhaps too cumbersome to set up for most users.)

Update 2010-01-11: Visibility Link to heading

A number of people have contacted me about the sampling_fork code example. It has found its way into Will Drewry's patient0 and the Chromium browser.