TEE Exploitation on Samsung Exynos devices (I/IV) : Introduction

Part 1 of a series of posts on exploiting Trusted Applications on the Samsung Galaxy S9 TEE.

Over the next few posts we will explain our research into exploiting Trusted Applications on the Kinibi Trusted Execution Environment (TEE) used by Samsung on its Exynos devices up to the Galaxy S9. These results were presented at zer0con and Infiltrate 2019, and you can find the corresponding slides here.

We'll have a total of four posts, which will hopefully provide a better understanding than the slides alone:

  1. Introduction: this post introduces the concept of TEEs, how ARM TrustZone is used to support them, and how that translates to Samsung's phones.
  2. Exploiting stack-based buffer overflows: we describe SVE-2018-12852 and walk through the exploitation process.
  3. Double-fetch issues: we'll discuss two related issues (SVE-2018-12855 and SVE-2019-13910) and how we exploited them.
  4. Heap exploitation: we cover the internals of the Kinibi heap allocator and walk through the exploitation of a heap memory corruption issue (SVE-2018-12853).

Note that all issues identified by SVE numbers above were found by us and disclosed to Samsung. We reported SVE-2018-12852, SVE-2018-12855 and SVE-2018-12853 in August 2018. Samsung published fixes for these issues in October/November 2018. Finally, we reported SVE-2019-13910 in January 2019 and Samsung published a fix in April 2019.

A lot has been written about TEEs and Samsung's implementation, so we'll provide only the background information required to understand our work and provide links to previous publications when necessary.

Trusted Execution Environments

Trusted Execution Environments were introduced out of a concern that the main OS used in consumer devices (the so-called Rich Execution Environment or REE) was not trustworthy enough for security-critical tasks. Therefore, a TEE exists to provide security-related services to the REE. Typical uses for TEEs include key management, DRM systems and system management functions.

Following from the above definition, the typical TEE threat model assumes that the REE is already compromised. The TEE needs to be protected from a malicious REE in terms of preventing access to its runtime environment, memory and key assets.

To this end, TEEs implement at least the following levels of separation:

  • Execution separation ensures that applications running on the REE cannot interfere with applications running on the TEE.
  • Memory separation ensures that applications running on the REE cannot access TEE-private data. You could say that this is a requirement for execution separation, as for example the TEE runtime stack should not be accessible to the REE. However, this goes beyond that to include memory for specific use-cases such as so-called "Secure Data Paths": e.g. ensuring that decrypted video content does not leak to the REE (especially in compressed form!).
  • I/O separation ensures that certain hardware peripherals (or bus slaves) can only be accessed by the TEE. This can be done either temporarily (e.g. to support a Trusted UI) or permanently (e.g. fingerprint sensors are only available to the Secure World).

The implementation of each of these boundaries is typically SoC-specific, although there are some common denominators. We will not go into further detail here, since this is not the main purpose of this series, but feel free to reach out if you want to discuss it further.

ARMv8 TrustZone and TEEs

ARM TrustZone is the technology that supports the great majority of TEEs, especially in the mobile world. This technology basically divides the CPU resources into a "normal world" (NWd) and a "secure world" (SWd). The REE runs in the NWd, while the TEE runs in the SWd.

Of the separations described above, TrustZone by itself provides the means for execution separation within the CPU core. To understand how this works, we need to take a look at the different exception levels defined by the ARMv8 architecture:

We can see four privilege levels, EL0 through EL3, and two worlds: NWd (left) and SWd (right). EL0 and EL1 exist in both worlds, and correspond to user-land and kernel-land respectively. EL2 only exists in the Normal World (though ARMv8.4 introduces a Secure EL2 as well) and is used to implement a hypervisor on top of which several virtualized systems can run.

Finally, EL3 is by definition a Secure World level and is used as a bridge between the two worlds.

When the CPU is accessing memory or peripherals through the AXI bus, the bus protocol includes signals indicating whether the access is secure or non-secure. These signals can then be used by the addressed bus slaves (e.g. memory controller, TrustZone-aware peripherals, etc.) to determine whether a given access is allowed or not.

For more information on this part, you can take a look at the slides for this talk on TEE initialization, which Cristofaro Mune and I gave at EuskalHack 2017.

In order to transition between these exception levels and between the two worlds, there are specific instructions that can be executed at each privilege level:

  • EL0 can execute an SVC instruction to perform system calls into EL1. It goes without saying that Secure EL0 will perform system calls into Secure EL1, and Non-secure EL0 will perform system calls into Non-secure EL1.
  • Non-secure EL1 can perform "hypercalls" into EL2 by means of the HVC instruction. EL2 can also configure traps to be interrupted in case specific events occur in EL0 or EL1 (e.g. access to system registers).
  • EL1 and EL2 can perform monitor calls by means of the SMC instruction. Some monitor calls are handled directly at EL3, while others are used to call into the Secure World or to return from it.

Therefore, all communication between the two worlds necessarily flows through the monitor in some way. However, at the logical level, applications in EL0 usually open a session with applications in Secure EL0 (Trusted Applications, TAs or trustlets) and communicate with them.
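
To make this concrete, here is a minimal sketch of how kernel code running at Non-secure EL1 could issue an SMC following the SMC Calling Convention. The function identifier and arguments are placeholders, not the actual values used to talk to Kinibi:

#include <stdint.h>

/*
 * Minimal sketch of issuing an SMC from AArch64 kernel code (EL1),
 * following the SMC Calling Convention: the function identifier goes in
 * x0, arguments in x1..x6, and the result comes back in x0. The ID and
 * arguments here are placeholders, not the actual values used by Kinibi.
 */
static uint64_t smc_call(uint64_t fid, uint64_t a1, uint64_t a2, uint64_t a3)
{
    register uint64_t x0 __asm__("x0") = fid;
    register uint64_t x1 __asm__("x1") = a1;
    register uint64_t x2 __asm__("x2") = a2;
    register uint64_t x3 __asm__("x3") = a3;

    /* Trap into the EL3 monitor; it either handles the call itself or
     * forwards it to the Secure World and resumes us with the result. */
    __asm__ volatile("smc #0"
                     : "+r"(x0)
                     : "r"(x1), "r"(x2), "r"(x3)
                     : "memory");
    return x0;
}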

Samsung's TEE implementation

As you may have guessed by now, Samsung's TEE implementation is based on TrustZone technology. On their Exynos SoCs up to the Galaxy S9 we can see the following components:

In summary:

  • Non-secure EL0 and EL1 correspond to the Android applications and framework we are all familiar with.
  • EL2 implements (parts of) the Real-time Kernel Protection (RKP) component, which is in charge of protecting the running Linux kernel. As far as I can tell, this hypervisor does not support multiple guests but simply protects critical Linux kernel data such as credentials and page tables.
  • Secure EL1 contains the Kinibi Microkernel. Kinibi is a Trusted Execution Environment provided by Trustonic, which Samsung licenses and uses in their handsets.
  • Secure EL0 contains a number of Trusted Applications (TAs) and Trusted Drivers (TDs). Generally, TAs provide services to client applications in the NWd, while TDs provide services either to TAs or to client drivers in the NWd. This is the part we focus on in this research.
  • EL3 contains a monitor based on the ARM Trusted Firmware.

Newer Samsung phones, which at the time of writing means the Galaxy S10 series, ship with a new Samsung-proprietary TEE OS called TEEGris. Therefore our discussion will not be directly applicable to these phones.

Previous public research on Kinibi TEE OS

As mentioned earlier, the Kinibi TEE OS is based on a microkernel, with TAs and TDs running in userspace. Daniel Komaromy gave a great presentation introducing the TEE OS at Ekoparty 13 in 2017, and subsequently published three very informative blog posts about it. You can also find the video of his talk on YouTube.

Daniel's work provides the necessary information to be able to quickly reverse engineer Kinibi Trusted Applications and Drivers:

  • A description of the OS itself
  • A list of the API calls implemented by the MobiCore standard library, both the TA API and the TD API
  • A list of system calls, very useful when analyzing the OS itself

Furthermore, Synacktiv recently published a TEE Exploitation 101 blog post where they introduce the exploitation of a stack-based buffer overflow. Additionally, Gal Beniamini published some posts on the Google Project Zero blog with similar information a few years back.

I recommend reading all these posts before continuing with this series to get the most out of it.

A minimal TA application

A Trusted Application requires a tlMain function with two parameters:

  • void *tci: a shared buffer used to communicate with the Normal World.
  • size_t tciLen: the length of that buffer.

Within this function, the Trusted Application can wait for messages from the Normal World by calling tlApiWaitNotification(timeout). Once a notification is received, a message should be present in the tci buffer and can be processed.

Finally, when the application is finished processing the message it calls tlApiNotify() to tell the Normal World that the processing is complete.

Here is an example application obtained from this presentation:

_TLAPI_ENTRY void tlMain(const addr_t buf, const uint32_t len)
{
        uint32_t secbuf;

        /* Reject missing, wrongly sized or non-shared buffers. */
        if ((NULL == buf) || (len != 4) || !tlApiIsNwdBufferValid(buf, 4))
                tlApiExit(EXIT_ERROR);

        for (;;)
        {
                /* Block until the Normal World sends a notification. */
                tlApiWaitNotification(TLAPI_INFINITE_TIMEOUT);

                /* Copy the request into secure memory, modify it and
                   write the result back into the shared buffer. */
                memcpy(&secbuf, buf, 4);
                secbuf |= 0xDEAD;
                memcpy(buf, &secbuf, 4);

                /* Tell the Normal World that processing is complete. */
                tlApiNotify();
        }
}
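
For completeness, here is a rough sketch of what the Normal World side of this exchange looks like using Trustonic's MobiCore client library (libMcClient). The UUID is a placeholder, error-path cleanup is trimmed, and the exact signatures may vary slightly between library versions, so treat this as an outline rather than copy-paste code:

#include <string.h>
#include <stdint.h>
#include "MobiCoreDriverApi.h"   /* Trustonic MobiCore client API */

/* Rough sketch of a Normal World client for the TA above. */
int talk_to_ta(void)
{
    const mcUuid_t uuid = {{ 0 }};       /* placeholder: 16-byte TA UUID  */
    mcSessionHandle_t session = { 0 };
    uint8_t *tci = NULL;

    if (mcOpenDevice(MC_DEVICE_ID_DEFAULT) != MC_DRV_OK)
        return -1;

    /* Allocate world-shared memory to be used as the TCI buffer. */
    if (mcMallocWsm(MC_DEVICE_ID_DEFAULT, 0, 4, &tci, 0) != MC_DRV_OK)
        return -1;

    session.deviceId = MC_DEVICE_ID_DEFAULT;
    if (mcOpenSession(&session, &uuid, tci, 4) != MC_DRV_OK)
        return -1;

    memcpy(tci, "\x01\x00\x00\x00", 4);                 /* request         */
    mcNotify(&session);                                 /* wake up the TA  */
    mcWaitNotification(&session, MC_INFINITE_TIMEOUT);  /* wait for reply  */
    /* tci now holds the TA's response. */

    mcCloseSession(&session);
    mcFreeWsm(MC_DEVICE_ID_DEFAULT, tci);
    mcCloseDevice(MC_DEVICE_ID_DEFAULT);
    return 0;
}

The key point is that the TCI buffer returned by mcMallocWsm is world-shared memory: anything the TA reads from it can change at any time, which is exactly what the double-fetch issues in part 3 exploit.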

Reverse Engineering Kinibi applications

If you've read the posts from previous research, you already know that Kinibi TAs come in their very own format called MCLF, and there is an IDA Pro mclf loader available. Additionally, there is a port of this loader to Ghidra here.

If you use one of these loaders, your Trusted App code and data segments get mapped into the RE tool but no symbols are available. This is because the format does not use symbols at all, but rather an API table.

All API calls look like this, where tlApiLibEntry contains a function pointer and R0 is used to indicate the API index into the tlApi function table:

tlApiWaitNotification
LDR             R1, =dword_1000
LDR.W           R2, [R1,#(tlApiLibEntry - 0x1000)]
MOV             R1, R0
MOVS            R0, #6
BX              R2
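
In C terms, each stub simply forwards its arguments to a common dispatcher whose address is stored in tlApiLibEntry, passing the API index as the first parameter. The snippet below is a conceptual approximation (the real dispatcher prototype is not public):

#include <stdint.h>

/* Conceptual C equivalent of the assembly stub above; an approximation,
 * since the actual dispatcher prototype is not public. */
typedef uint32_t (*tlApi_dispatcher_t)(uint32_t apiNo, uint32_t arg);

/* tlApiLibEntry is a function pointer filled in by the Kinibi loader. */
extern tlApi_dispatcher_t tlApiLibEntry;

uint32_t tlApiWaitNotification(uint32_t timeout)
{
    /* 6 is the API index seen in the MOVS R0, #6 above. */
    return tlApiLibEntry(6, timeout);
}

This also explains why the stubs are so easy to fingerprint: the API index loaded into R0 right before the BX uniquely identifies the call.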

Using Daniel's API list, we can make a quick and dirty script to identify and rename API calls. Further, we can use the abundant debug strings left by Samsung to rename a few more functions. We can even automate this using Joxean's idamagicstrings script, though I mostly named functions manually while analyzing the code.

When we are done with that, we can easily find the tlMain function by looking for cross-references to tlApiNotify and tlApiWaitNotification.

If we apply this process to the VaultKeeper TA that shipped with the Samsung Galaxy S9 firmware from July 2018 (when I did this research) and decompile the tlMain function, we see something like this:

void __fastcall sub_1194(int a1, unsigned int a2)
{
  int v2; // r4
  unsigned int v3; // r5
  _DWORD *v4; // r5
  int v5; // r0

  v2 = a1;
  v3 = a2;
  sub_6248(dword_A0C84);
  sub_663C("VaultKeeper :: Tlvaultkeeper::Starting\n");
  if ( v2 && v3 >= 0x110 )
  {
    if ( *(_DWORD *)(v2 + 108) <= 0x2820u && *(_DWORD *)(v2 + 120) <= 0x2820u )
    {
      *(_QWORD *)(v2 + 128) = (unsigned int)(v2 + 10408);
      v4 = (_DWORD *)(v2 + 20680);
      *(_DWORD *)(v2 + 112) = v2 + 136;
      *(_DWORD *)(v2 + 116) = 0;
      *(_DWORD *)(v2 + 20808) = v2 + 31088;
      *(_DWORD *)(v2 + 20812) = 0;
      *(_DWORD *)(v2 + 20792) = v2 + 20816;
      *(_DWORD *)(v2 + 20796) = 0;
      while ( 1 )
      {
        tlApiWaitNotification(-1);
        sub_663C("VaultKeeper :: Tlvaultkeeper::Got a message!\n");
        if ( *(_DWORD *)v2 >= 0 )
        {
          sub_19CC(v2, v2 + 20680);
          if ( *v4 )
            *v4 -= 20000;
          v5 = sub_663C("VaultKeeper :: Tlvaultkeeper::Returning [%d/0x%08x]\n");
        }
        else
        {
          v5 = sub_663C("VaultKeeper :: Tlvaultkeeper::Invalid Command(0x%08x)\n");
        }
        tlApiNotify(v5);
      }
    }
    sub_663C("VaultKeeper :: Tlvaultkeeper:: Invalid value(clen %u, dlen %u)\n");
  }
  else
  {
    sub_663C("VaultKeeper :: Tlvaultkeeper:: ticBuffer has problem exit.(expected %u, but %u)\n");
  }
}

Here we can easily see that there's a logging function at 0x663C, and that the application requires a TCI buffer of at least 0x110 bytes. We can also identify the main message-handling function at 0x19CC and start reversing from there.
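
To keep track of these offsets while reversing, it helps to declare a struct for the TCI buffer. The following is a hypothetical reconstruction based purely on the checks and log strings above; the field names are my own guesses rather than Samsung's definitions:

#include <stdint.h>

/* Hypothetical reconstruction of the VaultKeeper TCI header, based only on
 * the offsets seen in the decompiled tlMain above. Field names are guesses,
 * not Samsung's actual definitions. */
struct vaultkeeper_tci {
    int32_t  command;        /* offset 0: must be >= 0 to be handled     */
    uint8_t  unknown[104];   /* offsets 4..107: not touched by tlMain    */
    uint32_t clen;           /* offset 108: checked against 0x2820       */
    uint32_t in_ptr;         /* offset 112: initialized to tci + 136     */
    uint32_t in_flags;       /* offset 116: cleared to 0                 */
    uint32_t dlen;           /* offset 120: checked against 0x2820       */
    /* Further pointer/length pairs are initialized at offsets 128, 20792
     * and 20808, and the status code returned to the client appears to
     * live at offset 20680 (adjusted by -20000 before notifying back). */
};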

Memory layout and mitigation techniques

Before moving further into exploitation in the next post, let's recall the TA memory layout and mitigation techniques present in the system. This information can already be obtained from prior publications, but it's worth repeating here.

The TA memory layout can be seen in the diagram below. The layout is fairly straightforward:

  • There's an unmapped page at NULL, likely to make sure NULL pointer dereferences are always invalid.
  • The .text segment starts at a fixed address indicated by the MCLF header, typically 0x1000.
  • The .data and .bss segments follow the .text segment. The main runtime stack is allocated at the end of .bss.
  • The main heap is mapped after the .bss segment, with a guard page in between them.
  • Shared memory buffers (TCI and bulk buffers) are mapped starting at 0x00100000. These are the only buffers that are accessible both to the NWd and the SWd, and therefore their contents must be treated as volatile by the TEE.

From the above description one can infer that there is no ASLR whatsoever, and thus memory addresses will be predictable. However, there's strict NX in the sense that the .text segment is mapped as read/execute and the data segments are mapped as read/write (not executable). There are also no API calls that provide the ability to mark memory as executable or allocate RWX memory.

Therefore, we will need to perform all our actions within a Trusted Application by reusing existing code (ROP, return to libc, etc.).

Finally, some Trusted Applications are compiled with stack canaries. However, this is applied neither to every application nor to every function. It is still possible to find Trusted Applications where we can smash the return address on the stack with a classic linear buffer overflow alone; in fact, we'll see an example in the next post.
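
As a generic illustration of that pattern (hypothetical code, not taken from any actual Samsung TA), a command handler shaped like the following would let a compromised Normal World overwrite the saved return address directly:

#include <stdint.h>
#include <string.h>

/* Generic illustration of the vulnerable pattern (hypothetical): the length
 * comes straight from the world-shared TCI buffer and is never checked
 * against the size of the destination stack buffer. */
struct msg {
    uint32_t cmd;
    uint32_t len;
    uint8_t  data[1024];
};

static void handle_message(struct msg *tci)
{
    uint8_t local[64];

    /* tci->len is fully attacker-controlled: any value above 64 overwrites
     * saved registers and, without a stack canary, the return address. */
    memcpy(local, tci->data, tci->len);

    /* ... process local ... */
    (void)local;
}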

Next steps

Stay tuned for part 2, where we'll introduce a vulnerability we found in the esecomm Trusted Application (SVE-2018-12852) and an exploit that achieves arbitrary calls within the application.