To stand by my claim that the Microsoft Windows operating system has been built on the fundamental assumption that administrative privileges would always be equivalent to granting the ability to run arbitrary ring-0 code, I have decided to briefly discuss yet another portion of Windows internals and how it could easily be misused by a system administrator to unlawfully cross the admin / kernel boundary on an x86-64 platform, and effectively elevate his rights on the machine by loading an unsigned device driver of his choice. The technique is directly related to CSRSS (the infamous Client/Server Runtime Subsystem), a part of Windows that has likely motivated most of the dirty relic hacks in the kernel that are still visible in the most recent versions of the OS.
As usual, let’s start with some historical context. Back in July 2007, omega_red started a thread on the woodmann RCE forums, stating that he had found a GDI bug (a Blue Screen of Death triggerable from user mode) that required “pretty unusual conditions” to work. A few days into the discussion, Alex Ionescu chimed in and said that, inspired by omega’s finding, he had spent a night looking around the win32k.sys module and located four vulnerabilities that he would be willing to present at the BlackHat conference; and so he did – the slides from his BlackHat USA 2008 talk titled “Pointers and Handles” can be found here. All issues discussed by Alex are fairly interesting, so be sure to check out the slides if you haven’t already; the important one for us is the NULL Pointer Dereference within CSRSS.EXE via xxxCreateThreadInfo. The kernel routine would dereference an internal CurrentW32Thread->Desktop pointer without prior sanitization, even though for the special subsystem process that pointer was never initialized in the first place. Oh, in fact there might be a “few“ more instances of such bugs in the kernel nowadays, but stay tuned… :-)
During his talk, Alex also mentioned a list of CSRSS-reserved system calls within the main graphical device driver, most of which are still found in Windows 7 and 8 today. These services usually begin with the following expression:
if (PsGetCurrentProcess() != gpepCsrss) { return STATUS_ACCESS_DENIED; }
If we take a look at the list of cross-references to the global gpepCsrss symbol in the context of a cmp operation, we will get a comprehensive list of functions that are either entirely reserved for system use, or provide some CSRSS-only functionality (the list below was acquired from Windows 8 32-bit):
_BlockInput CheckProcessIdentity _CloseDesktop _CreateEmptyCursorObject DestroyProcessesObjects _GetProcessDefaultLayout GetThreadsWithPKL HandleSystemThreadCreationFailure InitializeClientPfnArrays IsHandleEntryAccessibleForIL _LoadCursorsAndIcons NtUserAutoRotateScreen NtUserCheckAccessForIntegrityLevel NtUserCtxDisplayIOCtl NtUserDestroyCursor NtUserGetCaretBlinkTime NtUserGetDoubleClickTime NtUserHardErrorControl NtUserNotifyProcessCreate NtUserPostMessage NtUserPostThreadMessage NtUserProcessConnect NtUserQueryInformationThread NtUserRemoteConnect NtUserRemoteRedrawRectangle NtUserRemoteRedrawScreen NtUserRemoteStopScreenUpdates NtUserSetInformationThread NtUserSetSensorPresence NtUserSetThreadDesktop OpenCacheKeyEx PDEVOBJ::FontManagement PDEVOBJ::GetTrueTypeFile _PostMessageCheckIL _PostMessageExtended _PostThreadMessage PowerOnGdi RecalculateQueueInfo _RegisterHotKey RemoteLogoff RemotePassthruDisable RemoteShadowCleanup RemoteShadowStart RemoteThinwireStats _SetCursorIconData _ShouldFrostCrashedWindow _ShouldFrostSiblingWindow _ShouldGhostWindow UnmapDesktop UserGetDesktopDC ValidateHwndEx vCleanupUMWindowlessSprite VideoPortCalloutThread WakeRITForShutdown xxxCreateSystemThreads xxxCreateThreadInfo xxxDesktopRecalc xxxGetDeviceChangeInfo xxxInitTerminal xxxInternalKeyEventDirect xxxInternalUserChangeDisplaySettings xxxInterSendMsgEx xxxMouseEventDirect xxxRealDefWindowProc xxxRemoteConsoleShadowStop xxxRemoteDisconnect xxxRemoteNotify xxxRemotePassthruEnable xxxRemoteReconnect xxxRemoteShadowSetup xxxRemoteShadowStop xxxSetDeskWallpaper xxxSetProcessInitState xxxSetThreadDesktop xxxSystemParametersInfo xxxWrapRealDefWindowProc zzzClipCursor zzzResetSharedDesktops
Please note that while CSRSS is a system-critical process, an administrator can easily execute code within it – after enabling SeDebugPrivilege, one can use pretty much all process-manipulation API functions (such as WriteProcessMemory or CreateRemoteThread) against the process. Now, the big question is – to what extent do these routines trust CSRSS to provide sensible (and safe) input data? If we look at the past, they seem to have been rather reckless and didn’t perform any sanity checking at all, as described by Alex from ntinternals.org in the “Windows XP SP2/SP3 (NtUserConsoleControl) – Local Privilege Escalation” article available here. Even though these issues are long gone now that Microsoft has developed adequate fixes (mostly consisting of calls to ProbeForRead and ProbeForWrite in the right places), the routines never really stopped indirectly trusting the subsystem process in numerous other places. For example, let’s take a look at snippets of the win32k!NtUserSetInformationThread function implementation:
.text:BF93ED97 call ds:__imp__PsGetCurrentProcess@0 ; PsGetCurrentProcess()
.text:BF93ED9D cmp eax, _gpepCSRSS
.text:BF93EDA3 jnz loc_BFA7E296
First of all, it obviously verifies that the caller is indeed the CSRSS process.
.text:BF93EDA9 mov esi, [ebp+length]
.text:BF93EDAC cmp esi, 0Ch
.text:BF93EDAF ja loc_BFA7E2AA
.text:BF93EDB5 test esi, esi
.text:BF93EDB7 jz loc_BF93EE4D
.text:BF93EDBD and [ebp+ms_exc.disabled], 0
.text:BF93EDC1 mov ebx, [ebp+Address]
.text:BF93EDC4 test bl, 3
.text:BF93EDC7 jnz short loc_BF93EE39
.text:BF93EDC9 lea ecx, [ebx+esi]
.text:BF93EDCC mov eax, _W32UserProbeAddress
.text:BF93EDD1 cmp ecx, eax
.text:BF93EDD3 ja short loc_BF93EE3F
.text:BF93EDD5 cmp ecx, ebx
.text:BF93EDD7 jb short loc_BF93EE3F
.text:BF93EDD9
.text:BF93EDD9 loc_BF93EDD9: ; CODE XREF: NtUserSetInformationThread(x,x,x,x)+EA
.text:BF93EDD9 push esi ; size_t
.text:BF93EDDA push ebx ; void *
.text:BF93EDDB lea eax, [ebp+local_buffer]
.text:BF93EDDE push eax ; void *
.text:BF93EDDF call _memcpy
.text:BF93EDE4 mov [ebp+ms_exc.disabled], 0FFFFFFFEh
.text:BF93EDEB add esp, 0Ch
Further on, the input data length is enforced to be less than or equal to 0Ch (12), after which the data itself is copied into a stack-based buffer located at ebp+var_24 (named local_buffer in the listing above). This is not the subject of the post, but please note that there is a bug here already – the routine only checks that the length is not greater than the size of the input structure, but later assumes that it is equal to that size. Therefore, if we pass anything smaller than 12 as the data length (say zero), parts of the buffer will remain uninitialized and will later be read from – which may consequently lead to a certain degree of information disclosure… Anyway, let’s dig further into the function:
.text:BF93EDEE push esi ; input data length
.text:BF93EDEF lea eax, [ebp+local_buffer]
.text:BF93EDF2 push eax ; stack-based input buffer of size 12
.text:BF93EDF3 push [ebp+cmd_code] ; fully controlled control code
.text:BF93EDF6 push [ebp+handle] ; input handle
.text:BF93EDF9 call _xxxSetInformationThread@16 ; xxxSetInformationThread(x,x,x,x)
There’s a call to an internal win32k!xxxSetInformationThread function with controlled parameters and a pointer to a controlled structure. Let’s look inside.
.text:BF93EE6E lea ecx, [ebp+Object]
.text:BF93EE71 push ecx ; Object
.text:BF93EE72 push 1 ; AccessMode
.text:BF93EE74 push dword ptr [eax] ; ObjectType
.text:BF93EE76 push 20h ; DesiredAccess
.text:BF93EE78 push [ebp+handle] ; Handle
.text:BF93EE7B call ds:__imp__ObReferenceObjectByHandle@24 ; ObReferenceObjectByHandle(x,x,x,x,x,x)
[...]
.text:BF93EE8B mov ebx, [ebp+buffer]
[...]
.text:BF93EE8F mov edi, [ebp+cmd_code]
[...]
.text:BF93EE9C cmp edi, 9
.text:BF93EE9F jz short loc_BF93EED6
[...]
.text:BF93EED6 loc_BF93EED6: ; CODE XREF: xxxSetInformationThread(x,x,x,x)+3F
.text:BF93EED6 add ebx, 4
.text:BF93EED9 call _xxxRestoreCsrssThreadDesktop@4 ; xxxRestoreCsrssThreadDesktop(x)
What happens here is that the handle parameter from the original syscall is referenced as a thread object, and later, if the control code is equal to nine, another internal function called win32k!xxxRestoreCsrssThreadDesktop is invoked with the (buffer+4) address passed as a parameter inside ebx, and the return value of the preceding ObReferenceObjectByHandle (NTSTATUS) as a parameter in register esi. By following the call chain, we can observe the following code:
.text:BF93EFEF ; __stdcall xxxRestoreCsrssThreadDesktop(x)
[...]
.text:BF93F02C loc_BF93F02C: ; CODE XREF: xxxRestoreCsrssThreadDesktop(x)+68
.text:BF93F02C mov ecx, [ebx]
.text:BF93F02E test ecx, ecx
.text:BF93F030 jnz short loc_BF93F059
.text:BF93F032
.text:BF93F032 loc_BF93F032: ; CODE XREF: xxxRestoreCsrssThreadDesktop(x)+72
.text:BF93F032 mov esi, [ebx+4]
.text:BF93F035 test esi, esi
.text:BF93F037 jz short loc_BF93F042
.text:BF93F039 push edi
.text:BF93F03A call _CloseProtectedHandle@8 ; CloseProtectedHandle(x,x)
.text:BF93F03F mov [ebx+4], edi
.text:BF93F042
.text:BF93F042 loc_BF93F042: ; CODE XREF: xxxRestoreCsrssThreadDesktop(x)+48
.text:BF93F042 mov eax, [ebp+var_4]
.text:BF93F045 pop edi
.text:BF93F046 pop esi
.text:BF93F047 leave
.text:BF93F048 retn
[...]
.text:BF93F059 call ds:__imp_@ObfDereferenceObject@4 ; ObfDereferenceObject(x)
.text:BF93F05F mov [ebx], edi
.text:BF93F061 jmp short loc_BF93F032
The above assembly translates to the following:
/*
 * ... irrelevant ...
 */
if (buffer->object) {
  ObfDereferenceObject(buffer->object);
}
if (buffer->handle) {
  CloseProtectedHandle(buffer->handle);
}
Keeping in mind that buffer still points to a user-controlled structure, it becomes clear that the current implementation enables CSRSS to operate on raw kernel-mode addresses and handles; furthermore, if CSRSS is controlled by a rogue user, this makes it possible to either corrupt kernel-mode memory by passing an arbitrary non-object parameter to ObfDereferenceObject, or potentially cause a use-after-free condition by illegally closing a system handle or passing in a pointer to an object that is not intended to be dereferenced at this point. One simple way to exploit the issue is described below.
The nature of the memory operations performed by the ObfDereferenceObject routine upon an object is fairly straightforward:
.text:0045447C lea esi, [ebx-18h] ; ebx = object
[...]
.text:00454499 or edi, 0FFFFFFFFh
.text:0045449C lock xadd [esi], edi
Just one thing to keep in mind is that the signed value being decremented by one is not supposed to drop below one; otherwise, the object manager would attempt to free the “object” from the kernel pools, which would either render exploitation completely impossible or complicate it considerably. If we take a step back and look at the “A quick insight into the Driver Signature Enforcement” article from two years ago (which was probably around the time I stumbled upon the issue discussed here), we can see that disabling the mechanism on a live system session can be as simple as flipping the nt!g_CiEnabled byte from one to zero. Let’s take a look at our chances inside a Windows 7 64-bit session investigated with WinDbg:
kd> dq nt!g_cienabled-8
fffff800`02e45eb0 fffff880`00cdc470 00000000`00000001
fffff800`02e45ec0 00000000`00000000 00000000`00000000
fffff800`02e45ed0 00000000`00000000 00000000`00000000
[...]
As we can see, there is a 64-bit pointer directly before the value of interest, and just zeros after it. Since decrementing the aligned qword holding g_CiEnabled would take it straight from one to zero (exactly what we must avoid), we need to perform a non-aligned memory access. If we take a look at the memory address shifted back by one byte, here’s what we get:
kd> dq nt!g_cienabled-1
fffff800`02e45eb7 00000000`000001ff 00000000`00000000
Our target byte becomes the second-least significant byte of the qword, the least significant one always being 0xff (it is the top byte of the preceding pointer, which is a canonical kernel-mode address). Therefore, we can perform exactly 256 decrements using the ObfDereferenceObject call with an arbitrary parameter in order to drop the nt!g_CiEnabled byte to zero while keeping the preceding pointer untouched and valid. This is how it looks in practice (the dumps below were taken at a few arbitrary points during the 256 decrements):
kd> !process
PROCESS fffffa80020c5b30
    SessionId: 1 Cid: 017c Peb: 7fffffd9000 ParentCid: 0174
    DirBase: 19e1b000 ObjectTable: fffff8a005fc18e0 HandleCount: 204.
    Image: csrss.exe
[...]
kd> u
nt!ObfDereferenceObject+0x2c:
fffff800`02ca946c f0480fc11f lock xadd qword ptr [rdi],rbx
kd> ? rbx; dq rdi
Evaluate expression: -1 = ffffffff`ffffffff
fffff800`02e45eb7 00000000`000001ff 00000000`00000000
[...]
kd> ? dq rdi
fffff800`02e45eb7 00000000`000001f8 00000000`00000000
[...]
kd> ? dq rdi
fffff800`02e45eb7 00000000`0000012f 00000000`00000000
[...]
kd> ? dq rdi
fffff800`02e45eb7 00000000`000000ff 00000000`00000000
kd> db nt!g_CiEnabled
fffff800`02e45eb8 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
Aaaaaand… that’s it ;-) Although not very spectacular, you can also watch a demo exploitation video below. I believe I privately reported the bug to Microsoft somewhere around 2009, but it was obviously not classified as a security issue (just a reliability one), and apparently hasn’t been found important enough to be fixed up until now. I hope you guys enjoyed the post, more Windows internals to come soon!
Yeah I enjoyed this post.
Although this is a mostly uninteresting kind of exploit, I feel that Windows contains tons of places where one could elevate from standard user to system using similar techniques.
When I scanned the leaked W2k kernel sources I really noticed a lot of unvalidated user-mode arguments and mutation of shared, unsynchronized state. I think it is (or was) a mess.
Ah, the memories! ;)
The nature of memory operations performed by the ObfDereferenceRoutine routine upon an object is fairly straight-forward:
1. .text:0045447C lea esi, [ebx-18h] ; ebx = object
2. […]
3. .text:00454499 or edi, 0FFFFFFFFh
4. .text:0045449C lock xadd [esi], edi
Maybe I am utterly rusty, but it takes a lot of imagination to go from printout 1 to 4 and assume
that the function automatically decrements the value by “256” as stated.
As your trace shows:
1ff
1f8 delta: 7
12f delta: 201
0ff delta: 48
000 delta: 255
Obviously there might be other arcana left for the readers to find,
but if that’s the case, why don’t you wait for disclosure before actually publishing?
When I load the driver, the Program Compatibility Assistant dialog shows up, but the driver is loaded successfully. So why does the PCA dialog show up, and how do I get rid of it?
@tobi: erm… I don’t think scanning through W2k kernel sources is by any means legal unless you’re a Microsoft employee. Anyway, it’s true that win32k.sys is intensely messy and there’s a lot of fishy action going on there. Perhaps it’s the largest source of local and remote vulnerabilities in the Windows kernel ever. Looking forward to your reporting some of them ;)
@omeg: hehe ;-)
@marsh mellow: maybe it’s high time to get some imagination? None of the other readers complained.
@hello: not sure if I understand correctly, but I assume that you’re referring to the ability to load unsigned drivers while the system is in debug mode (i.e. with windbg attached remotely). Have you tried loading a driver with remote debugging disabled?
“None of the other readers complained” … well, I’m not complaining, I’m just highlighting the fast-forward assumption made without details.
I understand you might not want people to recreate it right off the shelf, but if I read something that discloses an issue, why not detail it correctly?
And the last time I tried to compile imagination, I couldn’t: out of memory.
@marsh mellow: really, I think it’s pretty clear what the listing shows (four breaks in random intervals during the process of decrementing the value by one 256 times) based on the context, and let’s stop there.
I mean that when the exploit succeeds (i.e. nt!g_cienabled has been set to 0) and I then load an unsigned driver, the “Program Compatibility Assistant” dialog appears and says “Windows requires a digitally signed driver…”; of course, the driver still loads successfully. But I don’t want the PCA dialog to show.
Well, the snippet from ObDereferenceObject fails to match a nice MSDN page.
Why post incomplete disassembly when people can read plain English?
http://msdn.microsoft.com/en-us/library/windows/hardware/ff557724(v=vs.85).aspx
“Perhaps it’s the largest source of local and remote vulnerabilities in the Windows kernel ever.”
The funny thing is when NT 4.0 was released back in 1996, WinFrame already existed based on NT 3.51 with *per-session* CSRSS, but NT4 TSE was not released until 1998.