<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>j00ru//vx tech blog</title>
	<atom:link href="http://j00ru.vexillium.org/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://j00ru.vexillium.org</link>
	<description>Coding, reverse engineering, OS internals covered one more time</description>
	<lastBuildDate>Sun, 05 Sep 2010 23:06:59 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Kernel exploitation &#8211; r0 to r3 transitions via KeUserModeCallback</title>
		<link>http://j00ru.vexillium.org/?p=614</link>
		<comments>http://j00ru.vexillium.org/?p=614#comments</comments>
		<pubDate>Sun, 05 Sep 2010 22:40:30 +0000</pubDate>
		<dc:creator>j00ru</dc:creator>
				<category><![CDATA[Assembler]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[OS Internals]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Ring0]]></category>
		<category><![CDATA[Ring3]]></category>
		<category><![CDATA[Undocumented API]]></category>
		<category><![CDATA[Windows 7]]></category>
		<category><![CDATA[Windows Vista]]></category>
		<category><![CDATA[Windows XP]]></category>
		<category><![CDATA[hacking]]></category>
		<category><![CDATA[kernel]]></category>

		<guid isPermaLink="false">http://j00ru.vexillium.org/?p=614</guid>
		<description><![CDATA[Hey there! I recently came across (well, not entirely by myself... cheers Nahuel!) a fairly (un)common problem related to performing ring0-to-ring3 transitions after successful kernel vulnerability exploitation. As I have managed to come up with a bunch of possible solutions, and even write example code for some of them, today I would like [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">Hey there!</p>
<p style="text-align: justify;">I recently came across (well, not entirely by myself... cheers Nahuel!) a fairly (un)common problem related to performing ring0-to-ring3 transitions after successful kernel vulnerability exploitation. As I have managed to come up with a bunch of possible solutions, and even write example code for some of them, today I would like to present my thoughts, together with a brief explanation.</p>
<p style="text-align: justify;"><span id="more-614"></span></p>
<h3 style="text-align: justify;">Introduction</h3>
<p style="text-align: justify;">Before looking for a reliable solution, the problem itself should be clearly stated. We are considering a 32-bit Windows NT-family version (one of the supported ones) suffering from a stack-based buffer overflow inside one of the system call handler functions. The attacker is able to overwrite memory placed <em>after</em> a fixed-size buffer, including the stack frame, return address, syscall arguments and anything else reachable from this point. Unlike in reality, we assume that no stack protection (i.e. a cookie) is in place, so the flaw leads straight to malicious code execution and system compromise. Furthermore, the overflow is triggered right inside the <em>syscall handler</em>, not in a nested function of any kind.</p>
<p style="text-align: justify;">The following ASCII diagram, presenting the stack layout at the time of the overflow, should give you a better picture of the described scenario:</p>
<pre style="text-align: justify;">+-----------------------+
|                       |
|  local variables (1)  |
|                       |
+-----------------------+
|      CHAR buf[32]     | -+
+-----------------------+  |
|                       |  |
|  local variables (2)  |  | overflow
|                       |  | direction
+-----------------------+  |
|     stack frame       |  |
+-----------------------+  v
|    return address     |
+-----------------------+
|                       |
|   syscall parameters  |
|                       |
+-----------------------+
|                       |
| KiFastCallEntry stack |
|                       |
|         (...)         |</pre>
<p style="text-align: justify;">So, here we are, able to control roughly any value that could lead us to code execution... a perfect dream for every vulnerability researcher. There is one more requirement, however - we must, by any means, return to user-mode, in order to exit the exploit process in a legitimate way (such as by calling <a href="http://msdn.microsoft.com/en-us/library/ms682658%28VS.85%29.aspx">ExitProcess</a>). So, how do we achieve that, assuming that the original return address, and possibly some of the syscall arguments, are lost (having been overwritten by attacker-supplied data)? Let's find out what the options are.</p>
<h3 style="text-align: justify;">KiFastCallEntry and KiServiceExit</h3>
<p style="text-align: justify;">Under normal system execution (i.e. when its stability and security don't collapse), each system call handler - such as NtOpenFile - returns to its original caller, the <em>KiFastCallEntry</em> function. This routine, in turn, is the dispatcher reached when ring-3 code executes the <a href="http://www.intel.com/software/products/documentation/vlin/mergedprojects/analyzer_ec/mergedprojects/reference_olh/mergedprojects/instructions/instruct32_hh/vc311.htm"><em>sysenter</em></a> instruction (it is also used by kernel modules that take advantage of system services). After calling the appropriate handler from <em>KeServiceDescriptorTable</em>, the dispatcher is supposed to lower the processor privilege level by returning to where the <em>sysenter</em> instruction was executed.</p>
<p style="text-align: justify;">The latter part of the job is implemented by the <em>KiServiceExit</em> routine, responsible for coming back to the service caller, whatever it is. Interestingly enough, <em>KiFastCallEntry</em> doesn't need to call the exit function, thanks to a specific assembly code layout, designed by the system developers:</p>
<pre style="text-align: justify;">+-----------------------+
| nt!KiFastCallEntry    |
|                       | --+
|      /* code */       |   |
|       CALL EBX        | &lt;-|-- EBX = syscall handler address
|                       |   |
|-----------------------|   |
| nt!KiServiceExit      |   |
|                       |   | code execution direction
|      /* code */       |   v
|       SYSEXIT         |
|                       |
</pre>
<p style="text-align: justify;">As the <em>KiServiceExit</em> implementation directly follows the "end" of <em>KiFastCallEntry</em>, code execution automatically moves from one routine into the other. This way, no actual <em>call</em> instruction is required, as the smart layout causes <em>KiServiceExit</em> to always execute after returning from the syscall handler. However, once we replace the original return address with one pointing at our shellcode, we no longer land inside <em>KiServiceExit</em> automatically. What makes the situation even worse is that the exit routine is an internal symbol, not publicly exported to other ring-0 modules.</p>
<p style="text-align: justify;">Considering the above conditions, finding a reliable way of returning into user-mode might appear to be somewhat problematic. The next couple of sections aim to show the bright and dark sides of some possible solutions, which I have been able to think of - if there is something I have apparently missed, please let me know - I will be glad to extend the article with additional material <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<h3 style="text-align: justify;">Obtaining internal kernel symbols</h3>
<p style="text-align: justify;">The first, and probably most straightforward solution one could think of requires the attacker to recognize the precise version of the kernel image being used, and take advantage of the symbol packages publicly available on Microsoft servers. An adequate package could be either downloaded at run-time (provided that the attacked machine is connected to the Internet at the time of exploitation), or distributed together with the malicious application. A <em>lighter</em> version of the latter option could rely on hard-coding the KiServiceExit function address for every single kernel image version possible.</p>
<p style="text-align: justify;"><strong>Advantages</strong>: If the exploit was taking advantage of legitimate, Microsoft-supplied symbols or using a static table of supported Windows editions together with the desired kernel addresses, it could achieve a decent level of reliability. If one knows the KiServiceExit memory placement, there isn't much left to be done - just aligning the stack as it would be upon a normal syscall return, and jumping to the routine after the payload completes.</p>
<p style="text-align: justify;"><strong>Disadvantages</strong>: If the attacker decided to download a complete <em>ntoskrnl.exe</em> symbol file from the web, he would probably put the entire operation at risk, as the .pdb file being retrieved can be as large as 5MB (or more). The exploit could obviously employ various DKOM-style techniques in order to hide the connection; this would only work for the local machine, though - how about other computers in the network, and/or devices along the way to the global net? The attacker could be either caught in the first place, or leave significant amounts of evidence for forensics researchers.<br />
If, in turn, the attacker went for hard-coded values, he would be forced to keep his exploit up-to-date as new system patches are released.</p>
<p style="text-align: justify;">Problems of the above nature are, obviously, not an issue if the attacker has a relatively small number of targets, and is able to figure out the computers' kernel versions by other means (e.g. having a local account on a given machine would usually help a lot).</p>
<h3 style="text-align: justify;">Signature scan</h3>
<p style="text-align: justify;">Another, well-known way of retrieving the address-of-whatever relies on performing a quick &amp; dirty signature scan of the memory. In this particular case, one would have to scan the entire <em>ntoskrnl.exe</em> image memory area, in search of a previously-extracted signature, unique for the KiServiceExit routine. The signature could (or probably: <em>should</em>) be constructed so that it would work for every operating system out there, or be kept inside a hard-coded table of supported kernel versions (as mentioned in the previous section).</p>
<p style="text-align: justify;"><strong>Advantages</strong>: The exploit doesn't have to establish any outgoing connections. In fact, it doesn't make use of the Internet at all. Depending on the length and quality of the signature, as well as the number of kernel modifications applied by Microsoft, this technique could turn out to be either reliable, or the very opposite. In my opinion, it is usually best to consider signature-scanning unreliable, regardless of the conditions. If, however, the attacker proved that the KiServiceExit address can be easily obtained using a signature that is valid for all existing systems and unlikely to change - I would consider such a solution a relatively good one.</p>
<p style="text-align: justify;"><strong>Disadvantages</strong>: As far as my experience goes, using constant signatures is rarely a good idea, especially if there are other options to pick. The exploit developer can never be certain that Microsoft won't unexpectedly change the kernel code, stack layout, or anything else affecting the function assembly being relied on. What is worse, the problem is not only about changes to KiServiceExit itself - it is enough that a new byte sequence matching the existing pattern appears <em>anywhere</em> in the kernel image, and the exploit is fooled. In conclusion - not a technique I would recommend.</p>
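<p style="text-align: justify;">To make the uniqueness problem concrete, here is a minimal sketch of a masked pattern scan over an ordinary byte buffer. It is not taken from any real exploit: the function name, the mask convention (0 marks a wildcard byte) and the idea of counting matches are all assumptions made for illustration. The point is simply that a signature is only trustworthy if it matches exactly once.</p>

```c
#include <stddef.h>

/* Count how many times a masked byte pattern occurs in a buffer.
 * mask[i] == 0 marks a wildcard position (a byte expected to differ
 * between builds); any other value means "must match sig[i]".
 * The offset of the first match is stored in *first_offset, if given. */
size_t count_signature_matches(const unsigned char *buf, size_t buf_len,
                               const unsigned char *sig,
                               const unsigned char *mask,
                               size_t sig_len, size_t *first_offset)
{
    size_t matches = 0;

    if (sig_len == 0 || sig_len > buf_len)
        return 0;

    for (size_t i = 0; i + sig_len <= buf_len; i++) {
        size_t j = 0;
        while (j < sig_len && (mask[j] == 0 || buf[i + j] == sig[j]))
            j++;
        if (j == sig_len) {          /* every position matched */
            if (matches == 0 && first_offset != NULL)
                *first_offset = i;
            matches++;
        }
    }
    return matches;
}
```

<p style="text-align: justify;">A caller would treat any result other than 1 as a failure: zero means the pattern is gone, and two or more means the scan cannot tell the intended location from an accidental look-alike.</p>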
<h3 style="text-align: justify;">Own <em>KiServiceExit</em> implementation</h3>
<p style="text-align: justify;">The next solution to be considered would require the exploit developer to create his own implementation of the exit routine, rather than keep trying to (non-deterministically) find its virtual address in memory. This is possible because we're executing with the same rights as the kernel itself, and are able to use any privileged instruction it uses. The only problem here could potentially be caused by the complexity of the function - fortunately, that is not the case for KiServiceExit.</p>
<p style="text-align: justify;"><strong>Advantages</strong>: The major upside of this method is that we are not dependent on virtual addresses of any kind (apart from the actual payload, which might require these). In other words, it is possible to implement one payload <em>epilogue</em>, and use it across numerous system versions, as long as the stack layout (most importantly - the trap frame) doesn't change. According to my observations, the KiServiceExit routine either doesn't change at all, or changes only in minor parts (i.e. single instructions). Even though there might be a few differences between Windows 2000 and Windows Vista, such low-level parts of the system aren't modified in one day. And so, carefully preparing one separate implementation of the function for each Windows NT-family release (2000, XP, Vista, 7) should be sufficient to keep reliability at a very high level.</p>
<p style="text-align: justify;"><strong>Disadvantages</strong>: One actual drawback that could be pointed out is that the solution is still not as elegant as it could be. That's because the kernel-to-user transition is performed using highly undocumented (except for the \ntos\ke\i386\trap.asm file, present inside the <a href="http://www.microsoft.com/resources/sharedsource/windowsacademic/researchkernelkit.mspx">Windows Research Kernel</a> package) system behavior and internal offsets. As a consequence, even though it is very likely that someone's implementation of the exit routine will work on any build of a specific Windows version, there is no certainty about it - especially in the context of future Windows versions.</p>
<h3 style="text-align: justify;">The <em>KeUserModeCallback</em> technique</h3>
<p style="text-align: justify;">Last, but not least - the technique that was my first thought when I started reflecting on the problem. Since the mechanism this method relies on has already been described numerous times (such as in the "<a href="http://uninformed.org/index.cgi?v=10&amp;a=2#SECTION00042000000000000000">KeUserModeCallback utilization</a>" section of <a href="http://www.uninformed.org/?v=10&amp;a=2">mxatone's article</a>, or <a href="http://www.nynaeve.net/?p=204">Nynaeve's post</a>), I will only give a brief explanation of the concept.</p>
<p style="text-align: justify;">Under normal conditions, ring-3 code can only interact with the kernel modules via <em>system calls</em> (regular interrupts are mostly deprecated, while call-gates are not used at all). This basic scheme relies on user applications sending specific requests, asking the kernel either to perform operations which require higher processor privileges, or to supply some necessary information. A request is made (via the INT 2E or sysenter instruction), the kernel dispatches the request and possibly returns some information - then comes back to user mode (via either iretd or sysexit). Following the above scheme, one could consider system calls to be a specific type of callback function - whenever an application wants to interact with the system, it <em>calls back</em> an adequate function from the kernel.</p>
<p style="text-align: justify;">As it turns out, the kernel might want to <em>call back</em> into user-mode, as well! More precisely, the standard graphical driver (win32k.sys), needs to use ring-3 routines in numerous situations; in order to send notifications about graphical events going on, or to request some information. In order to meet the requirements, a special interface called <em>user-mode callbacks</em> was developed inside the NT kernel. The interface actually consists of one public, and a few internal kernel routines:</p>
<pre style="text-align: justify;">NTSTATUS KeUserModeCallback (
    IN ULONG ApiNumber,
    IN PVOID InputBuffer,
    IN ULONG InputLength,
    OUT PVOID *OutputBuffer,
    IN PULONG OutputLength
    );
</pre>
<p style="text-align: justify;">By using the above function, exported by ntoskrnl.exe, the graphical module is able to perform a legitimate ring-0 to ring-3 transition. What happens next is that some basic information regarding the execution state is stored on the kernel stack, and execution is passed to the user-mode <span style="text-decoration: underline;">ntdll.KiUserCallbackDispatcher</span> function, which has the following prototype:</p>
<pre style="text-align: justify;">VOID KiUserCallbackDispatcher(
    IN ULONG ApiNumber,
    IN PVOID InputBuffer,
    IN ULONG InputLength
    );
</pre>
<p style="text-align: justify;">The dispatcher is then responsible for forwarding the execution into one of the callback routines (the EDX register contains the ApiNumber parameter):</p>
<pre style="text-align: justify;">mov eax, large fs:18h
mov eax, [eax+30h]
mov eax, [eax+2Ch]
call dword ptr [eax+edx*4]</pre>
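<p style="text-align: justify;">In C terms, the four instructions above amount to an indexed call through a table of function pointers. The sketch below models that with made-up structures: <em>model_teb</em>, <em>model_peb</em>, the handler signature and the two sample handlers are hypothetical stand-ins, and only the fields touched by the listing are represented. It runs entirely in user mode and touches no real system structure.</p>

```c
#include <stddef.h>

/* Hypothetical handler type; the real callback signature is not modelled. */
typedef unsigned long (*callback_fn)(void *input, unsigned long input_len);

/* Stand-ins for the two structures walked by the listing. */
struct model_peb {
    callback_fn *callback_table;      /* corresponds to [peb+2Ch] */
};
struct model_teb {
    struct model_teb *self;           /* corresponds to fs:[18h]  */
    struct model_peb *peb;            /* corresponds to [teb+30h] */
};

/* Two sample handlers so the dispatch has something to call. */
unsigned long sample_handler_zero(void *input, unsigned long input_len)
{
    (void)input;
    return 100 + input_len;
}
unsigned long sample_handler_one(void *input, unsigned long input_len)
{
    (void)input;
    return 200 + input_len;
}

/* Same steps as the assembly: TEB -> PEB -> table -> table[ApiNumber].
 * Like the original, it performs no bounds check on api_number. */
unsigned long dispatch_callback(struct model_teb *fs_base,
                                unsigned long api_number,
                                void *input, unsigned long input_len)
{
    struct model_teb *teb = fs_base->self;        /* mov eax, fs:[18h]  */
    struct model_peb *peb = teb->peb;             /* mov eax, [eax+30h] */
    callback_fn *table = peb->callback_table;     /* mov eax, [eax+2Ch] */
    return table[api_number](input, input_len);   /* call [eax+edx*4]   */
}
```

<p style="text-align: justify;">The model makes one property obvious: whoever controls the table pointer decides which function every callback number ends up in.</p>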
<p style="text-align: justify;">Seemingly, the user-side dispatch table is pointed to by one of the PEB (<a href="http://undocumented.ntinternals.net/UserMode/Undocumented%20Functions/NT%20Objects/Process/PEB.html">Process Environment Block</a>) fields. After the given callback completes its task, it resumes the win32k.sys execution by either using a dedicated interrupt (INT 2B, internally called <em>KiCallbackReturn</em>), or triggering the <em>NtCallbackReturn</em> system call. The question is - how does the above information help us achieve the desired exploitation effect?</p>
<p style="text-align: justify;">Since KeUserModeCallback is a public symbol, any module running in kernel-mode can call the function in a fully reliable manner. What is more, we can also hook the KiUserCallbackDispatcher function, or better yet - redirect the dispatch table pointer residing inside the PEB. Having done that, we are able to trigger our own, fully controlled kernel-to-user transitions. Thanks to the clever NT kernel, we don't really have to care about what is left on the kernel stack, as it will be gracefully cleaned up upon process termination. Below, you can find example code snippets responsible for accomplishing each stage of the safe kernel-to-user transition:</p>
<ol style="text-align: justify;">
<li>Loading the graphical library - before we decide to touch any of the win32-related PEB fields, we should make sure that the user32.dll library has been previously loaded. This way, we are guaranteed, that both the user- and kernel- parts of the system graphics are correctly initialized for our process.<br />
<br/>
<pre>LoadLibraryA("user32.dll");</pre>
<p><br/>
</li>
<li>Replace the original dispatch table pointer, with the one controlled by us.<br />
<br/>
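<pre>LPVOID GetFSBase(void)
{
  LDT_ENTRY ldt;
  /* GetFS() is a helper (not shown) returning the current FS selector */
  GetThreadSelectorEntry(GetCurrentThread(), GetFS(), &amp;ldt);
  /* assemble the 32-bit segment base from its three descriptor fields */
  return (LPVOID)((DWORD)ldt.BaseLow | ((DWORD)ldt.HighWord.Bytes.BaseMid &lt;&lt; 16) | ((DWORD)ldt.HighWord.Bytes.BaseHi &lt;&lt; 24));
}

(...)

 /* point every slot of our table at the same handler */
 for( i=0;i&lt;DISPATCH_TABLE_SIZE;i++ )
   DispatchTable[i] = CallbackHandler;

 BYTE* Teb = (BYTE*)GetFSBase();
 Teb = *(BYTE**)(Teb+0x18);            /* TEB self pointer, as in fs:18h above */
 BYTE* Peb = *(BYTE**)(Teb+0x30);      /* TEB -&gt; PEB */
 *(PVOID*)(Peb+0x2C) = DispatchTable;  /* PEB -&gt; dispatch table pointer */
</pre>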
<p><br/>
</li>
<li>Retrieve the nt!KeUserModeCallback address. This can be achieved by using the PSAPI interface to retrieve the ImageBase of the kernel image (<a href="http://msdn.microsoft.com/en-us/library/ms682617%28VS.85%29.aspx">EnumDeviceDrivers</a> and <a href="http://msdn.microsoft.com/en-us/library/ms683184%28VS.85%29.aspx">GetDeviceDriverBaseNameA</a> are useful here), loading the very same image in the context of our application, and performing some simple maths. I have made use of my personal <em>GetKernelProcAddress</em> function this time - implementing this one is left as an exercise to the reader <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /><br />
<br/>
<pre>KeUserModeCallback = (typeof(KeUserModeCallback))GetKernelProcAddress("ntoskrnl.exe","KeUserModeCallback");</pre>
<p><br/>
</li>
<li>Trigger the buffer overflow, leading to the Payload() function being executed. <em>Shellcode</em> represents the actual code for elevating user privileges, starting up a reverse shell, or whatever else you can think of.<br />
<br/>
<pre>VOID Payload()
{
  ((VOID(*)())Shellcode)();
  KeUserModeCallback(0,0,0,0,0);
}
</pre>
<p><br/>
</li>
<li>Catch the user-mode callback inside CallbackHandler(), and gracefully terminate the process.<br />
<br/>
<pre>DWORD CallbackHandler()
{
  if(b0f_triggered) ExitProcess(0);

  NtCallbackReturn(0,0,ERROR_SUCCESS);
  return ERROR_SUCCESS;
}
</pre>
<p><br/>
</li>
<li>That's it, we're done!</li>
</ol>
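<p style="text-align: justify;">The "simple maths" mentioned in step 3 boils down to carrying an offset from one image base to another. The sketch below shows only that arithmetic on plain 32-bit integers; the function name and all numeric values are made up for illustration, and it is not the <em>GetKernelProcAddress</em> routine referred to above.</p>

```c
#include <stdint.h>

/* Translate the address of an export found in a locally mapped copy of an
 * image into the address the same export has in another mapping of that
 * image. Both mappings are assumed to be the very same file. */
uint32_t rebase_export(uint32_t local_base, uint32_t local_export,
                       uint32_t remote_base)
{
    uint32_t rva = local_export - local_base;   /* offset inside the image */
    return remote_base + rva;                   /* same offset, other base */
}
```

<p style="text-align: justify;">For example, with a local base of 0x00400000, a local export address of 0x00412345 and a remote base of 0x80800000, the result is 0x80812345.</p>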
<p style="text-align: justify;">One final thing worth noting is that KeUserModeCallback eventually leads to the KiServiceExit function, as the following call chain shows:</p>
<pre style="padding-left: 30px; text-align: justify;">| nt!KeUserModeCallback
| nt!KiCallUserMode
v nt!KiServiceExit</pre>
<p style="text-align: justify;">Let's take a closer look at the actual pros and cons of the presented technique.</p>
<p style="text-align: justify;"><strong>Advantages</strong>: The entire solution basically relies on two steps: calling the <span style="text-decoration: underline;">public</span> nt!KeUserModeCallback routine after successful exploitation, and "catching" the execution flow at the <span style="text-decoration: underline;">public</span> ntdll!KiUserCallbackDispatcher function, or at one of the callback handlers pointed to by the PEB. Seemingly, both steps can be accomplished in a fully reliable way, unless Microsoft decides to either completely remove one of the utilized functions, or make it an internal symbol. Since such a scenario is highly unlikely, we can safely assume that the technique is, and will remain, very well suited for returning into user-mode code from difficult situations (such as a seriously damaged stack).</p>
<p style="text-align: justify;"><strong>Disadvantages</strong>: One possible disadvantage that comes to mind is that replacing the PEB pointer to the dispatch table might not be as easy as one might suppose. Since high PEB offsets are likely to change between Windows versions, the attacker should take this into consideration when planning a world-wide, cross-version attack. This downside doesn't change much though, as it is still possible to intercept execution inside the exported KiUserCallbackDispatcher, as mentioned before. If you know about any other drawbacks I am not aware of, please let me know.</p>
<h3 style="text-align: justify;">Why so serious (about ring-3)?</h3>
<p style="text-align: justify;">Looking at the above text, one might wonder why the problem is stated so that the kernel-to-user transition <span style="text-decoration: underline;">must</span> take place, when it doesn't have to under normal circumstances. The answer is - because. When it comes to kernel-mode, there are loads of possible scenarios, machine states, and other factors which sometimes can be predicted, and sometimes not; returning to user-mode <em>might</em> be the best choice at times. One should keep in mind, however, that there are ways to terminate the current process from within ring-0 (such as <a href="http://msdn.microsoft.com/en-us/library/ff567115%28VS.85%29.aspx">nt!ZwTerminateProcess</a>). Or better yet - once code execution is achieved, the process could simply load a regular rootkit driver (hiding the existence of the process), and remain idle until machine reboot by calling nt!ZwYieldExecution in an infinite loop.</p>
<h3 style="text-align: justify;">Conclusion</h3>
<p style="text-align: justify;">In this post, I aimed to present yet another interesting scenario from the kernel exploitation field, together with a couple of possible solutions. Even though situations of the described nature don't tend to happen very often, they do happen. Besides that, all four techniques are meant to be universal, so they can be used not only when a stack-based buffer overflow takes place, but in any situation where it is hard, or impossible, to resume the original track of kernel code execution. So, that's it... comments are welcome, as always! <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p style="text-align: justify;">Have fun.</p>
]]></content:encoded>
			<wfw:commentRss>http://j00ru.vexillium.org/?feed=rss2&amp;p=614</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Windows CSRSS Write Up: Inter-process Communication (part 2/3)</title>
		<link>http://j00ru.vexillium.org/?p=527</link>
		<comments>http://j00ru.vexillium.org/?p=527#comments</comments>
		<pubDate>Tue, 27 Jul 2010 21:41:30 +0000</pubDate>
		<dc:creator>j00ru</dc:creator>
				<category><![CDATA[Assembler]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[CSRSS]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[OS Internals]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Ring3]]></category>
		<category><![CDATA[Undocumented API]]></category>
		<category><![CDATA[Windows 7]]></category>
		<category><![CDATA[Windows Vista]]></category>
		<category><![CDATA[Windows XP]]></category>
		<category><![CDATA[hacking]]></category>

		<guid isPermaLink="false">http://j00ru.vexillium.org/?p=527</guid>
		<description><![CDATA[A quick beginning note: My friend d0c_s4vage has created a technical blog and posted his first text just a few days ago. The post entry covers a recent, critical libpng vulnerability discovered by this guy; the interesting thing is that, among others, the latest Firefox and Chrome versions were vulnerable. Feel free to take a [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;"><strong>A quick beginning note:</strong> My friend <a href="http://d0cs4vage.blogspot.com/">d0c_s4vage</a> has created a technical blog and posted his first text just a few days ago. The post entry covers a recent, critical <a href="http://www.libpng.org/pub/png/libpng.html">libpng</a> vulnerability discovered by this guy; the interesting thing is that, among others, the latest Firefox and Chrome versions were vulnerable. Feel free to take a minute and read the article <a href="http://d0cs4vage.blogspot.com/2010/07/libpng-extra-row-cve-2010-1205.html">here</a>.</p>
<p style="text-align: justify;">Additionally, the video and mp3 recordings of the presentation given by <a href="http://gynvael.coldwind.pl/">Gynvael</a> and me at the <a href="http://j00ru.vexillium.org/?p=363">CONFidence 2010</a> conference are now publicly available on the official website: <a href="http://2010.confidence.org.pl/materials"><strong>link</strong></a> (<span style="text-decoration: underline;">Case study of recent Windows vulnerabilities</span>).</p>
<p><span id="more-527"></span></p>
<h3 style="text-align: justify;">Foreword</h3>
<p style="text-align: justify;">Most of the LPC basics (supposedly an acronym for <em>Local Inter-Process Communication</em> rather than <em>Local Procedure Calls</em>, as stated in <a href="http://www.microsoft.com/resources/sharedsource/windowsacademic/researchkernelkit.mspx">WRK</a>) have been described in the <a href="http://j00ru.vexillium.org/?p=502"><span style="text-decoration: underline;">first post</span></a> of the Inter-process Communication chapter, together with the corresponding, undocumented native functions related to LPC Ports. As you already have the knowledge required to understand higher abstraction levels, today I would like to shed some light on the internal Csr~ interface provided by NTDLL and extensively utilized by the Win32 API DLLs (kernel32 and user32).</p>
<h3 style="text-align: justify;"><img class="alignleft" style="margin-right: 10px;" title="API levels" src="http://j00ru.vexillium.org/blog/27_07_10/API_levels.png" alt="" width="206" height="368" />Introduction</h3>
<p style="text-align: justify;">As explained previously, LPC is an (officially) undocumented, packet-based IPC mechanism. It basically relies on two things - a Port Object and internal LPC structures, such as _PORT_HEADER - both unexposed to the Windows API layer. Since CSRSS implements its own protocol on top of LPC, it would be highly inconvenient (and impractical) for the win32 libraries to take care of both LPC and CSRSS internals at the same time. And so, an additional layer between the port-related functions and the high-level API was created - let's call it the <span style="text-decoration: underline;">Native Csr Interface</span>.</p>
<p style="text-align: justify;">The medium level of the call chain provides a set of helper functions, specifically designed to hide the internals of the communication channel from high-level API implementation. Therefore, it should be theoretically possible to re-implement the Csr-Interface using a different communication mechanism with similar properties, without any alterations being applied on the API level. This has been partially accomplished by replacing the deprecated LPC with an improved version of the mechanism - Advanced / Asynchronous LPC on modern NT-family systems (Vista, 7).</p>
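<p style="text-align: justify;">The layering idea can be illustrated with a toy model. In the sketch below, nothing corresponds to a real Windows interface: <em>struct transport</em>, <em>middle_layer_call</em> and both "transports" are hypothetical names, and each transport merely echoes the request back. What it shows is that the caller of the middle layer is unaffected when one channel is swapped for another.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical transport interface: the only thing the middle layer
 * knows about the underlying communication channel. */
struct transport {
    const char *name;
    int (*send_request)(const void *msg, size_t len,
                        void *reply, size_t reply_len);
};

/* Toy channel implementation: echoes the request back as the reply. */
static int toy_echo(const void *msg, size_t len, void *reply, size_t reply_len)
{
    if (len > reply_len)
        return -1;
    memcpy(reply, msg, len);
    return 0;
}

/* Two interchangeable channels standing in for an old and a new mechanism. */
const struct transport toy_lpc  = { "LPC",  toy_echo };
const struct transport toy_alpc = { "ALPC", toy_echo };

/* Middle layer: high-level code calls this and never touches the
 * transport directly, so the channel can change underneath it. */
int middle_layer_call(const struct transport *t, unsigned api_number,
                      unsigned *result)
{
    unsigned reply = 0;

    if (t->send_request(&api_number, sizeof api_number,
                        &reply, sizeof reply) != 0)
        return -1;
    *result = reply;
    return 0;
}
```

<p style="text-align: justify;">Calling <em>middle_layer_call</em> with either <em>toy_lpc</em> or <em>toy_alpc</em> yields the same result for the same request, which is the property the middle level of the call chain is meant to provide.</p>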
<p style="text-align: justify;">This post focuses on the precise meaning, functionality and definitions of the crucial Csr~ routines. After reading the article, one should be able to recognize and understand specific CSR API calls found inside numerous documented functions related to console management, process / thread creation and others.</p>
<h3 style="text-align: justify;">Connection Initialization</h3>
<p style="text-align: justify;">What has already been mentioned is the fact that every application belonging to the win32-subsystem is connected to the Windows Subsystem process (CSRSS) at its startup, by default. Although it is technically possible to disconnect from the port before the program is properly terminated, such behavior is beyond the scope of this post entry. However, some details regarding a security flaw related to CSRSS-port disconnection in the context of a live process, can be found <a href="http://vexillium.org/dl.php?HISPASEC_CSRSS_Priv_Escal.pdf">here</a> and <a href="http://www.microsoft.com/technet/security/bulletin/ms10-011.mspx">here</a> (discovered by me and Gynvael).</p>
<p style="text-align: justify;">From this point on, it will be assumed that when the process is given execution (i.e. the Entry Point, an imported module's <a href="http://msdn.microsoft.com/en-us/library/ms682583(VS.85).aspx">DllMain</a> or a TLS callback is called), the CSRSS connection is already established. And so, the question is - how and where is the connection set up during process initialization? This section provides answers to both of these questions.</p>
<h4 style="text-align: justify;">Opening named LPC port</h4>
<p style="text-align: justify;">During process creation, numerous parts of the system come into play and do their part of the job. It all starts with the parent application calling an API function (<a href="http://msdn.microsoft.com/en-us/library/ms682425(VS.85).aspx">CreateProcess</a>) - the execution then goes through the kernel, the local win32 subsystem, and finally - the ring-3 self-initialization of the process (performed by the system libraries). A step-by-step explanation of Windows process creation can be found in the <a href="http://www.amazon.com/Windows%C2%AE-Internals-Including-Windows-PRO-Developer/dp/0735625301">Windows Internals 5</a> book, chapter "Processes, Threads and Jobs".</p>
<p style="text-align: justify;">As the CSRSS connection is not technically crucial for the process to exist (and execute), it can be performed later than other parts of the process initialization. And so, the story of establishing a connection with the subsystem begins in the context of a newly-created program - more precisely, inside the kernel32 entry point (kernel32!BaseDllInitialize). At this point, the CSRSS-related part of the routine performs the following call:</p>
<pre class="brush: c">BOOL WINAPI _BaseDllInitialize(HINSTANCE, DWORD, LPVOID)
{
  (...)
  CsrClientConnectToServer(L&quot;\\Windows&quot;,BASESRV_INDEX,...);
  (...)
}</pre>
<p style="text-align: justify;">thus forwarding the execution to the ntdll.dll module, where a majority of the subsystem-related activities are performed. Before we dive into the next routine, two important things should be noted here:</p>
<ol style="text-align: justify;">
<li>
<p style="text-align: justify;">The Base Dll (kernel32) has complete control over the Port Object directory and makes the final decision regarding the referenced port's name prefix. As it turns out, it is also possible for a different Object Directory to be used - let's take a look at the following pseudo-code listing:</p>
<pre class="brush: c">if(SessionId)
  swprintf(ObjectDirectory,L&quot;%ws\\%ld%ws&quot;,L&quot;\\Sessions&quot;,SessionId,L&quot;\\Windows&quot;);
else
  wcscpy(ObjectDirectory,L&quot;\\Windows&quot;);</pre>
<p style="text-align: justify;">The "SessionId" symbol represents a global DWORD variable, initialized inside the BaseDllInitialize function as well:</p>
<pre class="brush: php"> mov     eax, large fs:18h   ; TEB (self pointer)
 mov     eax, [eax+30h]      ; TEB-&gt;ProcessEnvironmentBlock
 mov     eax,  [eax+1D4h]    ; PEB-&gt;SessionId
 mov     _SessionId, eax</pre>
<p>... translated to the following high-level pseudo-code:</p>
<pre class="brush: c">SessionId = NtCurrentPeb()-&gt;SessionId;</pre>
<p style="text-align: justify;">If one takes a look at the PEB structure definition, the field is certainly there:</p>
<pre class="brush: php">kd&gt; dt _PEB
 nt!_PEB
 (...)
   +0x154 TlsExpansionBitmapBits : [32] Uint4B
   +0x1d4 SessionId        : Uint4B
   +0x1d8 AppCompatFlags   : _ULARGE_INTEGER
 (...)</pre>
</li>
<li>
<p style="text-align: justify;">If one decides to connect to the win32 subsystem, one must specify a particular ServerDll to connect to (csrsrv, basesrv, winsrv); the identification number is passed as the second argument of <em>CsrClientConnectToServer</em>. As can be seen, kernel32 specifies the BASESRV_INDEX constant, as it wants to connect to a specific module - basesrv in this case. Basesrv.dll is the kernel32 equivalent on the subsystem side - a Csr connection between these two modules is required for some of the basic win32 API calls to work properly.</p>
<p style="text-align: justify;">On the other hand, all of the console-management functionality is implemented by winsrv (to be exact - the <em>consrv</em> part of the module). And so - in order to take advantage of functions such as <a href="http://msdn.microsoft.com/en-us/library/ms681944%28VS.85%29.aspx">AllocConsole</a>, <a href="http://msdn.microsoft.com/en-us/library/ms683150%28VS.85%29.aspx">FreeConsole</a>, <a href="http://msdn.microsoft.com/en-us/library/ms686050%28VS.85%29.aspx">SetConsoleTitle</a> or <a href="http://msdn.microsoft.com/en-us/library/ms687401%28VS.85%29.aspx">WriteConsole</a> - a valid connection with winsrv is also required. Fortunately - kernel32 takes care of this and issues a call to another internal function - ConDllInitialize() - after the LPC Port connection is successfully established. The routine's purpose is to set up the console-related structures inside the Base dll image, and to call CsrClientConnectToServer with the second argument set to CONSRV_INDEX.</p>
</li>
</ol>
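<p style="text-align: justify;">The directory selection from the first point above can be sketched in portable C. This is only an illustration, not the kernel32 code: <em>build_object_directory</em> is a made-up name, and narrow strings stand in for the wide strings used by the real routine.</p>
<pre class="brush: c">#include &lt;assert.h&gt;
#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

/* Session 0 uses \Windows, any other session uses \Sessions\&lt;id&gt;\Windows.
 * Returns 0 on success, -1 if the output buffer is too small. */
int build_object_directory(unsigned long session_id, char *out, size_t cap)
{
    static const char base[] = "\\Windows";
    int n;

    assert(out != NULL);
    if (session_id == 0) {
        if (cap &lt;= strlen(base))
            return -1;
        strcpy(out, base);               /* the wcscpy() branch of the pseudo-code */
        return 0;
    }
    /* the swprintf() branch of the pseudo-code */
    n = snprintf(out, cap, "\\Sessions\\%lu%s", session_id, base);
    return (n &gt;= 0 &amp;&amp; (size_t)n &lt; cap) ? 0 : -1;
}</pre>
<p style="text-align: justify;">Calling the helper with session 0 yields \Windows, while session 2 yields \Sessions\2\Windows - the prefix under which the port name is later formed.</p>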
<p style="text-align: justify;">When we step into CsrClientConnectToServer and analyze further, we find ourselves surrounded by a great amount of CSRSS-related initialization code. Don't worry - a huge part of the routine deals with user-mode structures and other irrelevant stuff - our interest begins where the following call is made:</p>
<pre class="brush: c">if(!CsrPortHandle)
{
ReturnCode = CsrpConnectToServer(ObjectDirectory); // ObjectDirectory is kernel32-controlled
if(!NT_SUCCESS(ReturnCode))
return (ReturnCode);
}</pre>
<p style="text-align: justify;">As the above indicates, the global CsrPortHandle variable is compared with zero - if this turns out to be true, <span style="text-decoration: underline;">CsrpConnectToServer</span> is called, taking the object directory string as its only argument. So - let's face another routine ;&gt;</p>
<p style="text-align: justify;">The proc starts with the following code:</p>
<pre class="brush: c">CsrPortName.Length        = 0;
CsrPortName.MaximumLength = 2*wcslen(ObjectDirectory)+18; // +18 bytes: L&quot;\\&quot;, L&quot;ApiPort&quot; and the terminating null
CsrPortName.Buffer        = RtlAllocateHeap(CsrHeap,NtdllBaseTag,CsrPortName.MaximumLength);

RtlAppendUnicodeToString(&amp;CsrPortName,ObjectDirectory);
RtlAppendUnicodeToString(&amp;CsrPortName,L&quot;\\&quot;);
RtlAppendUnicodeToString(&amp;CsrPortName,L&quot;ApiPort&quot;);</pre>
<p style="text-align: justify;">Apparently, the final Port Object name is formed here, and stored inside a local "UNICODE_STRING CsrPortName" structure. Next, a special section is created, using the appropriate native call:</p>
<pre class="brush: c">LARGE_INTEGER SectionSize;
SectionSize.QuadPart = 0x10000; // 64 kB
NtStatus = NtCreateSection(&amp;SectionHandle, SECTION_ALL_ACCESS, NULL, &amp;SectionSize, PAGE_READWRITE, SEC_RESERVE, NULL);

if(!NT_SUCCESS(NtStatus))
  return NtStatus;</pre>
<p style="text-align: justify;">This section is essential to the process&lt;-&gt;subsystem communication, as this memory area is mapped in both the client and win32 server, and then used for exchanging large portions of data between these two parties. And so, when the section is successfully created, the routine eventually tries to connect to the named port!</p>
<pre class="brush: c">/* SID Initialization */
NtStatus = RtlAllocateAndInitializeSid(...,&amp;SystemSid);
if(!NT_SUCCESS(NtStatus))
  return NtStatus;

NtStatus = NtSecureConnectPort(&amp;CsrPortHandle,&amp;CsrPortName,...);
RtlFreeSid(SystemSid);
NtClose(SectionHandle); // NtClose takes the handle itself, not its address</pre>
<p style="text-align: justify;">For the sake of simplicity and reading convenience, I've stripped the remaining arguments from the listing; they describe some advanced connection characteristics, and are beyond the scope of this post. If everything has gone fine up to this point, we have an established connection (yay, CSRSS accepted our request) and an open handle to the port. Therefore, we can start sending the first packets, in order to let CSRSS (and its modules - ServerDlls) know about ourselves.</p>
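<p style="text-align: justify;">Looking back at the beginning of CsrpConnectToServer, the buffer-size expression 2*wcslen(ObjectDirectory)+18 may look arbitrary. The small sketch below (with made-up helper names, and assuming 2-byte UTF-16 code units as on Windows) shows where the constant plausibly comes from: one backslash, the seven characters of "ApiPort" and a terminating null make nine code units, i.e. 18 bytes.</p>
<pre class="brush: c">#include &lt;assert.h&gt;
#include &lt;stddef.h&gt;
#include &lt;string.h&gt;

#define UTF16_UNIT 2u   /* bytes per UTF-16 code unit */

/* Bytes reserved by the expression 2*wcslen(dir)+18 from the listing. */
size_t port_name_max_bytes(size_t dir_len_units)
{
    return UTF16_UNIT * dir_len_units + 18;
}

/* Bytes needed for dir + "\" + "ApiPort" + terminating null. */
size_t port_name_needed_bytes(size_t dir_len_units)
{
    return UTF16_UNIT * (dir_len_units + strlen("\\") + strlen("ApiPort") + 1);
}

/* The reservation is exact: both expressions agree for any directory length. */
void check_port_name_sizing(size_t dir_len_units)
{
    assert(port_name_max_bytes(dir_len_units) ==
           port_name_needed_bytes(dir_len_units));
}</pre>
<p style="text-align: justify;">For the default "\Windows" directory (8 characters), both helpers give 34 bytes, which is exactly the size of the null-terminated "\Windows\ApiPort" string in UTF-16.</p>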
<p style="text-align: justify;">So - after returning back to ntdll!CsrClientConnectToServer:</p>
<pre class="brush: c">NtStatus = CsrpConnectToServer(ObjectDirectory);
if(!NT_SUCCESS(NtStatus))
  return NtStatus;</pre>
<p style="text-align: justify;">the following steps are taken:</p>
<pre class="brush: c"> if(ConnectionInformation)
{
CaptureBuffer = CsrAllocateCaptureBuffer(1,InformationLength);
CsrAllocateMessagePointer(CaptureBuffer,InformationLength,&amp;conn.ConnectionInformation);
RtlMoveMemory(conn.ConnectionInformation,ConnectionInformation,InformationLength);
}
CsrClientCallServer(&amp;Message, CaptureBuffer, CSR_API(CsrpClientConnect), sizeof(ConnStructure));</pre>
<p style="text-align: justify;">First of all, the ConnectionInformation pointer is checked - in case it's non-zero, the <span style="text-decoration: underline;">CsrAllocateCaptureBuffer</span>, <span style="text-decoration: underline;">CsrAllocateMessagePointer</span> and <span style="text-decoration: underline;">RtlMoveMemory</span> functions are called, in that order. The purpose of these operations is to move the data into a shared heap in such a way that both our application and CSRSS can easily read its contents. After the "if" statement, the first real message is sent to the subsystem using CsrClientCallServer, which has the following prototype:</p>
<pre class="brush: php">NTSTATUS CsrClientCallServer(PCSR_API_MSG m, PCSR_CAPTURE_HEADER CaptureHeader, CSR_API_NUMBER ApiNumber, ULONG ArgLength);</pre>
<p style="text-align: justify;">For a complete, cross-version compatible table and/or list of Csr APIs, check the following references: <a href="http://j00ru.vexillium.org/csrss_list/api_list.html">CsrApi List</a> and <a href="http://j00ru.vexillium.org/csrss_list/api_table.html">CsrApi Table</a>. And so, in the above snippet, the "CsrpClientConnect" API is used, providing additional information about the connecting process. This message is handled by the internal csrsrv!CsrSrvClientConnect routine, which redirects the message to the appropriate callback function, specified by the ServerDll being connected to (in this case - basesrv!BaseClientConnectRoutine).</p>
<p style="text-align: justify;">After sending the above message, the connection between the client- and server-side DLLs (i.e. kernel32 and basesrv) can be considered fully functional.</p>
<p style="text-align: justify;">As it turns out, parts of the execution path presented above also apply to CSRSS itself! Because ntdll!CsrClientConnectToServer can be reached from inside the subsystem process, the routine must handle such a case properly. And so - before any actions are actually taken by the function, the current process image is checked first:</p>
<pre class="brush: c">NtHeaders = RtlImageNtHeader(NtCurrentPeb()-&gt;ImageBaseAddress);
CsrServerProcess = (NtHeaders-&gt;OptionalHeader.Subsystem == IMAGE_SUBSYSTEM_NATIVE);

if(CsrServerProcess)
{
  // Running inside CSRSS: do nothing, except for the _CsrServerApiRoutine pointer initialization
  _CsrServerApiRoutine = GetProcAddress(GetModuleHandle(&quot;csrsrv&quot;),&quot;CsrCallServerFromServer&quot;);
}
else
{
  // Regular client process: take normal steps
}</pre>
<p style="text-align: justify;">Apparently, every process that reaches this check with the SUBSYSTEM_NATIVE header value set is assumed to be an instance of CSRSS. This, in turn, implies that CSRSS is the only native, system-critical process which makes use of the Csr API calls.</p>
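<p style="text-align: justify;">To make the check more tangible, here is a rough, portable re-implementation of the header lookup, working on a PE image stored in a plain byte buffer. It is a sketch rather than what ntdll does internally: <em>pe_subsystem</em> and <em>is_native_image</em> are made-up names, and the offsets used (e_lfanew at 0x3C, Subsystem at offset 68 of the optional header) come from the public PE format description.</p>
<pre class="brush: c">#include &lt;assert.h&gt;
#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;

#define PE_SUBSYSTEM_NATIVE 1   /* value of IMAGE_SUBSYSTEM_NATIVE */

/* Little-endian readers, so the sketch works on any host. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] &lt;&lt; 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] &lt;&lt; 8) |
           ((uint32_t)p[2] &lt;&lt; 16) | ((uint32_t)p[3] &lt;&lt; 24);
}

/* Returns OptionalHeader.Subsystem of the image, or -1 if the buffer
 * does not look like a PE image. */
int pe_subsystem(const uint8_t *image, size_t size)
{
    uint32_t nt;    /* offset of the "PE\0\0" signature */

    assert(image != NULL);
    if (size &lt; 0x40 || image[0] != 'M' || image[1] != 'Z')
        return -1;
    nt = rd32(image + 0x3C);                /* IMAGE_DOS_HEADER.e_lfanew */
    /* signature (4) + file header (20) + Subsystem at +68, 2 bytes wide */
    if (nt &gt; size || size - nt &lt; 4 + 20 + 68 + 2)
        return -1;
    if (rd32(image + nt) != 0x00004550u)    /* "PE\0\0" */
        return -1;
    return rd16(image + nt + 4 + 20 + 68);
}

/* The decision made by the listing: native image means "we are CSRSS". */
int is_native_image(const uint8_t *image, size_t size)
{
    return pe_subsystem(image, size) == PE_SUBSYSTEM_NATIVE;
}</pre>
<p style="text-align: justify;">Regular GUI and console programs carry the values 2 and 3 in this field, so only native images take the "inside CSRSS" path.</p>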
<h3 style="text-align: justify;">Data transmission</h3>
<p style="text-align: justify;">Having the connection up and running, the natural next step is to exchange actual data. In order to achieve this, one native call is exported by ntdll - the CsrClientCallServer function, already mentioned in the text. Because each Csr API requires a different amount of input/output data from the requester (while some don't need any at all), and because of the LPC packet-length limitations, the messages can be sent in a few different ways.</p>
<p style="text-align: justify;">In general, all of the CSR-supported packets can be divided into three, main groups: <em>empty</em>, <em>short</em>, and <em>long</em> packets. Based on the group a given packet belongs to, it is sent using an adequate mechanism. This section provides a general overview of the data transmission-related techniques, as well as exemplary (practical) use of each type.</p>
<h4 style="text-align: justify;">Empty packets</h4>
<ul style="text-align: justify;">
<li>
<p style="text-align: justify;">Description</p>
<p style="text-align: justify;">"Empty packets" is a relatively small group of purely-informational messages, which are intended to make CSRSS perform a specific action. These packets don't supply any input data - their API ID is the only information needed by the win32 subsystem. Truly empty packets don't generate any output data, either.</p>
</li>
</ul>
<ul style="text-align: justify;">
<li>
<p style="text-align: justify;">Sending</p>
<p style="text-align: justify;">Due to the fact that "empty packets" don't supply any additional information, the only data being transferred is the internal _PORT_HEADER structure. The address of a correctly initialized PortHeader should be then passed as the first CsrClientCallServer parameter. The shared section doesn't take part while sending and handling these packets. What is more, no serious input validation is required by the API handler, because there is no input in the first place. The routine is most often supposed to perform one, certain action and then return. Unsupported APIs, statically returning the STATUS_UNSUCCESSFUL or STATUS_NOT_SUPPORTED error codes, can also be considered "empty packets", as they always behave the same way, regardless of the input information.</p>
</li>
</ul>
<ul style="text-align: justify;">
<li>
<p style="text-align: justify;">Examples</p>
<p style="text-align: justify;">One great example of an empty packet is winsrv!SrvCancelShutdown. As the name implies, the API's purpose is pretty straightforward - cancelling the shutdown. Seemingly, no input / output arguments are necessary:</p>
<pre class="brush: php">; __stdcall SrvCancelShutdown(x, x)
_SrvCancelShutdown@8 proc near
  call    _CancelExitWindows@0 ; CancelExitWindows()
  neg     eax                  ; CF = (eax != 0)
  sbb     eax, eax             ; eax = CF ? 0FFFFFFFFh : 0
  and     eax, 3FFFFFFFh
  add     eax, 0C0000001h      ; non-zero result -> 0, zero -> 0C0000001h (STATUS_UNSUCCESSFUL)
  retn    8
_SrvCancelShutdown@8 endp</pre>
<p>As shown above, the handler issues a call to the CancelExitWindows() function, and doesn't make use of any of the two parameters. Another CsrApi function of this kind is basesrv!BaseSrvNlsUpdateCacheCount, always performing the same task:</p>
<pre class="brush: php">; __stdcall BaseSrvNlsUpdateCacheCount(x, x)
_BaseSrvNlsUpdateCacheCount@8 proc near
  cmp     _pNlsRegUserInfo, 0           ; nothing to update if the pointer is NULL
  jz      short loc_75B28AFC
  push    esi
  mov     esi, offset _NlsCacheCriticalSection
  push    esi
  call    ds:__imp__RtlEnterCriticalSection@4 ; RtlEnterCriticalSection(x)
  mov     eax, _pNlsRegUserInfo
  inc     dword ptr [eax+186Ch]         ; bump the counter under the lock
  push    esi
  call    ds:__imp__RtlLeaveCriticalSection@4 ; RtlLeaveCriticalSection(x)
  pop     esi
loc_75B28AFC:
  xor     eax, eax                      ; always return 0
  retn    8
_BaseSrvNlsUpdateCacheCount@8 endp</pre>
<p>A few more examples can be found - looking for these is left as an exercise for the reader.</li>
</ul>
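<p style="text-align: justify;">The neg / sbb / and / add sequence at the end of SrvCancelShutdown is a branchless idiom that may not be obvious at first sight. The following C sketch (function names are made up for illustration) models it instruction by instruction with 32-bit wrap-around arithmetic, next to the one-liner it presumably originated from:</p>
<pre class="brush: c">#include &lt;assert.h&gt;
#include &lt;stdint.h&gt;

#define STATUS_SUCCESS      0x00000000u
#define STATUS_UNSUCCESSFUL 0xC0000001u

/* Step-by-step model of: neg eax / sbb eax,eax / and eax,3FFFFFFFh /
 * add eax,0C0000001h. */
uint32_t bool_to_ntstatus_asm(uint32_t eax)
{
    uint32_t carry = (eax != 0);   /* neg sets CF when the operand is non-zero */

    eax = 0u - carry;              /* sbb eax,eax: 0xFFFFFFFF or 0 */
    eax &amp;= 0x3FFFFFFFu;            /* and eax,3FFFFFFFh */
    eax += 0xC0000001u;            /* 0x3FFFFFFF + 0xC0000001 wraps around to 0 */
    return eax;
}

/* The high-level equivalent. */
uint32_t bool_to_ntstatus(int ok)
{
    return ok ? STATUS_SUCCESS : STATUS_UNSUCCESSFUL;
}</pre>
<p style="text-align: justify;">In other words, a non-zero result of CancelExitWindows() is turned into STATUS_SUCCESS, and zero into STATUS_UNSUCCESSFUL - no input or output data involved.</p>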
<h4 style="text-align: justify;">Short packets</h4>
<ul style="text-align: justify;">
<li>
<p style="text-align: justify;">Description</p>
<p style="text-align: justify;">The "short packets" group describes a great part of the Csr messages. Every request that passes actual data to / from CSRSS, but still fits within the LPC-packet length restriction, belongs to this family. And so - most fixed-size structures (i.e. those that don't contain volatile text strings or other, possibly long chunks of data) are indeed smaller than the 304-byte limit.</p>
</li>
</ul>
<ul style="text-align: justify;">
<li>
<p style="text-align: justify;">Sending</p>
<p style="text-align: justify;">As this particular type requires additional data to be appended at the end of the _PORT_MESSAGE structure, a set of API-specific structs has been created. All of these types begin with the standard LPC PortMessage header, and then specify the actual variables to send, e.g.:</p>
<pre class="brush: c">struct CSR_MY_STRUCTURE
{
  struct _PORT_HEADER PortHeader;
  BOOL  Boolean;
  ULONG Data[0x10];
  DWORD Flags;
};</pre>
<p>Such an amount of data can still be sent in a single LPC packet. And so, a custom structure beginning with the _PORT_HEADER field must be used as the first CsrClientCallServer argument. The Capture Buffer technique remains unused, thus the second parameter should be set to NULL.</li>
</ul>
<ul style="text-align: justify;">
<li>
<p style="text-align: justify;">Examples</p>
<p style="text-align: justify;">As for the examples, it is really easy to list a couple:</p>
<ol>
<li>winsrv!SrvGetConsoleAliasExesLength</li>
<li>winsrv!SrvSetConsoleCursorMode</li>
<li>winsrv!SrvGetConsoleCharType</li>
<li>basesrv!BaseSrvExitProcess</li>
<li>basesrv!BaseSrvBatNotification</li>
</ol>
<p>The above handlers take a constant number of bytes as the input, and  optionally return some data (of static length, as well).</li>
</ul>
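<p style="text-align: justify;">The distinction between the packet groups boils down to a simple size check, which can be sketched as follows. Note the assumptions: the names are invented for this example, the 304-byte limit is taken to apply to the whole structure (header included), and the header size is left as a parameter instead of being hard-coded.</p>
<pre class="brush: c">#include &lt;assert.h&gt;
#include &lt;stddef.h&gt;

#define LPC_LIMIT 304u   /* per-message limit quoted in the text */

typedef enum { PACKET_EMPTY, PACKET_SHORT, PACKET_LONG } packet_kind;

/* header_bytes:  size of the port header at the start of the structure,
 * payload_bytes: size of the API-specific data that follows it. */
packet_kind classify_packet(size_t header_bytes, size_t payload_bytes)
{
    assert(header_bytes &lt;= LPC_LIMIT);
    if (payload_bytes == 0)
        return PACKET_EMPTY;                 /* the header alone is sent */
    if (payload_bytes &lt;= LPC_LIMIT - header_bytes)
        return PACKET_SHORT;                 /* fits in one LPC message */
    return PACKET_LONG;                      /* needs the shared section */
}</pre>
<p style="text-align: justify;">For instance, with a hypothetical 24-byte header, the 72 bytes of data carried by CSR_MY_STRUCTURE (BOOL + 16 ULONGs + DWORD) make it a short packet, whereas a 1 kB string would have to travel as a long one.</p>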
<h4 style="text-align: justify;">Long packets</h4>
<ul style="text-align: justify;">
<li>
<p style="text-align: justify;">Description</p>
<p style="text-align: justify;">From the researcher's point of view, the "long packets" group is doubtlessly the most interesting one. Because they are used to send/receive large amounts of data (beyond the maximum size of an LPC message), a special mechanism called a Shared Section is used for transferring these messages. Let's take a look at the details.</p>
</li>
</ul>
<ul style="text-align: justify;">
<li>
<p style="text-align: justify;">Initialization</p>
<p style="text-align: justify;">Do you remember the ntdll!CsrpConnectToServer function? At some point, between forming the port name and establishing the connection, we could see a weird NtCreateSection(0x10000) call. As it turns out, this section is a special memory area, mapped in both the client and server processes. After creating the section, its handle is passed to CSRSS through the NtSecureConnectPort native call. Once the win32 subsystem receives a connection request and accepts it, the section is mapped into the server's virtual address space. Next, CSRSS provides its client with some basic memory mapping information - such as the server-side base address and view size. Based on the supplied info, a few global variables are initialized (CsrProcessId, CsrObjectDirectory), with CsrPortMemoryRemoteDelta being the most important one for us:</p>
<pre class="brush: c">CsrPortMemoryRemoteDelta = (CSRSS.BaseAddress - LOCAL.BaseAddress);</pre>
<p style="text-align: justify;">Basically, the above variable is filled with the distance between the server-side and client-side mappings of the shared memory. This value will soon prove crucial for exchanging data. Furthermore, a commonly known structure called a "heap" is created on top of the allocation:</p>
<pre class="brush: c">CsrPortHeap = RtlCreateHeap(0x8000u, LOCAL.BaseAddress, LOCAL.ViewSize, PageSize, 0, 0);</pre>
<p style="text-align: justify;">From this point on, the shared heap is going to be used throughout the whole communication session, for passing data of various size and content. The functions taking advantage of the heap are:</p>
<ol>
<li>CsrAllocateCaptureBuffer</li>
<li>CsrFreeCaptureBuffer</li>
<li>CsrAllocateMessagePointer (indirect)</li>
<li>CsrCaptureMessageBuffer   (indirect)</li>
<li>CsrCaptureMessageString   (indirect)</li>
</ol>
<p>All of the above routines are apparently related to the "Capture Buffer" mechanism, described in the following section.</p>
<ul>
<li>
<p style="text-align: justify;">Capture Buffers</p>
<p style="text-align: justify;">In order to fully understand the idea behind Capture Buffers, one should see a Capture Buffer as a special box, a container designed to hold data in such a way that it can be easily accessed by both sides of the communication (i.e. be offset-based rather than VA-based etc). Such a structure is described by the following characteristics:</p>
<ol>
<li>
<p style="text-align: justify;">Number of memory blocks: one Capture Buffer is able to hold multiple data blocks - e.g. a couple of strings, describing a specific object (like a console window).</p>
</li>
<li>
<p style="text-align: justify;">Total size: the total size of the container, including its header, pointer table, and the data blocks themselves.</p>
</li>
</ol>
<p style="text-align: justify;">So - these "data boxes" are used to transfer data between the two parties. In order to illustrate this complex mechanism, suppose we've got the following structure:</p>
<pre class="brush: c">struct CSR_MESSAGE
{
  _PORT_HEADER PortHeader;
  LPVOID FirstPointer;
  LPVOID SecondPointer;
  LPVOID ThirdPointer;
  LPVOID FourthPointer;
  LPVOID FifthPointer;
} m;</pre>
<p style="text-align: justify;">The above packet is going to be sent to CSRSS after the initialization takes place. Having the above declared, we can take a closer look at each of the CaptureBuffer-related functions:</p>
<ol>
<li>
<pre>CsrAllocateCaptureBuffer(ULONG PointerCount, ULONG Size)</pre>
<p style="text-align: justify;">Allocates the required number of bytes from CsrPortHeap (the shared heap):</p>
<pre>(Size + sizeof(CAPTURE_HEADER) + PointerCount*sizeof(LPVOID))</pre>
<p style="text-align: justify;">... and returns the resulting pointer to the user. Right after the allocation, the CaptureBuffer structure contents look like this:</p>
<pre>CaptureBuffer = CsrAllocateCaptureBuffer(5,20);</pre>
<p><img class="alignnone" title="Original CaptureBuffer" src="http://j00ru.vexillium.org/blog/27_07_10/CaptureBuffer.png" alt="" width="572" height="87" /></p>
<p style="text-align: justify;">Due to the fact that no messages have been allocated from the CaptureBuffer yet, <em>Capture.Memory</em> is a single memory block, while the <em>Capture.Pointers[]</em> array remains empty.</p>
</li>
<li>
<pre>CsrFreeCaptureBuffer(LPVOID CaptureBuffer)</pre>
<p style="text-align: justify;">Frees a given CaptureBuffer memory area, by issuing a simple call:</p>
<pre class="brush: php">RtlFreeHeap(CsrPortHeap,0,CaptureBuffer);</pre>
</li>
<li>
<pre>CsrAllocateMessagePointer(LPVOID CaptureBuffer, ULONG Length, PVOID* Pointer)</pre>
<p style="text-align: justify;">The routine allocates "Length" bytes from the CaptureBuffer's general memory block. The address of the newly allocated block is stored inside *Pointer, while Pointer is put into one of the Capture.Pointers[] items.</p>
<p>Example:</p>
<pre class="brush: php">CsrAllocateMessagePointer(CaptureBuffer,3,&amp;m.FirstPointer);</pre>
<p><img class="alignnone" src="http://j00ru.vexillium.org/blog/27_07_10/CaptureBuffer2.png" alt="" width="570" height="141" /></li>
<li>
<p style="text-align: justify;">Having three (out of twenty) bytes allocated, one can copy some data:</p>
<pre class="brush: php">RtlCopyMemory(m.FirstPointer,&quot;\xcc\xcc\xcc&quot;,3);</pre>
<p>After all of the five allocations are made, the CaptureBuffer structure layout can look like this:</p>
<p><img class="alignnone" src="http://j00ru.vexillium.org/blog/27_07_10/CaptureBuffer3.png" alt="" width="570" height="141" /></p>
<p>It is important to keep in mind that the pointers into CaptureBuffer.Memory[] must reside in the actual LPC message being sent to the server - the reason for this requirement will be disclosed soon <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </li>
<li>
<pre>CsrCaptureMessageBuffer(LPVOID CaptureBuffer, PVOID Buffer, ULONG Length, PVOID *OutputBuffer)</pre>
<p style="text-align: justify;">The routine is intended to simplify things for the developer, by performing the CaptureBuffer-allocation and copying the user specified data at the same time.</p>
<p>Pseudocode:</p>
<pre class="brush: php"> CsrAllocateMessagePointer(CaptureBuffer,Length,OutputBuffer);
 RtlCopyMemory(*OutputBuffer,Buffer,Length);</pre>
</li>
<li>
<pre>CsrCaptureMessageString(LPVOID CaptureBuffer, PCSTR String, ULONG Length, ULONG MaximumLength, PSTRING OutputString)</pre>
<p style="text-align: justify;">Similar to the previous routine - allocates the requested memory space, and optionally copies a specific string into the new allocation.</p>
</li>
</ol>
<p style="text-align: justify;">After the Capture Buffer is allocated and initialized (all N memory blocks are in use), it's time to send the message, already! This time, we fill in the second parameter of the CsrClientCallServer routine with our CaptureBuffer pointer. When the following call is issued:</p>
<pre class="brush: php">CsrClientCallServer(&amp;m,CaptureBuffer,API_NUMBER,sizeof(m)-sizeof(_PORT_HEADER));</pre>
<p style="text-align: justify;">... and the 2nd argument is non-zero, a couple of interesting conversions take place inside the above routine. This is where the CsrPortMemoryRemoteDelta value comes into play. First of all, the data pointers residing in the CSR_MESSAGE structure (&amp;m) are translated to server-compatible virtual addresses, by adding the RemoteDelta. From now on, m.FirstPointer, m.SecondPointer, ..., m.FifthPointer are invalid in the context of the local process, but are correct in terms of the server-side memory mapping.</p>
<pre class="brush: php">for( UINT i=0;i&lt;CaptureBuffer-&gt;PointerCount;i++ )
  *CaptureBuffer-&gt;Pointers[i] += CsrPortMemoryRemoteDelta;</pre>
<p style="text-align: justify;">Furthermore, the CaptureBuffer-&gt;Pointers[] array is altered, using the following pseudo-code:</p>
<pre class="brush: php">for( UINT i=0;i&lt;CaptureBuffer-&gt;PointerCount;i++ )
   CaptureBuffer-&gt;Pointers[i] -= &amp;m;</pre>
</li>
</ul>
<p style="text-align: justify;">So, to sum everything up - after the address/offset translation is  performed, we've got the following connection between the LPC message  and shared buffer:</p>
<ul>
<li>m.CaptureBuffer points to the server's virtual address of the CaptureBuffer base,</li>
<li>CaptureBuffer-&gt;Pointers[] contain the relative offsets of the data pointers, i.e. (&amp;m+CaptureBuffer-&gt;Pointers[0]) is the pointer to the first capture buffer,</li>
<li>(&amp;m+CaptureBuffer-&gt;Pointers[n]) points to the server's virtual address of the n-th capture buffer.</li>
</ul>
<p style="text-align: justify;">Or, the same connection chain illustrated graphically looks like this:</p>
<p><img class="alignnone" src="http://j00ru.vexillium.org/blog/27_07_10/CaptureBuffer4.png" alt="" width="654" height="132" /></p>
<p>When both the local CSR_MESSAGE and shared CaptureBuffer structures are properly modified, <span style="text-decoration: underline;">ntdll!CsrClientCallServer</span> calls the standard <a href="http://undocumented.ntinternals.net/UserMode/Undocumented%20Functions/NT%20Objects/Port/NtRequestWaitReplyPort.html">NtRequestWaitReplyPort</a> LPC function, and waits for an optional output. When the native call returns, all of the modified struct fields are restored to their original values, so that the user (or, more likely - win32 APIs) can easily read the error code and the subsystem's optional output.</p>
<p>Because the VA- and offset-related conversions are hard to explain in words, I strongly advise you to verify the information presented in this post by yourself. This should give you even better insight into how reliable cross-process data exchange is actually achieved.</li>
</ul>
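<p style="text-align: justify;">Since the pointer/offset juggling is easier to follow in code than in words, below is a small, self-contained model of the mechanism. Keep in mind that it is a simplified sketch and not the ntdll implementation: the <em>Capture</em> layout, the <em>Message</em> type and all function names are hypothetical, an ordinary calloc() block stands in for the shared heap, and details such as block alignment are ignored.</p>
<pre class="brush: c">#include &lt;assert.h&gt;
#include &lt;stddef.h&gt;
#include &lt;stdint.h&gt;
#include &lt;stdlib.h&gt;

/* Hypothetical message: pointer members kept as integers so they can be rebased. */
typedef struct { uintptr_t first, second; } Message;

/* Hypothetical layout: header, then the pointer table, then the data area. */
typedef struct {
    uint32_t  capacity;   /* slots in table[] */
    uint32_t  count;      /* slots in use */
    uint32_t  size;       /* bytes in the data area */
    uint32_t  used;       /* bytes handed out so far */
    uintptr_t table[];    /* one entry per captured pointer */
} Capture;

static unsigned char *capture_data(Capture *c)
{
    return (unsigned char *)&amp;c-&gt;table[c-&gt;capacity];
}

/* Size + header + PointerCount * pointer size, as in the text. */
Capture *capture_alloc(uint32_t pointer_count, uint32_t size)
{
    size_t total = sizeof(Capture) + (size_t)pointer_count * sizeof(uintptr_t) + size;
    Capture *c = calloc(1, total);

    if (c != NULL) {
        c-&gt;capacity = pointer_count;
        c-&gt;size = size;
    }
    return c;
}

/* Carve `length` bytes out of the data area, store the block address in
 * *field (a member of the message) and remember where that member lives. */
int capture_message_pointer(Capture *c, uint32_t length, uintptr_t *field)
{
    assert(c != NULL &amp;&amp; field != NULL);
    if (c-&gt;count == c-&gt;capacity || length &gt; c-&gt;size - c-&gt;used)
        return 0;
    *field = (uintptr_t)(capture_data(c) + c-&gt;used);
    c-&gt;table[c-&gt;count++] = (uintptr_t)field;
    c-&gt;used += length;
    return 1;
}

/* Before sending: rebase every message field to the server's mapping, then
 * turn every table entry into an offset from the start of the message. */
void capture_prepare(Capture *c, const void *msg, intptr_t remote_delta)
{
    for (uint32_t i = 0; i &lt; c-&gt;count; i++) {
        uintptr_t *field = (uintptr_t *)c-&gt;table[i];
        *field += (uintptr_t)remote_delta;
        c-&gt;table[i] -= (uintptr_t)msg;
    }
}

/* After the reply arrives: undo both conversions. */
void capture_restore(Capture *c, const void *msg, intptr_t remote_delta)
{
    for (uint32_t i = 0; i &lt; c-&gt;count; i++) {
        c-&gt;table[i] += (uintptr_t)msg;
        uintptr_t *field = (uintptr_t *)c-&gt;table[i];
        *field -= (uintptr_t)remote_delta;
    }
}</pre>
<p style="text-align: justify;">After capture_prepare(), the table holds plain offsets into the message (0 for the first member, and so on), while the message members hold addresses that only make sense on the server side; capture_restore() brings everything back to its local form once the call returns.</p>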
<ul style="text-align: justify;">
<li>
<p style="text-align: justify;">Sending</p>
<p style="text-align: justify;">As already described - if one wants to make use of large data transfers, one must allocate a CaptureBuffer, specifying the number of memory blocks and the total byte count, fill it with the desired data (using CsrCaptureMessageBuffer or CsrCaptureMessageString), and call CsrClientCallServer, supplying an LPC structure (containing the data pointers into the CaptureBuffer) as the first parameter, and the CaptureBuffer itself as the second one. The rest of the job is up to ntdll. Please keep in mind that one CaptureBuffer can technically be used only once - and therefore, it should be freed after its first (and last) usage, using CsrFreeCaptureBuffer.</p>
</li>
</ul>
<ul style="text-align: justify;">
<li>
<p style="text-align: justify;">Examples</p>
<p style="text-align: justify;">In this particular case, every CsrApi handler using the CsrValidateMessageBuffer import makes a good example, such as:</p>
<ul>
<li>winsrv!SrvAllocConsole</li>
<li>winsrv!SrvSetConsoleTitle</li>
<li>winsrv!SrvAddConsoleAlias</li>
</ul>
<p>... and numerous other functions, which are pretty easy to find by oneself.</li>
</ul>
<h3 style="text-align: justify;">Conclusion</h3>
<p style="text-align: justify;">This post aimed to briefly present the "Native Csr Interface" - in terms of the functions, structures and mechanisms playing a role in the Inter-Process Communication. As you must have noted, only the client-side perspective has been described here, as the precise way CSRSS receives, handles and responds to requests is a subject for another long article (or two). And so - if you feel that some important Csr~ routines should have been described or mentioned here - let me know. On the other hand, I am going to cover the remaining, smaller functions (such as CsrGetProcessId) in a separate post called <span style="text-decoration: underline;">CSRSS Tips &amp; Tricks</span>.</p>
<p style="text-align: justify;">Watch out for (part 3/3) and don't hesitate to leave constructive comments! <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /><br />
Cheers!</p>
]]></content:encoded>
			<wfw:commentRss>http://j00ru.vexillium.org/?feed=rss2&amp;p=527</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Blog customization, old PHP advisories</title>
		<link>http://j00ru.vexillium.org/?p=563</link>
		<comments>http://j00ru.vexillium.org/?p=563#comments</comments>
		<pubDate>Tue, 20 Jul 2010 00:09:11 +0000</pubDate>
		<dc:creator>j00ru</dc:creator>
				<category><![CDATA[C]]></category>
		<category><![CDATA[Other]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[blog]]></category>
		<category><![CDATA[hacking]]></category>

		<guid isPermaLink="false">http://j00ru.vexillium.org/?p=563</guid>
		<description><![CDATA[Hey there! Today, I would like to post a less-technical text, discussing two issues I have recently come across, or been busy with; don't worry though, as CSRSS Write-Up: IPC (part 2/3) is on the way. The first matter is about recent changes applied to the blog appearance and functionality, while the latter regards the [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">Hey there!<br />
Today, I would like to post a less-technical text, discussing two issues I have recently come across, or been busy with; don't worry though, as <span style="text-decoration: underline;">CSRSS Write-Up: IPC (part 2/3)</span> is on the way. The first matter is about recent changes applied to the blog appearance and functionality, while the latter regards the results of a source-code audit performed by me and my <a href="http://hispasec.com/">Hispasec</a> colleagues (<a href="http://gynvael.coldwind.pl/">Gynvael Coldwind</a> and <a href="http://icewall.pl">Icewall</a>) something like a year ago (last summer <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> ).</p>
<p><span id="more-563"></span></p>
<p style="text-align: justify;"><img class="alignleft" style="margin-right: 10px;" title="New background" src="http://j00ru.vexillium.org/blog/20_07_10/bg_small.jpg" alt="" width="200" height="150" />As you must have already noticed (unless using an RSS reader), the monolithic, light blue (otherwise known as <span style="color: #00ffff;">cyan</span>) background color has been replaced with a stylish, purple image sized 1680 x 1260. Because I've been encountering some major compatibility problems (scroll-lagging, incompatibility with certain browsers), the picture does not cover the entire area when a higher resolution is used. Should you experience any problems (e.g. wrong rendering) with the current appearance, please let me know ASAP (specifying the browser name/version). Comments with opinions regarding the current style are also welcome! Furthermore, I decided to make use of the <span style="text-decoration: underline;">"display summary"</span> setting, so that the latest post entries are printed in a shortened form, instead of flooding the screen with their entire contents. The behavior applies to other functionalities, such as the search engine, as well. Hopefully, this will improve the general blog readability <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p style="text-align: justify;">As for the second part of the post - a few days ago, I have recalled of a small, experimental research performed by the HSPL team roughly a year ago. Having the <a href="http://www.amazon.com/Art-Software-Testing-Glenford-Myers/dp/0471043281#"><em>The</em><em> </em><em>Art of Software Testing</em></a> (by Glenford J. Myers) book read, we decided to try one of the presented code auditing approaches in practice, on real software. As it was closely related to the vulnerability-hunting subject, a popular, yet open-source project was about to be focused on. After a few minutes of live discussion, it became clear that the target would be none other than PHP! (or more precisely, one of its default extensions, such as <a href="http://php.net/manual/en/book.exif.php">php_exif</a>).</p>
<p style="text-align: justify;">The general idea behind the auditing technique is fairly simple - once the adequate number of copies of the target source code is printed on a piece of paper (i.e. one for each person taking part), one of the participants starts reading the source code <strong>aloud</strong>, <strong>line by line</strong>, starting from the attacker's entry point (like an exported routine, available to the .php script). In the meantime, the rest of the team exchanges their thoughts and tries to think of every possible inconsistency in the current code row - when one is found, the general analysis is suspended for the time of testing a potential vulnerability. If a function call is encountered, the <em>execution</em> position is written down on some paper, and the analysis is continued from the beginning of the new routine. What's more, the technique could be employed with a monitor together with a text-editor fired up (making some of the activities much easier), as well; the choice is up to your crew.</p>
<p style="text-align: justify;">The actual efficiency is obviously different for various projects - keep in mind that every open-source application follows its own set of coding / security rules and possibly imposes certain types of security flaws. Besides, I highly recommend every bug-hunter to make an effort and test this approach on himself - it is a very informative experience, in my opinion. When it comes to our results,  we ended up with three different ways of crashing the latest (at the time of the research) version of PHP on every software platform, plus another three minor <em>memory disclosure</em> flaws. All of these were related to image format parsing (TIFF and JPEG) functionalities. After sending the explanatory advisories to the PHP devs, the bugs were eventually fixed - however, no real credit has been given to our work. Shit happens.</p>
<p style="text-align: justify;">If you are highly interested in the PHP engine/extension code quality, feel free to take a look at the provided advisory documents.</p>
<p style="text-align: justify;">A complete package, containing a bunch of short, technical papers (there are six in total) can be found <a href="http://j00ru.vexillium.org/blog/20_07_10/php_adv.zip"><strong>here</strong></a><strong> (ZIP, 11kB)</strong>.<br />
Sorry for not providing the PoCs - these are way too dirty to see the public light <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p style="text-align: justify;">So... that's it! Have fun reading the docs, be sure to leave a comment on the blog appearance and... see you in the next post!</p>
]]></content:encoded>
			<wfw:commentRss>http://j00ru.vexillium.org/?feed=rss2&amp;p=563</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Windows CSRSS Write Up: Inter-process Communication (part 1/3)</title>
		<link>http://j00ru.vexillium.org/?p=502</link>
		<comments>http://j00ru.vexillium.org/?p=502#comments</comments>
		<pubDate>Tue, 13 Jul 2010 16:19:56 +0000</pubDate>
		<dc:creator>j00ru</dc:creator>
				<category><![CDATA[CSRSS]]></category>
		<category><![CDATA[OS Internals]]></category>
		<category><![CDATA[Ring0]]></category>
		<category><![CDATA[Ring3]]></category>
		<category><![CDATA[Undocumented API]]></category>
		<category><![CDATA[Windows XP]]></category>
		<category><![CDATA[hacking]]></category>
		<category><![CDATA[kernel]]></category>

		<guid isPermaLink="false">http://j00ru.vexillium.org/?p=502</guid>
		<description><![CDATA[In the second post of the Windows CSRSS Write Up series, I would like to explain how the practical communication between the Windows Subsystem and user's process takes place under the hood. Due to the fact that some major improvements have been introduced in Windows Vista and later, the entire article is split into two [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">In the second post of the <span style="text-decoration: underline;">Windows CSRSS Write Up</span> series, I would like to explain how the <em>practical</em> communication between the Windows Subsystem and a user's process takes place under the hood. Because some major improvements have been introduced in Windows Vista and later, the entire article is split into two parts - the first one gives an insight into what the communication channel really is, as well as how it is used by both CSRSS and user processes. The second one, on the other hand, is going to talk through the modifications and new features shipped with the Windows systems starting from Vista, as most of the basic ideas have remained the same for decades. As you already know what to expect, proceed to the next section <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p><span id="more-502"></span></p>
<h3 style="text-align: justify;">Local Procedure Calls</h3>
<p style="text-align: justify;">Before starting to analyze the mysterious API implemented by CSRSS (otherwise known as CsrApi), one must first get some basic knowledge of the internal mechanism used to establish a stable inter-process connection and actually exchange information.</p>
<h4 style="text-align: justify;">The basics</h4>
<p style="text-align: justify;">LPC is a packet-based, inter-process communication mechanism implemented in the NT kernel (supported since the very first Windows NT versions). The mechanism was originally designed so that it was possible to communicate between modules running at different processor privilege levels - i.e. <span style="text-decoration: underline;">process - process</span>, <span style="text-decoration: underline;">process - driver</span> and <span style="text-decoration: underline;">driver - driver</span> connections are equally well supported. This is possible thanks to the fact that the required API functions are exposed to both user-mode (via ntdll.dll) and kernel-mode (via ntoskrnl.exe). Even though we are mostly concerned with the first scenario (where numerous ring-3 processes communicate with csrss.exe), practical examples of the remaining two also exist - take the <em>Kernel Mode Security Support Provider Interface </em>(KSecDD.sys) communicating with LSASS.exe, for instance. Apart from being used by certain system processes talking to each other (e.g. Lsass verifying user credentials on behalf of Winlogon), LPC is also a part of the RPC (<a href="http://msdn.microsoft.com/en-us/library/aa378651%28VS.85%29.aspx">Remote Procedure Call</a>) implementation.</p>
<p style="text-align: justify;">It should also be noted that the LPC mechanism is geared towards synchronous communication, and therefore enforces a blocking scheme, where the client must wait until its request is dispatched and handled, instead of continuing its execution. As mentioned in the Introduction section, Windows Vista has brought some major changes in this matter - one of these changes was the implementation of a brand new mechanism called ALPC (standing for <em>Advanced</em> or <em>Asynchronous</em> LPC - <a href="http://j00ru.vexillium.org/?p=349#comment-1503">which one?</a>), deprecating the old LPC mechanism. Since then, <span style="text-decoration: underline;">client - server</span> requests can be performed in an asynchronous manner, so that the client is not forced to wait ages for the response.</p>
<h4 style="text-align: justify;">Underlying port objects</h4>
<p style="text-align: justify;">As it turns out, a great part of the Windows system functionality internally relies on special, dedicated <em>objects </em>(implemented by the <a href="http://en.wikipedia.org/wiki/Object_Manager_%28Windows%29">Object Manager</a>) - be it File System operations, Windows Registry management, <a href="http://j00ru.vexillium.org/?p=118">thread suspension</a> or whatever you can think of - and the LPC mechanism isn't any different. In this particular case, we have to deal with a <em>port object</em>, otherwise known as LpcPortObjectType. The OBJECT_TYPE structure describing the object in question looks as follows when dumped in the kernel debugger:</p>
<pre style="padding-left: 30px; text-align: justify;">kd&gt; dt _OBJECT_TYPE 81feca90 /r
ntdll!_OBJECT_TYPE
   +0x000 Mutex            : _ERESOURCE
   +0x038 TypeList         : _LIST_ENTRY [ 0x81fecac8 - 0x81fecac8 ]
<strong>   +0x040 Name             : _UNICODE_STRING "Port"</strong>
     +0x000 Length           : 8
     +0x002 MaximumLength    : 0xa
<strong>     </strong>+0x004 Buffer           : 0xe1007110  "Port"
   +0x048 DefaultObject    : 0x80560960 Void
   +0x04c Index            : 0x15
   +0x050 TotalNumberOfObjects : 0xdb
   +0x054 TotalNumberOfHandles : 0xd9
   +0x058 HighWaterNumberOfObjects : 0xdb
   +0x05c HighWaterNumberOfHandles : 0xd9
   +0x060 TypeInfo         : _OBJECT_TYPE_INITIALIZER
      +0x000 Length           : 0x4c
      +0x002 UseDefaultObject : 0x1 ''
      +0x003 CaseInsensitive  : 0 ''
      +0x004 InvalidAttributes : 0x7b2
      +0x008 GenericMapping   : _GENERIC_MAPPING
<strong>      +0x018 ValidAccessMask  : 0x1f0001</strong>
      +0x01c SecurityRequired : 0 ''
      +0x01d MaintainHandleCount : 0 ''
      +0x01e MaintainTypeList : 0 ''
     <strong> +0x020 PoolType         : 1 ( PagedPool )</strong>
     <strong> +0x024 DefaultPagedPoolCharge : 0xc4
</strong>     <strong> +0x028 DefaultNonPagedPoolCharge : 0x18</strong>
      +0x02c DumpProcedure    : (null)
      +0x030 OpenProcedure    : (null)
      +0x034 CloseProcedure   : 0x805904f3        void  nt!ObReferenceObjectByName+0
      +0x038 DeleteProcedure  : 0x805902e1        void  nt!ObReferenceObjectByName+0
      +0x03c ParseProcedure   : (null)
      +0x040 SecurityProcedure : 0x8056b84f        long  nt!CcUnpinDataForThread+0
      +0x044 QueryNameProcedure : (null)
      +0x048 OkayToCloseProcedure : (null)
 +0x0ac Key              : 0x74726f50
 +0x0b0 ObjectLocks      : [4] _ERESOURCE
</pre>
<p style="text-align: justify;">This object can be considered a specific <em>gateway</em> between two modules - it is used by both sides of the communication channel, which never <em>see</em> each other directly. More precisely, we only consider <span style="text-decoration: underline;"><em>named</em></span> ports here, because the object must be easily accessible to every possible process.</p>
<p style="text-align: justify;">After the server correctly initializes a named port object - later utilized by the clients - it waits for an incoming connection. When a client eventually decides to connect, the server can verify whether further communication should or shouldn't be allowed  (usually based on the client's <a href="http://www.nirsoft.net/kernel_struct/vista/CLIENT_ID.html">CLIENT_ID</a> structure). If the request is accepted, the connection is considered <em>established</em> - the client is able to send input messages and optionally wait for a response (depending on the packet type).</p>
<p style="text-align: justify;">Every single packet exchanged between the client and server (including the initial connection requests) begins with a PORT_MESSAGE structure, defined as follows:</p>
<pre style="padding-left: 30px; text-align: justify;"> //
 // LPC Port Message
 //
 typedef struct _PORT_MESSAGE
 {
   union
   {
     struct
     {
       CSHORT DataLength;
       CSHORT TotalLength;
     } s1;
     ULONG Length;
   } u1;

   union
   {
     struct
     {
       CSHORT Type;
       CSHORT DataInfoOffset;
     } s2;
     ULONG ZeroInit;
   } u2;

   union
   {
     LPC_CLIENT_ID ClientId;
     double DoNotUseThisField;
   };

   ULONG MessageId;

   union
   {
     LPC_SIZE_T ClientViewSize;
     ULONG CallbackId;
   };
 } PORT_MESSAGE, *PPORT_MESSAGE;</pre>
<p style="text-align: justify;">
<p style="text-align: justify;">The above header consists of the most essential information concerning the message, such as:</p>
<ul style="text-align: justify;">
<li><strong>DataLength</strong><br />
Determines the size (in bytes) of the buffer following the header structure</li>
<li><strong>TotalLength</strong><br />
Determines the entire size of the packet; must be equal to sizeof(PORT_MESSAGE) + DataLength</li>
<li><strong>Type</strong><br />
Specifies the packet type, can be one of the following:</p>
<pre> //
 // LPC Message Types
 //
 typedef enum _LPC_TYPE
 {
   LPC_NEW_MESSAGE,
   LPC_REQUEST,
   LPC_REPLY,
   LPC_DATAGRAM,
   LPC_LOST_REPLY,
   LPC_PORT_CLOSED,
   LPC_CLIENT_DIED,
   LPC_EXCEPTION,
   LPC_DEBUG_EVENT,
   LPC_ERROR_EVENT,
   LPC_CONNECTION_REQUEST,
   LPC_CONNECTION_REFUSED,
   LPC_MAXIMUM
 }  LPC_TYPE;
</pre>
</li>
</ul>
<ul style="text-align: justify;">
<li><strong>ClientId</strong><br />
Identifies the packet sender by Process ID and Thread ID</li>
<li><strong>MessageId</strong><br />
A unique value, identifying a specific LPC message</li>
</ul>
<p style="text-align: justify;">Because LPCs can be used to send both small and large amounts of data, two distinct mechanisms of passing memory between the client and server were developed. In case 304 or fewer bytes are requested to be sent, a special LPC buffer is used and sent together with the header (described by <em>TotalLength </em>and<em> DataLength</em>), while larger messages are passed using shared memory sections, mapped in both parties taking part in the data exchange.</p>
<h4 style="text-align: justify;">LPC API</h4>
<p style="text-align: justify;">Because LPC is an internal, undocumented mechanism (mostly employed by the system executables), one cannot make use of it based on the Win32 API alone. However, a set of LPC-management native routines is exported by the ntdll module; using these functions, one is able to build a custom LPC-based protocol and use it to one's own advantage (e.g. as a fast and convenient IPC technique). A complete list of the Native Calls follows:</p>
<ol style="text-align: justify;">
<li><a href="http://undocumented.ntinternals.net/UserMode/Undocumented%20Functions/NT%20Objects/Port/NtCreatePort.html">NtCreatePort</a></li>
<li><a href="http://undocumented.ntinternals.net/UserMode/Undocumented%20Functions/NT%20Objects/Port/NtConnectPort.html">NtConnectPort</a></li>
<li><a href="http://undocumented.ntinternals.net/UserMode/Undocumented%20Functions/NT%20Objects/Port/NtListenPort.html">NtListenPort</a></li>
<li><a href="http://undocumented.ntinternals.net/UserMode/Undocumented%20Functions/NT%20Objects/Port/NtAcceptConnectPort.html">NtAcceptConnectPort</a></li>
<li><a href="http://undocumented.ntinternals.net/UserMode/Undocumented%20Functions/NT%20Objects/Port/NtCompleteConnectPort.html">NtCompleteConnectPort</a></li>
<li><a href="http://undocumented.ntinternals.net/UserMode/Undocumented%20Functions/NT%20Objects/Port/NtRequestPort.html">NtRequestPort</a></li>
<li><a href="http://undocumented.ntinternals.net/UserMode/Undocumented%20Functions/NT%20Objects/Port/NtRequestWaitReplyPort.html">NtRequestWaitReplyPort</a></li>
<li><a href="http://undocumented.ntinternals.net/UserMode/Undocumented%20Functions/NT%20Objects/Port/NtReplyPort.html">NtReplyPort</a></li>
<li><a href="http://undocumented.ntinternals.net/UserMode/Undocumented%20Functions/NT%20Objects/Port/NtReplyWaitReplyPort.html">NtReplyWaitReplyPort</a></li>
<li><a href="http://undocumented.ntinternals.net/UserMode/Undocumented%20Functions/NT%20Objects/Port/NtReplyWaitReceivePort.html">NtReplyWaitReceivePort</a></li>
<li><a href="http://undocumented.ntinternals.net/UserMode/Undocumented%20Functions/NT%20Objects/Port/NtImpersonateClientOfPort.html">NtImpersonateClientOfPort</a></li>
<li>NtSecureConnectPort</li>
</ol>
<p style="text-align: justify;">The above list roughly corresponds to the cross-reference table for _LpcPortObjectType (excluding <a href="http://undocumented.ntinternals.net/UserMode/Undocumented%20Functions/NT%20Objects/Port/NtQueryInformationPort.html">NtQueryInformationPort</a>, <a href="http://undocumented.ntinternals.net/UserMode/Undocumented%20Functions/NT%20Objects/Thread/NtRegisterThreadTerminatePort.html">NtRegisterThreadTerminatePort </a>and a couple of other routines). All of the functions are more or less documented by independent researchers, Tomasz Nowak and Bo Branten - a brief description of each export is available on the net, though most of the symbols speak for themselves anyway. Having the function names, let's take a look at how the functions can actually be used!</p>
<h4 style="text-align: justify;">Server - Setting up a port</h4>
<p style="padding-left: 30px; text-align: justify;">In order to make the server <em>reachable</em> for client modules, it must create a <em>Named Port</em> by calling NtCreatePort (specifying the object's name and an optional security descriptor):</p>
<pre style="padding-left: 60px; text-align: justify;">NTSTATUS NTAPI
<strong>NtCreatePort</strong>
 (OUT PHANDLE PortHandle,
  IN POBJECT_ATTRIBUTES ObjectAttributes,
  IN ULONG MaxConnectInfoLength,
  IN ULONG MaxDataLength,
  IN OUT PULONG Reserved OPTIONAL );
</pre>
<p style="padding-left: 30px; text-align: justify;">
<p style="padding-left: 30px; text-align: justify;">When the LPC port is successfully created, it becomes visible to other, external modules - potential clients.</p>
<h4 style="text-align: justify;">Server - Port Listening</h4>
<p style="padding-left: 30px; text-align: justify;">In order to accept an inbound connection, the server starts listening on the newly created port, waiting for clients. This is achieved using the NtListenPort routine, defined as follows:</p>
<pre style="padding-left: 60px; text-align: justify;">NTSTATUS
NTAPI
<strong>NtListenPort</strong>
(IN HANDLE PortHandle,
 OUT PLPC_MESSAGE ConnectionRequest);</pre>
<p style="padding-left: 30px; text-align: justify;">
<p style="padding-left: 30px; text-align: justify;">Being dedicated to the synchronous approach, the function blocks the thread and waits until someone tries to make use of the port. And so, while the server is waiting, some client eventually tries to connect...</p>
<h4 style="text-align: justify;">Client - Connecting to a Port</h4>
<p style="padding-left: 30px; text-align: justify;">Knowing that the port has already been created and that the server is currently waiting (residing inside NtListenPort), our client process is able to connect, specifying the port name used during the creation process. The following function will take care of the rest:</p>
<pre style="padding-left: 60px; text-align: justify;">NTSTATUS
NTAPI
<strong>NtConnectPort</strong>
(OUT PHANDLE ClientPortHandle,
 IN PUNICODE_STRING ServerPortName,
 IN PSECURITY_QUALITY_OF_SERVICE SecurityQos,
 IN OUT PLPC_SECTION_OWNER_MEMORY ClientSharedMemory OPTIONAL,
 OUT PLPC_SECTION_MEMORY ServerSharedMemory OPTIONAL,
 OUT PULONG MaximumMessageLength OPTIONAL,
 IN PVOID ConnectionInfo OPTIONAL,
 IN PULONG ConnectionInfoLength OPTIONAL );
</pre>
<h4 style="text-align: justify;">Server - Accepting (or not) the connection</h4>
<p style="padding-left: 30px; text-align: justify;">When a client tries to connect at one side of the port, the server returns from NtListenPort with the PORT_MESSAGE header filled with information. In particular, the server can access a CLIENT_ID structure, identifying the source process/thread. Based on that data, the server can make the final decision whether to allow or refuse the connection. Whichever option is chosen, the server calls the NtAcceptConnectPort function:</p>
<pre style="padding-left: 60px; text-align: justify;">NTSTATUS
NTAPI
<strong>NtAcceptConnectPort</strong>
 (OUT PHANDLE ServerPortHandle,
  IN HANDLE AlternativeReceivePortHandle OPTIONAL,
  IN PLPC_MESSAGE ConnectionReply,
  IN BOOLEAN AcceptConnection,
  IN OUT PLPC_SECTION_OWNER_MEMORY ServerSharedMemory OPTIONAL,
  OUT PLPC_SECTION_MEMORY ClientSharedMemory OPTIONAL );</pre>
<p style="padding-left: 30px; text-align: justify;">
<p style="padding-left: 30px; text-align: justify;">In case of a rejection, the execution ends here. The client returns from the NtConnectPort call with an adequate error code (most likely STATUS_PORT_CONNECTION_REFUSED), and the server ends up calling NtListenPort again. If, however, the server decides to proceed with the connection, another routine must be called:</p>
<pre style="padding-left: 60px; text-align: justify;"> NTSTATUS
 NTAPI
<strong> NtCompleteConnectPort</strong>
 (IN HANDLE PortHandle);</pre>
<p style="padding-left: 30px; text-align: justify;">
<p style="padding-left: 30px; text-align: justify;">After the above function is called, our connection is confirmed and ready to go!</p>
<h4 style="text-align: justify;">Server - Waiting for a message</h4>
<p style="padding-left: 30px; text-align: justify;">After opening up a communication channel, the server must begin listening for incoming packets (or client-related events). Because of the specific nature of LPC, the server is unable to send messages by itself - rather than that, it must wait for the client to send a request, and then possibly respond with a piece of data. And so, in order to (as always - synchronously) await a message, the server should call the following function:</p>
<pre style="padding-left: 30px; text-align: justify;"> NTSTATUS
 NTAPI
 <strong>NtReplyWaitReceivePort</strong>
 (IN HANDLE PortHandle,
  OUT PHANDLE ReceivePortHandle OPTIONAL,
  IN PLPC_MESSAGE Reply OPTIONAL,
  OUT PLPC_MESSAGE IncomingRequest);
</pre>
<h4 style="text-align: justify;">Client - Sending a message</h4>
<p style="padding-left: 30px; text-align: justify;">Having the connection established, our client is now able to send regular messages, at the time of its choice. Moreover, the application can choose between <span style="text-decoration: underline;">one-way</span> packets and <span style="text-decoration: underline;">interactive</span> requests. By sending the first type of message, the client does not expect the server to reply - most likely, it is a short, informational packet. On the other hand, interactive messages require the server to fill in a return buffer of a given size. These two packet types are sent using different native calls:</p>
<pre style="padding-left: 60px; text-align: justify;">NTSTATUS
NTAPI
<strong>NtRequestPort</strong>
(IN HANDLE PortHandle,
 IN PLPC_MESSAGE Request);</pre>
<p style="padding-left: 30px; text-align: justify;">or</p>
<pre style="padding-left: 60px; text-align: justify;">NTSTATUS
NTAPI
<strong>NtRequestWaitReplyPort</strong>
(IN HANDLE PortHandle,
 IN PLPC_MESSAGE Request,
 OUT PLPC_MESSAGE IncomingReply);</pre>
<p style="padding-left: 30px; text-align: justify;">
<p style="padding-left: 30px; text-align: justify;">The difference between these two definitions is pretty much obvious <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<h4 style="text-align: justify;">Server - Replying to incoming packets</h4>
<p style="padding-left: 30px; text-align: justify;">In case the client requests data from the server, the latter is obligated to respond providing some output data. In order to do so, the following function should be used:</p>
<pre style="padding-left: 60px; text-align: justify;">NTSTATUS
NTAPI
<strong>NtReplyPort</strong>
(IN HANDLE PortHandle,
 IN PLPC_MESSAGE Reply);
</pre>
<h4 style="text-align: justify;">Client - Closing the connection</h4>
<p style="padding-left: 30px; text-align: justify;">When, eventually, the client either terminates or decides to close the LPC connection, it can clean up the connection by simply dereferencing the port object - the NtClose native call (or, better, the documented <a href="http://msdn.microsoft.com/en-us/library/ms724211%28VS.85%29.aspx">CloseHandle</a> API) can be used:</p>
<pre style="padding-left: 60px; text-align: justify;">NTSTATUS
 NTAPI
 <strong>NtClose</strong>(IN HANDLE ObjectHandle);</pre>
<p style="text-align: justify;">
<p style="text-align: justify;">The entire IPC process has already been presented in a visual form - some very illustrative flow charts can be found <a href="http://www.zezula.net/en/prog/lpc.html"><strong>here</strong></a> (LPC Communication) and <a href="http://blogs.msdn.com/b/ntdebugging/archive/2007/07/26/lpc-local-procedure-calls-part-1-architecture.aspx"><strong>here</strong></a> (LPC Part 1: Architecture).</p>
<p style="text-align: justify;">All of the described functions are actually used while maintaining the CSRSS connection - you can check it yourself! What should be noted though, is that the above summary covers the LPC communication (which can already be used to create an IPC framework), but says nothing about what data, in particular, is being sent over the named port. Obviously, the Windows Subsystem manages its own, internal communication protocol implemented by both client-side (ntdll.dll) and server-side (csrsrv.dll, winsrv.dll, basesrv.dll) system libraries.</p>
<p style="text-align: justify;">In order to make it more convenient for kernel32.dll to make use of the CSR packets, a special subset of routines dedicated to CSRSS-communication exists in <span style="text-decoration: underline;">ntdll.dll</span>. The list of these functions includes, but is not limited to:</p>
<ol style="text-align: justify;">
<li>CsrClientCallServer</li>
<li>CsrClientConnectToServer</li>
<li>CsrGetProcessId</li>
<li>CsrpConnectToServer</li>
</ol>
<p style="text-align: justify;">Thanks to the above symbols, it is possible for kernel32.dll (and most importantly - us) to send custom messages on behalf of the current process, without a thorough knowledge of the protocol structure. Furthermore, ntdll.dll contains all the necessary technical information required while talking to CSRSS, such as the port name to connect to. The next post is going to cover both the client and server sides of the LPC initialization and usage, as it is performed in practice - watch out <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<h3 style="text-align: justify;">Conclusion</h3>
<p style="text-align: justify;"><img class="alignright" style="margin-left: 10px;" title="WinObj" src="http://j00ru.vexillium.org/blog/13_07_10/winobj.gif" alt="" width="250" height="217" />All in all, a great number of internal Windows mechanisms make use of LPC - both low-level ones, such as the Windows debugging facility or parts of the exception handling implementation, as well as high-level capabilities, including user credentials verification performed by LSASS. One can list all of the named (A)LPC port objects present in the system using the <a href="http://technet.microsoft.com/en-us/sysinternals/bb896657.aspx">WinObj</a> tool by Windows Sysinternals. It is also highly recommended to create one's own implementation of an LPC-based inter-process communication protocol - a very instructive experience. Example source code can be found in the following package: <a href="http://www.zezula.net/download/lpc.zip">link</a>.</p>
<p style="text-align: justify;">Have fun, leave comments and stay tuned for upcoming entries ;D</p>
<h3 style="text-align: justify;">References</h3>
<ol style="text-align: justify;">
<li><a href="http://www.zezula.net/en/prog/lpc.html">LPC Communication</a></li>
<li><a href="http://www.tar.hu/wininternals/ch03lev1sec6.html">Local Procedure Calls (LPCs)</a></li>
<li><a href="http://blogs.msdn.com/b/ntdebugging/archive/2007/07/26/lpc-local-procedure-calls-part-1-architecture.aspx">LPC Part 1: Architecture</a></li>
<li><a href="http://technet.microsoft.com/en-us/sysinternals/bb896657.aspx">Sysinternals WinObj</a></li>
<li><a href="http://recon.cx/2008/a/thomas_garnier/LPC-ALPC-paper.pdf">Windows Privilege Escalation through LPC</a></li>
<li><a href="http://blogs.msdn.com/ntdebugging/archive/tags/lpc/default.aspx">Ntdebugging on LPC interface</a></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://j00ru.vexillium.org/?feed=rss2&amp;p=502</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Windows CSRSS Write Up: the basics (part 1/1)</title>
		<link>http://j00ru.vexillium.org/?p=492</link>
		<comments>http://j00ru.vexillium.org/?p=492#comments</comments>
		<pubDate>Thu, 08 Jul 2010 20:38:14 +0000</pubDate>
		<dc:creator>j00ru</dc:creator>
				<category><![CDATA[CSRSS]]></category>
		<category><![CDATA[OS Internals]]></category>
		<category><![CDATA[Ring0]]></category>
		<category><![CDATA[Ring3]]></category>
		<category><![CDATA[Undocumented API]]></category>
		<category><![CDATA[Windows 7]]></category>
		<category><![CDATA[Windows Vista]]></category>
		<category><![CDATA[Windows XP]]></category>
		<category><![CDATA[hacking]]></category>
		<category><![CDATA[kernel]]></category>

		<guid isPermaLink="false">http://j00ru.vexillium.org/?p=492</guid>
		<description><![CDATA[NOTE: The following post entry opens a series of CSRSS-oriented articles, aiming at describing the uncovered CSRSS mechanism internals, present in the Windows OS for more than fifteen years now. Although some great research has already been carried out by a few curious guys (check out the references), no thorough case study is available until [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;"><strong>NOTE:</strong> The following post entry opens a series of CSRSS-oriented articles, aiming to describe the so far uncovered internals of the CSRSS mechanism, present in the Windows OS for more than fifteen years now. Although some great research has already been carried out by a few curious guys (check out the references), no thorough case study has been available until now. In this series, I am going to cover both the very basic ideas and their implementations, as well as the recent CSRSS changes applied in modern operating systems (i.e. Windows 7). And so, just have a good read! <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p><span id="more-492"></span></p>
<h3 style="text-align: justify;">Environment subsystems in Windows</h3>
<p style="text-align: justify;">The general idea behind the environment subsystems is to expose a strictly-defined subset of native functions to the user application. Because Windows was designed to support both native Windows and POSIX (<a href="http://en.wikipedia.org/wiki/POSIX"><em>Portable Operating System Interface for Unix</em></a>) executables, the developers had to provide a separate API for each type of application. Each of the supported subsystems consists of two major parts:</p>
<ul style="text-align: justify;">
<li>The subsystem process - a regular ring-3 application, responsible for handling some of the subsystem-specific functions.</li>
<li>The subsystem DLLs - a special set of system libraries dedicated to a certain subsystem; they provide an additional layer between user programs and the native system calls.</li>
</ul>
<p style="text-align: justify;">As it turns out, the Windows Subsystem - otherwise known as CSRSS (<em>Client/Server Runtime Subsystem</em>) - is an obligatory part of the system execution environment; in other words, Windows is unable to work correctly without CSRSS.exe running in the background. Because of the duties bound to CSRSS, the subsystem is required on every Windows installation, including the <em>server editions</em> (which don't deal with interactive user sessions). On the other hand, POSIX (<em>psxss.exe</em>) belongs to the <em>optional subsystems</em> group - therefore, the process is only started on demand.</p>
<p style="text-align: justify;">Some specific configuration data, related to all of the supported subsystems can be found in the following registry key:</p>
<pre style="text-align: justify; padding-left: 30px;">HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems
</pre>
<p style="text-align: justify;">
<p style="text-align: justify;">By default, the subsystem configuration looks similar to the following:</p>
<ul style="text-align: justify;">
<li>
<pre>Name:  Debug
Type:  REG_EXPAND_SZ
Value:</pre>
</li>
<li>
<pre>Name:  Kmode
Type:  REG_EXPAND_SZ
Value: \SystemRoot\System32\win32k.sys</pre>
</li>
<li>
<pre>Name:  Optional
Type:  REG_MULTI_SZ
Value: Posix</pre>
</li>
<li>
<pre>Name:  Posix
Type:  REG_EXPAND_SZ
Value: %SystemRoot%\system32\psxss.exe</pre>
</li>
<li>
<pre>Name:  Required
Type:  REG_MULTI_SZ
Value: Debug Windows</pre>
</li>
<li>
<pre>Name:  Windows
Type:  REG_EXPAND_SZ
Value: %SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows SharedSection=1024,20480,768 Windows=On ServerDll=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3 ServerDll=winsrv:ConServerDllInitialization,2 ServerDll=sxssrv,4 ProfileC</pre>
</li>
</ul>
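<p style="text-align: justify;">The <em>Windows</em> value above is effectively the csrss.exe command line, and the <em>ServerDll=</em> parameters name the modules that the subsystem process loads. Below is a minimal sketch in portable C that extracts these entries from such a string. Please note that the <em>server_dll_entry</em> type and the <em>parse_server_dlls</em> function are hypothetical names made up for this example, and that the <em>name[:init routine],index</em> layout is an assumption inferred from the value shown above, not a documented format:</p>

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type: one ServerDll=<name>[:<init routine>],<index> entry. */
typedef struct {
    char name[64];
    char init[64];   /* empty string when no explicit init routine is given */
    int  index;
} server_dll_entry;

/* Scans a csrss.exe command line for ServerDll= parameters.
 * Returns the number of entries stored in out (at most max_out). */
size_t parse_server_dlls(const char *cmdline, server_dll_entry *out, size_t max_out)
{
    static const char key[] = "ServerDll=";
    size_t count = 0;
    const char *p = cmdline;

    while (count < max_out && (p = strstr(p, key)) != NULL) {
        p += sizeof(key) - 1;

        /* Copy one space-delimited token so it can be split in place. */
        size_t len = strcspn(p, " ");
        char token[160];
        if (len >= sizeof(token)) { p += len; continue; }
        memcpy(token, p, len);
        token[len] = '\0';
        p += len;

        /* The part after the last comma is assumed to be the index. */
        char *comma = strrchr(token, ',');
        if (comma == NULL) continue;   /* malformed entry: skip it */
        *comma = '\0';

        server_dll_entry *e = &out[count];
        e->index = atoi(comma + 1);

        /* An optional ":<routine>" suffix names the initialization routine. */
        char *colon = strchr(token, ':');
        if (colon != NULL) {
            *colon = '\0';
            snprintf(e->init, sizeof(e->init), "%s", colon + 1);
        } else {
            e->init[0] = '\0';
        }
        snprintf(e->name, sizeof(e->name), "%s", token);
        count++;
    }
    return count;
}
```

<p style="text-align: justify;">Fed with the (truncated) value listed above, the function should report four entries: basesrv, winsrv (twice, with two different initialization routines) and sxssrv.</p>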
<p style="text-align: justify;">Most of us already know the subsystem DLLs supporting the Windows API - these are basically kernel32.dll, user32.dll, gdi32.dll, advapi32.dll, and many more - a majority of these libraries are used by numerous Windows software developers in their daily routine. When it comes to POSIX, only one module is apparently enough to implement the desired Unix API - that's <span style="text-decoration: underline;">psxdll.dll</span>.</p>
<p style="text-align: justify;">Furthermore, each executable image is associated with exactly one subsystem; this property is set by the linker (at build time), and resides in the following PE structure field:</p>
<pre style="text-align: justify; padding-left: 30px;">WORD IMAGE_NT_HEADERS.OptionalHeader.Subsystem
</pre>
<p style="text-align: justify;">
<p style="text-align: justify;">while the available constant values are listed below:</p>
<pre style="text-align: justify; padding-left: 30px;">#define IMAGE_SUBSYSTEM_UNKNOWN                  0
#define IMAGE_SUBSYSTEM_NATIVE                   1
#define IMAGE_SUBSYSTEM_WINDOWS_GUI              2
#define IMAGE_SUBSYSTEM_WINDOWS_CUI              3
#define IMAGE_SUBSYSTEM_OS2_CUI                  5
#define IMAGE_SUBSYSTEM_POSIX_CUI                7
#define IMAGE_SUBSYSTEM_NATIVE_WINDOWS           8
#define IMAGE_SUBSYSTEM_WINDOWS_CE_GUI           9
#define IMAGE_SUBSYSTEM_EFI_APPLICATION          10
#define IMAGE_SUBSYSTEM_EFI_BOOT_SERVICE_DRIVER  11
#define IMAGE_SUBSYSTEM_EFI_RUNTIME_DRIVER       12
#define IMAGE_SUBSYSTEM_EFI_ROM                  13
#define IMAGE_SUBSYSTEM_XBOX                     14
</pre>
<p style="text-align: justify;">
<p style="text-align: justify;">As can be seen, the Portable Executable format is pretty versatile, and can be used to build executables for different processor (Intel x86, Intel x86-64, ARM) and system (Microsoft Windows, Windows CE, EFI, XBOX) architectures.</p>
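<p style="text-align: justify;">To make the location of the <em>Subsystem</em> field more tangible, here is a small sketch in portable C that pulls it out of a PE image held in memory. The <em>pe_get_subsystem</em> name is made up for this example. The offsets are hard-coded instead of being taken from the Windows SDK headers: the field lies 68 bytes into the optional header in both the PE32 and PE32+ layouts. The validation is deliberately minimal, so treat it as an illustration rather than a robust parser:</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian readers, so the code does not depend on host byte order. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the Subsystem field of a PE image held in memory, or -1 if the
 * buffer does not look like a PE file. */
int pe_get_subsystem(const uint8_t *image, size_t size)
{
    if (image == NULL || size < 0x40)
        return -1;
    if (image[0] != 'M' || image[1] != 'Z')        /* IMAGE_DOS_HEADER.e_magic */
        return -1;

    uint32_t nt = rd32(image + 0x3C);              /* IMAGE_DOS_HEADER.e_lfanew */

    /* 4-byte signature + 20-byte file header + 68 bytes into the optional
     * header + 2 bytes for the field itself must all fit in the buffer. */
    if (nt > size || size - nt < 4 + 20 + 68 + 2)
        return -1;
    if (rd32(image + nt) != 0x00004550u)           /* "PE\0\0" */
        return -1;

    return rd16(image + nt + 4 + 20 + 68);
}
```

<p style="text-align: justify;">The returned number can be compared directly against the constants listed above, e.g. 3 stands for IMAGE_SUBSYSTEM_WINDOWS_CUI.</p>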
<h4 style="text-align: justify;">API division based on the kernel and subsystem involvement</h4>
<p style="text-align: justify;">As it turns out, the subsystem process (such as csrss.exe) doesn't necessarily have to take part in every API call issued by a user application. Since the actual API implementation resides in the subsystem DLLs, they can decide whether or not to communicate with the NT kernel and/or the subsystem process. In particular, one can distinguish three separate types of API routines:</p>
<ul style="text-align: justify;">
<li>The function is entirely implemented by the Subsystem DLL, hence no request is sent to either the kernel or the subsystem process. This is the case for a majority of simple <em>helper</em> functions, such as <em><a href="http://msdn.microsoft.com/en-us/library/ms683179%28VS.85%29.aspx">GetCurrentProcess</a> </em>or <a href="http://msdn.microsoft.com/en-us/library/ms683182%28VS.85%29.aspx"><em>GetCurrentThread</em></a> (both of these routines return a constant pseudo-handle), functions which implement ring-3 mechanisms, e.g. <a href="http://msdn.microsoft.com/en-us/library/ms686350%28VS.85%29.aspx"><em>SwitchToFiber</em></a>, or functions implementing a specific processor feature, like <em><a href="http://msdn.microsoft.com/en-us/library/ms683614%28v=VS.85%29.aspx">InterlockedIncrement</a> </em>and <a href="http://msdn.microsoft.com/en-us/library/ms683590%28v=VS.85%29.aspx"><em>InterlockedExchange</em></a> (read more about <a href="http://msdn.microsoft.com/en-us/library/ms684122%28v=VS.85%29.aspx">Interlocked Variable Access</a>).</li>
</ul>
<ul style="text-align: justify;">
<li>The function must issue one or more calls into the NT Kernel in order to complete correctly. The functionalities provided by these routines require <em>ring-0</em> privileges (e.g. in order to switch the process context), thus cannot be entirely implemented by the DLL. On the other hand, these functions perform subsystem-unrelated operations, and thus don't have to communicate with the subsystem process. Examples of such functions include <a href="http://msdn.microsoft.com/en-us/library/ms680553%28VS.85%29.aspx">ReadProcessMemory</a> and <a href="http://msdn.microsoft.com/en-us/library/ms681674%28VS.85%29.aspx">WriteProcessMemory</a>, both responsible for operating on memory address space belonging to an external process.</li>
</ul>
<ul style="text-align: justify;">
<li>The function takes advantage of mechanisms implemented by the subsystem process, and must therefore send one or more requests e.g. to csrss.exe. In this case, the client application issues a special packet via some Inter-Process communication mechanism [in this case - (Advanced) <a href="http://blogs.msdn.com/b/ntdebugging/archive/2007/07/26/lpc-local-procedure-calls-part-1-architecture.aspx">Local Procedure Calls</a>], using an internal, undocumented protocol. A great example of this API type can be <a href="http://msdn.microsoft.com/en-us/library/ms682425%28VS.85%29.aspx">CreateProcess </a>or <a href="http://msdn.microsoft.com/en-us/library/ms682437%28VS.85%29.aspx">CreateRemoteThread</a>, as both of them must inform CSRSS about creating a new process/thread; otherwise, the entire function would fail (a win32 process/thread cannot be successfully spawned without letting the subsystem know about this being done).</li>
</ul>
<h3 style="text-align: justify;">CSRSS then</h3>
<p style="text-align: justify;">Surprisingly, on Windows versions prior to NT4, the Windows Subsystem was responsible for performing much more work than it is now. More precisely, both the <em>Window Manager</em> and <em>Graphic Services</em> were a part of that user-mode application! Because a separate process was dedicated to window / graphics management, serious efficiency problems began to arise: every time a normal graphics-oriented application sent a request to CSRSS, a considerable amount of time was spent on switching the process / thread context between the client and the server. Even though the developers were doing their best to optimize the communication process, it was still slower than it would be with the window management implemented in kernel-mode. The problem was most noticeable in the context of graphics-intensive programs - and so, in Windows NT4, a majority of the graphics-related code was moved into a special system driver called win32k.sys, which is available to user-mode applications through the standard system-call mechanism (just like the core NT functions are).</p>
<p style="text-align: justify;">The question is - what has actually remained after replacing CSRSS's main functionality with a <em>ring-0 </em>module?</p>
<h3 style="text-align: justify;">CSRSS now</h3>
<p style="text-align: justify;">Nowadays, the Client/Server Runtime subsystem is responsible for performing a few minor tasks, related to:</p>
<ul style="text-align: justify;">
<li>Console Window management (prior to Windows 7),</li>
<li>Process and Thread list management,</li>
<li>Supporting the 16-bit virtual DOS machine emulation (VDM),</li>
<li>Other, miscellaneous functions, such as GetTempFile, DefineDosDevice, ExitWindows and more.</li>
</ul>
<p style="text-align: justify;">All of the above functionalities are generally implemented by three DLLs (otherwise known as <em>ServerDlls</em>), imported by the csrss.exe image:</p>
<ol style="text-align: justify;">
<li>
<pre>BASESRV.DLL (process and thread lists, VDM support)</pre>
</li>
<li>
<pre>WINSRV.DLL (console management, user services)</pre>
</li>
<li>
<pre>CSRSRV.DLL (misc)</pre>
</li>
</ol>
<p style="text-align: justify;">A complete list of the functions supported on different Windows OS versions can be found <a href="http://j00ru.vexillium.org/csrss_list/api_table.html"><strong>here</strong></a> (sorted by the dispatch table) and <a href="http://j00ru.vexillium.org/csrss_list/api_list.html"><strong>here</strong></a> (by OS version).</p>
<h3 style="text-align: justify;">Windows CSRSS instances</h3>
<p style="text-align: justify;">Now that we know CSRSS is a standard, user-mode application, the questions are: when is it spawned, how many instances of the process run concurrently in the system, and what security token does csrss.exe execute with? Let's answer these questions in turn.</p>
<p style="text-align: justify;">The moment at which csrss.exe is spawned, and the number of its instances, differ between Windows XP and Windows Vista / Windows 7. On Windows XP, CSRSS is created at boot time by the Session Manager (smss.exe) process, right after the <em>win32k.sys</em> module is loaded into kernel memory. The process inherits its parent's security token, and therefore runs with the highest possible - SYSTEM - privileges. Once csrss.exe has been spawned, regular win32 applications (such as explorer.exe, or even winlogon.exe) can be successfully launched. This Windows Subsystem process is only started once, and remains running until the system is shut down. If, however, csrss.exe is unexpectedly terminated during normal system execution (e.g. due to an unhandled exception or manual termination by the user), the NT kernel detects this and deliberately crashes the system by calling the <a href="http://msdn.microsoft.com/en-us/library/ff551961%28VS.85%29.aspx">KeBugCheckEx</a> routine with the CRITICAL_OBJECT_TERMINATION error code. Furthermore, there is only one instance of CSRSS running on a Windows XP box - this process is responsible for handling the requests coming from all active processes, no matter what user they're running under.</p>
<p style="text-align: justify;"><img class="alignleft" title="CSRSS security token" src="http://j00ru.vexillium.org/blog/08_07_10/csrss.png" alt="" width="397" height="35" /></p>
<p style="text-align: justify;">
<p style="text-align: justify;">
<p style="text-align: justify;">When it comes to Windows Vista and later, quite a different approach was adopted. One CSRSS instance is started by the Session Manager, for handling the requests coming from session-0 modules. This process also stays alive until the very end of system execution. Furthermore, whenever an interactive user logs on, another separate csrss.exe is started by a dedicated smss.exe - the new Windows Subsystem instance is again designed to deal with requests sent by applications running under the new user's session. When, in turn, the user decides to log out, this CSRSS instance is politely terminated (and no BSoD is triggered by the kernel this time). All of the active CSRSS instances run with the same, highest user rights, as all of them are launched by the smss.exe process, which is tightly coupled with the NT kernel itself.</p>
<h3 style="text-align: justify;">Conclusions</h3>
<p style="text-align: justify;">In this short post, I have tried to give you a basic insight into where csrss.exe comes from and how it works internally. As it turns out, it is possible to make use of the Windows Subsystem in the context of various, interesting hacks - but this remains yet to be described. More detailed information can be found in the Windows Internals 5 book (section <em>Environment Subsystems and Subsystem DLLs</em>), and in numerous articles and advisories, listed in the following section. Comments - as always - welcome!</p>
<h3 style="text-align: justify;">References</h3>
<ol style="text-align: justify;">
<li>Wikipedia, <a href="http://en.wikipedia.org/wiki/POSIX"><strong>P</strong>ortable <strong>O</strong>perating <strong>S</strong>ystem <strong>I</strong>nterface [for Uni<strong>x</strong>]</a></li>
<li>Wikipedia, <a href="http://en.wikipedia.org/wiki/CSRSS"><strong>C</strong>lient/<strong>S</strong>erver <strong>R</strong>untime <strong>S</strong>ub<strong>s</strong>ystem</a></li>
<li>Wikipedia, <a href="http://en.wikipedia.org/wiki/Windows_NT_startup_process">Windows NT Startup Process</a></li>
<li>Secunia, <a href="http://secunia.com/advisories/23448">Microsoft Windows CSRSS MsgBox Memory Corruption Vulnerability</a></li>
<li>SecurityFocus, <a href="http://www.securityfocus.com/bid/21688/info">Microsoft Windows CSRSS HardError Messages Denial of Service Vulnerability</a></li>
<li>Ivanlef0u, <a href="http://www.ivanlef0u.tuxfamily.org/?p=395">Win7 and CreateRemoteThread</a></li>
<li>Cesar Cerrudo, <a href="http://www.argeniss.com/research/MSBugPaper.pdf">Story of a dumb patch</a></li>
<li>Mark Russinovich, Inside the Windows Vista kernel: <a href="http://technet.microsoft.com/pl-pl/magazine/2007.02.vistakernel(en-us).aspx">Part 1</a>, <a href="http://technet.microsoft.com/pl-pl/magazine/2007.03.vistakernel%28en-us%29.aspx">Part 2</a>, <a href="http://technet.microsoft.com/pl-pl/magazine/2007.04.vistakernel%28en-us%29.aspx">Part 3</a></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://j00ru.vexillium.org/?feed=rss2&amp;p=492</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Attacking the Host via Remote Kernel Debugger (Virtual Machines)</title>
		<link>http://j00ru.vexillium.org/?p=405</link>
		<comments>http://j00ru.vexillium.org/?p=405#comments</comments>
		<pubDate>Sat, 03 Jul 2010 16:28:00 +0000</pubDate>
		<dc:creator>j00ru</dc:creator>
				<category><![CDATA[Conferences]]></category>
		<category><![CDATA[OS Internals]]></category>
		<category><![CDATA[Ring0]]></category>
		<category><![CDATA[Ring3]]></category>
		<category><![CDATA[hacking]]></category>
		<category><![CDATA[kernel]]></category>
		<category><![CDATA[malware]]></category>
		<category><![CDATA[rootkit]]></category>

		<guid isPermaLink="false">http://j00ru.vexillium.org/?p=405</guid>
		<description><![CDATA[NOTE: This post is highly related to the research performed by Alex Ionescu. He is going to present the results of his work on the RECON2010 conference, during his Debugger-based Target-to-Host Cross-System Attacks speech. As it turns out, me and Alex have been working on the same subject concurrently - while I have only managed [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;"><strong>NOTE</strong>: This post is closely related to the research performed by <a href="http://www.alex-ionescu.com/">Alex Ionescu</a>. He is going to present the results of his work at the <a href="http://recon.cx/2010/speakers.html#debugger">RECON2010</a> conference, during his <em>Debugger-based Target-to-Host Cross-System Attacks</em> talk. As it turns out, Alex and I have been working on the same subject concurrently - while I have only managed to perform a cursory analysis of the mechanism, Alex has carried out a thorough analysis and possibly developed a PoC for a real vulnerability <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  Nevertheless, I would like to share some of the ideas and conclusions I have come up with during the recent weeks <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<p><span id="more-405"></span></p>
<h2 style="text-align: justify;">Introduction</h2>
<p style="text-align: justify;"><img class="alignleft" title="Microsoft Virtual PC" src="http://j00ru.vexillium.org/blog/03_07_10/virtualpc.bmp" alt="" width="239" height="164" />Nowadays, all of the classic Virtual Machine products (Microsoft Virtual PC, VMWare Workstation, Oracle VirtualBox) are widespread and commonly used for a variety of purposes. That's because they provide a reliable and convenient environment for multiple virtual operating systems, which can be employed by Software Developers, Web Developers, Reverse-Engineers or even people not related to the IT industry at all. One of the most significant advantages of using VMs is that they are supposed to run in complete separation from the host system. Numerous security professionals take advantage of this assumption by setting up malware-dedicated guest systems, where the samples can be observed and safely debugged at run-time.</p>
<p style="text-align: justify;">As it turns out, however, it might be possible to escape from a live guest OS right into the Host, under certain circumstances. <a href="http://www.microsoft.com/whdc/devtools/debugging/default.mspx">WinDbg</a> - the most commonly used debugger developed by Microsoft (working in both local and remote mode) - has been confirmed to contain multiple flaws, and/or to make use of vulnerable libraries. These security flaws could potentially lead to information disclosure (in case WinDbg leaks memory chunks to the guest), or even code execution in the context of the remote debugger process (running on the host OS). Note that the attack vector is debugger-specific and can only be triggered in very specific conditions (it is useless unless a debugger is attached in the first place).</p>
<h2 style="text-align: justify;">Background<img class="alignright" title="VMWare" src="http://j00ru.vexillium.org/blog/03_07_10/vmware.bmp" alt="" width="303" height="170" /></h2>
<p style="text-align: justify;">These days, debugging software seems to be an integral part of any development environment - whenever new software of any kind is being developed, various kinds of errors tend to appear on both the logical and implementation levels of the new code. These issues often have to be dealt with using run-time analysis, which includes stepping through the code, examining and altering parts of the processor context (registers, flags) and program state (in-memory variables etc). There's not much of a problem when it comes to user-mode, as the Windows kernel provides debugger support (both in kernel- and user-mode parts of the system), and a decent number of ring-3 debuggers are freely available. Things are more interesting in the context of <em>kernel-mode</em>, which also has to be analyzed in many different situations (by malware analysts, driver developers, bug-hunters and others). In order to begin the Kernel Debugging process, two (logical) live machines are required - the target (guest) and host computers, most likely connected via the serial port (though USB and IEEE1394 are also supported on the latest Windows versions). When the physical connection is correctly set up, the <em>Kernel Debugger</em> (also referred to as "KD" later in this post), a built-in part of the target system core, begins exchanging data with WinDbg (or potentially any software supporting the undocumented communication protocol).</p>
<p style="text-align: justify;">While the guest system is running, the remote debugger remains idle, waiting for one of the following events to happen:</p>
<ul style="text-align: justify;">
<li>A certain event occurs inside the guest, which requires the debugger to be informed about it,</li>
<li>A special (<em>native</em>) request is received from the guest (such as a <em>DbgPrint</em> request, resulting in printing a user-defined message on the debugger console),</li>
<li>The user decides to interrupt the OS execution in the middle of its work (e.g. to set a breakpoint).</li>
</ul>
<p style="text-align: justify;">The first situation happens whenever an exception is raised, a breakpoint is reached or the system crashes with <em>a Blue Screen of Death</em>. The second and third will be described in more detail later.</p>
<p style="text-align: justify;">In case any of the above conditions is met, control is passed to the remote debugger, consequently leading to the target OS getting temporarily frozen. When WinDbg is in control, the user is able to perform actions such as managing (setting, removing, enabling) the breakpoints, modifying memory, examining and altering the processor context and so on. For the entire time WinDbg is executing these operations, the guest execution is suspended - the only way for it to get back to work requires the debugger to issue a special <em>Resume</em> packet. It should also be noted that most of the debugging features can only be used while the target is halted and the debugger is in control. An exception to this rule is the break-in action (otherwise known as CTRL-C), which can be used at any time, in order to break into the debugger.</p>
<p style="text-align: justify;">Because a majority of kernel debugging is performed using Virtual Machines, and because most malware analysts use these techniques in their daily work, escaping from the virtual environment by any means (i.e. executing code on the host system) can pose a real threat these days. In this post, I would like to present some of the possible attack scenarios and vectors, which could potentially be used to perform <em>in-the-wild</em> attacks in the future.</p>
<h2 style="text-align: justify;">First attack vector - the KDCOM protocol</h2>
<h3 style="text-align: justify;">The basics</h3>
<p style="text-align: justify;"><img class="alignleft" title="Welcome 0x31337!" src="http://j00ru.vexillium.org/blog/03_07_10/31337.png" alt="" width="283" height="209" />The overall idea behind the research is "wherever data exchange takes place, a security flaw is likely to appear". In this case, the data in question is being passed between <em>WinDbg.exe</em> and the Virtual Machine process (whichever one we choose - it doesn't really matter at this point). However, before delving into the advanced communication internals, let's get a quick insight into the basics.</p>
<p style="text-align: justify;">First of all, before trying to perform kernel debugging, one must make sure that the connection - physical in the case of two real computers, or software-based (e.g. named pipes) in the case of a VM - can be successfully established. Furthermore, the guest must also be booted with adequate boot settings - i.e. the /DEBUG flag set in the <a href="http://support.microsoft.com/kb/314081"><em>boot.ini</em> configuration file</a> for Windows XP and earlier, or using the <a href="http://technet.microsoft.com/en-us/library/ee221031(WS.10).aspx"><em>bcdedit.exe</em> utility</a> on Windows Vista and later.</p>
<p style="text-align: justify;">The entire debugging session is carried out using a packet-based protocol, used by the host and guest to understand each other. As <em>WinDbg</em> is the only application capable of debugging a remote Windows OS, and Microsoft doesn't intend for this to change, no official documentation has been published by the vendor so far. On the other hand, some people have put effort into understanding the protocol and creating a thorough reference, uncovering most of the technical details. Two very valuable articles can be found here: <a href="http://www.vsj.co.uk/articles/display.asp?id=265">Kernel and remote debuggers</a> and here: <a href="http://articles.sysprogs.org/kdvmware/kdcom.shtml">KD extension DLLs &amp; KDCOM protocol</a>, covering precisely how the target's kernel deals with the remote debugger. As can be seen, the protocol isn't very complex in general, though it supports a great number of cross-system operation codes. Let's find out what the packet structure looks like.</p>
<p style="text-align: justify;">The entire packet consists of two main parts - the header, let's call it KD_PACKET_HEADER, and the packet body, which is specific to the operation type defined in the header. Let's take a look at the header layout, in the form of a C structure:
<pre class="brush: c">typedef struct _KD_PACKET_HEADER
{
	DWORD Signature;
	WORD  PacketType;
	WORD  ByteCount;
	DWORD PacketID;
	DWORD Checksum;
	BYTE  PacketBody[1];
} KD_PACKET_HEADER, *PKD_PACKET_HEADER;
</pre>
</p>
<p style="text-align: justify;">Let's go through each of the structure fields:</p>
<ol style="text-align: justify;">
<li><strong>Signature</strong><br />
Can be either <strong>0x30303030</strong> ('0000') or <strong>0x69696969</strong> ('iiii'). The first value represents Data Packets, while the latter one is used for marking Control Packets,</li>
<li><strong>PacketType</strong><br />
Defines the type of the request,</li>
<li><strong>ByteCount</strong><br />
Defines the length of the <em>PacketBody </em>array - containing necessary information - assigned to the request type,</li>
<li><strong>PacketID</strong><br />
Used to detect synchronization issues,</li>
<li><strong>Checksum</strong><br />
A straight-forward sum of all the bytes inside <em>PacketBody </em>(specifically zero, if ByteCount = 0)<em>,</em></li>
<li><strong>PacketBody</strong><br />
Type-specific data.</li>
</ol>
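<p style="text-align: justify;">The field descriptions above can be turned into a few lines of portable C. The sketch below fills in a data-packet header and computes the checksum as the plain byte sum described in the list. The <em>kd_header</em>, <em>kd_checksum</em> and <em>kd_make_data_header</em> names are made up for this example (they do not come from windbgkd.h), and fixed-width integer types are used in place of WORD / DWORD so that the code builds outside of Windows:</p>

```c
#include <stddef.h>
#include <stdint.h>

#define KD_DATA_PACKET_SIGNATURE    0x30303030u  /* '0000' */
#define KD_CONTROL_PACKET_SIGNATURE 0x69696969u  /* 'iiii' */

/* Fixed-size part of the packet, mirroring the fields described above;
 * the variable-length PacketBody follows it on the wire. */
typedef struct {
    uint32_t Signature;
    uint16_t PacketType;
    uint16_t ByteCount;
    uint32_t PacketID;
    uint32_t Checksum;
} kd_header;

/* Checksum = plain sum of all body bytes; zero for an empty body. */
uint32_t kd_checksum(const uint8_t *body, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += body[i];
    return sum;
}

/* Fills in a data-packet header for the given body. */
kd_header kd_make_data_header(uint16_t type, uint32_t id,
                              const uint8_t *body, uint16_t len)
{
    kd_header h;
    h.Signature  = KD_DATA_PACKET_SIGNATURE;
    h.PacketType = type;
    h.ByteCount  = len;
    h.PacketID   = id;
    h.Checksum   = kd_checksum(body, len);
    return h;
}
```

<p style="text-align: justify;">As a sanity check, summing the 56 body bytes of the <em>GetVersion</em> request from the traffic log presented later in this post gives 0x610, which matches the Checksum value logged for that packet.</p>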
<p style="text-align: justify;">The <em>PacketType</em> field must be one of the following enum values:
<pre class="brush: c">enum KD_PACKET_TYPE
{
	PACKET_TYPE_UNUSED              =  0,
	PACKET_TYPE_KD_STATE_CHANGE32   =  1,
	PACKET_TYPE_KD_STATE_MANIPULATE =  2,
	PACKET_TYPE_KD_DEBUG_IO         =  3,
	PACKET_TYPE_KD_ACKNOWLEDGE      =  4,
	PACKET_TYPE_KD_RESEND           =  5,
	PACKET_TYPE_KD_RESET            =  6,
	PACKET_TYPE_KD_STATE_CHANGE64   =  7,
	PACKET_TYPE_KD_POLL_BREAKIN     =  8,
	PACKET_TYPE_KD_TRACE_IO         =  9,
	PACKET_TYPE_KD_CONTROL_REQUEST  =  10,
	PACKET_TYPE_KD_FILE_IO          =  11
};
</pre>
</p>
<p style="text-align: justify;">A quick explanation for some of the above types follows:</p>
<ul style="text-align: justify;">
<li>Acknowledgement packets - used to confirm that a specific packet has been successfully received - sent by both WinDbg and the target,</li>
<li>Resend packets - used when WinDbg or the target considers a specific packet invalid (e.g. the Checksum value doesn't match the packet data),</li>
<li>Resync packets (PACKET_TYPE_KD_RESET) - used when the synchronization between the two machines fails.</li>
</ul>
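<p style="text-align: justify;">The acknowledge / resend decision described above could be sketched as follows. This is only an illustration of the idea: the <em>kd_reply_type</em> function is a made-up name, the two constants are copied from the KD_PACKET_TYPE enum, and the additional length comparison is my assumption - a real implementation certainly validates more than the body checksum (e.g. the leader and the packet ID):</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Values copied from the KD_PACKET_TYPE enum. */
enum {
    PACKET_TYPE_KD_ACKNOWLEDGE = 4,
    PACKET_TYPE_KD_RESEND      = 5
};

/* Decides which control packet a receiver would answer with:
 * ACKNOWLEDGE when the received body matches the advertised length and
 * checksum, RESEND otherwise. */
int kd_reply_type(uint16_t byte_count, uint32_t checksum,
                  const uint8_t *body, size_t received)
{
    /* Assumption: a body shorter or longer than ByteCount is treated as invalid. */
    if (received != byte_count)
        return PACKET_TYPE_KD_RESEND;

    uint32_t sum = 0;
    for (size_t i = 0; i < received; i++)
        sum += body[i];

    return (sum == checksum) ? PACKET_TYPE_KD_ACKNOWLEDGE
                             : PACKET_TYPE_KD_RESEND;
}
```

<p style="text-align: justify;">For instance, a four-byte body of 46 31 00 00 sums up to 0x77, so it is acknowledged only if the header carries exactly that Checksum value.</p>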
<p style="text-align: justify;">For more detailed (or better visualized) information on how a successful packet exchange is achieved, take a look at the aforementioned <em>KD extension DLLs &amp; KDCOM protocol</em> text, section <a href="http://articles.sysprogs.org/kdvmware/kdcom.shtml"><em>KDCOM protocol</em></a>.</p>
<p style="text-align: justify;">Furthermore, if we take a closer look at the structure definitions for each of the above types, we can easily observe a certain characteristic, which is true for most of the packets:
<pre class="brush: c">typedef struct _DBGKD_DEBUG_IO
{
	ULONG ApiNumber;
	USHORT ProcessorLevel;
	USHORT Processor;
	union {
		DBGKD_PRINT_STRING PrintString;
		DBGKD_GET_STRING GetString;
	} u;
} DBGKD_DEBUG_IO, *PDBGKD_DEBUG_IO;</pre>
</p>
<p style="text-align: justify;">
<pre class="brush: c">typedef struct _DBGKD_CONTROL_REQUEST
{
	ULONG ApiNumber;
	union {
		DBGKD_REQUEST_BREAKPOINT RequestBreakpoint;
		DBGKD_RELEASE_BREAKPOINT ReleaseBreakpoint;
	} u;
} DBGKD_CONTROL_REQUEST, *PDBGKD_CONTROL_REQUEST;</pre>
</p>
<p style="text-align: justify;">Noticeably, a majority of the type-specific structures begin with an ApiNumber field, containing the sub-type of the packet. For example, for a packet of the PACKET_TYPE_KD_STATE_CHANGE type (either the 32- or 64-bit variant), the value of the ApiNumber field must be one of the following:
<pre class="brush: c">enum KD_STATE_CHANGE_API_NUMBER
{
	DbgKdExceptionStateChange     = 0x00003030L,
	DbgKdLoadSymbolsStateChange   = 0x00003031L,
	DbgKdCommandStringStateChange = 0x00003032L
};
</pre>
</p>
<p style="text-align: justify;">The PACKET_TYPE_KD_DEBUG_IO packet type, together with its supported APIs, is another good example:
<pre class="brush: c">enum KD_DEBUG_IO_API_NUMBER
{
	DbgKdPrintStringApi = 0x00003230L,
	DbgKdGetStringApi   = 0x00003231L
};
</pre>
</p>
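<p style="text-align: justify;">Since the ApiNumber is the very first field of the packet body, a traffic monitor can decode it without knowing the rest of the structure. Here is a minimal sketch, assuming the field is stored in little-endian order (which is what one would expect from an x86 target); <em>kd_api_number</em> and <em>kd_api_name</em> are hypothetical helper names, and only the constants quoted in the two enums above are recognized:</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Reads the little-endian ApiNumber stored in the first four body bytes.
 * Returns 0 when the body is too short to hold one. */
uint32_t kd_api_number(const uint8_t *body, size_t len)
{
    if (body == NULL || len < 4)
        return 0;
    return (uint32_t)body[0] | ((uint32_t)body[1] << 8) |
           ((uint32_t)body[2] << 16) | ((uint32_t)body[3] << 24);
}

/* Maps the ApiNumber constants quoted in the enums to their names. */
const char *kd_api_name(uint32_t api)
{
    switch (api) {
    case 0x00003030u: return "DbgKdExceptionStateChange";
    case 0x00003031u: return "DbgKdLoadSymbolsStateChange";
    case 0x00003032u: return "DbgKdCommandStringStateChange";
    case 0x00003230u: return "DbgKdPrintStringApi";
    case 0x00003231u: return "DbgKdGetStringApi";
    default:          return "unknown";
    }
}
```

<p style="text-align: justify;">With these helpers, a body starting with the bytes 30 32 00 00 decodes to 0x00003230, i.e. DbgKdPrintStringApi.</p>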
<p style="text-align: justify;"><span style="text-decoration: underline;">Please note that all of the constants and structure definitions can be found in the windbgkd.h file included in Windows 2000 DDK, or the <a href="http://www.koders.com/c/fid556E4312920611402ACDADA7D137A14250903A82.aspx">ReactOS</a> project files.</span></p>
<p style="text-align: justify;">In general, the entire cross-system communication is carried out using the packet format presented above. Later in this post, I will show you what the data exchange looks like in practice.</p>
<h3 style="text-align: justify;">Previous research &amp; related stuff</h3>
<p style="text-align: justify;">As mentioned before, several people have decided to analyze and describe the strictly technical details related to Kernel Debugger communication. Apart from great articles, a few KD-related projects have also been started, some of which are still being developed today. If you're interested in this subject, you should check out the following links:</p>
<ul style="text-align: justify;">
<li><a href="http://virtualkd.sysprogs.org/">SYSPROGS VirtualKD</a> - a project officially named <em>Windows Kernel Debugger booster for Virtual Machines</em>, which is supposed to speed up the Windows kernel debugging process, for both VMWare and VirtualBox software. The effect can be achieved by replacing the standard virtual COM port (allowing 115200 bps transfer) with a custom mechanism, reaching up to 450KB/s (VirtualBox) or 150KB/s (VMWare) transfer rate. Thanks to the author, both the binaries and source code are available, making it possible to build the executable images on one's own, or even learn how the program works under-the-hood. Download: <a href="http://sourceforge.net/projects/virtualkd/">http://sourceforge.net/projects/virtualkd/</a></li>
</ul>
<ul style="text-align: justify;">
<li><a href="http://www.secureworks.com/research/tools/windpl.html">SecureWorks Wind Pill</a> - a Perl tool, which aims at automating the tasks involved in debugging the Windows kernel. Using this script, it becomes possible to carry out a kernel debugging session on custom host platforms. Besides, the utility makes it much easier to automate certain tasks on the target system, which would be harder to achieve using the standard WinDbg scripting abilities. Black Hat USA 2007 presentation: <a href="https://www.blackhat.com/presentations/bh-usa-07/Stewart/Presentation/bh-usa-07-stewart.pdf">https://www.blackhat.com/presentations/bh-usa-07/Stewart/Presentation/bh-usa-07-stewart.pdf</a>, Source Code: <a href="http://www.secureworks.com/research/tools/windpl.zip">http://www.secureworks.com/research/tools/windpl.zip</a></li>
</ul>
<h3 style="text-align: justify;">Pipe Proxy - traffic monitor</h3>
<p style="text-align: justify;">In order to observe how the communication between the two systems is really carried out, I have written a straight-forward tool called <strong>Windbg-PipeProxy</strong>, which works just as the name suggests - it acts as a named-pipe proxy between the guest and host, and forwards every single packet sent by either of the systems. Apart from forwarding the traffic, it emits more or less reader-friendly log messages, giving some insight into what type of information is really passed to and from the kernel debugger. In the long run, this can provide ideas of how the debugger could actually be attacked from the guest system.  <strong>Examples:</strong></p>
<p style="text-align: justify;">
<ul style="text-align: justify;">
<li>Windbg tries to connect to the booting machine by sending the PACKET_TYPE_KD_RESET packet (type 6):</li>
</ul>
<pre style="padding-left: 30px; text-align: justify;"><strong>---==[ WinDBG ---&gt; VirtualPC ]==---</strong>
[14:38:12] Leader:  0x69696969
 S --&gt; C  Type:     0000000006
 S --&gt; C  Bytes:             0
 S --&gt; C  Checksum: 0x00000000
</pre>
<ul style="text-align: justify;">
<li>Windbg sends a BREAKIN packet (single 0x62 byte):</li>
</ul>
<pre style="padding-left: 30px; text-align: justify;"><strong>---==[ WinDBG ---&gt; VirtualPC ]==---</strong>
[14:38:12] Leader:  0x00000062
 S --&gt; C  Type:     0000000000
 S --&gt; C  Bytes:             0
 S --&gt; C  Checksum: 0x00000000
</pre>
<ul style="text-align: justify;">
<li>Windbg sends a <em>GetVersion</em> request to the Virtual Machine...</li>
</ul>
<pre style="padding-left: 30px; text-align: justify;"><strong>---==[ WinDBG ---&gt; VirtualPC ]==---</strong>
[14:38:15] Leader:  0x30303030
 S --&gt; C  Type:     0000000002
 S --&gt; C  Bytes:            56
 S --&gt; C  Checksum: 0x00000610
00000000: <span style="color: #ff0000;">46 31 00 00</span> 00 00 00 00 03 01 00 00 00 00 00 00 F1..............
00000010: b8 f8 c7 02 01 00 00 00 30 d1 52 69 00 00 00 00 ........0.Ri....
00000020: aa 00 00 00 01 00 00 00 00 00 00 00 53 61 00 00 ............Sa..
00000030: 00 00 00 00 00 00 00 00 ?? ?? ?? ?? ?? ?? ?? ?? ................

<span style="color: #ff0000;">0x00003146</span> = DbgKdGetVersionApi
</pre>
<ul style="text-align: justify;">
<li>... and the guest system responds with a correctly filled structure:</li>
</ul>
<pre style="padding-left: 30px; text-align: justify;"><strong>---==[ VirtualPC ---&gt; WinDBG ]==---</strong>
[14:38:15] Leader:  0x30303030
 C --&gt; S  Type:     0000000002
 C --&gt; S  Bytes:            56
 C --&gt; S  Checksum: 0x0000149e
00000000: <span style="color: #ff0000;">46 31 00 00</span> 00 00 00 00 00 00 00 00 00 00 00 00 F1..............
00000010: 0f 00 bc 1b 06 00 03 00 4c 01 0c 03 2f 00 00 00 ........L.../...
00000020: <span style="color: #0000ff;">00 a0 83 82 ff ff ff ff</span> <span style="color: #008000;">70 95 97 82 ff ff ff ff</span> ........p.......
00000030: <span style="color: #800080;">ec cf b9 82 ff ff ff ff</span> ?? ?? ?? ?? ?? ?? ?? ?? ................

<span style="color: #ff0000;">0x00003146</span> = DbgKdGetVersionApi
<span style="color: #0000ff;">0xffffffff`8283a000</span> = NT Kernel ImageBase
<span style="color: #008000;">0xffffffff`82979570</span> = PsLoadedModuleList
<span style="color: #800080;">0xffffffff`82b9cfec</span> = DebuggerDataList
</pre>
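<p style="text-align: justify;">The highlighted 64-bit fields are stored in little-endian byte order, which is why the dump bytes appear reversed relative to the decoded addresses. A minimal sketch of that decoding follows; the helper names <em>read_le32</em> and <em>read_le64</em> are my own (hypothetical), the byte array is copied from the reply listing above, and the offsets 0x20, 0x28 and 0x30 are simply read off that dump rather than taken from any official header.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Read an unsigned 32-bit little-endian value from buf at the given offset. */
uint32_t read_le32(const uint8_t *buf, size_t offset)
{
    uint32_t value = 0;
    for (int i = 3; i >= 0; i--)
        value = (value << 8) | buf[offset + (size_t)i];
    return value;
}

/* Read an unsigned 64-bit little-endian value from buf at the given offset. */
uint64_t read_le64(const uint8_t *buf, size_t offset)
{
    uint64_t value = 0;
    for (int i = 7; i >= 0; i--)
        value = (value << 8) | buf[offset + (size_t)i];
    return value;
}

/* The 56-byte GetVersion reply body, copied from the listing above. */
const uint8_t getversion_reply[56] = {
    0x46, 0x31, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x0f, 0x00, 0xbc, 0x1b, 0x06, 0x00, 0x03, 0x00,
    0x4c, 0x01, 0x0c, 0x03, 0x2f, 0x00, 0x00, 0x00,
    0x00, 0xa0, 0x83, 0x82, 0xff, 0xff, 0xff, 0xff,  /* 0x20: kernel image base   */
    0x70, 0x95, 0x97, 0x82, 0xff, 0xff, 0xff, 0xff,  /* 0x28: PsLoadedModuleList  */
    0xec, 0xcf, 0xb9, 0x82, 0xff, 0xff, 0xff, 0xff   /* 0x30: DebuggerDataList    */
};
```

<p style="text-align: justify;">Calling <em>read_le64(getversion_reply, 0x20)</em> yields the kernel image base shown in the listing, and <em>read_le32(getversion_reply, 0)</em> yields the API number.</p>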
<ul style="text-align: justify;">
<li>A user-mode application uses <a href="http://msdn.microsoft.com/en-us/library/ff543634%28VS.85%29.aspx">ntdll.DbgPrintEx</a> to send out a simple text message to the debugger console.</li>
</ul>
<pre style="padding-left: 30px; text-align: justify;"><strong>---==[ VirtualPC ---&gt; WinDBG ]==---</strong>
[15:29:16] Leader:   0x30303030
 C --&gt; S  Type:     0000000003
 C --&gt; S  Bytes:            29
 C --&gt; S  Checksum: 0x000007f2
00000000: <span style="color: #ff0000;">30 32 00 00</span> 10 00 00 00 <span style="color: #800080;">0d 00 00 00</span> f9 fc b5 82 02..............
00000010: <span style="color: #0000ff;">48 65 6c 6c 6f 20 57 6f 72 6c 64 21 0a</span> ?? ?? ?? <span style="color: #0000ff;">Hello World!.</span>...
<img class="alignright" title="Hello World!" src="http://j00ru.vexillium.org/blog/03_07_10/hello.png" alt="" width="251" height="86" />
<span style="color: #ff0000;">0x00003230</span> = DbgKdPrintStringApi
<span style="color: #800080;">0x0000000d</span> = Output string length
'<span style="color: #0000ff;">Hello World!</span><span style="color: #0000ff;">\n</span>' = Output string contents
</pre>
<ul style="text-align: justify;">
<li>A user-mode application uses <a href="http://msdn.microsoft.com/en-us/library/ff543635%28VS.85%29.aspx">ntdll.DbgPrompt</a> to send a request for input data from the KD, providing prompt text...</li>
</ul>
<pre style="padding-left: 30px; text-align: justify;"><strong>---==[ VirtualPC ---&gt; WinDBG ]==---</strong>
[15:20:21] Leader:   0x30303030
C --&gt; S  Type:     0000000003
C --&gt; S  Bytes:            31
C --&gt; S  Checksum: 0x0000061c
00000000: <span style="color: #ff0000;">31 32 00 00</span> 10 00 00 00 <span style="color: #800080;">0f 00 00 00</span> <span style="color: #008000;">20 00 00 00</span> 12.......... ...
00000010: <span style="color: #0000ff;">48 65 6c 6c 6f 20 64 65 62 75 67 67 65 72 21</span> ?? <span style="color: #0000ff;">Hello debugger!</span>.

<span style="color: #ff0000;">0x00003231</span> = DbgKdGetStringApi
<span style="color: #800080;">0x0000000f</span> = Prompt string length
<span style="color: #008000;">0x00000020</span> = Output buffer size
'<span style="color: #0000ff;">Hello debugger!</span>' = Prompt string contents</pre>
<ul style="text-align: justify;">
<li>... and Windbg responds with an input string:</li>
</ul>
<pre style="padding-left: 30px; text-align: justify;"><strong>---==[ WinDBG ---&gt; VirtualPC ]==---</strong></pre>
<pre style="padding-left: 30px; text-align: justify;">[15:20:26] Leader:   0x30303030
 C --&gt; S  Type:     0000000003
 C --&gt; S  Bytes:            30
 C --&gt; S  Checksum: 0x00000544</pre>
<pre style="padding-left: 30px; text-align: justify;">00000000: <span style="color: #ff0000;">31 32 00 00</span> <span style="color: #800080;">10 00 00 00 0f 00 00 00</span> <span style="color: #008000;">0e 00 00 00</span> 12..............
00000010: <span style="color: #0000ff;">48 65 6c 6c 6f 20 63 6c 69 65 6e 74 21 00</span> ?? ?? <span style="color: #0000ff;">Hello client!</span>...</pre>
<p style="padding-left: 30px; text-align: justify;"><img class="alignright" title="Hello client!" src="../blog/03_07_10/prompt.png" alt="" width="211" height="73" /></p>
<pre style="padding-left: 30px; text-align: justify;"><span style="color: #ff0000;">0x00003231</span> = DbgKdGetStringApi
<span style="color: #800080;">0x00000010</span> = unchanged
<span style="color: #800080;">0x0000000f</span> = unchanged
<span style="color: #008000;">0x0000000e</span> = the actual size of the output string (must be less than or equal to the buffer size defined in the previous packet)
'<span style="color: #0000ff;">Hello client!</span>' = The text message typed by the KD user
</pre>
<ul style="text-align: justify;">
<li>The debugger requests 0x18 bytes to be read from the specified virtual memory address...</li>
</ul>
<pre style="padding-left: 30px; text-align: justify;"> ---==[ WinDBG ---&gt; VirtualPC ]==---
[15:14:15] Leader:   0x30303030
  C --&gt; S  Type:     0000000002
  C --&gt; S  Bytes:            56
  C --&gt; S  Checksum: 0x0000078b
00000000: <span style="color: #ff0000;">30 31 00 00</span> 00 00 00 00 02 00 00 00 00 00 00 00 01..............
00000010: <span style="color: #800080;">e8 3b 95 82 ff ff ff ff</span> <span style="color: #008000;">18 00 00 00</span> 00 00 00 00 .;..............
00000020: 42 01 00 00 00 00 00 00 00 00 00 00 01 0f 00 00 B...............
00000030: 00 00 0f 77 01 00 00 00 ?? ?? ?? ?? ?? ?? ?? ?? ...w............

<span style="color: #ff0000;">0x00003130</span> = DbgKdReadVirtualMemoryApi
<span style="color: #800080;">0xffffffff`82953be8</span> = Memory address to read
<span style="color: #008000;">0x00000018</span> = Byte count
</pre>
<ul style="text-align: justify;">
<li>... and the kernel supplies the requested data.</li>
</ul>
<pre style="padding-left: 30px; text-align: justify;">---==[ VirtualPC ---&gt; WinDBG ]==---
[15:14:15] Leader:   0x30303030
  C --&gt; S  Type:     0000000002
  C --&gt; S  Bytes:            80
  C --&gt; S  Checksum: 0x00000e08
00000000: <span style="color: #ff0000;">30 31 00 00</span> 00 00 00 00 00 00 00 00 00 00 00 00 01..............
00000010: <span style="color: #800080;">e8 3b 95 82 ff ff ff ff 18 00 00 00</span> <span style="color: #008000;">18 00 00 00</span> .;..............
00000020: 42 01 00 00 00 00 00 00 00 00 00 00 01 0f 00 00 B...............
00000030: 00 00 0f 77 01 00 00 00 <span style="color: #0000ff;">ec 5f b9 82 ec 5f b9 82</span> ...w....._..._..
00000040: <span style="color: #0000ff;">00 00 00 00 00 00 00 00 4b 44 42 47 40 03 00 00</span> ........KDBG@...

<span style="color: #ff0000;">0x00003130</span> = DbgKdReadVirtualMemoryApi
<span style="color: #800080;">0xffffffff`82953be8</span> = Memory address to read (<em>unchanged</em>)
<span style="color: #800080;">0x00000018</span> = Byte count (<em>unchanged</em>)
<span style="color: #008000;">0x00000018</span> = Number of bytes read
<span style="color: #0000ff;">###Data###</span> = Requested memory contents, appended to the primary packet
</pre>
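<p style="text-align: justify;">The two listings above suggest a simple consistency rule that a receiver could apply to such a reply: the packet body is a 56-byte fixed part followed by exactly as many data bytes as the <em>Number of bytes read</em> field claims, and that number should not exceed the requested count. The sketch below encodes this rule. The function name and the constant are hypothetical, the offsets (0x18 and 0x1c) are read off the dump, and nothing here claims to mirror what Windbg really does.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Size of the fixed part preceding the data, as observed in the dumps above
   (the request carries 56 bytes, the reply 56 + 24 = 80). Assumed constant. */
#define KD_MANIPULATE_HEADER_SIZE 56u

/* Read an unsigned 32-bit little-endian value. */
uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return 1 if a ReadVirtualMemory reply body is self-consistent, 0 otherwise. */
int kd_read_reply_is_consistent(const uint8_t *body, size_t body_len)
{
    if (body_len < KD_MANIPULATE_HEADER_SIZE)
        return 0;                               /* truncated fixed part */

    uint32_t requested = le32(body + 0x18);     /* byte count asked for  */
    uint32_t actual    = le32(body + 0x1c);     /* bytes actually read   */

    if (actual > requested)
        return 0;                               /* more data than requested */

    /* The appended data must match the advertised length exactly. */
    return (body_len - KD_MANIPULATE_HEADER_SIZE) == actual;
}

/* The 80-byte reply body, copied from the listing above. */
const uint8_t read_reply[80] = {
    0x30, 0x31, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0xe8, 0x3b, 0x95, 0x82, 0xff, 0xff, 0xff, 0xff,
    0x18, 0x00, 0x00, 0x00, 0x18, 0x00, 0x00, 0x00,
    0x42, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x01, 0x0f, 0x00, 0x00,
    0x00, 0x00, 0x0f, 0x77, 0x01, 0x00, 0x00, 0x00,
    0xec, 0x5f, 0xb9, 0x82, 0xec, 0x5f, 0xb9, 0x82,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x4b, 0x44, 0x42, 0x47, 0x40, 0x03, 0x00, 0x00
};
```

<p style="text-align: justify;">A fuzzer can use the same rule in reverse: packets that deliberately violate it are the interesting test cases.</p>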
<ul style="text-align: justify;">
<li>Apart from the normal, <em>functional</em> packets, the ACK packets are continuously sent after the KD / kernel successfully receives a message from the other side:</li>
</ul>
<pre style="padding-left: 30px; text-align: justify;"><strong> ---==[ WinDBG ---&gt; VirtualPC ]==---</strong>
[15:14:15] Leader:   0x69696969
  S --&gt; C  Type:     0000000004
  S --&gt; C  Bytes:             0
  S --&gt; C  Checksum: 0x00000000

<strong>---==[ VirtualPC ---&gt; WinDBG ]==---</strong>
[15:14:15] Leader:   0x69696969
  C --&gt; S  Type:     0000000004
  C --&gt; S  Bytes:             0
  C --&gt; S  Checksum: 0x00000000</pre>
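<p style="text-align: justify;">To summarize the listings: every message starts with a leader value that tells data packets (0x30303030) from control packets (0x69696969), followed by a type, a byte count and a checksum. The sketch below models this. The field layout and constant names follow the ReactOS <em>windbgkd.h</em> header as I recall it, so please treat them as an assumption; <em>kd_kind</em> and <em>kd_classify_leader</em> are hypothetical helpers, and the break-in case matches the way the single 0x62 byte is logged above.</p>

```c
#include <stdint.h>

#define PACKET_LEADER                   0x30303030u  /* "0000": data packet    */
#define CONTROL_PACKET_LEADER           0x69696969u  /* "iiii": control packet */
#define BREAKIN_PACKET_BYTE             0x62u        /* single break-in byte   */

/* Packet types that appear in the listings above. */
#define PACKET_TYPE_KD_STATE_MANIPULATE 2
#define PACKET_TYPE_KD_DEBUG_IO         3
#define PACKET_TYPE_KD_ACKNOWLEDGE      4
#define PACKET_TYPE_KD_RESET            6

/* Assumed on-the-wire header layout (16 bytes, little-endian fields). */
typedef struct {
    uint32_t PacketLeader;
    uint16_t PacketType;
    uint16_t ByteCount;
    uint32_t PacketId;
    uint32_t Checksum;
} KD_PACKET_HEADER;

typedef enum {
    KD_KIND_DATA,
    KD_KIND_CONTROL,
    KD_KIND_BREAKIN,
    KD_KIND_UNKNOWN
} kd_kind;

/* Classify a message by its leader value, as shown in the proxy log. */
kd_kind kd_classify_leader(uint32_t leader)
{
    if (leader == PACKET_LEADER)
        return KD_KIND_DATA;
    if (leader == CONTROL_PACKET_LEADER)
        return KD_KIND_CONTROL;
    if (leader == BREAKIN_PACKET_BYTE)
        return KD_KIND_BREAKIN;
    return KD_KIND_UNKNOWN;
}
```

<p style="text-align: justify;">With such a classifier, a proxy can decide whether a body (and therefore a checksum to verify) is expected at all: control packets such as ACK and RESET carry zero bytes.</p>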
<p style="text-align: justify;">
<p style="text-align: justify;">One should also note that the maximum size of the overall packet is a constant value of 4000 (0xFA0), and cannot be extended by any means:</p>
<pre style="padding-left: 30px; text-align: justify;">
<pre class="brush: c">#define PACKET_MAX_SIZE 4000</pre>
</pre>
<p style="text-align: justify;">This, in turn, indicates that when the KD requests more than one typical memory page (4096 bytes), the replies are sent in a chunked form, where each packet contains a separate memory area and fits within the packet size limit. Furthermore, it is not possible to correctly send a debug message longer than ~4000 bytes, nor to send an input string of that length to the kernel.</p>
<p style="text-align: justify;">The above listings should give you some insight into how the communication is performed - please keep in mind that the number of supported APIs is more than 50, and almost every single one has its own, unique structure - definitely worth taking a closer look at. If you would like to receive more packet samples, complete dumps of a debugging session or even the source code, feel free to <a href="http://j00ru.vexillium.org/?page_id=5">contact me</a>.</p>
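<p style="text-align: justify;">The chunking arithmetic is a plain ceiling division. The sketch below is only an illustration: <em>kd_chunk_count</em> is a hypothetical helper, and the amount of payload that really fits into one packet is passed in as a parameter, since part of the 4000 bytes is presumably taken by the per-request structure (56 bytes in the listings above).</p>

```c
#include <stddef.h>

#define PACKET_MAX_SIZE 4000

/* Number of packets needed to transfer total_bytes when each packet can
   carry at most payload_per_packet bytes of data (ceiling division). */
size_t kd_chunk_count(size_t total_bytes, size_t payload_per_packet)
{
    if (payload_per_packet == 0)
        return 0;  /* nothing can be transferred */
    return total_bytes / payload_per_packet +
           (total_bytes % payload_per_packet != 0);
}
```

<p style="text-align: justify;">For example, a single 4096-byte page does not fit into one packet regardless of whether the full 4000 bytes or only 4000 - 56 bytes are usable, so <em>kd_chunk_count</em> reports two packets in both cases.</p>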
<h3 style="text-align: justify;">Pipe Proxy - fuzzing the packets</h3>
<p style="text-align: justify;">With the ability to monitor the flowing packets in place, the next step towards finding a real vulnerability was to modify them. More precisely, the first idea was to pick random bytes and alter them before passing them to the appropriate pipe - a technique otherwise known as <em>fuzzing</em>. Because the first PipeProxy version was unable to recognize the packet structure, only fully raw-data fuzzing was possible. As it later turned out, the only thing I could achieve this way was hanging the entire debugging session after a couple of seconds.</p>
<p style="text-align: justify;">The reason for this behavior was pretty simple - both sides of the communication validated the packet body against the <em>Checksum</em> field, and if the verification failed, requested the message to be re-sent. Apparently, a more intelligent fuzzer was required to deal with this issue. And so, packet buffering was implemented, which guaranteed that each packet was entirely received before being sent to the other party. This way, <em>random</em> bytes could be chosen out of the packet body and changed, after which the Checksum was re-calculated and the packet sent to the destination pipe. After this fix, the fuzzer seemed to work well, as the debugger really started to behave unexpectedly.</p>
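<p style="text-align: justify;">The mutate-then-fix-checksum step can be sketched as follows. This is not the PipeProxy source; <em>kd_checksum</em> and <em>kd_fuzz_body</em> are hypothetical names. The sketch assumes that the checksum is a plain 32-bit sum of the body bytes, which is consistent with the logged packets: summing the 29 bytes of the <em>Hello World!</em> packet above gives exactly 0x7f2.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed checksum: a 32-bit sum of all bytes of the packet body. */
uint32_t kd_checksum(const uint8_t *body, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += body[i];
    return sum;
}

/* Mutate one random byte of a fully buffered packet body and return the
   checksum that has to be written into the header before forwarding. */
uint32_t kd_fuzz_body(uint8_t *body, size_t len)
{
    if (len > 0) {
        size_t pos = (size_t)rand() % len;
        /* XOR with a non-zero mask, so the chosen byte always changes. */
        body[pos] ^= (uint8_t)(1 + rand() % 255);
    }
    return kd_checksum(body, len);
}

/* Body of the DbgPrintEx packet from the listing above (29 bytes). */
const uint8_t hello_packet[29] = {
    0x30, 0x32, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00,
    0x0d, 0x00, 0x00, 0x00, 0xf9, 0xfc, 0xb5, 0x82,
    'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '!', '\n'
};
```

<p style="text-align: justify;">The important design point is the ordering: the whole body has to be buffered first, because the checksum travels in the header, ahead of the bytes it covers.</p>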
<h3 style="text-align: justify;">Results</h3>
<p style="text-align: justify;">Unfortunately, the bad news is that during a couple of fuzzing sessions, Windbg seemed to work steadily when parsing malformed protocol structures. Interestingly, strange things began happening when the packet's body (e.g. the memory contents read from the machine) was fuzzed together with the protocol structures. More precisely, exceptions started to be raised inside external DLL modules utilized by Windbg.exe. This phenomenon gave me the idea that the KD could possibly have problems handling the Virtual Machine's internal state, as well. More information regarding some <strong>real</strong> security flaws can be found in the "second attack vector" section <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p style="text-align: justify;">Please note that the fact that no serious issues were found in relation to the packet format (yes, there were some minor cases) doesn't necessarily mean that Windbg is vulnerability-free in this context. Don't hesitate to make your own experiments in this area <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<h3 style="text-align: justify;">Source Code</h3>
<p style="text-align: justify;">On second thought, I decided not to make <strong>Windbg-PipeProxy</strong> available to the public for now. Chances are that you can get it after contacting me on a private channel, but there is no guarantee. I do plan to release the tool soon though, so you can patiently wait or create one on your own <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<h2 style="text-align: justify;">Second attack vector - executable image symbols' support</h2>
<h3 style="text-align: justify;">The basics</h3>
<p style="text-align: justify;">As a very advanced debugger, Windbg can obviously make use of symbols of any kind (export names, additional .pdb symbol files - both locally and remotely) if the user desires to do so. The symbol packages for each Windows version can be downloaded <a href="http://www.microsoft.com/whdc/devtools/debugging/debugstart.mspx">here</a>, together with other debugging tools. If the Kernel Debugger can assign virtual addresses to the names exported from both user- and kernel-mode modules, it must somehow <em>calculate</em> the addresses, based on the current VM state (i.e. memory layout). The question is - does Windbg parse the PE and possibly PE32+ formats manually? The answer is - yes and no.</p>
<p style="text-align: justify;">As it turns out, Microsoft has developed a couple of libraries, aiming to help developers write both user- and kernel-mode debugger applications. These libraries make it possible to focus on the real debugger functionality and usability, rather than implementing all the core operations from scratch; as they are really convenient, most modern debugging software makes use of them. These DLLs are:</p>
<ul style="text-align: justify;">
<li><strong><a href="http://msdn.microsoft.com/en-us/library/ff540534%28VS.85%29.aspx">The Debugger Engine</a> (DbgEng.dll)</strong><br />
As MSDN states:</p>
<ul>
<li>
<pre>The debugger engine (DbgEng.dll), typically referred to as the engine, provides an interface for examining and manipulating debugging targets in user mode and kernel mode on Microsoft Windows.

The debugger engine can acquire targets, set breakpoints, monitor events, query symbols, read and write to memory, and control threads and processes in a target.

You can use the debugger engine to write both debugger extension libraries and stand-alone applications. Such applications are referred to as debugger engine applications. A debugger engine application that uses the full functionality of the debugger engine is called a debugger. For example, WinDbg, CDB, NTSD, and KD are debuggers; the debugger engine provides the core of their functionality.</pre>
</li>
</ul>
</li>
<li><strong><a href="http://msdn.microsoft.com/en-us/library/ms679309%28v=VS.85%29.aspx">The Debug Help Library</a> (DbgHelp.dll) </strong>- a small helper library, providing support to all symbol-related activities, as well as minidump files' management. This module is actually responsible for parsing all the internal PE/PE32+ structures of a specific  image, on the debugger's demand.</li>
</ul>
<p style="text-align: justify;">Noticeably, Windbg takes advantage of both of these libraries. All in all, even if the debugger itself doesn't have problems dealing with malformed packet structure, both <strong>external</strong> DLLs are also directly operating on the information provided by the target kernel. Compromising the security of one of these modules would be as good as owning Windbg.exe itself - either way, the code's executing with the same privileges.</p>
<p style="text-align: justify;">I have personally focused on the latter one, which has already been proven to contain critical security flaws. Let's take a look if there's anything left for us <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<h3 style="text-align: justify;">Previous research &amp; related stuff</h3>
<p style="text-align: justify;">One or two buffer overflow vulnerabilities were found and exploited during recent years. Some references:</p>
<ol style="text-align: justify;">
<li><a href="http://forum.tuts4you.com/index.php?showtopic=16445">OllyDBG v1.10 and ImpREC v1.7f export name buffer overflow  vulnerability</a> - tuts4you forums discussion</li>
<li><a href="http://www.openrce.org/blog/view/1369/Old_dbghelp_and_an_old_exploit...">Old dbghelp and an old exploit...</a> - a brief analysis of a stack-based BO by <a href="http://rewolf.pl">ReWolf</a></li>
</ol>
<p style="text-align: justify;">Even though that's not much, these sources can reveal the actual code quality provided by <em>DbgHelp.dll</em>. For more details on what it really looks like, take a look at the next section <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<h3 style="text-align: justify;">PE Image Fuzzing (environment + process)</h3>
<p style="text-align: justify;">In my opinion, running a fuzzer for a couple of nights is one of the most efficient ways of finding some anchor points, which can be further analyzed in order to work out a <em>code-exec</em> exploit; this case is no different. Before I start listing bugs found in the aforementioned DLL, one thing must be noted: Because of the fact that the target kernel has complete control over the data being sent to the debugger, one can simply fuzz the <em>DbgHelp.dll</em> library without Windbg taking part in this process - every malformed .exe loaded in the context of a local fuzzer can also be loaded by Windbg.exe, when the guest decides to.</p>
<p style="text-align: justify;">Okie, let's take a look at a brief explanation of some of the DbgHelp bugs:</p>
<ol style="text-align: justify;">
<li>
<pre><strong>Type</strong>: Out-of-bounds memory reference (READ)
<strong>Instruction</strong>: rep movsd, <span style="color: #ff0000;">[ESI]=???</span>
<strong>Call Stack</strong>:
 msvcrt!memcpy
 dbghelp!ReadImageData
 dbghelp!ReadHeader
 dbghelp!imgReadFromDisk
 dbghelp!modload
 dbghelp!LoadModule
 dbghelp!SymLoadModuleEx (exported)
</pre>
<p>Description: this exception can be raised thanks to lack of sanity checks in numerous places, e.g. when (IMAGE_FILE_HEADER.NumberOfSections * sizeof(IMAGE_SECTION_HEADER)) is greater than the executable image size</li>
<li>
<pre><strong>Type</strong>: Invalid Memory Reference (READ)
<strong>Instruction</strong>: movzx eax, word ptr [edx+ecx*2], <span style="color: #ff0000;">EDX=controlled</span>
<strong>Call Stack</strong>:
 dbghelp!LoadExportSymbols
 dbghelp!idd2me
 dbghelp!modload
 dbghelp!LoadModule
 dbghelp!SymLoadModuleEx (exported)
</pre>
<p>Description: the instruction is a part of a loop, which parses respective Export Table entries. The above instruction loads the ordinal of the current routine into AX - any address can be specified as the source operand, here.</li>
<li>
<pre><strong>Type</strong>: Out-of-bounds memory reference (READ)
<strong>Instruction</strong>: mov edx, [ecx+eax*4],  <span style="color: #ff0000;">LO(EAX)=controlled</span>
<strong>Call Stack</strong>:
 dbghelp!LoadExportSymbols
 dbghelp!idd2me
 dbghelp!modload
 dbghelp!LoadModule
 dbghelp!SymLoadModuleEx (exported)
</pre>
<p>Description: the code tries to retrieve the address/name of the currently-processed function, using the export ordinal as an array index. Because of the fact that the buffer allocation size is determined by max(NumberOfFunctions,NumberOfNames), and the ordinals are not compared against the buffer size, one can reference heap data after the buffer.</li>
<li>
<pre><strong>Type</strong>: Invalid Memory Reference (READ)
<strong>Instruction</strong>: movzx eax, byte ptr [edx+10h], <span style="color: #ff0000;">EDX=controlled</span>
<strong>Call Stack</strong>:
 dbghelp!LoadCoffSymbols
 dbghelp!idd2me
 dbghelp!modload
 dbghelp!LoadModule
 dbghelp!SymLoadModuleEx
</pre>
<p>Description: just another controlled memory reference while parsing the coff symbols.</li>
<li>A bunch of similar invalid memory references controlled by the VM. Although these bugs don't directly lead to any serious threat, a few information-disclosure attacks are confirmed to be possible (i.e. revealing Windbg.exe process memory to the guest).</li>
<li>
<pre><strong>Type</strong>: Out-of-bounds memory reference (WRITE)
<strong>Instruction</strong>: mov word ptr [eax+edx*2], cx       <span style="color: #ff0000;">LO(EDX)=controlled</span>
                                                <span style="color: #ff0000;">CX=0x0001</span>
<strong>Call Stack</strong>:
 dbghelp!LoadExportSymbols
 dbghelp!idd2me
 dbghelp!modload
 dbghelp!LoadModule
 dbghelp!SymLoadModuleEx
</pre>
<p>Description: The <em>LoadExportSymbols</em> function allocates a special array on the heap, let's call it <em>IsOrdinalPresent</em>. Its size (in items) is set to max(NumberOfFunctions,NumberOfNames), while the size of a single item is 16 bits. During the export directory parsing, the routine marks the ordinals as <em>present</em> by putting a TRUE value into the corresponding array index. Noticeably, the actual value of the ordinal is not validated and can exceed the size of the buffer. This flaw makes it possible for us to overwrite the memory following our buffer with (WORD)1. Considering today's Windows heap protections, it might be particularly hard to transform this bug into reliable <em>code-execution</em>; but who knows, it still might be possible (i.e. by targeting the heap allocation contents instead of headers).</li>
<li>
<pre><strong>Type</strong>: 32-bit Integer Overflow
<strong>Instructions</strong>:
 mov edx, [ebp-864], <span style="color: #ff0000;">EDX=controlled</span>
 shl edx, 1
 push edx
 call _pMemAlloc

<strong>Call Stack</strong>:
 dbghelp!LoadExportSymbols
 dbghelp!idd2me
 dbghelp!modload
 dbghelp!LoadModule
 dbghelp!SymLoadModuleEx
</pre>
<p>Description: Since the ordinals aren't validated anyway when used as a WRITE array index, this bug currently doesn't change anything. If, however, an appropriate check were added in order to only accept ords &lt;= allocation size, then the <em>integer overflow</em> would still make it possible to perform out-of-bounds writes, e.g. by setting the <em>NumberOfFunctions</em> field to 0x80000001 - then:
<pre class="brush: c">(uint32_t)(NumberOfFunctions * 2) = 2</pre>
</li>
</ol>
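<p style="text-align: justify;">For contrast, here is a minimal sketch of what a defensive version of the <em>IsOrdinalPresent</em> handling could look like. It is not the DbgHelp code: all function names and the <em>MAX_EXPORT_ENTRIES</em> cap are hypothetical. The cap is my own assumption, based on the ordinal being a 16-bit value, so more than 0x10000 entries can never be indexed anyway.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sanity cap: a 16-bit ordinal cannot index beyond 0x10000. */
#define MAX_EXPORT_ENTRIES 0x10000u

/* What the flawed size computation yields: the 32-bit product wraps. */
uint32_t wrapped_alloc_size(uint32_t number_of_functions)
{
    return number_of_functions * 2u;  /* 0x80000001 * 2 wraps to 2 */
}

/* Allocate the zeroed presence array; reject counts that are empty or
   implausibly large instead of letting the multiplication wrap. */
uint16_t *alloc_ordinal_flags(uint32_t number_of_functions,
                              uint32_t number_of_names,
                              uint32_t *out_count)
{
    uint32_t count = number_of_functions > number_of_names
                         ? number_of_functions
                         : number_of_names;

    if (count == 0 || count > MAX_EXPORT_ENTRIES)
        return NULL;

    *out_count = count;
    return calloc(count, sizeof(uint16_t));  /* calloc checks the product */
}

/* Mark an ordinal as present only if it falls inside the array. */
int mark_ordinal_present(uint16_t *flags, uint32_t count, uint16_t ordinal)
{
    if (flags == NULL || ordinal >= count)
        return 0;  /* would be an out-of-bounds write */
    flags[ordinal] = 1;
    return 1;
}
```

<p style="text-align: justify;">Both checks are needed together: validating the ordinal against the item count alone would not help if the byte size of the allocation had already wrapped.</p>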
<p style="text-align: justify;">As can be seen, <em>DbgHelp.dll</em> contains numerous places where virtually any value could be used as the <em>read</em> instruction operand. Unfortunately for us, the library makes use of multiple exception handlers, which do their best to keep the process from crashing; most of the Access Violation exceptions are correctly dealt with. However, it is still possible to cause serious damage to the <em>Windbg.exe</em> heap (using an <em>out-of-bounds write</em>), which would eventually lead to Denial of Service conditions. Code execution has not been confirmed due to the very hard exploitation conditions of the known flaws - I strongly believe that it <strong>is</strong> possible, though.</p>
<h2 style="text-align: justify;">Conclusions</h2>
<p style="text-align: justify;">As shown in this post, using a remote debugger might not be as safe as one might expect. <em>Wherever information is exchanged, a vulnerability is likely to appear</em> - this phrase perfectly fits the issues described here.</p>
<p style="text-align: justify;">Please keep in mind that the results presented here are not the outcome of really thorough research - Alex appeared to be faster than me <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  This means that I could have missed some obvious vectors which should be double-checked - in this case, please just let me know. Additionally, you are free to carry out your own analysis of the Kernel Debugger security - however, if you're going to make use of some ideas / conclusions included in this post, I'd be really glad to hear about it <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Thanks!</p>
<h2 style="text-align: justify;">References</h2>
<ol>
<li><a href="http://www.microsoft.com/whdc/devtools/debugging/default.mspx">Debugging Tools for Windows</a></li>
<li><a href="http://www.vsj.co.uk/articles/display.asp?id=265">Kernel and remote debuggers</a></li>
<li><a href="http://virtualkd.sysprogs.org/">SysProgs VirtualKD project</a></li>
<li><a href="https://www.blackhat.com/presentations/bh-usa-07/Stewart/Presentation/bh-usa-07-stewart.pdf">Just Another Windows Kernel Perl Hacker</a> @ BH 2007</li>
<li><a href="http://www.koders.com/c/fid556E4312920611402ACDADA7D137A14250903A82.aspx">ReactOS windbgkd.h file</a></li>
<li><a href="http://msdn.microsoft.com/en-us/library/aa365590%28VS.85%29.aspx">Windows Named Pipes</a></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://j00ru.vexillium.org/?feed=rss2&amp;p=405</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>A quick insight into the Driver Signature Enforcement</title>
		<link>http://j00ru.vexillium.org/?p=377</link>
		<comments>http://j00ru.vexillium.org/?p=377#comments</comments>
		<pubDate>Sun, 20 Jun 2010 00:32:45 +0000</pubDate>
		<dc:creator>j00ru</dc:creator>
				<category><![CDATA[OS Internals]]></category>
		<category><![CDATA[Ring0]]></category>
		<category><![CDATA[Windows 7]]></category>
		<category><![CDATA[Windows Vista]]></category>
		<category><![CDATA[hacking]]></category>
		<category><![CDATA[kernel]]></category>

		<guid isPermaLink="false">http://j00ru.vexillium.org/?p=377</guid>
		<description><![CDATA[Hey! I have recently had some fun playing around with driver signing on Windows x64, and so I would like to share some matters that have come to mind. Therefore, let me briefly describe some internal mechanisms lying behind the well-known Driver Signature Enforcement, a significant part of the Code Integrity feature introduced by Microsoft [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">Hey!</p>
<p style="text-align: justify;">I have recently had some fun playing around with driver signing on Windows x64, and so I would like to share some matters that have come to mind <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  Therefore, let me briefly describe some internal mechanisms lying behind the well-known <em>Driver Signature Enforcement</em>, a significant part of the Code Integrity feature introduced by Microsoft in Windows Vista and Windows 7. Understanding the underlying system behavior would let us think of possible attack vectors against the protection, as well as better understand the existing techniques, such as the ones developed by Joanna Rutkowska or Alex Ionescu. Let the fun begin!</p>
<p><span id="more-377"></span></p>
<p style="text-align: justify;">Note: All the information included herein is based upon Windows 7 executables and is not guaranteed to be valid for previous Windows NT systems.</p>
<h3 style="text-align: justify;">Introduction</h3>
<p>According to Microsoft:</p>
<pre style="text-align: justify;">Code Integrity is a feature that improves the security of the operating  system by validating the integrity of a driver or system file each  time it is loaded into memory. Code Integrity detects whether an  unsigned driver or system file is being loaded into the kernel, or  whether a system file has been modified by malicious software that is  being run by a user account with administrative permissions. On  x64-based versions of the operating system, kernel-mode drivers must be  digitally signed.</pre>
<p style="text-align: justify;">What the above quotation really says is that only digitally signed device drivers can actually be executed within <em>ring-0</em> - otherwise, the NT core simply refuses to load the image. This rule applies to every single module that is to run with kernel privileges. In fact, launching a device driver is the only legitimate way in which a system user can execute ring-0 code - and this one and only way is disabled for unsigned executable images. The most interesting thing, however, is that not only are restricted users forbidden to load unsigned code (which they couldn't do before anyway, neither in x64 nor x86 mode), but so is the entire <em>Administrators</em> group. This, in turn, means that the privileges assigned to a user account don't play an important role anymore in this context - the ability to load unsigned code was taken away from <em>every</em> system user!</p>
<p style="text-align: justify;">The official purpose of introducing such restrictions was to make the OS more secure (by preventing <em>ring-0 </em>malware from pwning the system from inside), get rid of possible anti-DRM solutions and overall, <em>make our computers a better place to work</em>. As a few years have passed and we're still alive, it seems like <em>Code Integrity</em> is doing well. Even though this feature is obligatory, and running on most of the systems nowadays, it is still possible to temporarily turn the mechanism off (which is the only reasonable idea for device driver developers, testing their work out):</p>
<ul style="text-align: justify;">
<li>Pressing F8 during Boot Time, and choosing the <em>Disable Driver Signature Enforcement</em> boot option or,</li>
<li>Specifying the /DEBUG boot flag using <em>bcdedit.exe</em> plus attaching a Remote Kernel Debugger during or after the system boot-up.</li>
</ul>
<p style="text-align: justify;">Both alternatives obviously require very high user privileges, and rely on rebooting the machine in order to take effect. Turning the machine off and on again is somewhat inconvenient (while attaching a remote, physical debugger is even worse), and relatively hard to perform programmatically (unless someone decides to create a bootkit, which would choose the right boot option without the user's knowledge). Due to this situation, a brand new type of <em>Elevation of Privileges</em> attack arises: "Admin to Kernel transition". Some experts tend to consider security flaws belonging to this class as "bugs only", while others see them as full-fledged vulnerabilities.</p>
<p style="text-align: justify;">A few attacks against <em>Code Integrity </em>have been performed in the past, involving design and implementation flaws found in certain parts of the Windows kernel. These include:</p>
<ol style="text-align: justify;">
<li>Employing paged-out kernel code (i.e. executable parts of drivers, found in Pageable sections), which has been previously overwritten inside pagefile.sys (by utilizing raw disk access).</li>
<li>Exploiting security flaws existing in common, already-signed device drivers, such as graphics card drivers (e.g. from the <em>ATI</em> or <em>nVidia</em> vendors).</li>
</ol>
<p style="text-align: justify;">As we already know what we're dealing with, let's take a look at how the mechanism works internally.</p>
<h3 style="text-align: justify;">Initialization</h3>
<p style="text-align: justify;">The actual heart of <em>Code Integrity</em> lies inside a single executable image, called <strong>CI.dll</strong> (you can find it inside your \Windows\system32 directory). If we take a look at its list of exported symbols, we will most likely see the following names:</p>
<ul>
<li>CiCheckSignedFile</li>
<li>CiFindPageHashesInCatalog</li>
<li>CiFindPageHashesInSignedFile</li>
<li>CiFreePolicyInfo</li>
<li>CiGetPEInformation</li>
<li><strong>CiInitialize</strong></li>
<li>CiVerifyHashInCatalog</li>
</ul>
<p style="text-align: justify;">Unsurprisingly, the first function of interest is the initialization routine, CI!CiInitialize. This routine is imported by the NT core (<em>ntoskrnl.exe</em>), and called during system initialization:</p>
<pre>
<pre class="brush: c">VOID SepInitializeCodeIntegrity()
{
  DWORD CiOptions;

  g_CiEnabled = FALSE;
  if(!InitIsWinPEMode)
    g_CiEnabled = TRUE;

  memset(g_CiCallbacks,0,3*sizeof(SIZE_T));
  CiOptions = 4|2;

  if(KeLoaderBlock)
  {
    if(*(DWORD*)(KeLoaderBlock+84))
    {
      if(SepIsOptionPresent((KeLoaderBlock+84),L&quot;DISABLE_INTEGRITY_CHECKS&quot;))
        CiOptions = 0;
      if(SepIsOptionPresent((KeLoaderBlock+84),L&quot;TESTSIGNING&quot;))
        CiOptions |= 8;
    }
    CiInitialize(CiOptions,(KeLoaderBlock+32),&amp;g_CiCallbacks);
  }
}</pre>
</pre>
<p style="text-align: justify;">The above C-like pseudocode presents the general idea of the <em>SepInitializeCodeIntegrity</em> routine. As can be seen, the global <em>nt!g_CiEnabled</em> variable is set to FALSE or TRUE, depending on whether the machine is booting up in WinPE mode. Furthermore, <em>CiOptions</em> is initialized according to the system boot options, and finally passed to the <em>CiInitialize </em>routine, together with a pointer to <em>KeLoaderBlock</em> and the global <em>g_CiCallbacks</em> array. A complete call-stack, from the very beginning of the Phase1 thread initialization, follows:</p>
<pre class="brush: php">nt!SepInitializeCodeIntegrity
nt!SepInitializationPhase1+0x1a1
nt!SeInitSystem+0x29
nt!Phase1InitializationDiscard+0x7ce
nt!Phase1Initialization+0xd
nt!PspSystemThreadStartup+0x9e
nt!KiThreadStartup+0x19</pre>
</p>
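<p style="text-align: justify;">The way <em>CiOptions</em> is derived from the boot options can be replayed in user mode. The snippet below is only a minimal sketch based on the pseudocode above: the function and macro names are made up for illustration, the load options are modelled as a plain ASCII string, and <em>strstr</em> merely stands in for <em>SepIsOptionPresent</em>, whose exact matching rules are not covered here.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Flag values come from the pseudocode above; the macro names are hypothetical. */
#define CI_OPT_DEFAULT      (4u | 2u)
#define CI_OPT_TESTSIGNING  8u

/*
 * Returns the options word that would be handed to CiInitialize for a given
 * load-options string. NULL models an absent option string.
 * strstr() is an assumption standing in for SepIsOptionPresent.
 */
uint32_t ci_options_from_load_options(const char *load_options)
{
    uint32_t options = CI_OPT_DEFAULT;

    if (load_options != NULL) {
        /* DISABLE_INTEGRITY_CHECKS wipes the default flags... */
        if (strstr(load_options, "DISABLE_INTEGRITY_CHECKS") != NULL)
            options = 0;
        /* ...and TESTSIGNING is ORed in afterwards, whatever the value was. */
        if (strstr(load_options, "TESTSIGNING") != NULL)
            options |= CI_OPT_TESTSIGNING;
    }
    return options;
}
```

<p style="text-align: justify;">Under these assumptions, a string containing only TESTSIGNING yields 4|2|8, while DISABLE_INTEGRITY_CHECKS combined with TESTSIGNING yields just 8, since the second check ORs its bit into an already zeroed value.</p>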
<p style="text-align: justify;">If we decide to go one level deeper, inside the CI!CiInitialize function, several actions taken by the initialization routine can be observed:</p>
<ul>
<li>First of all, a self-integrity test is performed by calling the CI!CiFipsCheck function. This function is a wrapper around CI!MincrypK_SelfTest, which eventually verifies the digital signature assigned to the module. If any anomaly is encountered, the 0xc0000428 (STATUS_INVALID_IMAGE_HASH) error code is returned,</li>
<li>If the self-test passes, the <em>nt!g_CiCallbacks</em> array passed by <em>ntoskrnl.exe</em> is filled in the following way:
<pre class="brush: php">g_CiCallbacks[0] = CI!CiValidateImageHeader;
g_CiCallbacks[1] = CI!CiValidateImageData;
g_CiCallbacks[2] = CI!CiQueryInformation;</pre>
</li>
<li>At the very end, the CI!PESetPhase1Initialization function is called, which aims to validate the signature of every single driver present on the <em>Boot Driver List</em>. In case of any errors, an adequate error code is returned and the booting process is halted.</li>
</ul>
<p>From this point on, the <em>Code Integrity</em> mechanism can be considered pretty much initialized. The most important thing about <em>CiInitialize</em> is that it passes three function pointers back to the nt core, which can later be used to validate individual drivers as they are being loaded.</p>
<h3>Driver Signature Verification</h3>
<p>Having the essential pointers already initialized, let's take a look at the point where driver loading is disrupted by <em>Code Integrity</em>. We can safely assume that the <em>nt!MmLoadSystemImage</em> routine is reached before the loading process bails out - this is pretty much the first function to call if one wants to load a device driver. The function is used by <em>nt!NtSetSystemInformation</em> (when SystemInformationClass is equal to either 28 or 38), so it can easily be reached by user-mode applications. When trying to load an unsigned driver, we end up with the following call-stack before an error is encountered:</p>
<ol>
<li>
<pre><strong>nt!MmLoadSystemImage</strong></pre>
<pre>The routine loads a specific executable image into kernel memory.</pre>
</li>
<li>
<pre><strong>nt!MiObtainSectionForDriver</strong></pre>
</li>
<li>
<pre><strong>nt!MiCreateSectionForDriver</strong></pre>
</li>
<li>
<pre><strong>nt!MmCheckSystemImage</strong></pre>
<pre>The routine ensures that the system module to be loaded has a correct checksum, which matches the data in the image.</pre>
</li>
<li>
<pre><strong>nt!NtCreateSection</strong></pre>
</li>
<li>
<pre><strong>nt!MmCreateSection</strong></pre>
</li>
<li>
<pre><strong>nt!MiValidateImageHeader</strong></pre>
</li>
<li>
<pre><strong>nt!SeValidateImageHeader</strong></pre>
<pre>The routine ensures that the system module has a valid digital signature appended.</pre>
</li>
<li>
<pre><strong>nt!_g_CiCallbacks[0]</strong></pre>
<pre>i.e. <strong>CI!CiValidateImageHeader</strong></pre>
</li>
</ol>
<p>As the above shows, after going through a long chain of internal calls, the execution reaches the <em>nt!SeValidateImageHeader</em> routine, which performs the following:</p>
<ol>
<li>Checks, whether <em>nt!g_CiEnabled</em> is set to TRUE
<ol>
<li>If so, compares the nt!g_CiCallbacks[0] pointer to NULL
<ol>
<li>If not empty, calls the nt!g_CiCallbacks[0] function and quits,</li>
<li>Otherwise, returns 0xc0000428.</li>
</ol>
</li>
<li>Otherwise:
<ol>
<li>Allocates one byte from the paged pool,</li>
<li>Puts the resulting address into memory pointed to by the first argument,</li>
<li>Returns STATUS_SUCCESS (or whatever zero means here).</li>
</ol>
</li>
</ol>
</li>
</ol>
<p>Surprisingly, that's the end of the signature validation (the actual code responsible for performing the verification lies inside CI.dll).</p>
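<p>The decision steps listed above can be modelled in a few lines of user-mode C. This is only a sketch under stated assumptions: <em>status_t</em>, <em>accept_all</em> and the lower-case names are hypothetical, the callback prototype is simplified to a single argument, and <em>malloc</em> stands in for the one-byte paged pool allocation.</p>

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t status_t;
#define STATUS_SUCCESS            ((status_t)0x00000000u)
#define STATUS_INVALID_IMAGE_HASH ((status_t)0xC0000428u)

/* Simplified callback prototype - the real one is an assumption here. */
typedef status_t (*ci_callback_t)(void **context);

/* User-mode stand-ins for nt!g_CiEnabled and nt!g_CiCallbacks. */
bool g_ci_enabled = true;
ci_callback_t g_ci_callbacks[3] = { NULL, NULL, NULL };

/* Hypothetical validator that accepts every image, used for experiments. */
status_t accept_all(void **context)
{
    (void)context;
    return STATUS_SUCCESS;
}

/* Model of the nt!SeValidateImageHeader decision described above. */
status_t se_validate_image_header(void **context)
{
    if (g_ci_enabled) {
        /* CI enabled: defer to the first callback, or fail if it is missing. */
        if (g_ci_callbacks[0] != NULL)
            return g_ci_callbacks[0](context);
        return STATUS_INVALID_IMAGE_HASH;
    }

    /* CI disabled: hand back a dummy one-byte allocation and report success.
     * Allocation failure handling is omitted, as it is not described above. */
    *context = malloc(1);
    return STATUS_SUCCESS;
}
```

<p>Flipping <em>g_ci_enabled</em> to false in this model makes every call succeed without consulting any callback at all, which is exactly why the single flag is such an attractive target.</p>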
<p>Conclusions:</p>
<p style="padding-left: 30px;">The decision on whether a driver can or cannot be launched is up to one function, checking one single variable. If one wanted to turn the mechanism off, it would come down to altering one single byte (or even better, bit!) within the <em>ntoskrnl </em>image. This could prove useful for a malicious, already-signed driver aiming to get rid of the <em>Code Integrity </em>mechanism altogether. Furthermore, potential attackers could also utilize a previously-found <em>write-what-where</em> vulnerability in one of the common device drivers, in order to overwrite the flag and open a gate straight into <em>ring-0</em> execution - the latter option could be useful in terms of <em>admin-&gt;kernel</em> escalation, where disabling CI is the final goal of the exploit.</p>
<p style="padding-left: 30px;">Neither <em>nt!g_CiEnabled</em> nor <em>nt!g_CiCallbacks</em> are exported symbols. Thus, it might be relatively hard to overwrite either of these values reliably, in a cross-version exploit (as long as hardcoded offsets are not provided for every single Windows version).</p>
<p style="padding-left: 30px;">The two remaining items filled in inside CI!CiInitialize - nt!g_CiCallbacks[1] and nt!g_CiCallbacks[2] - are used by the <em>nt!SeValidateImageData</em> and <em>nt!SeCodeIntegrityQueryInformation</em> functions, respectively.</p>
<p style="padding-left: 30px;">
<h3 class="dtH1">Driver Verification Debugging Options</h3>
<p>Even though it is impossible to disable <em>Code Integrity </em>in any other way than the two described at the beginning (F8 option or attached debugger), one is able to do the opposite - force the mechanism to work even if a remote debugger is attached to the machine - certainly a useful option for some device driver developers.</p>
<p>As MSDN states:</p>
<p style="padding-left: 30px;">In some cases, developers may want to enforce mandatory kernel mode code signing policy even when a debugger is attached. An example of this is when a driver stack has an unsigned driver (such as a filter driver) that fails to load, which may invalidate the entire stack. Since attaching a debugger allows the unsigned driver to load, the problem appears to vanish as soon as the debugger is attached. Debugging this type of issue may be difficult. In order to facilitate debugging in this case, Code Integrity supports a registry key that can be set to enforce kernel mode signing enforcement even when a debugger is attached.</p>
<p style="padding-left: 30px;">There are two flags defined in the registry that control Code Integrity behavior under the debugger. The flags are not defined by default.</p>
<p style="padding-left: 30px;">Create registry value as follows:</p>
<div class="LW_CodeSnippetContainer" style="padding-left: 30px;">
<div class="LW_CodeSnippetContainerCodeCollection">
<div id="CodeSnippetContainerCode11" class="LW_CodeSnippetContainerCode">
<div style="color: black;">
<pre>Key:   HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CI
Value:   DebugFlags      REG_DWORD</pre>
</div>
</div>
</div>
</div>
<p style="padding-left: 30px;">Possible values:</p>
<dl style="padding-left: 30px;">
<dt><strong>00000001</strong></dt>
<dd>Results in a debug break into the debugger and unsigned driver is allowed to load with <code><strong>g</strong></code>.</dd>
<dt><strong>00000010</strong></dt>
<dd>CI will ignore the presence of the debugger and unsigned drivers are blocked from loading.</dd>
</dl>
<p style="padding-left: 30px;">Any other value results in unsigned drivers loading—this is the default policy.</p>
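<p>To make the quoted policy a bit more tangible, here is a tiny sketch of how the <em>DebugFlags</em> value maps to behavior when a debugger is attached. It rests on two assumptions of mine: the listed values are read as hexadecimal DWORDs (the way regedit displays them), and the match is exact, as suggested by the "any other value" sentence. The enum and function names are hypothetical.</p>

```c
#include <stdint.h>

/* Hypothetical names for the three outcomes described in the quotation. */
typedef enum {
    CI_DBG_ALLOW_UNSIGNED,    /* default policy: unsigned drivers load       */
    CI_DBG_BREAK_THEN_ALLOW,  /* break into the debugger, load after "g"     */
    CI_DBG_ENFORCE            /* ignore the debugger, block unsigned drivers */
} ci_debug_policy_t;

/* Maps a DebugFlags DWORD to a policy, assuming hexadecimal, exact-match values. */
ci_debug_policy_t ci_debug_policy(uint32_t debug_flags)
{
    switch (debug_flags) {
    case 0x00000001u:
        return CI_DBG_BREAK_THEN_ALLOW;
    case 0x00000010u:
        return CI_DBG_ENFORCE;
    default:
        return CI_DBG_ALLOW_UNSIGNED;
    }
}
```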
<h3>Conclusion</h3>
<p>In this short post, I wanted to introduce bits of information that caught my attention during the last day or two. Obviously, there is still a lot of interesting material related to <em>Code Integrity</em> left to cover, which I might describe further in the near future. As for now - feel encouraged to read the articles listed below... and that's it.</p>
<p>Have fun &amp;&amp; Leave comments!</p>
<h3>References &amp; Links</h3>
<ol>
<li>Joanna Rutkowska, <a href="http://www.blackhat.com/presentations/bh-usa-06/BH-US-06-Rutkowska.pdf">Subverting Vista Kernel for Fun and Profit</a></li>
<li>Joanna Rutkowska, <a href="http://theinvisiblethings.blogspot.com/2006/10/vista-rc2-vs-pagefile-attack-and-some.html">Vista RC2 vs. pagefile attack (and some thoughts about Patch Guard)</a></li>
<li>ZDNET, <a href="http://www.zdnet.com/blog/security/update-ati-driver-flaw-exposes-vista-kernel-to-attackers/438">UPDATE: ATI driver flaw exposes Vista kernel to attackers</a></li>
<li>MSDN, <a href="http://www.microsoft.com/whdc/driver/install/drvsign/kmsigning.mspx">Digital Signatures for Kernel Modules on Systems Running Windows Vista</a></li>
<li>Microsoft, <a href="http://csrc.nist.gov/groups/STM/cmvp/documents/140-1/140sp/140sp890.pdf">Code Integrity (ci.dll) Security Policy</a></li>
<li>Matthew Conover, <a href="http://www.symantec.com/avcenter/reference/Windows_Vista_Kernel_Mode_Security.pdf">Assessment of Windows Vista Kernel-Mode Security</a></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://j00ru.vexillium.org/?feed=rss2&amp;p=377</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CONFidence 2010 is over</title>
		<link>http://j00ru.vexillium.org/?p=363</link>
		<comments>http://j00ru.vexillium.org/?p=363#comments</comments>
		<pubDate>Sun, 30 May 2010 08:18:52 +0000</pubDate>
		<dc:creator>j00ru</dc:creator>
				<category><![CDATA[CSRSS]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[OS Internals]]></category>
		<category><![CDATA[Ring0]]></category>
		<category><![CDATA[Undocumented API]]></category>
		<category><![CDATA[Windows XP]]></category>
		<category><![CDATA[hacking]]></category>
		<category><![CDATA[kernel]]></category>

		<guid isPermaLink="false">http://j00ru.vexillium.org/?p=363</guid>
		<description><![CDATA[One of the biggest (best ) IT security-oriented conferences in Poland finished three days ago, in the wednesday evening. In the very first place, I would like to congratulate all the organisers, for their decision on where the event should be held, as well as how it should look like - during these two days, [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">One of the biggest (best <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> ) IT security-oriented conferences in Poland finished three days ago, on Wednesday evening. First of all, I would like to congratulate all the organisers on their decisions about where the event should be held and what it should look like - during these two days, I had plenty of real fun!</p>
<p><span id="more-363"></span></p>
<p style="text-align: justify;">CONFidence 2010 took place in Poland, on 25th and 26th of May, in the Kijów Cinema. The lectures were presented on two independent tracks (so everyone was able to find something for themselves at any given moment), and covered numerous important security fields. In my opinion (and because of my specific interests), the best talks were given by Sebastian Fernandez - "<strong>General notes about exploiting Windows x64</strong>", Mario Heidreich - "<strong>The Presence and Future of Web Attacks Multi-Layer Attacks and XSSQLI</strong>" and Alexey Tikhonow - "<strong>De-blackboxing of digital camera</strong>". I am really looking forward to seeing the slides published as soon as possible. Meanwhile, you can find the complete conference schedule at <a href="http://2010.confidence.org.pl/agenda">http://2010.confidence.org.pl/agenda</a>.</p>
<p style="text-align: justify;">The ESET company (the vendor of the NOD32 software) has recently decided to organise two competitions with fun prizes - some detailed information can be found <a href="http://www.eset.pl/nowosci/nowosci/wez-udzial-w-konkursie-eset-i-wygraj-jeden-z-dwoch-komputerow.html">here</a>. In short: the purpose of the first one was to create or design a security-related application of any kind. The second one was directed towards the conference attendees, as the goal was to find a correct serial key associated with a chosen user name, in a specially prepared executable file. A team consisting of <a href="http://gynvael.coldwind.pl/">Gynvael Coldwind</a> and <em>me</em> managed to meet the latter objective, and therefore win the competition <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  Due to the above, a short blog entry/article should be released soon, covering the exact way of generating a correct serial while knowing as little as possible about the input data verification mechanisms (<em>stay tuned <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </em>). The CrackMe can still be downloaded from the CONFidence website: <a onclick="javascript:pageTracker._trackPageview('/outbound/article/2010.confidence.org.pl');" href="http://2010.confidence.org.pl/ESET/banner.html">http://2010.confidence.org.pl/ESET/banner.html</a>, and I encourage each and every one of you to take a look at it.</p>
<p style="text-align: justify;">Moreover, I had the pleasure (once more, in collaboration with <em>Gynvael</em>) of giving one of the last presentations, dedicated to the Windows kernel vulnerabilities (related to CSRSS and the system registry) which I have often mentioned lately. I think this is a perfect opportunity to publish some <em>advisory</em> documents, containing more detailed, technical information about the vulns. Below you can find a complete list of these:</p>
<ul style="text-align: justify;">
<li><strong><a href="http://vexillium.org/dl.php?HISPASEC_CSRSS_Priv_Escal.pdf">Windows CSRSS Local Privilege Elevation Vulnerability</a></strong> (<a href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-0023">CVE-2010-0023</a>)</li>
<li><strong><a href="http://vexillium.org/dl.php?HISPASEC_Local_DoS1.pdf">Windows Kernel Null Pointer Vulnerability</a></strong> (<a href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-0234">CVE-2010-0234</a>)</li>
<li><strong><a href="http://vexillium.org/dl.php?HISPSAEC_Local_DoS2.pdf">Windows Kernel Symbolic Link Value Vulnerability</a></strong> (<a href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-0235">CVE-2010-0235</a>)</li>
<li><strong><a href="http://vexillium.org/dl.php?HISPASEC_Buffer_Overflow.pdf">Windows Kernel Memory Allocation Vulnerability</a></strong> (<a href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-0236">CVE-2010-0236</a>)</li>
<li><strong><a href="http://vexillium.org/dl.php?HISPASEC_Registry_Local_Priv_Escal.pdf">Windows Kernel Symbolic link Creation Vulnerability</a></strong> (<a href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-0237">CVE-2010-0237</a>)</li>
<li><strong><a href="http://vexillium.org/dl.php?HISPASEC_Info_Disclosure.pdf">Windows Kernel Symbolic link Information Disclosure</a></strong> (<a href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-0237">CVE-2010-0237</a>)</li>
<li><strong><a href="http://vexillium.org/dl.php?HISPASEC_Race_Condition.pdf">Windows Kernel Registry Key Vulnerability</a></strong> (<a href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-0238">CVE-2010-0238</a>)</li>
</ul>
<p style="text-align: justify;">Furthermore, a package including all the above advisories is available to be downloaded <strong><a href="http://vexillium.org/dl.php?Hispasec_Advisories.zip">here</a> (864 kB).</strong></p>
<p style="text-align: justify;">The slides presented during our lecture can be found <strong><a href="http://vexillium.org/dl.php?confidence_slideshow.pdf">here</a> (1.6 MB).</strong></p>
<p style="text-align: justify;">I strongly encourage every conference attendee to share their opinion regarding the conference itself, as well as the material we presented in particular. <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://j00ru.vexillium.org/?feed=rss2&amp;p=363</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Windows CSRSS cross-version API Table</title>
		<link>http://j00ru.vexillium.org/?p=349</link>
		<comments>http://j00ru.vexillium.org/?p=349#comments</comments>
		<pubDate>Mon, 03 May 2010 00:09:52 +0000</pubDate>
		<dc:creator>j00ru</dc:creator>
				<category><![CDATA[CSRSS]]></category>
		<category><![CDATA[OS Internals]]></category>
		<category><![CDATA[Ring3]]></category>
		<category><![CDATA[Undocumented API]]></category>
		<category><![CDATA[Windows 7]]></category>
		<category><![CDATA[Windows Vista]]></category>
		<category><![CDATA[Windows XP]]></category>
		<category><![CDATA[blog]]></category>

		<guid isPermaLink="false">http://j00ru.vexillium.org/?p=349</guid>
		<description><![CDATA[Hello! It seems like half a year has passed since I published the Win32k.SYS system call table list on the net. During this time (well, it didn't take so long ) I managed to gather enough information to release yet another API list - this time, concerning an user-mode application - CSRSS (Client/Server Runtime SubSystem). [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">Hello!</p>
<p style="text-align: justify;">It seems like half a year has passed since I published the <a href="http://j00ru.vexillium.org/?p=257">Win32k.SYS system call table</a> list on the net. During this time (well, it didn't take so long <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> ) I managed to gather enough information to release yet another API list - this time, concerning a <em>user-mode</em> application - CSRSS (<em>Client/Server Runtime SubSystem</em>). As CSRSS is a <em>relatively</em> common research subject, I think a table of this kind can make things easier for lots of people.</p>
<p><span id="more-349"></span></p>
<p style="text-align: justify;">Before presenting the table itself, I would like to gently introduce the mechanism in consideration to the reader. As the name itself states, CSRSS is a part of the Windows Environment Subsystem, running in user-mode. It is a single process (having the highest possible - SYSTEM - privileges), which mostly takes advantage of three dynamic libraries - <em>basesrv.dll</em>, <em>csrsrv.dll</em> and <em>winsrv.dll</em>. These files provide support for certain parts of the subsystem functionality, such as:</p>
<ul style="text-align: justify;">
<li>Updating the list of processes / threads running on the system</li>
<li>Handling the Console Window (i.e. special <em>text-mode</em> window) events</li>
<li>Implementing parts of the Virtual DOS Machine support</li>
<li>Supplying miscellaneous functions, such as <a href="http://msdn.microsoft.com/en-us/library/aa376868(VS.85).aspx"><em>ExitWindowsEx</em></a></li>
</ul>
<p style="text-align: justify;">Every Windows process running on the system does (or, at least, <em>should</em>) have an open connection with CSRSS, through the LPC / ALPC mechanism (depending on the system version) - which in turn stands for<em> (Advanced) Local Procedure Calls</em>. The <em>ntdll.dll</em> module provides multiple functions dedicated to the data exchange between user processes and CSRSS. Some exemplary exported names include, but are not limited to:</p>
<ul style="text-align: justify;">
<li>CsrClientConnectToServer</li>
<li>CsrGetProcessId</li>
<li>CsrClientCallServer</li>
<li>CsrAllocateMessageBuffer</li>
</ul>
<p style="text-align: justify;">Out of all the Csr~ wrapper functions, <em>CsrClientCallServer</em> is the most commonly used. One can find references to it in <a href="http://msdn.microsoft.com/en-us/library/ms682425(VS.85).aspx"><em>kernel32.CreateProcess</em></a>, <a href="http://msdn.microsoft.com/en-us/library/ms681944(VS.85).aspx"><em>kernel32.AllocConsole</em></a>, <a href="http://msdn.microsoft.com/en-us/library/ms681944(VS.85).aspx"><em>kernel32.FreeConsole</em></a>, <em><a href="http://msdn.microsoft.com/en-us/library/ms633492(VS.85).aspx">user32.EndTask</a> </em>and tens of other documented API functions. On closer inspection, it is easy to notice that each time a call is made to <em>CsrClientCallServer</em>, a unique number is pushed onto the stack, differing from routine to routine. An exemplary code snippet follows:</p>
<pre style="text-align: justify;">
<pre class="brush: php">.text:77E96D55           push    4
.text:77E96D57           push    20225h    &lt;---------- HERE
.text:77E96D5C           mov     [ebp+var_7C], eax
.text:77E96D5F           push    0
.text:77E96D61           lea     eax, [ebp+var_A4]
.text:77E96D67           push    eax
.text:77E96D68           call    ds:__imp__CsrClientCallServer@16 ; CsrClientCallServer(x,x,x,x)</pre>
</pre>
<p style="text-align: justify;">As it turns out, these numbers are in fact <em>indexes</em> into special function pointer tables defined by the aforementioned libraries used by CSRSS. More specifically, a special routine - internally called <em>CsrApiRequestThread</em> - running in the context of a separate csrss.exe thread, is responsible for receiving user requests (that is - the <em>CsrApi ID</em> value together with the input buffer), handling them through the appropriate dispatch tables, and returning the results. This scheme is slightly different on Windows 7, but the general idea remains the same.</p>
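<p style="text-align: justify;">To illustrate the idea, the sketch below splits a value such as 20225h into two parts and uses them to pick a routine from a set of dispatch tables. Please treat it as an assumption-laden model rather than the actual CSRSS code: it assumes the commonly described layout in which the high 16 bits select the server library and the low 16 bits select the entry, it ignores any per-library base number that might be subtracted first, and all type and function names are hypothetical.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prototype of a server-side API routine. */
typedef uint32_t (*csr_api_routine_t)(void *message);

/* Hypothetical per-library dispatch table. */
typedef struct {
    const csr_api_routine_t *routines;
    size_t count;
} csr_dispatch_table_t;

/* Assumed layout: high word = server library index. */
uint16_t csr_server_dll_index(uint32_t api_number)
{
    return (uint16_t)(api_number >> 16);
}

/* Assumed layout: low word = index into that library's table. */
uint16_t csr_api_index(uint32_t api_number)
{
    return (uint16_t)(api_number & 0xFFFFu);
}

/* Returns the routine selected by api_number, or NULL if it is out of range. */
csr_api_routine_t csr_lookup(const csr_dispatch_table_t *tables,
                             size_t table_count, uint32_t api_number)
{
    size_t dll = csr_server_dll_index(api_number);
    size_t api = csr_api_index(api_number);

    if (dll >= table_count || tables[dll].routines == NULL ||
        api >= tables[dll].count)
        return NULL;
    return tables[dll].routines[api];
}
```

<p style="text-align: justify;">With this reading, the value pushed in the snippet above decomposes into library index 2 and entry 225h. The bounds checks matter: an index coming straight from a client should never be trusted as-is.</p>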
<p style="text-align: justify;">In order to give the reader a better view of how many and what functions are supported on a specific OS version, as well as make cross-version comparisons easier, I've created two versions of the CsrAPI table:</p>
<ol style="text-align: justify;">
<li>A complete list of the functions present in the dispatch tables for most likely every NT-series system can be found @ <strong><a href="http://j00ru.vexillium.org/csrss_list/api_list.html"><span style="text-decoration: underline;">http://j00ru.vexillium.org/csrss_list/api_list.html</span></a></strong>.</li>
<li>A cross-version compatibility table, for the same system version set can be found @ <strong><a href="http://j00ru.vexillium.org/csrss_list/api_table.html"><span style="text-decoration: underline;">http://j00ru.vexillium.org/csrss_list/api_table.html</span></a></strong>.</li>
</ol>
<p style="text-align: justify;">I have done my best to make sure that the presented materials are correct and up-to-date. If, however, you notice a mistake of any kind, please let me know asap <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  It is possible that I will manage to fill the red-green table with the corresponding api-numbers soon - I cannot guarantee this, though.</p>
<p style="text-align: justify;">At this point, I would like to thank all the people who showed their interest and helped me with this tiny project - Thank You! Also, please drop me a line on whether you like the idea or not <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://j00ru.vexillium.org/?feed=rss2&amp;p=349</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Windows Kernel Vulnerabilities continued &#8211; details</title>
		<link>http://j00ru.vexillium.org/?p=343</link>
		<comments>http://j00ru.vexillium.org/?p=343#comments</comments>
		<pubDate>Thu, 22 Apr 2010 14:34:19 +0000</pubDate>
		<dc:creator>j00ru</dc:creator>
				<category><![CDATA[CSRSS]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[OS Internals]]></category>
		<category><![CDATA[Ring0]]></category>
		<category><![CDATA[Windows Vista]]></category>
		<category><![CDATA[Windows XP]]></category>
		<category><![CDATA[hacking]]></category>
		<category><![CDATA[kernel]]></category>

		<guid isPermaLink="false">http://j00ru.vexillium.org/?p=343</guid>
		<description><![CDATA[And so it happened ;&#62; As I've written in this post, Gynvael Coldwind has just finished speaking about recent Windows Kernel Vulnerabilities on the Hack In The Box Dubai conference, taking place today. Unfortunately, because of the European air communication being disabled these days, the presentation was held remotely - one way or another, it [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;">And so it happened ;&gt; As I've written in <a href="http://j00ru.vexillium.org/?p=307&amp;lang=en">this</a> post, Gynvael Coldwind has just finished speaking about recent Windows Kernel Vulnerabilities at the <em>Hack In The Box Dubai</em> conference, taking place today. Unfortunately, because European air traffic is suspended these days, the presentation was held remotely - one way or another, it can be considered very successful, imho.</p>
<p style="text-align: justify;">Thanks to the organisers, who publish the materials right after the speeches are over, all of the slides are now available at <a href="http://conference.hitb.org/hitbsecconf2010dxb/materials/">http://conference.hitb.org/hitbsecconf2010dxb/materials/</a>.</p>
<p style="text-align: justify;">Our presentation, containing the details of how the aforementioned kernel / CSRSS vulns work and can be exploited, can be found <strong><a href="http://conference.hitb.org/hitbsecconf2010dxb/materials/D2%20-%20Gynvael%20Coldwind%20-%20Case%20Study%20of%20Recent%20Windows%20Vulnerabilities.pdf">here</a></strong> <strong>(1.27MB).</strong></p>
<p style="text-align: justify;">I am not going to spoil anything more here - if you were not lucky enough to attend the Dubai conference, I strongly recommend the Polish <a href="http://2010.confidence.org.pl/">CONFidence 2010</a> held in May (which I also mentioned already).</p>
<p style="text-align: justify;">Have fun! <img src='http://j00ru.vexillium.org/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://j00ru.vexillium.org/?feed=rss2&amp;p=343</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
