Windows CSRSS Write Up: Inter-process Communication (part 2/3)

A quick beginning note: My friend d0c_s4vage has created a technical blog and posted his first text just a few days ago. The post entry covers a recent, critical libpng vulnerability discovered by this guy; the interesting thing is that, among others, the latest Firefox and Chrome versions were vulnerable. Feel free to take a minute and read the article here.

Additionally, the video and mp3 recordings of the presentation given by me and Gynvael at the CONFidence 2010 conference are now publicly available on the official website: link (Case study of recent Windows vulnerabilities).


Most of the LPC basics were described in the first post of the Inter-process Communication chapter, together with the corresponding, undocumented native functions related to LPC Ports (LPC supposedly being an acronym for Local Inter-Process Communication rather than Local Procedure Calls, as stated in WRK). As you already have the knowledge required to understand higher abstraction levels, today I would like to shed some light on the internal Csr~ interface provided by NTDLL and extensively utilized by the Win32 API DLLs (kernel32 and user32).


As explained previously, LPC is an (officially) undocumented, packet-based IPC mechanism. It basically relies on two things – a Port Object and internal LPC structures, such as _PORT_HEADER – both unexposed to the Windows API layer. Due to the fact that CSRSS implements its own protocol on top of LPC, it would become highly inconvenient (and impractical) for the win32 libraries to take care of both LPC and CSRSS internals, at the same time. And so, an additional layer between the port-related functions and high-level API was created – let’s call it Native Csr Interface.

The medium level of the call chain provides a set of helper functions, specifically designed to hide the internals of the communication channel from high-level API implementation. Therefore, it should be theoretically possible to re-implement the Csr-Interface using a different communication mechanism with similar properties, without any alterations being applied on the API level. This has been partially accomplished by replacing the deprecated LPC with an improved version of the mechanism – Advanced / Asynchronous LPC on modern NT-family systems (Vista, 7).

This post focuses on the precise meaning, functionality and definitions of the crucial Csr~ routines. After reading the article, one should be able to recognize and understand specific CSR API calls found inside numerous documented functions related to console management, process / thread creation and others.

Connection Initialization

What has already been mentioned is the fact that every application belonging to the win32-subsystem is connected to the Windows Subsystem process (CSRSS) at its startup, by default. Although it is technically possible to disconnect from the port before the program is properly terminated, such behavior is beyond the scope of this post entry. However, some details regarding a security flaw related to CSRSS-port disconnection in the context of a live process, can be found here and here (discovered by me and Gynvael).

From this point on, it will be assumed that when the process is given execution (i.e. Entry Point, imported module’s DllMain or TLS callback is called), the CSRSS connection is already established. And so, the question is – how, and where the connection is set up during the process initialization. This section provides answers for both of these questions.

Opening named LPC port

During a process creation, numerous parts of the system come into play and perform their part of the job. It all starts with the parent application calling an API function (CreateProcess) – the execution then goes through the kernel, a local win32 subsystem, and finally – ring-3 process self-initialization (performed by the system libraries). A step-by-step explanation of the Windows process creation can be found in the Windows Internals 5 book, Chapter “Processes, Threads and Jobs”.

As the CSRSS connection is not technically crucial for the process to exist (and execute), it can be performed later than other parts of the process initialization. And so, the story of establishing a connection with the subsystem begins in the context of a newly-created program – more precisely, inside the kernel32 entry point (kernel32!BaseDllInitialize). At this point, the CSRSS-related part of the routine performs the following call:


  CsrClientConnectToServer(L"\\Windows", BASESRV_INDEX, ...);


thus forwarding the execution to the ntdll.dll module, where a majority of the subsystem-related activities are performed. Before we dive into the next routine, two important things should be noted here:

  1. The Base Dll (kernel32) has complete control over the Port Object directory and makes the final decision regarding the referenced port’s name prefix. As it turns out, it is also possible for a different Object Directory to be used – let’s take a look at the following pseudo-code listing:


    The “SessionId” symbol represents a global DWORD variable, initialized inside the BaseDllInitialize function, as well:

     mov     eax, large fs:18h
     mov     eax, [eax+30h]
     mov     eax,  [eax+1D4h]
     mov     _SessionId, eax

    … translated to the following high-level pseudo-code:

    SessionId = NtCurrentPeb()->SessionId;

    If one takes a look into the PEB structure definition, he will certainly find the variable:

    kd> dt _PEB
       +0x154 TlsExpansionBitmapBits : [32] Uint4B
       +0x1d4 SessionId        : Uint4B
       +0x1d8 AppCompatFlags   : _ULARGE_INTEGER
  2. If one decides to connect to the win32 subsystem, he must specify a particular ServerDll to connect to (csrsrv, basesrv, winsrv); the identification number is passed as the second argument of CsrClientConnectToServer. As can be seen, kernel32 specifies the BASESRV_INDEX constant, as it desires to connect to a certain module – being basesrv in this case. Basesrv.dll is the kernel32 equivalent on the subsystem side – a Csr connection between these two modules is required for some of the basic win32 API calls to work properly.

    On the other hand, all of the console-management functionality is implemented by winsrv (to be exact – the consrv part of the module). And so – in order to take advantage of functions, such as AllocConsole, FreeConsole, SetConsoleTitle or WriteConsole – a valid connection with winsrv is also required. Fortunately – kernel32 remembers about it and issues a call to another internal function – ConDllInitialize() – after the LPC Port connection is successfully established. The routine’s obvious purpose is to set up the console-related structures inside the Base dll image, and use the CsrClientConnectToServer function with the second argument set to CONSRV_INDEX.

When we step into CsrClientConnectToServer and analyze further, a great amount of CSRSS-related initialization code surrounds us. Don’t worry – a huge part of the routine deals with user-mode structures and other irrelevant stuff – our interest begins where the following call is made:

  if (!CsrPortHandle) {
    ReturnCode = CsrpConnectToServer(ObjectDirectory); // ObjectDirectory is kernel32-controlled
    if (!NT_SUCCESS(ReturnCode))
      return (ReturnCode);
  }

As the above indicates, the global CsrPortHandle variable is compared with zero – if this turns out to be true, CsrpConnectToServer is called, taking the object directory string as its only argument. So – let’s face another routine ;>

The proc starts with the following code:

CsrPortName.Length        = 0;
CsrPortName.MaximumLength = 2*wcslen(ObjectDirectory)+18;
CsrPortName.Buffer        = RtlAllocateHeap(CsrHeap,NtdllBaseTag,CsrPortName.MaximumLength);
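The constant 18 in the length computation above is what one would expect if a fixed suffix of eight UTF-16 characters plus a terminating NUL is appended to the object directory. The sketch below reproduces that arithmetic; the `\ApiPort` suffix and all helper names are my assumptions (the post does not spell them out), and `uint16_t` stands in for the 2-byte Windows WCHAR so the example stays portable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t WCHAR16;   /* stand-in for the 2-byte Windows WCHAR */

/* Assumed suffix (not named in the post): "\ApiPort", 8 characters + NUL. */
static const WCHAR16 kPortSuffix[] = {'\\','A','p','i','P','o','r','t',0};

/* Length of a NUL-terminated UTF-16 string, in characters. */
size_t wcslen16(const WCHAR16 *s) {
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0) n++;
    return n;
}

/* Bytes needed for "<ObjectDirectory>\ApiPort", terminator included.
   sizeof(kPortSuffix) is 9 * 2 = 18, matching the constant in the listing. */
size_t port_name_bytes(const WCHAR16 *object_directory) {
    return sizeof(WCHAR16) * wcslen16(object_directory) + sizeof(kPortSuffix);
}

/* Builds the full name into out. Returns the byte length without the
   terminator, or 0 when out_bytes is too small. */
size_t build_port_name(const WCHAR16 *dir, WCHAR16 *out, size_t out_bytes) {
    size_t need = port_name_bytes(dir);
    size_t n = wcslen16(dir);
    if (out_bytes < need) return 0;
    for (size_t i = 0; i < n; i++) out[i] = dir[i];
    for (size_t i = 0; i < sizeof(kPortSuffix) / sizeof(WCHAR16); i++)
        out[n + i] = kPortSuffix[i];
    return need - sizeof(WCHAR16);
}
```

For the eight-character directory `\Windows`, port_name_bytes() yields 2*8+18 = 34 bytes, i.e. the MaximumLength-style value from the listing, while the returned length without the terminator is 32.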


Apparently, the final Port Object name is formed here, and stored inside a local “UNICODE_STRING CsrPortName” structure. Next then, a special section is created, using an adequate native call:

LARGE_INTEGER SectionSize;
SectionSize.QuadPart = 0x10000;
NtStatus = NtCreateSection(&SectionHandle, SECTION_ALL_ACCESS, NULL, &SectionSize, PAGE_READWRITE, SEC_RESERVE, NULL);

if (!NT_SUCCESS(NtStatus))
  return NtStatus;

This section is essential to the process<->subsystem communication, as this memory area is mapped in both the client and win32 server, and then used for exchanging large portions of data between these two parties. And so, when the section is successfully created, the routine eventually tries to connect to the named port!

/* SID Initialization */
NtStatus = RtlAllocateAndInitializeSid(...,&SystemSid);
if (!NT_SUCCESS(NtStatus))
  return NtStatus;

NtStatus = NtSecureConnectPort(&CsrPortHandle,&CsrPortName,...);

For the sake of simplicity and reading convenience, I’ve stripped the remaining arguments from the listing; they describe some advanced connection characteristics, and are beyond the scope of this post. When everything is fine up to this point, we have an established connection (yay, CSRSS accepted our request) and an open handle to the port. Therefore, we can start sending first packets, in order to let CSRSS (and its modules – ServerDlls) know about ourselves.

So – after returning back to ntdll!CsrClientConnectToServer:

NtStatus = CsrpConnectToServer(ObjectName);
if (!NT_SUCCESS(NtStatus))
  return NtStatus;

the following steps are taken:

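if (ConnectionInformation) {
  CaptureBuffer = CsrAllocateCaptureBuffer(1,InformationLength);
  /* ... followed by the CsrAllocateMessagePointer and RtlMoveMemory calls ... */
}
CsrClientCallServer(&Message, CaptureBuffer, CSR_API(CsrpClientConnect), sizeof(ConnStructure));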

First of all, the ConnectionInformation pointer is checked – in case it’s non-zero, the CsrAllocateCaptureBuffer, CsrAllocateMessagePointer and RtlMoveMemory functions are called, respectively. The purpose of these operations is to move the data into a shared heap in such a way, that both our application and CSRSS can easily read its contents. After the “if” statement, a first, real message is sent to the subsystem using CsrClientCallServer, of the following prototype:


For a complete, cross-version compatible table and/or list of Csr APIs, check the following references: CsrApi List and CsrApi Table. And so, in the above snippet, the “CsrpClientConnect” API is used, providing additional information about the connecting process. This message is handled by an internal csrsrv.CsrSrvClientConnect routine, which redirects the message to an adequate callback function, specified by the ServerDll being connected to (in this case – basesrv!BaseClientConnectRoutine).

After sending the above message, the connection between the client- and server-side DLLs (i.e. kernel32 and basesrv) can be considered fully functional.

As it turns out, parts of the execution path presented above can also be true for CSRSS itself! Because ntdll!CsrClientConnectToServer can be reached from inside the subsystem process, the CsrClientConnectToServer routine must handle such a case properly. And so – before any actions are actually taken by the function, the current process instance is checked first:

NtHeaders = RtlImageNtHeader(NtCurrentPeb()->ImageBaseAddress);
CsrServerProcess = (NtHeaders->OptionalHeader.Subsystem == IMAGE_SUBSYSTEM_NATIVE);

if (!CsrServerProcess) {
  // Take normal steps
} else {
  // Do nothing, except for the _CsrServerApiRoutine pointer initialization
  _CsrServerApiRoutine = GetProcAddress(GetModuleHandle("csrsrv"),"CsrCallServerFromServer");
}

Apparently, every process connecting to the LPC Port that has the SUBSYSTEM_NATIVE header value set, is assumed to be an instance of CSRSS. This, in turn, implies that CSRSS is the only native, system-critical process which makes use of the Csr API calls.

Data transmission

Having the connection up and running, a natural order of things is to exchange actual data. In order to achieve this, one native call is exported by ntdll – the CsrClientCallServer function, already mentioned in the text. Because of the fact that each Csr API requires a different amount of input/output data (while some don’t need these, at all) from the requestor, as well as due to the LPC packet-length limitations, the messages can be sent in a few, different ways.

In general, all of the CSR-supported packets can be divided into three, main groups: empty, short, and long packets. Based on the group a given packet belongs to, it is sent using an adequate mechanism. This section provides a general overview of the data transmission-related techniques, as well as example (practical) use of each type.

Empty packets

  • Description

    “Empty packets” is a relatively small group of purely-informational messages, which are intended to make CSRSS perform a specific action. These packets don’t supply any input data – their API ID is the only information needed by the win32 subsystem. Truly empty packets don’t generate any output data, either.

  • Sending

    Due to the fact that “empty packets” don’t supply any additional information, the only data being transferred is the internal _PORT_HEADER structure. The address of a correctly initialized PortHeader should be then passed as the first CsrClientCallServer parameter. The shared section doesn’t take part while sending and handling these packets. What is more, no serious input validation is required by the API handler, because there is no input in the first place. The routine is most often supposed to perform one, certain action and then return. Unsupported APIs, statically returning the STATUS_UNSUCCESSFUL or STATUS_NOT_SUPPORTED error codes, can also be considered “empty packets”, as they always behave the same way, regardless of the input information.

  • Examples

    One great example of an empty packet is winsrv!SrvCancelShutdown. As the name implies, the API’s purpose is pretty straightforward – cancelling the shutdown. Seemingly, no input / output arguments are necessary:

    ; __stdcall SrvCancelShutdown(x, x)
    _SrvCancelShutdown@8 proc near
      call    _CancelExitWindows@0 ; CancelExitWindows()
      neg     eax
      sbb     eax, eax
      and     eax, 3FFFFFFFh
      add     eax, 0C0000001h
      retn    8
    _SrvCancelShutdown@8 endp

    As shown above, the handler issues a call to the CancelExitWindows() function, and doesn’t make use of any of the two parameters. Another CsrApi function of this kind is basesrv!BaseSrvNlsUpdateCacheCount, always performing the same task:

    ; __stdcall BaseSrvNlsUpdateCacheCount(x, x)
    _BaseSrvNlsUpdateCacheCount@8 proc near
      cmp     _pNlsRegUserInfo, 0
      jz      short loc_75B28AFC
      push    esi
      mov     esi, offset _NlsCacheCriticalSection
      push    esi
      call    ds:__imp__RtlEnterCriticalSection@4 ; RtlEnterCriticalSection(x)
      mov     eax, _pNlsRegUserInfo
      inc     dword ptr [eax+186Ch]
      push    esi
      call    ds:__imp__RtlLeaveCriticalSection@4 ; RtlLeaveCriticalSection(x)
      pop     esi
      xor     eax, eax
      retn    8
    _BaseSrvNlsUpdateCacheCount@8 endp

    A few more examples can be found – looking for these is left as an exercise for the reader.
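The four instructions following the CancelExitWindows() call in SrvCancelShutdown are a branchless way of turning a BOOL into an NTSTATUS. A minimal C sketch of the same arithmetic is shown below (the function name is mine, not taken from any Windows binary): a non-zero result becomes STATUS_SUCCESS, while zero becomes STATUS_UNSUCCESSFUL.

```c
#include <assert.h>
#include <stdint.h>

#define STATUS_SUCCESS       0x00000000u
#define STATUS_UNSUCCESSFUL  0xC0000001u

/* Mirrors: neg eax / sbb eax,eax / and eax,3FFFFFFFh / add eax,0C0000001h */
uint32_t bool_to_ntstatus(uint32_t eax) {
    uint32_t carry = (eax != 0);   /* neg sets CF when the operand is non-zero */
    eax = 0u - carry;              /* sbb eax,eax: 0 or 0xFFFFFFFF             */
    eax &= 0x3FFFFFFFu;            /* 0 or 0x3FFFFFFF                          */
    eax += 0xC0000001u;            /* 0xC0000001, or wraps around to 0         */
    return eax;
}
```

The trick relies on 0x3FFFFFFF + 0xC0000001 overflowing to exactly zero in 32-bit arithmetic, so no conditional jump is needed.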

Short packets

  • Description

    The “short packets” group describes a great part of the Csr messages. Every request passing actual data to / from CSRSS but fitting in the LPC-packet length restriction belongs to this family. And so – most fixed-size structures (i.e. those that don’t contain volatile text strings or other, possibly long chunks of data) are indeed smaller than the 304-byte limitation.

  • Sending

    As this particular type requires additional data to be appended at the end of the _PORT_MESSAGE structure, a set of API-specific structs has been created. All of these types begin with the standard LPC PortMessage header, and then specify the actual variables to send, e.g.:

      struct _PORT_HEADER PortHeader;
      BOOL  Boolean;
      ULONG Data[0x10];
      DWORD Flags;

    Such an amount of data can still be sent in a single LPC packet. And so, a custom structure beginning with the _PORT_HEADER field must be used as the first CsrClientCallServer argument. The Capture Buffer technique remains unused, thus the second parameter should be set to NULL.

  • Examples

    As for the examples, it is really easy to list a couple:

    1. winsrv!SrvGetConsoleAliasExesLength
    2. winsrv!SrvSetConsoleCursorMode
    3. winsrv!SrvGetConsoleCharType
    4. basesrv!BaseSrvExitProcess
    5. basesrv!BaseSrvBatNotification

    The above handlers take a constant number of bytes as the input, and optionally return some data (of static length, as well).
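To make the size argument concrete, here is a rough sketch that lays out the example structure from the “Sending” paragraph and checks it against the 304-byte figure quoted above. The 24-byte header stand-in, the type names and the decision to count the header towards the limit are all assumptions made for illustration only; the post does not define _PORT_HEADER or say how the limit is measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CSR_SHORT_LIMIT 304u   /* figure quoted in the post */

/* Hypothetical stand-in for _PORT_HEADER; the 24-byte size is an assumption. */
typedef struct { uint8_t raw[24]; } PORT_HEADER_STUB;

/* The example message: header followed by fixed-size fields only. */
typedef struct {
    PORT_HEADER_STUB PortHeader;
    int32_t  Boolean;      /* BOOL            */
    uint32_t Data[0x10];   /* ULONG Data[0x10] */
    uint32_t Flags;        /* DWORD           */
} EXAMPLE_SHORT_MESSAGE;

/* Returns 1 when a message of total_size bytes fits in a single packet. */
int fits_in_short_packet(size_t total_size) {
    assert(total_size >= sizeof(PORT_HEADER_STUB));
    return total_size <= CSR_SHORT_LIMIT;
}
```

Under these assumptions the example structure takes 24 + 4 + 64 + 4 = 96 bytes and fits comfortably, whereas a message embedding e.g. a 260-character wide string (520 bytes) would not, which is where the long-packet mechanism comes in.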

Long packets

  • Description

    From the researcher’s point of view, the “long packets” group is doubtlessly the most interesting one. Due to the fact that they are used to send/receive large amounts of data (beyond the maximum size of an LPC message), a special mechanism called a Shared Section is used for transferring these messages. Let’s take a look at the details.

  • Initialization

    Do you remember the ntdll!CsrpConnectToServer function? At some point, between forming the port name and establishing the connection, we could see a weird NtCreateSection(0x10000) call. As it turns out, this section is a special memory area, mapped in both the client and server processes. After creating the section, its handle is passed to CSRSS through the NtSecureConnectPort native call. Once the win32 subsystem receives a connection request and accepts it, the section is mapped into the server’s virtual address space. Next then, CSRSS provides its client with some basic memory mapping information – such as the server-side base address and view size. Based on the supplied info, a few global variables are initialized (CsrProcessId, CsrObjectDirectory), with CsrPortMemoryRemoteDelta being the most important one for us:

    CsrPortMemoryRemoteDelta = (CSRSS.BaseAddress - LOCAL.BaseAddress);

    Basically, the above variable is filled with the distance between the server- and client-side mappings of the shared memory. This information will soon turn out to be crucial for exchanging data. Furthermore, a commonly known structure called “heap” is created on top of the allocation:

    CsrPortHeap = RtlCreateHeap(0x8000u, LOCAL.BaseAddress, LOCAL.ViewSize, PageSize, 0, 0);
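The role of CsrPortMemoryRemoteDelta is easiest to see with numbers. Below is a minimal sketch of the address arithmetic, assuming both views map the same section at different bases; the function names and the addresses used in the usage note are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Distance between the server-side and client-side views of the section. */
intptr_t remote_delta(uintptr_t server_base, uintptr_t client_base) {
    return (intptr_t)(server_base - client_base);
}

/* Translates an address inside the client view to the server view. */
uintptr_t client_to_server(uintptr_t client_va, intptr_t delta) {
    assert(client_va != 0);
    return client_va + (uintptr_t)delta;
}

/* The inverse translation, server view back to client view. */
uintptr_t server_to_client(uintptr_t server_va, intptr_t delta) {
    assert(server_va != 0);
    return server_va - (uintptr_t)delta;
}
```

With a client view at 0x00340000 and a server view at 0x7F6F0000, a block at client address 0x00340010 is visible to the server at 0x7F6F0010. The same functions work when the server view happens to lie below the client view, as the delta is simply negative.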

    From this point on, the shared heap is going to be used throughout the whole communication session, for passing data of various size and content. The functions taking advantage of the heap are:

    1. CsrAllocateCaptureBuffer
    2. CsrFreeCaptureBuffer
    3. CsrAllocateMessagePointer (indirect)
    4. CsrCaptureMessageBuffer (indirect)
    5. CsrCaptureMessageString (indirect)

    All of the above routines are apparently related to the “Capture Buffer” mechanism, described in the following section.

    • Capture Buffers

      In order to fully understand the idea behind Capture Buffers, one should see it as a special box, a container designed to hold data in such a way, that it can be easily accessed by both sides of the communication (i.e. be offset-based rather than VA-based etc). Such structure is determined by the following characteristics:

      1. Number of memory blocks: one Capture Buffer is able to hold multiple data blocks – e.g. a couple of strings, describing a specific object (like a console window).

      2. Total size: the total size of the container, including its header, pointer table, and the data blocks themselves.

      So – these “data boxes” are used to transfer data between the two parties. In order to illustrate this complex mechanism, suppose we’ve got the following structure:

      struct CSR_MESSAGE {
        _PORT_HEADER PortHeader;
        LPVOID FirstPointer;
        LPVOID SecondPointer;
        LPVOID ThirdPointer;
        LPVOID FourthPointer;
        LPVOID FifthPointer;
      } m;

      The above packet is going to be sent to CSRSS after the initialization takes place. Having the above declared, we can take a closer look at each of the Capture Buffer-related functions:

      1. CsrAllocateCaptureBuffer(ULONG PointerCount, ULONG Size)

        Allocates an adequate number of bytes from the shared CsrPortHeap:

        (Size + sizeof(CAPTURE_HEADER) + PointerCount*sizeof(LPVOID))

        … and returns the resulting pointer to the user. Right after the allocation, the CaptureBuffer structure contents look like this:

        CaptureBuffer = CsrAllocateCaptureBuffer(5,20);

        Due to the fact that no messages have been allocated from the CaptureBuffer yet, Capture.Memory is a single memory block, while the Capture.Pointers[] array remains empty.

      2. CsrFreeCaptureBuffer(LPVOID CaptureBuffer)

        Frees a given CaptureBuffer memory area, by issuing a simple call:

      3. CsrAllocateMessagePointer(LPVOID CaptureBuffer, ULONG Length, PVOID* Pointer)

        The routine allocates “Length” bytes from the CaptureBuffer’s general memory block. The address of the newly allocated block is stored inside *Pointer, while Pointer is put into one of the Capture.Pointers[] items.



        Having three (out of twenty) bytes allocated, one can copy some data:


        After all of the five allocations are made, the CaptureBuffer structure layout can look like this:

        It is important to keep in mind that the pointers into CaptureBuffer.Memory[] must reside in the actual LPC message being sent to the server – the reason of this requirement will be disclosed, soon :-)

      4. CsrCaptureMessageBuffer(LPVOID CaptureBuffer, PVOID Buffer, ULONG Length, PVOID *OutputBuffer)

        The routine is intended to simplify things for the developer, by performing the CaptureBuffer-allocation and copying the user specified data at the same time.


      5. CsrCaptureMessageString(LPVOID CaptureBuffer, PCSTR String, ULONG Length, ULONG MaximumLength, PSTRING OutputString)

        Similar to the previous routine – allocates the requested memory space, and optionally copies a specific string into the new allocation.
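      The behaviour described for these routines can be modelled with a toy implementation. The sketch below is only an approximation under my own assumptions: the header layout and the function names are hypothetical, malloc-style allocation replaces the shared heap, and any alignment or rounding the real ntdll routines may apply is ignored. It does follow the description above, though: the pointer table starts out empty, each allocation carves bytes off a single memory block, and the address of the caller’s pointer variable is remembered in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header layout; the real structure is not given in the post. */
typedef struct {
    uint32_t  Size;          /* total bytes of the allocation          */
    uint32_t  PointerCount;  /* capacity of the pointer table          */
    uint32_t  UsedPointers;  /* table entries filled so far            */
    uint8_t  *Free;          /* next unused byte in the data area      */
    void    **Pointers[];    /* addresses of the message pointer fields */
} CAPTURE_BUFFER;

/* Size + header + PointerCount * sizeof(pointer), as in the formula above. */
CAPTURE_BUFFER *capture_alloc(uint32_t pointer_count, uint32_t size) {
    size_t total = sizeof(CAPTURE_BUFFER) + pointer_count * sizeof(void **) + size;
    CAPTURE_BUFFER *cb = calloc(1, total);
    if (cb == NULL) return NULL;
    cb->Size = (uint32_t)total;
    cb->PointerCount = pointer_count;
    cb->Free = (uint8_t *)&cb->Pointers[pointer_count];  /* data follows the table */
    return cb;
}

/* Carves `length` bytes out of the data area; returns 1 on success. */
int capture_message_pointer(CAPTURE_BUFFER *cb, uint32_t length, void **pointer) {
    uint8_t *end;
    assert(cb != NULL && pointer != NULL);
    end = (uint8_t *)cb + cb->Size;
    if (cb->UsedPointers == cb->PointerCount || length > (size_t)(end - cb->Free))
        return 0;
    *pointer = cb->Free;                          /* block address goes to the caller */
    cb->Pointers[cb->UsedPointers++] = pointer;   /* remember where that pointer lives */
    cb->Free += length;
    return 1;
}

/* Allocation and copy in one step. */
int capture_message_buffer(CAPTURE_BUFFER *cb, const void *src, uint32_t length, void **out) {
    if (!capture_message_pointer(cb, length, out)) return 0;
    memcpy(*out, src, length);
    return 1;
}
```

      A caller would typically pass the address of a field inside its message structure as the last argument, so that the pointer table ends up describing which message fields need translating later on.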

      After the Capture Buffer is allocated and initialized (all N memory blocks are in use), it’s time to send the message, already! This time, we fill in the second parameter of the CsrClientCallServer routine with our CaptureBuffer pointer. When the following call is issued:


      … and the 2nd argument is non-zero, a couple of interesting conversions are taking place in the above routine. This is the time when the CsrPortMemoryRemoteDelta value comes into play. First of all, the data-pointers residing in the CSR_MESSAGE structure (&m) are translated to a server-compatible virtual address, by adding the RemoteDelta. From now on, the m.FirstPointer, m.SecondPointer, …, m.FifthPointer are invalid in the context of the local process, but are correct in terms of server-side memory mapping.

      for( UINT i=0;i<PointerCount;i++ )
        *CaptureBuffer.Pointers[i] += CsrPortMemoryRemoteDelta;

      Furthermore, the CaptureBuffer.Pointers[] array is altered, using the following pseudo-code:

      for( UINT i=0;i<PointerCount;i++ )
        CaptureBuffer.Pointers[i] -= &m;

    So, to sum everything up – after the address/offset translation is performed, we’ve got the following connection between the LPC message and shared buffer:

    • m.CaptureBuffer points to the server’s virtual address of the CaptureBuffer base,
    • CaptureBuffer->Pointers[] contain the relative offsets of the data pointers, i.e. (&m+CaptureBuffer->Pointers[0]) is the pointer to the first capture buffer,
    • (&m+CaptureBuffer->Pointers[n]) points to the server’s virtual address of the n-th capture buffer.

    Or, the same connection chain illustrated graphically looks like this:

    When both the local CSR_MESSAGE and shared CaptureBuffer structures are properly modified, ntdll!CsrClientCallServer calls the standard NtRequestWaitReplyPort LPC function, and waits for an optional output. When the native call returns, all of the modified struct fields are restored to their original values, so that the user (or, more likely – win32 APIs) can easily read the error code and optional subsystem’s output.
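    The two conversions and their reversal can be sketched in portable C as follows. This is a simplified model rather than the ntdll code: pointers are kept as integers, the message has only two data pointers, the m.CaptureBuffer field is left out, and all type and function names are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_POINTERS 5

/* Simplified message: data pointers stored as integers. */
typedef struct {
    uintptr_t FirstPointer;
    uintptr_t SecondPointer;
} MESSAGE;

/* Simplified pointer table of a capture buffer. */
typedef struct {
    uint32_t  PointerCount;
    uintptr_t Pointers[MAX_POINTERS];  /* address of a MESSAGE field, later an offset */
} CAPTURE_TABLE;

/* Client side, right before the message is sent. */
void prepare_for_server(MESSAGE *m, CAPTURE_TABLE *cb, uintptr_t delta) {
    assert(m != NULL && cb != NULL);
    for (uint32_t i = 0; i < cb->PointerCount; i++) {
        uintptr_t *field = (uintptr_t *)cb->Pointers[i];
        *field += delta;                   /* client VA -> server VA          */
        cb->Pointers[i] -= (uintptr_t)m;   /* field address -> offset into m  */
    }
}

/* Client side, after the reply arrives: undo both conversions. */
void restore_after_reply(MESSAGE *m, CAPTURE_TABLE *cb, uintptr_t delta) {
    assert(m != NULL && cb != NULL);
    for (uint32_t i = 0; i < cb->PointerCount; i++) {
        uintptr_t *field;
        cb->Pointers[i] += (uintptr_t)m;   /* offset -> field address         */
        field = (uintptr_t *)cb->Pointers[i];
        *field -= delta;                   /* server VA -> client VA          */
    }
}
```

    The offsets are what make the table usable on the server side: CSRSS receives its own copy of the message at a different address, so absolute addresses of the client’s fields would be meaningless there, while (message base + offset) still locates each data pointer.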

    Because the VA- and offset-related conversions are hard to explain in words, I strongly advise you to verify the information presented in this post by yourself. This should give you even better insight into how reliable cross-process data exchange is actually achieved.

  • Sending

    What’s been already described – if one wants to make use of large data transfers, he must allocate a CaptureBuffer, specifying the number of memory blocks and the total byte count, fill it with the desired data (using CsrCaptureMessageBuffer or CsrCaptureMessageString), and call the CsrClientCallServer, supplying an LPC structure, (containing the data-pointers into CaptureBuffer) as the first parameter, and the CaptureBuffer itself – as the second one. The rest of the job is up to ntdll. Please keep in mind that one CaptureBuffer can be technically utilized only once – and therefore, it should be freed after its first (and last) usage, using CsrFreeCaptureBuffer.

  • Examples

    In this particular case, every CsrApi handler using the CsrValidateMessageBuffer import makes a good example, let it be:

    • winsrv!SrvAllocConsole
    • winsrv!SrvSetConsoleTitle
    • winsrv!SrvAddConsoleAlias

    … and numerous other functions, which are pretty easy to find by oneself.


This post entry aimed to briefly present the “Native Csr Interface” – both in terms of the functions, structures and mechanisms playing some role in the Inter-Process Communication. As you must have noted, only client-side perspective has been described here, as the precise way of CSRSS receiving, handling and responding to the request is a subject for another, long article (or two). And so – if you feel like some important Csr~ routines should have been described or mentioned here – let me know. On the other hand, I am going to cover the remaining, smaller functions (such as CsrGetProcessId) in one, separate post called CSRSS Tips & Tricks.

Watch out for (part 3/3) and don’t hesitate to leave constructive comments! ;)

9 thoughts on “Windows CSRSS Write Up: Inter-process Communication (part 2/3)”

  1. It stands for local procedure call, not local ipc.

    Also, a lot of this is changed on vista and later due to alpc.

  2. Can we drop the LPC acronym debate already. Anyone with WRK or 2k leaked src access can grep for “Local Inter-Process Communication (LPC) connection system services”

  3. Thank you for this excellent post. Recalling Gary Nebbett’s famous implementation of fork(), and the problem it mentions of having to hardcode the address of CsrpConnectToServer:

    would a custom CsrClientConnectToServer (based on what’s outlined in your post, and aware of the forking condition, thus skipping the if (!CsrPortHandle) test) actually solve this problem? In other words, does it matter whether the instructions within CsrClientConnectToServer are called from inside ntdll.dll, or from our own custom library?
