Thursday, December 24, 2015

IEFuzz - A Static Internet Explorer Fuzzer

Today I'm sharing an IE Fuzzer, which was developed almost from scratch. Like many other softwares, browsers can also be fuzzed in two ways, a) Static and b) Dynamic.

Dynamic browser fuzzers are very popular, due to its speed, since they are purely written in JavaScript. However one common problem software security auditors face, while fuzzing browser dynamically, is 'Crash Reproduction'. You have to very careful while crafting your JS browser fuzzer (by placing logging code in right place), otherwise crash will not be reproducible.

Another option is, Static fuzzer. If you are fuzzing browsers using Static Test Cases, in 99% cases 'A crash' == 'A reproducible crash'.

How does 'IEFuzz' work?

  1. Launch IE
  2. Attach 'iexplore.exe' to debugger(pydbg) - To monitor any type of crash(Both in parent and child process).
  3. Generate a test case (html + javascript).
  4. Load the test case locally as file (file://c:/fuzzer/testcases/temp.html)in IE using win32COM.
  5. If no crash, re-generate a html test case and reload the test case using win32COM.(Note, we are not closing, re-opening IE here, We are just refreshing the same page but code/content of the page is different in every time. Which saves time significantly )
  6. In case of any kind of access violation, copy/save the test case to separate folder,  and kill IE completely.
  7. Go to step 1

This Static IE fuzzer is written in python. And following modules were used.

  1. pywin32com - Load / Reload *.html Test Cases
  2. pydbg - Monitor IE for Access Violation / Guard Page Violation.
  3. paimei - For crash dump generation.

Required Configuration Changes in IE

To run this Fuzzer you have to make following changes in IE: 

1. Since this fuzzer loads the test cases locally (eg. file://c:/fuzzer/testcases/temp.html) as .html file.
You must turn off IE's ActiveX warning prompt by following below instructions.

Tools (menu) -> Internet Options -> Security (tab) -> Custom Level (button) -> Disable Automatic prompting for ActiveX controls.

2. You also need to disable IE protected mode to be able to control Internet Explorer using Python 'win32com'. Please be aware of the risks.

 -> Internet Options -> Security -> Trusted Sites    : Low
 -> Internet Options -> Security -> Internet         : Medium + unchecked Enable Protected Mode
 -> Internet Options -> Security -> Restricted Sites : unchecked Enable Protected Mode

Writing Test Cases:

You can write you own static test case generator for this fuzzer in python. You have to place it inside /TestCases folder. For your reference one sample is given here 'TestCases/SampleTestCase.py'. While writing test cases do remember, it should have a 'TestCase' class and 'getFinalTestCase()' method in it. This getFinalTestCase() method should return the entire html page. 

In case of dynamic fuzzer, attributes of different html elements extracted from object and fuzzed on the fly at runtime , since its a static fuzzer we can pre define html elements and their attributes our test case as python dict.

attr = {'CANVAS':['height','width','getContext', ... , ... , ... ]}

For this attribute list generation, one JavaScript application is provided here : MiscTools/Generate_Elements_Dict.html

Source Code:

Source code of IEFuzz is available for download @ my github page.

Licence:

This software is licenced under BEER WARE licence although the following libraries are included with 'IEFuzz' and are licensed separately.

Running This Fuzzer:


One video demo is available here, on how to run this fuzzer and reproduce crashes.



Happy Fuzzing :) :)

Tuesday, February 17, 2015

Walking Heap Using Pydbg

I'm a big fan of Pydbg. Although it has many awesome features , it also has few limitations. One of them is lack of control over process heap. For a long time I'm thinking of writing something which makes Heap Manipulation / Heap parsing / Traversing using pydbg little easier for reverse engineers. So finally last weekend I wrote couple of small py scripts which can parse Windows 7 process heaps on the fly.
In this blog post I'm going to share one of them.

This is the simplest implementation of HeapWalk() API based on pydbg. Heap walk API enumerates the memory blocks in the specified heap. If you are not very familiar with HeapWalk() API this page has a very good example in C++.

Right now best available tool available for heap analysis is windbg. The script I'm going to share  does something similar to windbg's "!heap -a 0xmyheaphandle" command.




You can use the function HeapWalk() [@ Line 103] as break point hander in your pydbg script. In below example actually I did something similar.

First I'm running an application (on 32 bit Windows 7) which uses user32!MessageBoxA API somewhere.

After that I'm attaching my pydbg script with that process and setting up a break point at user32!MessageBoxA and also setting up HeapWalk() as the breakpoint handler.

Now whenever the application will make a call to MessageBoxA api our breakpoint handler HeapWalk() will be invoked and it will start traversing all the available process heap and their segments.

Script 1:


The output of this script will be something similar: https://gist.github.com/debasishm89/1264d7a6726b9e910a5d

Since this script will give you addresses of all all heap blocks and their size, now you should have more control over process heap. You should be able to search for string/data / byets / pointer in process heaps very easily.

Thank you for reading. Hope you've enjoyed :)

Cheers,

Wednesday, January 28, 2015

qHooK - Not Just a Win32 API Hooking Script

Hello everyone. Hope every one is doing good. After a long gap I'm about to post something. Sometimes its easier to build something on your own, than finding something similar which has already been developed in the past. I'm not sure whether any script / tool already present in the wild which does the same, but I definitely needed a tool / script, which can reduce efforts of analysing unknown exploits/ shellcode. I developed this tool one and half years back to mainly analyse shellcodes / exploits etc etc. Obviously when I wrote this there was no name, I just given this a name "qHooK" before writing this post :)

So what it does ?

Its very simple and straight forward python script (dependent on pydbg) which hooks user defined Win32 APIs in any process and prepare a CSV report with various interesting information which can  help reverse engineer to track down / analyse unknown exploit samples / shellcode. Please refer to demo video.

qHooK Final CSV Report:





Video Demo(With Voice):

I guess I've become pretty lazy so I'm not going to write each and every thing about this script. Just adding a video demo (with voice) which explains few real life scenarios, where it may help you. Sorry about my weak voice. My laptop mic sucks :( cant help.



Source Code:

https://github.com/debasishm89/qHooK

Saturday, July 5, 2014

Releasing Stupid v0.1 - The Dumbest File Format Fuzzer (Python+Pydbg)

I developed Stupid in late 2011 to automate fuzzing and problem/app fault detection process of different file formats( mainly Music/Video players etc). I've been receiving many email from my readers asking me to release POC of a python + pydbg fuzzer. So today I'm very happy to make this small yet effective Fuzzer open to everyone. This is highly prototypal and I recommend to rewrite/modify the test case generator sub routine to make this fuzzer more effective.


Happy fuzzing guys. If you are lucky enough to find any zero day using this fuzzer, you can drop me a TY email or buy me a beer in return if we meet someday :)


Source Code:


Stupid source code is available @   https://github.com/debasishm89/Stupid

Licence:






This software is licenced under a Beerware licence although the following libraries are included with Stupid and are licensed separately.


  • pydbg
  • paimei - https://github.com/pedramamini/paimei

Running this Fuzzer:

Stupid was developed and tested with Win32 Python 2.7(x86). So it's recommended to use the same version of python. Also make sure pydbg(x86) is installed on the system.

You need to provide the target application binary path (.exe) and at least one base file to run this fuzzer. You can to modify the configuration section of "stupid.py" as per your requirement.

Test Case Generation:


mutate() routine is responsible for generating test cases from given bases files. It has two sub parts:
  • Bitflip
  • Random Byte Flip

You may want to change / modify these routines to make this fuzzer more effective. ;)

Monitoring:


To monitor target application for different types of crashes (access violation), Stupid uses pydbg(Python debugger).It also uses "utils" of https://github.com/pedramamini/paimei framework to collect crash information which can be used later to identify/distinguish interesting app crashes. Sample crash synopsis file is below,




Reproducing Crashes:


Crash files and crash information can be found in "Crashes" folder which can be used to reproduce app crashes.

Saturday, April 19, 2014

Attacking Audio "reCaptcha" using Google's Web Speech API

I had a fun project months back, Where I had to deal with digital signal processing and low level audio processing. I was never interested in DSP and all other control system stuffs, But when question arises about breaking things, every thing becomes interesting :) . In this post i'm going to share one technique to fully/ partially bypass reCaptcha test. This is not actually a vulnerability but its better if we call it "Abuse of functionality".

Disclaimer : Please remember this information is for Educational Purpose only and should not be used for malicious purpose. I will not assume any liability or responsibility to any person or entity with respect to loss or damages incurred from information contained in this article.

1. What is Captcha

A CAPTCHA is a program that protects websites against bots by generating and grading tests that humans can pass but current computer programs cannot. The term CAPTCHA (for Completely Automated Public Turing Test To Tell Computers and Humans Apart) was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas Hopper and John Langford of Carnegie Mellon University.


2. What is Re-captcha

reCAPTCHA is a free CAPTCHA service by Google, that helps to digitize books, newspapers and old time radio shows. More details can be found here.


3. Audio reCaptcha

reCAPTCHA also comes with an audio test to ensure that blind users can freely navigate.

4. Main Idea: Attacking Audio reCaptcha using Google's Web Speech API Service





5. Google Web Speech API

Chrome has a really interesting new feature for HTML5 speech input API. Using this user can talk to computer using microphone and Chrome will interpret it. This feature is also available for Android devices. If you are not aware of this feature you can find a live demo here.

https://www.google.com/intl/en/chrome/demos/speech.html

I was always very curious about the Speech recognition API of chrome. I tried sniff the api/voice traffic using Wireshirk but this API uses SSL. :(.

So finally I started browsing the Chromium source code repo. Finally I found exactly what I wanted.

http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/

It pretty simple, First the audio is collected from the mic, and then it posts it to Google web service, which responds with a JSON object with the results.  The URL which handles the request is :

https://www.google.com/speech-api/v1/recognize

Another important thing is this api only accepts flac audio format.

6. Programatically Accessing Google Web Speech API(Python)

Below python script was written to send a flac audio file to Google Web Speech API and print out the JSON response.

./google_speech.py hello.flac


'''
Accessing Google Web Speech API using Pyhon
Author : Debasish Mandal

'''

import httplib
import sys

print '[+] Sending clean file to Google voice API'
f = open(sys.argv[1])
data = f.read()
f.close()
google_speech = httplib.HTTPConnection('www.google.com')
google_speech.request('POST','/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US',data,{'Content-type': 'audio/x-flac; rate=16000'})
print google_speech.getresponse().read()
google_speech.close()



7. Thoughts on complexity of reCaptcha Audio Challenges

While dealing with audio reCaptcha, you may know , it basically gives two types of audio challenges. One is pretty clean and simple (Example : https://dl.dropboxusercontent.com/u/107519001/easy.wav) . percentage of noise is very less in this type of challenges. 

Another one is very very noisy and its very difficult for even human to guess (Example : https://dl.dropboxusercontent.com/u/107519001/difficult.wav). Constant hissss noise and overlapping voice makes it really difficult to crack human. You may wanna read this discussion on complexity of audio reCapctha.

In this post I will mainly cover the technique / tricks to solve the easier one using Google Speech API. Although I've tried several approaches to solve the complex one, but as I've already said, its very very had to guess digits even for human :( .

8. Cracking the Easy Captcha Manually Using Audacity and Google Speech API

Google Re-captcha allows user to download audio challenges in mp3 format. And Google web speech API accepts audio in flac format. So if we just normally convert the mp3 audio challenge to flac format of frame rate 16000 its does not work :( .  Google Chrome Speech to text api does not respond to this sound.

But after some experiment and head scratching, it was found that we can actually make Google web speech api convert the easy captcha challenge to text for us, if we can process the audio challenge little bit. In this section i will show how this audio manipulation can be done using Audacity.

To manually verify that first I'm going to use a tool called Audacity to do necessary changes to the downloaded mp3 file. 

Step 1: Download the challenge as mp3 file.
Step 2: Open the challenge audio in Audacity.



Step 3: Copy the first digit speaking sound from main window and paste it in a new window. So here we will only have a one digit speaking sound.

Step 4: From effect option make it repetitive once. (Now It should speak the same digit twice).

Lets say for example if the main challenge is  7 6 2 4 6, Now we have only first digit challenge in wav format which having the digit 7 twice.





Step 5: Export the updated audio in WAV format.
Step 6: Now convert the wav file to flac format using sox tool and send it to Google speech server using the python script posted in section 6. And we will see something like this.

Note: In some cases little bit amplification might be required if voice strength is too low.

debasish@debasish ~/Desktop/audio/heart attack/final $ sox cut_0.wav -r 16000 -b 16 -c 1 cut_0.flac lowpass -2 2500
debasish@debasish ~/Desktop/audio/heart attack/final $ python send.py cut_0.flac 


Great! As you can see first digit of the audio challenge has been resolved by Google Speech. :) :) :) Now in same manner we can solve the entire challenge. In next section we will automate the same thing using python and it's wave module. 

9. Automation using Python and it's WAVE Module

Before we jump into processing of raw WAV audio using low level python API, its important to have some idea of how digital audio actually works. In above process we've extracted the most louder voices using audacity but to do it automatically using python, we must have some understanding of how digital audio is actually represented in numbers.

9.1. How is audio represented with numbers

There is an excellent stackoverflow post which explains the same. In short ,we can say audio is nothing but a vibration. Typically, when we're talking about vibrations of air between approximately 20Hz and 20,000Hz. Which means the air is moving back and forth 20 to 20,000 times per second. If somehow we can measure that vibration and convert it to an electrical signal using a microphone, we'll get an electrical signal with the voltage varying in the same waveform as the sound. In our pure-tone hypothetical, that waveform will match that of the sine function.

Now, we have an analogue signal, the voltage. Still not digital. But, we know this voltage varies between (for example) -1V and +1V. We can, of course, attach a volt meter to the wires and read the voltage.  Arbitrarily, we'll change the scale on our volt meter. We'll multiple the volts by 32767. It now calls -1V -32767 and +1V 32767. Oh, and it'll round to the nearest integer.

Now after having a set of signed integers we can easily draw an waveform using the data sets.

X axis -> Time
Y axis -> Amplitude (signed integers)



Now, if we attach our volt meter to a computer, and instruct the computer to read the meter 44,100 times per second. Add a second volt meter (for the other stereo channel), and we now have the data that goes on an audio CD. This format is called stereo 44,100 Hz, 16-bit linear PCM. And it really is just a bunch of voltage measurements.

9.2. WAVE File Format walk through using Python

As an example lets open up a very small wav file with a hex editor.

  

9.3. Parsing the same WAV file using Python

The wave module provides a convenient interface to the WAV sound format. It does not support compression/decompression, but it does support mono/stereo. Now we are going to parse the same wav file using python wave module and try to relate what we have just seen in hex editor.

Let's write a python script:

import wave 
f = wave.open('sample.wav', 'r') 
print '[+] WAV parameters ',f.getparams() 
print '[+] No. of Frames ',f.getnframes() 
for i in range(f.getnframes()): 
    single_frame = f.readframes(1) 
    print single_frame.encode('hex') 
f.close()

Line 1 imports python wav module.
Line 2: Opens up the sample.wav file.
Line 3: getparams() routine returns a tuple (nchannels, sampwidth, framerate, nframes, comptype, compname), equivalent to output of the get*() methods.
Line 4: getnframes() returns number of audio frames.
Line 5,6,7: Now we are iterating through all the frames present in the sample.wav file and printing them one by one.
Line 8: Closes the opened file

Now if we run the script we will find something like this:

[+] WAV parameters (1, 2, 44100, 937, 'NONE', 'not compressed')
[+] No. of Frames 937
[+] Sample 0 = 62fe    <- Sample 1
[+] Sample 1 = 99fe   <- Sample 2
[+] Sample 2 = c1ff    <- Sample 3
[+] Sample 3 = 9000
[+] Sample 4 = 8700
[+] Sample 5 = b9ff
[+] Sample 6 = 5cfe
[+] Sample 7 = 35fd
[+] Sample 8 = b1fc
[+] Sample 9 = f5fc
[+] Sample 10 = 9afd
[+] Sample 11 = 3cfe
[+] Sample 12 = 83fe
[+] ....
and so on,

It should make sense now. In first line we get number of channels, sample width , frame/sample rate,total number of frames etc etc. Which is exact same what we saw in the hex editor (Section 9.2). From second line it stars printing the frames/sample which is also same as what we have seen in hex editor. Each channel is 2 bytes long because the audio is 16 bit. Each channel will only be one byte. We can use the getsampwidth() method to determine this. Also, getchannels() will determine if its mono or stereo.

Now its time to decode each and every frames of that file. They're actually little-endian. So we will now modify the python script little bit so that we can get the exact value of each frame. We can use python struct module to decode the frame values to signed integers.

import wave 
import struct 

f = wave.open('sample.wav', 'r') 
print '[+] WAV parameters ',f.getparams() 
print '[+] No. of Frames ',f.getnframes() 
for i in range(f.getnframes()): 
    single_frame = f.readframes(1) 
    sint = struct.unpack('<h', single_frame) [0]
    print "[+] Sample ",i," = ",single_frame.encode('hex')," -> ",sint[0] 
f.close()

This script will print something like this:

[+] WAV parameters (1, 2, 44100, 937, 'NONE', 'not compressed')
[+] No. of Frames 937
[+] Sample 0 = 62fe -> -414
[+] Sample 1 = 99fe -> -359
[+] Sample 2 = c1ff -> -63
[+] Sample 3 = 9000 -> 144
[+] Sample 4 = 8700 -> 135
[+] Sample 5 = b9ff -> -71
[+] Sample 6 = 5cfe -> -420
[+] Sample 7 = 35fd -> -715
[+] Sample 8 = b1fc -> -847
[+] Sample 9 = f5fc -> -779
[+] Sample 10 = 9afd -> -614
[+] Sample 11 = 3cfe -> -452
[+] Sample 12 = 83fe -> -381
[+] Sample 13 = 52fe -> -430
[+] Sample 14 = e2fd -> -542

Now what we can see we have a set of positive and negative integers. Now you should be able to connect the dots. What I have explained in section 9.1. 

So now if we plot the same positive and negative values in a graph will find complete wave form. Lets do it using python matlab module.

import wave 
import struct 
import matplotlib.pyplot as plt 

data_set = [] 
f = wave.open('sample.wav', 'r') 
print '[+] WAV parameters ',f.getparams() 
print '[+] No. of Frames ',f.getnframes() 
for i in range(f.getnframes()): 
    single_frame = f.readframes(1)
    sint = struct.unpack('<h', single_frame)[0]
    data_set.append(sint) 
f.close() 
plt.plot(data_set) 
plt.ylabel('Amplitude')
plt.xlabel('Time') 
plt.show()

This should form following graph

Now you must be familiar with this type of graph. This is what you see in SoundCloud, But definitely more complex one.

So now we have clear understanding of how audio represented in numbers. Now it will be easier for readers to understand how the python script ( shared in section 9.3 ) actually works.

9.3. Python Script

In this section we will develop a script which automate the steps we did using Audacity in Section 8. Below python script will try extract loud voices from input wav file and generate separate wav files.



Once the main challenge is broken into parts we can easily convert it to flac format and send each parts of the challenge to Google speech API using the Python script shared in section 6.

9.4. Demo:



10. Attempt to Crack the Difficult(noisy) audio challenge

So we have successfully broken down the easy challenge.Now its time to give the difficult one a try. So I started with one noisy captcha challenge. You can see the matlab plot of the same noisy audio challenge below.

In above figure we can understand presence of a constant hisss noise. One of the standard ways to analyze sound is to look at the frequencies that are present in a sample. The standard way of doing that is with a discrete Fourier transform using the fast Fourier transform or FFT algorithm. What these basically in this case is to take a sound signal isolate the frequencies of sine waves that make up that sound.

10.1. Signal Filtering using Fourier Transform

Lets get started with a  simple example. Consider a signal consisting of a single sine wave, s(t)=sin(w∗t). Let the signal be subject to white noise which is added in during measurement, Smeasured(t)=s(t)+n. Let F be the Fourier transform of S. Now by setting the value of F to zero for frequencies above and below w, the noise can be reduced. Let Ffiltered be the filtered Fourier transform. Taking the inverse Fourier transform of Ffiltered yields Sfiltered(t). 

The way to filter that sound is to set the amplitudes of the fft values around X Hz to 0. In addition to filtering this peak, It's better to remove the frequencies below the human hearing range and above the normal human voice range. Then we recreate the original signal via an inverse FFT.

I have written couple of scripts which successfully removes the constant hiss noise from the audio file but main challenge is the overlapping voice. Over lapping voice makes it very very difficult even for human to guess digits. Although I was not able to successfully crack any of given difficult challenges using Google Speech API still I've shared few noise removal scrips (using Fourier Transform). 

These scripts can be found in the GitHub project page. There is tons of room for improvement of all this scripts.

11. Code Download

Every code I've written during this project is hosted here:  

12. Conclusion

When I reported this issue to Google security team, they've confirmed that, this mechanism is working as intended. The more difficult audio patterns are only triggered only when abuse/non-human interaction is suspected. So as per the email communication noting is going to be changed to stop this.

Thanks for reading. I hope you have enjoyed. Please drop me an email/comment in case of any doubt and confusion.

13. References

http://rsmith.home.xs4all.nl/miscellaneous/filtering-a-sound-recording.html
http://www.topherlee.com/software/pcm-tut-wavformat.html
http://exnumerus.blogspot.in/2011/12/how-to-remove-noise-from-signal-using.html
http://www.swharden.com/blog/2009-01-21-signal-filtering-with-python/

Sunday, March 16, 2014

In-Memory Kernel Driver(IOCTL)Fuzzing using Python

I'm sharing one of my Kernel Driver IOCTL Fuzzer which operates completely from user land. To run this script you should know at least one process which sends IOCTL to your target device you are fuzzing.


This script is very simple and straight forward. It basically operate in two modes. One is in-memory fuzzing mode and another is logging mode.

In fuzzing mode it attaches it self to given user mode process and hooks DeviceIoControl!Kernel32. After that when DeviceIoControl is get called by theprocess it fuzzes the input/output buffer length, input buffer content etc inside memory and at the same time logs actual buffer and mutated buffer length / content in a xml log file. Which can be helpful while reproducing os crashes.

When running in logging mode it tries to dump all I/O Control code I/O Buffer pointer, I/O buffer length that given process is sending to Kernel mode device. This XML log can be used to fuzz any driver further.

Download:


This tool can be downloaded from my github page : iofuzz

Source Code:



Friday, February 28, 2014

Reversing A Tiny Built-In Windows Kernel Module [Journey from Kernel32 to HAL]

Hello readers. Hope you are doing great. In this post I am going to explore our very own windows kernel little bit by reverse engineering a built in kernel module. If you have ever developed any kernel driver/module for windows it will be very easy for you to understand. If you are not very familiar with how device drivers work then codeproject.com has some really good resources to start up. So let's get started.

https://www.cr0.org/paper/to-jt-party-at-ring0.pdf


But Before we can start reversing the core component, take a look at the diagram mentioned below.




In above picture, the green elements are user mode components (Ring3). The diagram actually shows how apps.exe (a user mode application ) calls the kernel mode driver. We will try to cover each an every section mentioned in above diagram and try to reverse them, to understand how thing works.

Choosing a Target Driver to Reverse:


Its always better to start with simple thing. In this article we will reverse the Beep driver. The Beep Driver component provides the beep driver in the beep.sys file. This component also provides some supporting registry information. This is probably the smallest built in Kernel module in windows OS. It has only 6 routines.

First Step : Building apps.exe


We will start with section 1. So first we will start with a very basic C code. [beep.c]

#include<windows.h>
int main(){
     Beep( 50, 750 );
     return 0;
}

The code is very simple and straight forward. You can see its calling Beep function which produces Beep sound. The function beep resides in Kernel32.dll. After compiling the code you should get beep.exe file. Running beep.exe should generate beep sound.

BOOL WINAPI Beep(
  _In_  DWORD dwFreq,
  _In_  DWORD dwDuration
);

Locating the Driver\Device:


From below screen shot we can see, Beep driver has actually created a Device called "Beep".And you can also see many other information like major functions supported by this driver and many more.



Sniffing all I/O Request Packets (IRP) to Beep Device using IRP Tracker utility:


IRP Tracker is very cool and powerful utility. It can actually sniff the Ring3 and Ring0 gateway and show details of messages passed from user mode process and kernel driver. Using this tool we are going to sniff all request which beep.exe actually actually sending from ring3 to ring0 to produce the beep sound.

Ok, So to start sniffing we need to provide the tool the driver name we want to sniff.  Go to File and select driver. Now you need to provide the driver name which we want to sniff. In this case we are only interested in sniffing the "Beep" driver. So now we started sniffing all messages between user and kernel. Now its time to execute the beep.exe we just compiled . When you execute the beep.exe file you will see few new entries in IRP Tracker window.


Now if you look at the IRP Address Sequence Number column you will see first entry is ntCreateFile() and the last entry is ntClose(). In between them you can see ntDeviceIoControlFile is getting called. Now if you look at the major function column you will find "DEVICE_CONTROL". If you look in msdn you will find that this API is actually used to send IOCTL codes from user land to kernel driver.

NTSTATUS WINAPI NtDeviceIoControlFile(
  _In_   HANDLE FileHandle,
  _In_   HANDLE Event,
  _In_   PIO_APC_ROUTINE ApcRoutine,
  _In_   PVOID ApcContext,
  _Out_  PIO_STATUS_BLOCK IoStatusBlock,
  _In_   ULONG IoControlCode,
  _In_   PVOID InputBuffer,
  _In_   ULONG InputBufferLength,
  _Out_  PVOID OutputBuffer,
  _In_   ULONG OutputBufferLength
);

IRP tracker utility can also provide us the IOCTL code the user mode application sending to kernel driver.


We can see its sending IOCTL code 0x10000 (BEEP_SET) to that device. Keep this in mind. We are going to come back to this in a minute.

Reversing Beep() [Kernel32.dll]


To reverse the Beep routine, lets load Kernel32.dll in IDA Pro. After its loaded lets jump into Beep routine and you should see something like this.


And we can see, its trying to communicate with the device Beep \\Device\\Beep by calling NtCreateFile. When communicating with a Kernel mode driver , any user land application uses NtCreateFile. If its successful this function returns one handle the target device. Using that handle we can read / write to that device. We will come to this later.

If we go little further inside Beep!Kernel32 we should see its trying to verify few parameters passed to it and after that it calls NtDeviceIoControlFile().





I hope now you are able to connect the dots now. If you can remember its the same sequence of function call you have seen in IRP tracker utility. Its already known to us that this NtDeviceIoControlFile is used for sending IOCTL codes to kernel driver.

Reversing ntdll.dll [NtDeviceIoControlFile()]


Now we will have a closer look at the call of NtDeviceIoControlFile. We have seen in msdn that the 6th parameter of NtDeviceIoControlFile is the IOCTL code.

NTSTATUS WINAPI NtDeviceIoControlFile(
  _In_   HANDLE FileHandle,
  _In_   HANDLE Event,
  _In_   PIO_APC_ROUTINE ApcRoutine,
  _In_   PVOID ApcContext,
  _Out_  PIO_STATUS_BLOCK IoStatusBlock,
  _In_   ULONG IoControlCode,
  _In_   PVOID InputBuffer,
  _In_   ULONG InputBufferLength,
  _Out_  PVOID OutputBuffer,
  _In_   ULONG OutputBufferLength
);

Lets verify that in NtDeviceIoControlFile function routine.


Hope you are able the connect the dots now. In above image you can see, first its loading 10000h into ebx register and then passing it to NtDeviceIoControlFile(). This is the same IOCTL code we have seen in IRP tracker utility.

Now lets attach the beep.exe file with immunity debugger and set a break point at NtDeviceIoControlFile(). After setting up the break point if we continue the execution, we should break at this point as shown in screen shot below.



If you look at the entry point of NtDeviceIoControlFile() routine, you will see below instruction,

MOV EAX,42
MOV EDX,7FFE0300
CALL DWORD PTR DS:[EDX]
RETN 28

Now from this sequence of instructions we can understand that its probably going for a system call. Now if we go further and follow the CALL DWORD PTR DS:[EDX] instruction we will get something like this.



LEA EDX,DWORD PTR SS:[ESP+8]
INT 2E

From this INT 2E it's now absolutely clear that its going to invoke a software interrupt. Now EAX is actually pointing to 0x42. So we can say this is the system call number. We can verify this from any SDDT dumping utility. In this case I've used ICESword tool to dump the System Service Descriptor Table.



You can see systemcall 0x42 is actually pointing to Kernel version of NtDeviceIoControlFile() and its resides in the main windows kernel component which is ntoskrnl.exe. After invoking the interrupt OS will switch to kernel-mode to execute a system service. KiSytemServices is going to take the call.
The ‘int’ instructor causes the CPU to execute a software interrupt, i.e. it will go into the Interrupt Descriptor Table at index 2e and read the Interrupt Gate Descriptor at that location. The CPU switches automatically to the kernel-mode stack. The CPU automatically saves the user-mode program’s SS, ESP, EFLAGS, CS and EIP registers on the kernel-mode stack.

More Details Here

Breaking the Beep Driver(Beep.sys):


So till this point we have seen that how user land application is sending request to the kernel driver. Now we will see, how the kernel driver actually process the user land application request and act accordingly. For this we will reverse the Beep.sys file which is the main driver PE file.

After loading the driver into IDA you should first see the Driver Entry subroutine. We know DriverEntry is the first routine called after a driver is loaded. Since its responsible for initializing the driver we should find all the IOCTL handler functions in this driver entry. First thing you will see IoCreateDevice() is getting called.


NTSTATUS IoCreateDevice(
  _In_      PDRIVER_OBJECT DriverObject,
  _In_      ULONG DeviceExtensionSize,
  _In_opt_  PUNICODE_STRING DeviceName,
  _In_      DEVICE_TYPE DeviceType,
  _In_      ULONG DeviceCharacteristics,
  _In_      BOOLEAN Exclusive,
  _Out_     PDEVICE_OBJECT *DeviceObject
);

The IoCreateDevice() routine creates a device object for use by a driver. You should see call to this function on every driver's driver entry routine.

Now that we have successfully created our \Device\Beep device driver. So going further into the DriverEntry we will get a structure like this.



Equivalent C code will be something like this:

DriverObject->DriverStartIo = sub_1051A;
DriverObject->DriverUnload = DriverUnload;
DriverObject->MajorFunction[0] = sub_1046A;
DriverObject->MajorFunction[2] = sub_104B8;
DriverObject->MajorFunction[14] = sub_10400;
DriverObject->MajorFunction[18] = sub_10354;

More practical idea about IRP Major Functions can be found here

Digging further into all above mentioned IRP handlers (sub_xxxx), it was found that sub_10354 actually responsible for handling all IoControls. So we can conclude,

DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = Beephandler;

Now lets jump into sub_10354 in IDA pro. and have a look what its trying to do. Inside sub_10354 you should see somthing like this.


On a bigger picture you should notice calls to below functions.

KeRemoveDeviceQueue,KfRaiseIrql,IoAcquireCancelSpinLock,IoReleaseCancelSpinLock etc etc.Right now we are not very interested in any of above functions. If you are interested you can explore msdn. The call we are interested in is HalMakeBeep.

Call To HalMakeBeep

Reversing the HAL.dll:


Now we will jump into HalMakeBeep routine. HalMakeBeep routine actually resides into hal.dll.

The Windows Hardware Abstraction Layer (HAL) is implemented in Hal.dll. Hardware abstractions are sets of routines in software that emulate some platform-specific details, giving programs direct access to the hardware resources.They often allow programmers to write device-independent applications by providing standard Operating System (OS) calls to hardware. Each type of CPU has a specific instruction set architecture or ISA. One of the main functions of a compiler is to allow a programmer to write an algorithm in a high-level language without having to care about CPU-specific instructions. Then it is the job of the compiler to generate a CPU-specific executable. The same type of abstraction is made in operating systems, but OS APIs now represent the primitive operations of the machine, rather than an ISA.  This allows a programmer to use OS-level operations (i.e. task creation/deletion) in their programs while still remaining portable over a variety of different platforms.[Source : Wiki]

So to look at the assembly of HalMakeBeep we have to load up hal.dll in IDA. After loading hal.dll we will jump into HalMakeBeep routine. You should see lot of inline assembly inside this function.




Every PC has an internal speaker. It can generating beeps of different frequencies. We can actually control the speaker by providing a frequency number which defines the pitch of the beep, then turning the speaker on for the duration of the beep.

https://courses.engr.illinois.edu/ece390/books/labmanual/io-devices-speaker.html


Here the frequency number we provide is nothing but a a counter value. Our computer uses it to determine how long is to wait between sending pulses to the internal speaker. More clearly a smaller frequency number will cause the pulses to be sent quicker, and it will result a higher pitch.Here the frequency number actually tells the PC how many of these cycles to wait before sending another pulse.

Mainly we can communicate with the speaker controller using IN and OUT instructions. Below I've mentioned few steps in generating a beep:

  1. First we need to send the value 182 to port 43h. This will actually set the speaker up.
  2. Next thing is, sending the frequency number to port 42h. Since this is an 8-bit port, we must use two OUT instructions to do this. Send the least significant byte first, then the most significant byte.
  3. After that, to start the beep sound, bits 1 and 0 of port 61h must be set to 1. Since the other bits of port 61h have other uses, they must not be modified. Therefore, you must use an IN instruction first to get the value from the port, then do an OR to set the two bits, then use an OUT instruction to send the new value to the port.
  4. Pause for the duration of the beep.
  5. We can turn off the beep by resetting bits 1 and 0 of port 61h to 0. Remember that since the other bits of this port must not be modified, you must read the value, set just bits 1 and 0 to 0, then output the new value.


So now if we look at the HalMakeBeep routine it should make sense. You can see that its doing the same thing just describe above.

Thanks for reading. Hope you have enjoyed this post. If you believe i did something wrong anywhere and you want me to correct or i've missed something to cover, please drop an email or comment below.

cheers,

References:


MSDN: http://msdn.microsoft.com/
Wiki : http://en.wikipedia.org/wiki/Hal.dll
https://courses.engr.illinois.edu/ece390/books/labmanual/io-devices-speaker.html

Tuesday, February 11, 2014

Building Assembly Control Flow Graph(CFG) at Runtime for Reverse Engineering Using Python

A control flow graph (CFG) in computer science is a representation, using graph notation, of all paths that might be traversed through a program during its execution. In this post I'm going share one python tool which I've written few days back to build control flow graph of any function at run-time very quickly.

What it Does?

This tool actually help you to visualize any function's control flow graph at time of its execution. It also gives de-reference information of executed instructions.



To build CFG of any function you need to provide the entry point and exit point of that particular function you want to analyze. In last part of this post I've have posted one video which demonstrates how to use this tool.

How it's gonna help?


This tool actually can help you to reverse complex functions by creating control flow graph of it at runtime. So it reduces reverse engineering efforts a lot in many cases. It also gives you de-reference information each and every instruction executed. From this information you can easily find out at any certain point which register is point to to which place (stack / heap).

Sample Control Flow Graph Generated by visdasm:


http://htmlpreview.github.io/?https://github.com/debasishm89/visdasm/blob/master/Report.html

Download:


This tool is available for download at my Github page:

How to Use this tool?[Video Demo]





This tool uses below libraries:

  1. Pydbg
  2. Pydasm
  3. Jquery [For control flow graph]
  4. JqueryUI [For control flow graph]
  5. PlumberJS[For control flow graph]
Last Words:

I'vent tested this script much. I am modifying this tool everyday. So in some cases it may throw dirty errors.

Sunday, October 27, 2013

Reverse Engineering Automation using Pydbg - I

Pydbg is an open source Python debugger. I've been using Pydbg for many days to automate many boring parts of reverse engineering. In this post I will share one technique sometimes I use for crash debugging.

So, suppose after fuzzing an application we found an interesting crash. And we want to get into the root cause of this crash. It can be sometime very difficult to find root cause of any crash because may be corruption happened inside one function but you are getting access violation inside a different function.

I'm not saying its the most efficient way to trace application flow but sometimes I find it very helpful.

Here I will use one sample crash of "SampleParser.exe". We have two files. One is the file which is crashing the SampleParser.exe and another is the base file on which our Fuzzer did modification. So we have a reproducible crash and we want to reach to the vulnerable function causing the crash. So before we start reverse engineering, we must a clear view of call graph of the application. Obviously it has many functions. Not all of them are involved into parsing input files. First we will narrow down our RE scope to the functions which are taking part into parsing the input file.

Finding out Subroutines taking part in File parsing:

To find out which functions are taking part in parsing the input file, we must have all functions belongs to SampleParser.exe. How how to get that list. Here IDA pro can help us getting the list of all functions. Open up the executable using IDA Pro when its completely loaded, From function window just you can get all function list. (If the application loads arbitrary DLL at run time for parsing input file you need to open the dll file with IDA Pro. )


Now just copy all functions from there and put it in a text file and save it as ida-export.txt. Now we will run the application inside the pydbg and do following things.
  1. From the text file ida-export.txt, we will only take those function addresses which contains the word "sub_" and set a break point on all of them.
  2. Next we will run the program with pydbg.
  3. If there is any break point hit, we will print the value of EIP in command prompt. 
Here is the python script to do the above steps. (Make sure you named the function list ida-export.txt )


Time for The First Run:

So when we start the application, we will get few function addresses in command prompt. So these are the function, responsible for starting up the application. For obvious reason we are not interested in these function.

But we will simply copy the list from command window and paste it in an excel sheet for later analysis.


Now the application is running and our debugger is watching it. So we will open the base file using the SampleParser.exe (The base file is the file, on which our Fuzzer did modification, and this file should not crash the application).

Now in our command window we will see some new functions are getting called and after few seconds when the file is completely loaded inside SampleParser.exe , it will be idle and you will not see any new call in command window. Here roughly we can say these are the functions, responsible for parsing the input file. Again we will copy the list from command prompt and paste it in the next column of excel sheet.

So now we have an excel like this.


And we have a rough list of functions responsible for parsing/loading the input file.

Now we rerun the target application using the above script, and this time we will open the input file which was causing the crash. This time we will get a list of functions but after few seconds the target application will crash. Now again we will copy the new list of function from command prompt and paste it in the next column of excel sheet. So the final excel look like this,


In above excel we can see, we ran the application thrice and function calls upto the blue line are common. At this level application is started and its running.

Below the blue line, few functions are marked in Green ,We see this calls when we feed a normal file to that application and the application starts parsing the file. So these are are the function responsible for parsing we can roughly say.

In the next column the red marked function is where our SampleParser.exe program crashed with an access violation. We can see in the 3rd attempt there are few common function which tell us that it started parsing the file but it crashed suddenly due to some error.

From this analysis now we have 5 function involved into parsing the input file and we are pretty sure any of these did something wrong for which we got the access violation.

0x004057d0
0x004057c0
0x004014a0
0x00402470
0x00404210

So now we have narrowed down our analysis scope to only 5 functions.

Fuzzing Facebook for $$$ using Burpy

Fuzzing is an automated / sometime semi automated software security/bug testing technique which allows us to find different types of bugs with very less efforts. It actually involves providing invalid, unexpected, or random data to the inputs of a computer program. The best thing about fuzzing is, you spend few sleepless nights to write your Fuzzer, and after that you just sleep, and your Fuzzer brings you bugs (or money some time ;)) without any positive effort .


Months back I have blogged about two (this & this)XSRF vulnerabilities, I've found in Twitter application. Both XSRF bugs were found using a web application Fuzzer I've developed,I named it Bupry. I've already shared that tool. You can download the tool from my GitHub page.

The tool is pretty simple, straight forward and flexible. You can easily write your own application specific modules to perform various test cases. Burpy actually takes Burp suite log as input and after that it parses all request response from the Burp log xml file. After parsing that log it performs various tests depending on the module you've provided. In this tool I've also included on raw http request manipulation library which is rawweb.py. You can easily manipulate or modify raw http requests using this library and its very simple. In this post, I am going show how I wrote a Facebook application specific plugin for Burpy to fuzz Facebook application.

So, like Twitter , here also in Facebook, my target was to find broken XSRF protection, in an automated manner using Burpy.


As I've already said, Burpy takes Burp suite log as input. So to do that,I've opened up Burp proxy and started surfing Facebook application randomly with a test account. At the same time Burp suite was capturing all the request responses in background.

So after that, I exported the captured log from Burp. Now here, Burpy's job would be to parse burp log and get a list of all http request (normal client's requests without any modification)present in the Burp suite log and depending on the modules provided, it will modify/manipulate those raw http request (using rawweb.py)and re-send them to web server. And depending on the response, Burpy will generate an HTML report powered by Twitter Bootstrap. A Sample report with single issue is here.

So I wrote a Burpy plugin which removes the xsrf token parameter from all raw http request found in burp log, and re-fire them to server one by one. Now like twitter Facebook also behaves in same manner every time, when you send any request to server using without XSRF token ( XSRF token name is constant for main facebook application which is fb_dtsg). So if you send any request without this XSRF token, maximum time Facebook will give you back an error saying, Sorry, something went wrong - Please try closing and re-opening your browser window or it will throw 500 Server Error response code.

Now if we remove the XSRF token from each and every request that client side code sends to Facebook server, and if we dont find above mentioned error in server response, then we can say that there is a chance that XSRF token validation is not properly implemented. But Facebook application send thousands of GET and POST request. So its almost impossible to do this manually for each and every request that Facebook sends.

So I wrote this tiny Burpy plugin to automate this test case. It simply checks whether XSRF token validation is present in server side or not, by removing XSRF token from request and replaying it. Since Facebook application always throws a generic error message for XSRF error, So if this error is not present in response after removing the token it returns +ve.


So after running this 10-15 times I've gathered few suspicious request response from Burpy report.But  most of them were technically XSRF but not very critical or harmful. But I was able to track down near about 8-10 pretty serious issues.

So from the next day I started reporting them to Facebook Security Team using their online form.

After few weeks I started getting responses back from Facebook security team. Unfortunately few of them did not qualify for Facebook Bug Bounty program because few were previously known to them and few were not very critical.





etc..
etc..
etc..

Bad Luck..huhhh..:(


But 2013 is not so unlucky. Few bugs did qualify for reward.......:) :) :)



So at last I've earned some money out of Facebook and the best thing was Facebook guys mentioned my name on their White Hat Hall of Fame page which is definitely a great CV builder.