Sunday, October 27, 2013

Reverse Engineering Automation using Pydbg - I

Pydbg is an open source Python debugger. I've been using Pydbg for many days to automate many boring parts of reverse engineering. In this post I will share one technique sometimes I use for crash debugging.

So, suppose after fuzzing an application we found an interesting crash. And we want to get into the root cause of this crash. It can be sometime very difficult to find root cause of any crash because may be corruption happened inside one function but you are getting access violation inside a different function.

I'm not saying its the most efficient way to trace application flow but sometimes I find it very helpful.

Here I will use one sample crash of "SampleParser.exe". We have two files. One is the file which is crashing the SampleParser.exe and another is the base file on which our Fuzzer did modification. So we have a reproducible crash and we want to reach to the vulnerable function causing the crash. So before we start reverse engineering, we must a clear view of call graph of the application. Obviously it has many functions. Not all of them are involved into parsing input files. First we will narrow down our RE scope to the functions which are taking part into parsing the input file.

Finding out Subroutines taking part in File parsing:

To find out which functions are taking part in parsing the input file, we must have all functions belongs to SampleParser.exe. How how to get that list. Here IDA pro can help us getting the list of all functions. Open up the executable using IDA Pro when its completely loaded, From function window just you can get all function list. (If the application loads arbitrary DLL at run time for parsing input file you need to open the dll file with IDA Pro. )

Now just copy all functions from there and put it in a text file and save it as ida-export.txt. Now we will run the application inside the pydbg and do following things.
  1. From the text file ida-export.txt, we will only take those function addresses which contains the word "sub_" and set a break point on all of them.
  2. Next we will run the program with pydbg.
  3. If there is any break point hit, we will print the value of EIP in command prompt. 
Here is the python script to do the above steps. (Make sure you named the function list ida-export.txt )

Time for The First Run:

So when we start the application, we will get few function addresses in command prompt. So these are the function, responsible for starting up the application. For obvious reason we are not interested in these function.

But we will simply copy the list from command window and paste it in an excel sheet for later analysis.

Now the application is running and our debugger is watching it. So we will open the base file using the SampleParser.exe (The base file is the file, on which our Fuzzer did modification, and this file should not crash the application).

Now in our command window we will see some new functions are getting called and after few seconds when the file is completely loaded inside SampleParser.exe , it will be idle and you will not see any new call in command window. Here roughly we can say these are the functions, responsible for parsing the input file. Again we will copy the list from command prompt and paste it in the next column of excel sheet.

So now we have an excel like this.

And we have a rough list of functions responsible for parsing/loading the input file.

Now we rerun the target application using the above script, and this time we will open the input file which was causing the crash. This time we will get a list of functions but after few seconds the target application will crash. Now again we will copy the new list of function from command prompt and paste it in the next column of excel sheet. So the final excel look like this,

In above excel we can see, we ran the application thrice and function calls upto the blue line are common. At this level application is started and its running.

Below the blue line, few functions are marked in Green ,We see this calls when we feed a normal file to that application and the application starts parsing the file. So these are are the function responsible for parsing we can roughly say.

In the next column the red marked function is where our SampleParser.exe program crashed with an access violation. We can see in the 3rd attempt there are few common function which tell us that it started parsing the file but it crashed suddenly due to some error.

From this analysis now we have 5 function involved into parsing the input file and we are pretty sure any of these did something wrong for which we got the access violation.


So now we have narrowed down our analysis scope to only 5 functions.