Saturday, July 5, 2014

Releasing Stupid v0.1 - The Dumbest File Format Fuzzer (Python+Pydbg)

I developed Stupid in late 2011 to automate fuzzing and application fault detection for different file formats (mainly music/video players). I've been receiving many emails from readers asking me to release a proof of concept of a Python + PyDbg fuzzer, so today I'm very happy to make this small yet effective fuzzer open to everyone. It is highly prototypal, and I recommend rewriting or modifying the test case generator subroutine to make the fuzzer more effective.


Happy fuzzing, guys. If you are lucky enough to find a zero day using this fuzzer, you can drop me a thank-you email or buy me a beer in return if we meet someday :)


Source Code:


Stupid source code is available @   https://github.com/debasishm89/Stupid

Licence:






This software is licensed under the Beerware licence, although the following libraries are included with Stupid and are licensed separately.


  • pydbg
  • paimei - https://github.com/pedramamini/paimei

Running this Fuzzer:

Stupid was developed and tested with Win32 Python 2.7 (x86), so it's recommended to use the same version of Python. Also make sure pydbg (x86) is installed on the system.

You need to provide the target application binary path (.exe) and at least one base file to run this fuzzer. Modify the configuration section of "stupid.py" as per your requirements.
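
The exact option names are whatever stupid.py defines; purely to give a rough idea of what such a configuration section usually boils down to, here is an illustrative sketch (every name and path below is hypothetical):

# Illustrative configuration values only -- the real names live in stupid.py.
TARGET_APP   = r'C:\Program Files\SomePlayer\player.exe'    # target binary path
BASE_FILES   = [r'C:\fuzz\base1.mp3', r'C:\fuzz\base2.mp3'] # base/seed files
CRASH_DIR    = r'C:\fuzz\Crashes'                           # where crash data is written
TIMEOUT_SECS = 5                                            # kill the target after this many seconds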

Test Case Generation:


The mutate() routine is responsible for generating test cases from the given base files. It has two sub parts:
  • Bitflip
  • Random Byte Flip

You may want to change or extend these routines to make this fuzzer more effective. ;)
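
The real routines live in stupid.py; purely as an illustration of the two strategies, a bit-flip / random-byte mutator might look roughly like this (the function and variable names below are made up, not Stupid's own):

import random

def mutate(base_data, mode='randombyte'):
    # Return a mutated copy of the base file contents (illustrative sketch only).
    data = bytearray(base_data)
    if mode == 'bitflip':
        # Flip one random bit somewhere in the file.
        offset = random.randint(0, len(data) - 1)
        data[offset] ^= 1 << random.randint(0, 7)
    else:
        # Overwrite a handful of random offsets with random byte values.
        for _ in range(random.randint(1, 16)):
            data[random.randint(0, len(data) - 1)] = random.randint(0, 255)
    return str(data)

# Usage: read a base file, mutate it, write out a test case.
base = open('base.mp3', 'rb').read()
open('testcase.mp3', 'wb').write(mutate(base, 'bitflip'))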

Monitoring:


To monitor the target application for different types of crashes (access violations), Stupid uses pydbg (a pure-Python Windows debugger). It also uses the "utils" package of the https://github.com/pedramamini/paimei framework to collect crash information, which can later be used to identify and distinguish interesting application crashes. A sample crash synopsis file is shown below.
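
For readers who haven't used pydbg/PaiMei before, the monitoring side generally follows a pattern like the minimal sketch below (the target path and test case are hypothetical; this is not the actual Stupid source):

from pydbg import *
from pydbg.defines import *
import utils

crash_bin = utils.crash_binning.crash_binning()

def handle_av(dbg):
    # Only record second-chance exceptions, i.e. ones the target did not handle.
    if dbg.dbg.u.Exception.dwFirstChance:
        return DBG_EXCEPTION_NOT_HANDLED
    crash_bin.record_crash(dbg)
    print crash_bin.crash_synopsis()      # human readable crash summary
    dbg.terminate_process()
    return DBG_EXCEPTION_NOT_HANDLED

dbg = pydbg()
dbg.set_callback(EXCEPTION_ACCESS_VIOLATION, handle_av)
# Hypothetical paths: launch the player with one generated test case.
dbg.load(r'C:\Program Files\SomePlayer\player.exe', r'C:\fuzz\testcase.mp3')
dbg.run()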




Reproducing Crashes:


Crash files and crash information can be found in the "Crashes" folder and can be used to reproduce application crashes.

Saturday, April 19, 2014

Attacking Audio "reCaptcha" using Google's Web Speech API

I had a fun project a few months back where I had to deal with digital signal processing and low-level audio processing. I was never interested in DSP and other control system stuff, but when the question is about breaking things, everything becomes interesting :). In this post I'm going to share one technique to fully or partially bypass the reCAPTCHA test. This is not actually a vulnerability; it's better to call it "abuse of functionality".

Disclaimer: Please remember this information is for educational purposes only and should not be used for malicious purposes. I will not assume any liability or responsibility to any person or entity with respect to loss or damages incurred from information contained in this article.

1. What is Captcha

A CAPTCHA is a program that protects websites against bots by generating and grading tests that humans can pass but current computer programs cannot. The term CAPTCHA (for Completely Automated Public Turing Test To Tell Computers and Humans Apart) was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas Hopper and John Langford of Carnegie Mellon University.


2. What is Re-captcha

reCAPTCHA is a free CAPTCHA service from Google that helps to digitize books, newspapers and old-time radio shows. More details can be found here.


3. Audio reCaptcha

reCAPTCHA also comes with an audio test to ensure that blind users can freely navigate.

4. Main Idea: Attacking Audio reCaptcha using Google's Web Speech API Service





5. Google Web Speech API

Chrome has a really interesting feature: the HTML5 speech input API. Using it, a user can talk to the computer through a microphone and Chrome will interpret the speech. This feature is also available on Android devices. If you are not aware of it, you can find a live demo here:

https://www.google.com/intl/en/chrome/demos/speech.html

I was always very curious about Chrome's speech recognition API. I tried to sniff the API/voice traffic using Wireshark, but this API uses SSL. :(

So I started browsing the Chromium source code repo, and eventually found exactly what I wanted:

http://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/

It's pretty simple: first the audio is collected from the mic, then it is POSTed to a Google web service, which responds with a JSON object containing the results. The URL which handles the request is:

https://www.google.com/speech-api/v1/recognize

Another important thing: this API only accepts the FLAC audio format.

6. Programatically Accessing Google Web Speech API(Python)

The Python script below was written to send a FLAC audio file to the Google Web Speech API and print out the JSON response.

./google_speech.py hello.flac


'''
Accessing Google Web Speech API using Python
Author : Debasish Mandal

'''

import httplib
import sys

print '[+] Sending clean file to Google voice API'
f = open(sys.argv[1], 'rb')   # read the FLAC file as binary
data = f.read()
f.close()
google_speech = httplib.HTTPConnection('www.google.com')
google_speech.request('POST','/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US',data,{'Content-type': 'audio/x-flac; rate=16000'})
print google_speech.getresponse().read()
google_speech.close()



7. Thoughts on complexity of reCaptcha Audio Challenges

While dealing with audio reCAPTCHA you may have noticed that it basically serves two types of audio challenges. One is pretty clean and simple (example: https://dl.dropboxusercontent.com/u/107519001/easy.wav); the percentage of noise in this type of challenge is very low.

Another one is very, very noisy, and it is difficult even for a human to guess (example: https://dl.dropboxusercontent.com/u/107519001/difficult.wav). Constant hiss noise and overlapping voices make it really difficult for a human to crack. You may want to read this discussion on the complexity of audio reCAPTCHA.

In this post I will mainly cover the technique/tricks to solve the easier one using the Google Speech API. I've tried several approaches to solve the complex one, but as I've already said, it's very, very hard to guess the digits even for a human :(.

8. Cracking the Easy Captcha Manually Using Audacity and Google Speech API

Google reCAPTCHA allows the user to download audio challenges in MP3 format, and the Google Web Speech API accepts audio in FLAC format. But if we just convert the MP3 audio challenge to FLAC at a sample rate of 16000, it does not work :(. The speech-to-text API simply does not respond to this sound.

But after some experimenting and head scratching, I found that we can actually make the Google Web Speech API convert the easy captcha challenge to text for us if we process the audio challenge a little bit. In this section I will show how this audio manipulation can be done using Audacity.

To verify this manually, I'm first going to use a tool called Audacity to make the necessary changes to the downloaded MP3 file.

Step 1: Download the challenge as mp3 file.
Step 2: Open the challenge audio in Audacity.



Step 3: Copy the sound of the first spoken digit from the main window and paste it into a new window, so that we are left with the sound of a single digit.

Step 4: From the Effect menu, repeat it once (it should now speak the same digit twice).

For example, if the main challenge is 7 6 2 4 6, we now have only the first digit of the challenge in WAV format, with the digit 7 spoken twice.





Step 5: Export the updated audio in WAV format.
Step 6: Now convert the WAV file to FLAC format using the sox tool and send it to the Google speech server using the Python script posted in section 6. We will see something like this:

Note: In some cases a little amplification might be required if the voice strength is too low.

debasish@debasish ~/Desktop/audio/heart attack/final $ sox cut_0.wav -r 16000 -b 16 -c 1 cut_0.flac lowpass -2 2500
debasish@debasish ~/Desktop/audio/heart attack/final $ python send.py cut_0.flac 


Great! As you can see, the first digit of the audio challenge has been resolved by Google Speech. :) :) :) In the same manner we can solve the entire challenge. In the next section we will automate the same thing using Python and its wave module.

9. Automation using Python and its WAVE Module

Before we jump into processing raw WAV audio using the low-level Python API, it's important to have some idea of how digital audio actually works. In the process above we extracted the loudest voices using Audacity, but to do it automatically with Python, we must have some understanding of how digital audio is actually represented in numbers.

9.1. How is audio represented with numbers

There is an excellent Stack Overflow post which explains this. In short, audio is nothing but vibration, typically vibrations of air between approximately 20 Hz and 20,000 Hz, which means the air is moving back and forth 20 to 20,000 times per second. If we measure that vibration and convert it to an electrical signal using a microphone, we'll get an electrical signal whose voltage varies in the same waveform as the sound. In our pure-tone hypothetical, that waveform will match that of the sine function.

Now we have an analogue signal, the voltage. Still not digital. But we know this voltage varies between (for example) -1V and +1V. We can, of course, attach a volt meter to the wires and read the voltage. Arbitrarily, we'll change the scale on our volt meter: we'll multiply the volts by 32767, so it now calls -1V -32767 and +1V 32767, and it'll round to the nearest integer.

Now that we have a set of signed integers, we can easily draw a waveform from the data set:

X axis -> Time
Y axis -> Amplitude (signed integers)



Now, if we attach our volt meter to a computer and instruct the computer to read the meter 44,100 times per second, then add a second volt meter (for the other stereo channel), we have the data that goes on an audio CD. This format is called stereo 44,100 Hz, 16-bit linear PCM. And it really is just a bunch of voltage measurements.
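
As a quick side illustration (not part of the captcha work itself), here is how one second of a pure 440 Hz tone can be generated as exactly this kind of 16-bit linear PCM data with Python's wave module:

import math
import struct
import wave

# One second of a 440 Hz tone as 16-bit mono PCM at 44,100 Hz,
# i.e. 44,100 signed integer "volt meter readings".
rate = 44100
samples = []
for i in range(rate):
    value = int(32767 * math.sin(2 * math.pi * 440 * i / float(rate)))
    samples.append(struct.pack('<h', value))   # little-endian signed 16-bit

out = wave.open('tone.wav', 'w')
out.setparams((1, 2, rate, rate, 'NONE', 'not compressed'))  # mono, 2 bytes per sample
out.writeframes(''.join(samples))
out.close()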

9.2. WAVE File Format walk through using Python

As an example, let's open up a very small WAV file with a hex editor.
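
The bytes at the start of the file follow the standard RIFF/PCM header layout. As a rough cross-check (assuming a minimal canonical 44-byte header with no extra chunks before the data chunk), the same fields visible in the hex editor can be pulled out with Python's struct module:

import struct
import sys

# Parse the fixed 44-byte header of a canonical PCM WAV file.
with open(sys.argv[1], 'rb') as f:
    header = f.read(44)

riff, chunk_size, wave_id = struct.unpack('<4sI4s', header[0:12])
fmt_id, fmt_size, audio_fmt, channels, rate, byte_rate, align, bits = \
    struct.unpack('<4sIHHIIHH', header[12:36])
data_id, data_size = struct.unpack('<4sI', header[36:44])

print '[+] Chunk ID        :', riff       # should be "RIFF"
print '[+] Format          :', wave_id    # should be "WAVE"
print '[+] Audio format    :', audio_fmt  # 1 = uncompressed PCM
print '[+] Channels        :', channels
print '[+] Sample rate     :', rate
print '[+] Bits per sample :', bits
print '[+] Data chunk size :', data_size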

  

9.3. Parsing the same WAV file using Python

The wave module provides a convenient interface to the WAV sound format. It does not support compression/decompression, but it does support mono/stereo. Now we are going to parse the same WAV file using the Python wave module and try to relate it to what we just saw in the hex editor.

Let's write a python script:

import wave 
f = wave.open('sample.wav', 'r') 
print '[+] WAV parameters ',f.getparams() 
print '[+] No. of Frames ',f.getnframes() 
for i in range(f.getnframes()): 
    single_frame = f.readframes(1) 
    print '[+] Sample ', i, ' = ', single_frame.encode('hex')
f.close()

Line 1: Imports the Python wave module.
Line 2: Opens the sample.wav file.
Line 3: The getparams() routine returns a tuple (nchannels, sampwidth, framerate, nframes, comptype, compname), equivalent to the output of the get*() methods.
Line 4: getnframes() returns the number of audio frames.
Lines 5, 6, 7: Iterate through all the frames present in the sample.wav file and print them one by one.
Line 8: Closes the opened file.

Now if we run the script we will find something like this:

[+] WAV parameters (1, 2, 44100, 937, 'NONE', 'not compressed')
[+] No. of Frames 937
[+] Sample 0 = 62fe    <- Sample 1
[+] Sample 1 = 99fe   <- Sample 2
[+] Sample 2 = c1ff    <- Sample 3
[+] Sample 3 = 9000
[+] Sample 4 = 8700
[+] Sample 5 = b9ff
[+] Sample 6 = 5cfe
[+] Sample 7 = 35fd
[+] Sample 8 = b1fc
[+] Sample 9 = f5fc
[+] Sample 10 = 9afd
[+] Sample 11 = 3cfe
[+] Sample 12 = 83fe
[+] ....
and so on,

It should make sense now. In the first line we get the number of channels, sample width, frame/sample rate, total number of frames, etc., which is exactly what we saw in the hex editor (Section 9.2). From the second line it starts printing the frames/samples, which is also what we saw in the hex editor. Each sample is 2 bytes long because the audio is 16-bit; if it were 8-bit, each sample would only be one byte. We can use the getsampwidth() method to determine this. Also, getnchannels() will tell us whether it's mono or stereo.

Now it's time to decode each frame of the file. The values are actually little-endian, so we will modify the Python script a little bit to get the exact value of each frame. We can use the Python struct module to decode the frame values into signed integers.

import wave 
import struct 

f = wave.open('sample.wav', 'r') 
print '[+] WAV parameters ',f.getparams() 
print '[+] No. of Frames ',f.getnframes() 
for i in range(f.getnframes()): 
    single_frame = f.readframes(1) 
    sint = struct.unpack('<h', single_frame)[0]
    print "[+] Sample ",i," = ",single_frame.encode('hex')," -> ",sint 
f.close()

This script will print something like this:

[+] WAV parameters (1, 2, 44100, 937, 'NONE', 'not compressed')
[+] No. of Frames 937
[+] Sample 0 = 62fe -> -414
[+] Sample 1 = 99fe -> -359
[+] Sample 2 = c1ff -> -63
[+] Sample 3 = 9000 -> 144
[+] Sample 4 = 8700 -> 135
[+] Sample 5 = b9ff -> -71
[+] Sample 6 = 5cfe -> -420
[+] Sample 7 = 35fd -> -715
[+] Sample 8 = b1fc -> -847
[+] Sample 9 = f5fc -> -779
[+] Sample 10 = 9afd -> -614
[+] Sample 11 = 3cfe -> -452
[+] Sample 12 = 83fe -> -381
[+] Sample 13 = 52fe -> -430
[+] Sample 14 = e2fd -> -542

Now we can see that we have a set of positive and negative integers, so you should be able to connect the dots with what I explained in section 9.1.

So now, if we plot these positive and negative values in a graph, we will get the complete waveform. Let's do it using the Python matplotlib module.

import wave 
import struct 
import matplotlib.pyplot as plt 

data_set = [] 
f = wave.open('sample.wav', 'r') 
print '[+] WAV parameters ',f.getparams() 
print '[+] No. of Frames ',f.getnframes() 
for i in range(f.getnframes()): 
    single_frame = f.readframes(1)
    sint = struct.unpack('<h', single_frame)[0]
    data_set.append(sint) 
f.close() 
plt.plot(data_set) 
plt.ylabel('Amplitude')
plt.xlabel('Time') 
plt.show()

This should produce the following graph:

You must be familiar with this type of graph; it's what you see on SoundCloud, only more complex.

So now we have a clear understanding of how audio is represented in numbers, and it will be easier to understand how the Python script shared in the next section actually works.

9.4. Python Script

In this section we will develop a script which automates the steps we did using Audacity in Section 8: it tries to extract the loud voice segments from the input WAV file and generate separate WAV files.
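
The full script is available on the GitHub project page; what follows is only a minimal sketch of that idea, assuming a 16-bit mono input and using arbitrary threshold values (not the ones in the real script):

import struct
import wave

THRESHOLD = 4000    # minimum absolute amplitude that counts as "voice"
MIN_GAP   = 4000    # this many quiet frames in a row end a segment

src = wave.open('challenge.wav', 'r')
params = src.getparams()
frames = [src.readframes(1) for _ in range(src.getnframes())]
src.close()

segments, current, silence = [], [], 0
for frame in frames:
    amplitude = abs(struct.unpack('<h', frame)[0])
    if amplitude > THRESHOLD or (current and silence < MIN_GAP):
        current.append(frame)
        silence = 0 if amplitude > THRESHOLD else silence + 1
    elif current:
        segments.append(current)
        current, silence = [], 0
if current:
    segments.append(current)

for i, seg in enumerate(segments):
    out = wave.open('cut_%d.wav' % i, 'w')
    out.setparams(params)
    out.writeframes(''.join(seg) * 2)   # repeat the digit twice, as in the Audacity steps
    out.close()
print '[+] Wrote %d segments' % len(segments)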



Once the main challenge is broken into parts, we can easily convert each part to FLAC format and send it to the Google Speech API using the Python script shared in section 6.

9.5. Demo:



10. Attempt to Crack the Difficult(noisy) audio challenge

So we have successfully broken down the easy challenge. Now it's time to give the difficult one a try. I started with one noisy captcha challenge; you can see the matplotlib plot of the noisy audio challenge below.

In the figure above we can see the presence of a constant hiss noise. One of the standard ways to analyze sound is to look at the frequencies that are present in a sample. The standard way of doing that is with a discrete Fourier transform, computed using the fast Fourier transform (FFT) algorithm. What this basically does in this case is take a sound signal and isolate the frequencies of the sine waves that make up that sound.

10.1. Signal Filtering using Fourier Transform

Let's get started with a simple example. Consider a signal consisting of a single sine wave, s(t) = sin(w*t). Let the signal be subject to white noise added during measurement, S_measured(t) = s(t) + n. Let F be the Fourier transform of S_measured. By setting the value of F to zero for frequencies above and below w, the noise can be reduced. Let F_filtered be the filtered Fourier transform. Taking the inverse Fourier transform of F_filtered yields S_filtered(t).

The way to filter that sound is to set the amplitudes of the FFT values around X Hz to 0. In addition to filtering this peak, it's better to remove the frequencies below the human hearing range and above the normal human voice range. Then we recreate the original signal via an inverse FFT.
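
As a rough numpy sketch of that idea (the cut-off frequencies below are illustrative guesses, not the values used in the shared scripts, and a 16-bit mono WAV is assumed):

import struct
import wave

import numpy as np

LOW_CUT, HIGH_CUT = 300.0, 3000.0   # keep only a rough "voice band"

src = wave.open('noisy.wav', 'r')
rate, nframes = src.getframerate(), src.getnframes()
samples = np.array(struct.unpack('<%dh' % nframes, src.readframes(nframes)), dtype=float)
src.close()

spectrum = np.fft.rfft(samples)                       # time domain -> frequency domain
freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
spectrum[(freqs < LOW_CUT) | (freqs > HIGH_CUT)] = 0  # zero the bins outside the band
filtered = np.clip(np.fft.irfft(spectrum, len(samples)), -32768, 32767).astype('<h')

out = wave.open('filtered.wav', 'w')
out.setparams((1, 2, rate, len(filtered), 'NONE', 'not compressed'))
out.writeframes(filtered.tostring())
out.close()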

I have written a couple of scripts which successfully remove the constant hiss noise from the audio file, but the main challenge is the overlapping voice, which makes it very, very difficult even for a human to guess the digits. Although I was not able to successfully crack any of the difficult challenges using the Google Speech API, I've still shared a few noise removal scripts (using the Fourier transform).

These scripts can be found on the GitHub project page. There is tons of room for improvement in all of these scripts.

11. Code Download

All the code I've written during this project is hosted here:

12. Conclusion

When I reported this issue to the Google security team, they confirmed that this mechanism is working as intended. The more difficult audio patterns are only triggered when abuse or non-human interaction is suspected. So, as per the email communication, nothing is going to be changed to stop this.

Thanks for reading. I hope you have enjoyed it. Please drop me an email or comment in case of any doubt or confusion.

13. References

http://rsmith.home.xs4all.nl/miscellaneous/filtering-a-sound-recording.html
http://www.topherlee.com/software/pcm-tut-wavformat.html
http://exnumerus.blogspot.in/2011/12/how-to-remove-noise-from-signal-using.html
http://www.swharden.com/blog/2009-01-21-signal-filtering-with-python/

Sunday, March 16, 2014

In-Memory Kernel Driver(IOCTL)Fuzzing using Python

I'm sharing one of my kernel driver IOCTL fuzzers, which operates completely from user land. To run this script you should know at least one process which sends IOCTLs to the target device you are fuzzing.


This script is very simple and straightforward. It basically operates in two modes: an in-memory fuzzing mode and a logging mode.

In fuzzing mode it attaches itself to the given user mode process and hooks kernel32!DeviceIoControl. After that, whenever DeviceIoControl is called by the process, it fuzzes the input/output buffer length, input buffer content, etc. in memory, and at the same time logs the actual and mutated buffer lengths/contents to an XML log file, which can be helpful when reproducing OS crashes.

When running in logging mode it tries to dump every I/O control code, I/O buffer pointer, and I/O buffer length that the given process sends to the kernel mode device. This XML log can then be used to fuzz the driver further.
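
Conceptually, the fuzzing mode boils down to a pydbg breakpoint on DeviceIoControl that reads (and, in fuzzing mode, rewrites) the call's arguments before letting it continue. The sketch below only illustrates that pattern with a hypothetical PID and a naive mutation; it is not the actual iofuzz source:

import random
import struct

from pydbg import *
from pydbg.defines import *

def on_deviceiocontrol(dbg):
    # At function entry (x86 stdcall), [ESP+8] onwards holds dwIoControlCode,
    # lpInBuffer and nInBufferSize.
    esp = dbg.context.Esp
    ioctl, in_buf, in_len = struct.unpack('<LLL', dbg.read_process_memory(esp + 8, 12))
    print '[+] IOCTL 0x%08x  in_buf=0x%08x  in_len=%d' % (ioctl, in_buf, in_len)
    if in_buf and in_len:
        # Fuzz: overwrite the input buffer in-place with random bytes.
        fuzzed = ''.join(chr(random.randint(0, 255)) for _ in range(in_len))
        dbg.write_process_memory(in_buf, fuzzed)
    return DBG_CONTINUE

dbg = pydbg()
dbg.attach(1234)    # hypothetical PID of a process known to talk to the driver
addr = dbg.func_resolve_debuggee('kernel32.dll', 'DeviceIoControl')
dbg.bp_set(addr, handler=on_deviceiocontrol)
dbg.run()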

Download:


This tool can be downloaded from my github page : iofuzz

Source Code:



Friday, February 28, 2014

Reversing A Tiny Built-In Windows Kernel Module [Journey from Kernel32 to HAL]

Hello readers, hope you are doing great. In this post I am going to explore our very own Windows kernel a little bit by reverse engineering a built-in kernel module. If you have ever developed a kernel driver/module for Windows, it will be very easy for you to understand. If you are not very familiar with how device drivers work, then codeproject.com has some really good resources to start with. So let's get started.

https://www.cr0.org/paper/to-jt-party-at-ring0.pdf


But before we start reversing the core components, take a look at the diagram below.




In the picture above, the green elements are user mode components (Ring 3). The diagram shows how apps.exe (a user mode application) calls the kernel mode driver. We will try to cover each and every section mentioned in the diagram and reverse them, to understand how things work.

Choosing a Target Driver to Reverse:


It's always better to start with something simple. In this article we will reverse the Beep driver. The Beep driver component provides the beep driver in the beep.sys file, along with some supporting registry information. This is probably the smallest built-in kernel module in the Windows OS: it has only 6 routines.

First Step : Building apps.exe


We will start with section 1, writing a very basic C program. [beep.c]

#include<windows.h>
int main(){
     Beep( 50, 750 );
     return 0;
}

The code is very simple and straightforward. You can see it calls the Beep function, which produces a beep sound. The Beep function resides in kernel32.dll. After compiling the code you should get a beep.exe file; running beep.exe should generate a beep sound. The prototype is:

BOOL WINAPI Beep(
  _In_  DWORD dwFreq,
  _In_  DWORD dwDuration
);

Locating the Driver\Device:


From the screenshot below we can see that the Beep driver has actually created a device called "Beep". You can also see a lot of other information, such as the major functions supported by this driver.



Sniffing all I/O Request Packets (IRP) to Beep Device using IRP Tracker utility:


IRP Tracker is a very cool and powerful utility. It can sniff the Ring 3/Ring 0 gateway and show details of the messages passed between a user mode process and a kernel driver. Using this tool we are going to sniff all the requests which beep.exe sends from ring 3 to ring 0 to produce the beep sound.

OK, so to start sniffing we need to give the tool the name of the driver we want to sniff: go to File and select the driver, in this case the "Beep" driver. Now that we are sniffing all messages between user and kernel, it's time to execute the beep.exe we just compiled. When you execute the beep.exe file you will see a few new entries in the IRP Tracker window.


Now if you look at the IRP Address Sequence Number column, you will see the first entry is NtCreateFile() and the last entry is NtClose(). In between them you can see NtDeviceIoControlFile being called. If you look at the major function column you will find "DEVICE_CONTROL", and if you look in MSDN you will find that this API is used to send IOCTL codes from user land to a kernel driver.

NTSTATUS WINAPI NtDeviceIoControlFile(
  _In_   HANDLE FileHandle,
  _In_   HANDLE Event,
  _In_   PIO_APC_ROUTINE ApcRoutine,
  _In_   PVOID ApcContext,
  _Out_  PIO_STATUS_BLOCK IoStatusBlock,
  _In_   ULONG IoControlCode,
  _In_   PVOID InputBuffer,
  _In_   ULONG InputBufferLength,
  _Out_  PVOID OutputBuffer,
  _In_   ULONG OutputBufferLength
);

The IRP Tracker utility can also show us the IOCTL code the user mode application is sending to the kernel driver.


We can see it's sending IOCTL code 0x10000 (BEEP_SET) to that device. Keep this in mind; we are going to come back to it in a minute.

Reversing Beep() [Kernel32.dll]


To reverse the Beep routine, let's load kernel32.dll in IDA Pro. After it's loaded, jump into the Beep routine and you should see something like this.


We can see it's trying to communicate with the device \\Device\\Beep by calling NtCreateFile. When communicating with a kernel mode driver, a user land application uses NtCreateFile; if successful, this function returns a handle to the target device. Using that handle we can read from / write to that device. We will come back to this later.

If we go a little further inside kernel32!Beep, we can see it verifies a few parameters passed to it and then calls NtDeviceIoControlFile().





I hope you are able to connect the dots now. If you remember, it's the same sequence of function calls you saw in the IRP Tracker utility, and we already know that NtDeviceIoControlFile is used for sending IOCTL codes to a kernel driver.

Reversing ntdll.dll [NtDeviceIoControlFile()]


Now we will take a closer look at the call to NtDeviceIoControlFile. We have seen in MSDN that the 6th parameter of NtDeviceIoControlFile is the IOCTL code.

NTSTATUS WINAPI NtDeviceIoControlFile(
  _In_   HANDLE FileHandle,
  _In_   HANDLE Event,
  _In_   PIO_APC_ROUTINE ApcRoutine,
  _In_   PVOID ApcContext,
  _Out_  PIO_STATUS_BLOCK IoStatusBlock,
  _In_   ULONG IoControlCode,
  _In_   PVOID InputBuffer,
  _In_   ULONG InputBufferLength,
  _Out_  PVOID OutputBuffer,
  _In_   ULONG OutputBufferLength
);

Let's verify that in the NtDeviceIoControlFile routine.


Hopefully you are able to connect the dots now. In the image above you can see that it first loads 10000h into the EBX register and then passes it to NtDeviceIoControlFile(). This is the same IOCTL code we saw in the IRP Tracker utility.

Now let's attach Immunity Debugger to beep.exe and set a breakpoint at NtDeviceIoControlFile(). After setting up the breakpoint, if we continue execution we should break at this point, as shown in the screenshot below.



If you look at the entry point of NtDeviceIoControlFile() routine, you will see below instruction,

MOV EAX,42
MOV EDX,7FFE0300
CALL DWORD PTR DS:[EDX]
RETN 28

From this sequence of instructions we can understand that it's probably going to make a system call. If we go further and follow the CALL DWORD PTR DS:[EDX] instruction, we will see something like this.



LEA EDX,DWORD PTR SS:[ESP+8]
INT 2E

From this INT 2E it's now absolutely clear that it's going to invoke a software interrupt. EAX holds 0x42, so we can say this is the system call number. We can verify this with any SSDT dumping utility; in this case I've used the IceSword tool to dump the System Service Descriptor Table.



You can see that system call 0x42 points to the kernel version of NtDeviceIoControlFile(), which resides in the main Windows kernel component, ntoskrnl.exe. After the interrupt is invoked, the OS switches to kernel mode to execute the system service, and KiSystemService dispatches the call.
The 'int' instruction causes the CPU to execute a software interrupt: it goes into the Interrupt Descriptor Table at index 2E and reads the Interrupt Gate Descriptor at that location. The CPU switches automatically to the kernel-mode stack and automatically saves the user-mode program's SS, ESP, EFLAGS, CS and EIP registers on the kernel-mode stack.

More Details Here

Breaking the Beep Driver(Beep.sys):


Up to this point we have seen how the user land application sends a request to the kernel driver. Now we will see how the kernel driver actually processes the user land application's request and acts accordingly. For this we will reverse the beep.sys file, which is the main driver PE file.

After loading the driver into IDA you should first see the DriverEntry subroutine. We know DriverEntry is the first routine called after a driver is loaded; since it's responsible for initializing the driver, we should find all the IOCTL handler functions registered in this DriverEntry. The first thing you will see is a call to IoCreateDevice().


NTSTATUS IoCreateDevice(
  _In_      PDRIVER_OBJECT DriverObject,
  _In_      ULONG DeviceExtensionSize,
  _In_opt_  PUNICODE_STRING DeviceName,
  _In_      DEVICE_TYPE DeviceType,
  _In_      ULONG DeviceCharacteristics,
  _In_      BOOLEAN Exclusive,
  _Out_     PDEVICE_OBJECT *DeviceObject
);

The IoCreateDevice() routine creates a device object for use by a driver. You should see a call to this function in every driver's DriverEntry routine.

Now the \Device\Beep device object has been created. Going further into DriverEntry, we find a structure like this.



The equivalent C code would be something like this:

DriverObject->DriverStartIo = sub_1051A;
DriverObject->DriverUnload = DriverUnload;
DriverObject->MajorFunction[0] = sub_1046A;
DriverObject->MajorFunction[2] = sub_104B8;
DriverObject->MajorFunction[14] = sub_10400;
DriverObject->MajorFunction[18] = sub_10354;

More practical idea about IRP Major Functions can be found here

Digging further into all of the IRP handlers mentioned above (sub_xxxx), it was found that sub_10354 is actually responsible for handling all I/O controls. So we can conclude:

DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = Beephandler;

Now let's jump into sub_10354 in IDA Pro and have a look at what it does. Inside sub_10354 you should see something like this.


At a high level you should notice calls to the following functions: KeRemoveDeviceQueue, KfRaiseIrql, IoAcquireCancelSpinLock, IoReleaseCancelSpinLock, etc. Right now we are not very interested in any of these; if you are, you can explore MSDN. The call we are interested in is HalMakeBeep.

Call To HalMakeBeep

Reversing the HAL.dll:


Now we will jump into the HalMakeBeep routine, which actually resides in hal.dll.

The Windows Hardware Abstraction Layer (HAL) is implemented in hal.dll. Hardware abstractions are sets of routines in software that emulate some platform-specific details, giving programs direct access to the hardware resources. They often allow programmers to write device-independent applications by providing standard Operating System (OS) calls to hardware. Each type of CPU has a specific instruction set architecture or ISA. One of the main functions of a compiler is to allow a programmer to write an algorithm in a high-level language without having to care about CPU-specific instructions. Then it is the job of the compiler to generate a CPU-specific executable. The same type of abstraction is made in operating systems, but OS APIs now represent the primitive operations of the machine, rather than an ISA. This allows a programmer to use OS-level operations (i.e. task creation/deletion) in their programs while still remaining portable over a variety of different platforms. [Source: Wiki]

So to look at the assembly of HalMakeBeep we have to load hal.dll in IDA. After loading hal.dll, jump into the HalMakeBeep routine; you should see a lot of inline assembly inside this function.




Every PC has an internal speaker which can generate beeps of different frequencies. We can control the speaker by providing a frequency number, which defines the pitch of the beep, and then turning the speaker on for the duration of the beep.

https://courses.engr.illinois.edu/ece390/books/labmanual/io-devices-speaker.html


The frequency number we provide is nothing but a counter value. The computer uses it to determine how long to wait between sending pulses to the internal speaker: a smaller counter value causes the pulses to be sent quicker, resulting in a higher pitch. In other words, the frequency number tells the PC how many timer cycles to wait before sending another pulse.

We communicate with the speaker controller mainly using IN and OUT instructions. Below are the main steps in generating a beep:

  1. First we need to send the value 182 to port 43h. This sets the speaker up.
  2. Next, send the frequency number to port 42h. Since this is an 8-bit port, we must use two OUT instructions: send the least significant byte first, then the most significant byte.
  3. After that, to start the beep sound, bits 1 and 0 of port 61h must be set to 1. Since the other bits of port 61h have other uses, they must not be modified. Therefore, use an IN instruction first to get the value from the port, then OR in the two bits, then use an OUT instruction to send the new value back to the port.
  4. Pause for the duration of the beep.
  5. We can turn off the beep by resetting bits 1 and 0 of port 61h to 0. Again, since the other bits of this port must not be modified, read the value, clear just bits 1 and 0, then output the new value.


So now, if we look at the HalMakeBeep routine, it should make sense: you can see that it's doing the same thing just described above.

Thanks for reading. I hope you have enjoyed this post. If you believe I got something wrong anywhere and you want me to correct it, or I've missed something, please drop an email or comment below.

cheers,

References:


MSDN: http://msdn.microsoft.com/
Wiki : http://en.wikipedia.org/wiki/Hal.dll
https://courses.engr.illinois.edu/ece390/books/labmanual/io-devices-speaker.html

Tuesday, February 11, 2014

Building Assembly Control Flow Graph(CFG) at Runtime for Reverse Engineering Using Python

A control flow graph (CFG) in computer science is a representation, using graph notation, of all paths that might be traversed through a program during its execution. In this post I'm going to share a Python tool which I wrote a few days back to build the control flow graph of any function at run time very quickly.

What it Does?

This tool helps you visualize any function's control flow graph at the time of its execution. It also gives dereference information for the executed instructions.



To build the CFG of any function, you need to provide the entry point and exit point of the particular function you want to analyze. In the last part of this post I've posted a video which demonstrates how to use this tool.

How it's gonna help?


This tool can help you reverse complex functions by creating a control flow graph at runtime, so it reduces reverse engineering effort a lot in many cases. It also gives you dereference information for each and every instruction executed; from this information you can easily find out, at any given point, which register points to which place (stack/heap).
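
Under the hood the general idea looks roughly like the sketch below: break on the function's entry point, single-step until the exit point is reached, disassemble each executed instruction with pydasm, and record the executed addresses that later become the nodes and edges of the graph. This is a simplified illustration with hypothetical addresses and PID, not the actual visdasm source:

from pydbg import *
from pydbg.defines import *
import pydasm

ENTRY, EXIT = 0x00401000, 0x00401050    # hypothetical entry/exit addresses
trace = []                              # (address, disassembly) of executed instructions

def on_entry(dbg):
    dbg.single_step(True)               # start stepping once the entry point is hit
    return DBG_CONTINUE

def on_step(dbg):
    eip = dbg.exception_address
    raw = dbg.read_process_memory(eip, 16)
    ins = pydasm.get_instruction(raw, pydasm.MODE_32)
    trace.append((eip, pydasm.get_instruction_string(ins, pydasm.FORMAT_INTEL, eip)))
    if eip == EXIT:
        dbg.single_step(False)
        for addr, text in trace:
            print '0x%08x  %s' % (addr, text)
        dbg.detach()
    else:
        dbg.single_step(True)           # keep stepping
    return DBG_CONTINUE

dbg = pydbg()
dbg.attach(1234)                        # hypothetical PID
dbg.bp_set(ENTRY, handler=on_entry)
dbg.set_callback(EXCEPTION_SINGLE_STEP, on_step)
dbg.run()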

Sample Control Flow Graph Generated by visdasm:


http://htmlpreview.github.io/?https://github.com/debasishm89/visdasm/blob/master/Report.html

Download:


This tool is available for download at my Github page:

How to Use this tool?[Video Demo]





This tool uses below libraries:

  1. Pydbg
  2. Pydasm
  3. Jquery [For control flow graph]
  4. JqueryUI [For control flow graph]
  5. PlumberJS[For control flow graph]
Last Words:

I haven't tested this script much, and I am modifying this tool every day, so in some cases it may throw dirty errors.