Wednesday, January 25, 2012

Bypass Captcha using Python and Tesseract OCR engine

A CAPTCHA is a type of challenge-response test used in computing as an attempt to ensure that the response is generated by a person. The process usually involves one computer (a server) asking a user to complete a simple test which the computer is able to generate and grade.The term "CAPTCHA" was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford (all of Carnegie Mellon University). It is an acronym based on the word "capture" and standing for "Completely Automated Public Turing test to tell Computers and Humans Apart".

In this post I am going to tell you guys how to crack weak captcha s using python and Tesseract OCR engine.Few days back I was playing around with an web application.The application was using a captcha as an anti automation technique when taking users feedback.

First let me give you guys a brief idea about how the captcha was working in that web application.
Inspecting the captcha image I have found that the form loads the captcha image in this way:
<img src=""> 
From this you can easily understand that the “captcha.php” file returns an image file.
If we try access the url each and every time it generates an image with a new random digit.
To make this clearer to you, Let me give you an example
Suppose after opening the feedback form you got few text fields and a captcha.Suppose at a certain time the captcha loaded with a number for ex. "4567".
So if you use that code "4567" the form will be submitted successfully.

Now the most interesting thing was if you copy the captcha image url (which is in this case) and open the image in new tab of same browser ,the cpatcha will load with a different number as I have told you earlier. Suppose you have got "9090" this time. Now if you try to submit the feedback form with the number that’s was loaded earlier with the feedback form( which was "4567" )the application will not accept that form. If you enter “9090” then the application will accept that form.
For more clear idea I have created this simple Fig.

Now my strategy to bypass this anti automation techniques was
1)Download the image only from 
2)Feed that image to OCR Engine
3)Craft an http POST request with all required parameter and the decoded captcha code, and POST it.

Now what is happening here??

When you are requesting the image file, the server will do steps 1 to 5 as shown in figure.
Now when we are posting the http request, the server will match the received captcha code with the value that was temporarily stored. Now the code will definitely match and server will accept the form.

Now I have used this Python Script to automated this entire process.

from PIL import Image
import ImageEnhance
from pytesser import *
from urllib import urlretrieve
def get(link):
im ="temp.png")
nx, ny = im.size
im2 = im.resize((int(nx*5), int(ny*5)), Image.BICUBIC)"temp2.png")
enh = ImageEnhance.Contrast(im)
enh.enhance(1.3).show("30% more contrast")
imgx ='temp2.png')
imgx = imgx.convert("RGBA")
pix = imgx.load()
for y in xrange(imgx.size[1]):
    for x in xrange(imgx.size[0]):
        if pix[x, y] != (0, 0, 0, 255):
            pix[x, y] = (255, 255, 255, 255)"bw.gif", "GIF")
original ='bw.gif')
bg = original.resize((116, 56), Image.NEAREST)
ext = ".tif""input-NEAREST" + ext)
image ='input-NEAREST.tif')
print image_to_string(image)

Here I am only posting code of OCR engine. If your are a python lover like me you can use "httplib" python module to do the rest part.This script is not idependent. pytesser python module is requred to run this script.PyTesser is an Optical Character Recognition module for Python. It takes as input an image or image file and outputs a string.
PyTesser uses the Tesseract OCR engine, converting images to an accepted format and calling the Tesseract executable as an external script.

You can get this package @

The script works in this way.
1)First the script will download the captcha image using python module "urlretrive"
After that It will try to clean backgroug noises.

2)When this is done the script will make the image beigger to better understading.
3)At last it will feed that processed image to OCR engine.
Here is another python script which is very useful while testing captchas.You can add these line to your script if the taget captcha image is too small.This python script can help you to change resolution of any image.

from PIL import Image
import ImageEnhance

im ="test.png")
nx, ny = im.size
im2 = im.resize((int(nx*5), int(ny*5)), Image.BICUBIC)"final_pic.png")
enh = ImageEnhance.Contrast(im)
enh.enhance(1.3).show("30% more contrast")

Thanks for reading.I hope It was helpful.Feel free to share and drop comments.

Friday, January 13, 2012

Basic Reverse Engineering with GDB

In computers, debugging is the process of locating and fixing or bypassing bugs (errors) in computer program code or the engineering of a hardware device.Debugging is the Fundamentals part of Exploit Development .When you are writing an exploit you are going to need to be able to execute the code in your target application in a variety of different ways, to give you the appropriate amount of control to monitor the code and memory closely when needed. You may want to run normally at one point, to go step by step through each individual instruction at another, and sometimes to have it run quickly to a particular point allowing you to take control once that point is reached.
Luckily, this is all possible via the use of a debugger by using breakpoints as well as the various methods for stepping through code.In this article will try to describe most common features of GDB.First we will take a simple C program.Compile it, And after that break it with GDB.

GDB, the GNU Project debugger, allows you to see what is going on `inside' another program while it executes -- or what another program was doing at the moment it crashed.

GDB can do four main kinds of things (plus other things in support of these) to help you catch bugs in the act:

Start your program, specifying anything that might affect its behavior.
Make your program stop on specified conditions.
Examine what has happened, when your program has stopped.
Change things in your program, so you can experiment with correcting the effects of one bug and go on to learn about another.

After some basic debugging we will use some portable Linux based tools to gather more information about a Linux Executable.

So here we will debug this simple C program using gdb.

int my_function(wchar_t *a)
        return wprintf(a);
int main()
        return my_function(L"Hello World!\n");

First of all we will use gcc compiler to compile the C prog.

debasish@debasish-desktop:~$ nano MYprog.c
debasish@debasish-desktop:~$ gcc -o MYprog MYprog.c
MYprog.c:2:18: warning: extra tokens at end of #include directive
debasish@debasish-desktop:~$ ./MYprog
Hello World!
debasish@debasish-desktop:~$ ^C

So we have successfully compiled our C program and its working fine.

Now we will debug this program with gdb debugger.We will use following commands.

debasish@debasish-desktop:~$ gdb MYprog
GNU gdb (GDB) 7.1-ubuntu
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
For bug reporting instructions, please see:
Reading symbols from /home/debasish/MYprog...(no debugging symbols found)...done.

So now gdb will load the program and at entry point it will pause the execution.
Then we will use the command "start" to start the debugging process.

(gdb) start
Temporary breakpoint 1 at 0x804841a
Starting program: /home/debasish/MYprog

Temporary breakpoint 1, 0x0804841a in main ()

We can see that is showing the break point  is at 0x0804841a.

Now we will use the command "layout asm" to see the assembly code in a proper order.

Now you should get a window like this.

and $0xfffffff0,%esp ¦0x804841d
sub $0x10,%esp ¦0x8048420
movl $0x80484f0,(%esp) ¦0x8048427
call 0x8048404 ¦0x804842c
leave ¦0x804842d
ret ¦0x804842e nop ¦0x804842f nop ¦0x8048430 <__libc_csu_fini> push %ebp ¦0x8048431 <__libc_csu_fini+1> mov %esp,%ebp ¦0x8048433 <__libc_csu_fini+3> pop %ebp ¦0x8048434 <__libc_csu_fini+4> ret ¦0x8048435 lea 0x0(%esi,%eiz,1),%esi ¦0x8048439 lea 0x0(%edi,%eiz,1),%edi ¦0x8048440 <__libc_csu_init> push %ebp ¦0x8048441 <__libc_csu_init+1> mov %esp,%ebp ¦0x8048443 <__libc_csu_init+3> push %edi ¦0x8048444 <__libc_csu_init+4> push %esi ¦0x8048445 <__libc_csu_init+5> push %ebx

Now in extreme left side the address shown, is the virtual address. The ">" sign indicates that the Break point is at 0x804841a.Which is our main function.

The first instruction is 
sub    $0x10,%esp
This will substructure the 10 from the ESP.
Next move instruction takes the value $0x80484f0 and put it in stack.We all know that Stack grows downward in memory!
Now more interestingly if you look at the 2nd line of the code you can see $0x80484f0 is the starting address of the string Hello World.
To validate that we can use this command.

(gdb) printf "%s\n",0x80484f0

Now it will return the first character of our string that is a H.
One thing to note that GDB cant print wide character to it will just return "H".

Now its obvious that adding 4 with this we will get our next character.

And adding more bytes will give our full string "Hello World"

Now step by step execution of assembly instructions is very important while trying to understand flow of any program.We can do this using "si" command."si" stands for "step into". When si is entered gdb will execute the next instruction just after break point.

Cont is another gdb command which can be used to run rest of the instructions at a time.

Now when playing with debugger its very important that at the same time you look at the status of the stack and registers.In interactive disassembler like Immunity,Olly debug in windows you can just easily monitor them.But for a command line debugger it will be not that easy.
At any point of time when you wanna check any register content you can do this just by using the command "print"
so to check the value at which EAX is pointing we have to enter 

"print $eax"

There are more in gdb. Hopefully I will write another article on it.
One other tool that can be very useful for  reverse engineering Linux based prog is "hexdump"

Use the hexdump tool with -C option will dump raw hex dump of executable.Which we usually get at the lower left corner in case of Immunity debugger or Ollydebug.

Now if you wanna see first 16 bytes of the executable then you can use the option -n.

For example 

hexdump -C -n 16 MYprog

This will print the header part of executable.
The command "file" also can be used to retrieve some useful information about any executable.

readelf -h Myprog

This command will give the header information of this executable in detail.This will also retrieve the program entry pint.

ndisasm is another cool tool comes with Ubuntu using that you can actually disassemble the binary.
ndisasm -u -o 0x[entry-point] -e 0x320 MYprog | less

the option -e will escape fist 320 bytes.Which is nothing but the header part.

But if you notice you can see this is not the code we have just seen in gdb.

The reason is it the entry point.The code present here is used by the application for setting up the stack.

Now after this following instructions when stack is already configured ,if we jump at the address 0x8048358 we can have the assembly code we just saw in gdb.

08048395  51                push ecx
08048396  56                push esi
08048397  6817840408        push dword 0x8048417
0804839C  E8B7FFFFFF        call dword 0x8048358

Look at the screen shot [red marked]. After the NOP sleds we can see the codes we have just seen in gdb.

It was the most fundamental of debugging linux application.I hope it was helpful.I will try to write more on gdb later on.

A Meeting with Dr. Watson(Debugging Dead Locks)

Dr. Watson is an application debugger included with the Microsoft Windows operating system. It may be named drwatson.exe, drwtsn32.exe or dwwin.exe, depending on the version of Windows.Dr. Watson for Windows is a program error debugger that gathers information about your computer when an error (or user-mode fault) occurs with a program.Dr. Watson creates a text file that is Drwtsn32.log.

By default, the log file created by Dr. Watson is named Drwtsn32.log and is saved in the following location:
drive:\Documents and Settings\All Users.WINNT\Application Data\Microsoft\Dr Watson
Virtual Memory
On Windows NT and above the memory is divided into two parts, the user mode and the kernel mode. What differentiates user mode from kernel mode is the privilege level. The primary memory restriction placed on user-mode programs is that they cannot access any of the kernel-mode memory.

Although a user-mode program can try to directly communicate with a hardware device, the system prevents it from doing so. You probably have seen the result of such an attempt, the point where Dr. Watson pops up.
A kernel-mode component must determine whether an exception is the result of a legal or an illegal operation; when a kernel-mode component catches an illegal exception, it notifies the Dr. Watson user-mode application.

kernel-mode device drivers and subsystems can almost do anything they want. This lack of protection also emphasize the need to take care when loading a third-party device driver, because once in kernel mode the software has complete access to all OS data.
To configure Dr. Watson, follow these steps:
Click Start, and then click Run.
Type drwtsn32, and then click OK.
Types of Memory Dump Files

When Stop errors occur, the system automatically dumps the contents of its RAM to the paging file, and then writes the pagefile contents to the systemdrive root by default. Analyzing the dump file can help provide more information about the root cause of a problem.

Small memory dump
Small memory dump files contain the least information, but consume the least disk space, 64 kilobytes (KB). Small memory dump files are sometimes referred to as "mini" dump files.

Kernel memory dump
This is an intermediate size dump file that records only kernel-level memory and can occupy several megabytes (MB) of disk space. When a Stop error occurs, Windows saves a kernel memory dump file to a file named systemroot\Memory.dmp and create a small memory dump file in the systemroot\Minidump folder

Complete memory dump
A complete memory dump file contains the entire contents of physical memory when the Stop error occurred. The file size is equal to the amount of physical memory installed plus 1 MB.
When a Stop error occurs, the operating system saves a complete memory dump file to a file
named systemroot\Memory.dmp and creates a small memory dump file in the systemroot\Minidump folder.

How to generate a memory dump file using Dr.Watson:

Register Dr.Watson as the default debugger in the operating system:

Go to Start > Run > type drwtsn32 -i and press Enter.

Check that the program has been successfully registered as the default debugger and click OK.

Configure Dr.Watson:

Go to Start > Run > type drwtsn32 and press Enter to start Dr.Watson.

Select a folder for saving log file in the Log File Path field, e.g. ะก:\drwtsn.

Select the same folder in the Crash Dump field.

Select Full in the Crash Dump Type section.

Click OK.

Check that the full dump file has been successfully generated.

DeadLock Situation

A deadlock is a situation in which two computer programs sharing the same resource are effectively preventing each other from accessing the resource, resulting in both programs ceasing to function.

Program 1 requests resource A and receives it.
Program 2 requests resource B and receives it.
Program 1 requests resource B and is queued up, pending the release of B.
Program 2 requests resource A and is queued up, pending the release of A.

Now neither program can proceed until the other program releases a resource. The operating system cannot know what action to take. At this point the only alternative is to abort (stop) one of the programs.

Debugging Dead Lock Situation with Windump:

When you reach a deadlock, the PC appears to hang. With a userdump you can get the information to resolve this problem.

WinDbg !locks command will examine process critical section list and display all locked critical sections, lock count and thread id of current critical section owner.

0:000> !locks
CritSec NTDLL!LoaderLock+0 at 784B0348
LockCount          4
RecursionCount     1
OwningThread       624
EntryCount         6c3
ContentionCount    6c3
*** Locked

CritSec LOCALSPL!SpoolerSection+0 at 76AB8070
LockCount          3
RecursionCount     1
OwningThread       1c48
EntryCount         646
ContentionCount    646
*** Locked

If we look at threads #624 and #1c48 we could see them mutually waiting for each other:
If we look at threads #624 and #1c48 we could see them mutually waiting for each other:

TID#624 owns CritSec 784B0348 and is waiting for CritSec 76AB8070

TID#1c48 owns CritSec 76AB8070 and is waiting for CritSec 784B0348


. 12 Id: bc0.624 Suspend: 1 Teb: 7ffd3000 Unfrozen
0000024c 00000000 00000000 NTDLL!ZwWaitForSingleObject+0xb
76ab8000 76a815ef 76ab8070 NTDLL!RtlpWaitForCriticalSection+0×9e
76ab8070 76a844f8 00cd1f38 NTDLL!RtlEnterCriticalSection+0×46
00cd1f38 76a8a1d7 00000000 LOCALSPL!EnterSplSem+0xb
00000000 00000000 00cd1f38 LOCALSPL!FindSpoolerByNameIncRef+0×1f
00000000 777f19bc 00000001 LOCALSPL!LocalGetPrinterDriverDirectory+0xe
00000000 777f19bc 00000001 spoolss!GetPrinterDriverDirectoryW+0×59
00000000 777f19bc 00000001 spoolsv!YGetPrinterDriverDirectory+0×27
00000000 777f19bc 00000001 WINSPOOL!GetPrinterDriverDirectoryW+0×7b
50000000 00000001 00000000 BRHLUI04+0×14ea
50002ea0 50000000 00000001 BRHLUI04!DllGetClassObject+0×1705
00000000 00000000 000cb570 NTDLL!LdrpRunInitializeRoutines+0×1df
000cc8f8 0288ea30 0288ea38 NTDLL!LdrpLoadDll+0×2e6
000cc8f8 0288ea30 0288ea38 NTDLL!LdrLoadDll+0×17)
000c1258 00000000 00000008 KERNEL32!LoadLibraryExW+0×231
000c150c 0288efd8 00000000 UNIDRVUI!PLoadCommonInfo+0×17e
000c150c 0288efd8 00000007 UNIDRVUI!DwDeviceCapabilities+0×1a
00070000 00071378 00000045 UNIDRVUI!DrvDeviceCapabilities+0×19

. 13 Id: bc0.1c48 Suspend: 1 Teb: 7ffd2000 Unfrozen
0000010c 00000000 00000000 NTDLL!ZwWaitForSingleObject+0xb
784b0301 78468d38 784b0348 NTDLL!RtlpWaitForCriticalSection+0×9e
784b0348 74fb4344 00000000 NTDLL!RtlEnterCriticalSection+0×46
74fb0000 02c0f2a8 00000000 NTDLL!LdrpGetProcedureAddress+0×122
74fb0000 02c0f2a8 00000000 NTDLL!LdrGetProcedureAddress+0×17
74fb0000 74fb4344 02c0f449 KERNEL32!GetProcAddress+0×41
017924b0 00000000 00000001 ws2_32!CheckForHookersOrChainers+0×1f
00000101 02c0f344 017924b0 ws2_32!WSAStartup+0×10f
00cdf79c 02c0f4f4 76a8c9bc LOCALSPL!GetDNSMachineName+0×1e
00000000 76a8c9bc 780276a2 LOCALSPL!GetPrinterUrl+0×2c
0176f570 ffffffff 01000000 LOCALSPL!UpdateDsSpoolerKey+0×322
0176f570 76a8c9bc 01792b90 LOCALSPL!RecreateDsKey+0×50
00000000 00000002 01792b90 LOCALSPL!SplAddPrinter+0×521
01791faa 0176a684 76a5cd34 WIN32SPL!InternalAddPrinterConnection+0×1b4
01791faa 02c0fa00 02c0fabc WIN32SPL!AddPrinterConnectionW+0×15
00076f1c 02c0fabc 01006873 spoolss!AddPrinterConnectionW+0×49
00076f1c 00000001 77107fb0 spoolsv!YAddPrinterConnection+0×17
00076f1c 02020202 00000001 spoolsv!RpcAddPrinterConnection+0xb
01006868 02c0fac0 00000001 rpcrt4!Invoke+0×30
00000000 00000000 000d22c8 rpcrt4!NdrStubCall2+0×655
000d22c8 00076fe0 000d22c8 rpcrt4!NdrServerCall2+0×17
010045fc 000d22c8 02c0fe0c rpcrt4!DispatchToStubInC+0×32
0000002b 00000000 02c0fe0c rpcrt4!RPC_INTERFACE::DispatchToStubWorker+0×100
000d22c8 00000000 02c0fe0c rpcrt4!RPC_INTERFACE::DispatchToStub+0×5e
000d3210 00076608 813b0013 rpcrt4!LRPC_SCALL::DealWithRequestMessage+0×1dd
000d21d0 02c0fe50 000d3210 rpcrt4!LRPC_ADDRESS::DealWithLRPCRequest+0×10c
770c9ad0 00076608 770cb6d8 rpcrt4!LRPC_ADDRESS::ReceiveLotsaCalls+0×229
00076608 770cb6d8 0288f9a8 rpcrt4!RecvLotsaCallsWrapper+0×9
00074a50 02c0ffec 77e7438b rpcrt4!BaseCachedThreadRoutine+0×11f
00076e68 770cb6d8 0288f9a8 rpcrt4!ThreadStartRoutine+0×18
770d1c54 00076e68 00000000 KERNEL32!BaseThreadStart+0×52

The command kv. reveals the function callback stack for the thread that was active at the time of the trip. This stack is read from bottom to top: the topmost function is the last function to be called and the bottommost function is the first function to be called for this thread.

Once you locate the thread, you see it has a call to the WaitForCriticalSection function, which means that not only does it have a lock, but it is also waiting for an object that is locked by something else. You can find out what is locking the object by looking at the first parameter of the WaitForCriticalSection call.