Friday, March 9, 2018

The Art of Large Scale Cumulative Binary Diffing

I've been performing patch diffing on various windows based softwares for past couple of years now. Patch diffing is a process of comparing two binary builds of the same code – a known vulnerable one and the one containing the security fix for a vulnerability. Bindiff is very popular among security researchers because it is generally used to gather information about patched security vulnerabilities, find root causes and vectors.

In 2017 December I've delivered a talk at BlackHat Europe where I've showcased many patched vulnerabilities in VMWare workstation, mostly identified using binary patch diffing different releases of VMWare workstation.

Performing BinDiffing on a huge software is very challenging specially when affected components are not known. Let's take the example of VMWare workstation. If we unpack VMWare workstation installer we find nearly 300 binary files. Among these huge number of binary files, finding the components on which security fixes were applied is very difficult and time consuming. Especially for my VMWare research I had to do this on a cumulative basis because I had to analyze each and every released security patches of last one year. Following screenshots lists out VMWare workstation releases and their version details.

To be able to identify the components modified in each workstation release and to understand the modification logic, we had to perform binary diffing like this

Performing binary diffing on each and every patch released by vmware for last one year was indeed very tedious. So I thought of automating the whole process. In this blog post I'm going briefly explain the way it was done and also release the source code of the python library I've written to automate this. The same code can be re-used (with little modification) to perform cumulative binary diffing on any windows based software.

Manually Unpacking The Installer

There were some manual efforts involved initially. Manually unpacking the installer files was the first step. For vmware workstation it can be done using following command 

VMware-workstation-full-xxxx-xxxx.exe /extract "folder path"

For other softwares files can be directly taken from "Program Files" folder.

After the installer files are unpacked they are kept in an organized way in same directory. Point to be noted here directory structure of all the unpacked file should be same like this because these paths would be accessed the the Mass binary diffing program.


Introducing MassDiffer

MassDiffer is small python program developed to automate the tedious process of cumulative binary diffing. This script has few dependencies 
  1. IDA Pro idaq.exe & idaq64.exe
  2. Zynamics Binexport
  3. BinDiff  - differ.exe & differ64.exe

This is exactly how the MassDiffer works,

1. Path of two directories containing unpacked installer files (old release and new release) are passed to the program for example C:\MassDiffing\VMware-workstation-full-12.5.0.exe\ and C:\MassDiffing\VMware-workstation-full-12.5.1.exe\

2. The program traverses two directories, tries to find out files which can have binary code and create a mapping between the old version and new version of the file.

3. After the mapping is created it checks if the binary image version has changed in the new release or not. If binary images are identical in both the releases nothing will be done to those files.

4. Next, if it finds any change in binary image file version it process them. First it generates set of IDB files for both old and new version.

5. From the IDB file it generates BinExport file with help of zynamics binexport.

6. From BinExport files it generates BinDiff file. Point to be noted here BinDiff files are nothing but a sqlite database, where all the diffing results are stored. Simple sqlite query can be made to extract information from the BinDiff file about the modified function in two binary files.

7. Next step, BinDiff database file is parsed and modified function details and addresses are extracted from BinDiff database.

8. Once these information are gathered a MS Excel report is prepared which gives a visual representation of the modified components. Following screenshot shows a sample MassDiff output for VMware-workstation version 12.5.1 and 12.5.2 release. As it can be easily pointed out that in this release VMWare had only applied fix to two binary files they are vmware-vmx.exe and vmwarebase.dll and only few functions in these binary files were modified. This reduces a lot of repetitive work.

Source Code :

Source code is available at
The program was developed and tested on IDA Pro Version 6.5 + Bindiff version 4.2.

Thanks for reading. Hope you've enjoyed :)