How to Compare Binary Files on Linux

How can you check if two Linux binaries are the same? When it comes to executable files, differences can indicate unwanted or malicious behavior. Here's the easiest way to check whether they're different.

Comparing Binaries

Linux is rich in possibilities for comparing and analyzing text files. The diff command compares two files for you and highlights the differences. It can even include some lines on either side of the changes to provide context around the changed lines. And the colordiff command adds color to make visual analysis of the differences even easier.

Developers and authors use diff to highlight the differences between versions of program source code files or drafts of text. It's quick and easy, and you don't need any technical knowledge to spot the differences between lines of text.

In the world of binaries, things aren't that simple. Binary files don't consist of plain text. They consist of many bytes that hold numeric values. If the file is a compressed archive, such as a compressed TAR file or a ZIP file, these values represent the compressed files stored in the archive, along with the metadata needed to decompress and extract them.

If the binary file is an executable, the numeric values of its bytes are interpreted as things such as machine code instructions for the CPU, metadata, labels, or encoded data. Modifying a binary or a library file is likely to change how it behaves when it is run or used by another application.

It's easy to spoof a file's date and time stamps. This means there can be two versions of a file with the same name, the same date stamp, and even the same file size, if the changes overwrite existing content byte for byte. And yet one of the files could have been altered.
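
To see how little that metadata tells you, here is a minimal sketch. It assumes GNU coreutils (for stat -c) and uses two hypothetical scratch files of equal length; touch -r copies one file's access and modification times onto the other, so the size and timestamps match even though the contents don't.

```shell
#!/usr/bin/env bash
# Sketch: matching size and timestamps do not prove matching contents.
# Assumes GNU coreutils (stat -c); file names are hypothetical scratch files.
dir=$(mktemp -d)
printf 'version A\n' > "$dir/original.so"
printf 'version B\n' > "$dir/altered.so"        # same length, one byte different

# Copy the access and modification times from original.so onto altered.so.
touch -r "$dir/original.so" "$dir/altered.so"

# %s = size in bytes, %Y = modification time in seconds since the epoch.
meta_orig=$(stat -c '%s %Y' "$dir/original.so")
meta_alt=$(stat -c '%s %Y' "$dir/altered.so")
echo "original.so: $meta_orig"
echo "altered.so:  $meta_alt"

# cmp -s prints nothing; its exit status says whether the contents match.
if cmp -s "$dir/original.so" "$dir/altered.so"; then
    contents=identical
else
    contents=different
fi
echo "contents: $contents"
rm -r "$dir"
```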

Secure Hash Algorithms

A secure hash algorithm is a mathematical algorithm. It reads every byte in a file and applies a mathematical transformation to them to generate a fixed-length hash value. SHA-256, for example, produces a 256-bit value, which is displayed as 64 hexadecimal characters. The same file will always produce the same hash, and even a one-byte difference results in a radically different hash.
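
Here is a quick way to see this for yourself. The sketch below hashes two throwaway files that differ in a single byte; the file names are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: a one-byte change gives a completely different SHA-256 hash.
# a.txt and b.txt are throwaway files that differ only in 'w' versus 'W'.
dir=$(mktemp -d)
printf 'Hello, world\n' > "$dir/a.txt"
printf 'Hello, World\n' > "$dir/b.txt"

# sha256sum prints "<hash>  <file name>"; cut keeps only the hash field.
hash_a=$(sha256sum "$dir/a.txt" | cut -d' ' -f1)
hash_b=$(sha256sum "$dir/b.txt" | cut -d' ' -f1)

echo "a.txt: $hash_a"
echo "b.txt: $hash_b"
echo "length: ${#hash_a} hex characters = 256 bits"
rm -r "$dir"
```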

Often the hash of a file is displayed on its download page. You should generate a hash for the file once you've downloaded it. If it differs from the hash shown on the web page, you know the file has been compromised. Either it was tampered with and substituted for the original file, to trick people into downloading the corrupted version, or it was corrupted in transit.
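
sha256sum can do that comparison for you with its -c option, which reads lines made up of a hash, two spaces, and a file name, and reports OK or FAILED for each. In this sketch, download.bin is a stand-in file, and the "published" hash is computed locally only so the example is self-contained; in practice you would paste the value from the download page.

```shell
#!/usr/bin/env bash
# Sketch: verify a download against a published SHA-256 hash with sha256sum -c.
# download.bin is a stand-in; normally "published" is copied from the website.
dir=$(mktemp -d)
printf 'pretend this is a release file\n' > "$dir/download.bin"
published=$(sha256sum "$dir/download.bin" | cut -d' ' -f1)

# -c reads "HASH  FILE" lines (two spaces); "-" means read them from stdin.
echo "$published  $dir/download.bin" | sha256sum -c -
status_good=$?                 # 0: the hash matches

# Simulate tampering by overwriting the first byte in place.
printf 'X' | dd of="$dir/download.bin" bs=1 seek=0 conv=notrunc 2>/dev/null
echo "$published  $dir/download.bin" | sha256sum -c -
status_bad=$?                  # non-zero: sha256sum reports FAILED
rm -r "$dir"
```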

On our test computer we have two copies of the same file, a shared library. The files have been renamed so they can be in the same directory. In theory, these files should be the same. After all, they should be the same version of the shared library.

ls -l *.so

The files have the same size and the same date and time stamps. To the casual observer, they look identical. Let's use the sha256sum command to generate a hash for each file.

sha256sum binary_file1.so
sha256sum binary_file2.so

The hashes are completely different, which clearly indicates that there are differences between the two files. If the website shows the hash of the original file, you can discard the mismatched file.
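
If you need to make this check in a script, comparing just the hash fields is enough. same_hash below is a hypothetical helper name, and the .so files are throwaway stand-ins, not real libraries.

```shell
#!/usr/bin/env bash
# Sketch: same_hash is a hypothetical helper that succeeds when two files
# have identical SHA-256 hashes. Reading via "<" keeps file names out of
# sha256sum's output, so only the hashes themselves are compared.
same_hash() {
    local h1 h2
    h1=$(sha256sum < "$1" | cut -d' ' -f1)
    h2=$(sha256sum < "$2" | cut -d' ' -f1)
    [ "$h1" = "$h2" ]
}

# Throwaway stand-ins: lib2.so is an exact copy, lib3.so has one byte changed.
dir=$(mktemp -d)
seq 1 1000 > "$dir/lib1.so"
cp "$dir/lib1.so" "$dir/lib2.so"
cp "$dir/lib1.so" "$dir/lib3.so"
printf 'Z' | dd of="$dir/lib3.so" bs=1 seek=100 conv=notrunc 2>/dev/null

if same_hash "$dir/lib1.so" "$dir/lib2.so"; then r12=same; else r12=different; fi
if same_hash "$dir/lib1.so" "$dir/lib3.so"; then r13=same; else r13=different; fi
echo "lib1.so vs lib2.so: $r12"
echo "lib1.so vs lib3.so: $r13"
rm -r "$dir"
```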

Finding the Differences

If you want to see the changes, there are ways to do that too. You don't need to decompile the file or understand assembly or machine code just to see the changes. Understanding what the changes mean and what their purpose is would, of course, require deeper technical knowledge. But just knowing how extensive the changes are can give a clue as to what happened to the file.

If we use diff on the two binaries, we get a response that's a bit disappointing.

diff binary_file1.so binary_file2.so

We already knew the files were different. Let's try cmp.

cmp binary_file1.so binary_file2.so

That tells us a little more. The first byte that differs between the two files is byte number 13451. cmp numbers bytes starting from 1, so counting from the beginning of the binary, byte 13451 is the first one that is different in the two binaries. Counted from zero, as offsets usually are, the first difference is at offset 13450.
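
cmp's numbering is easy to check on a tiny file. In this sketch, with made-up six-byte files, the third byte differs, which is offset 2 when counting from zero.

```shell
#!/usr/bin/env bash
# Sketch: cmp counts bytes from 1. Here the third byte differs ('C' vs 'X'),
# which is offset 2 when counting from zero. The files are throwaway examples.
dir=$(mktemp -d)
printf 'ABCDEF' > "$dir/one.bin"
printf 'ABXDEF' > "$dir/two.bin"

cmp "$dir/one.bin" "$dir/two.bin"

# With -l, the first column of each line is that same 1-based byte number.
first_byte=$(cmp -l "$dir/one.bin" "$dir/two.bin" | awk 'NR == 1 { print $1 }')
echo "cmp byte number: $first_byte"
echo "zero-based offset: $((first_byte - 1))"
rm -r "$dir"
```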

As it happens, there are bytes throughout the file that contain the hexadecimal value 0x0A (decimal 10). This is the value Linux uses as the newline character in text files. The cmp command found 131 bytes with this value between the start of the binary and the position of the first difference, so it reports that the difference is on line 132. In this context, that line number doesn't mean anything.
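
The sketch below, using hypothetical scratch files, recomputes cmp's line number by counting the 0x0A bytes that come before the first difference and adding one.

```shell
#!/usr/bin/env bash
# Sketch: cmp's "line" is the count of 0x0A bytes before the first
# difference, plus one. The scratch files below contain two newlines
# before the changed byte ('g' vs 'X' at byte 9).
dir=$(mktemp -d)
printf 'ab\ncd\nefgh' > "$dir/one.bin"
printf 'ab\ncd\nefXh' > "$dir/two.bin"

cmp "$dir/one.bin" "$dir/two.bin"

# Recompute the line number from the 1-based byte number that cmp -l gives.
byte=$(cmp -l "$dir/one.bin" "$dir/two.bin" | awk 'NR == 1 { print $1 }')
newlines=$(head -c "$((byte - 1))" "$dir/one.bin" | tr -cd '\n' | wc -c)
line=$((newlines + 1))
echo "byte $byte, line $line"
rm -r "$dir"
```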

If we add the -l (verbose) option, we get more useful information.

cmp -l binary_file1.so binary_file2.so

All of the differing bytes are listed, one byte per line of output. Each line shows the byte number, the value from the first file, and the value from the second file.

The byte values are displayed in octal rather than the hexadecimal format usually used with binary files. Still, we've learned something else: all the changed bytes are in one contiguous sequence, because their byte numbers increase by one from each line to the next.
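
If you'd rather see cmp -l output in the notation hexdump uses, a short loop can convert it. This sketch assumes bash, whose printf builtin treats an argument with a leading 0 as octal; the two files are throwaway examples, and the output shows zero-based hex offsets alongside hex byte values.

```shell
#!/usr/bin/env bash
# Sketch: convert cmp -l output (1-based decimal byte numbers, octal values)
# into zero-based hex offsets and hex values, the notation hexdump uses.
# Assumes bash, whose printf treats an argument with a leading 0 as octal.
dir=$(mktemp -d)
printf 'ABCDEFGH' > "$dir/one.bin"
printf 'ABCxyzGH' > "$dir/two.bin"     # bytes 4 to 6 changed

hex_report=$(cmp -l "$dir/one.bin" "$dir/two.bin" |
    while read -r byte old new; do
        printf '0x%04x 0x%02x 0x%02x\n' "$((byte - 1))" "0$old" "0$new"
    done)
echo "$hex_report"
rm -r "$dir"
```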

The hexdump tool displays a binary file in the terminal window. With the -C (canonical) option, each line of output lists an offset, the values of the 16 bytes starting at that offset, and the ASCII representation of those bytes, with non-printable bytes shown as dots.
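
Here is what that looks like on a small file. This sketch writes a hypothetical 20-byte sample file: the first 16 bytes fill one line, the remaining four spill onto a second line, and a final line gives the total length (0x14) as an offset.

```shell
#!/usr/bin/env bash
# Sketch: canonical hexdump output for a made-up 20-byte file.
dir=$(mktemp -d)
printf 'Hello, hexdump!\nABC\n' > "$dir/sample.bin"

# Each line: offset, up to 16 hex byte values, then |ASCII| with dots
# standing in for non-printable bytes such as the newline (0x0a).
canonical=$(hexdump -C "$dir/sample.bin")
echo "$canonical"
rm -r "$dir"
```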

hexdump -C binary_file1.so

We can use bash process substitution to feed the output of hexdump into diff, so that diff works as if it were reading two text files.

diff <(hexdump binary_file1.so) <(hexdump binary_file2.so)

diff finds the lines that differ and shows the hexadecimal byte values from the first file above the values from the second file. The offset of the first line is 0x3480, or 13440 in decimal. Earlier, cmp told us that the first change occurred at byte 13451, which is 0x348B. That falls within the line starting at 0x3480, so it corresponds to what we see here.

Without the -C option, hexdump groups the bytes into two-byte blocks, and those are the lines diff compares. The first block holds bytes 0 and 1 from the offset 0x3480, and the second block holds bytes 2 and 3. Block 6 holds bytes 0xA and 0xB, or 10 and 11 in decimal, which are bytes 13450 and 13451 of the file. This is the first block that differs; the first five blocks are the same in both files.

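However, hexdump counts offsets from zero while cmp counts bytes from one, so what cmp calls byte 13451 is byte 13450 in the hexdump output. To make things even more confusing, the byte order within each two-byte block is reversed. hexdump's default format shows each two-byte unit in the machine's native byte order, which on little-endian PCs puts the second byte first. The bytes are actually listed in this order: 1 and 0, 3 and 2, 5 and 4, 7 and 6, and so on.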
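
The byte swapping is easy to reproduce. This sketch writes the bytes 01 02 03 04 to a scratch file and shows it in both formats; it assumes a little-endian machine, where the default output shows 0201 0403.

```shell
#!/usr/bin/env bash
# Sketch: hexdump's default format prints two-byte units in the machine's
# native byte order; -C prints one byte at a time. Assumes a little-endian
# machine (x86-64, most ARM systems), where each pair appears swapped.
dir=$(mktemp -d)
printf '\001\002\003\004' > "$dir/pairs.bin"   # bytes 01 02 03 04

default_view=$(hexdump "$dir/pairs.bin")
canonical_view=$(hexdump -C "$dir/pairs.bin")
echo "$default_view"      # two-byte units: 0201 0403 on little-endian
echo "$canonical_view"    # bytes in file order: 01 02 03 04
rm -r "$dir"
```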

The command is also computationally intensive, running two hexdumps and a diff all at once, especially when the files being compared are large.

But since hexdump -C produces a text representation of the binary, complete with an ASCII column, why not redirect that output into text files and then compare those two text files with diff?

hexdump -C binary_file1.so > binary1.txt
hexdump -C binary_file2.so > binary2.txt
diff binary1.txt binary2.txt

diff shows the differing line from each file, with its offset, hexadecimal byte values, and ASCII representation. There will be a pair of extracts for each difference between the files. In this example, there is only one difference.
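
Here is the same technique on a small scale. The sketch builds a hypothetical 48-byte file of three distinct 16-byte rows, changes one byte in a copy, and diffs the two canonical hexdumps; only the second row should be reported. row is a throwaway helper defined just for this example.

```shell
#!/usr/bin/env bash
# Sketch: diff two canonical hexdumps. one.bin holds three distinct
# 16-byte rows (A, B, C); two.bin is a copy with byte 0x17 changed to 'x'.
row() { printf '%016d' 0 | tr 0 "$1"; }      # 16 copies of one character

dir=$(mktemp -d)
printf '%s%s%s' "$(row A)" "$(row B)" "$(row C)" > "$dir/one.bin"
cp "$dir/one.bin" "$dir/two.bin"
printf 'x' | dd of="$dir/two.bin" bs=1 seek=23 conv=notrunc 2>/dev/null

hexdump -C "$dir/one.bin" > "$dir/one.txt"
hexdump -C "$dir/two.bin" > "$dir/two.txt"
changes=$(diff "$dir/one.txt" "$dir/two.txt")
echo "$changes"
rm -r "$dir"
```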

That’s all very nice, but wouldn’t it be great if there was something that would do all of that for you?

VBinDiff

The VBinDiff program can be installed from the usual repositories for all major distributions. Use this command to install it on Ubuntu:

sudo apt install vbindiff

On Fedora you need to type:

sudo dnf install vbindiff

On Manjaro, use pacman:

sudo pacman -Sy vbindiff

To use the program, pass the names of the two binaries on the command line.

vbindiff binary_file1.so binary_file2.so

The terminal-based application opens showing both files in a scrolling view.

You can use the mouse scroll wheel or the Up Arrow, Down Arrow, Home, End, Page Up, and Page Down keys to move through the files. Both files scroll together.

Press the “Enter” key to jump to the first difference. The difference is highlighted in both files.

If there were more differences, pressing “Enter” would show the next difference. Pressing “q” or “Esc” will end the program.

Which Should You Use?

If you're working on a computer owned by someone else and you're not allowed to install packages, you can use cmp, diff, and hexdump. If you need to capture the output for further processing, these are also the tools to use.
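
As an example of that kind of processing, this sketch turns cmp -l output into a one-line summary with awk. summarize_diff is a hypothetical helper name and the demo files are throwaway; the byte numbers are 1-based, as cmp reports them.

```shell
#!/usr/bin/env bash
# Sketch: summarize_diff is a hypothetical helper that condenses cmp -l
# output into one line: how many bytes differ, and the first and last
# differing byte numbers (1-based, as cmp reports them).
summarize_diff() {
    cmp -l "$1" "$2" | awk '
        NR == 1 { first = $1 }
        { last = $1; count++ }
        END {
            if (count > 0) {
                printf "%d bytes differ, from byte %d to byte %d\n", count, first, last
            } else {
                print "no differing bytes"
            }
        }'
}

# Throwaway demo files: b.bin has 3 bytes overwritten starting at offset 99.
dir=$(mktemp -d)
seq 1 500 > "$dir/a.bin"
cp "$dir/a.bin" "$dir/b.bin"
printf 'XYZ' | dd of="$dir/b.bin" bs=1 seek=99 conv=notrunc 2>/dev/null

summary=$(summarize_diff "$dir/a.bin" "$dir/b.bin")
same=$(summarize_diff "$dir/a.bin" "$dir/a.bin")
echo "$summary"
echo "$same"
rm -r "$dir"
```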

But if you're allowed to install packages, VBinDiff makes your workflow easier and faster. And as a nice bonus, running VBinDiff on a single binary gives you an easy, convenient way to browse through it.
