Decompiling Java Code

PCQ Bureau

Java is currently one of the hottest technologies around, and with the growing

use of the Net, it’s poised for explosive growth. Like almost

everything else connected to the Net, Java has its dark sides too. Here, we

attempt to shed some light on one such dark facet of Java–the ease of

decompiling Java executables.

Decompilation is the reverse

of compilation–a decompiler is a tool that converts the executable machine

code back to the source code that produced it. The technique of

decompilation is not a novel one. In fact, the first decompiler appeared in

the early 1960s, about a decade after its compiler counterpart. However,

writing a decompiler has always been an uphill task and the result was never

totally satisfactory. This is because of several technical problems: for

example, the difficulty in combining a sequence of machine instructions to

get back the programming construct–like an if-then statement or a loop–that generated

it. Unfortunately, that isn’t true any more with Java.

Today, there are decompilers

available for Java that allow anybody, even with limited knowledge of the

language, to reverse engineer most Java class files–that is, get back

their source code. Although no language is decompilation-proof, in the case

of Java, the very strengths that have made it a huge success are its

weaknesses against a decompiler.

Before going into how this

happens, let’s take a look at the structure of Java class files.

Java class files

Java class files are binaries, the result of compiling Java source code using a

compiler tool such as Sun’s javac. They contain:

  • Symbolic information: the names of attributes and methods in the Java source program

  • Byte codes: the result of compiling methods in the Java source program

  • Optionally, debugging information

Class files don’t contain

source code or comments. Still, they contain all the information that’s

needed by a Java interpreter to execute the application. The symbolic

information contained in the class file describes precisely the class

structure, the inheritance tree, the methods, and the attributes.
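To make the class file structure a little more concrete, here is a minimal sketch that reads the first few fields of a class file: the magic number, the version numbers, and the size of the constant pool (the table where the symbolic names live). The class name ClassFileHeader and its methods are our own, invented for illustration, and we assume the class file of java.lang.String can be fetched through the system class loader.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not part of any library.
final class ClassFileHeader {
    final int magic;             // 0xCAFEBABE in every valid class file
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount; // one more than the number of constant pool entries

    private ClassFileHeader(int magic, int minor, int major, int cpCount) {
        this.magic = magic;
        this.minorVersion = minor;
        this.majorVersion = major;
        this.constantPoolCount = cpCount;
    }

    // Class files store multi-byte values big-endian, which is what DataInputStream reads.
    private static ClassFileHeader read(DataInputStream data) throws IOException {
        int magic = data.readInt();
        int minor = data.readUnsignedShort();
        int major = data.readUnsignedShort();
        int cpCount = data.readUnsignedShort();
        return new ClassFileHeader(magic, minor, major, cpCount);
    }

    // Parses the header from the raw bytes of a class file.
    static ClassFileHeader parse(byte[] classFileBytes) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(classFileBytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Locates the class file of the given class through the system class loader.
    static ClassFileHeader of(Class<?> type) {
        String resource = type.getName().replace('.', '/') + ".class";
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("Class file not found: " + resource);
            }
            return read(new DataInputStream(in));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassFileHeader h = of(String.class);
        System.out.printf("magic=0x%X version=%d.%d constant_pool_count=%d%n",
                h.magic, h.majorVersion, h.minorVersion, h.constantPoolCount);
    }
}
```

Everything after these header fields (the constant pool, the field and method tables) follows the same documented, regular layout, which is why a tool can read it back so easily.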

Let’s get to how Java’s

strengths as a language make it vulnerable to decompilation attacks.

Platform independence

Platform independence means compile once, run everywhere.

A single class file in Java

can run on a variety of platforms. Java achieves this by being both a

compiled and an interpreted language. A Java source file is first

compiled into an intermediate byte code representation. These byte codes, as

we explained above, are contained in the class file. These can be

interpreted by the Java interpreter, also called the Java Virtual Machine (JVM).

The byte codes are actually a

platform-independent machine language–that is, instructions for a virtual

processor. The JVM shields the application from the real hardware by

emulating this virtual processor–hence the name virtual machine. The

platform-dependent aspects are isolated in the JVM and different JVM

implementations are needed for different hardware platforms. This

intermediate representation is the trick that makes Java programs

cross-platform.

The following method makes

this concept of byte codes clearer. The method "main" here is

taken from a "Hello world" program written in Java.

public static void main(String[] args)
{
    System.out.println("Hello World");
}

This method is compiled into

four byte codes:

  1. getstatic java.lang.System.out (of type java.io.PrintStream)

  2. ldc "Hello World"

  3. invokevirtual void java.io.PrintStream.println(java.lang.String)

  4. return

As can be seen in the above

example, the byte codes are similar to assembly language but without

registers. This is due to the fact that the virtual processor emulated by

the JVM is a stack machine. That is, it has no explicit registers to store

data. The operands for all its instructions come from the in-memory stack.

The virtual processor only keeps track of the location of the next

instruction to be executed (traditionally called the Program Counter)

and a pointer to the top of the stack (called the Stack Pointer).
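The idea of a stack machine can be sketched in a few lines of Java. The toy interpreter below is only an illustration: the opcodes PUSH, ADD, MUL and HALT are invented for this sketch and are not real JVM byte codes. It keeps a program counter and an operand stack, and every instruction takes its operands from that stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy stack machine for illustration; the opcodes are made up, not JVM byte codes.
final class ToyStackMachine {
    static final int PUSH = 0; // followed by one value, which is pushed on the stack
    static final int ADD = 1;  // pops two values, pushes their sum
    static final int MUL = 2;  // pops two values, pushes their product
    static final int HALT = 3; // stops and returns the value on top of the stack

    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // program counter: index of the next instruction
        while (true) {
            int opcode = code[pc++];
            switch (opcode) {
                case PUSH:
                    stack.push(code[pc++]);
                    break;
                case ADD:
                    stack.push(stack.pop() + stack.pop());
                    break;
                case MUL:
                    stack.push(stack.pop() * stack.pop());
                    break;
                case HALT:
                    return stack.pop();
                default:
                    throw new IllegalArgumentException("Unknown opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        // Computes (2 + 3) * 4 without naming a single register.
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(run(program)); // prints 20
    }
}
```

Because each instruction has such a fixed, predictable effect on the stack, a tool that reads the program can track what the stack holds at every point and rebuild the expression (2 + 3) * 4 from it, which is essentially what a decompiler does.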

All this leads to an

instruction set that’s much simpler than that of a real processor. This

simplicity of instructions makes compilation easier and also leads to faster

execution. But sadly enough, it also makes decompilation easier and makes

the "Decompile once, run everywhere" dream of the pirate come

true.

Object-oriented nature

That Java is an object-oriented language is a well-known fact. Less well known, however, is

the fact that this is also true of the JVM. The JVM actually emulates an

object-oriented processor. In other words, the byte codes support

object-oriented operations. This fact is illustrated by the

instructions "getstatic" and "invokevirtual" in the

above example. Using these object-supporting instructions, the sequence of

four byte codes above makes a call to the "println" method of the

"java.io.PrintStream" object. To support this object-oriented

model, the JVM is also responsible for:

  • Dynamic

    loading of Java classes

  • Inheritance

    and polymorphism: that is, when calling a method, it walks the

    inheritance tree to call overridden methods correctly

  • Memory

    management and garbage collection

The object-oriented nature of

the virtual machine has major implications for the Java class file format. In

particular, to support dynamic loading of classes and method invocation, the

JVM needs symbolic information. There is no other place to keep this

symbolic information except in the class file itself. Thus, this information

is also available to a decompiler and helps it to restore the same names for

classes, methods and attributes in the decompiled file as were used in the

original Java source file.
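Java's own reflection API gives a feel for how much symbolic information travels with a compiled class. The sketch below loads a class purely by its name and calls one of its methods purely by name, using the standard java.lang.reflect classes. The helper class SymbolicLookup and its method callByName are our own names, chosen for illustration.

```java
import java.lang.reflect.Method;

// Hypothetical helper for illustration; only the reflection calls are standard API.
final class SymbolicLookup {
    // Loads a class by name, creates an instance with its no-argument constructor,
    // and calls a public no-argument method, also looked up by name.
    static Object callByName(String className, String methodName) {
        try {
            Class<?> type = Class.forName(className);
            Object instance = type.getDeclaredConstructor().newInstance();
            Method method = type.getMethod(methodName);
            return method.invoke(instance);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Lookup failed for " + className + "." + methodName, e);
        }
    }

    // Returns the name of the superclass, which is also recorded in the class file.
    static String superclassName(String className) {
        try {
            return Class.forName(className).getSuperclass().getName();
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callByName("java.util.ArrayList", "size")); // prints 0
        System.out.println(superclassName("java.util.ArrayList"));     // prints java.util.AbstractList
    }
}
```

None of these lookups needs the source code. The names come from the compiled classes themselves, and a decompiler reads the same names from the class file.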

Protecting your Java

applications

Does that mean that there is no way of preventing class files from being decompiled?

A technique called code

obfuscation provides some hope to developers. An obfuscator, when applied to

class files, makes it harder for decompilers to extract useful information

from them. It works by changing most of the symbolic information present in

a class file. For example, it might replace human-friendly class names, like

Employee, with less friendly names like 112. As a result, it becomes very

difficult to make any sense of the code, even when the class file has been

successfully decompiled.
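The renaming step can be pictured with a small sketch. The code below is a toy, not how any particular obfuscator works: it only builds a table that maps meaningful identifiers to short meaningless ones. The class ToyRenamer and the sample identifiers are hypothetical. A real obfuscator would also have to rewrite every reference inside the class files and leave alone the names that outside code depends on.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of name obfuscation; not the algorithm of any real tool.
final class ToyRenamer {
    // Assigns each distinct identifier a short name (a, b, c, ...) in order of appearance.
    static Map<String, String> buildRenameMap(List<String> identifiers) {
        Map<String, String> renames = new LinkedHashMap<>();
        for (String identifier : identifiers) {
            if (!renames.containsKey(identifier)) {
                renames.put(identifier, shortName(renames.size()));
            }
        }
        return renames;
    }

    // Maps 0 to "a", 25 to "z", 26 to "aa", 27 to "ab", and so on.
    static String shortName(int index) {
        StringBuilder name = new StringBuilder();
        int n = index;
        do {
            name.insert(0, (char) ('a' + n % 26));
            n = n / 26 - 1;
        } while (n >= 0);
        return name.toString();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Employee", "getSalary", "salary");
        System.out.println(buildRenameMap(names)); // prints {Employee=a, getSalary=b, salary=c}
    }
}
```

After such a pass, a decompiler still produces valid Java, but a call like a.b() tells the reader far less than employee.getSalary() did.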

Some obfuscators go even

further. They introduce byte code combinations that are notoriously

difficult to decompile. Typically, they do so by modifying byte codes,

adding useless and harmless instructions. An example of this is adding some

dead code–code that is never executed, for example, the code put after the

"return" statement–to the methods. So, even as the program runs

all right, it might not decompile properly.

Unfortunately, many of the

present-day Java decompilers can detect this, and successfully decompile a

class file that’s obfuscated in such a manner.

There are lots of obfuscators

available and some are even free. One of the free ones is HashJava, which

can be downloaded from www.sbktech.org.

Another one, Jobe, is free for non-commercial use and is available at www.cs.ucsd.edu/users/ej.
