Android APK decompilation

8 Jun 2015

Here I'll try to give a short overview of the various tools that can be used to analyse Android application packages (APKs) as they are found in the wild.

The main motivation behind this document is that when you ask how to reverse an APK to Java source code, the most frequent answer is "use dex2jar and jd-gui", but this answer is outdated: these tools produce poorly decompiled code, are neither clearly maintained nor clearly open-source, and sometimes don't work at all.

Also note that I do not take into account the decompilers and converters included in the Soot framework, as they have different goals. They do a good job in general though, and their maintainers are really quick at fixing issues.

A word on compilation

So first things first: before we get to the tools for APK reversing, here is a summary of the steps followed when an APK is created.

APK creation

  1. Write a Java program using Android elements (like activities, etc), with a manifest, resources and all the stuff you need.
  2. The Java code (.java files) is compiled to Java bytecode (.class files) by some Java compiler, usually the javac shipped with the JDK.
  3. If there is native code (C/C++) in the application, it is compiled by the NDK to a binary for each of the chosen architectures.
  4. The Java bytecode is converted to Dalvik bytecode with the dx tool, and packaged in the classes.dex blob.
  5. At some point (I guess it's here), resources get converted to a binary format for some reason; e.g. the application manifest, AndroidManifest.xml, is transformed to a binary format (AXML).
  6. Then everything gets packaged in a ZIP file (but with .apk extension), which is signed with jarsigner and optimized with zipalign.

At this point, you get a ready-to-ship APK.
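Since the result of these steps is just a ZIP archive, you can already learn a lot by listing its entries. Here is a minimal Python sketch, assuming the usual layout (a top-level classes.dex, native libraries under lib/, signature data under META-INF/). The function names and the fake APK built for the demo are mine and purely illustrative; the four-byte AXML prefix is the binary XML chunk header (type 0x0003, header size 0x0008, little-endian).

```python
import io
import zipfile

# First bytes of a compiled binary XML file (AXML): chunk type 0x0003
# followed by header size 0x0008, both little-endian.
AXML_MAGIC = b"\x03\x00\x08\x00"


def summarize_apk(apk):
    """Return a small summary of an APK given a path or a file-like object."""
    with zipfile.ZipFile(apk) as archive:
        names = archive.namelist()
        manifest_start = b""
        if "AndroidManifest.xml" in names:
            with archive.open("AndroidManifest.xml") as handle:
                manifest_start = handle.read(4)
    return {
        # Dalvik bytecode blobs at the root of the archive
        "dex_files": sorted(n for n in names if n.endswith(".dex") and "/" not in n),
        # Native code built by the NDK, one folder per architecture
        "native_libs": sorted(n for n in names if n.startswith("lib/") and n.endswith(".so")),
        # Manifest and signature data written when the APK is signed
        "meta_inf": sorted(n for n in names if n.startswith("META-INF/")),
        # True when the manifest was converted to binary XML
        "binary_manifest": manifest_start == AXML_MAGIC,
    }


def make_demo_apk():
    """Build a tiny fake APK in memory (hypothetical contents, for illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("AndroidManifest.xml", AXML_MAGIC + b"\x00" * 4)
        archive.writestr("classes.dex", b"dex\n035\x00")
        archive.writestr("lib/armeabi/libfoo.so", b"\x7fELF")
        archive.writestr("META-INF/MANIFEST.MF", b"Manifest-Version: 1.0\n")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(summarize_apk(make_demo_apk()))
```

On a real sample you would call summarize_apk("foo.apk") instead of using the in-memory demo archive.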

Post-treatments during installation

The application is modified when it is installed on a device, and it's interesting to know what happens there.

With Android 5.0, the Dalvik virtual machine is supposed to disappear, replaced by the Android Runtime (ART), because the Dalvik bytecode contained in APKs is converted to native code.

Before Android 5.0, Dalvik bytecode was simply executed by the Dalvik VM; it was only optimised by dexopt, which created an .odex file from classes.dex.

From Android 5.0 onwards, the bytecode is converted to native code by dex2oat and stored in a regular ELF file.
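If you pull such files from a device, the first bytes are enough to tell the three formats apart. A minimal sketch, assuming the standard magic numbers ("dex\n" for DEX, "dey\n" for dexopt's ODEX, "\x7fELF" for ELF); the function name is made up for this example, and an ELF result only means "some ELF file", which may be dex2oat output or any native library.

```python
def classify_code_blob(header):
    """Guess the container type from the first bytes of a file."""
    if header.startswith(b"dex\n"):
        return "dex"      # plain Dalvik bytecode, as shipped in the APK
    if header.startswith(b"dey\n"):
        return "odex"     # optimised DEX produced by dexopt
    if header.startswith(b"\x7fELF"):
        return "elf"      # native code: dex2oat output or a regular library
    return "unknown"


def classify_file(path):
    """Read the first 8 bytes of a file and classify it."""
    with open(path, "rb") as handle:
        return classify_code_blob(handle.read(8))
```

For instance, classify_file("classes.dex") should return "dex" for a file extracted from an APK.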

Wikipedia has a nice diagram to sum it up.

In short, it's going to fuck up a good part of state-of-the-art intrusion detection systems which were based on modifications and analysis of the Dalvik VM (so Blare stands strong!). On a more positive note, APKs should still be shipped with Dalvik bytecode only, for backward compatibility, so bytecode analysis is still a relevant topic (and this article isn't useless).

Source code decompilers

In no particular order. Decompilation of resources is explained in the next section.

dex2jar / jar2dex

These are the tools usually recommended to convert DEX archives to JAR archives and back (so Dalvik bytecode to Java bytecode and the other way around). So this is a converter, not a Java decompiler.

It is not really maintained, nor really well written. For a better alternative, see Dare below.

Repo (google code).

JD-Gui

Java decompiler with a simple GUI to visualise and extract source code. The code it produces is of poor quality; you can end up with snippets like this:

while (true)
{
  return 1;
  startUpdater();
  continue;
  startUpdater();
}

Methods sometimes fail to decompile properly if they're too obfuscated or complicated (but that may happen with every Java decompiler, if I remember correctly). These methods are written out as Java bytecode in a comment, so you can't recompile from this source afterwards.

This is not (or hardly) usable as a black-box command-line tool because the internal decompiler lib is not public; you have to use the GUI.

Site.

Bytecode Viewer (BCV)

This is a better alternative to JD-Gui: a Java decompiler which claims Android support (although it actually seems to use dex2jar for that purpose). It presents itself as a decompilation framework "à la JEB".

The quality of the decompiled code looks correct; it is actually based on several decompilers and seems able to combine their strengths to output the best possible decompiled code.

Sadly it doesn't seem to be really usable as a command-line tool, and it weighs more than 371 MB...

Site and repo.

JEB

JEB, a.k.a. the Interactive Android Decompiler (a play on IDA's name: JEB is IDA in ROT1), is a commercial tool specialised in reversing Android apps.

It works directly on Dalvik bytecode and decompiles it straight to Java, so it may be better than the classical methods, which usually convert Dalvik bytecode to Java bytecode and then decompile that Java bytecode. Decompiling Dalvik bytecode directly should avoid losing information in the intermediate conversion. Well, that's a point of view.

There is a strong emphasis on malware analysis:

The good thing about JEB is that the Dalvik disassembler and the Java decompiler is written from scratch and designed from top to bottom to be able to analyse malware (including anti-decompilation tricks in Dalvik bytecode level)

It gives you all the features you usually have in a reverse-engineering framework, like IDA for x86 binaries: variable renaming, comments in code, several views (bytecode, Java), proper Java decompilation, Smali export, etc.

Sadly, a personal license costs $1K (with 30% off for researchers) and there is no trial version, so no thanks.

Site.

CFR

Now we're getting to the smaller Java decompilers, which usually work only on the command line and follow the Unix philosophy of doing one thing and doing it well (which is sorely lacking in Android malware analysis).

CFR is a modern Java decompiler (by modern I mean it handles Java 8 features), with a minimalist command-line front-end. The code it produces is quite satisfying, and it hardly ever fails. Actually, it's the main decompiler used by Bytecode Viewer.

CFR seems to be, in general, the best Java decompiler available, but that means that you need to get some Java bytecode from Dalvik bytecode first.

Site.

Procyon

Written by a co-worker of the guy behind CFR. It looks nice, but its author seems to rather encourage people to use CFR instead of Procyon, so I didn't really bother to test it in depth. May be worth a check though.

Repo.

Ded & Dare

Ded and Dare are two projects from the same team: both are Dalvik to Java bytecode converters. Dare is the successor to Ded and is the one to use today; there seems to be no reason to still use Ded in 2015.

Dare reports excellent results in its article, with a flawless conversion for more than 99.99% of the classes found in free Google Play apps. I tried it on some malware samples; it installs and runs just fine.

DAD

DAD (DAD is A Decompiler) is the decompiler used by Androguard (presented below). There is little information about it, besides the claim that it's quick:

DAD is the default decompiler (python) of Androguard (include in the project), so you have nothing to do for the installation, and DAD is very fast due to the fact that it is a native decompiler, and the decompilation is done on the fly for each class/method.

Well, I'll just skip this one I guess. I suppose it produces rather fine but sub-optimal results.

JAD

An old Java decompiler. From what I read, it had some success some years ago, but it's now completely abandoned to the point where there is no official site anymore, only mirrors to old binaries, so I'll trash this one.

Moreover, according to the decompiled code comparisons on the JEB website, JAD tends to put gotos everywhere, which is dirty af in Java code.

Mirrors, if you really need to take a look.

JadX

The last decompiler I tried is an interesting project which reverses everything in an APK: bytecode and resources.

The decompiled code is decent, but it fails on some slightly obfuscated apps, like the com.google.ads.util.c.c.a method in the 8c2f25178e80f8edfb0ade73075eb681 sample.

Repo.

Other tools

Other tools are worth checking out, especially to reverse APK resources, which were not always handled by the projects above.

Apktool

Tool meant to help APK reverse engineering, still actively developed.

Do not get confused though: Apktool cannot decompile Dalvik bytecode to Java, but it can output a Smali version of the application code. Smali is a language that looks like high-level Dalvik bytecode.

Apktool can reverse Android resources like AXML, so it's useful to get a readable AndroidManifest.xml, multimedia files, etc.

It is also supposed to be able to recompile an APK it decompiled before. It can be a very useful feature if you don't mind editing Smali, as it's much easier to use the pair of commands below than to recreate a complete Java project from an application.

# Reverse foo.apk's content in the "foo" folder
$ apktool d foo.apk

# Modify Smali code or resources as you wish...
# ... and rebuild an APK
$ apktool b foo -o new_foo.apk

It's not perfectly robust though: APK unzipping is done with the standard Java library, which tends to choke when the file isn't a perfectly standard ZIP.
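One way to anticipate this kind of failure is to run the sample through another strict ZIP parser first. The sketch below uses Python's zipfile module as a stand-in. That is an assumption on my side: it is not the Java library Apktool relies on, and the two won't reject exactly the same files, but an archive that fails here is a good candidate for trouble. The function name is hypothetical.

```python
import io
import zipfile


def zip_problems(apk):
    """Return a list of problems found by Python's ZIP parser (empty if none)."""
    try:
        with zipfile.ZipFile(apk) as archive:
            # testzip() reads every member and returns the name of the
            # first one with a bad CRC or header, or None if all are fine.
            bad_member = archive.testzip()
    except (zipfile.BadZipFile, NotImplementedError, RuntimeError) as error:
        return ["archive could not be read: %s" % error]
    if bad_member is not None:
        return ["corrupted member: %s" % bad_member]
    return []


def make_valid_archive():
    """Build a small valid archive in memory, for demonstration purposes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("classes.dex", b"dex\n035\x00")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(zip_problems(make_valid_archive()))           # well-formed archive
    print(zip_problems(io.BytesIO(b"not a zip file")))  # garbage input
```

Like summarising a real APK, you can pass a path such as "foo.apk" instead of an in-memory buffer.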

Site and repo.

Androguard

Another set of tools dedicated to Android reverse engineering.

Androguard is composed mainly of command-line tools, some of which can be used in an interactive shell. I like that hacker way of providing a set of tools meant to fulfil specific reversing needs.

Here is a list of the most relevant tools:

Sadly it's not very stable: when I tried to use AndroXGMML and ApkViewer, they both crashed on very simple applications :( Moreover, it ships with 3 decompilers, but not state-of-the-art ones: DAD, Ded and JAD (with dex2jar).

A nice effort has been made on graph visualisation in this framework though: it is able to create control-flow graphs (CFGs) of methods and to render neat IDA-like views of bytecode. There are also some attempts at generating a function call graph, which is not a simple problem (more info on their wiki).
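To give an idea of what building a control-flow graph involves, here is a minimal sketch of the textbook basic-block construction. It is not Androguard's implementation: the instruction format (an opcode plus an optional target index) and the opcode names are a toy model invented for this example, not real Dalvik bytecode.

```python
BRANCHES = {"goto", "if"}  # toy opcodes that carry a jump target


def build_cfg(instructions):
    """Split a toy instruction list into basic blocks and return (blocks, edges).

    Each instruction is an (opcode, target) pair; target is an instruction
    index for "goto" and "if", and None otherwise. Blocks are half-open
    (start, end) ranges; edges map a block start to its successor starts.
    """
    if not instructions:
        return [], {}
    # A leader starts a block: the first instruction, every branch target,
    # and every instruction that follows a branch or a return.
    leaders = {0}
    for index, (opcode, target) in enumerate(instructions):
        if opcode in BRANCHES:
            leaders.add(target)
        if (opcode in BRANCHES or opcode == "return") and index + 1 < len(instructions):
            leaders.add(index + 1)
    starts = sorted(leaders)
    blocks = list(zip(starts, starts[1:] + [len(instructions)]))
    edges = {}
    for start, end in blocks:
        opcode, target = instructions[end - 1]  # last instruction of the block
        successors = []
        if opcode in BRANCHES:
            successors.append(target)           # taken branch
        if opcode not in ("goto", "return") and end < len(instructions):
            successors.append(end)              # fall-through to the next block
        edges[start] = successors
    return blocks, edges


DEMO_PROGRAM = [
    ("const", None),   # 0
    ("if", 4),         # 1: jump to 4 or fall through to 2
    ("invoke", None),  # 2
    ("goto", 5),       # 3
    ("invoke", None),  # 4
    ("return", None),  # 5
]

if __name__ == "__main__":
    blocks, edges = build_cfg(DEMO_PROGRAM)
    print(blocks)
    print(edges)
```

On the demo program this yields four blocks, with the conditional block branching to two successors that both end up in the final return block. Real bytecode adds exceptions, switches and method calls on top of this, which is part of why call graphs are harder.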

So, what should I use?

To get the best Java source code and resources, you should use the following tools, according to what you want to do.

This isn't a fool-proof approach, and I'm not sure there is one, but it should let you inspect the sources of the vast majority of Android applications and malware available in the wild :)

Compile again!

Now that I have a fully reversed application, maybe with modified code and resources, how can I create a new functional APK from it?

I'm still not sure about a reliable way to do it. If all methods were decompiled successfully, you should be able to compile them again, run dx, encode resources, etc.

See my later post about using Apktool and the Smali language.