Reverse engineering an application requires attackers to invest time and effort doing manual and automatic analyses. When a new version of the application is released, this investment could be lost completely, if all the analyses had to be re-done. The gained insights into how an application functions might be transferred from one version to the next, however, if the versions do not differ too much. Diffing tools are thus valuable to reverse engineers attempting to transfer their knowledge across versions, as well as to defenders trying to assess this attack vector, and whether or how much a new version has to be diversified. While such diffing tools exist and are in widespread use for binary applications, they are in short supply for Android apps.
This paper presents ApkDiff, a tool for diffing Android apps based on the semantic features of the class structure. To evaluate our tool we selected 20 popular financial apps available in the Google Play Store, and tracked their version updates over eight months. We found that on average 79% of all classes had a unique match across version updates. When we consider only classes for which we detect explicit obfuscations being applied (by applying heuristics on their identifiers), we still find that we can find a match for 56% of the classes (ranging from 23% to 85%), suggesting that these obfuscated apps are not resilient to our matching strategies. Our results suggest that ApkDiff provides a valuable approach to diffing Android apps.
Program obfuscation is one of the frequently used methods to make malware hard to analyze. Among the various obfuscation techniques, Mixed Boolean-Arithmetic (MBA) obfuscation, which mixes arithmetic and Boolean operations in an expression, is often considered hard to solve. Recently, synthesis-based methods have emerged to simplify MBA-obfuscated expressions. However, despite promising results, they still have limitations. Fortunately, recent work in super optimization shows that stochastic synthesis is generally sped up by a proper restart strategy. We adopt this principle to enhance the performance of existing works. Experimental results show that we would achieve improvement in the rate of correct answers and better length reduction.
Mixed Boolean-Arithmetic (MBA) expressions are frequently used for obfuscation. As they combine arithmetic as well as Boolean operations, neither arithmetic laws nor transformation rules for logical formulas can be applied to suitably complex expressions, making MBAs hard to simplify and solve.
In 2019, Liu et al. demystified linear MBAs, leveraging a transformation between the set B={0, 1} of bit values and the set Bn of words of length n∊N for linear MBAs, originally introduced by Zhou et al. in 2007. With their MBA-Blast and MBA-Solver algorithms, they outperform existing tools noticably in terms of performance as well as ability to simplify of such MBAs.
We propose a surprisingly simple algorithm called SiMBA that improves upon MBA-Blast and MBA-Solver in that it can deobfuscate all linear MBAs, does not miss particularly simple solutions and takes only a fraction of their runtime.