- Dong, Shuaike;
- Li, Menghao;
- Diao, Wenrui;
- Liu, Xiangyu;
- Liu, Jian;
- Li, Zhou;
- Xu, Fenghao;
- Chen, Kai;
- Wang, Xiaofeng;
- Zhang, Kehuan
In this paper, we seek to better understand Android obfuscation and depict a
holistic view of the usage of obfuscation through a large-scale investigation
in the wild. In particular, we focus on four popular obfuscation approaches:
identifier renaming, string encryption, Java reflection, and packing. To obtain
the meaningful statistical results, we designed efficient and lightweight
detection models for each obfuscation technique and applied them to our massive
APK datasets (collected from Google Play, multiple third-party markets, and
malware databases). We have learned several interesting facts from the result.
For example, malware authors use string encryption more frequently, and more
apps on third-party markets than Google Play are packed. We are also interested
in the explanation of each finding. Therefore we carry out in-depth code
analysis on some Android apps after sampling. We believe our study will help
developers select the most suitable obfuscation approach, and in the meantime
help researchers improve code analysis systems in the right direction.