What makes an app malicious? One important factor is that malicious apps treat sensitive data differently from benign apps. To capture such differences, we mined the top 2,866 benign Android applications for their data flows from sensitive sources and compared these flows against those found in malicious apps. We find that

  1. for every sensitive source, the data ends up in a small number of typical sinks;
  2. these sinks differ considerably between benign and malicious apps;
  3. these differences can be used to flag malicious apps due to their abnormal data flow;
  4. malicious apps can be identified by their abnormal data flow alone, without requiring known malware samples.

In our evaluation, our MUDFLOW prototype correctly identified 86.4% of all novel malware, and 90.1% of novel malware leaking sensitive data.
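
To make the idea of "abnormal data flow" concrete, here is a minimal sketch of a frequency-based abnormality score. It is not the classifier used in MUDFLOW (see the paper for that). It only illustrates how a profile learned from benign apps alone can flag unusual source-to-sink flows. The function names, the scoring rule, and the toy flows are assumptions made for this example.

```python
from collections import Counter, defaultdict

def learn_benign_profile(benign_apps):
    """Count, per sensitive source, how many benign apps send its data to each sink."""
    profile = defaultdict(Counter)
    for flows in benign_apps:
        for source, sink in set(flows):  # count each flow once per app
            profile[source][sink] += 1
    return profile

def abnormality(flows, profile):
    """Mean rarity of an app's flows: 0.0 = all typical, 1.0 = never seen in benign apps."""
    flows = set(flows)
    if not flows:
        return 0.0
    total = 0.0
    for source, sink in flows:
        sinks = profile.get(source, Counter())
        seen = sum(sinks.values())
        # Share of benign flows from this source that end in this sink.
        typical = sinks[sink] / seen if seen else 0.0
        total += 1.0 - typical
    return total / len(flows)

# Toy data (hypothetical): each app is a list of (source, sink) pairs.
benign = [
    [("getDeviceId", "Log.i"), ("getLastKnownLocation", "Log.i")],
    [("getDeviceId", "Log.i")],
    [("getDeviceId", "Log.i"), ("getLastKnownLocation", "Intent.putExtra")],
]
profile = learn_benign_profile(benign)

typical_app = [("getDeviceId", "Log.i")]
suspicious_app = [("getDeviceId", "sendTextMessage")]

print(abnormality(typical_app, profile))     # flow seen in every benign app
print(abnormality(suspicious_app, profile))  # flow never seen in benign apps
```

Note that no malware samples are needed to build the profile. An app is flagged simply because its flows are rare among benign apps, for instance when its score exceeds a threshold chosen on held-out benign data.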



The dataset and the scripts for the statistical analysis are free to use, provided that you cite the MUDFLOW paper.

  • Download the preprint of the MUDFLOW paper here.
  • Download the whole dataset of the MUDFLOW experiments here.
  • Download scripts for reproducing the results from the paper here.
  • Download improved scripts for our statistical analysis here. NOTE: these scripts produce results that differ from those reported in the paper.