Problem

Dynamic analysis is a general term for techniques that verify the behavior of applications during their execution. This project studies how dynamic analysis tools can detect data flow integrity violations in PHP code.


Experimental part

The aim is to implement a simple dynamic analysis tool that detects common attacks against PHP code running on a web server. Specifically, the tool has to verify whether unsanitized/unvalidated input reaches certain sensitive sinks (or sensitive functions).

The tool has to run over sequences of PHP instructions executed by the server (traces). These traces can be extracted using a PHP debugger like Xdebug (http://xdebug.org/).
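As a starting point, the function calls recorded in a trace might be read as sketched below. This is a minimal Python sketch, not a definitive implementation. It assumes a tab-separated, computer-readable trace layout in which a function-entry record holds, in order: level, function number, the marker 0, time, memory, function name, user-defined flag, include file, file name, line number, argument count, and the arguments. This layout is an assumption and should be checked against the documentation of the installed Xdebug version. The names TraceCall and parse_trace are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class TraceCall:
    """One function-entry record of a trace (hypothetical helper type)."""
    function: str
    filename: str
    line: int
    args: List[str]


def parse_trace(lines: Iterable[str]) -> Iterator[TraceCall]:
    """Yield function-entry records from tab-separated trace lines.

    Assumed field layout (0-based): 2 = entry marker "0", 5 = function name,
    8 = file name, 9 = line number, 10 = argument count, 11.. = arguments.
    """
    for raw in lines:
        fields = raw.rstrip("\n").split("\t")
        # Skip headers, exit records, and anything too short to be an entry.
        if len(fields) < 10 or fields[2] != "0":
            continue
        try:
            line_no = int(fields[9])
        except ValueError:
            continue
        yield TraceCall(fields[5], fields[8], line_no, fields[11:])


# Hand-written sample lines in the assumed layout, not real debugger output.
sample = [
    "header line without tabs\n",
    "1\t0\t0\t0.0003\t225000\t{main}\t1\t\t/var/www/q.php\t0\t0\n",
    "2\t1\t0\t0.0005\t226000\tmysql_query\t0\t\t/var/www/q.php\t7\t1\t'SELECT 1'\n",
    "2\t1\t1\t0.0009\t226100\n",
]
calls = list(parse_trace(sample))
for call in calls:
    print(call.function, call.line, call.args)
```

Only entry records are kept here; a real tool would probably also need exit or return records to learn which value a sanitization function produced.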

The tool essentially has to search for certain vulnerable patterns in the traces. All patterns have 4 elements:

  • the name of the vulnerability (e.g., SQL injection),
  • a set of entry points (e.g., $_GET, $_POST),
  • a set of sanitization/validation functions (e.g., mysql_real_escape_string),
  • and a set of sensitive sinks (e.g., mysql_query). 

If a data flow comes from an entry point and reaches a sensitive sink:

  • if it passes through a sanitization/validation function, there is probably no vulnerability;
  • if it does not pass through a sanitization/validation function, there is probably a vulnerability.

The tool has to look for these patterns in the traces.
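The four pattern elements and the rule above could be represented as in the following Python sketch. The names Pattern and classify, and the three result strings, are hypothetical choices made for illustration; the statement does not prescribe any particular representation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Pattern:
    """One vulnerability pattern with its four elements."""
    name: str
    entry_points: FrozenSet[str]
    sanitizers: FrozenSet[str]
    sinks: FrozenSet[str]


def classify(pattern: Pattern, source: str,
             passed_through: Iterable[str], sink: str) -> str:
    """Apply the rule to one data flow from `source` to `sink`.

    `passed_through` lists the functions the data went through on the way.
    """
    if source not in pattern.entry_points or sink not in pattern.sinks:
        return "not applicable"
    if any(func in pattern.sanitizers for func in passed_through):
        return "probably safe"
    return "probably vulnerable"


sqli = Pattern(
    name="SQL injection",
    entry_points=frozenset({"$_GET", "$_POST"}),
    sanitizers=frozenset({"mysql_real_escape_string"}),
    sinks=frozenset({"mysql_query"}),
)

print(classify(sqli, "$_POST", [], "mysql_query"))
# prints: probably vulnerable
print(classify(sqli, "$_POST", ["mysql_real_escape_string"], "mysql_query"))
# prints: probably safe
```

The word "probably" is kept in the results on purpose, since the rule itself only gives an indication and not a proof.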

The patterns are loaded from a file with a simple format: the file is a sequence of patterns; each pattern is represented by 4 lines, one per element; the elements of each set (entry points, sanitization/validation functions, sensitive sinks) are separated by commas. An example file with two patterns is the following:

SQL injection
$_GET,$_POST,$_COOKIE
mysql_escape_string,mysql_real_escape_string
mysql_query,mysql_unbuffered_query,mysql_db_query
SQL injection
$_GET,$_POST,$_COOKIE
pg_escape_string,pg_escape_bytea
pg_query,pg_send_query
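A file in this format could be loaded roughly as follows. This is a Python sketch under two assumptions that the statement does not make: blank lines are ignored, and names repeated within a set are dropped. The function names split_set and parse_patterns are hypothetical.

```python
from typing import Dict, List


def split_set(line: str) -> List[str]:
    """Split a comma-separated line into unique, stripped names (order kept)."""
    seen: Dict[str, None] = {}
    for item in line.split(","):
        item = item.strip()
        if item:
            seen.setdefault(item, None)
    return list(seen)


def parse_patterns(text: str) -> List[dict]:
    """Parse a pattern file: four non-blank lines per pattern."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per pattern, got %d lines" % len(lines))
    patterns = []
    for i in range(0, len(lines), 4):
        name, entries, sanitizers, sinks = lines[i:i + 4]
        patterns.append({
            "name": name,
            "entry_points": split_set(entries),
            "sanitizers": split_set(sanitizers),
            "sinks": split_set(sinks),
        })
    return patterns


example = """SQL injection
$_GET,$_POST,$_COOKIE
mysql_escape_string,mysql_real_escape_string
mysql_query,mysql_unbuffered_query,mysql_db_query
SQL injection
$_GET,$_POST,$_COOKIE
pg_escape_string,pg_escape_bytea
pg_query,pg_send_query
"""
patterns = parse_patterns(example)
for p in patterns:
    print(p["name"], p["sinks"])
```

Since several patterns may share a name (both examples are called SQL injection), the patterns are kept in a list and not in a dictionary keyed by name.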

More patterns can be extracted from the table on this page: http://awap.sourceforge.net/support.html


The tool can have two levels of complexity (for increasing grades):

  1. The tool checks whether a trace contains an entry point followed by a call to a sensitive sink. If so, it issues a warning and prints the input, the sensitive sink, and its arguments. If a sanitization/validation function is called between the input and the sensitive sink, the warning also reports this. Note that this solution can report false positives, as the input may not actually reach the sensitive sink.
  2. The tool does the same, but only if there is a data flow from an entry point to a sensitive sink, possibly passing through a sanitization function. An example of such a data flow (without a sanitization function) is the following:

    $a = $_POST['a'];
    $b = $a;
    $c = $b;
    $q = "SELECT * FROM critical WHERE c='$c'";

    Unlike (1), this requires tracking the propagation of data between variables, so it is more complex to implement but provides more precise detection.
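The difference between the two levels can be illustrated with the following Python sketch. It works on a hypothetical, simplified event list and not on a real trace format: an "assign" event names a target variable and the names used on the right-hand side, and a "call" event names a function, its arguments, and the variable receiving the result. The fixed sets of entry points, sanitizers, and sinks, and the names level1 and level2, are likewise assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

ENTRY_POINTS = {"$_GET", "$_POST", "$_COOKIE"}
SANITIZERS = {"mysql_real_escape_string"}
SINKS = {"mysql_query"}

# Hypothetical simplified events:
#   ("assign", target, sources)        target = expression using sources
#   ("call", function, args, result)   result = function(args); result may be None
Taint = Set[Tuple[str, bool]]  # (entry point, sanitized?)


def level1(events: List[tuple]) -> List[str]:
    """Level 1: warn when a sink is called after any entry point was read."""
    warnings, last_input, sanitized = [], None, False
    for ev in events:
        for name in ev[2]:
            if name in ENTRY_POINTS:
                last_input, sanitized = name, False
        if ev[0] == "call" and last_input is not None:
            func, args = ev[1], ev[2]
            if func in SANITIZERS:
                sanitized = True
            elif func in SINKS:
                note = " (sanitizer called in between)" if sanitized else ""
                warnings.append(f"{last_input} -> {func}({', '.join(args)}){note}")
    return warnings


def level2(events: List[tuple]) -> List[str]:
    """Level 2: warn only when tainted data actually reaches a sink."""
    taint: Dict[str, Taint] = {}
    warnings = []
    for ev in events:
        # Union of the taint carried by every input of this event.
        incoming: Taint = set()
        for name in ev[2]:
            if name in ENTRY_POINTS:
                incoming.add((name, False))
            incoming |= taint.get(name, set())
        if ev[0] == "assign":
            taint[ev[1]] = incoming  # overwriting a variable replaces its taint
            continue
        func, result = ev[1], ev[3]
        if func in SINKS:
            for entry, clean in sorted(incoming):
                status = "sanitized" if clean else "UNSANITIZED"
                warnings.append(f"{entry} -> {func}: {status}")
        if func in SANITIZERS:
            incoming = {(entry, True) for entry, _ in incoming}
        if result is not None:
            taint[result] = incoming


    return warnings


# The data flow of the example above, followed by a call to a sink.
flow = [
    ("assign", "$a", ["$_POST"]),
    ("assign", "$b", ["$a"]),
    ("assign", "$c", ["$b"]),
    ("assign", "$q", ["$c"]),
    ("call", "mysql_query", ["$q"], None),
]
# An input is read, but the query does not depend on it.
unrelated = [
    ("assign", "$a", ["$_GET"]),
    ("assign", "$q", []),
    ("call", "mysql_query", ["$q"], None),
]
print(level1(flow), level2(flow))
print(level1(unrelated), level2(unrelated))
```

On the second event list, level 1 still reports a warning because an input was read before the sink was called, whereas level 2 reports nothing because the query does not depend on that input.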

The tool should be evaluated experimentally with a few PHP scripts (e.g., from Mutillidae, a vulnerable web application). If possible, it should be installed in the virtual machine provided in the SSof course.


Report

The report shall:

  1. Present the design of the tool, the main design options, and the output of the tool for a few examples. (Maximum 2 pages.)
  2. Discuss the state of the art of research on dynamic information flow analysis techniques for detecting security vulnerabilities in web applications. The following articles should be included. You are encouraged to also consider references to and from them, as found via Google Scholar, and to search further for other relevant ones.
  3. Explain how the results achieved in the papers mentioned in the previous point apply to the problem being addressed. Give examples of how they could be used or adapted to the concrete scenario that the experimental part of the project addresses. Propose a more elaborate tool for tackling the problem considered in this project, taking the related research into consideration.