Our running example can be found in Example.java.
We will run a lightweight naive analysis (-a micro
option) on the generated jar
file (-i Example.jar
option). This analysis has only a few basic rules but
it's a good representative skeleton of actual analyses. Since Doop performs a
whole program analysis, the library will be analyzed along with application
code. We can specify the desired version for the library code of the platform (--platform java_8
or --platform android_24
option). If no platform is provided by the user, it is set to a default value (java_8
).
#!bash
$ ./doop -a micro -i docs/doop-101-examples/Example.jar --stats none
After the analysis has run, we can gather results by examining the database. For example, we can view the table generated for the call graph relation by viewing the following file:
#!bash
$ less last-analysis/CallGraphEdge.csv
Input facts are auto-generated by the framework (using Soot and then imported to the database so our rules can process them.
The input schema and the rules for the micro analysis can be found in
file souffle-logic/analyses/micro/analysis.dl
.
For example, the following rule states that when we have a heap allocation
of a ?heap
object that is assigned to variable ?var
inside a method deemed
reachable by our analysis, then we can infer that the variable may point to
this heap object.
#!java
VarPointsTo(?var, ?heap) :-
AssignHeapAllocation(?heap, ?var, ?inMethod),
Reachable(?inMethod).
Furthermore, when we have some kind of assignment (direct or indirect) from one variable to another, and we know that the source variable may point to some heap object, then the target variable may point to the same heap object.
#!java
VarPointsTo(?to, ?heap) :-
Assign(?to, ?from),
VarPointsTo(?from, ?heap).
Notice here that this rule is recursive; previously known facts about the
VarPointTo
relation may lead to the inference of additional facts. Doop
analysis rules are mutually recursive in complex ways.
After the end of an analysis, a symbolic link for the resulting database can be
found under the results
directory. Also, for convenience, a second symbolic
link is created at top level called last-analysis
, each time pointing to the
latest successful analysis.
Note that micro
used here attempts to be self-contained, so it
contains all core login in the same file. To examine the other
analyses of the Doop framework, follow this structure:
- Their input schema can be found in
souffle-logic/facts/flow-insensitivite-schema.dl
. - The rules for an analysis A, can be found in
souffle-logic/analyses/$A/analysis.dl
.
There are two ways to process the analysis results: we can run custom logic alongside the analysis logic (while the analysis runs), or let the analysis compute its results and post-process them.
To illustrate these two approaches, assume we want to find all
variable-points-to information for method Example.test()
, when doing
a context-insensitive analysis.
Put the following logic in file extra.dl:
#!java
.decl Temp(v: Var, h: Value)
Temp(v, h) :-
mainAnalysis.VarPointsTo(_, h, _, v),
Var_DeclaringMethod(v, "<Example: void test(int)>").
.output Temp
Note that some relations belong to the "mainAnalysis" component and thus must be qualified.
Then, run the analysis using the custom logic:
#!bash
$ ./doop -a context-insensitive -i docs/doop-101-examples/Example.jar --stats none --extra-logic extra.dl
The new relation is now included in the results:
#!bash
$ cat last-analysis/Temp.csv
<Example: void test(int)>/@this <Example: void main(java.lang.String[])>/new Example/0
<Example: void test(int)>/l0#_0 <Example: void main(java.lang.String[])>/new Example/0
<Example: void test(int)>/l3#_32 <Example: void test(int)>/new Cat/1
<Example: void test(int)>/l4#_33 <Example: void test(int)>/new Cat/2
<Example: void test(int)>/$stack5 <Example: void test(int)>/new Dog/0
<Example: void test(int)>/$stack6 <Example: void test(int)>/new Cat/1
<Example: void test(int)>/$stack7 <Example: void test(int)>/new Cat/2
<Example: void test(int)>/$stack8 <Example: void test(int)>/new Cat/0
<Example: void test(int)>/l2#_26 <Example: void test(int)>/new Cat/0
<Example: void test(int)>/l2_$$A_1#_28 <Example: void test(int)>/new Dog/0
<Example: void test(int)>/l2_$$A_2#_29 <Example: void test(int)>/new Cat/0
<Example: void test(int)>/l2_$$A_2#_29 <Example: void test(int)>/new Dog/0
The advantage of running custom logic this way is that it has access to all analysis relations (not just those written on disk). The disadvantage is that changes to this custom logic require running the analysis again (and waiting for the analysis logic to be recompiled).
In a file temp.dl put the code:
#!java
.decl Var_DeclaringMethod(v: symbol, m: symbol)
.input Var_DeclaringMethod(IO="file", filename="Var-DeclaringMethod.facts", delimiter="\t")
.decl VarPointsTo(c1: symbol, h: symbol, c2: symbol, v: symbol)
.input VarPointsTo(IO="file", filename="VarPointsTo.csv", delimiter="\t")
.decl Temp(v: symbol, h: symbol)
Temp(v, h) :-
VarPointsTo(_, h, _, v),
Var_DeclaringMethod(v, "<Example: void test(int)>").
.output Temp
Copy the Var-DeclaringMethod.facts so that they are in the same
directory as the output relation VarPointsTo (replace $id
with your
analysis id):
#!bash
$ cp out/$id/facts/Var-DeclaringMethod.facts out/$id/database/
Run the query and view its results:
#!bash
$ souffle -F out/$id/database/ temp.dl
$ cat Temp.csv
<Example: void test(int)>/@this <Example: void main(java.lang.String[])>/new Example/0
<Example: void test(int)>/l0#_0 <Example: void main(java.lang.String[])>/new Example/0
<Example: void test(int)>/l3#_32 <Example: void test(int)>/new Cat/1
<Example: void test(int)>/l4#_33 <Example: void test(int)>/new Cat/2
<Example: void test(int)>/$stack5 <Example: void test(int)>/new Dog/0
<Example: void test(int)>/$stack6 <Example: void test(int)>/new Cat/1
<Example: void test(int)>/$stack7 <Example: void test(int)>/new Cat/2
<Example: void test(int)>/$stack8 <Example: void test(int)>/new Cat/0
<Example: void test(int)>/l2#_26 <Example: void test(int)>/new Cat/0
<Example: void test(int)>/l2_$$A_1#_28 <Example: void test(int)>/new Dog/0
<Example: void test(int)>/l2_$$A_2#_29 <Example: void test(int)>/new Cat/0
<Example: void test(int)>/l2_$$A_2#_29 <Example: void test(int)>/new Dog/0
We frequently analyze various programs from the DaCapo Benchmarks
suite using a variety of advanced analyses. E.g.,
let's analyze the antlr
benchmark using a 2 type-sensitive analysis.
#!bash
$ ./doop -a 2-type-sensitive+heap -i benchmarks/dacapo-2006/antlr.jar --dacapo --platform java_8
Towards the end of execution, Doop will report a set of metrics gathered from
the analyzed program. Those metrics are computed through the use of various
queries on the resulting database. Those can be found under
souffle-logic/addons/statistics
.
You can use bin/mkjar
to easily generate a jar file from a single java file.
The generated jar file will contain local variable debugging information (e.g.,
variable names).
Example:
#!bash
$ ./bin/mkjar Example.java 1.8
added manifest
adding: Dog.class(in = 292) (out= 210)(deflated 28%)
adding: Animal.class(in = 434) (out= 280)(deflated 35%)
adding: Cat.class(in = 505) (out= 296)(deflated 41%)
adding: Example.class(in = 1055) (out= 653)(deflated 38%)
You can use bytecode2jimple
to easily generate Jimple (or Shimple--the ssa
variant) from a jar file. For more information, invoke bytecode2jimple without
arguments.
Example:
#!bash
$ ./bin/bytecode2jimple -lsystem -d jimple-dir Example.jar
$ ls jimple-dir
Animal.jimple Cat.jimple Dog.jimple Example.jimple
This section is about interacting with the legacy LogicBlox analysis logic. This analysis logic is no longer maintained but may be useful for experimenting with published artifacts or other past work.
As we already saw, the easiest way to interact with the database is to simply print all the entries of a certain predicate.
#!bash
$ bloxbatch -db last-analysis -print FieldPointsTo
predicate: FieldPointsTo(HeapAllocation, FieldSignature, HeapAllocation)
...
/--- start of FieldPointsTo facts ---\
[113914]Example.test/new Cat/1, [11417]<Cat: Cat parent>, [113915]Example.test/new Cat/2
\---- end of FieldPointsTo facts ----/
Doop represents (and abstracts away) objects by using their allocation point in
the program. Example.test/new Cat/1
refers to the second (zero-based
indexing) Cat object allocated inside method Example.test()
.
As expected, the parent
field of the second Cat object may point to the third Cat object.
Our first query is to ask for VarPointTo
entries of variables declared in Example.morePlay()
.
#!bash
$ bloxbatch -db last-analysis -query \
'_(?var, ?heap) <- VarPointsTo(?var, ?heap), Var:DeclaringMethod(?var, "<Example: void morePlay(Cat)>").'
Example.morePlay/@this, Example.main/new Example/0
Example.morePlay/r0, Example.main/new Example/0
Example.morePlay/r3, Example.test/new Cat/1
Example.morePlay/r4, Example.test/new Cat/1
Example.morePlay/r1, Example.test/new Cat/1
Example.morePlay/@param0, Example.test/new Cat/1
Example.morePlay/r2, Example.test/new Cat/2
Example.morePlay/r3, Example.test/new Cat/2
Example.morePlay/r4, Example.test/new Cat/2
The string provided to the -query
option can be a set of left and right arrow
Datalog rules. Newly defined predicates have to start with _
since they will
only exist for the duration of the query evaluation. Refmode values can be used
directly, and the engine will automatically substitute them with their internal
IDs. E.g., the following part
#!java
Var:DeclaringMethod(?var, "<Example: void morePlay(Cat)>")
is equivalent to
#!java
Var:DeclaringMethod(?var, ?method), MethodSignature:Value(?method:"<Example: void morePlay(Cat)>")
Note here that Doop analyzes Java bytecode. Input facts are generated using Soot, which transforms Java bytecode to Jimple, a language based on three address code. As a result new temporary variables are introduced and also original variable names might be lost (they can be retained through specific flags in javac and Soot).
A more advanced query can be found in
query2.logic. Essentially, we compute a
transitive closure on the CallGraphEdge
predicate. The logic used in a query
can be as complicated as in any "normal" Datalog program.
#!bash
_path(?fromMethod, ?toMethod) <-
CallGraphEdge(?invocation, ?toMethod),
Instruction:Method[?invocation] = ?fromMethod.
_path(?fromMethod, ?toMethod) <-
_path(?fromMethod, ?toMethodMid),
CallGraphEdge(?invocation, ?toMethod),
Instruction:Method[?invocation] = ?toMethodMid.
#!bash
$ bloxbatch -db last-analysis -query -file docs/doop-101-examples/query2.logic
<Example: void main(java.lang.String[])>, <Example: void test(int)>
<Example: void test(int)>, <Example: void morePlay(Cat)>
<Example: void main(java.lang.String[])>, <Example: void morePlay(Cat)>
<Example: void test(int)>, <Cat: void setParent(Cat)>
<Example: void main(java.lang.String[])>, <Cat: void setParent(Cat)>
<Example: void test(int)>, <Cat: Cat getParent()>
<Example: void morePlay(Cat)>, <Cat: Cat getParent()>
<Example: void main(java.lang.String[])>, <Cat: Cat getParent()>
<Example: void test(int)>, <Cat: void play()>
<Example: void morePlay(Cat)>, <Cat: void play()>
<Animal: Animal playWith(Animal)>, <Cat: void play()>
<Example: void main(java.lang.String[])>, <Cat: void play()>
<Example: void test(int)>, <Dog: void play()>
<Example: void main(java.lang.String[])>, <Dog: void play()>
<Example: void test(int)>, <Animal: Animal playWith(Animal)>
<Example: void morePlay(Cat)>, <Animal: Animal playWith(Animal)>
<Example: void main(java.lang.String[])>, <Animal: Animal playWith(Animal)
The line Instruction:Method[?invocation] = ?fromMethod
found in the previous
query uses a special form of predicate known as a functional predicate.
Those are similar to normal ones, but they act like a map. Values found between
the square brackets are mapped to only on value on the right.
For example, one metric is the computation of casts that potentially may fail. It joins input facts as well as facts computed during execution to infer casts where the related variable may point to an object that is incompatible with the type of the cast.
#!java
_Stats:Simple:PotentiallyFailingCast(?type, ?from, ?to) <-
_Stats:Simple:ReachableCast(_, ?type, ?to, ?from),
Stats:Simple:InsensVarPointsTo(?heap, ?from),
HeapAllocation:Type[?heap] = ?heaptype,
! AssignCompatible(?type, ?heaptype).
The use of _
as a predicate parameter denotes that we don't care for a
specific value. It represent a parameter that is not bound.
The above query can be found isolated in query3.logic.
#!bash
$ bloxbatch -db last-analysis -query -file docs/doop-101-examples/query3.logic
Datalog supports the use of aggregation functions. One such function is
count
. E.g., let's to compute the total number of VarPointsTo entries.
#!bash
$ bloxbatch -db last-analysis -query \
'_[] = ?n <- agg<<?n = count()>> VarPointsTo(_, _, _, _).'
4569312
Using additional runtime flags, you can emit extra debug information related to
the evaluation process of the Datalog engine. Explaining what kind of
information is collected, is out of the scope of this tutorial, but this can be
fed to bin/log-analyzer.py
which will then emit profile information for each
Datalog rule.
Example:
#!bash
$ ./doop -a context-insensitive -i Example.jar -- -logLevel debugDetail@factbus > log.txt
$ ./bin/log-analyzer.py log.txt
ImpossibleExceptionHandler(?handler :: ExceptionHandler, ?type :: Type, ?instruction :: Instruction) <-
PossibleExceptionHandler(?handler :: ExceptionHandler, ?type :: Type, ?instruction :: Instruction),
ExceptionHandler:Before(?previous :: ExceptionHandler, ?handler :: ExceptionHandler),
PossibleExceptionHandler(?previous :: ExceptionHandler, ?type :: Type, ?instruction :: Instruction).
=> 7.27862s
...