Static Program Analysis(book note)
Introduction
- Asking questions about program statically - can be trivially reduced to the halting problem(undecidable)
- Give the best approximation as possible (ability to answer IDK)
- conservative approximations to only answer know if it always know or IDK
A Tiny Programming Language
Syntax of TIP
- Expression
- Statement
- Function
- Pointer
- Programs - a collection of functions
Control Flow Graphs
- directed graph
- a part of code as a node in CFG
- one point of entry and one point of exit
Type Analysis
Types
- various operations might be intended only for certain “types” of arguments
- finite types
- regular types - regular trees - contains only finitely many different subtrees
- regular types can be represented by finit automata1
Type constraints
- introduce type variable for identifier and non-identifier expression
- systematically defined for each construct
- solution assigns to each type variable a type - all constraints satisfied
Solving constraints
- If solution exist, computed in linear time using unification algorithm
- regular type: infinite unfolding \(\phi = (\& int, \phi) \rightarrow int\)
- recursive types also required for data structures
Slack and Limitations
- type analysis is only approximate
- Certain program will be unfairly rejected
- Ignores some runtime errors: dereference null, reading uninitialized vars, escaping stack cell …
Lattice Theory
- lattice
- abstract values
- definite information - only if all will path and data will evaluate to the same abstract value
- Abstract value \(\bot\) for expressions whose values are not numbers
Lattices
Partial Order : a Set S equipped with a binary relation \(\sqsubseteq\)
- reflexivity \(\forall x \in S: x \sqsubseteq x\)
- transitivity \(\forall x,y,z \in S: x\sqsubseteq y \wedge y\sqsubseteq z \Rightarrow x \sqsubseteq z\)
- anti-symmetry \(\forall x, y \in S: x \sqsubseteq y \wedge y\sqsubseteq x \Rightarrow x = y\)
when \(x \sqsubseteq y\) y is a safe approximation or an over-approximation of x, x is at least as precise as y
- upper bound for X \(X\sqsubseteq y\) \(\forall x \in X, x\sqsubseteq y\)
- least upper bound \(\sqcup X\)
- lower bound for X \(y \sqsubseteq X\)
- least lower bound \(\sqcap X\)
lattice partial order in which \(\sqcap X\) exists for all \(X \subseteq S\) - complete lattice
- a unique larget element \(\top = \sqcup S\)
- a unique smallest element \(\bot = \sqcap S\)
- finite lattices
- \(\bot,\top\) exists
- \(\forall x,y\) have a least upper bound and a greatest lower bound
- \(x \sqcup y\)
- \(x \sqcap y\)
- Hasse diagram
- elements are nodes
- order relation is the transitive closure of edfes leading from lower to higehr nodes
height is the length of the longest path from bot to top
powerset lattice
Constructing lattices
product of lattices are of finite height if each lattices is: sum of each height
sum operator the height is the maximum height
lift a lattice is creating a new bottom
if A is a set, flat is creating a lattice with only one layer - height 2
map lattice: A is a set and L is a lattice \(A \mapsto L = \{[a_1 \mapsto x_1, \dots, a_n \mapsto x_n | x_i \in L\}\) ordered pointwise
\(height(A\mapsto L) = |A| \cdot height(L)\)
Equations and fixed-point
Monotonic endofunction of lattice: the order is transfered, does not mean extensive that the function can produce all element in the lattice
fixed-point \(f(x)=x\), least fixed point \(x\sqsubseteq y\) for all fixed point y
equation systems for finite height lattice L \(F_i(x_1, \dots, x_n), F = (F_1,\dots,F_n)\)
If all the functions monotone then the system has a unique least solution
Fixed-point theorem: unique least fixed point for monotone function f \(fix(f)=\bigsqcup_{i\geq 0} f^i(\bot)\) continous application of f
- fixed-point algorithm is bounded by the height of the lattice
- the cost of computing f
- the cost of testing equality
Dataflow analysis with the Monotone Framework
- CFG and lattice L with finite height
- may be parameterized with the given program
- dataflow constraint that relates the value of the variable of the corresponding node to those of other nodes
- sound - all solutions corresponds to correct information about the program, but conservative
Fixed-point algorithms
- Continuous applying F
- chaotic iteration - with the variable picked nondeterministically
- Worklist algorithm - add and remove - dependence graph as a map
Example : Sign analysis
- \(L: Vars \mapsto Sign\)
- \([[v]] = JOIN(v) = \bigsqcup_{w \in pred(v)} [[v]]\) - depends on the order of the traversal
- \(eval(\sigma,id) = \sigma(id)\)
- \(eval(\sigma, intconst) = sign(intconst)\)
- \(eval(\sigma, E_1 op E_2) = \bar{op}(eval(\sigma,E_1),eval(\sigma, E_2))\)
- Monotonicity check \(O(n^3)\)
Example: Liveness
- live - current value may be read during the remaining of the program
- powerset lattice
Example: Available Expression
- available - if value is computed previously
- \(\downarrow id\) to remove all expression includes \(id\)
Example: Very busy expressions
- evaluated again before value changes
- code hoisting
Example: Reaching definitions
- can be used to construct def-use graph
- dead code elimination and code motion
Forwards, backwards, may, and must
- forwards analysis
- compute information about the bast behavior
- depends on the predecessors
- backwards analysis
- compute information about the future behavior
- depends on the successors
- may analysis
- describe information that may possibly be ture, upper approximation
- using union operator to combine information
- must analysis
- describe information that must definitely be true, lower approximation
- using intersection operator to combine information
Examples
- Initialized variables
- Constant propagation
- Interval analysis
Widening
- widening using a function \(w: L^n \rightarrow L^n\)
- \(w \circ F\)
Narrowing
- applying F after widening
- \(\forall i : fix \sqsubseteq F^{i+1}(fixw) \sqsubseteq F^i(fixw) \sqsubseteq fixw\)
- Not guaranteed to converge - heuristics must determine how many times to apply narrowing
?↩