Static Program Analysis(book note)

Introduction

Asking questions about program statically - can be trivially reduced to the halting problem(undecidable)
Give the best approximation as possible (ability to answer IDK)
conservative approximations to only answer know if it always know or IDK

A Tiny Programming Language

Syntax of TIP

Expression
Statement
Function
Pointer
Programs - a collection of functions

Control Flow Graphs

directed graph
a part of code as a node in CFG
one point of entry and one point of exit

Type Analysis

Types

various operations might be intended only for certain “types” of arguments
finite types
regular types - regular trees - contains only finitely many different subtrees
- regular types can be represented by finit automata¹

Type constraints

introduce type variable for identifier and non-identifier expression
systematically defined for each construct
solution assigns to each type variable a type - all constraints satisfied

Solving constraints

If solution exist, computed in linear time using unification algorithm
regular type: infinite unfolding \(\phi = (\& int, \phi) \rightarrow int\)
recursive types also required for data structures

Slack and Limitations

type analysis is only approximate
- Certain program will be unfairly rejected
- Ignores some runtime errors: dereference null, reading uninitialized vars, escaping stack cell …

Lattice Theory

lattice
abstract values
definite information - only if all will path and data will evaluate to the same abstract value
Abstract value \(\bot\) for expressions whose values are not numbers

Lattices

Partial Order : a Set S equipped with a binary relation \(\sqsubseteq\)

reflexivity \(\forall x \in S: x \sqsubseteq x\)
transitivity \(\forall x,y,z \in S: x\sqsubseteq y \wedge y\sqsubseteq z \Rightarrow x \sqsubseteq z\)
anti-symmetry \(\forall x, y \in S: x \sqsubseteq y \wedge y\sqsubseteq x \Rightarrow x = y\)
when \(x \sqsubseteq y\) y is a safe approximation or an over-approximation of x, x is at least as precise as y
upper bound for X \(X\sqsubseteq y\) \(\forall x \in X, x\sqsubseteq y\)
- least upper bound \(\sqcup X\)
lower bound for X \(y \sqsubseteq X\)
- least lower bound \(\sqcap X\)

lattice partial order in which \(\sqcap X\) exists for all \(X \subseteq S\) - complete lattice

a unique larget element \(\top = \sqcup S\)
a unique smallest element \(\bot = \sqcap S\)
finite lattices
- \(\bot,\top\) exists
- \(\forall x,y\) have a least upper bound and a greatest lower bound
  - \(x \sqcup y\)
  - \(x \sqcap y\)
Hasse diagram
- elements are nodes
- order relation is the transitive closure of edfes leading from lower to higehr nodes
height is the length of the longest path from bot to top
powerset lattice

Constructing lattices

product of lattices are of finite height if each lattices is: sum of each height

sum operator the height is the maximum height

lift a lattice is creating a new bottom

if A is a set, flat is creating a lattice with only one layer - height 2

map lattice: A is a set and L is a lattice \(A \mapsto L = \{[a_1 \mapsto x_1, \dots, a_n \mapsto x_n | x_i \in L\}\) ordered pointwise

\(height(A\mapsto L) = |A| \cdot height(L)\)

Equations and fixed-point

Monotonic endofunction of lattice: the order is transfered, does not mean extensive that the function can produce all element in the lattice

fixed-point \(f(x)=x\), least fixed point \(x\sqsubseteq y\) for all fixed point y

equation systems for finite height lattice L \(F_i(x_1, \dots, x_n), F = (F_1,\dots,F_n)\)

If all the functions monotone then the system has a unique least solution

Fixed-point theorem: unique least fixed point for monotone function f \(fix(f)=\bigsqcup_{i\geq 0} f^i(\bot)\) continous application of f

fixed-point algorithm is bounded by the height of the lattice
the cost of computing f
the cost of testing equality

Dataflow analysis with the Monotone Framework

CFG and lattice L with finite height
may be parameterized with the given program
dataflow constraint that relates the value of the variable of the corresponding node to those of other nodes
sound - all solutions corresponds to correct information about the program, but conservative

Fixed-point algorithms

Continuous applying F
chaotic iteration - with the variable picked nondeterministically
Worklist algorithm - add and remove - dependence graph as a map

Example : Sign analysis

\(L: Vars \mapsto Sign\)
\([[v]] = JOIN(v) = \bigsqcup_{w \in pred(v)} [[v]]\) - depends on the order of the traversal
\(eval(\sigma,id) = \sigma(id)\)
\(eval(\sigma, intconst) = sign(intconst)\)
\(eval(\sigma, E_1 op E_2) = \bar{op}(eval(\sigma,E_1),eval(\sigma, E_2))\)
Monotonicity check \(O(n^3)\)

Example: Liveness

live - current value may be read during the remaining of the program
powerset lattice

Example: Available Expression

available - if value is computed previously
\(\downarrow id\) to remove all expression includes \(id\)

Example: Very busy expressions

evaluated again before value changes
code hoisting

Example: Reaching definitions

can be used to construct def-use graph
dead code elimination and code motion

Forwards, backwards, may, and must

forwards analysis
- compute information about the bast behavior
- depends on the predecessors
backwards analysis
- compute information about the future behavior
- depends on the successors
may analysis
- describe information that may possibly be ture, upper approximation
- using union operator to combine information
must analysis
- describe information that must definitely be true, lower approximation
- using intersection operator to combine information

Examples

Initialized variables
Constant propagation
Interval analysis

Widening

widening using a function \(w: L^n \rightarrow L^n\)
\(w \circ F\)

Narrowing

applying F after widening
\(\forall i : fix \sqsubseteq F^{i+1}(fixw) \sqsubseteq F^i(fixw) \sqsubseteq fixw\)
Not guaranteed to converge - heuristics must determine how many times to apply narrowing

?↩