Static Program Analysis(book note)

Introduction

  • Asking questions about program statically - can be trivially reduced to the halting problem(undecidable)
  • Give the best approximation as possible (ability to answer IDK)
  • conservative approximations to only answer know if it always know or IDK

A Tiny Programming Language

Syntax of TIP

  • Expression
  • Statement
  • Function
  • Pointer
  • Programs - a collection of functions

Control Flow Graphs

  • directed graph
  • a part of code as a node in CFG
  • one point of entry and one point of exit

Type Analysis

Types

  • various operations might be intended only for certain “types” of arguments
  • finite types
  • regular types - regular trees - contains only finitely many different subtrees
    • regular types can be represented by finit automata1

Type constraints

  • introduce type variable for identifier and non-identifier expression
  • systematically defined for each construct
  • solution assigns to each type variable a type - all constraints satisfied

Solving constraints

  • If solution exist, computed in linear time using unification algorithm
  • regular type: infinite unfolding \(\phi = (\& int, \phi) \rightarrow int\)
  • recursive types also required for data structures

Slack and Limitations

  • type analysis is only approximate
    • Certain program will be unfairly rejected
    • Ignores some runtime errors: dereference null, reading uninitialized vars, escaping stack cell …

Lattice Theory

  • lattice
  • abstract values
  • definite information - only if all will path and data will evaluate to the same abstract value
  • Abstract value \(\bot\) for expressions whose values are not numbers

Lattices

Partial Order : a Set S equipped with a binary relation \(\sqsubseteq\)

  • reflexivity \(\forall x \in S: x \sqsubseteq x\)
  • transitivity \(\forall x,y,z \in S: x\sqsubseteq y \wedge y\sqsubseteq z \Rightarrow x \sqsubseteq z\)
  • anti-symmetry \(\forall x, y \in S: x \sqsubseteq y \wedge y\sqsubseteq x \Rightarrow x = y\)
  • when \(x \sqsubseteq y\) y is a safe approximation or an over-approximation of x, x is at least as precise as y

  • upper bound for X \(X\sqsubseteq y\) \(\forall x \in X, x\sqsubseteq y\)
    • least upper bound \(\sqcup X\)
  • lower bound for X \(y \sqsubseteq X\)
    • least lower bound \(\sqcap X\)

lattice partial order in which \(\sqcap X\) exists for all \(X \subseteq S\) - complete lattice

  • a unique larget element \(\top = \sqcup S\)
  • a unique smallest element \(\bot = \sqcap S\)
  • finite lattices
    • \(\bot,\top\) exists
    • \(\forall x,y\) have a least upper bound and a greatest lower bound
      • \(x \sqcup y\)
      • \(x \sqcap y\)
  • Hasse diagram
    • elements are nodes
    • order relation is the transitive closure of edfes leading from lower to higehr nodes
  • height is the length of the longest path from bot to top

  • powerset lattice

Constructing lattices

product of lattices are of finite height if each lattices is: sum of each height

sum operator the height is the maximum height

lift a lattice is creating a new bottom

if A is a set, flat is creating a lattice with only one layer - height 2

map lattice: A is a set and L is a lattice \(A \mapsto L = \{[a_1 \mapsto x_1, \dots, a_n \mapsto x_n | x_i \in L\}\) ordered pointwise

\(height(A\mapsto L) = |A| \cdot height(L)\)

Equations and fixed-point

Monotonic endofunction of lattice: the order is transfered, does not mean extensive that the function can produce all element in the lattice

fixed-point \(f(x)=x\), least fixed point \(x\sqsubseteq y\) for all fixed point y

equation systems for finite height lattice L \(F_i(x_1, \dots, x_n), F = (F_1,\dots,F_n)\)

If all the functions monotone then the system has a unique least solution

Fixed-point theorem: unique least fixed point for monotone function f \(fix(f)=\bigsqcup_{i\geq 0} f^i(\bot)\) continous application of f

  • fixed-point algorithm is bounded by the height of the lattice
  • the cost of computing f
  • the cost of testing equality

Dataflow analysis with the Monotone Framework

  • CFG and lattice L with finite height
  • may be parameterized with the given program
  • dataflow constraint that relates the value of the variable of the corresponding node to those of other nodes
  • sound - all solutions corresponds to correct information about the program, but conservative

Fixed-point algorithms

  • Continuous applying F
  • chaotic iteration - with the variable picked nondeterministically
  • Worklist algorithm - add and remove - dependence graph as a map

Example : Sign analysis

  • \(L: Vars \mapsto Sign\)
  • \([[v]] = JOIN(v) = \bigsqcup_{w \in pred(v)} [[v]]\) - depends on the order of the traversal
  • \(eval(\sigma,id) = \sigma(id)\)
  • \(eval(\sigma, intconst) = sign(intconst)\)
  • \(eval(\sigma, E_1 op E_2) = \bar{op}(eval(\sigma,E_1),eval(\sigma, E_2))\)
  • Monotonicity check \(O(n^3)\)

Example: Liveness

  • live - current value may be read during the remaining of the program
  • powerset lattice

Example: Available Expression

  • available - if value is computed previously
  • \(\downarrow id\) to remove all expression includes \(id\)

Example: Very busy expressions

  • evaluated again before value changes
  • code hoisting

Example: Reaching definitions

  • can be used to construct def-use graph
  • dead code elimination and code motion

Forwards, backwards, may, and must

  • forwards analysis
    • compute information about the bast behavior
    • depends on the predecessors
  • backwards analysis
    • compute information about the future behavior
    • depends on the successors
  • may analysis
    • describe information that may possibly be ture, upper approximation
    • using union operator to combine information
  • must analysis
    • describe information that must definitely be true, lower approximation
    • using intersection operator to combine information

Examples

  • Initialized variables
  • Constant propagation
  • Interval analysis

Widening

  • widening using a function \(w: L^n \rightarrow L^n\)
  • \(w \circ F\)

Narrowing

  • applying F after widening
  • \(\forall i : fix \sqsubseteq F^{i+1}(fixw) \sqsubseteq F^i(fixw) \sqsubseteq fixw\)
  • Not guaranteed to converge - heuristics must determine how many times to apply narrowing

  1. ?