Advanced Scientific Computing

Guest Lectures


Numerical algorithms and errors

Sources and types of errors

  • Errors in the problem
    • mathematical model
    • input data
  • Approximation errors
    • Discretization
    • Convergence
    • Roundoff

Example: the trade-off between discretization and roundoff error when computing a derivative numerically (see the sketch below).
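A minimal sketch of this trade-off (assuming NumPy; the test function \(\sin\) and the point \(x=1\) are my own choices): the forward-difference error first shrinks like \(O(h)\) and then grows again as roundoff \(\sim 10^{-16}/h\) takes over.

```python
import numpy as np

# Forward difference (sin(x+h) - sin(x)) / h vs. the exact derivative cos(x).
# Discretization error ~ O(h); roundoff error ~ O(eps / h); the total error is
# smallest near h ~ sqrt(eps) ~ 1e-8.
x = 1.0
for h in 10.0 ** -np.arange(1, 16):
    approx = (np.sin(x + h) - np.sin(x)) / h
    print(f"h = {h:.0e}   error = {abs(approx - np.cos(x)):.2e}")
```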

Algorithm Properties

  • Accuracy
    • contrast with precision
  • Efficiency: hard to measure
    • algo, code
    • cpu, memory, disk, network
    • performance vs complexity
  • Robustness

Problem conditioning and stability

  • ill-conditioned: a small perturbation -> a large difference in the result
  • stability: the algorithm does not amplify errors beyond what the conditioning of the problem warrants
  • well posed: a solution exists, is unique, and changes continuously with the initial data

Condition number

How much the output changes for a small change in the input: this measures how sensitive a function is to changes or errors in the input, and how much error in the output results from an error in the input.

  • for a (continuously differentiable) nonlinear function: \(\kappa(f) = \left|\frac{x f'(x)}{f(x)}\right|\)
  • As a rule of thumb, if the condition number is \(\kappa(A) = 10^k\), then you may lose up to \(k\) digits of accuracy, on top of what is already lost to roundoff by the numerical method itself.
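A small illustration of this rule of thumb (NumPy; the nearly singular 2×2 matrix is a made-up example): a tiny perturbation of \(b\) changes the computed solution by roughly \(\kappa(A)\) times the relative input change.

```python
import numpy as np

# A nearly singular (ill-conditioned) matrix -- values chosen only for illustration.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
b = np.array([2.0, 2.0001])
x = np.linalg.solve(A, b)                 # exact solution is [1, 1]

# Perturb b slightly and compare the relative changes.
db = np.array([0.0, 1e-5])
x_pert = np.linalg.solve(A, b + db)

print("kappa(A)               =", np.linalg.cond(A))
print("relative input change  =", np.linalg.norm(db) / np.linalg.norm(b))
print("relative output change =", np.linalg.norm(x_pert - x) / np.linalg.norm(x))
```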

Roundoff Errors

  • inevitable
  • every elementary operation (+,*) introduces a small, random relative error, up to about \(10^{-16}\)
  • can sometimes be ignored, but may accumulate into significant error

\[fl(x) = \begin{cases} 1.d_1d_2 \cdots d_t \times 2^{e} & d_{t+1}=0 \\ \mbox{round to nearest even} & \mbox{otherwise} \end{cases}\] Relative floating-point error (for rounding; truncation doubles the bound): \[\frac{|fl(x) - x|}{|x|} \leq \frac12 2^{-t}\]

  • rounding methods
    • truncate
    • round up
    • round down
    • round to the nearest value, ties away from zero
    • round to the nearest value, ties to even (most used)
  • Floating point number representation: number of significant digits
  • spacing not constant(absolute), constant(relative)
  • at 0, subnormal numbers
  • Quadruple precision: s=1, exp=15, frac=112 (p=113 with the implicit bit)
  • Double precision:
    • s=1,exp=11,frac=52(1 implicit bit, p=53)
    • rounding unit : \(\eta = 1/2 b^{1-p} = 2^{-53} \approx 1.1\times 10^{-16}\)
  • Single precision:
    • s=1,exp=8,frac=23
    • rounding unit: \(\eta = 2^{-24}\)
  • exponent fields of all zeros and all ones are reserved
    • bias: \(0111\ldots1\) in binary (e.g. 1023 for double precision)
    • signed zero and subnormals
    • inf and NaNs
  • exact rounding uses guard digits to ensure that relative error in each elementary arithmetic operation is bounded by \(\eta\)
    • guard digits : using one more digits in computation
    • without a guard digit the relative error can be as large as \(b-1\); with one it is bounded by 2× machine precision
  • roundoff error accumulation
    • linear in most cases, each step contributing roughly at the level of the rounding unit
    • dangerous cases: \(x+y\) when \(x \gg y\); \(x/y\) when \(1 \gg y\)
    • the relative error is magnified by cancellation (when two almost equal numbers are subtracted from each other)
  • IEEE: exact rounding, as if the result were computed exactly and then rounded
  • overflow & underflow
    • \(\sqrt{a^2+b^2} = |a|\sqrt{1+(b/a)^2}\) for \(|a|>|b|\) (see the sketch below)
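A short sketch of both effects (NumPy; the sample values are my own): catastrophic cancellation when subtracting nearly equal numbers, and the rescaled form of \(\sqrt{a^2+b^2}\) that avoids overflow.

```python
import numpy as np

# Cancellation: (1 + eps) - 1 loses most of the significant digits of eps.
eps = 1e-12
print((1.0 + eps) - 1.0)                 # not exactly 1e-12

# Overflow-safe sqrt(a^2 + b^2): factor out the larger magnitude first.
def safe_hypot(a, b):
    a, b = abs(a), abs(b)
    if a < b:
        a, b = b, a
    if a == 0.0:
        return 0.0
    return a * np.sqrt(1.0 + (b / a) ** 2)

a, b = np.float64(1e200), np.float64(1e199)
print(np.sqrt(a**2 + b**2))              # a**2 overflows: inf
print(safe_hypot(a, b))                  # ~1.005e200, no overflow
```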

general floating point system \((\beta, t, L ,U)\):

  • \(\beta\): base
  • t: precision
  • L: lower bound
  • U: upper bound
Relative cost of arithmetic operations:

        Int     Long    Single  Double
  +/-   1       1       4       4
  x     3       5       4       4
  /     14~40   23~87   16      20

abelian group: closure, associativity, identity, inverse, commutativity

Solving equations

  • Bisection
  • secant
  • Newton’s

Desired properties

  • efficient
  • robust
  • minimal amount of additional data
  • requires f to satisfy only minimal smoothness properties
  • generalizes easily and naturally to many equations in many unknowns

Fixed point iteration

  • From \(f(x)=0\) to \(x=g(x)\)
  • solve for \(x = g(x)\)
  • if \(x_{k+1}=g(x_k)\) converges, the limit is a solution; this is the basis of several solution methods (see the sketch below)
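A minimal fixed-point iteration sketch (the example equation \(x=\cos x\) is my own choice; \(|g'(x)| = |\sin x| < 1\) near the root, so the iteration converges):

```python
import math

def fixed_point(g, x0, tol=1e-12, max_iter=100):
    """Iterate x_{k+1} = g(x_k) until successive iterates are close."""
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Solve x = cos(x).
print(fixed_point(math.cos, 1.0))    # ~0.7390851332
```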

Bisection

  • Simple & Safe
  • slow
  • hard to generalize to high dimension

Intermediate value theorem: there will be at least one crossing (root) in \([a,b]\) when \[f(a)\cdot f(b)<0\] Bisect this bracket until the function value is close to zero.
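A bisection sketch under that sign-change assumption (the example \(f(x)=x^2-2\) is my own):

```python
def bisect(f, a, b, tol=1e-12):
    """Assumes f(a) * f(b) < 0; halves the bracket until it is shorter than tol."""
    fa = f(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        fm = f(m)
        if fa * fm <= 0:
            b = m                     # root lies in [a, m]
        else:
            a, fa = m, fm             # root lies in [m, b]
    return 0.5 * (a + b)

print(bisect(lambda x: x**2 - 2.0, 0.0, 2.0))   # ~1.41421356
```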

\[\exists \rho < 1 : |g'(x)| \leq \rho, \quad a<x<b\] \[|x_{k+1} - x^*| \leq \rho |x_k - x^*|\]

  • Brouwer fixed point theorem : \([a,b] \rightarrow [a,b]\)
  • Contraction mapping principle : \(|g'(x)| \leq \rho < 1\)

rate of convergence: \[e_{k+1} \leq \frac{1}{2} e_k, e_k = |x_k - x^*|\]

Rate of Convergence

  • linearly: \(e_{k+1} \leq \rho e_k\)
  • quadratically: \(e_{k+1} \leq \rho e_k^2\)
  • superlinear: linear, with \(e_{k+1} \leq \rho_k e_k\) where \(\rho_k \rightarrow 0\) for \(k > n\), some \(n \in \mathbb{N}\)
  • \(\frac{e_{k+1}}{e_k} = \mu\), \(\mu \rightarrow 0\) superlinear, \(\mu \rightarrow 1\) sublinear

Newton’s Method

  1. function & initial guess
  2. \(x_{k+1}= x_{k} - \frac{f(x_k)}{f'(x_k)}\)
  3. repeat until \(|x_{k}- x^*|\), \(|x_k - x_{k-1}|\), or \(|f(x_k)|\) is below a tolerance (see the sketch below)
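A Newton iteration sketch following these steps, stopping on a small step or a small residual (the example \(f(x)=x^2-2\) is my own):

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """x_{k+1} = x_k - f(x_k)/f'(x_k), stopping on a small step or small residual."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        step = fx / fprime(x)
        x_new = x - step
        if abs(step) < tol or abs(fx) < tol:
            return x_new
        x = x_new
    return x

print(newton(lambda x: x**2 - 2.0, lambda x: 2.0 * x, 1.0))   # ~1.4142135623730951
```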

Convergence of Newton’s Method:

  • First derivative exists
  • Taylor expansion at the root, subtract \(x^*\) on both sides
  • \(e_{k+1} \leq C e_k^2\)
  • Multiple root: linear rate

Secant Method

  • Avoid supplying \(f'\)
  • Secant: line joining two point on a function
  • \(x_{k+1} = x_{k} - \frac{f(x_k)(x_k-x_{k-1})}{f(x_k) - f(x_{k-1})}\)
  • cannot solve \(f(x) = x^{1/3}\)
  • convergence rate(superlinearly): \(e_{k+1} \leq C e_k^\phi, \phi = \frac{1+\sqrt{5}}{2}\)
  • proof:
    • subtract \(x^*\) from both sides
    • \(e_{n+1} \propto e_n e_{n-1}\); assume \(e_n \propto e_{n-1}^p\)
    • solve for \(p\) to get \(\phi\)
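A secant-method sketch (same made-up example as before), replacing \(f'\) by the slope through the last two iterates:

```python
def secant(f, x0, x1, tol=1e-12, max_iter=100):
    """Secant update: Newton's method with f' replaced by a finite-difference slope."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if f1 == f0:                  # flat secant line: cannot continue
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol:
            return x2
        x0, f0, x1, f1 = x1, f1, x2, f(x2)
    return x1

print(secant(lambda x: x**2 - 2.0, 1.0, 2.0))    # ~1.4142135623730951
```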

Errors

  • the convergence (truncation) error should stay above the roundoff error level
  • the termination tolerance should be greater than the rounding unit
  • well-conditioned problems (as opposed to ill-conditioned ones, e.g. a multiple root, where \(f\) is flat near the root)

Linear algebra refresh & more


Linear least-squares(L2-norm)

\[\min_x||b-Ax||_2\]

where A is \(m\times n, m > n\) and A has full column rank

Equivalent: \[\psi(x) = \frac12 ||b-Ax||^2 = \frac12 \sum_{i=1}^m(b_i - \sum_{j=1}^n a_{ij} x_j)^2\]

Necessary condition: \(\frac{\partial}{\partial x_k} \psi(x) = 0 \rightarrow A^TAx=A^Tb\). These are the normal equations; "normal" because the residual \(b-Ax\) is normal (orthogonal) to the range of \(A\). From minimization to solving linear equations. Note also that \(A^TA\) is positive definite.

\(x=(A^TA)^{-1}A^Tb\): The matrix multiplying b is the pseudo-inverse of A.

Least-squares problem: the least-squares problem has a unique solution, which satisfies the normal equations (see the sketch below).
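A minimal least-squares sketch (NumPy; the noisy straight-line data are made up), solving the same problem via the normal equations and via np.linalg.lstsq:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 50
t = np.linspace(0.0, 1.0, m)
A = np.column_stack([np.ones(m), t])           # model: b ~ c0 + c1 * t
b = 1.0 + 2.0 * t + 0.05 * rng.standard_normal(m)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)   # normal equations
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_normal, x_lstsq)                       # both ~[1, 2]
```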

Interpolation vs Data fitting

Data fitting:

  • The long term trend
  • Discard errors
  • For ease of manipulation
  • Take \(n<m\)

Data fitting in other norms: \(L_1\) and \(L_\infty\) lead to linear programming problems.

solution

  • Use Cholesky decomposition on the normal equations
    • fast, intuitive, less robust
  • QR: the standard approach, more expensive, more robust
  • SVD: significantly more expensive and robust

Normal equation

  • Cholesky decomposition \(B=A^TA=GG^T\)
  • \(Gz=y\), where \(y=A^Tb\)
  • \(G^Tx=z\)

  • \(\kappa(B) = \kappa(A)^2\)
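A quick numerical check of this squaring effect (NumPy; the Vandermonde-style matrix is an arbitrary, moderately ill-conditioned example):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 20)
A = np.vander(t, 8)                  # 20 x 8 matrix with full column rank
B = A.T @ A                          # normal-equations matrix
print(np.linalg.cond(A))             # kappa(A)
print(np.linalg.cond(B))             # ~kappa(A)**2
```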

QR for LS

Full-version QR :

  1. Decompose \[A=Q \begin{bmatrix} R \\ 0 \end{bmatrix}\]
  2. Compose \[\begin{bmatrix} c \\ d \end{bmatrix} = Q^T b\]
  3. Solve upper triangular system \(Rx=c\)

Q is orthogonal , R is upper triangular \[||b-Ax|| = ||Q^Tb-\begin{bmatrix} R \\ 0 \end{bmatrix} x||\]

Economy-version QR:

  1. Decompose \[A=Q_{m\times n}R\]
  2. \(c=Q^Tb\)
  3. solve \(Rx=c\)
  QR can be used to:
  • solve \(n \times n\) linear systems
  • solve LS problems
  • find eigenvalues/eigenvectors

QR for square matrix: \(A=QR\), Q is orthogonal, R is upper triangular

  • uniqueness
  • existence
    • \(A^TA\) is symmetric positive definite, so it can be decomposed as \(LL^T\)
    • \(Q=A(L^T)^{-1}\) is orthogonal
    • \(A=QL^T\), i.e. \(R=L^T\)
  • QR for solving linear equations: the condition number is not squared (unlike the normal equations)

Compute QR decomp :

  • Build \(Q\) (and \(R\)) column by column until \(Q\) is orthogonal: Gram–Schmidt
  • Apply a sequence of orthogonal transformation to A till A is upper triangular: Householder-reflector

Orthogonal transformation:\(T: V \rightarrow V\) that preserves inner product \(<u,v>\), combinations of rotations & reflections

inner product

  • \(<\alpha,\beta> = \alpha \beta\)
  • \(<u,v> = u \cdot v = ||u|| \; ||v|| \cos \theta\)

Householder reflector / transformation: \(Q^TA=R\), proceeding column by column to zero out the elements under the diagonal

  • \(Q_k\) is orthogonal
  • \(Q_k\) zero out elements below diagonal
  • \(Q_k\) preserves previously introduced zeros
  • \[Q_k = \begin{bmatrix} I_{(k-1)\times (k-1)} & 0 \\ 0 & F \end{bmatrix}\]
  • F : householder reflector, \(Fx=||x||e_1\)
  • \(F=I-2uu^T, u = \frac{v}{||v||}\)
  • For numerically stable algorithms you want to choose the reflection that moves \(x\) the furthest, \(v_-\)
  • \(x=A(k{:}m,k), \; v_k = x+\mathrm{sign}(x_1)\,\|x\|\,e_1\)
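A compact Householder-QR sketch following the bullets above (NumPy; my own minimal implementation, not the lecture's code): each step applies \(F=I-2uu^T\) to the trailing block and accumulates \(Q\).

```python
import numpy as np

def householder_qr(A):
    """A = Q R via Householder reflectors F = I - 2 u u^T, applied column by column."""
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.eye(m)
    for k in range(n):
        x = A[k:, k]
        sign = 1.0 if x[0] >= 0.0 else -1.0
        v = x.copy()
        v[0] += sign * np.linalg.norm(x)                # v = x + sign(x_1) ||x|| e_1
        if np.linalg.norm(v) == 0.0:
            continue
        u = v / np.linalg.norm(v)
        A[k:, k:] -= 2.0 * np.outer(u, u @ A[k:, k:])   # zero out below the diagonal
        Q[:, k:]  -= 2.0 * np.outer(Q[:, k:] @ u, u)    # accumulate Q
    return Q, np.triu(A)

A = np.random.default_rng(1).standard_normal((5, 3))
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(5)))
```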

Iterative methods

  • fill-in effect
  • don’t need exact solution
  • warm start: “initial guess” to improve performance
  • sometimes only the product \(Ax\) is available (not \(A\) itself)

Basic idea

  • given \(Ax=b\) where A is square and invertible
  • find \(B\) and vector \(c\) such that \(I-B\) is invertible
  • The solution is the same as \(x=Bx+c\)
  • Solve it as fixed point methods
    • stationary: \(B\) and \(c\) are fixed, not depending on the current iterate
  • under certain conditions it converges to the solution
    • \(\lim_{k \rightarrow \infty} x_k=x\) for any starting point
    • spectral radius : \(\rho(B) = \max_i |\lambda_i| < 1\)

Splitting of A:

  • \(A=M-N\)
  • A is invertible
  • M is invertible and easy to invert
  • \(x=M^{-1}Nx+M^{-1}b\)
  • \(B=M^{-1}N;\; c=M^{-1}b\)

Jacobi method / Gauss–Seidel

M,N disjoint

  1. A = D - E - F
    • D is the diagonal
    • \(-E\) is the strictly lower triangular part
    • \(-F\) is the strictly upper triangular part
  2. Jacobi: M=D, N=E+F
    • \(B=M^{-1}N=I - D^{-1} A\)
    • \(c = M^{-1} b = D^{-1}b\)
  3. GS: M=D-E, N=F
    • \(B=M^{-1}N=(D-E)^{-1}F\)
    • \(c=M^{-1}b=(D-E)^{-1}b\)
    • GS is the relaxation method with coefficient \(w=1\); its iteration matrix is denoted \(L_1\)

Under certain cases : GS converges faster than Jacobi
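A minimal sketch of both iterations (NumPy; the small diagonally dominant matrix is my own test case), using the splitting \(A = D - E - F\):

```python
import numpy as np

def jacobi(A, b, x0, iters=50):
    D = np.diag(np.diag(A))
    N = D - A                                  # N = E + F
    B = np.linalg.solve(D, N)                  # B = D^{-1} N
    c = np.linalg.solve(D, b)                  # c = D^{-1} b
    x = x0.copy()
    for _ in range(iters):
        x = B @ x + c
    return x

def gauss_seidel(A, b, x0, iters=50):
    M = np.tril(A)                             # M = D - E
    N = M - A                                  # N = F
    x = x0.copy()
    for _ in range(iters):
        x = np.linalg.solve(M, N @ x + b)      # a triangular solve in practice
    return x

A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x0 = np.zeros(3)
print(jacobi(A, b, x0), gauss_seidel(A, b, x0), np.linalg.solve(A, b))
```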

Relaxation method

Successive over-relaxation

Weighted version of GS methods

  • \(M = \frac{D}{w} -E, N=\frac{1-w}{w}D+F\)
  • \(L_w = B=M^{-1}N = (D-wE)^{-1}[(1-w)D + w F]\)
  • \(c=M^{-1}b = w(D-wE)^{-1}b\)
  • \(w \in (1,2)\): over-relaxation, usually better
  • \(w \in (0,1)\): under-relaxation, not so useful for speeding up convergence

Convergence

  • convergence requires \(w \in (0,2)\) (since \(\rho(L_w) \geq |w-1|\))
  • \(\exists w_{opt} \rightarrow \min\{ \rho \}\)

  • \(e_k = B \cdot e_{k-1} = B^k e_0\)
  • \(||e_k|| \leq ||B^k||\, ||e_0|| \leq ||B||^k ||e_0||\)
  • \(||B|| < 1 \rightarrow \rho(B) < 1\) (see note 1)
  • rate of convergence: \(-\log_{10} \rho(B)\)

Ostrowski–Reich Theorem: if \(A = D - E - F\) is Hermitian positive definite and \(0<w<2\), then the relaxation method converges.

Termination Criterion

\[\frac{||x-x_k||}{||x||} \leq \kappa(A) \frac{||r_k||}{||b||}\] -> tolerance

\[||x-x_k|| \leq ||B||\, ||x-x_{k-1}|| \leq ||B|| (||x-x_k||+||x_k-x_{k-1}||)\] \[||e_k|| \leq \frac{||B||}{1-||B||}||x_k-x_{k-1}||\]

  • Both the GS and the Jacobi method converge
  • The GS method converges faster than the Jacobi method (twice as fast, under certain conditions, e.g. a tridiagonal matrix)
    • \(\rho(L_1) = \rho^2(J)\)

if A is tridiagonal and symmetric

  • they all converge
  • \(\exists w_0 = \frac{2}{1+\sqrt{1-\rho^2(J)}} \rightarrow \rho(L_{w_0}) = w_0 -1 \mbox{ is minimum}\)
  • If \(\rho(J)>0 \rightarrow \rho(L_{w_0}) < \rho(L_1) = \rho^2(J) < \rho(J)\)
  • if \(\rho(J) = 0, w_0 = 1 , \rho(L_1)=0\)

Conjugate Gradient Method

  • A is symmetric positive definite
  • solving Ax=b
  • Solution is the minimizer of \(f(x)= \frac12 x^TAx - b^Tx+C\)
  • And iteration has the form \(x_{k+1} = x_k + \alpha_k d_k\)
  • \(f'(x) = \frac12 A^Tx + \frac12 Ax -b = Ax - b\) (since A is symmetric)
  • solution to \(Ax=b\) is a critical point of \(f(x)\).

Method of Steepest Descent(SD)

  • Move in the direction in which the function decreases most rapidly: \(-f'(x_k)\)
  • error: \(e_k = x_k -x\)
  • residual: \(r_k = b-Ax_k = -A e_k\)
  • claim: \(r_k = -f'(x_k)\)
  • till gradient is orthogonal to the direction
  1. \(r_k = b - A x_k\)
  2. \(\alpha_k = \frac{r_k^T r_k}{r_k^T A r_k}\)
  3. \(d^T(b-A(x+\alpha d)) = 0 \Rightarrow d^Tr - \alpha\, d^TAd = 0\)
  4. \(x_{k+1} = x_k + \alpha_k r_k\)
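A steepest-descent sketch for the SPD case (NumPy; the matrix and right-hand side are my own):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
    """Minimize (1/2) x^T A x - b^T x for SPD A; step along the residual r = b - A x."""
    x = x0.copy()
    for _ in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) < tol:
            break
        alpha = (r @ r) / (r @ (A @ r))
        x = x + alpha * r
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])          # SPD example
b = np.array([1.0, 1.0])
print(steepest_descent(A, b, np.zeros(2)), np.linalg.solve(A, b))
```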

Convergence behavior:

  • \(\kappa(A) = ||A||||A^{-1}||\)
  • For SPD \(\kappa(A)= \lambda_{max} / \lambda_{min}\)
  • \(||e_i||_A \leq (\frac{\kappa - 1 }{\kappa +1})^i||e_0||_A\)
  • \(\frac{f_i -f}{f_0 -f} \leq (\frac{\kappa - 1}{\kappa +1})^{2i}\)
  • energy norm : \(||e||_A = \sqrt{e^T A e}\)

Orthogonal Direction

Every step’s direction is orthogonal to that of before.

Conjugate Direction(CD)

  • In each step, move along the current direction exactly the distance needed so that no further correction in that direction is required
  • condition: \(d_i^Te_{i+1}=0 = d_i^T(e_i + \alpha_i d_i)\)
  • \(\alpha_i = - \frac{d_i^T e_i}{d_i^Td_i}\)
  • This can be achieved by making the direction not orthogonal but A-orthogonal(conjugate) to each other.
    • \(d_i^TAd_j = 0\)

Each step length is chosen so that the new residual is orthogonal to the current direction (equivalently, the new error is A-orthogonal to it); the method converges in n steps \[\alpha_i = \frac{d_i^T r_i}{d_i^T A d_i}\]

Gram-Schmidt conjugation: given n linearly independent vectors produce A-orthogonal vectors \(d_0,d_1, \dots, d_{n-1}\)

  • \(d_0 = u_0\)
  • \(d_i = u_i + \sum_{k=0}^{i-1} \beta_{ik} d_k, \quad \beta_{ij} = - \frac{u_i^T A d_j}{d_j^TAd_j}\)

Conjugate Gradient (CG)

A conjugate direction method in which the search directions are constructed by conjugation of the residuals. Page 184.

\(d_0 = r_0\) & \(e_1 \perp_A d_0\)

  • \(D_i = span\{ d_0,d_1, \dots, d_{i-1} \}\)
  • \(e_i = e_0 + D_i \rightarrow \min (||e_i||_A)\)

Convergence behavior: \[||e_i||_A \leq 2 (\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1})^i ||e_0||_A\]
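A standard CG sketch consistent with the description above (NumPy; the 2×2 SPD example is my own): the new direction is the residual made A-orthogonal to the previous direction.

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    x = x0.copy()
    r = b - A @ x                    # residual
    d = r.copy()                     # first direction d_0 = r_0
    rs_old = r @ r
    for _ in range(len(b)):          # at most n steps in exact arithmetic
        Ad = A @ d
        alpha = rs_old / (d @ Ad)
        x = x + alpha * d
        r = r - alpha * Ad
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        d = r + (rs_new / rs_old) * d   # make the new direction A-orthogonal to d
        rs_old = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b, np.zeros(2)), np.linalg.solve(A, b))
```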

Eigenvalues and eigenvectors

Rotation matrices have complex eigenvalues (recommended to remove them before proceeding, e.g. by working with the symmetric \(AA^T\)).

  • Power method: \(x'_i=Ax_{i-1},\; x_i = \frac{x'_i}{\|x'_i\|}\); this can be used to find the largest (in magnitude) eigenvalue and its eigenvector

The inverse matrix has the same eigenvectors, but \(\lambda_i \rightarrow 1/\lambda_i\).

  • Inverse iteration: \(B=A-\alpha I\) gives eigenpairs \((\frac1{\lambda_i - \alpha}, x_i)\) after inversion; iterate by solving \((A-\alpha I)x'_i = x_{i-1}\), with \(\alpha\) close to the eigenvalue of interest (e.g. the spectral radius for the largest one)

Rayleigh quotient: \(\lambda = \frac{v^TAv}{v^Tv}\) (just \(v^TAv\) for normalized \(v\)); the iteration always pushes towards \(\lambda_1\).

  • Rayleigh quotient iteration: solve with \(A-\alpha I\) for \(v_{k+1}\), re-estimate \(\alpha\) from the Rayleigh quotient, and repeat until it no longer changes
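Sketches of the power method and shifted inverse iteration (NumPy; the matrix, shift, and iteration counts are my own choices):

```python
import numpy as np

def power_method(A, iters=200):
    """Largest-magnitude eigenvalue: x <- A x / ||A x||, eigenvalue from the Rayleigh quotient."""
    x = np.random.default_rng(0).standard_normal(A.shape[0])
    for _ in range(iters):
        x = A @ x
        x /= np.linalg.norm(x)
    return x @ A @ x, x              # x has unit length, so this is the Rayleigh quotient

def inverse_iteration(A, shift, iters=50):
    """Eigenvalue closest to the shift: solve (A - shift*I) x' = x each step."""
    n = A.shape[0]
    x = np.random.default_rng(1).standard_normal(n)
    for _ in range(iters):
        x = np.linalg.solve(A - shift * np.eye(n), x)
        x /= np.linalg.norm(x)
    return x @ A @ x, x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
print(power_method(A)[0])            # ~3.618 (largest eigenvalue)
print(inverse_iteration(A, 1.0)[0])  # ~1.382 (eigenvalue closest to 1)
```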

SVD

Zeroing out the smaller singular values in \(\Sigma\) gives a better-conditioned problem and uses less storage.

pseudo-inverse \(A^+\), Regularization, truncated SVD

  • Orthogonal transformation \(Q^TAQ\): doesn't change the eigenvalues
  • QR iteration: when the iterates converge to an upper triangular matrix, its diagonal entries are the eigenvalues

Multigrid Method

Motivation: vertical percolation, "does the liquid go through?"

\(p^* \approx 0.618\) is the fixed point for the probability that water goes through (the pattern occurring) under the coarsening function \(R(p) = 2p^2(1-p)^2 + 4p^3(1-p)+p^4\).

This number gives "coarsening invariance": coarsening the grid does not change the probability that it percolates.
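Indeed, setting \(R(p)=p\) in the coarsening function above recovers this fixed point:

\[R(p) = 2p^2(1-p)^2 + 4p^3(1-p) + p^4 = 2p^2 - p^4,\]
\[R(p) = p \;\Rightarrow\; p\,(p-1)(p^2+p-1) = 0 \;\Rightarrow\; p^* = \frac{\sqrt{5}-1}{2} \approx 0.618.\]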

Finite-difference discretization: \(-u'' = f\); the coarsening step \(E'_i = R(E_i) = E_{i-1} + 2 E_i + E_{i+1}\) reduces it to a smaller system that gives the same answer -> "coarsening invariant"

Essential Aspect: A problem can contain many components with diff scales, use of many scales of discretization to represents all components of this solution.

V-cycle

This can have multiple level, call v-cycle recursively

  1. Pre-smoothing: reduce the high-frequency components of the error \(e=x^*-x\) and residual \(r=b-A_1x\) by smoothing \(x:=S_1(x,b)\)
    • Pre-smoothers: relaxed Richardson, relaxed Jacobi, relaxed Gauss–Seidel
  2. Restriction: Now the error/residual can be well approximated on the coarse grid as \(r_0 = Rr\), by using the restriction operator \(R: V_1' \rightarrow V_0'\).
    • Sampling, averaging, 1-2-1 weighting: a fraction of each neighboring grid value contributes to the center
  3. Solve the residual equation \(A_0e=r_0\) on the coarse grid
  4. Prolongation: Interpolate the error e onto the fine grid, by using prolongation operator \(P:U_0 \rightarrow U_1\)
    • A fraction of each coarse node contributes to the neighboring fine nodes
  5. Improve the current solution \(x:=x + Pe\)
  6. Post-smoothing: Smooth again:\(x:= S_2(x,b)\)
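A runnable sketch of one V-cycle for the 1D model problem \(-u''=f\) (my own minimal choices, not the lecture's code: weighted-Jacobi smoothing, full-weighting restriction, linear-interpolation prolongation, direct solve on the coarsest grid):

```python
import numpy as np

def apply_A(u, h):
    """Matrix-free 1D operator -u'' on interior points, zero Dirichlet boundaries."""
    Au = 2.0 * u.copy()
    Au[1:] -= u[:-1]
    Au[:-1] -= u[1:]
    return Au / h**2

def v_cycle(u, f, h, pre=3, post=3):
    n = len(u)
    if n <= 3:                                            # coarsest level: solve directly
        A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
        return np.linalg.solve(A, f)
    for _ in range(pre):                                  # 1. pre-smoothing (weighted Jacobi)
        u = u + (2.0 / 3.0) * (h**2 / 2.0) * (f - apply_A(u, h))
    r = f - apply_A(u, h)                                 # residual
    rc = 0.25 * (r[0:-2:2] + 2.0 * r[1:-1:2] + r[2::2])   # 2. restriction (full weighting)
    ec = v_cycle(np.zeros_like(rc), rc, 2.0 * h)          # 3. residual equation on coarse grid
    e = np.zeros(n)                                       # 4. prolongation (linear interpolation)
    e[1::2] = ec
    e[0::2] = 0.5 * (np.concatenate(([0.0], ec)) + np.concatenate((ec, [0.0])))
    u = u + e                                             # 5. correct the current solution
    for _ in range(post):                                 # 6. post-smoothing
        u = u + (2.0 / 3.0) * (h**2 / 2.0) * (f - apply_A(u, h))
    return u

n = 63                                                    # 2^k - 1 interior points
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
f = np.pi**2 * np.sin(np.pi * x)                          # exact solution u = sin(pi x)
u = np.zeros(n)
for _ in range(10):
    u = v_cycle(u, f, h)
print(np.max(np.abs(u - np.sin(np.pi * x))))              # error ~ discretization level
```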

W-cycle

level sequence : 321010121010123

Use the W-cycle when the coarse-grid correction is not accurate enough.

Nonlinear Systems and Optimization

Multivariate Taylor expansion(function result is a vector): \(f(x+p) = f(x) + J(x) p + O(||p||^2)\)

Jacobian matrix: \[J(x) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & \cdots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \frac{\partial f_n}{\partial x_2} & \cdots & \frac{\partial f_n}{\partial x_n} \\ \end{bmatrix}\]

Newton’s method for nonlinear systems

  1. for k = 1, 2, … until convergence
  2. solve \(J(x_k) p = - f(x_k)\)
  3. \(x_{k+1} = x_k + p\)
  • evaluating the Jacobian might be non-trivial
  • the Jacobian can be singular
  • use random starting points to find all solutions
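A direct translation of these steps (NumPy; the 2×2 example system and the starting point are my own):

```python
import numpy as np

def newton_system(F, J, x0, tol=1e-12, max_iter=50):
    """Solve F(x) = 0: each step solves J(x_k) p = -F(x_k), then x_{k+1} = x_k + p."""
    x = x0.astype(float).copy()
    for _ in range(max_iter):
        p = np.linalg.solve(J(x), -F(x))
        x = x + p
        if np.linalg.norm(p) < tol:
            break
    return x

# Example system: x^2 + y^2 = 1 and y = x  ->  root at (1/sqrt(2), 1/sqrt(2)).
F = lambda v: np.array([v[0]**2 + v[1]**2 - 1.0, v[1] - v[0]])
J = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]], [-1.0, 1.0]])
print(newton_system(F, J, np.array([1.0, 0.5])))
```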

Unconstrained optimization

For scalar function f(x), on n dimensions

gradient \(\nabla \phi(x)\) and Hessian matrix \(\nabla^2 \phi(x)\)

Taylor expansion in several variables(function value is scalar):

\[\phi(x+p) = \phi(x) + \nabla \phi(x)^T p + \frac12 p^T H p + O(||p||^3)\]

Critical points where gradient is zero:

  • local minimum(maximum)
  • saddle points: stationary points that are not local extrema
  • necessary condition: the Hessian matrix is positive (negative) semi-definite
  • sufficient condition: the Hessian is positive (negative) definite

Gradient based methods

\(x_{k+1} = x_k + \alpha_k p_k\) \(p_k = - B_k^{-1} \nabla \phi(x_k)\)

  • Newton’s method: \(B=\nabla^2 \phi(x_k)\)
    • converges quadratically, retains Hessian sparsity
    • requires hessian and the evaluation of it.
    • solving a linear system at each iteration
    • no control over global convergence
    • a weak line search is sometimes used to find a good \(\alpha\)
  • Secant method (quasi-newton): \(B_{k+1} p_k = y_k, y_k = \nabla \phi (x_{k+1}) - \nabla \phi (x_k)\)
    • B is approximation of Hessian
  • Gradient descent
    • \(B=I\)
  • Newton + Gradient descent
    • \(B_k = \nabla^2 \phi(x_k) + \mu_k I\)
  • BFGS method
    • directly compute \(B_k^{-1}\)
    • Rank-2 update
    • superlinear convergence
  • Inexact Newton, Newton CG
    • Preconditioned Conjugate Gradient (\(P^{-1} \approx A^{-1}\))

Nonlinear least square

\[\min_x \phi(x) = \frac12 ||g(x) - b||^2\]

\[\nabla \phi(x) = A(x)^T(g(x) - b)\] where \(A(x)\) is the Jacobian matrix of \(g\)

  • Gauss-Newton method
    • \(\nabla^2 \phi(x) = A(x)^T A(x) + L(x)\)
    • \(L_{ij} = \sum_l \frac{\partial^2 g_l}{\partial x_i \partial x_j}(g_l-b_l)\)
    • drop L and \(A^TAp_k = - \nabla \phi(x_k)\)
    • \(x_{k+1} = x_k + p_k\)
    • Faster when L is small

Constrained problem

\[\Omega = \{ x \in R^n | c_i(x)=0, i \in \epsilon, c_i(x) \geq 0, i \in I \}\]

  • Equality constraint
    • reducing space, algebraic, domain \(\Omega\) has empty interior
  • inequality constraint
    • combinatorial, \(\Omega\) can have non empty interior
  • Active set
    • \(A(x) = \epsilon \cup \{i \in I \mid c_i(x)=0\}\)

At the optimum \(x^*\), \(\nabla \phi (x^*) \propto \nabla c(x^*)\): the gradients are aligned, and the constant of proportionality \(\lambda\) cannot be negative (the sign restriction depends on whether a minimum or a maximum is sought).

  • Constraint qualification: let \(A^T_x\) be the matrix whose columns are the gradients of all active constraints.
    • It must have full column rank
  • Quadratic programming with equality constraint
    • \(\min \phi (x) = \frac12 x^THx - d^Tx\) s.t. \(Ax=b\)
    • A has full row rank
    • Lagrangian: \(L(x,\lambda) = \phi(x) - \sum \lambda_i c_i(x)\)
    • KKT condition:
      • \(\nabla_x L(x^*,\lambda^*) = 0\)
      • \(c_i(x^*) = 0 , i \in \epsilon\)
      • \(c_i(x^*) \geq 0\)
      • \(\lambda_i \geq 0\)
      • \(\lambda_i^* c_i(x^*) = 0\)
      • \(\begin{bmatrix} H & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} d \\ b \end{bmatrix}\)
    • Adding extra terms so that the solution of the new system approximates that of the old one
  • Linear programming & its dual

Polynomial interpolation

General approximation and interpolation

  • Data fitting: a reasonable function that fits the data
    • sometimes it might make sense to interpolate (if there is no noise and we want to be precise)
  • Approximating function

  • Interpolation : have some freedom to choose \(x_i\) cleverly(use the most expressive data points before sampling)
  • may be able to consider the global interpolation error
  • prediction & manipulation

  • interpolating Data
  • interpolating function
  • linear combination of basis functions
    • polynomial -> polynomial interpolation
    • \(A\lambda = y\) A - matrix of basis function, \(\lambda\) coefficients, \(y\) observed value
    • constructing and evaluating
  • Basis function(polynomial interpolation):
    • Monomial basis : gives a Vandermonde matrix (non-singular) simple
    • Lagrange basis \(\phi_j = \Pi_{i,i\neq j} \frac{x-x_i}{x_j - x_i}\) most stable
      • \(L_{ii} = 1, L_{ij}=0\)
      • A is Identity
    • Newton Polynomial Basis \(\phi_j = \Pi_{i,i\leq j-1} (x-x_i)\) adaptive
  • interpolation has to pass through all the data points (different from regression)
  • there is always a unique interpolating polynomial of degree \(\leq n\)

Newton's polynomial basis: divided-difference form; each coefficient depends only on the corresponding subset of data points

\[f[x_i]=f(x_i)\] \[f[x_i,...,x_j]=\frac{f[x_{i+1}, ...,x_j] - f[x_i,...,x_{j-1}]}{x_j -x_i}\] \[f[z_0,...,z_k] = \frac{f^{(k)}(\eta)}{k!}\]

  • Existence of an adaptive interpolant

Error estimate (\(\psi_n(x) = \prod_{i=0}^{n}(x-x_i)\) is the Newton basis function; via the mean value theorem):

\[f(x) - p_n(x) = \frac{f^{(n+1)}(\eta)}{(n+1)!} \psi_n (x)\]
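A small divided-difference sketch (NumPy; the \(\sin\) data points are my own): build the Newton-form coefficients \(f[x_0,\dots,x_j]\) and evaluate \(p_n\) by nested multiplication.

```python
import numpy as np

def divided_differences(x, y):
    """Coefficients f[x_0], f[x_0,x_1], ..., f[x_0,...,x_n] of the Newton form."""
    c = y.astype(float).copy()
    n = len(x)
    for j in range(1, n):
        # Level j: c[i] <- (c[i] - c[i-1]) / (x[i] - x[i-j]) for i >= j
        c[j:] = (c[j:] - c[j - 1:-1]) / (x[j:] - x[:n - j])
    return c

def newton_eval(x_nodes, c, t):
    """Evaluate p(t) = c_0 + c_1 (t-x_0) + c_2 (t-x_0)(t-x_1) + ... (nested form)."""
    p = c[-1]
    for k in range(len(c) - 2, -1, -1):
        p = p * (t - x_nodes[k]) + c[k]
    return p

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.sin(x)
c = divided_differences(x, y)
print(newton_eval(x, c, 1.5), np.sin(1.5))    # interpolant vs. true value
```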

Chebyshev interpolation

Choose the data abscissae so as to minimize the quantity \(\max |\psi|\)

Chebyshev points on [-1,1] : \[x_i = cos(\frac{2i+1}{2(n+1)} \pi), i=0,...,n\]

Interpolation error using Chebyshev points on \([-1,1]\): \(\max|\psi_n| = 2^{-n}\)

Interpolating also derivative values

Unique osculating polynomial:

\[p^{(k)}_n(t_i) = f^{(k)}(t_i)\]

Hermite cubic

\((t_0,f(t_0)),(t_0,f'(t_0)),(t_0,f''(t_0)),\dots\): solve for the coefficients (Taylor expansion)

Newton interpolating form

\[p_n(x) = \sum_j f[x_0,...,x_j] \prod_i (x-x_i)\] \[ f[x_k,...,x_j]=\begin{cases} \frac{f[x_{k+1},...,x_j]-f[x_k,...,x_{j-1}]}{x_j-x_k}, & x_k \neq x_j, \\ \frac{f^{(j-k)}(x_k)}{(j-k)!}, & x_k = x_j, \end{cases}\]

Piecewise polynomial interpolation

  • a single polynomial would need a really high degree
  • newly introduced points cause global recomputation
  • parametric curve
  • surface interpolation

Divide the interval into subintervals, with an interpolant on each subinterval satisfying the interpolation conditions and global smoothness; \(h=\max(t_i - t_{i-1})\), and smaller \(h\) gives better interpolation.

  • how the global smoothness is achieved

  • Piecewise linear
  • Piecewise constant
  • the error is maximized at the midpoint of each subinterval
  • \(\frac{(t_i-t_{i-1})^2}{8} |f''(\xi)|\)

  • Piecewise Cubic
  • preferred over quadratic because of the smoothness conditions

  • Piecewise cubic hermite interpolation
  • where also values of the first derivative are given

Boundary condition

  • Free boundary, natural spline
  • clamped boundary, complete spline
  • Not-a-knot: third derivative continuous.
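For instance, with SciPy's CubicSpline (assuming SciPy is available; the \(\sin\) data are my own) the three boundary conditions can be compared directly; note that 'clamped' here fixes the end slopes to zero, which does not match \(\sin\):

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.linspace(0.0, 2.0 * np.pi, 9)
y = np.sin(x)
t = np.linspace(0.0, 2.0 * np.pi, 200)

for bc in ["natural", "clamped", "not-a-knot"]:
    s = CubicSpline(x, y, bc_type=bc)           # spline with the given boundary condition
    print(bc, np.max(np.abs(s(t) - np.sin(t))))  # max error against the true function
```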

Parametric interpolation

Set up an intermediate variable and interpolate on both dimensions

Bezier polynomial

Multidimensional interpolation

\[V= \sum c_i \phi_i\]

bilinear interpolation

radial basis function

Approximation

  • approximation error: integral over the domain
  • function norms: \(L_1, L_2, L_\infty\)
  • Best least-square approximation(l2-norm)
    • \(v=\sum_j c_j \phi_j\)
    • \(\sum_j c_j \int \phi_j \phi_k = \int f \phi_k\)
    • \(Bx=b\), B is symmetric positive definite
    • \(B_{j.k} = \int \phi_j \phi_k\)
    • \(b_j = \int f \phi_j\)
    • Bad B matrix: monomial basis gives Hilbert matrix
    • Good B matrix: orthonormal basis functions, \(B_{jk} = 0\) for \(j \neq k\), 1 otherwise
    • scale and translate: \(t=\frac12[(b-a)x + (a+b)]\) or \(x = \frac{2t-a-b}{b-a}\)
    • Legendre polynomials: Gram-Schmidt when \(w =1\)
      • \(\phi_0(x)=1\)
      • \(\phi_1(x)=x\)
      • \(\phi_{j+1}(x) = \frac{2j+1}{j+1}x\phi_j(x) - \frac{j}{j+1}\phi_{j-1}(x)\)
      • diag: \(\frac{2}{2j+1}\)
      • zeros: Gauss points
    • trigonometric polynomial
      • \(\phi_0(x) = \frac1{\sqrt{2\pi}}, \phi_{2i-1}(x)=\frac1{\sqrt{\pi}}\sin(ix),\phi_{2i}(x)=\frac1{\sqrt{\pi}}\cos(ix)\)
      • diag: 1 orthonormal
  • discrete least-squares algorithm: \(B=A^TA, y=A^Tb\)

  • Weighted least-square problem
    • Gram-Schmidt process
      • \(\phi_0(x)=1\)
      • \(\phi_1(x)=x-\beta_1\)
      • \(\phi_j(x)=(x-\beta_j)\phi_{j-1}(x) - \gamma_j \phi_{j-2}(x)\)
      • \(\beta_j=\frac{\int_a^bxw(x)[\phi_{j-1}(x)]^2dx}{\int_a^bw(x)[\phi_{j-1}(x)]^2dx}\)
      • \(\gamma_j=\frac{\int_a^bxw(x)\phi_{j-1}(x)\phi_{j-2}(x)dx}{\int_a^bw(x)[\phi_{j-2}(x)]^2dx}\)
    • Chebyshev polynomials: \(w(x)=\frac1{\sqrt{1-x^2}}\)
    • Laguerre polynomials: \(w=e^{-x}\), on \((0,\infty)\)
    • Hermite polynomials: \(w=e^{-x^2}\), on \((-\infty,\infty)\)

Chebyshev min-max property

  • \(\{x_k\}\): Chebyshev points
  • \(T_n(x_k)=0\) (they are the zeros of \(T_n\))
  • \(t_n= \Pi_k(x-x_k)\)
  • this monic polynomial has the smallest maximum magnitude over the interval: \(\max=\frac1{2^{n-1}}\)

Numerical differentiation

Rely on polynomial approximation:

if \(P_n(x) \approx f(x)\), \(P'_n(x) \approx f'(x)\).

Taylor series theorem: approximate \(f(x+h)\) using \(f(x)\) and \(f'(x)\)

Rearrange the Taylor series to approximate the derivative.

2-point (\(O(h)\)), 3-point (\(O(h^2)\)), 5-point (\(O(h^4)\)) formulas: centered and non-centered

Richardson extrapolation: generate higher-order numerical methods from lower-order ones by combining the estimates at \(h\) and \(2h\) to remove the leading error term.

  • not compact
  • reliance on the existence of higher and higher nicely bounded derivatives of f
  • enables easy production of higher order method from lower order ones

Interpolate & differentiation

Numerical integration

Interpolate & integrate, summation of the integrated basis function times coefficients.

  • basic rule
    • trapezoidal: better than Simpson when the integrand is almost periodic (both end contributions close to zero)
    • midpoint
    • Simpson(quadratic)
    • Newton-Cotes formulas
    • works well when the step is small
  • Composite methods
    • use basic rule on subintervals
    • piecewise interpolation & integration
    • the error is the sum of the errors on the subintervals
    • \(rh=b-a\)
  • Precision \(p\): exact for polynomials up to degree \(p\)
  • Accuracy: the leading error term is of order \(q=p+1\)
  • Orthogonal basis function

  • Gaussian quadrature formulas
  • Gauss points:
    • zeroes of the Legendre polynomials \(\phi_{n+1}(x)\)
    • n+1 points achieve precision 2n+1
    • Weights : \(a_j = \frac{2(1-x_j^2)}{[(n+1)\phi_n(x_j)]^2}\)
    • error: \(\frac{2^{2n+3}((n+1)!)^4}{(2n+3)((2n+2)!)^3}f^{(2n+2)}(\eta)\)
  • Adaptive quadrature
    • Adaptive subdivision: adaptive process, locally refined
    • works if the derivative doesn't vary violently
    • error can be approximated by difference in estimate

  1. This is an implication, not an equivalence: \(\rho(B) \leq ||B||\) for any induced norm, but \(||B||\) is not bounded by \(\rho(B)\).