Advanced Scientific Computing

Guest Lectures


Numerical algorithms and errors

Sources and types of errors

  • Errors in the problem
    • mathematical model
    • input data
  • Approximation errors
    • Discretization
    • Convergence
    • Roundoff

Example: the trade-off between discretization and roundoff error when computing a derivative numerically (see the sketch below).
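A minimal sketch of this trade-off (assuming NumPy; the test function \(\sin\) and the point \(x=1\) are my own choices): the forward-difference error first shrinks like \(O(h)\) and then grows again as roundoff \(\sim 10^{-16}/h\) takes over.

```python
import numpy as np

# Forward difference (sin(x+h) - sin(x)) / h vs. the exact derivative cos(x).
# Discretization error ~ O(h); roundoff error ~ O(eps / h); the total error is
# smallest near h ~ sqrt(eps) ~ 1e-8.
x = 1.0
for h in 10.0 ** -np.arange(1, 16):
    approx = (np.sin(x + h) - np.sin(x)) / h
    print(f"h = {h:.0e}   error = {abs(approx - np.cos(x)):.2e}")
```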

Algorithm Properties

  • Accuracy
    • contrast with precision
  • Efficiency: hard to measure
    • algo, code
    • cpu, memory, disk, network
    • performance vs complexity
  • Robustness

Problem conditioning and stability

  • ill-conditioned: a small perturbation -> a large difference in the result
  • stability: the algorithm does not amplify errors beyond what the conditioning of the problem warrants
  • well posed: a solution exists, is unique, and changes continuously with the initial data

Condition number

How much the output changes for a small change in the input: this measures how sensitive a function is to changes or errors in the input, and how much error in the output results from an error in the input.

  • for a (continuously differentiable) nonlinear function: \(\kappa(f) = \left|\frac{x f'(x)}{f(x)}\right|\)
  • As a rule of thumb, if the condition number is \(\kappa(A) = 10^k\), then you may lose up to \(k\) digits of accuracy, on top of what is already lost to roundoff by the numerical method itself.
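A small illustration of this rule of thumb (NumPy; the nearly singular 2×2 matrix is a made-up example): a tiny perturbation of \(b\) changes the computed solution by roughly \(\kappa(A)\) times the relative input change.

```python
import numpy as np

# A nearly singular (ill-conditioned) matrix -- values chosen only for illustration.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
b = np.array([2.0, 2.0001])
x = np.linalg.solve(A, b)                 # exact solution is [1, 1]

# Perturb b slightly and compare the relative changes.
db = np.array([0.0, 1e-5])
x_pert = np.linalg.solve(A, b + db)

print("kappa(A)               =", np.linalg.cond(A))
print("relative input change  =", np.linalg.norm(db) / np.linalg.norm(b))
print("relative output change =", np.linalg.norm(x_pert - x) / np.linalg.norm(x))
```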

Roundoff Errors

  • inevitable
  • every elementary operation (+,*) introduces a small, random relative error, up to about \(10^{-16}\)
  • can sometimes be ignored, but may accumulate into significant error

\[fl(x) = \begin{cases} 1.d_1d_2 \cdots d_t \times 2^{e} & d_{t+1}=0 \\ \mbox{round to nearest even} & \mbox{otherwise} \end{cases}\] Relative floating-point error (for rounding; truncation doubles the bound): \[\frac{|fl(x) - x|}{|x|} \leq \frac12 2^{-t}\]

  • rounding methods
    • truncate
    • round up
    • round down
    • round to the nearest value, ties away from zero
    • round to the nearest value, ties to even (most used)
  • Floating point number representation: number of significant digits
  • spacing not constant(absolute), constant(relative)
  • at 0, subnormal numbers
  • Quadruple precision: s=1, exp=15, frac=112 (p=113 with the implicit bit)
  • Double precision:
    • s=1,exp=11,frac=52(1 implicit bit, p=53)
    • rounding unit : \(\eta = 1/2 b^{1-p} = 2^{-53} \approx 1.1\times 10^{-16}\)
  • Single precision:
    • s=1,exp=8,frac=23
    • rounding unit: \(\eta = 2^{-24}\)
  • exponent fields of all zeros and all ones are reserved
    • bias: \(0111\ldots1\) in binary (e.g. 1023 for double precision)
    • signed zero and subnormals
    • inf and NaNs
  • exact rounding uses guard digits to ensure that relative error in each elementary arithmetic operation is bounded by \(\eta\)
    • guard digits : using one more digits in computation
    • without a guard digit the relative error can be as large as \(b-1\); with one it is bounded by 2× machine precision
  • roundoff error accumulation
    • linear in most cases, each step contributing roughly at the level of the rounding unit
    • dangerous cases: \(x+y\) when \(x \gg y\); \(x/y\) when \(1 \gg y\)
    • the relative error is magnified by cancellation (when two almost equal numbers are subtracted from each other)
  • IEEE: exact rounding, as if the result were computed exactly and then rounded
  • overflow & underflow
    • \(\sqrt{a^2+b^2} = |a|\sqrt{1+(b/a)^2}\) for \(|a|>|b|\) (see the sketch below)
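A short sketch of both effects (NumPy; the sample values are my own): catastrophic cancellation when subtracting nearly equal numbers, and the rescaled form of \(\sqrt{a^2+b^2}\) that avoids overflow.

```python
import numpy as np

# Cancellation: (1 + eps) - 1 loses most of the significant digits of eps.
eps = 1e-12
print((1.0 + eps) - 1.0)                 # not exactly 1e-12

# Overflow-safe sqrt(a^2 + b^2): factor out the larger magnitude first.
def safe_hypot(a, b):
    a, b = abs(a), abs(b)
    if a < b:
        a, b = b, a
    if a == 0.0:
        return 0.0
    return a * np.sqrt(1.0 + (b / a) ** 2)

a, b = np.float64(1e200), np.float64(1e199)
print(np.sqrt(a**2 + b**2))              # a**2 overflows: inf
print(safe_hypot(a, b))                  # ~1.005e200, no overflow
```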

general floating point system \((\beta, t, L ,U)\):

  • \(\beta\): base
  • t: precision
  • L: lower bound
  • U: upper bound
Relative cost of arithmetic operations:

        Int     Long    Single  Double
  +/-   1       1       4       4
  x     3       5       4       4
  /     14~40   23~87   16      20

abelian group: closure, associativity, identity, inverse, commutativity

Solving equations

  • Bisection
  • secant
  • Newton’s

Desired properties

  • efficient
  • robust
  • minimal amount of additional data
  • requires f to satisfy only minimal smoothness properties
  • generalizes easily and naturally to many equations in many unknowns

Fixed point iteration

  • From \(f(x)=0\) to \(x=g(x)\)
  • solve for \(x = g(x)\)
  • if \(x_{k+1}=g(x_k)\) converges, the limit is a solution; this is the basis of several solution methods (see the sketch below)
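A minimal fixed-point iteration sketch (the example equation \(x=\cos x\) is my own choice; \(|g'(x)| = |\sin x| < 1\) near the root, so the iteration converges):

```python
import math

def fixed_point(g, x0, tol=1e-12, max_iter=100):
    """Iterate x_{k+1} = g(x_k) until successive iterates are close."""
    x = x0
    for _ in range(max_iter):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Solve x = cos(x).
print(fixed_point(math.cos, 1.0))    # ~0.7390851332
```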

Bisection

  • Simple & Safe
  • slow
  • hard to generalize to high dimension

Intermediate value theorem: there will be at least one crossing (root) in \([a,b]\) when \[f(a)\cdot f(b)<0\] Bisect this bracket until the function value is close to zero.
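A bisection sketch under that sign-change assumption (the example \(f(x)=x^2-2\) is my own):

```python
def bisect(f, a, b, tol=1e-12):
    """Assumes f(a) * f(b) < 0; halves the bracket until it is shorter than tol."""
    fa = f(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        fm = f(m)
        if fa * fm <= 0:
            b = m                     # root lies in [a, m]
        else:
            a, fa = m, fm             # root lies in [m, b]
    return 0.5 * (a + b)

print(bisect(lambda x: x**2 - 2.0, 0.0, 2.0))   # ~1.41421356
```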

\[\exists \rho < 1 : |g'(x)| \leq \rho, \quad a<x<b\] \[|x_{k+1} - x^*| \leq \rho |x_k - x^*|\]

  • Brouwer fixed point theorem : \([a,b] \rightarrow [a,b]\)
  • Contraction mapping principle : \(|g'(x)| \leq \rho < 1\)

rate of convergence: \[e_{k+1} \leq \frac{1}{2} e_k, e_k = |x_k - x^*|\]

Rate of Convergence

  • linearly: \(e_{k+1} \leq \rho e_k\)
  • quadratically: \(e_{k+1} \leq \rho e_k^2\)
  • superlinear: linear, with \(e_{k+1} \leq \rho_k e_k\) where \(\rho_k \rightarrow 0\) for \(k > n\), some \(n \in \mathbb{N}\)
  • \(\frac{e_{k+1}}{e_k} = \mu\), \(\mu \rightarrow 0\) superlinear, \(\mu \rightarrow 1\) sublinear

Newton’s Method

  1. function & initial guess
  2. \(x_{k+1}= x_{k} - \frac{f(x_k)}{f'(x_k)}\)
  3. repeat until \(|x_{k}- x^*|\), \(|x_k - x_{k-1}|\), or \(|f(x_k)|\) is below a tolerance (see the sketch below)
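A Newton iteration sketch following these steps, stopping on a small step or a small residual (the example \(f(x)=x^2-2\) is my own):

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """x_{k+1} = x_k - f(x_k)/f'(x_k), stopping on a small step or small residual."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        step = fx / fprime(x)
        x_new = x - step
        if abs(step) < tol or abs(fx) < tol:
            return x_new
        x = x_new
    return x

print(newton(lambda x: x**2 - 2.0, lambda x: 2.0 * x, 1.0))   # ~1.4142135623730951
```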

Convergence of Newton’s Method:

  • First derivative exists
  • Taylor expansion at the root, subtract \(x^*\) on both sides
  • \(e_{k+1} \leq C e_k^2\)
  • Multiple root: linear rate

Secant Method

  • Avoid supplying \(f'\)
  • Secant: line joining two point on a function
  • \(x_{k+1} = x_{k} - \frac{f(x_k)(x_k-x_{k-1})}{f(x_k) - f(x_{k-1})}\)
  • cannot solve \(f(x) = x^{1/3}\)
  • convergence rate(superlinearly): \(e_{k+1} \leq C e_k^\phi, \phi = \frac{1+\sqrt{5}}{2}\)
  • proof:
    • subtract \(x^*\) from both sides
    • \(e_{n+1} \propto e_n e_{n-1}\); assume \(e_n \propto e_{n-1}^p\)
    • solve for \(p\) to get \(\phi\)
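A secant-method sketch (same made-up example as before), replacing \(f'\) by the slope through the last two iterates:

```python
def secant(f, x0, x1, tol=1e-12, max_iter=100):
    """Secant update: Newton's method with f' replaced by a finite-difference slope."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if f1 == f0:                  # flat secant line: cannot continue
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol:
            return x2
        x0, f0, x1, f1 = x1, f1, x2, f(x2)
    return x1

print(secant(lambda x: x**2 - 2.0, 1.0, 2.0))    # ~1.4142135623730951
```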

Errors

  • the convergence (truncation) error should stay above the roundoff error level
  • the termination tolerance should be greater than the rounding unit
  • well-conditioned problems (as opposed to ill-conditioned ones, e.g. a multiple root, where \(f\) is flat near the root)

Linear algebra refresh & more


Linear least-squares(L2-norm)

\[\min_x||b-Ax||_2\]

where A is \(m\times n, m > n\) and A has full column rank

Equivalent: \[\psi(x) = \frac12 ||b-Ax||^2 = \frac12 \sum_{i=1}^m(b_i - \sum_{j=1}^n a_{ij} x_j)^2\]

Necessary condition: \(\frac{\partial}{\partial x_k} \psi(x) = 0 \rightarrow A^TAx=A^Tb\). These are the normal equations; "normal" because the residual \(b-Ax\) is normal (orthogonal) to the range of \(A\). From minimization to solving linear equations. Note also that \(A^TA\) is positive definite.

\(x=(A^TA)^{-1}A^Tb\): The matrix multiplying b is the pseudo-inverse of A.

Least-squares problem: the least-squares problem has a unique solution, which satisfies the normal equations (see the sketch below).
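A minimal least-squares sketch (NumPy; the noisy straight-line data are made up), solving the same problem via the normal equations and via np.linalg.lstsq:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 50
t = np.linspace(0.0, 1.0, m)
A = np.column_stack([np.ones(m), t])           # model: b ~ c0 + c1 * t
b = 1.0 + 2.0 * t + 0.05 * rng.standard_normal(m)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)   # normal equations
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_normal, x_lstsq)                       # both ~[1, 2]
```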

Interpolation vs Data fitting

Data fitting:

  • The long term trend
  • Discard errors
  • For ease of manipulation
  • Take \(n<m\)

Data fitting in other norms: \(L_1\) and \(L_\infty\) lead to linear programming problems.

solution

  • Use Cholesky decomposition on the normal equations
    • fast, intuitive, less robust
  • QR: the standard approach, more expensive, more robust
  • SVD: significantly more expensive and robust

Normal equation

  • Cholesky decomposition \(B=A^TA=GG^T\)
  • \(Gz=y\), where \(y=A^Tb\)
  • \(G^Tx=z\)

  • \(\kappa(B) = \kappa(A)^2\)
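A quick numerical check of this squaring effect (NumPy; the Vandermonde-style matrix is an arbitrary, moderately ill-conditioned example):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 20)
A = np.vander(t, 8)                  # 20 x 8 matrix with full column rank
B = A.T @ A                          # normal-equations matrix
print(np.linalg.cond(A))             # kappa(A)
print(np.linalg.cond(B))             # ~kappa(A)**2
```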

QR for LS

Full-version QR :

  1. Decompose \[A=Q \begin{bmatrix} R \\ 0 \end{bmatrix}\]
  2. Compose \[\begin{bmatrix} c \\ d \end{bmatrix} = Q^T b\]
  3. Solve upper triangular system \(Rx=c\)

Q is orthogonal , R is upper triangular \[||b-Ax|| = ||Q^Tb-\begin{bmatrix} R \\ 0 \end{bmatrix} x||\]

Economy-version QR:

  1. Decompose \[A=Q_{m\times n}R\]
  2. \(c=Q^Tb\)
  3. solve \(Rx=c\)
  QR can be used to:
  • solve \(n \times n\) linear systems
  • solve LS problems
  • find eigenvalues/eigenvectors

QR for square matrix: \(A=QR\), Q is orthogonal, R is upper triangular

  • uniqueness
  • existence
    • \(A^TA\) is symmetric positive definite, so it can be decomposed as \(LL^T\)
    • \(Q=A(L^T)^{-1}\) is orthogonal
    • \(A=QL^T\), i.e. \(R=L^T\)
  • QR for solving linear equations: the condition number is not squared (unlike the normal equations)

Compute QR decomp :

  • Build \(Q\) (and \(R\)) column by column until \(Q\) is orthogonal: Gram–Schmidt
  • Apply a sequence of orthogonal transformation to A till A is upper triangular: Householder-reflector

Orthogonal transformation:\(T: V \rightarrow V\) that preserves inner product \(<u,v>\), combinations of rotations & reflections

inner product

  • \(<\alpha,\beta> = \alpha \beta\)
  • \(<u,v> = u \cdot v = ||u|| \; ||v|| \cos \theta\)

Householder reflector / transformation: \(Q^TA=R\), proceeding column by column to zero out the elements under the diagonal

  • \(Q_k\) is orthogonal
  • \(Q_k\) zero out elements below diagonal
  • \(Q_k\) preserves previously introduced zeros
  • \[Q_k = \begin{bmatrix} I_{(k-1)\times (k-1)} & 0 \\ 0 & F \end{bmatrix}\]
  • F : householder reflector, \(Fx=||x||e_1\)
  • \(F=I-2uu^T, u = \frac{v}{||v||}\)
  • For numerically stable algorithms you want to choose the reflection that moves \(x\) the furthest, \(v_-\)
  • \(x=A(k{:}m,k), \; v_k = x+\mathrm{sign}(x_1)\,\|x\|\,e_1\)
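A compact Householder-QR sketch following the bullets above (NumPy; my own minimal implementation, not the lecture's code): each step applies \(F=I-2uu^T\) to the trailing block and accumulates \(Q\).

```python
import numpy as np

def householder_qr(A):
    """A = Q R via Householder reflectors F = I - 2 u u^T, applied column by column."""
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.eye(m)
    for k in range(n):
        x = A[k:, k]
        sign = 1.0 if x[0] >= 0.0 else -1.0
        v = x.copy()
        v[0] += sign * np.linalg.norm(x)                # v = x + sign(x_1) ||x|| e_1
        if np.linalg.norm(v) == 0.0:
            continue
        u = v / np.linalg.norm(v)
        A[k:, k:] -= 2.0 * np.outer(u, u @ A[k:, k:])   # zero out below the diagonal
        Q[:, k:]  -= 2.0 * np.outer(Q[:, k:] @ u, u)    # accumulate Q
    return Q, np.triu(A)

A = np.random.default_rng(1).standard_normal((5, 3))
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(5)))
```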

Iterative methods

  • fill-in effect
  • don’t need exact solution
  • warm start: “initial guess” to improve performance
  • sometimes only the product \(Ax\) is available (not \(A\) itself)

Basic idea

  • given \(Ax=b\) where A is square and invertible
  • find \(B\) and vector \(c\) such that \(I-B\) is invertible
  • The solution is the same as \(x=Bx+c\)
  • Solve it as fixed point methods
    • stationary: \(B\) and \(c\) are fixed, not depending on the current iterate
  • under certain conditions it converges to the solution
    • \(\lim_{k \rightarrow \infty} x_k=x\) for any starting point
    • spectral radius : \(\rho(B) = \max_i |\lambda_i| < 1\)

Splitting of A:

  • \(A=M-N\)
  • A is invertible
  • M is invertible and easy to invert
  • \(x=M^{-1}Nx+M^{-1}b\)
  • \(B=M^{-1}N;\; c=M^{-1}b\)

Jacobi method / Gauss–Seidel

M,N disjoint

  1. A = D - E - F
    • D is the diagonal
    • \(-E\) is the strictly lower triangular part
    • \(-F\) is the strictly upper triangular part
  2. Jacobi: M=D, N=E+F
    • \(B=M^{-1}N=I - D^{-1} A\)
    • \(c = M^{-1} b = D^{-1}b\)
  3. GS: M=D-E, N=F
    • \(B=M^{-1}N=(D-E)^{-1}F\)
    • \(c=M^{-1}b=(D-E)^{-1}b\)
    • GS is the relaxation method with coefficient \(w=1\); its iteration matrix is denoted \(L_1\)

Under certain cases : GS converges faster than Jacobi
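A minimal sketch of both iterations (NumPy; the small diagonally dominant matrix is my own test case), using the splitting \(A = D - E - F\):

```python
import numpy as np

def jacobi(A, b, x0, iters=50):
    D = np.diag(np.diag(A))
    N = D - A                                  # N = E + F
    B = np.linalg.solve(D, N)                  # B = D^{-1} N
    c = np.linalg.solve(D, b)                  # c = D^{-1} b
    x = x0.copy()
    for _ in range(iters):
        x = B @ x + c
    return x

def gauss_seidel(A, b, x0, iters=50):
    M = np.tril(A)                             # M = D - E
    N = M - A                                  # N = F
    x = x0.copy()
    for _ in range(iters):
        x = np.linalg.solve(M, N @ x + b)      # a triangular solve in practice
    return x

A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x0 = np.zeros(3)
print(jacobi(A, b, x0), gauss_seidel(A, b, x0), np.linalg.solve(A, b))
```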

Relaxation method

Successive over-relaxation

Weighted version of GS methods

  • \(M = \frac{D}{w} -E, N=\frac{1-w}{w}D+F\)
  • \(L_w = B=M^{-1}N = (D-wE)^{-1}[(1-w)D + w F]\)
  • \(c=M^{-1}b = w(D-wE)^{-1}b\)
  • \(w \in (1,2)\): over-relaxation, usually better
  • \(w \in (0,1)\): under-relaxation, not so useful for speeding up convergence

Convergence

  • convergence requires \(w \in (0,2)\) (since \(\rho(L_w) \geq |w-1|\))
  • \(\exists w_{opt} \rightarrow \min\{ \rho \}\)

  • \(e_k = B \cdot e_{k-1} = B^k e_0\)
  • \(||e_k|| \leq ||B^k||\, ||e_0|| \leq ||B||^k ||e_0||\)
  • \(||B|| < 1 \rightarrow \rho(B) < 1\) (see note 1)
  • rate of convergence: \(-\log_{10} \rho(B)\)

Ostrowski–Reich Theorem: if \(A = D - E - F\) is Hermitian positive definite and \(0<w<2\), then the relaxation method converges.

Termination Criterion

\[\frac{||x-x_k||}{||x||} \leq \kappa(A) \frac{||r_k||}{||b||}\] -> tolerance

\[||x-x_k|| \leq ||B||\, ||x-x_{k-1}|| \leq ||B|| (||x-x_k||+||x_k-x_{k-1}||)\] \[||e_k|| \leq \frac{||B||}{1-||B||}||x_k-x_{k-1}||\]

  • Both the GS and the Jacobi method converge
  • The GS method converges faster than the Jacobi method (twice as fast, under certain conditions, e.g. a tridiagonal matrix)
    • \(\rho(L_1) = \rho^2(J)\)

if A is tridiagonal and symmetric

  • they all converge
  • \(\exists w_0 = \frac{2}{1+\sqrt{1-\rho^2(J)}} \rightarrow \rho(L_{w_0}) = w_0 -1 \mbox{ is minimum}\)
  • If \(\rho(J)>0 \rightarrow \rho(L_{w_0}) < \rho(L_1) = \rho^2(J) < \rho(J)\)
  • if \(\rho(J) = 0, w_0 = 1 , \rho(L_1)=0\)

Conjugate Gradient Method

  • A is symmetric positive definite
  • solving Ax=b
  • Solution is the minimizer of \(f(x)= \frac12 x^TAx - b^Tx+C\)
  • And iteration has the form \(x_{k+1} = x_k + \alpha_k d_k\)
  • \(f'(x) = \frac12 A^Tx + \frac12 Ax -b = Ax - b\) (since A is symmetric)
  • solution to \(Ax=b\) is a critical point of \(f(x)\).

Method of Steepest Descent(SD)

  • Move in the direction in which the function decreases most rapidly: \(-f'(x_k)\)
  • error: \(e_k = x_k -x\)
  • residual: \(r_k = b-Ax_k = -A e_k\)
  • claim: \(r_k = -f'(x_k)\)
  • till gradient is orthogonal to the direction
  1. \(r_k = b - A x_k\)
  2. \(\alpha_k = \frac{r_k^T r_k}{r_k^T A r_k}\)
  3. \(d^T(b-A(x+\alpha d)) = 0 \Rightarrow d^Tr - \alpha\, d^TAd = 0\)
  4. \(x_{k+1} = x_k + \alpha_k r_k\)
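A steepest-descent sketch for the SPD case (NumPy; the matrix and right-hand side are my own):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
    """Minimize (1/2) x^T A x - b^T x for SPD A; step along the residual r = b - A x."""
    x = x0.copy()
    for _ in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) < tol:
            break
        alpha = (r @ r) / (r @ (A @ r))
        x = x + alpha * r
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])          # SPD example
b = np.array([1.0, 1.0])
print(steepest_descent(A, b, np.zeros(2)), np.linalg.solve(A, b))
```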

Convergence behavior:

  • \(\kappa(A) = ||A||||A^{-1}||\)
  • For SPD \(\kappa(A)= \lambda_{max} / \lambda_{min}\)
  • \(||e_i||_A \leq (\frac{\kappa - 1 }{\kappa +1})^i||e_0||_A\)
  • \(\frac{f_i -f}{f_0 -f} \leq (\frac{\kappa - 1}{\kappa +1})^{2i}\)
  • energy norm : \(||e||_A = \sqrt{e^T A e}\)

Orthogonal Direction

Every step’s direction is orthogonal to that of before.

Conjugate Direction(CD)

  • In each step, move along the current direction exactly the distance needed so that no further correction in that direction is required
  • condition: \(d_i^Te_{i+1}=0 = d_i^T(e_i + \alpha_i d_i)\)
  • \(\alpha_i = - \frac{d_i^T e_i}{d_i^Td_i}\)
  • This can be achieved by making the direction not orthogonal but A-orthogonal(conjugate) to each other.
    • \(d_i^TAd_j = 0\)

Each step length is chosen so that the new residual is orthogonal to the current direction (equivalently, the new error is A-orthogonal to it); the method converges in n steps \[\alpha_i = \frac{d_i^T r_i}{d_i^T A d_i}\]

Gram-Schmidt conjugation: given n linearly independent vectors produce A-orthogonal vectors \(d_0,d_1, \dots, d_{n-1}\)

  • \(d_0 = u_0\)
  • \(d_i = u_i + \sum_{k=0}^{i-1} \beta_{ik} d_k, \quad \beta_{ij} = - \frac{u_i^T A d_j}{d_j^TAd_j}\)

Conjugate Gradient (CG)

A conjugate direction method in which the search directions are constructed by conjugation of the residuals. Page 184.

\(d_0 = r_0\) & \(e_1 \perp_A d_0\)

  • \(D_i = span\{ d_0,d_1, \dots, d_{i-1} \}\)
  • \(e_i = e_0 + D_i \rightarrow \min (||e_i||_A)\)

Convergence behavior: \[||e_i||_A \leq 2 (\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1})^i ||e_0||_A\]
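A standard CG sketch consistent with the description above (NumPy; the 2×2 SPD example is my own): the new direction is the residual made A-orthogonal to the previous direction.

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    x = x0.copy()
    r = b - A @ x                    # residual
    d = r.copy()                     # first direction d_0 = r_0
    rs_old = r @ r
    for _ in range(len(b)):          # at most n steps in exact arithmetic
        Ad = A @ d
        alpha = rs_old / (d @ Ad)
        x = x + alpha * d
        r = r - alpha * Ad
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        d = r + (rs_new / rs_old) * d   # make the new direction A-orthogonal to d
        rs_old = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b, np.zeros(2)), np.linalg.solve(A, b))
```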

Eigenvalues and eigenvectors

Rotation matrices have complex eigenvalues (recommended to remove them before proceeding, e.g. by working with the symmetric \(AA^T\)).

  • Power method: \(x'_i=Ax_{i-1},\; x_i = \frac{x'_i}{\|x'_i\|}\); this can be used to find the largest (in magnitude) eigenvalue and its eigenvector

The inverse matrix has the same eigenvectors, but \(\lambda_i \rightarrow 1/\lambda_i\).

  • Inverse iteration: \(B=A-\alpha I\) gives eigenpairs \((\frac1{\lambda_i - \alpha}, x_i)\) after inversion; iterate by solving \((A-\alpha I)x'_i = x_{i-1}\), with \(\alpha\) close to the eigenvalue of interest (e.g. the spectral radius for the largest one)

Rayleigh quotient: \(\lambda = \frac{v^TAv}{v^Tv}\) (just \(v^TAv\) for normalized \(v\)); the iteration always pushes towards \(\lambda_1\).

  • Rayleigh quotient iteration: solve with \(A-\alpha I\) for \(v_{k+1}\), re-estimate \(\alpha\) from the Rayleigh quotient, and repeat until it no longer changes
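Sketches of the power method and shifted inverse iteration (NumPy; the matrix, shift, and iteration counts are my own choices):

```python
import numpy as np

def power_method(A, iters=200):
    """Largest-magnitude eigenvalue: x <- A x / ||A x||, eigenvalue from the Rayleigh quotient."""
    x = np.random.default_rng(0).standard_normal(A.shape[0])
    for _ in range(iters):
        x = A @ x
        x /= np.linalg.norm(x)
    return x @ A @ x, x              # x has unit length, so this is the Rayleigh quotient

def inverse_iteration(A, shift, iters=50):
    """Eigenvalue closest to the shift: solve (A - shift*I) x' = x each step."""
    n = A.shape[0]
    x = np.random.default_rng(1).standard_normal(n)
    for _ in range(iters):
        x = np.linalg.solve(A - shift * np.eye(n), x)
        x /= np.linalg.norm(x)
    return x @ A @ x, x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
print(power_method(A)[0])            # ~3.618 (largest eigenvalue)
print(inverse_iteration(A, 1.0)[0])  # ~1.382 (eigenvalue closest to 1)
```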

SVD

Zeroing out the smaller singular values in \(\Sigma\) gives a better-conditioned problem and uses less storage.

pseudo-inverse \(A^+\), Regularization, truncated SVD

  • Orthogonal transformation \(Q^TAQ\): doesn't change the eigenvalues
  • QR iteration: when the iterates converge to an upper triangular matrix, its diagonal entries are the eigenvalues

Multigrid Method

Motivation: vertical percolation, "does the liquid go through?"

\(p^* \approx 0.618\) is the fixed point for the probability that water goes through (the pattern occurring) under the coarsening function \(R(p) = 2p^2(1-p)^2 + 4p^3(1-p)+p^4\).

This number gives "coarsening invariance": coarsening the grid does not change the probability that it percolates.
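Indeed, setting \(R(p)=p\) in the coarsening function above recovers this fixed point:

\[R(p) = 2p^2(1-p)^2 + 4p^3(1-p) + p^4 = 2p^2 - p^4,\]
\[R(p) = p \;\Rightarrow\; p\,(p-1)(p^2+p-1) = 0 \;\Rightarrow\; p^* = \frac{\sqrt{5}-1}{2} \approx 0.618.\]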

Finite-difference discretization: \(-u'' = f\); the coarsening step \(E'_i = R(E_i) = E_{i-1} + 2 E_i + E_{i+1}\) reduces it to a smaller system that gives the same answer -> "coarsening invariant"

Essential Aspect: A problem can contain many components with diff scales, use of many scales of discretization to represents all components of this solution.

V-cycle

This can have multiple level, call v-cycle recursively

  1. Pre-smoothing: reduce the high-frequency components of the error \(e=x^*-x\) and residual \(r=b-A_1x\) by smoothing \(x:=S_1(x,b)\)
    • Pre-smoothers: relaxed Richardson, relaxed Jacobi, relaxed Gauss–Seidel
  2. Restriction: Now the error/residual can be well approximated on the coarse grid as \(r_0 = Rr\), by using the restriction operator \(R: V_1' \rightarrow V_0'\).
    • Sampling, averaging, 1-2-1 weighting: a fraction of each neighboring grid value contributes to the center
  3. Solve the residual equation \(A_0e=r_0\) on the coarse grid
  4. Prolongation: Interpolate the error e onto the fine grid, by using prolongation operator \(P:U_0 \rightarrow U_1\)
    • A fraction of each coarse node contributes to the neighboring fine nodes
  5. Improve the current solution \(x:=x + Pe\)
  6. Post-smoothing: Smooth again:\(x:= S_2(x,b)\)
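A runnable sketch of one V-cycle for the 1D model problem \(-u''=f\) (my own minimal choices, not the lecture's code: weighted-Jacobi smoothing, full-weighting restriction, linear-interpolation prolongation, direct solve on the coarsest grid):

```python
import numpy as np

def apply_A(u, h):
    """Matrix-free 1D operator -u'' on interior points, zero Dirichlet boundaries."""
    Au = 2.0 * u.copy()
    Au[1:] -= u[:-1]
    Au[:-1] -= u[1:]
    return Au / h**2

def v_cycle(u, f, h, pre=3, post=3):
    n = len(u)
    if n <= 3:                                            # coarsest level: solve directly
        A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
        return np.linalg.solve(A, f)
    for _ in range(pre):                                  # 1. pre-smoothing (weighted Jacobi)
        u = u + (2.0 / 3.0) * (h**2 / 2.0) * (f - apply_A(u, h))
    r = f - apply_A(u, h)                                 # residual
    rc = 0.25 * (r[0:-2:2] + 2.0 * r[1:-1:2] + r[2::2])   # 2. restriction (full weighting)
    ec = v_cycle(np.zeros_like(rc), rc, 2.0 * h)          # 3. residual equation on coarse grid
    e = np.zeros(n)                                       # 4. prolongation (linear interpolation)
    e[1::2] = ec
    e[0::2] = 0.5 * (np.concatenate(([0.0], ec)) + np.concatenate((ec, [0.0])))
    u = u + e                                             # 5. correct the current solution
    for _ in range(post):                                 # 6. post-smoothing
        u = u + (2.0 / 3.0) * (h**2 / 2.0) * (f - apply_A(u, h))
    return u

n = 63                                                    # 2^k - 1 interior points
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
f = np.pi**2 * np.sin(np.pi * x)                          # exact solution u = sin(pi x)
u = np.zeros(n)
for _ in range(10):
    u = v_cycle(u, f, h)
print(np.max(np.abs(u - np.sin(np.pi * x))))              # error ~ discretization level
```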

W-cycle

level sequence : 321010121010123

Use the W-cycle when the coarse-grid correction is not accurate enough.

Nonlinear Systems and Optimization

Multivariate Taylor expansion(function result is a vector): \(f(x+p) = f(x) + J(x) p + O(||p||^2)\)

Jacobian matrix: \[J(x) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & \cdots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \frac{\partial f_n}{\partial x_2} & \cdots & \frac{\partial f_n}{\partial x_n} \\ \end{bmatrix}\]

Newton’s method for nonlinear systems

  1. for k = 1, 2, … until convergence
  2. solve \(J(x_k) p = - f(x_k)\)
  3. \(x_{k+1} = x_k + p\)
  • evaluating the Jacobian might be non-trivial
  • the Jacobian can be singular
  • use random starting points to find all solutions
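A direct translation of these steps (NumPy; the 2×2 example system and the starting point are my own):

```python
import numpy as np

def newton_system(F, J, x0, tol=1e-12, max_iter=50):
    """Solve F(x) = 0: each step solves J(x_k) p = -F(x_k), then x_{k+1} = x_k + p."""
    x = x0.astype(float).copy()
    for _ in range(max_iter):
        p = np.linalg.solve(J(x), -F(x))
        x = x + p
        if np.linalg.norm(p) < tol:
            break
    return x

# Example system: x^2 + y^2 = 1 and y = x  ->  root at (1/sqrt(2), 1/sqrt(2)).
F = lambda v: np.array([v[0]**2 + v[1]**2 - 1.0, v[1] - v[0]])
J = lambda v: np.array([[2.0 * v[0], 2.0 * v[1]], [-1.0, 1.0]])
print(newton_system(F, J, np.array([1.0, 0.5])))
```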

Unconstrained optimization

For scalar function f(x), on n dimensions

gradient \(\nabla \phi(x)\) and Hessian matrix \(\nabla^2 \phi(x)\)

Taylor expansion in several variables(function value is scalar):

\[\phi(x+p) = \phi(x) + \nabla \phi(x)^T p + \frac12 p^T H p + O(||p||^3)\]

Critical points where gradient is zero:

  • local minimum(maximum)
  • saddle points: stationary points that are not local extrema
  • necessary condition: the Hessian matrix is positive (negative) semi-definite
  • sufficient condition: the Hessian is positive (negative) definite

Gradient based methods

\(x_{k+1} = x_k + \alpha_k p_k\) \(p_k = - B_k^{-1} \nabla \phi(x_k)\)

  • Newton’s method: \(B=\nabla^2 \phi(x_k)\)
    • converges quadratically, retains Hessian sparsity
    • requires hessian and the evaluation of it.
    • solving a linear system at each iteration
    • no control over global convergence
    • a weak line search is sometimes used to find a good \(\alpha\)
  • Secant method (quasi-newton): \(B_{k+1} p_k = y_k, y_k = \nabla \phi (x_{k+1}) - \nabla \phi (x_k)\)
    • B is approximation of Hessian
  • Gradient descent
    • \(B=I\)
  • Newton + Gradient descent
    • \(B_k = \nabla^2 \phi(x_k) + \mu_k I\)
  • BFGS method
    • directly compute \(B_k^{-1}\)
    • Rank-2 update
    • superlinear convergence
  • Inexact Newton, Newton CG
    • Preconditioned Conjugate Gradient (\(P^{-1} \approx A^{-1}\))

Nonlinear least square

\[\min_x \phi(x) = \frac12 ||g(x) - b||^2\]

\[\nabla \phi(x) = A(x)^T(g(x) - b)\] where \(A(x)\) is the Jacobian matrix of \(g\)

  • Gauss-Newton method
    • \(\nabla^2 \phi(x) = A(x)^T A(x) + L(x)\)
    • \(L_{ij} = \sum_l \frac{\partial^2 g_l}{\partial x_i \partial x_j}(g_l-b_l)\)
    • drop L and \(A^TAp_k = - \nabla \phi(x_k)\)
    • \(x_{k+1} = x_k + p_k\)
    • Faster when L is small

Constrained problem

\[\Omega = \{ x \in R^n | c_i(x)=0, i \in \epsilon, c_i(x) \geq 0, i \in I \}\]

  • Equality constraint
    • reducing space, algebraic, domain \(\Omega\) has empty interior
  • inequality constraint
    • combinatorial, \(\Omega\) can have non empty interior
  • Active set
    • \(A(x) = \epsilon \cup \{i \in I \mid c_i(x)=0\}\)

At the optimum \(x^*\), \(\nabla \phi (x^*) \propto \nabla c(x^*)\): the gradients are aligned, and the constant of proportionality \(\lambda\) cannot be negative (the sign restriction depends on whether a minimum or a maximum is sought).

  • Constraint qualification: let \(A^T_x\) be the matrix whose columns are the gradients of all active constraints.
    • It must have full column rank
  • Quadratic programming with equality constraint
    • \(\min \phi (x) = \frac12 x^THx - d^Tx\) s.t. \(Ax=b\)
    • A has full row rank
    • Lagrangian: \(L(x,\lambda) = \phi(x) - \sum \lambda_i c_i(x)\)
    • KKT condition:
      • \(\nabla_x L(x^*,\lambda^*) = 0\)
      • \(c_i(x^*) = 0 , i \in \epsilon\)
      • \(c_i(x^*) \geq 0\)
      • \(\lambda_i \geq 0\)
      • \(\lambda_i^* c_i(x^*) = 0\)
      • \(\begin{bmatrix} H & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} d \\ b \end{bmatrix}\)
    • Adding extra terms so that the solution of the new system approximates that of the old one
  • Linear programming & its dual

Polynomial interpolation

General approximation and interpolation

  • Data fitting: a reasonable function that fits the data
    • sometimes it might make sense to interpolate (if there is no noise and we want to be precise)
  • Approximating function

  • Interpolation : have some freedom to choose \(x_i\) cleverly(use the most expressive data points before sampling)
  • may be able to consider the global interpolation error
  • prediction & manipulation

  • interpolating Data
  • interpolating function
  • linear combination of basis functions
    • polynomial -> polynomial interpolation
    • \(A\lambda = y\) A - matrix of basis function, \(\lambda\) coefficients, \(y\) observed value
    • constructing and evaluating
  • Basis function(polynomial interpolation):
    • Monomial basis : gives a Vandermonde matrix (non-singular) simple
    • Lagrange basis \(\phi_j = \Pi_{i,i\neq j} \frac{x-x_i}{x_j - x_i}\) most stable
      • \(L_{ii} = 1, L_{ij}=0\)
      • A is Identity
    • Newton Polynomial Basis \(\phi_j = \Pi_{i,i\leq j-1} (x-x_i)\) adaptive
  • interpolation has to pass through all the data points (different from regression)
  • there is always a unique interpolating polynomial of degree \(\leq n\)

Newton's polynomial basis: divided-difference form; each coefficient depends only on the corresponding subset of data points

\[f[x_i]=f(x_i)\] \[f[x_i,...,x_j]=\frac{f[x_{i+1}, ...,x_j] - f[x_i,...,x_{j-1}]}{x_j -x_i}\] \[f[z_0,...,z_k] = \frac{f^{(k)}(\eta)}{k!}\]

  • Existence of an adaptive interpolant

Error estimate (\(\psi_n(x) = \prod_{i=0}^{n}(x-x_i)\) is the Newton basis function; via the mean value theorem):

\[f(x) - p_n(x) = \frac{f^{(n+1)}(\eta)}{(n+1)!} \psi_n (x)\]
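A small divided-difference sketch (NumPy; the \(\sin\) data points are my own): build the Newton-form coefficients \(f[x_0,\dots,x_j]\) and evaluate \(p_n\) by nested multiplication.

```python
import numpy as np

def divided_differences(x, y):
    """Coefficients f[x_0], f[x_0,x_1], ..., f[x_0,...,x_n] of the Newton form."""
    c = y.astype(float).copy()
    n = len(x)
    for j in range(1, n):
        # Level j: c[i] <- (c[i] - c[i-1]) / (x[i] - x[i-j]) for i >= j
        c[j:] = (c[j:] - c[j - 1:-1]) / (x[j:] - x[:n - j])
    return c

def newton_eval(x_nodes, c, t):
    """Evaluate p(t) = c_0 + c_1 (t-x_0) + c_2 (t-x_0)(t-x_1) + ... (nested form)."""
    p = c[-1]
    for k in range(len(c) - 2, -1, -1):
        p = p * (t - x_nodes[k]) + c[k]
    return p

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.sin(x)
c = divided_differences(x, y)
print(newton_eval(x, c, 1.5), np.sin(1.5))    # interpolant vs. true value
```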

Chebyshev interpolation

Choose the data abscissae so as to minimize the quantity \(\max |\psi|\)

Chebyshev points on [-1,1] : \[x_i = cos(\frac{2i+1}{2(n+1)} \pi), i=0,...,n\]

Interpolation error using Chebyshev points on \([-1,1]\): \(\max|\psi_n| = 2^{-n}\)

Interpolating also derivative values

Unique osculating polynomial:

\[p^{(k)}_n(t_i) = f^{(k)}(t_i)\]

Hermite cubic

\((t_0,f(t_0)),(t_0,f'(t_0)),(t_0,f''(t_0)),\dots\): solve for the coefficients (Taylor expansion)

Newton interpolating form

\[p_n(x) = \sum_j f[x_0,...,x_j] \prod_i (x-x_i)\] \[ f[x_k,...,x_j]=\begin{cases} \frac{f[x_{k+1},...,x_j]-f[x_k,...,x_{j-1}]}{x_j-x_k}, & x_k \neq x_j, \\ \frac{f^{(j-k)}(x_k)}{(j-k)!}, & x_k = x_j, \end{cases}\]

Piecewise polynomial interpolation

  • a single polynomial would need a really high degree
  • newly introduced points cause global recomputation
  • parametric curve
  • surface interpolation

Divide the interval into subintervals, with an interpolant on each subinterval satisfying the interpolation conditions and global smoothness; \(h=\max(t_i - t_{i-1})\), and smaller \(h\) gives better interpolation.

  • how the global smoothness is achieved

  • Piecewise linear
  • Piecewise constant
  • the error is maximized at the midpoint of each subinterval
  • \(\frac{(t_i-t_{i-1})^2}{8} |f''(\xi)|\)

  • Piecewise Cubic
  • preferred over quadratic because of the smoothness conditions

  • Piecewise cubic hermite interpolation
  • where also values of the first derivative are given

Boundary condition

  • Free boundary, natural spline
  • clamped boundary, complete spline
  • Not-a-knot: third derivative continuous.
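For instance, with SciPy's CubicSpline (assuming SciPy is available; the \(\sin\) data are my own) the three boundary conditions can be compared directly; note that 'clamped' here fixes the end slopes to zero, which does not match \(\sin\):

```python
import numpy as np
from scipy.interpolate import CubicSpline

x = np.linspace(0.0, 2.0 * np.pi, 9)
y = np.sin(x)
t = np.linspace(0.0, 2.0 * np.pi, 200)

for bc in ["natural", "clamped", "not-a-knot"]:
    s = CubicSpline(x, y, bc_type=bc)           # spline with the given boundary condition
    print(bc, np.max(np.abs(s(t) - np.sin(t))))  # max error against the true function
```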

Parametric interpolation

Set up an intermediate variable and interpolate on both dimensions

Bezier polynomial

Multidimensional interpolation

\[V= \sum c_i \phi_i\]

bilinear interpolation

radial basis function

Approximation

  • approximation error: integral over the domain
  • function norms: \(L_1, L_2, L_\infty\)
  • Best least-square approximation(l2-norm)
    • \(v=\sum_j c_j \phi_j\)
    • \(\sum_j c_j \int \phi_j \phi_k = \int f \phi_k\)
    • \(Bx=b\), B is symmetric positive definite
    • \(B_{j.k} = \int \phi_j \phi_k\)
    • \(b_j = \int f \phi_j\)
    • Bad B matrix: monomial basis gives Hilbert matrix
    • Good B matrix: orthonormal basis functions, \(B_{jk} = 0\) for \(j \neq k\), 1 otherwise
    • scale and translate: \(t=\frac12[(b-a)x + (a+b)]\) or \(x = \frac{2t-a-b}{b-a}\)
    • Legendre polynomials: Gram-Schmidt when \(w =1\)
      • \(\phi_0(x)=1\)
      • \(\phi_1(x)=x\)
      • \(\phi_{j+1}(x) = \frac{2j+1}{j+1}x\phi_j(x) - \frac{j}{j+1}\phi_{j-1}(x)\)
      • diag: \(\frac{2}{2j+1}\)
      • zeros: Gauss points
    • trigonometric polynomial
      • \(\phi_0(x) = \frac1{\sqrt{2\pi}}, \phi_{2i-1}(x)=\frac1{\sqrt{\pi}}\sin(ix),\phi_{2i}(x)=\frac1{\sqrt{\pi}}\cos(ix)\)
      • diag: 1 orthonormal
  • discrete least-squares algorithm: \(B=A^TA, y=A^Tb\)

  • Weighted least-square problem
    • Gram-Schmidt process
      • \(\phi_0(x)=1\)
      • \(\phi_1(x)=x-\beta_1\)
      • \(\phi_j(x)=(x-\beta_j)\phi_{j-1}(x) - \gamma_j \phi_{j-2}(x)\)
      • \(\beta_j=\frac{\int_a^bxw(x)[\phi_{j-1}(x)]^2dx}{\int_a^bw(x)[\phi_{j-1}(x)]^2dx}\)
      • \(\gamma_j=\frac{\int_a^bxw(x)\phi_{j-1}(x)\phi_{j-2}(x)dx}{\int_a^bw(x)[\phi_{j-2}(x)]^2dx}\)
    • Chebyshev polynomials: \(w(x)=\frac1{\sqrt{1-x^2}}\)
    • Laguerre polynomials: \(w=e^{-x}\), on \((0,\infty)\)
    • Hermite polynomials: \(w=e^{-x^2}\), on \((-\infty,\infty)\)

Chebyshev min-max property

  • \(\{x_k\}\): Chebyshev points
  • \(T_n(x_k)=0\) (they are the zeros of \(T_n\))
  • \(t_n= \Pi_k(x-x_k)\)
  • this monic polynomial has the smallest maximum magnitude over the interval: \(\max=\frac1{2^{n-1}}\)

Numerical differentiation

Rely on polynomial approximation:

if \(P_n(x) \approx f(x)\), \(P'_n(x) \approx f'(x)\).

Taylor series theorem: approximate \(f(x+h)\) using \(f(x)\) and \(f'(x)\)

Rearrange the Taylor series to approximate the derivative.

2-point (\(O(h)\)), 3-point (\(O(h^2)\)), 5-point (\(O(h^4)\)) formulas: centered and non-centered

Richardson extrapolation: generate higher-order numerical methods from lower-order ones by combining the estimates at \(h\) and \(2h\) to remove the leading error term.

  • not compact
  • reliance on the existence of higher and higher nicely bounded derivatives of f
  • enables easy production of higher order method from lower order ones

Interpolate & differentiation

Numerical integration

Interpolate & integrate, summation of the integrated basis function times coefficients.

  • basic rule
    • trapezoidal: better than Simpson when the integrand is almost periodic (both end contributions close to zero)
    • midpoint
    • Simpson(quadratic)
    • Newton-Cotes formulas
    • works well when the step is small
  • Composite methods
    • use basic rule on subintervals
    • piecewise interpolation & integration
    • the error is the sum of the errors on the subintervals
    • \(rh=b-a\)
  • Precision \(p\): exact for polynomials up to degree \(p\)
  • Accuracy: the leading error term is of order \(q=p+1\)
  • Orthogonal basis function

  • Gaussian quadrature formulas
  • Gauss points:
    • zeroes of the Legendre polynomials \(\phi_{n+1}(x)\)
    • n+1 points achieve precision 2n+1
    • Weights : \(a_j = \frac{2(1-x_j^2)}{[(n+1)\phi_n(x_j)]^2}\)
    • error: \(\frac{2^{2n+3}((n+1)!)^4}{(2n+3)((2n+2)!)^3}f^{(2n+2)}(\eta)\)
  • Adaptive quadrature
    • Adaptive subdivision: adaptive process, locally refined
    • works if the derivative doesn't vary violently
    • error can be approximated by difference in estimate

  1. This is an implication, not an equivalence: \(\rho(B) \leq ||B||\) for any induced norm, but \(||B||\) is not bounded by \(\rho(B)\).