Advanced Scientific Computing
- Guest Lectures
- Numerical algorithms and errors
- Roundoff Errors
- Solving equations
- Linear algebra refresh & more
- Linear least-squares(L2-norm)
- Iterative methods
- Eigenvalues and eigenvectors
- Multigrid Method
- Nonlinear Systems and Optimization
- Polynomial interpolation
- Piecewise polynomial interpolation
- Approximation
- Numerical differentiation
- Numerical integration
Guest Lectures
Numerical algorithms and errors
Sources and types of errors
- Errors in the problem
- mathematical model
- input data
- Approximation errors
- Discretization
- Convergence
- Roundoff
Example: the trade-off between discretization error and roundoff error when computing a derivative numerically (see the sketch below).
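A minimal sketch of this trade-off, assuming \(f = \sin\) and the forward difference \((f(x+h)-f(x))/h\): the total error behaves like \(O(h) + O(\eta/h)\), so it is smallest near \(h \approx \sqrt{\eta}\) and grows again as \(h\) shrinks further.

```python
import numpy as np

# Forward-difference approximation of d/dx sin(x) at x = 1 (true value: cos(1)).
# Discretization error ~ h/2 * |f''|, roundoff error ~ eta/h: the total error is U-shaped in h.
x = 1.0
for h in 10.0 ** -np.arange(1, 16):
    approx = (np.sin(x + h) - np.sin(x)) / h
    print(f"h = {h:.0e}   error = {abs(approx - np.cos(x)):.2e}")
```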
Algorithm Properties
- Accuracy
- contrast with precision
- Efficiency: hard to measure
- algo, code
- cpu, memory, disk, network
- performance vs complexity
- Robustness
Problem conditioning and stability
- ill-conditioned: a small perturbation of the input causes a large difference in the result
- stability: a property of the algorithm rather than the problem (does it amplify errors beyond what the conditioning implies?)
- well-posed: a solution exists, is unique, and depends continuously on the input data
Condition number
How much the output changes for a small change in the input: a measure of how sensitive a function is to errors in its input, i.e. how much an input error is magnified in the output.
- for a (continuously differentiable) nonlinear function: \(\kappa(f) = \left|\frac{x f'(x)}{f(x)}\right|\)
- As a rule of thumb, if the condition number \(\kappa(A) = 10^k\), you may lose up to k digits of accuracy on top of whatever the numerical method itself loses to roundoff.
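A small illustration of this rule of thumb, assuming numpy and the Hilbert matrix as a classically ill-conditioned example: with \(\kappa(A)\approx 10^{13}\), roughly 13 of the 16 double-precision digits can be lost when solving \(Ax=b\).

```python
import numpy as np

n = 10
A = 1.0 / (np.arange(1, n + 1)[:, None] + np.arange(n)[None, :])   # Hilbert matrix, kappa ~ 1e13
x_true = np.ones(n)
b = A @ x_true
x = np.linalg.solve(A, b)
print(f"condition number: {np.linalg.cond(A):.1e}")
print(f"relative error  : {np.linalg.norm(x - x_true) / np.linalg.norm(x_true):.1e}")
```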
Roundoff Errors
- inevitable
- every elementary operation (+, ×) introduces a small relative error, bounded by about \(10^{-16}\) in double precision
- often negligible, but the errors can accumulate or be amplified into a significant error
\[fl(x) = \begin{cases} 1.d_1d_2 \cdots d_t \times 2^{e} & \text{if } d_{t+1}=0 \text{ (chop)} \\ \text{round to nearest, ties to even} & \text{otherwise} \end{cases}\] Relative floating point error (for rounding; truncation doubles the bound): \[\frac{|fl(x) - x|}{|x|} \leq \frac12\, 2^{-t}\]
- rounding methods
- truncate
- round up
- round down
- round to the nearest value, ties away from zero
- round to the nearest value, ties to even (most used)
- Floating point number representation: number of significant digits
- spacing: not constant in absolute terms, (roughly) constant in relative terms
- near 0, subnormal numbers fill the gap
- Quadruple precision: s=1, exp=15, frac=112 (p=113)
- Double precision:
- s=1,exp=11,frac=52(1 implicit bit, p=53)
- rounding unit : \(\eta = 1/2 b^{1-p} = 2^{-53} \approx 1.1\times 10^{-16}\)
- Single precision:
- s=1,exp=8,frac=23
- rounding unit: \(\eta = 2^{-24}\)
- exponent fields of all zeros and all ones are reserved
- exponent bias: about half the exponent range (127 for single, 1023 for double)
- signed zero and subnormals
- inf and NaNs
- exact rounding uses guard digits to ensure that relative error in each elementary arithmetic operation is bounded by \(\eta\)
- guard digits: carry one or more extra digits during the computation
- without a guard digit the relative error can be as large as b-1; with one it is bounded by 2× the machine precision
- roundoff error accumulation
- linear in most cases, each operation adding an error on the order of the rounding unit
- error-prone cases: \(x+y\) when \(|x| \gg |y|\), and \(x/y\) when \(|y| \ll 1\)
- cancellation magnifies the relative error (when two almost equal numbers are subtracted)
- IEEE: exact rounding, i.e. the result is as if the operation were done exactly and then rounded
- overflow & underflow
- \(\sqrt{a^2+b^2} = |a|\sqrt{1+(b/a)^2}\) for \(|a| \geq |b|\): avoids overflow in the squares (see the sketch below)
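A sketch of this scaling trick (the same idea is behind library routines such as np.hypot): factor out the larger magnitude so that neither square can overflow.

```python
import numpy as np

def safe_hypot(a, b):
    """sqrt(a^2 + b^2) without overflow: factor out the larger magnitude."""
    a, b = abs(a), abs(b)
    if a < b:
        a, b = b, a
    if a == 0.0:
        return 0.0
    return a * np.sqrt(1.0 + (b / a) ** 2)

print(safe_hypot(3e200, 4e200))   # 5e200; naive sqrt(a**2 + b**2) overflows to inf
```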
general floating point system \((\beta, t, L ,U)\):
- \(\beta\): base
- t: precision
- L: lower bound on the exponent
- U: upper bound on the exponent
Relative cost of arithmetic operations by operand type:

| Operation | Int | Long | Single | Double |
|---|---|---|---|---|
| +/- | 1 | 1 | 4 | 4 |
| × | 3 | 5 | 4 | 4 |
| / | 14~40 | 23~87 | 16 | 20 |
abelian group: closure, associativity, identity, inverse, commutativity
Solving equations
- Bisection
- secant
- Newton’s
Desired properties
- efficient
- robust
- minimal amount of additional data
- requires f to satisfy only minimal smoothness properties
- generalizes easily and naturally to many equations in many unknowns
Fixed point iteration
- From \(f(x)=0\) to \(x=g(x)\)
- solve for \(x = g(x)\)
- if the iteration \(x_{k+1}=g(x_k)\) converges, its limit solves the equation; different choices of \(g\) give different solution methods
Bisection
- Simple & Safe
- slow
- hard to generalize to high dimension
Bracketing (intermediate value theorem): if \[f(a)\cdot f(b)<0\] there is at least one root in \([a,b]\); bisection halves the bracket until it is close enough to the root (see the sketch after this block).
Fixed point convergence: \[\exists \rho < 1: |g'(x)| \leq \rho, \; a<x<b\] \[|x_{k+1} - x^*| \leq \rho |x_k - x^*|\]
- Brouwer fixed point theorem: \(g: [a,b] \rightarrow [a,b]\) guarantees a fixed point exists
- Contraction mapping principle: \(|g'(x)| \leq \rho < 1\) gives uniqueness and convergence
rate of convergence of bisection: \[e_{k+1} \leq \frac{1}{2} e_k, \quad e_k = |x_k - x^*|\]
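A minimal bisection sketch under the bracketing assumption \(f(a)\cdot f(b)<0\); each step keeps the half-interval on which the sign change persists.

```python
def bisect(f, a, b, tol=1e-12):
    """Bisection: halve the bracket [a, b] while keeping a sign change inside it."""
    fa = f(a)
    assert fa * f(b) < 0, "need f(a) * f(b) < 0"
    while (b - a) / 2 > tol:
        m = (a + b) / 2
        fm = f(m)
        if fa * fm <= 0:
            b = m              # root lies in [a, m]
        else:
            a, fa = m, fm      # root lies in [m, b]
    return (a + b) / 2

print(bisect(lambda x: x**2 - 2, 0.0, 2.0))   # ~1.4142135623...
```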
Rate of Convergence
- linearly: \(e_{k+1} \leq \rho e_k\)
- quadratically: \(e_{k+1} \leq \rho e_k^2\)
- superlinearly: \(e_{k+1} \leq \rho_k e_k\) with \(\rho_k \rightarrow 0\) (for k large enough)
- if \(\frac{e_{k+1}}{e_k} \rightarrow \mu\): \(\mu \in (0,1)\) linear, \(\mu \rightarrow 0\) superlinear, \(\mu \rightarrow 1\) sublinear
Newton’s Method
- inputs: the function \(f\), its derivative \(f'\), and an initial guess
- \(x_{k+1}= x_{k} - \frac{f(x_k)}{f'(x_k)}\)
- iterate until \(|x_k - x_{k-1}|\), \(|f(x_k)|\) (or an estimate of \(|x_{k}- x^*|\)) drops below a tolerance
Convergence of Newton’s Method:
- First derivative exists
- Taylor expansion around the root; subtract \(x^*\) from both sides
- \(e_{k+1} \leq C e_k^2\)
- Multiple root: linear rate
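A sketch of the iteration \(x_{k+1}= x_k - f(x_k)/f'(x_k)\), assuming the derivative is supplied and using the step size and \(|f(x_k)|\) as stopping tests.

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton's method: quadratic convergence near a simple root."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol and abs(f(x)) < tol:
            break
    return x

# sqrt(2) as the root of f(x) = x^2 - 2
print(newton(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0))
```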
Secant Method
- Avoid supplying \(f'\)
- Secant: the line joining two points on the function
- \(x_{k+1} = x_{k} - \frac{f(x_k)(x_k-x_{k-1})}{f(x_k) - f(x_{k-1})}\)
- fails for \(f(x) = x^{1/3}\) (a standard example where the iterates diverge)
- convergence rate(superlinearly): \(e_{k+1} \leq C e_k^\phi, \phi = \frac{1+\sqrt{5}}{2}\)
- proof sketch:
- subtract \(x^*\) from both sides
- \(e_{k+1} \approx C\, e_k e_{k-1}\); assume \(e_k \approx e_{k-1}^p\)
- solving for the exponent gives \(p = \phi\)
Errors
- the convergence error should stay above the roundoff level
- so the termination tolerance must be larger than the rounding unit
- works best for well-conditioned problems (ill-conditioned example: a multiple root, where f is flat near the root)
Linear algebra refresh & more
Linear least-squares(L2-norm)
\[\min_x||b-Ax||_2\]
where A is \(m\times n, m > n\) and A has full column rank
Equivalent: \[\psi(x) = \frac12 ||b-Ax||^2 = \frac12 \sum_{i=1}^m(b_i - \sum_{j=1}^n a_{ij} x_j)^2\]
Necessary condition: \(\frac{\partial}{\partial x_k} \psi(x) = 0 \rightarrow A^TAx=A^Tb\). These are the normal equations ("normal" because the residual \(b-Ax\) is normal, i.e. orthogonal, to the range of A). This converts the minimization into a linear system. Note also that \(A^TA\) is symmetric positive definite (since A has full column rank).
\(x=(A^TA)^{-1}A^Tb\): The matrix multiplying b is the pseudo-inverse of A.
Least-squares problem: it has a unique solution, which satisfies the normal equations.
Interpolation vs Data fitting
Data fitting:
- The long term trend
- Discard errors
- For ease of manipulation
- Take \(n<m\)
Data fitting in other norms: the \(L_1\) and \(L_\infty\) norms lead to linear programming problems
Solution approaches
- Use the Cholesky decomposition of the normal equations
- fast, intuitive, less robust
- QR: the standard approach, more expensive, more robust
- SVD: significantly more expensive, most robust
Normal equation
- Cholesky decomposition \(B=A^TA=GG^T\) (G lower triangular)
- solve \(Gz=y\) with \(y=A^Tb\)
- then \(G^Tx=z\)
\(\kappa(B) = \kappa(A)^2\)
QR for LS
Full-version QR :
- Decompose \[A=Q \begin{bmatrix} R \\ 0 \end{bmatrix}\]
- Compose \[\begin{bmatrix} c \\ d \end{bmatrix} = Q^T b\]
- Solve upper triangular system \(Rx=c\)
Q is orthogonal , R is upper triangular \[||b-Ax|| = ||Q^Tb-\begin{bmatrix} R \\ 0 \end{bmatrix} x||\]
Economy-version QR:
- Decompose \[A=Q_{m\times n}R\]
- \(c=Q^Tb\)
- solve \(Rx=c\)
QR can be used to:
- solve \(n \times n\) linear systems
- solve LS problems (see the sketch below)
- find eigenvalues/eigenvectors
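A small comparison of the two routes, assuming numpy: the normal equations via Cholesky (\(B = A^TA = GG^T\)) against the economy QR solve; QR is the more robust route since \(\kappa(A^TA) = \kappa(A)^2\).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))      # m > n, full column rank
b = rng.standard_normal(100)

# Normal equations via Cholesky: B = A^T A = G G^T (squares the condition number)
B, y = A.T @ A, A.T @ b
G = np.linalg.cholesky(B)              # G lower triangular
z = np.linalg.solve(G, y)              # Gz = y
x_ne = np.linalg.solve(G.T, z)         # G^T x = z

# Economy QR: A = QR with Q (m x n) orthonormal columns, R (n x n) upper triangular
Q, R = np.linalg.qr(A, mode='reduced')
x_qr = np.linalg.solve(R, Q.T @ b)     # Rx = Q^T b

print(np.allclose(x_ne, x_qr), np.allclose(x_qr, np.linalg.lstsq(A, b, rcond=None)[0]))
```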
QR for square matrix: \(A=QR\), Q is orthogonal, R is upper triangular
- uniqueness
- existence
- \(A^TA\) is symmetric positive definite, so it can be decomposed as \(LL^T\)
- \(Q=A(L^T)^{-1}\)
- then \(A=QL^T=QR\) with \(R=L^T\)
- QR can also be used for solving linear equations; unlike the normal equations, the condition number is not squared
Compute QR decomp :
- Construct Q column by column, orthonormalizing the columns of A and building R along the way: Gram-Schmidt
- Apply a sequence of orthogonal transformations to A until it becomes upper triangular: Householder reflectors
Orthogonal transformation:\(T: V \rightarrow V\) that preserves inner product \(<u,v>\), combinations of rotations & reflections
inner product
- \(<\alpha,\beta> = \alpha \beta\)
- \(<u,v> = u \cdot v = ||u|| \; ||v|| \cos \theta\)
Householder reflector/ transformation:\(Q^TA=R\), col by col to zero out the elements under
- \(Q_k\) is orthogonal
- \(Q_k\) zero out elements below diagonal
- \(Q_k\) preserves previously introduced zeros
- \[Q_k = \begin{bmatrix} I_{(k-1)\times (k-1)} & 0 \\ 0 & F \end{bmatrix}\]
- F : householder reflector, \(Fx=||x||e_1\)
- \(F=I-2uu^T, u = \frac{v}{||v||}\)
- for numerical stability, choose the reflection that moves x the farther of the two, i.e. \(v_-\)
- \(x=A(k:m, k), \; v_k = x+\operatorname{sign}(x_1)\, ||x||\, e_1\) (see the sketch below)
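A compact sketch of this recipe, assuming numpy: for each column k build \(v = x + \operatorname{sign}(x_1)\|x\|e_1\) and apply \(F = I - 2uu^T\) to zero out the entries below the diagonal, accumulating the reflectors into Q.

```python
import numpy as np

def householder_qr(A):
    """QR via Householder reflectors: returns Q (m x m) and upper-triangular R."""
    R = A.astype(float).copy()
    m, n = R.shape
    Q = np.eye(m)
    for k in range(min(m, n)):
        x = R[k:, k]
        v = x.copy()
        v[0] += (1.0 if x[0] >= 0 else -1.0) * np.linalg.norm(x)   # v = x + sign(x1)||x|| e1
        if np.linalg.norm(v) == 0:
            continue
        u = v / np.linalg.norm(v)
        # Apply F = I - 2 u u^T to the trailing block of R and accumulate it into Q
        R[k:, k:] -= 2.0 * np.outer(u, u @ R[k:, k:])
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ u, u)
    return Q, np.triu(R)

A = np.random.default_rng(1).standard_normal((5, 3))
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(5)))
```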
Iterative methods
- avoids the fill-in effect of direct factorizations on sparse matrices
- an exact solution is often not needed
- warm start: an "initial guess" can improve performance
- usable even when only the action \(x \mapsto Ax\) is available
Basic idea
- given \(Ax=b\) where A is square and invertible
- find \(B\) and vector \(c\) such that \(I-B\) is invertible
- The solution is the same as \(x=Bx+c\)
- Solve it as fixed point methods
- stationary: \(B,c\) do not change from iteration to iteration
- under certain conditions it converges to the solution
- \[\lim_{k \rightarrow \infty} x_k=x \quad \forall x_0\]
- spectral radius: \(\rho(B) = \max_i |\lambda_i(B)| < 1\)
Splitting of A:
- \(A=M-N\)
- A is invertible
- M is invertible and easy to invert
- \(x=M^{-1}Nx+M^{-1}b\)
- \(B=M^{-1}N ; C=M^{-1}b\)
Jacobi method / Gauss-Seidel
Split A into disjoint parts:
- A = D - E - F
- D is the diagonal
- (-E) the lower sub triangular
- (-F) the upper sub triangular
- Jacobi: M=D, N=E+F
- \(B=M^{-1}N=I - D^{-1} A\)
- \(c = M^{-1} b = D^{-1}b\)
- GS: M=D-E, N=F
- \(B=M^{-1}N=(D-E)^{-1}F\)
- \(c=M^{-1}b=(D-E)^{-1}b\)
- GS is the relaxation method with coefficient \(w = 1\); its iteration matrix is denoted \(L_1\)
Under certain conditions GS converges faster than Jacobi (see the sketch below)
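A sketch of the two methods from the splitting above, assuming numpy: Jacobi uses M = D, while Gauss-Seidel uses M = D − E, so updated components are used as soon as they are available.

```python
import numpy as np

def jacobi(A, b, x0, iters=200):
    D = np.diag(A)                        # M = D
    x = x0.copy()
    for _ in range(iters):
        x = (b - (A @ x - D * x)) / D     # x_{k+1} = D^{-1} (b + (E + F) x_k)
    return x

def gauss_seidel(A, b, x0, iters=200):
    x = x0.copy()
    for _ in range(iters):
        for i in range(len(b)):           # sweep: new values are used immediately
            x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
    return x

# Diagonally dominant test system: both methods converge
A = np.array([[4., 1., 0.], [1., 4., 1.], [0., 1., 4.]])
b = np.array([1., 2., 3.])
print(jacobi(A, b, np.zeros(3)), gauss_seidel(A, b, np.zeros(3)), np.linalg.solve(A, b))
```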
Relaxation method
Successive over-relaxation
Weighted version of GS methods
- \(M = \frac{D}{w} -E, N=\frac{1-w}{w}D+F\)
- \(L_w = B=M^{-1}N = (D-wE)^{-1}[(1-w)D + w F]\)
- \(c=M^{-1}b = w(D-wE)^{-1}b\)
- \(w \in (1,2)\) (over-relaxation) usually speeds up convergence
- \(w \in (0,1)\) (under-relaxation) not so useful for speeding up convergence
Convergence
- \(w \in (0,2) \rightarrow \rho[L_w] < 1\)
\(\exists w_{opt}\) minimizing \(\rho(L_w)\)
- \(e_k = B \cdot e_{k-1} = B^k e_0\)
- \(||e_k|| \leq ||B^k||\, ||e_0||\)
- \(||B|| < 1 \rightarrow \rho(B) < 1\) [1]
rate of convergence: \(-\log_{10} \rho(B)\)
Ostrowski-Reich Theorem: if \(A = D - E - F\) is Hermitian positive definite and \(0<w<2\), then the relaxation method converges
Termination Criterion
\[\frac{||x-x_k||}{||x||} \leq \kappa(A) \frac{||r_k||}{||b||}\] -> tolerance
\[||x-x_k|| \leq ||B||\, ||x-x_{k-1}|| \leq ||B||\, (||x-x_k||+||x_k-x_{k-1}||)\] \[||e_k|| \leq \frac{||B||}{1-||B||}||x_k-x_{k-1}||\]
- For tridiagonal matrices, GS and Jacobi converge (or diverge) together
- when they converge, GS converges twice as fast: \(\rho(L_1) = \rho^2(J)\)
If A is tridiagonal and symmetric positive definite:
- Jacobi, GS, and relaxation all converge
- \(\exists w_0 = \frac{2}{1+\sqrt{1-\rho^2(J)}} \rightarrow \rho(L_{w_0}) = w_0 -1 \mbox{ is minimum}\)
- If \(\rho(J)>0 \rightarrow \rho(L_{w_0}) < \rho(L_1) = \rho^2(J) < \rho(J)\)
- if \(\rho(J) = 0, w_0 = 1 , \rho(L_1)=0\)
Conjugate Gradient Method
- A is symmetric positive definite
- solving Ax=b
- Solution is the minimizer of \(f(x)= \frac12 x^TAx - b^Tx+C\)
- Each iteration has the form \(x_{k+1} = x_k + \alpha_k d_k\)
- \(f'(x) = \frac12 A^Tx + \frac12 Ax -b = Ax - b\) (for symmetric A)
- solution to \(Ax=b\) is a critical point of \(f(x)\).
Method of Steepest Descent(SD)
- Move in the direction in which the function decreases most rapidly: \(-f'(x_k)\)
- error: \(e_k = x_k -x\)
- residual: \(r_k = b-Ax_k = -A e_k\)
- claim: \(r_k = -f'(x_k)\)
- exact line search: step until the new gradient (residual) is orthogonal to the search direction
- \(r_k = b - A x_k\)
- \(d^T(b-A(x+\alpha d)) = 0 \Rightarrow d^Tr - \alpha\, d^TAd = 0\) (here \(d = r\))
- \(\alpha_k = \frac{r_k^T r_k}{r_k^T A r_k}\)
- \(x_{k+1} = x_k + \alpha_k r_k\)
Convergence behavior:
- \(\kappa(A) = ||A||\,||A^{-1}||\)
- For SPD A, \(\kappa(A)= \lambda_{max} / \lambda_{min}\)
- \(||e_i||_A \leq (\frac{\kappa - 1 }{\kappa +1})^i||e_0||_A\)
- \(\frac{f_i -f}{f_0 -f} \leq (\frac{\kappa - 1}{\kappa +1})^{2i}\)
- energy norm : \(||e||_A = \sqrt{e^T A e}\)
Orthogonal Direction
Each new search direction is orthogonal to the previous one.
Conjugate Direction(CD)
- In each step, eliminate the error component along one direction exactly, so that n steps suffice to reach the solution
- condition: \(d_i^Te_{i+1}=0 = d_i^T(e_i + \alpha_i d_i)\)
- \(\alpha_i = - \frac{d_i^T e_i}{d_i^Td_i}\) (not computable directly, since \(e_i\) is unknown)
- This can be achieved by making the directions not orthogonal but A-orthogonal (conjugate) to each other:
- \(d_i^TAd_j = 0\)
Choosing each step so that the next error is A-orthogonal to the current direction gives convergence in n steps, with \[\alpha_i = \frac{d_i^T r_i}{d_i^T A d_i}\]
Gram-Schmidt conjugation: given n linearly independent vectors produce A-orthogonal vectors \(d_0,d_1, \dots, d_{n-1}\)
- \(d_0 = u_0\)
- \(d_i = u_i + \sum_{k=0}^{i-1} \beta_{ik} d_k, \quad \beta_{ij} = - \frac{u_i^T A d_j}{d_j^TAd_j}\)
Conjugate Gradient (CG)
A conjugate direction method in which the search directions are constructed by conjugating the residuals (p. 184).
\(d_0 = r_0\) & \(e_1 \perp_A d_0\)
- \(D_i = span\{ d_0,d_1, \dots, d_{i-1} \}\)
- \(e_i \in e_0 + D_i\) and \(||e_i||_A\) is minimized over this affine subspace
Convergence bahavior: \[||e_i||_A \leq 2 (\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1})^i ||e_0||_A\]
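A minimal CG sketch for SPD A, assuming numpy; each direction is built from the current residual and kept A-conjugate to the previous ones.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Conjugate gradient for SPD A: at most n steps in exact arithmetic."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.copy()
    r = b - A @ x                  # residual r = b - Ax = -grad f(x)
    d = r.copy()                   # first direction is the residual
    rs = r @ r
    for _ in range(max_iter or n):
        Ad = A @ d
        alpha = rs / (d @ Ad)      # exact line search along d
        x += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        d = r + (rs_new / rs) * d  # keep the new direction A-conjugate to the old ones
        rs = rs_new
    return x

A = np.array([[4., 1.], [1., 3.]])
b = np.array([1., 2.])
print(conjugate_gradient(A, b), np.linalg.solve(A, b))
```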
Eigenvalues and eigenvectors
Rotation matrices have complex eigenvalues (one suggestion: work with the symmetric matrix \(AA^T\) before proceeding).
- Power method: \(x'_i=Ax_{i-1},\ x_i = \frac{x'_i}{||x'_i||}\); finds the eigenvalue of largest magnitude and its eigenvector
The inverse matrix has the same eigenvectors but \(\lambda_i \rightarrow 1/\lambda_i\).
- Inverse iteration: \(B=A-\alpha I\) gives \(B^{-1}\) the eigenpairs \((\frac1{\lambda_i - \alpha}, x_i)\); iterate \(x'_i = (A-\alpha I)^{-1}x_{i-1}\), with \(\alpha\) close to the eigenvalue of interest
Rayleigh quotient: \(\lambda = \frac{v^TAv}{v^Tv}\) (\(= v^TAv\) for \(||v||=1\)); for power-method iterates it is pushed towards \(\lambda_1\).
- Rayleigh quotient iteration: alternate solving with \(A-\alpha I\) for \(v_k\) and re-estimating \(\alpha\) from the Rayleigh quotient, until it no longer changes (see the sketch below)
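A sketch of power iteration and shifted inverse iteration, assuming numpy; the Rayleigh quotient of the normalized iterate gives the eigenvalue estimate.

```python
import numpy as np

def power_method(A, iters=200):
    """Power iteration: converges to the dominant eigenpair (largest |lambda|)."""
    x = np.random.default_rng(0).standard_normal(A.shape[0])
    for _ in range(iters):
        x = A @ x
        x /= np.linalg.norm(x)
    return x @ A @ x, x              # Rayleigh quotient = eigenvalue estimate

def inverse_iteration(A, alpha, iters=50):
    """Shifted inverse iteration: converges to the eigenvalue closest to alpha."""
    n = A.shape[0]
    x = np.ones(n)
    B = A - alpha * np.eye(n)
    for _ in range(iters):
        x = np.linalg.solve(B, x)    # in practice factor B once and reuse it
        x /= np.linalg.norm(x)
    return x @ A @ x, x

A = np.array([[2., 1.], [1., 3.]])
print(power_method(A)[0], inverse_iteration(A, alpha=1.0)[0], np.linalg.eigvalsh(A))
```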
SVD
Zeroing out the smaller singular values in \(\Sigma\) gives a better-conditioned problem & uses less storage
pseudo-inverse \(A^+\), Regularization, truncated SVD
- Orthogonal similarity transformation \(Q^TAQ\): doesn't change the eigenvalues
- QR algorithm: iterate \(A_k = Q_kR_k\), \(A_{k+1}=R_kQ_k\); when \(A_k\) becomes (nearly) upper triangular, its diagonal entries are the eigenvalues
Multigrid Method
Motivation: vertical percolation ("does the liquid go through?")
\(p^* \approx 0.618\) is a fixed point of the coarsening function \(R(p) = 2p^2(1-p)^2 + 4p^3(1-p)+p^4\), the probability that the coarse pattern percolates.
This value gives "coarsening invariance": coarsening the grid does not change the probability that it percolates.
Finite-difference discretization: \(-u'' = f\); a coarsening step \(E'_i = R(E_i) = E_{i-1} + 2 E_i + E_{i+1}\) reduces the problem to a smaller system that gives the same answer -> "coarsening invariant"
Essential aspect: a problem can contain components on many different scales; multigrid uses several scales of discretization to represent all components of the solution.
V-cycle
The cycle can have multiple levels by calling the V-cycle recursively (see the sketch after this list).
- Pre-smoothing: reduce the high-frequency components of the error \(e=x^*-x\) and residual \(r=b-A_1x\) by smoothing \(x:=S_1(x,b)\)
- Pre-smoothers: relaxed Richardson, relaxed Jacobi, relaxed Gauss-Seidel
- Restriction: the error/residual can now be well approximated on the coarse grid as \(r_0 = Rr\), using the restriction operator \(R: V_1' \rightarrow V_0'\).
- sampling, averaging, (1,2,1)-weighting: fractions of neighboring fine-grid values contribute to each coarse point
- Solve the residual equation \(A_0e=r_0\) on the coarse grid
- Prolongation: interpolate the error e onto the fine grid, using the prolongation operator \(P:U_0 \rightarrow U_1\)
- each coarse node contributes a fraction of its value to the neighboring fine nodes
- Improve the current solution \(x:=x + Pe\)
- Post-smoothing: smooth again: \(x:= S_2(x,b)\)
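A compact 1D Poisson sketch of this recursion, assuming numpy, a relaxed-Jacobi smoother, full-weighting restriction, linear-interpolation prolongation, and grid sizes n = 2^k − 1 so the levels nest; a minimal sketch, not the lecture's exact operators.

```python
import numpy as np

def poisson_1d(n):
    """Discrete 1D operator for -u'' on n interior points, h = 1/(n+1)."""
    h = 1.0 / (n + 1)
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def smooth(A, b, x, sweeps=3, omega=2.0/3.0):
    """Relaxed (weighted) Jacobi: damps the high-frequency error components."""
    D = np.diag(A)
    for _ in range(sweeps):
        x = x + omega * (b - A @ x) / D
    return x

def restrict(r):
    """Full weighting (1/4, 1/2, 1/4) onto the coarse grid."""
    return 0.25 * r[:-2:2] + 0.5 * r[1:-1:2] + 0.25 * r[2::2]

def prolong(e_c, n_fine):
    """Linear interpolation of the coarse correction back onto the fine grid."""
    e = np.zeros(n_fine)
    e[1::2] = e_c                              # coarse points sit at the odd fine points
    e[2:-1:2] = 0.5 * (e_c[:-1] + e_c[1:])     # interior even points: average the neighbors
    e[0], e[-1] = 0.5 * e_c[0], 0.5 * e_c[-1]  # points next to the (zero) boundary
    return e

def v_cycle(n, b, x):
    A = poisson_1d(n)
    if n <= 3:
        return np.linalg.solve(A, b)           # coarsest grid: solve directly
    x = smooth(A, b, x)                        # pre-smoothing
    r_c = restrict(b - A @ x)                  # restrict the residual
    n_c = (n - 1) // 2
    e_c = v_cycle(n_c, r_c, np.zeros(n_c))     # coarse residual equation A_0 e = r_0
    x = x + prolong(e_c, n)                    # prolongate and correct
    return smooth(A, b, x)                     # post-smoothing

n = 63                                         # n = 2^k - 1 so the grids nest
xs = np.linspace(0, 1, n + 2)[1:-1]
b = np.pi**2 * np.sin(np.pi * xs)              # -u'' = f with exact solution sin(pi x)
x = np.zeros(n)
for _ in range(10):
    x = v_cycle(n, b, x)
print(np.max(np.abs(x - np.sin(np.pi * xs))))  # down to the discretization-error level
```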
W-cycle
level sequence : 321010121010123
use the W-cycle when the coarse-grid correction is not accurate enough
Nonlinear Systems and Optimization
Multivariate Taylor expansion (vector-valued function): \(f(x+p) = f(x) + J(x) p + O(||p||^2)\)
Jacobian matrix: \[J(x) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & \cdots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \frac{\partial f_n}{\partial x_2} & \cdots & \frac{\partial f_n}{\partial x_n} \\ \end{bmatrix}\]
Newton’s method for nonlinear systems
- for k = 0, 1, 2, ... until convergence
- solve \(J(x_k) p = - f(x_k)\)
- \(x_{k+1} = x_k + p\)
- evaluating the Jacobian may be non-trivial
- the Jacobian can be singular
- use multiple (random) starting points to find all solutions (see the sketch below)
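A minimal sketch of the loop above, assuming numpy and a hand-coded Jacobian for a small example system (the intersection of a circle and a cubic).

```python
import numpy as np

def newton_system(f, jac, x0, tol=1e-12, max_iter=50):
    """Newton for F(x) = 0: solve J(x_k) p = -F(x_k), then x_{k+1} = x_k + p."""
    x = x0.astype(float).copy()
    for _ in range(max_iter):
        p = np.linalg.solve(jac(x), -f(x))
        x += p
        if np.linalg.norm(p) < tol:
            break
    return x

# Example: intersection of the unit circle with the cubic y = x^3
f = lambda v: np.array([v[0]**2 + v[1]**2 - 1.0, v[1] - v[0]**3])
jac = lambda v: np.array([[2 * v[0], 2 * v[1]], [-3 * v[0]**2, 1.0]])
print(newton_system(f, jac, np.array([1.0, 1.0])))
```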
Unconstrained optimization
For a scalar function \(\phi(x)\) of n variables
gradient \(\nabla \phi(x)\) and Hessian matrix \(\nabla^2 \phi(x)\)
Taylor expansion in several variables(function value is scalar):
\[\phi(x+p) = \phi(x) + \nabla \phi(x)^T p + \frac12 p^T H p + O(||p||^3)\]
Critical points where gradient is zero:
- local minimum (maximum)
- saddle point: a stationary point that is not a local extremum
- necessary condition: the Hessian matrix is positive (negative) semi-definite
- sufficient condition: the Hessian is positive (negative) definite
Gradient based methods
\(x_{k+1} = x_k + \alpha_k p_k\) \(p_k = - B_k^{-1} \nabla \phi(x_k)\)
- Newton’s method: \(B=\nabla^2 \phi(x_k)\)
- converges quadratically, retains Hessian sparsity
- requires hessian and the evaluation of it.
- requires solving a linear system at each iteration
- no global convergence control
- a weak line search is sometimes used to choose a good \(\alpha_k\)
- Secant method (quasi-newton): \(B_{k+1} p_k = y_k, y_k = \nabla \phi (x_{k+1}) - \nabla \phi (x_k)\)
- B is approximation of Hessian
- Gradient descent
- \(B=I\)
- Newton + Gradient descent
- \(B_k = \nabla^2 \phi(x_k) + \mu_k I\)
- BFGS method
- directly compute \(B_k^{-1}\)
- Rank-2 update
- superlinear convergence
- Inexact Newton, Newton CG
- Preconditioned Conjugate Gradient (\(P^{-1} \approx A^{-1}\))
Nonlinear least square
\[\min_x \phi(x) = \frac12 ||g(x) - b||^2\]
\[\nabla \phi(x) = A(x)^T(g(x) - b)\] where \(A(x)\) is the Jacobian of \(g\)
- Gauss-Newton method
- \(\nabla^2 \phi(x) = A(x)^T A(x) + L(x)\)
- \(L_{ij} = \sum_l \frac{\partial^2 g_l}{\partial x_i \partial x_j}(g_l-b_l)\)
- drop L and \(A^TAp_k = - \nabla \phi(x_k)\)
- \(x_{k+1} = x_k + p_k\)
- faster when L is small (small residuals or nearly linear g); see the sketch below
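A sketch of the Gauss-Newton loop, assuming numpy and a hypothetical exponential-fit example \(g(x) = c_0 e^{c_1 t}\); the names t, g, jac are illustrative only.

```python
import numpy as np

def gauss_newton(g, jac, b, x0, iters=20):
    """Gauss-Newton: drop L and solve A^T A p = -A^T (g(x) - b) at each step."""
    x = x0.astype(float).copy()
    for _ in range(iters):
        A = jac(x)
        r = g(x) - b
        p = np.linalg.solve(A.T @ A, -A.T @ r)   # equivalently a least-squares solve of A p = -r
        x += p
    return x

# Hypothetical data-fitting example: y ~ c0 * exp(c1 * t), unknowns x = [c0, c1]
t = np.linspace(0.0, 1.0, 30)
b = 2.0 * np.exp(-1.5 * t) + 0.01 * np.random.default_rng(0).standard_normal(30)
g = lambda x: x[0] * np.exp(x[1] * t)
jac = lambda x: np.column_stack([np.exp(x[1] * t), x[0] * t * np.exp(x[1] * t)])
print(gauss_newton(g, jac, b, np.array([1.0, -1.0])))   # roughly [2.0, -1.5]
```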
Constrained problem
\[\Omega = \{ x \in R^n | c_i(x)=0, i \in \epsilon, c_i(x) \geq 0, i \in I \}\]
- Equality constraint
- reducing space, algebraic, domain \(\Omega\) has empty interior
- inequality constraint
- combinatorial, \(\Omega\) can have non empty interior
- Active set
- \(A(x) = \epsilon \cup \{i \in I \mid c_i(x)=0\}\)
At the optimum \(x^*\), \(\nabla \phi (x^*) \propto \nabla c(x^*)\): the gradients are aligned, and for an inequality constraint the constant of proportionality \(\lambda\) cannot be negative (for a minimization problem; the sign flips for maximization).
- Constraint qualification: let \(A^T_x\) be the matrix whose columns are the gradients of all active constraints.
- It must have full column rank
- Quadratic programming with equality constraint
- \(\min \phi (x) = \frac12 x^THx - d^Tx\) s.t. \(Ax=b\)
- A has full row rank
- Lagrangian: \(L(x,\lambda) = \phi(x) - \sum \lambda_i c_i(x)\)
- KKT condition:
- \(\nabla_x L(x^*,\lambda^*) = 0\)
- \(c_i(x^*) = 0 , i \in \epsilon\)
- \(c_i(x^*) \geq 0\)
- \(\lambda_i \geq 0\)
- \(\lambda_i^* c_i(x^*) = 0\)
- \(\begin{bmatrix} H & A^T \\ A & 0 \end{bmatrix} \begin{bmatrix} x \\ \lambda \end{bmatrix} = \begin{bmatrix} d \\ b \end{bmatrix}\)
- Adding extra terms so that the solution of the new system approximates that of the old one
- Linear programming & its dual
Polynomial interpolation
General approximation and interpolation
- Data fitting: find a reasonable function that fits the data
- sometimes it makes sense to interpolate instead (if there is no noise and we want to match the data exactly)
Approximating function
- Interpolation: there is some freedom to choose the \(x_i\) cleverly (use the most informative data points before sampling)
- may be able to consider the global interpolation error
Uses: prediction & manipulation
- interpolating data
- interpolating a function
- linear combination of basis functions
- polynomial -> polynomial interpolation
- \(A\lambda = y\) A - matrix of basis function, \(\lambda\) coefficients, \(y\) observed value
- constructing and evaluating
- Basis function(polynomial interpolation):
- Monomial basis : gives a Vandermonde matrix (non-singular) simple
- Lagrange basis \(\phi_j = \Pi_{i,i\neq j} \frac{x-x_i}{x_j - x_i}\) most stable
- \(L_{ii} = 1, L_{ij}=0\)
- A is Identity
- Newton Polynomial Basis \(\phi_j = \Pi_{i,i\leq j-1} (x-x_i)\) adaptive
- the interpolant has to pass through all the data points (different from regression)
For n+1 distinct points there is always a unique interpolating polynomial of degree at most n.
Newton's polynomial basis: divided-difference form; each coefficient is a divided difference of the data
\[f[x_i]=f(x_i)\] \[f[x_i,...,x_j]=\frac{f[x_{i+1}, ...,x_j] - f[x_i,...,x_{j-1}]}{x_j -x_i}\] \[f[z_0,...,z_k] = \frac{f^{(k)}(\eta)}{k!}\]
- Existence of an adaptive interpolant
Error estimate (\(\psi_n\) built from the Newton basis; mean value theorem):
\[f(x) - p_n(x) = \frac{f^{(n+1)}(\eta)}{(n+1)!} \psi_n (x)\]
Chebyshev interpolation
Choose the data abscissae so as to minimize the quantity \(\max |\psi|\)
Chebyshev points on [-1,1] : \[x_i = cos(\frac{2i+1}{2(n+1)} \pi), i=0,...,n\]
Interpolation error using Chebyshev points on [-1,1]: \(\max|\psi_n| = 2^{-n}\)
Interpolating also derivative values
Unique osculating polynomial:
\[p^{(k)}_n(t_i) = f^{(k)}(t_i)\]
Hermite cubic
\((t_0,f(t_0)),(t_0,f'(t_0)),(t_0,f''(t_0)),...\): solve for the coefficients (Taylor expansion)
Newton interpolating form
\[p_n(x) = \sum_j f[x_0,...,x_j] \prod_i (x-x_i)\] \[ f[x_k,...,x_j]=\begin{cases} \frac{f[x_{k+1},...,x_j]-f[x_k,...,x_{j-1}]}{x_j-x_k}, & x_k \neq x_j, \\ \frac{f^{(j-k)}(x_k)}{(j-k)!}, & x_k = x_j, \end{cases}\]
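A short sketch of the divided-difference table and the nested evaluation of the Newton form, assuming numpy.

```python
import numpy as np

def divided_differences(x, y):
    """Coefficients f[x_0], f[x_0,x_1], ... of the Newton interpolating form."""
    c = np.array(y, dtype=float)
    n = len(x)
    for j in range(1, n):
        c[j:] = (c[j:] - c[j-1:-1]) / (x[j:] - x[:n-j])
    return c

def newton_eval(c, x, t):
    """Evaluate the Newton form with nested (Horner-like) multiplication."""
    p = c[-1]
    for k in range(len(c) - 2, -1, -1):
        p = p * (t - x[k]) + c[k]
    return p

xi = np.array([0.0, 1.0, 2.0, 3.0])
yi = np.exp(xi)
c = divided_differences(xi, yi)
print(newton_eval(c, xi, 1.5), np.exp(1.5))   # interpolant vs true value
```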
Piecewise polynomial interpolation
- a single polynomial of very high degree is problematic
- newly introduced points cause global recomputation
- parametric curve
- surface interpolation
Divide the interval into subintervals, with an interpolant on each subinterval satisfying the interpolation conditions plus global smoothness conditions; with \(h=\max_i(t_i - t_{i-1})\), a smaller h gives a better interpolant.
The variants differ in how the global smoothness is achieved.
- Piecewise linear
- Piecewise constant
- the error is largest near the middle of each subinterval:
\(\frac{(t_i-t_{i-1})^2}{8} f''(\xi)\)
- Piecewise Cubic
preferred over quadratic because of the smoothness conditions
- Piecewise cubic hermite interpolation
where values of the first derivative are also given
Boundary condition
- Free boundary, natural spline
- clamped boundary, complete spline
- Not-a-knot: third derivative continuous.
Parametric interpolation
Set up an intermediate variable and interpolate on both dimensions
Bezier polynomial
Multidimensional interpolation
\[V= \sum c_i \phi_i\]
bilinear interpolation
radial basis function
Approximation
- approximation error: an integral over the domain
- function norms: \(L_1, L_2, L_\infty\)
- Best least-square approximation(l2-norm)
- \(v=\sum_j c_j \phi_j\)
- \(\sum_j c_j \int \phi_j \phi_k = \int f \phi_k\)
- \(Bx=b\), B is symmetric positive definite
- \(B_{j.k} = \int \phi_j \phi_k\)
- \(b_j = \int f \phi_j\)
- Bad B matrix: the monomial basis gives the Hilbert matrix
- Good B matrix: orthonormal basis functions, \(B_{jk} = 0\) for \(j \neq k\), 1 otherwise
- scale and translate: \(t=\frac12[(b-a)x + (a+b)]\) or \(x = \frac{2t-a-b}{b-a}\)
- Legendre polynomials: Gram-Schmidt when \(w =1\)
- \(\phi_0(x)=1\)
- \(\phi_1(x)=x\)
- \(\phi_{j+1}(x) = \frac{2j+1}{j+1}x\phi_j(x) - \frac{j}{j+1}\phi_{j-1}(x)\)
- diag: \(\frac{2}{2j+1}\)
- zeroes: gauss point
- trigonometric polynomial
- \(\phi_0(x) = \frac1{\sqrt{2\pi}}, \phi_{2i-1}(x)=\frac1{\sqrt{\pi}}\sin(ix),\phi_{2i}(x)=\frac1{\sqrt{\pi}}\cos(ix)\)
- diag: 1 orthonormal
Discrete least-squares algorithm: \(B=A^TA, y=A^Tb\)
- Weighted least-square problem
- Gram-Schmidt process
- \(\phi_0(x)=1\)
- \(\phi_1(x)=x-\beta_1\)
- \(\phi_j(x)=(x-\beta_j)\phi_{j-1}(x) - \gamma_j \phi_{j-2}(x)\)
- \(\beta_j=\frac{\int_a^bxw(x)[\phi_{j-1}(x)]^2dx}{\int_a^bw(x)[\phi_{j-1}(x)]^2dx}\)
- \(\gamma_j=\frac{\int_a^bxw(x)\phi_{j-1}(x)\phi_{j-2}(x)dx}{\int_a^bw(x)[\phi_{j-2}(x)]^2dx}\)
- Chebyshev polynomials: \(w(x)=\frac1{\sqrt{1-x^2}}\) on \((-1,1)\)
- Laguerre polynomials: \(w=e^{-x}\) on \((0,\infty)\)
- Hermite polynomials: \(w=e^{-x^2}\) on \((-\infty,\infty)\)
Chebyshev min-max property
- \(\{x_k\}\): the Chebyshev points
- \(T_n(x_k)=0\) (the Chebyshev points are the zeros of \(T_n\))
- \(t_n= \Pi_k(x-x_k)\)
- this monic polynomial has the smallest maximum magnitude over the interval among all monic polynomials of degree n: \(\max=\frac1{2^{n-1}}\)
Numerical differentiation
Rely on polynomial approximation:
if \(P_n(x) \approx f(x)\), \(P'_n(x) \approx f'(x)\).
Taylor series theorem: approximate \(f(x+h)\) using \(f(x)\), \(f'(x)\), ...
Rearrange the Taylor series to solve for the derivative.
2 points (\(O(h)\)), 3 points (\(O(h^2)\)), 5 points (\(O(h^4)\)): centered and non-centered variants
Richardson extrapolation: generate higher-order methods from lower-order ones by combining results at two step sizes (e.g. \(h\) and \(2h\)) so that the leading error term cancels.
- not compact
- relies on the existence of higher and higher, nicely bounded derivatives of f
- enables easy production of higher-order methods from lower-order ones (see the sketch below)
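A small sketch with numpy: the centered difference has an \(O(h^2)\) error, and one Richardson step combining h and h/2 cancels that term, leaving \(O(h^4)\).

```python
import numpy as np

def centered(f, x, h):
    """Centered difference: error O(h^2)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def richardson(f, x, h):
    """Richardson extrapolation of the centered difference: cancels the h^2 term, error O(h^4)."""
    return (4 * centered(f, x, h / 2) - centered(f, x, h)) / 3

x, h = 1.0, 0.1
exact = np.cos(x)
print(abs(centered(np.sin, x, h) - exact))    # O(h^2) error, roughly 1e-3
print(abs(richardson(np.sin, x, h) - exact))  # O(h^4) error, several orders smaller
```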
Interpolate & differentiation
Numerical integration
Interpolate & integrate: the integral is approximated by the sum of the coefficients times the integrals of the basis functions.
- basic rule
- trapezoidal: can beat Simpson when the integrand is almost periodic (both ends close to zero)
- midpoint
- Simpson(quadratic)
- Newton-Cotes formulas
- work well when the step is small
- Composite methods
- use basic rule on subintervals
- piecewise interpolation & integration
- the error is the sum of the errors on the subintervals
- with r subintervals of width h: \(rh=b-a\)
- Precision: exact for polynomials up to degree p
- Accuracy: leading error term of order \(q=p+1\)
Orthogonal basis function
- Gaussian quadrature formulas
- Gauss points:
- the zeros of the Legendre polynomial \(\phi_{n+1}(x)\)
- n+1 points achieve precision 2n+1
- Weights: \(a_j = \frac{2(1-x_j^2)}{[(n+1)\phi_n(x_j)]^2}\)
- error: \(\frac{2^{2n+3}((n+1)!)^4}{(2n+3)((2n+2)!)^3}f^{(2n+2)}(\eta)\)
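A short check with numpy's built-in Gauss-Legendre nodes and weights (np.polynomial.legendre.leggauss), including the interval map from the scale-and-translate formula used earlier.

```python
import numpy as np

# Gauss-Legendre on [-1, 1]: n+1 points integrate polynomials up to degree 2n+1 exactly.
nodes, weights = np.polynomial.legendre.leggauss(5)   # 5 Gauss points
f = lambda x: np.exp(x)
print(weights @ f(nodes), np.exp(1) - np.exp(-1))     # already accurate to ~1e-9

# General interval [a, b]: map t = ((b - a)x + (a + b))/2 and scale the weights by (b - a)/2.
a, b = 0.0, 2.0
t = 0.5 * ((b - a) * nodes + (a + b))
print(0.5 * (b - a) * (weights @ f(t)), np.exp(2) - 1)
```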
- Adaptive quadrature
- Adaptive subdivision: an adaptive process, locally refined
- if the derivative does not vary too violently,
- the error can be approximated by the difference between two estimates
[1] This is not an equivalence: it does not show that \(||B||\) is bounded by \(\rho(B)\).