Advanced Algorithm

Randomized Algorithms
Approximation Algorithms
Flows
- Ford-Fulkerson algorithm
- Examples
Linear Programming
Moving towards larger dataset
Game theory
Algorithmic fairness
Interactive proofs
Cake cutting algorithm

Randomized Algorithms

Hash

Universality property

A random hash function from U to m that satisfy \[\forall x,y \in U, P\{h(x) == h(y)\} = \frac{1}{m}\]

Hash Choice

$h_A(x) = (ax \mod p) \mod m, p > |U|, a \in \{1, 2, ... , p-1\}$

power of two choices: $\log \log n$ n balls: $n^2$ bins to prevent collisions

bloom filters

depends on requirements : is $x \in S ? \rightarrow \mbox{yes} even if x \notin S$

Randomized Algorithms’ properties

las vegas Algorithms
- always correct
- resource usage randomized
monte carlo Algorithms
- sometimes correct
- resource usage is fixed
- amplification run it n times: the chance of failure to drop exponentially(can be arbitartily small)

LV algorithm that runs in expected time T(n), run for 3T(n) steps. converts to monte carlo

Markov inequality: $P(x>3\mu) \leq \frac{1}{3}$

Min cut

cut $(S,V-S)$, size of cut is the number of edges between $S,V-S$.

contract edge : take an edge and collapse it and two endpoints to a single node

edge contaction preserves the min cut in a graph

min cut k, $|E| \geq \frac{kn}{2}$

randomly pick edges and accumulate:
prob = $1/n^2$
T times prob failure = $e^{-T/n^2}$

Freivald’s methods

verify AB=C in $n^2$ time

one way answer
take $r \in (0,1)^n$
compare $ABr | Cr$
probability is bounded by some number

Schwartz-Zipped-DeMillo-Lipton lemma

prime field $F_p = arithmetic \mod p$ for a polynominal P, it only have at most P root.

two bits string, $a_1,...,a_n$ and $b_1,...,b_n$
$A = \sum_{x=1}^{x=n}a_x x^n$
pick a field F, $|F| = n^2 $
$r \in F$, $A(r) | B(r)$
$Pr(A(r) == B(r)| A \neq B)\leq \frac{n}{n^2} = \frac{1}{n} = \frac{d}{|S|}$

Document Similarity

Jacob similarity = $\frac{|A\cap B|}{A \cup B}$

hash of doc: \[h(w) \rightarrow [0,1]\] \[h(doc) = \min\{h(w),w \in doc\}\] \[Pr(h(A) = h(B)) = Jacob similarity\]

Probabilistic Behavior

resource usage
correctness
tail distribution
- Markov inequality:
  - $E[X] = \mu,X \in R^+, \forall a>0, Pr(X\geq a\mu) \leq \frac{1}{a}$
- Chebyshev inequality:
  - $Pr(|X-\mu| \geq a \sigma) \leq \frac{1}{a^2}$
- Chernoff bound:
  - $Pr(x>(1+\delta) \mu) \leq c_\sigma^\mu$
  - $c_\sigma = [\frac{c^\sigma}{(1+\sigma)^{1+\sigma}}]$
  - Simpler form: $Pr(|X-\mu| \leq \sigma \mu) \leq 2 e^{-\frac{\mu\delta^2}{3}}$

Approximation Algorithms

Give an answer $s\in S$, have some function $V(s)$, give the value of the solution. Objective: $\min/ \max\{V(s)\}$

PAC Learning¹

Relative/absolute error comparing to best $V^*(S)$
Guarantee of running time & quality. comparing to herustics
Algorithm that will always give an answer with less than certain error
Don’t know the right answer
Max-cut : half approximation, using randomly spliting the vertices set

Finding the median in constant time

find median O(n)
approximate median: value / rank
Take one random element: $E[\Delta Rank]= n/4$
take three random element: $5/32$

Constant storage majority element

take two, same keep count
+/- count
take k bins to find the frequency of n/k

Trade-off

Deterministic / randomized
exact / approximate

Vertex cover

Degree strategy : $ s/opt log N$. opt and each progress

Set of edges chosen forms a matching -> at least k $s/opt \leq 2$

lower bound
$A \leq c f(G)$
$opt \geq f(G)$
$\frac{A}{opt} \geq f(g) \geq \frac{A}{C}$
$\frac{A}{opt} \leq C$

Set cover

Harder, NP hard, for less than $logN$ approx

NP hard Rank

independent set
set cover
vertex cover
arbitrary good

Travelling salesperson

If there’s an algorithm that approximates TSP to g(n) factor (relative apporximation) then P=NP

Hamilton cycle, path visiting each vector exactly once(gap method)
- add missing edges to have large weight, k
- TSP -> check total weight
- arbitrary precision

Approximation algor(triangular inequality)

compute MST
preorder traversal, with poping
delete multiple instances with direct distance
$OPT \geq MST$ - lower bound
cost less than 2MST(because we duplicated all edges to make it eularian)
approximation is within 2 times of the optimal

Reduce the 2 to 1.5: * observe the number of odd-degreed nodes are even * will pairing them up to make it eularian cut down the number of added edges? * convert to minimum cost matching * $C(M^+) \leq C(M in opt) \leq \frac{OPT}{2}$ * C(tour) within 1.5

Polynomial time approximation scheme(PTAS)

Want $1+\epsilon, \epsilon \geq 0$ approximation. Runing time poly(n), but arbitrary function of $\frac1\epsilon$, such as $n^2 2^{1\epsilon}$

Knapsack

n objects each with a value $v_i$ and weight $w_i$, Knapsack carries weight w, find a subset of the item with the maximum value and total weight $\leq w$

Dynamic programming solution, O(nw).

#include<iostream>

using namespace std;

int main()
{
    int n,m;
    int a[1000],w,v;
    cin >> n;
    cin >> m;
    for (int i = 0; i< n;++i)
    {
        cin >> w >> v;
        for (int j=m;j>=w;--j)
        {
            if (a[j]< a[j-w]+v)
                a[j] = a[j-w] + v;
        }
    }
    cout<< a[m];
}

Approximation algor

exists an algorithm for knapsack running in time $O(n^2 v_{max})$

For number b, replace $v_i \mbox{ by } \tilde{v_i} = \lceil \frac{v_i}{b} \rceil,\hat{v_i} = \frac{v_i}{b}$

$\tilde{OPT} = b \hat{OPT}$

Run $O(n^2 v_{max})$ on $\tilde{v_i}$, multiply the result by b.

Set $\frac{\epsilon}{2n}v_{max}, \tilde{v}_{max} \leq \frac{2n}{\epsilon} +1$, running time is $O(\frac{n^3}{\epsilon})$, $\sum_{S^*} v_i \leq \sum_{S^*} \tilde{v}_i \leq \sum_S \tilde{v}_i \leq \sum_S v_i+b$ with S being our solution $S^*$ being the optimal set of items. $OPT \leq nb + \sum v_i$

Let $b=\frac{\epsilon v^*}{n} \leq \frac{\epsilon}{n}A \rightarrow nb+a \leq A+\epsilon A$.

Dynamic programming using weight, using the weight table.

Thought process

Connect the problem with a lower bound, and devise and algorithm to connect the original problem to the lower bound. Or reduce the original problem input and devise a problem for the reduced input.

Flows

Directed graph
Source(s) and sink(t) vertices

Properties of flow

$f(e) \geq 0$
conservation constraints: for any $v \neq s,t, \; \sum(in) = \sum(out)$
Value of flow : $s: \sum(out)- \sum(in)$. same for the sink

Maximum low -> min cut(s-t cut)

partition of V into A,B with $s \in A, t \in B$
Capacity of edges $C:E \rightarrow R^+$
Capacity of cut: $||A,B|| = \sum_{v \in A, u \in B} C(v\rightarrow u)$
$\forall e \in E, F(e)\leq C(e)$

Let f be any flow, (A,B) be any st-cut: $v(f) \leq ||A,B||$

The maxflow-mincut theorem: The value of the maximum flow is equal to the minimum st-cut

special case : suppose $\exists f \& (A,B) s.t \; v(f) = ||A,B|| \; v(f) \mbox{ is maximum }, ||A,B|| \mbox{ is minimum}$

Ford-Fulkerson algorithm

As long as there is an open path(augmenting path), add the minimum flow residual of that path to the cut(augment). An edge is split as forward edge $c-f^*$ with backward edge $f^*$.

How to construct an S-T cut:

A = all vertices reachable from s
B = V - A
Claim $||A,B|| = v(f^*) = \sum f^*(A \rightarrow B) - \sum f^*(B \rightarrow A)$
by definition, if not, will make the vertices reachable from A.
Every augementing path has capacity at least 1,
$O(mC*)$

Problem:

it may take infinite time for $v(f*) = 2x+1$ where x is irrational
converges to wrong answer

Solutions: by changing how to choose the augmenting path

FAT pipes: the path that maximizes flow
- for every augmenting path: $v \geq \frac{f^*}{m}$
- $O(m^2 \log m \ln f^*)$
SHORT pipes:
- if every edge flips at most $\frac{|v|}2$ times, running time $O(m^2n)$
- the level of a vertex: shortest path from s
- $l_j(u) \geq l_i(u) + 2$ for each time it flips

Examples

Edge disjoint path: given a graph and (s,t) find the total number of edge disjoint path…same as flow with edge capacity 1

Vertex disjoint path: given a graph and (s,t) find the toal number of edge disjoint path…split the vertex as in & out vertex

Bipartite graph matching: create (s,t) with only 1 capacity to vetices in A,B

Project Management: A set of jobs J, and dependencies between them. Each job have a profits.

C = sum of all the positive profits
\[\begin{cases} C(i \rightarrow j) &= \inf \\ C(s \rightarrow v) &= P_v , + \\ C(v \rightarrow t) &= -P_v , - \\ C(a \rightarrow b) &= \inf , a \rightarrow b \end{cases}\]
C - MINCUT = result

Linear Programming

Variables : $x_1,x_2,\dots, x_n$
A cost function to maximize or minimize: $c_1x_1 + c_2x_2 + \dots + c_nx_n$
- max $\overrightarrow{c} \cdot \overrightarrow{x}$
Constraints : $a_{i1}x_1 + a_{i2}x_2 + \dots + a_{in}x_n \leq b_i$
- $A \cdot \overrightarrow{x} \leq \overrightarrow{b}$
- $x\geq 0$
Above are the Canonical form, can transform a problem to this form

Solving Linear Programming: $\overrightarrow{c}$ can be interpreted as direction that the equi-solution line moving perpendicular to that of the equi-solution line. After rotation, -> finding the lowest point

Examples

Shortest paths:

max $d_t$
$d_s = 0$
$\forall u,v \in E d_u - d_v \leq w_{u,v}$
min $\sum_{u,v \in E} w_{uv} x_{uv}$
$\sum_{s,w \in E} x_{sw} - \sum_{s,v \in E} x_{vs} = 1$
$\sum_{t,w \in E} x_{tw} - \sum_{t,v \in E} x_{vt} = -1$
$\sum_{u,w \in E} x_{uw} - \sum_{u,v \in E} x_{vu} = 0$

Max Flow:

similar to the above
except the s,t constraints

Min Cut:

$\min \sum y$
$y_{uv} + d_u + d_v \geq 0$
$y_{sv} - d_v \geq 0$
$y_{ut} + d_u \geq 0$

Duality

Primal LP

max $cx$
$Ax \leq b$
$x\geq 0$

Dual LP

min $yb$
$y^TA \geq c^T$
$y\geq 0$
primal variables -> dual constraints
primal constraints -> dual variables
combining bounds to get the tightest
unbounded -> equality
bounded to other sign -> relationhip to another sign
weak duality : $\forall \mbox{ feasible } x,y: cx \leq yb$
strong duality : $c x^* = y^* b$
1. check $Ax \leq b$
2. ask for feasible y for dual LP
3. Check $y^TA \geq c$
4. $cx=yb$

Primal-dual methods:

start with x feasible & infeasible y
use x produce a y that is less in feasible
backforth

Solve problem don’t know how to solve

Vertex cover: We have G(V,E). Find $S \subseteq V$. That covers all edges s.t. $|S|$ is minimum.

Integer programming problem, originally. NP-hard by transitivity
relaxation by reducing constraints
LP-rounding, 2-approximation
- randomized rounding: for other problem

MAX 3-SAT: 3 terms disjuction, in conjunction clause. Find assignments of variables that maximize the number of clauses satisfied in the conjuction

Another LP relaxation
Binary search
- zombie binary search
- treat weight as posibility
Linear programming
- set cover
MAXCUT: given $G(V,E), w: E \rightarrow R^+$
Find partition $V= A \cup B$ s.t. $\sum_{e \in E \cap A \times B} w_e$ is maximaized
$|x_i+x_j = 1| \rightarrow v_{ij} \leq x_i+x_j; v_{ij} \leq 2- (x_i+x_j)$
Regression when trying to relax -> randomized algorithm(relax by random toss)
integrality gap of 1/2
Nonlinear version, best to date(semi-definite program)
$y_i = \pm 1,\max \sum 1/2 - 1/2 y_i y_j$
$||y_i||^2 = 1, X_{ii} = 1$
picking any plane at random:
- $\cos^{-1} (<x,y>)/\pi \geq 0.878 S^*$
$max CX$
$A_kX \leq b_k$
$X\geq 0$
Semi-definite program can also be solved in polynomial time -> LP is a special case

Moving towards larger dataset

Compress: Original dataset P, have $S\subseteq P$ s.t. $|S| << |P|$ and $f(P) \approx f(S)$. Compute f takes $g(|P|)$ time $|P|+g(|S|)$ time

Furthest point

Find furthest neighbor to any point p, let the distance be r
hash point into grid G with cell size $c \epsilon r$
Find diameter pair in G

$O(n+\frac1{\epsilon^4}$ vs $n^2$

When $c= \frac1{\sqrt2}$, $\tilde{\Delta} \geq \Delta (1-\epsilon)$

Fixed Parameter Tractable

problem in this set might be NP-hard

The curve of dimensionality

Time / space / resources to solve a problem grows exponentially with dimension

Convex Hall: the smallest convex set that contains all the points, Points + boundaries: nonlinear increase with dimensionality

Claim: “interesting structures grows exponentially with d”

Dimensionality reduction

Principal Component Analysis
- $\max_u u^T C u, C = \sum x_i x_i^T$

Center the data
Compute $C =\sum x_i X_i^T$
Compute largest k eigenvalues: $R^d \rightarrow R^k$

Johnson - lindenstrauss lemma

\[\exists A: P\rightarrow R^d : ||P_i - P_j|| \leq (1+\epsilon) ||P_i - P_j||\] \[d = O(\frac{\log n}{\epsilon^2})\] \[||x||=1 , ||Ax|| \leq 1 + \epsilon, ||Ax|| \geq 1\]

What does the transformation looks like

A random d-dimensional subspace $A = \{a_{ij}\}, a_{ij} \in N(0,1)$, or sub-gaussian distribution even with discrete values

Game theory

Theory of games and economic behavior
Voting system
economics
biology
CS
tragedy of the commons
dominant equilibrium, every are happy to change, vickrey second-price auction
Nash equilibrium, no incentive to move assuming no one else moves.²
- every D.E is N.E
- every finite game has a nash equilibrium with “mixed” strategy
Machanism design
- voting, personal preference -> global ranking
- unanimity
- no dictator
- no strategic voting
- pref represented are rank ordering
- THM (ARROW): can’t do it
- THM (BTT): defeating the scheme is NP-hard
- vickery-clark-grove mechanism: second-price auction
bidding for ads
How long will it reach equilibrium
not NP-hard because all games have nash equilibrium
PPAD
The price of anarchy
Braess’s paradox

Algorithmic fairness

accountability & transparancy

What is fairness
- disparate impact
  - treatment
  - askewd outcome: ratio
- No post changing of rules
- group fairness
Data: Diversity, Distribution merging
Algorithm
Output: look at criteria carefully, interpretable machine learning
- class condition accuracy

Interactive proofs

back and forth between prover and verifier: if prover is all power, it is the same

\[NP=dIP, P \subseteq NP \subseteq IP = PSPACE \subseteq EXP\]

if verifier if randomized

Completeness : if x is in L, $1 \geq P(acc) \geq \frac23$
Soundness : if x not in L, $0 < P(acc) \leq \frac13$
zero knowledge
PSPACE : poly space computation, rewrite problem as a polynomial and verify polynomial
oncloud:
Client(V) sends data to Server(P)
P gives answer, V is streaming algorithm, how to verify?

Verifier protocals

give a stream return a poly verify at only one point
Nearest N
linear classifier
clustering
matrix
MST
connected component
TSP(approximate)
…

PCP - probablilistic checkable proves

\[MIP = NEXP\]

Cake cutting algorithm

How to divide a resource that can convince someone that it is fair(get at least 1/n)

I cut, You choose(disagreement are good, everyone might be convinced that it have more than 1/2.)

Moving knife(the number of decisions are infinite)

Cut, trim, get the piece that trimmed

Recursively divide cake between n-1 players, take on slice from each of the 1/n of the n-1 slice.

Envy-free cake cutting:

\[\forall i,j \mu_i(x_i) \geq \mu_i(x_j)\] \[\mu_i(x_i) \geq 1/n\]

A cuts, B pick two largest, make one same as the smallest. C pick one. Trimming is divided by B, C pick, B pick, A pick.

Recursive cut

Probably approximately correct learning ↩
google↩