Person Reidentification Using Spatiotemporal Appearance
Introduction
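- biometric approaches [16] [22]
- assumption: people do not change clothes between sightings
- invariant signatures:
- color histograms
- clothing color
- brightness transfer functions (compare colors after calibrating between cameras)
- local signatures: capture spatial information, tolerate some variation
- interest-point operators
- model fitting
- two aspects of the problem:
- establishing correspondences
- generating signatures and matching them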
Overview
Two approaches: interest-point matching and model fitting.
Interest points cope with varying appearance: a large number of responses increases the likelihood of finding true correspondences.
Model-based: a decomposable triangulated graph models the body shape
- localizes the different body parts, which facilitates part-by-part matching
Information used: color (histogram of hue and saturation, with normalization) and structure (spatiotemporal segmentation and description)
Signature Generation
Two parts:
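- histogram of hue and saturation, using a hue that is invariant to brightness scaling and gamma: \[H=\arctan \frac{\log (R) -\log (G)}{\log (R) +\log (G) -2\log (B)}\]
- traditional (HSI) hue, for comparison: \[H=\arccos \frac{0.5[(R-G)+(R-B)]}{\sqrt{(R-G)^2+(R-B)(G-B)}}\] with \(H \to 2\pi - H\) when \(B>G\)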
- structural qualities
- foreground interior edgels
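- distance between histograms: \(D(h_i,h_j)=1-2\frac{\sum^n_{k=1}\min(h_i(k),h_j(k))}{\sum^n_{k=1}\left(h_i(k)+h_j(k)\right)}\), which is 0 for identical histograms and 1 for histograms with no overlap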
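A minimal sketch of the color signature, assuming float RGB images with strictly positive values in a NumPy array. The HSI-style saturation, the 16x8 bin layout and the function names `log_hue`, `saturation`, `hs_histogram` and `hist_distance` are illustrative choices, not taken from the paper.

```python
import numpy as np


def log_hue(rgb, eps=1e-6):
    """Hue from log-channel differences; invariant to brightness scaling and gamma.

    rgb: float array of shape (..., 3) with strictly positive channel values.
    """
    logs = np.log(np.clip(rgb, eps, None))
    r, g, b = logs[..., 0], logs[..., 1], logs[..., 2]
    num = r - g
    den = r + g - 2.0 * b
    # Guard against division by zero while keeping the sign of the denominator.
    den = np.where(np.abs(den) < eps, np.copysign(eps, den), den)
    return np.arctan(num / den)  # values in (-pi/2, pi/2)


def saturation(rgb):
    """HSI-style saturation 1 - 3*min(R,G,B)/(R+G+B) (an assumed choice)."""
    total = np.maximum(rgb.sum(axis=-1), 1e-12)
    return 1.0 - 3.0 * rgb.min(axis=-1) / total


def hs_histogram(rgb, mask=None, bins=(16, 8)):
    """Normalized 2-D hue/saturation histogram, flattened to a vector."""
    h, s = log_hue(rgb), saturation(rgb)
    if mask is not None:  # e.g. restrict to foreground pixels
        h, s = h[mask], s[mask]
    hist, _, _ = np.histogram2d(h.ravel(), s.ravel(), bins=list(bins),
                                range=[[-np.pi / 2, np.pi / 2], [0.0, 1.0]])
    return hist.ravel() / max(hist.sum(), 1.0)


def hist_distance(hi, hj):
    """D(h_i, h_j) = 1 - 2 * sum_k min(h_i(k), h_j(k)) / sum_k (h_i(k) + h_j(k))."""
    return 1.0 - 2.0 * np.minimum(hi, hj).sum() / (hi + hj).sum()
```

Multiplying every channel by a constant adds the same term to each log, and a gamma change multiplies each log by the same factor; both cancel in the ratio, which is why `log_hue` returns the same values for `img`, `2 * img` and `img ** 2.2`.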
Spatiotemporal Segmentation
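- goal: group pixels that belong to the same piece of fabric
- oversegment, then build a graph \(G=(V,E)\) in which each vertex represents a contiguous region
- apply the Sobel operator, restrict to the foreground, then smooth with a Gaussian filter
- watershed segmentation algorithm [21]
- edges are spatial (between adjacent watershed regions in the same frame) or temporal (linking a region to the same region at time t+1):
- temporal correspondence comes from an estimate of the motion field: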
- \(F_{N,t}(x,y) = \sum^N_{k=0}H(I_t(x,y)-I_{t+k}(x,y))\)
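- \(I_t\) is the image intensity at time \(t\); \(H\) counts near-equal intensities:
- \(H(z) = \begin{cases} 1, & |z|<\delta \\ 0, & \mbox{otherwise} \end{cases}\)
- a region at time t is linked to the overlapping region at t+1 with the highest integral of \(F_{N,t}\)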
- edge weight: the cost of grouping the two regions together
- \(w^{t,t}_{i,i'}=|M(i,t) - M(i',t)|\)
- \(w^{t,t+1}_{i,i'}=\frac{1}{3}|M(i,t) - M(i',t+1)|\)
- \(M(i,t)\) is the median intensity of region i at time t
- partitioning searches for a minimum spanning tree
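The frequency image and the two edge weights can be sketched with NumPy as below. Frames are assumed to be a grayscale stack of shape (T, H, W) and regions integer label maps (for example from the watershed step). Choosing the temporal partner by the largest sum of F over the overlap is one reading of the "highest integral" rule, and all function names are hypothetical.

```python
import numpy as np


def frequency_image(frames, t, N, delta):
    """F_{N,t}(x,y) = sum_{k=0}^{N} H(I_t(x,y) - I_{t+k}(x,y)), H(z) = 1 iff |z| < delta.

    frames: grayscale stack of shape (T, H, W); requires t + N < T.
    """
    ref = frames[t].astype(float)
    F = np.zeros_like(ref)
    for k in range(N + 1):
        F += np.abs(ref - frames[t + k]) < delta  # bool is added as 0/1
    return F


def median_intensity(frame, labels, region):
    """M(i, t): median intensity of one region of a label map."""
    return float(np.median(frame[labels == region]))


def spatial_weight(frame, labels, i, j):
    """w^{t,t}_{i,j} = |M(i,t) - M(j,t)|."""
    return abs(median_intensity(frame, labels, i) - median_intensity(frame, labels, j))


def temporal_weight(frame_t, labels_t, i, frame_t1, labels_t1, j):
    """w^{t,t+1}_{i,j} = |M(i,t) - M(j,t+1)| / 3."""
    return abs(median_intensity(frame_t, labels_t, i)
               - median_intensity(frame_t1, labels_t1, j)) / 3.0


def temporal_partner(labels_t, i, labels_t1, F):
    """Region at t+1 overlapping region i (at t) with the largest sum of F over the overlap
    (assumed reading of the 'highest frequency integral' rule)."""
    overlap = labels_t == i
    best, best_score = None, -1.0
    for j in np.unique(labels_t1[overlap]):
        score = F[overlap & (labels_t1 == j)].sum()
        if score > best_score:
            best, best_score = int(j), float(score)
    return best
```

Pixels whose intensity stays within `delta` across the window accumulate high counts, so F is high where the scene is stable and low where something moved. The factor 1/3 in `temporal_weight` makes temporal links cheaper than spatial ones of the same intensity difference.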
Graph Partitioning
- applied over ten consecutive frames
- merge two clusters when the distance between them does not exceed their (size-adjusted) internal variation
- internal variation: \(I(C) = \max\{w^{t,t'}_{i,i'} : e^{t,t'}_{i,i'}\in E^C\}\)
- inter-cluster distance: \(D(C_m,C_n)=\min\{w^{t,t'}_{i,i'} : v^t_i \in V^{C_m},\ v^{t'}_{i'} \in V^{C_n},\ e^{t,t'}_{i,i'}\in E\}\)
- merging: \(D(C_m,C_n)\leq MI(C_m,C_n)\), where \(MI(C_m,C_n)=\min\left(I(C_m)+\kappa / |C_m|,\ I(C_n)+\kappa / |C_n|\right)\)
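This merge rule has the same form as Felzenszwalb and Huttenlocher's graph-based segmentation, which visits edges in increasing weight order and tracks clusters with union-find. A sketch under that assumption, where edges are hypothetical `(weight, u, v)` tuples over vertex indices:

```python
def segment_graph(num_vertices, edges, kappa):
    """Kruskal-style merging with the MI(C_m, C_n) criterion; returns a root label per vertex."""
    parent = list(range(num_vertices))
    size = [1] * num_vertices
    internal = [0.0] * num_vertices  # I(C): largest spanning-tree edge inside C

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for w, u, v in sorted(edges):
        a, b = find(u), find(v)
        if a == b:
            continue
        # Edges arrive in increasing order, so the first edge seen between two
        # clusters is their minimum-weight edge, i.e. D(C_m, C_n).
        mi = min(internal[a] + kappa / size[a], internal[b] + kappa / size[b])
        if w <= mi:
            if size[a] < size[b]:
                a, b = b, a
            parent[b] = a
            size[a] += size[b]
            internal[a] = w  # w is at least every earlier edge, so it is the new maximum
    return [find(x) for x in range(num_vertices)]
```

The term kappa/|C| lets small clusters merge across larger differences; a larger `kappa` therefore yields fewer, bigger clusters.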
Foreground-Background Separation
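- compute the maximum frequency image \(MF_{N,t}(i,j)=\max(F_{N,t}(i,j),F_{-N,t}(i,j))\), where \(F_{-N,t}\) sums over the N preceding frames instead of the N following ones
- threshold \(MF_{N,t}\) and clean the result with morphological filtering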
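A sketch of the foreground step, assuming a grayscale stack with at least N frames on both sides of t, and assuming that static background pixels keep their intensity (high MF) while moving people do not (low MF). The 3x3 opening stands in for the unspecified morphological filtering; `thresh` and the helper names are hypothetical.

```python
import numpy as np


def frequency(frames, t, N, delta):
    """F_{N,t}; for negative N the sum runs over the |N| preceding frames."""
    step = 1 if N >= 0 else -1
    ref = frames[t].astype(float)
    F = np.zeros_like(ref)
    for k in range(abs(N) + 1):
        F += np.abs(ref - frames[t + step * k]) < delta
    return F


def _filter3x3(mask, reduce):
    """3x3 binary erosion (reduce=np.min) or dilation (reduce=np.max)."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return reduce(np.stack(windows), axis=0)


def foreground_mask(frames, t, N, delta, thresh):
    """Requires N <= t < len(frames) - N."""
    mf = np.maximum(frequency(frames, t, N, delta), frequency(frames, t, -N, delta))
    # Assumption: stable background repeats its intensity (high MF); moving people do not.
    fg = mf < thresh
    # Morphological opening (erosion then dilation) removes isolated noisy pixels.
    return _filter3x3(_filter3x3(fg, np.min), np.max)
```

Taking the maximum of the forward and backward counts keeps a background pixel stable even when a person covers it only before or only after time t.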
Interest-Point Matching
- Hessian-affine invariant interest-point operator
- compare interest points with a distance measure and keep matches below a threshold
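The detector itself is not reproduced here. Assuming each detected region has already been turned into a non-negative descriptor (for instance a hue-saturation histogram of the normalized patch, a hypothetical choice), nearest-neighbour matching under the histogram distance D with a threshold might look like this; the number of accepted matches can serve as a similarity score between two people.

```python
import numpy as np


def hist_distance(hi, hj):
    """Histogram distance D from the signature section."""
    return 1.0 - 2.0 * np.minimum(hi, hj).sum() / (hi + hj).sum()


def match_interest_points(desc_a, desc_b, thresh):
    """Nearest-neighbour matching with a distance threshold.

    desc_a: (m, d) and desc_b: (n, d) non-negative descriptors, one row per interest point.
    Returns (i, j, distance) for each point of A whose best match in B is closer than thresh.
    """
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    matches = []
    for i, da in enumerate(desc_a):
        dists = np.array([hist_distance(da, db) for db in desc_b])
        j = int(np.argmin(dists))
        if dists[j] < thresh:
            matches.append((i, j, float(dists[j])))
    return matches
```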
Dynamic-Programming Model Fitting
Model-based matching compares corresponding body parts:
- alleviates the problem of parts changing their relative locations
- the fitted model is used to segment the body parts
Energy function (to be minimized):
\[E(g,I) = \sum_i E_i(g_i,I) = \sum_i \left(E^{data}_i(g_i,I) + E^{shape}_i(g_i)\right)\]
- the shape cost of each part is defined by the polar decomposition of its affine transformation
- polar decomposition of the affine matrix \(A\):
- \(A=\begin{bmatrix} \cos \psi & -\sin\psi \\ \sin \psi & \cos \psi \end{bmatrix} \begin{bmatrix} s_x & s_h \\ s_h & s_y \end{bmatrix} = R(\psi) S\)
- \(S\) is the symmetric scale-shear matrix; \(R(\psi)\) is the rotation matrix closest to \(A\) in the Frobenius norm
- \(E^{shape} = \left(\log\frac{\lambda_1}{\lambda_2}\right)^2 + \left(\log(1+ s_h)\right)^2\)
- \(\lambda_1,\lambda_2\) are the eigenvalues of \(S\)
- the first term penalizes changes in the height-to-width ratio, the second penalizes shear
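- the data cost attracts the model to salient image features:
- evaluated only along boundary edges
- less sensitive to spurious edges than to missed ones
- edges from Canny edge detection; \(E^{edge} = \frac{1}{n} \sum^{n}_{i=1}D(x_i,y_i)\), where \((x_i,y_i)\) are n points on the model boundary and \(D\) is the distance to the nearest detected edge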
- Foreground mask:
- \(E^{fg} = 1 - |\frac{N^{fg}_1}{N_1} - \frac{N^{fg}_2}{N_2}|\)
- \(N_1, N_2\): total number of pixels on each side of the boundary within a window; \(N^{fg}_1, N^{fg}_2\): number of foreground pixels on each side
- minimized with dynamic programming over the decomposable triangulated graph (a table-filling DP, cf. the 0-1 knapsack)
- the histograms of the fitted parts are then computed and compared using the histogram distance
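The three cost terms can be sketched as follows. The polar decomposition is computed from the SVD: \(A = U\Sigma V^T\) gives \(R = UV^T\) and \(S = V\Sigma V^T\). The edge term assumes a binary edge map from any Canny implementation and computes D by brute force; the foreground term assumes a square window split by a given boundary normal. The window shape, the (row, col) convention and all function names are assumptions, and the dynamic-programming search itself is not shown.

```python
import numpy as np


def polar_decompose(A):
    """A = R S with R the closest rotation (Frobenius norm) and S symmetric.

    Computed from the SVD A = U diag(sig) V^T: R = U V^T, S = V diag(sig) V^T.
    Only orientation-preserving maps (det A > 0) are handled in this sketch.
    """
    if np.linalg.det(A) <= 0:
        raise ValueError("expected det(A) > 0")
    U, sig, Vt = np.linalg.svd(A)
    return U @ Vt, Vt.T @ np.diag(sig) @ Vt


def shape_cost(A):
    """E^shape = log(lambda1/lambda2)^2 + log(1 + s_h)^2; assumes 1 + s_h > 0."""
    _, S = polar_decompose(A)
    lam = np.linalg.eigvalsh(S)  # ascending eigenvalues of S
    s_h = S[0, 1]
    return float(np.log(lam[1] / lam[0]) ** 2 + np.log(1.0 + s_h) ** 2)


def edge_cost(edge_map, points):
    """E^edge: mean distance from n model-boundary points to the nearest edge pixel.

    edge_map: binary edge image (e.g. from a Canny detector); points: (n, 2) as (row, col).
    Brute-force distances; a distance transform would be used on real images.
    """
    edges = np.argwhere(edge_map).astype(float)
    pts = np.asarray(points, dtype=float)
    d = np.sqrt(((pts[:, None, :] - edges[None, :, :]) ** 2).sum(axis=-1))
    return float(d.min(axis=1).mean())


def foreground_cost(fg_mask, point, normal, half=3):
    """E^fg = 1 - |N1_fg/N1 - N2_fg/N2| in a square window split by the boundary normal."""
    r0, c0 = point
    rows, cols = np.mgrid[r0 - half:r0 + half + 1, c0 - half:c0 + half + 1]
    inside = ((rows >= 0) & (rows < fg_mask.shape[0])
              & (cols >= 0) & (cols < fg_mask.shape[1]))
    side = (rows - r0) * normal[0] + (cols - c0) * normal[1]
    fg = np.zeros(rows.shape, dtype=bool)
    fg[inside] = fg_mask[rows[inside], cols[inside]]
    s1, s2 = inside & (side > 0), inside & (side < 0)
    f1 = fg[s1].sum() / max(s1.sum(), 1)  # N1_fg / N1
    f2 = fg[s2].sum() / max(s2.sum(), 1)  # N2_fg / N2
    return float(1.0 - abs(f1 - f2))
```

Because `edge_cost` measures distance from the model to the edges, an extra (spurious) edge can only lower a point's distance, while a missing edge raises it; this is the asymmetry noted above. `foreground_cost` is 0 when one side of the window is entirely foreground and the other entirely background.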