Person Reidentification Using Spatiotemporal Appearance

Introduction
- Overview
Signature Generation
Spatiotemporal Segmentation
- Graph Partitioning
- Foreground-Background Separation
Interest-Point Matching
Dynamic-Programming Model Fitting

Introduction

biometric [16] [22]
assumption: no change of clothing
invariant signature:
- color histogram
- clothing color
- brightness transfer (compare after calibration)
local signature: spatial, some variation
- interest-point operator
- model fitting
two aspects
- correspondences
- generate signature & match

Overview

two approaches: interest point+model fitting

variant appearance <= large number of responses (increases likelihood of true correspondences)

model based <= decomposable triangulated graph => model shape

localize different body parts => facilitate

Information: color (histogram: hue, saturation & normalization), structure (spatial segmentation & description)

Signature Generation

Two parts:

histogram of hue & saturation, HUE: \[H=\arctan \frac{\log (R) -\log (G)}{\log (R) +\log (G) -2\log (B)}\]

Traditional: \[H=\arccos \frac{0.5[(R-G)+(R-B)]}{\sqrt{(R-G)(R-G)+(R-B)(G-B)}}\]
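A minimal sketch of the two hue definitions in plain Python, to show why the log-ratio form is attractive: it does not change when the pixel is scaled by a brightness factor or raised to a gamma power. The function names are hypothetical, R, G, B are assumed strictly positive, and pairing `atan2` with the log-ratio hue and `acos` with the conventional HSI hue is an assumption of this sketch.

```python
import math

def log_hue(r, g, b):
    """Log-ratio hue; assumes r, g, b > 0 so the logarithms exist."""
    num = math.log(r) - math.log(g)
    den = math.log(r) + math.log(g) - 2.0 * math.log(b)
    return math.atan2(num, den)

def hsi_hue(r, g, b):
    """Conventional HSI hue angle in [0, pi]; assumes the pixel is not gray.
    The usual reflection for B > G is left out of this sketch."""
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    return math.acos(max(-1.0, min(1.0, num / den)))

pixel = (120.0, 80.0, 40.0)
brighter = tuple(2.0 * c for c in pixel)      # brightness change
gamma = tuple(c ** 2.2 for c in pixel)        # gamma change

same_under_brightness = math.isclose(log_hue(*pixel), log_hue(*brighter))
same_under_gamma = math.isclose(log_hue(*pixel), log_hue(*gamma))
```

The conventional hue also survives a pure brightness scaling, but it shifts under the gamma change, which is the case the log-ratio form handles.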

structural qualities
- foreground interior edgels
- Distance: \(D(h_i,h_j)=1-2\frac{\sum^n_{k=1}\min(h_i(k),h_j(k))}{\sum^n_{k=1}\left(h_i(k)+h_j(k)\right)}\)
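The histogram distance above can be sketched as follows. The function name is hypothetical; histograms are assumed to be equal-length sequences of non-negative bin counts. For two normalized histograms the denominator is 2, so the distance reduces to one minus the histogram intersection.

```python
def histogram_distance(h_i, h_j):
    """D = 1 - 2 * sum(min(h_i, h_j)) / sum(h_i + h_j); 0 for identical, 1 for disjoint."""
    if len(h_i) != len(h_j):
        raise ValueError("histograms must have the same number of bins")
    overlap = sum(min(a, b) for a, b in zip(h_i, h_j))
    total = sum(h_i) + sum(h_j)
    if total == 0:
        raise ValueError("both histograms are empty")
    return 1.0 - 2.0 * overlap / total

d_same = histogram_distance([0.2, 0.8], [0.2, 0.8])
d_disjoint = histogram_distance([1.0, 0.0], [0.0, 1.0])
d_partial = histogram_distance([0.5, 0.5, 0.0], [0.0, 0.5, 0.5])
```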

Spatiotemporal Segmentation

group pixels of the same type of fabric
Over-segmentation => \(G=(V,E)\) => each vertex v represents a contiguous region
- Sobel op¹ => foreground => Gaussian filter
- watershed segmentation algorithm [21]
- edge => spatial (watershed), temporal (same region at time t+1):
  - estimate of motion field
  - \(F_{N,t}(x,y) = \sum^N_{k=0}H(I_t(x,y)-I_{t+k}(x,y))\)
  - \(I_t\) => intensity at time t
  - \(H(z) = \begin{cases} 1, & |z|<\delta \\ 0, & \mbox{otherwise} \end{cases}\)
  - Overlapping region with highest frequency integral
  - Weight on edge: the cost of grouping two regions together
    - \(w^{t,t}_{i,i'}=|M(i,t) - M(i',t)|\)
    - \(w^{t,t+1}_{i,i'}=\frac{1}{3}|M(i,t) - M(i',t+1)|\)
    - M => median intensity value
search for a minimal spanning tree
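A small sketch of the edge weights and the spanning-tree search. The notes do not say which spanning-tree algorithm is used; Kruskal's algorithm is assumed here, and the function names and the toy regions are hypothetical. Each region is given as its list of pixel intensities plus a frame index.

```python
from statistics import median

def edge_weight(pixels_a, frame_a, pixels_b, frame_b):
    """Cost of grouping two regions: |M_a - M_b|, scaled by 1/3 across frames."""
    diff = abs(median(pixels_a) - median(pixels_b))
    return diff if frame_a == frame_b else diff / 3.0

def minimum_spanning_tree(num_vertices, edges):
    """Kruskal's algorithm; edges are (weight, u, v) tuples."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:          # edge joins two different components
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

regions = [
    ([10, 12, 11], 0),   # vertex 0, frame t
    ([50, 52, 51], 0),   # vertex 1, frame t
    ([11, 13, 12], 1),   # vertex 2, frame t+1
]
adjacency = [(0, 1), (0, 2), (1, 2)]
edges = [(edge_weight(*regions[u], *regions[v]), u, v) for u, v in adjacency]
tree = minimum_spanning_tree(len(regions), edges)
```

In this toy graph the cheap temporal edge between vertices 0 and 2 is kept, and the expensive spatial edge between 0 and 1 is dropped.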

Graph Partitioning

ten consecutive frames
Group by: distance less than internal variation
- Internal Variation: \(I(C) = \max \{ w^{t,t'}_{i,i'} : e^{t,t'}_{i,i'}\in E^C \}\)
- Inter-cluster distance: \(D(C_m,C_n)=\min \{ w^{t,t'}_{i,i'} : v^t_i \in V^{C_m},\ v^{t'}_{i'} \in V^{C_n},\ e^{t,t'}_{i,i'}\in E \}\)
- Merging: \(D(C_m,C_n)\leq MI(C_m,C_n)\) where \(MI(C_m,C_n)=\min(I(C_m)+\kappa / |C_m|,\ I(C_n)+\kappa / |C_n|)\)
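The merging rule can be sketched with a union-find structure. The notes give only the criterion; visiting edges in increasing weight order (so the first edge seen between two clusters is their inter-cluster distance) and tracking I(C) as the largest weight merged into a cluster are assumptions of this sketch, and `partition` is a hypothetical name.

```python
def partition(num_vertices, edges, kappa):
    """Greedy merging; edges are (weight, u, v). Returns a cluster label per vertex."""
    parent = list(range(num_vertices))
    size = [1] * num_vertices
    internal = [0.0] * num_vertices   # I(C), stored at each cluster root

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        # MI(C_m, C_n) = min(I(C_m) + kappa/|C_m|, I(C_n) + kappa/|C_n|)
        threshold = min(internal[ru] + kappa / size[ru],
                        internal[rv] + kappa / size[rv])
        if w <= threshold:
            parent[ru] = rv
            size[rv] += size[ru]
            internal[rv] = max(internal[ru], internal[rv], w)
    return [find(x) for x in range(num_vertices)]

toy_edges = [(1.0, 0, 1), (1.0, 2, 3), (10.0, 1, 2)]
labels_small_kappa = partition(4, toy_edges, kappa=4.0)
labels_large_kappa = partition(4, toy_edges, kappa=20.0)
```

With the small kappa the weight-10 edge exceeds the threshold of 3 and two clusters remain; the larger kappa raises the threshold to 11 and everything merges, which shows how kappa biases the result toward larger clusters.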

Foreground-Background Separation

compute the maximum frequency image: \(MF_{N,t}(i,j)=\max(F_{N,t}(i,j),F_{-N,t}(i,j))\)
thresholding & morphological filtering
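A sketch of the frequency image and the maximum frequency image on nested lists. It assumes that \(F_{-N,t}\) means the same count taken over the N preceding frames, and it stops at thresholding: morphological filtering is omitted, and which side of the threshold counts as foreground is not stated in these notes. All function names are hypothetical.

```python
def frequency_image(frames, t, n, delta):
    """F_{N,t}: per pixel, count frames t+k (k = 0..n, stepping backward if n < 0)
    whose intensity stays within delta of frame t."""
    if not 0 <= t + n < len(frames):
        raise IndexError("frame window falls outside the sequence")
    step = 1 if n >= 0 else -1
    rows, cols = len(frames[t]), len(frames[t][0])
    freq = [[0] * cols for _ in range(rows)]
    for k in range(0, n + step, step):
        other = frames[t + k]
        for y in range(rows):
            for x in range(cols):
                if abs(frames[t][y][x] - other[y][x]) < delta:
                    freq[y][x] += 1
    return freq

def max_frequency_image(frames, t, n, delta):
    """MF_{N,t}: pixelwise maximum of the forward and backward frequency images."""
    fwd = frequency_image(frames, t, n, delta)
    bwd = frequency_image(frames, t, -n, delta)
    return [[max(a, b) for a, b in zip(rf, rb)] for rf, rb in zip(fwd, bwd)]

def threshold_image(image, threshold):
    """True where the value reaches the threshold."""
    return [[value >= threshold for value in row] for row in image]

# Three 1x2 frames: the left pixel is static, the right pixel keeps changing.
frames = [[[10, 10]], [[10, 50]], [[10, 90]]]
mf = max_frequency_image(frames, t=1, n=1, delta=5)
stable = threshold_image(mf, threshold=2)
```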

Interest-Point Matching

Hessian Affine invariant interest-point operator²
Compare using distance measure and thresholding
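A minimal sketch of the compare-and-threshold step, assuming each interest point already has a descriptor vector. The notes do not name the distance measure; Euclidean distance is an assumption here, and `match_descriptors` is a hypothetical name.

```python
import math

def match_descriptors(desc_a, desc_b, max_distance):
    """For each descriptor in desc_a, return the index of its nearest neighbour
    in desc_b, or None when that neighbour is farther than max_distance."""
    matches = []
    for a in desc_a:
        best_index, best_dist = None, float("inf")
        for j, b in enumerate(desc_b):
            d = math.dist(a, b)   # Euclidean distance (assumed measure)
            if d < best_dist:
                best_index, best_dist = j, d
        matches.append(best_index if best_dist <= max_distance else None)
    return matches

matches = match_descriptors([(0.0, 0.0), (5.0, 5.0)],
                            [(0.1, 0.0), (9.0, 9.0)],
                            max_distance=1.0)
```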

Dynamic-Programming Model Fitting

Model-based match respective body parts

alleviate the problem of changing relative location
use model to segment body parts

Energy function (minimize):

\[E(g,I) = \sum_i E_i(g_i,I) = \sum_i \left[ E^{data}_i(g_i,I) + E^{shape}_i(g_i) \right]\]

Shape costs defined by the polar decomposition of the affine transformation
- affine matrix A, polar decomposition:
- \(A=\begin{bmatrix} \cos \psi & -\sin\psi \\ \sin \psi & \cos \psi \end{bmatrix} \begin{bmatrix} s_x & s_h \\ s_h & s_y \end{bmatrix} = R(\psi ) S\)
- S => scale-shear matrix
- R => closest possible rotation matrix to A (in the Frobenius matrix norm)
- \(E^{shape} = \log(\frac{\lambda_1}{\lambda_2})^2 + \log(1+ s_h)^2\)
- \(\lambda_1,\lambda_2\) => eigenvalues of S
- 1st term: height-width ratio, 2nd term: shear
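A sketch of the shape cost for a 2x2 affine matrix, using the closed-form polar decomposition available in two dimensions. It assumes det A > 0, that the eigenvalues are those of S, that each squared term means (log x)^2, and that s_h > -1 so the second logarithm is defined. The function names are hypothetical.

```python
import math

def polar_decompose(a):
    """Split a 2x2 matrix A (det > 0) into a rotation angle psi and a
    symmetric matrix S such that A = R(psi) S."""
    (a11, a12), (a21, a22) = a
    psi = math.atan2(a21 - a12, a11 + a22)   # closest rotation to A
    c, s = math.cos(psi), math.sin(psi)
    # S = R(psi)^T A
    s_mat = [[c * a11 + s * a21, c * a12 + s * a22],
             [-s * a11 + c * a21, -s * a12 + c * a22]]
    return psi, s_mat

def shape_cost(a):
    """E^shape = log(l1/l2)^2 + log(1 + s_h)^2 for the scale-shear part of A."""
    _, s_mat = polar_decompose(a)
    s_x, s_h, s_y = s_mat[0][0], s_mat[0][1], s_mat[1][1]
    mean = 0.5 * (s_x + s_y)
    radius = math.hypot(0.5 * (s_x - s_y), s_h)
    lam1, lam2 = mean + radius, mean - radius   # eigenvalues of symmetric S
    return math.log(lam1 / lam2) ** 2 + math.log(1.0 + s_h) ** 2

angle = math.pi / 6
rotation = [[math.cos(angle), -math.sin(angle)],
            [math.sin(angle), math.cos(angle)]]
stretch = [[2.0, 0.0], [0.0, 1.0]]

cost_rotation = shape_cost(rotation)   # pure rotation: no shape change
cost_stretch = shape_cost(stretch)     # aspect ratio doubled
```

A pure rotation costs essentially nothing because the cost only looks at S, while the anisotropic stretch is penalized through the eigenvalue ratio.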
data cost attracts model to salient image features
- Only for boundary edges
- less sensitive to spurious edges than to missed ones
- Canny edge detection, \(E^{edge} = \frac{1}{n} \sum^{n}_{i=1}D(x_i,y_i)\)
- Foreground mask:
- \(E^{fg} = 1 - |\frac{N^{fg}_1}{N_1} - \frac{N^{fg}_2}{N_2}|\)
- \(N^{fg}_k\), \(N_k\): number of foreground pixels and total number of pixels on side k of the window
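The two data terms can be sketched as below. It is assumed that D is a precomputed distance map giving, for each pixel, the distance to the nearest Canny edge, and that the foreground counts come from the two sides of a window placed on the model boundary. The function names and toy values are hypothetical.

```python
def edge_cost(boundary_points, distance_map):
    """E^edge: mean distance-map value sampled at the model's boundary points (x, y)."""
    total = sum(distance_map[y][x] for x, y in boundary_points)
    return total / len(boundary_points)

def foreground_cost(fg_1, n_1, fg_2, n_2):
    """E^fg = 1 - |fg_1/n_1 - fg_2/n_2|; low when one side is mostly foreground
    and the other mostly background."""
    return 1.0 - abs(fg_1 / n_1 - fg_2 / n_2)

distance_map = [[0, 1, 2],
                [1, 1, 2]]
e_edge = edge_cost([(0, 0), (2, 1)], distance_map)
e_fg_on_boundary = foreground_cost(90, 100, 10, 100)
e_fg_inside = foreground_cost(50, 100, 50, 100)
```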
dynamic programming: 0-1 knapsack
computed and compared using histogram eq

¹ Sobel op: \[G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix} * A, \quad G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} * A\]
² Wikipedia