Efficient Algorithms for Ranking with SVMs

Introduction

Methods:

  • Pointwise (predict a score for each document that is close to its relevance label)
  • Pairwise (loss is defined on pairs of documents: the more relevant document of each pair should receive the higher score)
  • Listwise (training loss is defined over the whole list of participating documents and their scores)

Pairwise:

  • RankBoost
  • RankNet
  • LambdaRank
  • GBRank
  • RankSVM

Main differences

  • Type of loss function
  • Training methods

RankSVM

  • Minimizing the objective function: \(\frac12||w||^2 + C\sum_{(i,j)\in P} \ell(w^T x_i - w^T x_j)\)
    • \(\ell\) is a loss function such as the squared hinge \(\ell(t)=\max(0,1-t)^2\) (see the sketch after this list)
  • SVMLight forms all pairwise difference vectors and solves the problem in the dual1
  • Limiting the number of iterations lowers training time, but incomplete training gives poor results
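
A minimal NumPy sketch of this objective (the names `rank_svm_objective`, `pairs`, and the toy data are mine, not from the paper):

```python
import numpy as np

def rank_svm_objective(w, X, pairs, C):
    """RankSVM primal objective with the squared hinge loss
    l(t) = max(0, 1 - t)^2 applied to score differences."""
    scores = X @ w                                   # w^T x_i for every document
    t = scores[pairs[:, 0]] - scores[pairs[:, 1]]    # w^T x_i - w^T x_j over P
    loss = np.maximum(0.0, 1.0 - t) ** 2             # squared hinge on each pair
    return 0.5 * w @ w + C * loss.sum()

# toy usage: 4 documents, 3 features; P says doc 0 > doc 1 and doc 2 > doc 3
X = np.random.randn(4, 3)
pairs = np.array([[0, 1], [2, 3]])
print(rank_svm_objective(np.zeros(3), X, pairs, C=1.0))   # 2.0 at w = 0
```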

This paper:

  • RankSVM + primal Newton method2
  • Two ways:
    • an efficient computation of the Newton step in \(O(nd+p)\), where \(p=|P|\) is the number of preference pairs
    • when \(P\) arises from a small number of relevance levels, the cost drops to \(O(nd+n\log n)\)

Newton optimization for linear SVM

Objective function, \(y_i \in \{-1,1\}\): \[\min_{w,b} \frac12 ||w||^2 + C \sum^{n}_{i=1}\max(0,1-y_i (w^T x_i +b))^2\]
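
A direct NumPy translation of this objective, as a sketch (the name `svm_objective` is illustrative):

```python
import numpy as np

def svm_objective(w, b, X, y, C):
    """Primal L2-SVM: 0.5*||w||^2 + C * sum_i max(0, 1 - y_i*(w^T x_i + b))^2."""
    margins = y * (X @ w + b)
    return 0.5 * w @ w + C * np.sum(np.maximum(0.0, 1.0 - margins) ** 2)
```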

There is no need to solve this in the dual; instead one can use:

  • Non-linear conjugate gradient
    • needs only the gradient of the objective function and a line search procedure (secant method / cubic polynomial interpolation)
    • each line-search evaluation is \(O(n)\) if \(Xs\) is precomputed, with \(X\) the data matrix and \(s\) the search direction (see the sketch after this list)
  • Truncated Newton
    • \(w \leftarrow w - H^{-1}g\), with \(H\) the Hessian and \(g\) the gradient of the objective function
    • truncated Newton: the Hessian is never formed explicitly; the linear system is solved by linear conjugate gradient
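
A sketch of why each line-search evaluation is \(O(n)\) once \(Xw\) and \(Xs\) are precomputed (names are mine; the secant / interpolation logic that would repeatedly call `f(t)` is omitted):

```python
import numpy as np

def make_line_objective(w, s, b, X, y, C):
    """Return f(t) = objective(w + t*s), evaluable in O(n) per call:
    Xw and Xs are computed once, so only length-n vector ops remain."""
    Xw, Xs = X @ w, X @ s                  # precomputed once, O(nd)
    ww, ws, ss = w @ w, w @ s, s @ s       # scalars for the regularizer
    def f(t):
        margins = y * (Xw + t * Xs + b)
        hinge = np.maximum(0.0, 1.0 - margins)
        return 0.5 * (ww + 2 * t * ws + t * t * ss) + C * np.sum(hinge ** 2)
    return f
```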

The two approaches are roughly equivalent (Chapelle, 2007a); we use truncated Newton

  • Support vectors: \(sv = \{i : y_i(w^T x_i+b)<1\}\)
  • Gradient \(g:=w+2C\sum_{i \in sv}(w^T x_i +b -y_i)x_i\)
  • Hessian \(H:=I+2C\sum_{i \in sv}x_i x^T_i\)
  • Newton step \(H^{-1}g\) can be computed with linear conjugate gradient; for any vector s: \(Hs=s+2CX^TD(Xs)\)
    • \(D\) is the diagonal matrix with \(D_{ii}=1\) if \(i \in sv\), \(0\) otherwise
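
Putting these formulas together, a minimal sketch of one truncated-Newton step with a hand-rolled linear conjugate gradient (the bias \(b\) is dropped for brevity; `newton_step` and the stopping rule are my choices, not the paper's):

```python
import numpy as np

def newton_step(w, X, y, C, cg_iters=20):
    """One truncated-Newton step for the primal L2-SVM (bias omitted).
    The Hessian is never formed: only products Hv = v + 2C * Xsv^T (Xsv v)."""
    sv = y * (X @ w) < 1                      # support vectors: y_i w^T x_i < 1
    Xsv, ysv = X[sv], y[sv]
    g = w + 2 * C * Xsv.T @ (Xsv @ w - ysv)   # gradient
    Hv = lambda v: v + 2 * C * Xsv.T @ (Xsv @ v)

    # linear conjugate gradient for H z = g
    z = np.zeros_like(w)
    r = g.copy()                              # residual of Hz = g at z = 0
    p = r.copy()
    rs = r @ r
    for _ in range(cg_iters):
        if np.sqrt(rs) < 1e-10:
            break
        Hp = Hv(p)
        alpha = rs / (p @ Hp)
        z += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return w - z                              # Newton update w <- w - H^{-1} g
```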

First implementation3

  • Linear SVM can be used for RankSVM
  • replace each pair by its difference vector, \(x_i - x_j\), i.e. \(X \leftarrow AX\), where row \(k\) of \(A\) encodes the pair \((i,j)\) via \(A_{ki}=1\), \(A_{kj}=-1\) (see the sketch below)
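
A sketch of how the matrix \(A\) could be built sparsely so that the linear SVM code is reused on \(AX\) (the helper name `pair_difference_matrix` is mine):

```python
import numpy as np
from scipy import sparse

def pair_difference_matrix(pairs, n):
    """Sparse p x n matrix A with A[k, i] = +1 and A[k, j] = -1 for the k-th
    pair (i, j), so that A @ X stacks the difference vectors x_i - x_j."""
    p = len(pairs)
    rows = np.repeat(np.arange(p), 2)
    cols = np.asarray(pairs).ravel()
    vals = np.tile([1.0, -1.0], p)
    return sparse.csr_matrix((vals, (rows, cols)), shape=(p, n))

# RankSVM then reduces to a plain linear L2-SVM on the difference vectors,
# with every "label" equal to +1 and no bias term, e.g.:
#   X_diff = pair_difference_matrix(pairs, X.shape[0]) @ X
#   w = newton_step(w, X_diff, np.ones(len(pairs)), C)   # sketch from above
```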

Better Algorithm4

  • \(g=w+Cg_L\), where \(g_L\) is the gradient of the pairwise loss \(L\)
  • \(Hs=s+CH_L s\), where \(H_L s\) is a Hessian-vector product of \(L\); both can be computed in \(O(nd+p)\) without explicitly forming \(AX\) (see the sketch below)
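
A sketch of the \(O(nd+p)\) computation of \(g\) and \(Hs\) working directly on the pair list instead of forming \(AX\) (the function name and the `np.add.at` accumulation are my choices, assuming the squared hinge loss):

```python
import numpy as np

def grad_and_hess_vec(w, s, X, pairs, C):
    """g = w + C*g_L and Hs = s + C*H_L*s for the pairwise squared hinge in
    O(nd + p): one pass over documents, O(1) work per pair, then X^T once."""
    n, _ = X.shape
    o = X @ w                                 # document scores, O(nd)
    i, j = pairs[:, 0], pairs[:, 1]
    t = o[i] - o[j]                           # pairwise margins, O(p)
    viol = t < 1                              # only violated pairs contribute
    iv, jv, tv = i[viol], j[viol], t[viol]

    # g_L = sum over violated pairs of 2*(t - 1)*(x_i - x_j) = X^T c
    c = np.zeros(n)
    np.add.at(c, iv, 2 * (tv - 1))
    np.add.at(c, jv, -2 * (tv - 1))
    g = w + C * (X.T @ c)

    # H_L s = sum over violated pairs of 2*((Xs)_i - (Xs)_j)*(x_i - x_j) = X^T h
    q = X @ s                                 # O(nd)
    dq = q[iv] - q[jv]
    h = np.zeros(n)
    np.add.at(h, iv, 2 * dq)
    np.add.at(h, jv, -2 * dq)
    Hs = s + C * (X.T @ h)
    return g, Hs
```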

Two relevance levels

  • \(L=\sum_{i \in A, j \in B} \ell (y_i-y_j)\), where \(y_i = w^T x_i\) now denotes the predicted score
  • A: high relevance, B: low relevance
  • let \(\tilde{y}_i = \begin{cases} y_i - \frac12, & i \in A \\ y_i + \frac12, & i \in B \end{cases}\)
  • so \(\ell (y_i-y_j) = \max (0,1-(y_i-y_j))^2 = \max (0,\tilde{y}_j - \tilde{y}_i)^2\) (see the sketch below)
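
A sketch of how this reformulation makes the loss over all \(|A|\cdot|B|\) pairs computable in \(O(n\log n)\) via sorting and suffix sums (the function name and interface are mine; the gradient needed for the Newton step can be accumulated from the same sorted structure, which is omitted here):

```python
import numpy as np

def two_level_pairwise_loss(scores_A, scores_B):
    """Sum over all (i in A, j in B) of max(0, y~_j - y~_i)^2 in O(n log n),
    with y~ = score - 1/2 for A and score + 1/2 for B, using sorting and
    suffix sums instead of enumerating the |A|*|B| pairs."""
    ya = scores_A - 0.5
    yb = np.sort(scores_B + 0.5)
    # suffix sums: suf1[k] = sum(yb[k:]), suf2[k] = sum(yb[k:]**2)
    suf1 = np.concatenate([np.cumsum(yb[::-1])[::-1], [0.0]])
    suf2 = np.concatenate([np.cumsum((yb ** 2)[::-1])[::-1], [0.0]])
    # for each i in A, index of the first j in B with y~_j > y~_i
    k = np.searchsorted(yb, ya, side='right')
    count = len(yb) - k
    # sum_j (y~_j - y~_i)^2 = suf2[k] - 2*y~_i*suf1[k] + count*y~_i^2
    return np.sum(suf2[k] - 2 * ya * suf1[k] + count * ya ** 2)
```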

  1. what is it?

  2. Keerthi and DeCoste, 2005; Chapelle, 2007b

  3. Matlab Code

  4. Known as PRSVM+