SVM Regression in MATLAB: Learn Support Vector Machine Mo

Mathematical Formulation of SVM Regression

Overview

Support vector machine (SVM) analysis is a popular machine learning tool for classification and regression, first introduced by Vladimir Vapnik and his colleagues in 1992[5]. SVM regression is considered a nonparametric technique because it relies on kernel functions.

Statistics and Machine Learning Toolbox™ implements linear epsilon-insensitive SVM (ε-SVM) regression, which is also known as L1 loss. In ε-SVM regression, the set of training data includes predictor variables and observed response values. The goal is to find a function f(x) that deviates from y_n by a value no greater than ε for each training point x, and at the same time is as flat as possible.

Linear SVM Regression: Primal Formula

Suppose we have a set of training data where x_n is a multivariate set of N observations with observed response values y_n.

To find the linear function

f(x)=x′β+b,

and ensure that it is as flat as possible, find f(x) with the minimal norm value (β′β). This is formulated as a convex optimization problem to minimize

$$J(\beta) = \frac{1}{2}\beta'\beta$$

subject to all residuals having an absolute value no greater than ε; or, in equation form:

$$\forall n: \left| y_n - (x_n'\beta + b) \right| \le \varepsilon .$$

It is possible that no such function f(x) exists to satisfy these constraints for all points. To deal with otherwise infeasible constraints, introduce slack variables ξ_n and ξ^*_n for each point. This approach is similar to the “soft margin” concept in SVM classification, because the slack variables allow regression errors to exist up to the value of ξ_n and ξ^*_n, yet still satisfy the required conditions.

Including slack variables leads to the objective function, also known as the primal formula[5]:

$$J(\beta) = \frac{1}{2}\beta'\beta + C\sum_{n=1}^{N}\left(\xi_n + \xi_n^*\right) ,$$

subject to:

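$$\begin{aligned}
\forall n &: y_n - (x_n'\beta + b) \le \varepsilon + \xi_n \\
\forall n &: (x_n'\beta + b) - y_n \le \varepsilon + \xi_n^* \\
\forall n &: \xi_n^* \ge 0 \\
\forall n &: \xi_n \ge 0 .
\end{aligned}$$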
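As a minimal sketch of the primal formula, the following Python snippet evaluates J(β) for a candidate pair (β, b) by giving each slack variable the smallest value that satisfies the constraints. The names `dot` and `primal_objective` and the toy data are assumptions made for illustration; they are not part of Statistics and Machine Learning Toolbox.

```python
def dot(u, v):
    """Inner product u'v of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def primal_objective(X, y, beta, b, epsilon, C):
    """Return (J, xi, xi_star) for a candidate linear model f(x) = x'beta + b.

    Each slack is the smallest nonnegative value that satisfies its constraint:
      xi_n      covers responses more than epsilon above the fitted value,
      xi_star_n covers responses more than epsilon below the fitted value.
    """
    xi, xi_star = [], []
    for x_n, y_n in zip(X, y):
        residual = y_n - (dot(x_n, beta) + b)
        xi.append(max(0.0, residual - epsilon))
        xi_star.append(max(0.0, -residual - epsilon))
    J = 0.5 * dot(beta, beta) + C * sum(s + t for s, t in zip(xi, xi_star))
    return J, xi, xi_star


# Hypothetical one-predictor data set with three observations.
X = [[0.0], [1.0], [2.0]]
y = [0.0, 1.0, 3.0]
J, xi, xi_star = primal_objective(X, y, beta=[1.0], b=0.0, epsilon=0.5, C=2.0)
```

In this toy example only the third observation lies outside the epsilon margin, so it is the only one that contributes a nonzero slack to the penalty term.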

The constant C is the box constraint, a positive numeric value that controls the penalty imposed on observations that lie outside the epsilon margin (ε) and helps to prevent overfitting (regularization). This value determines the trade-off between the flatness of f(x) and the amount up to which deviations larger than ε are tolerated.

The linear ε-insensitive loss function ignores errors that are within ε distance of the observed value by treating them as equal to zero. The loss is measured based on the distance between observed value y and the ε boundary. This is formally described by

$$L_\varepsilon = \begin{cases}
0 & \text{if } |y - f(x)| \le \varepsilon \\
|y - f(x)| - \varepsilon & \text{otherwise}
\end{cases}$$
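The loss above can be sketched in a few lines of Python. The function name `epsilon_insensitive_loss` is an assumption for illustration only.

```python
def epsilon_insensitive_loss(y, f_x, epsilon):
    """Linear epsilon-insensitive loss for one observation.

    Errors within epsilon of the observed value cost nothing; larger errors
    cost their distance to the epsilon boundary.
    """
    return max(0.0, abs(y - f_x) - epsilon)


# A prediction inside the margin and one outside it (hypothetical values).
inside = epsilon_insensitive_loss(1.0, 1.25, epsilon=0.5)
outside = epsilon_insensitive_loss(3.0, 2.0, epsilon=0.5)
```

Writing the loss as a single `max` is equivalent to the two-case definition, because the second branch is negative or zero exactly when the first branch applies.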

Linear SVM Regression: Dual Formula

The optimization problem previously described is computationally simpler to solve in its Lagrange dual formulation. The solution to the dual problem provides a lower bound to the solution of the primal (minimization) problem. The optimal values of the primal and dual problems need not be equal, and the difference is called the “duality gap.” But when the problem is convex and satisfies a constraint qualification condition, the value of the optimal solution to the primal problem is given by the solution of the dual problem.

To obtain the dual formula, construct a Lagrangian function from the primal function by introducing nonnegative multipliers α_n and α^*_n for each observation x_n. This leads to the dual formula, where we minimize

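$$L(\alpha) = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\,x_i'x_j + \varepsilon\sum_{i=1}^{N}(\alpha_i + \alpha_i^*) + \sum_{i=1}^{N} y_i(\alpha_i^* - \alpha_i)$$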

subject to the constraints

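$$\begin{aligned}
&\sum_{n=1}^{N}(\alpha_n - \alpha_n^*) = 0 \\
&\forall n: 0 \le \alpha_n \le C \\
&\forall n: 0 \le \alpha_n^* \le C .
\end{aligned}$$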
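To make the dual formula concrete, here is a minimal Python sketch that evaluates the dual objective and checks the constraints for a given pair of multiplier vectors. It does not solve the optimization problem. The names `dual_objective` and `is_dual_feasible`, the tolerance, and the toy values are assumptions for illustration.

```python
def dot(u, v):
    """Inner product u'v of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def dual_objective(X, y, alpha, alpha_star, epsilon):
    """Evaluate the dual objective L(alpha) that is to be minimized."""
    d = [a - s for a, s in zip(alpha, alpha_star)]  # alpha_n - alpha_n^*
    n_obs = len(X)
    quadratic = sum(
        d[i] * d[j] * dot(X[i], X[j]) for i in range(n_obs) for j in range(n_obs)
    )
    margin_term = epsilon * sum(a + s for a, s in zip(alpha, alpha_star))
    response_term = sum(y_i * (s - a) for y_i, a, s in zip(y, alpha, alpha_star))
    return 0.5 * quadratic + margin_term + response_term


def is_dual_feasible(alpha, alpha_star, C, tol=1e-9):
    """Check the equality constraint and the box constraints 0 <= alpha <= C."""
    balanced = abs(sum(a - s for a, s in zip(alpha, alpha_star))) <= tol
    in_box = all(-tol <= v <= C + tol for v in list(alpha) + list(alpha_star))
    return balanced and in_box


# Hypothetical multipliers for a three-observation, one-predictor data set.
X = [[0.0], [1.0], [2.0]]
y = [0.0, 1.0, 3.0]
alpha = [0.0, 0.0, 0.5]
alpha_star = [0.5, 0.0, 0.0]
value = dual_objective(X, y, alpha, alpha_star, epsilon=0.5)
feasible = is_dual_feasible(alpha, alpha_star, C=2.0)
```

The double sum costs O(N²) kernel evaluations, which is why practical solvers work on small subsets of multipliers at a time rather than evaluating the full objective repeatedly.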

The β parameter can be completely described as a linear combination of the training observations using the equation

$$\beta = \sum_{n=1}^{N}(\alpha_n - \alpha_n^*)\,x_n .$$

The function used to predict new values depends only on the support vectors:

$$f(x) = \sum_{n=1}^{N}(\alpha_n - \alpha_n^*)(x_n'x) + b . \tag{1}$$
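The two expressions above can be checked against each other numerically. The following Python sketch recovers β from the multiplier differences and confirms that the dual-form prediction matches x′β + b. The function names and toy values are assumptions for illustration.

```python
def dot(u, v):
    """Inner product u'v of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def recover_beta(X, alpha_diff):
    """beta = sum_n (alpha_n - alpha_n^*) x_n, computed one coordinate at a time."""
    n_predictors = len(X[0])
    return [
        sum(a * x_n[k] for a, x_n in zip(alpha_diff, X))
        for k in range(n_predictors)
    ]


def predict_dual(X, alpha_diff, b, x_new):
    """Dual-form prediction; observations with a zero difference drop out."""
    return sum(
        a * dot(x_n, x_new) for a, x_n in zip(alpha_diff, X) if a != 0.0
    ) + b


# Hypothetical training points and multiplier differences alpha_n - alpha_n^*.
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]]
alpha_diff = [-0.5, 0.0, 0.5]
b = 0.25
x_new = [3.0, 4.0]

beta = recover_beta(X, alpha_diff)
primal_prediction = dot(x_new, beta) + b
dual_prediction = predict_dual(X, alpha_diff, b, x_new)
```

Both predictions agree because the dual form is the primal form with β expanded as a linear combination of the training observations. The second observation has a zero difference, so it plays no part in the prediction.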

The Karush-Kuhn-Tucker (KKT) complementarity conditions are optimization constraints required to obtain optimal solutions. For linear SVM regression, these conditions are

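$$\begin{aligned}
\forall n &: \alpha_n\left(\varepsilon + \xi_n - y_n + x_n'\beta + b\right) = 0 \\
\forall n &: \alpha_n^*\left(\varepsilon + \xi_n^* + y_n - x_n'\beta - b\right) = 0 \\
\forall n &: \xi_n\left(C - \alpha_n\right) = 0 \\
\forall n &: \xi_n^*\left(C - \alpha_n^*\right) = 0 .
\end{aligned}$$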

These conditions indicate that all observations strictly inside the epsilon tube have Lagrange multipliers α_n = 0 and α_n^* = 0. If either α_n or α_n^* is not zero, then the corresponding observation is called a support vector.
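A minimal Python sketch of the complementarity conditions follows. It computes the four products for every observation and reports whether they all vanish within a tolerance. The function names, the tolerance, and the two-point example (whose values were worked out by hand for this illustration) are assumptions.

```python
def dot(u, v):
    """Inner product u'v of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def kkt_products(X, y, beta, b, alpha, alpha_star, xi, xi_star, epsilon, C):
    """Return, per observation, the four products that must equal zero."""
    products = []
    for n in range(len(X)):
        f_n = dot(X[n], beta) + b
        products.append((
            alpha[n] * (epsilon + xi[n] - y[n] + f_n),
            alpha_star[n] * (epsilon + xi_star[n] + y[n] - f_n),
            xi[n] * (C - alpha[n]),
            xi_star[n] * (C - alpha_star[n]),
        ))
    return products


def satisfies_kkt(products, tol=1e-9):
    """True when every complementarity product is zero within tol."""
    return all(abs(p) <= tol for row in products for p in row)


# Hypothetical two-point problem: (x, y) = (0, 0) and (2, 2), epsilon = 0.5.
# Both points sit exactly on the boundary of the epsilon tube around
# f(x) = 0.5 x + 0.5, so both may carry a nonzero multiplier.
X = [[0.0], [2.0]]
y = [0.0, 2.0]
products = kkt_products(
    X, y, beta=[0.5], b=0.5,
    alpha=[0.0, 0.25], alpha_star=[0.25, 0.0],
    xi=[0.0, 0.0], xi_star=[0.0, 0.0],
    epsilon=0.5, C=1.0,
)
ok = satisfies_kkt(products)
```

In this example each nonzero multiplier is paired with a constraint that holds with equality, which is exactly what complementarity requires.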

The property Alpha of a trained SVM model stores the difference between two Lagrange multipliers of support vectors, α_n – α_n^*. The properties SupportVectors and Bias store x_n and b, respectively.
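The relationship between the multipliers and the stored quantities can be sketched as follows. This is an illustrative Python helper, not the toolbox implementation: the function name `support_vector_summary`, the returned dictionary keys, and the tolerance used to decide that a multiplier is nonzero are all assumptions.

```python
def support_vector_summary(X, alpha, alpha_star, tol=1e-8):
    """Collect support vectors and their multiplier differences.

    An observation counts as a support vector when either of its two
    multipliers is nonzero (here: larger than tol).
    """
    indices = [
        n for n, (a, s) in enumerate(zip(alpha, alpha_star))
        if a > tol or s > tol
    ]
    return {
        "indices": indices,
        # Difference alpha_n - alpha_n^* for each support vector.
        "alpha_diff": [alpha[n] - alpha_star[n] for n in indices],
        # The corresponding training observations x_n.
        "support_vectors": [X[n] for n in indices],
    }


# Hypothetical multipliers: the middle observation lies strictly inside the tube.
X = [[0.0], [1.0], [2.0]]
alpha = [0.0, 0.0, 0.5]
alpha_star = [0.5, 0.0, 0.0]
summary = support_vector_summary(X, alpha, alpha_star)
```

Storing only the difference is sufficient for prediction, because the prediction function depends on the multipliers only through α_n – α_n^*.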

Nonlinear SVM Regression: Primal Formula

Some regression problems cannot adequately be described using a linear model. In such a case, the Lagrange dual formulation allows the previously described technique to be extended to nonlinear functions.

Obtain a nonlinear SVM regression model by replacing the dot product x₁′x₂ with a nonlinear kernel function G(x₁,x₂) = <φ(x₁),φ(x₂)>, where φ(x) is a transformation that maps x to a high-dimensional space. Statistics and Machine Learning Toolbox provides the following built-in positive semidefinite kernel functions.

Kernel Name	Kernel Function
Linear (dot product)	G(x_j, x_k) = x_j′x_k
Gaussian	G(x_j, x_k) = exp(−‖x_j − x_k‖²)
Polynomial	G(x_j, x_k) = (1 + x_j′x_k)^q, where q is in the set {2, 3, ...}
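The three kernels in the table can be sketched in plain Python, together with a prediction function in which the dot product is replaced by a kernel. The function names are assumptions for illustration, and the Gaussian kernel is written exactly as in the table, without any additional scale parameter.

```python
import math


def linear_kernel(x_j, x_k):
    """G(x_j, x_k) = x_j'x_k."""
    return sum(a * b for a, b in zip(x_j, x_k))


def gaussian_kernel(x_j, x_k):
    """G(x_j, x_k) = exp(-||x_j - x_k||^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x_j, x_k)))


def polynomial_kernel(x_j, x_k, q=2):
    """G(x_j, x_k) = (1 + x_j'x_k)^q for an integer order q >= 2."""
    if int(q) != q or q < 2:
        raise ValueError("q must be an integer in {2, 3, ...}")
    return (1.0 + linear_kernel(x_j, x_k)) ** q


def kernel_predict(support_vectors, alpha_diff, b, x_new, kernel):
    """Prediction with the dot product x_n'x replaced by kernel(x_n, x)."""
    return sum(
        a * kernel(x_n, x_new) for a, x_n in zip(alpha_diff, support_vectors)
    ) + b


# Hypothetical support vectors and multiplier differences.
support_vectors = [[0.0, 1.0], [2.0, 2.0]]
alpha_diff = [-0.5, 0.5]
prediction = kernel_predict(
    support_vectors, alpha_diff, 0.25, [3.0, 4.0], gaussian_kernel
)
```

Passing the kernel as an argument keeps the prediction code unchanged when switching between the linear and nonlinear models, mirroring how the nonlinear formulation only swaps the inner product.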

The Gram matrix is an

