Gaussian Process Regression Models

Gaussian process regression (GPR) models are nonparametric kernel-based probabilistic models. You can train a GPR model using the fitrgp function.

Consider the training set {(xi, yi); i = 1, 2, ..., n}, where xi ∈ ℝᵈ and yi ∈ ℝ, drawn from an unknown distribution. A GPR model addresses the question of predicting the value of a response variable ynew, given the new input vector xnew and the training data. A linear regression model is of the form

y = xᵀβ + ε,

where ε ~ N(0, σ²). The error variance σ² and the coefficients β are estimated from the data. A GPR model explains the response by introducing latent variables, f(xi), i = 1, 2, ..., n, from a Gaussian process (GP), and explicit basis functions, h. The covariance function of the latent variables captures the smoothness of the response, and the basis functions project the inputs x into a p-dimensional feature space.
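As a point of reference, the linear part of this model can be fit with ordinary least squares. The following NumPy sketch (an illustration, not part of fitrgp) estimates β from simulated data and then estimates σ² from the residuals; the data, the true coefficients, and the noise level are all assumed for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))          # n observations of d features
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)   # y = X beta + noise

# Ordinary least squares estimate of the coefficients beta
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Unbiased estimate of the error variance sigma^2 from the residuals
residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / (n - d)
```

A GPR model goes beyond this by adding the latent GP term f(x) described next.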

A GP is a set of random variables, such that any finite number of them have a joint Gaussian distribution. If {f(x), x ∈ ℝᵈ} is a GP, then given n observations x1, x2, ..., xn, the joint distribution of the random variables f(x1), f(x2), ..., f(xn) is Gaussian. A GP is defined by its mean function m(x) and covariance function, k(x,x′). That is, if {f(x), x ∈ ℝᵈ} is a Gaussian process, then E(f(x)) = m(x) and Cov[f(x), f(x′)] = E[{f(x) − m(x)}{f(x′) − m(x′)}] = k(x,x′).
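The defining property above can be made concrete by sampling: evaluate the covariance function at a finite set of points to build an n-by-n matrix, then draw the vector (f(x1), ..., f(xn)) from the resulting joint Gaussian. This NumPy sketch assumes a zero mean function and a squared exponential kernel, which is one common choice of k(x,x′), not the only one:

```python
import numpy as np

def sq_exp_kernel(x, length_scale=1.0, signal_var=1.0):
    # k(x, x') = sigma_f^2 * exp(-(x - x')^2 / (2 * l^2)), a common covariance function
    diff = x[:, None] - x[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)

rng = np.random.default_rng(1)
xs = np.linspace(0, 5, 20)            # n = 20 observation points
K = sq_exp_kernel(xs)                 # n-by-n covariance matrix K
K += 1e-9 * np.eye(len(xs))           # small jitter for numerical stability
m = np.zeros_like(xs)                 # zero mean function m(x) = 0

# By definition of a GP, f(x1), ..., f(xn) are jointly Gaussian: f ~ N(m, K)
f = rng.multivariate_normal(m, K)
```

Each call to `multivariate_normal` produces one sample path of the GP evaluated at the chosen points.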

Now consider the following model.

h(x)ᵀβ + f(x),

where f(x) ~ GP(0, k(x,x′)), that is, f(x) are from a zero-mean GP with covariance function k(x,x′). h(x) are a set of basis functions that transform the original feature vector x in ℝᵈ into a new feature vector h(x) in ℝᵖ. β is a p-by-1 vector of basis function coefficients. This model represents a GPR model. An instance of response y can be modeled as

P(yi | f(xi), xi) ~ N(yi | h(xi)ᵀβ + f(xi), σ²)

Hence, a GPR model is a probabilistic model. There is a latent variable f(xi) introduced for each observation xi, which makes the GPR model nonparametric. In vector form, this model is equivalent to

P(y | f, X) ~ N(y | Hβ + f, σ²I),

where

X = [x1ᵀ; x2ᵀ; …; xnᵀ],  y = [y1; y2; …; yn],  H = [h(x1ᵀ); h(x2ᵀ); …; h(xnᵀ)],  f = [f(x1); f(x2); …; f(xn)].
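The vector-form model can be simulated end to end: build H by applying the basis functions row by row, draw f from the zero-mean GP, and then draw y from N(Hβ + f, σ²I). In this NumPy sketch (illustrative only, not the fitrgp implementation), the basis h(x) = [1, x], the squared exponential kernel, β, and σ are all assumed choices:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
x = rng.uniform(0, 5, size=n)         # n scalar inputs (d = 1)
X = x[:, None]                        # n-by-d design matrix, rows xi'

# Basis functions h(x) = [1, x] project x into a p = 2 feature space (assumed choice)
H = np.hstack([np.ones((n, 1)), X])   # n-by-p matrix with rows h(xi')
beta = np.array([0.5, 1.0])           # p-by-1 coefficient vector (assumed)

# Latent vector f ~ N(0, K(X, X)) with a squared exponential kernel (assumed)
diff = x[:, None] - x[None, :]
K = np.exp(-0.5 * diff ** 2)
K += 1e-9 * np.eye(n)                 # jitter for numerical stability
f = rng.multivariate_normal(np.zeros(n), K)

# Observation model: y | f, X ~ N(H beta + f, sigma^2 I)
sigma = 0.05
y = H @ beta + f + sigma * rng.normal(size=n)
```

Note the nonparametric character of the model here: the latent vector f has one entry per observation, so its size grows with the data.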

The joint distribution of latent variables f(x1),f(x2),...,f(xn) in the GPR model is as follows:

P(f | X) ~ N(f | 0, K(X,X)),

close to a linear regression model, where K(X,X) looks as follows: