[paper reading][Proceedings of the IEEE 2016] Taking the Human Out of the Loop: A Review of Bayesian

2021-11-05 12:31:06 Author: Internet

1 Introduction

motivation: design choices, high-dimensional configuration spaces, hyperparameter tuning
- example: IBM ILOG CPLEX
goal: \(x^* = \arg\max_{x\in \mathcal X}f(x)\)
- \(\mathcal X\): compact subset of \(\mathbb R^d\), or ...
- stochastic output \(\mathbb E[y|f(x)]=f(x)\)
- unbiased noisy point-wise observations
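data-efficient, because evaluations of \(f\) are costly
put a prior on \(f\), refine it into a posterior as observations arrive
where to evaluate next? maximize an acquisition function \(\alpha_n: \mathcal X\to \mathbb R\)
- built from the posterior mean and confidence interval
acquisition functions are myopic heuristics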
- uncertainty is large (exploration), or prediction is high (exploitation)
- the acquisition function itself should be easy to optimize (analytic where possible)
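As a rough illustration of the exploration/exploitation trade-off noted above, here is a minimal sketch of one possible acquisition function, an upper confidence bound built from the posterior mean and standard deviation. The arrays `mean` and `std` and the weight `beta` are made-up values for illustration, not taken from the paper.

```python
import numpy as np

def ucb(mean, std, beta=2.0):
    """Upper confidence bound: large where the prediction or the uncertainty is large."""
    return mean + beta * std

# Hypothetical posterior summaries at five candidate points (illustrative only).
mean = np.array([0.2, 0.5, 0.4, 0.1, 0.3])
std = np.array([0.05, 0.05, 0.30, 0.60, 0.10])

exploit_idx = int(np.argmax(ucb(mean, std, beta=0.0)))  # ignores uncertainty
explore_idx = int(np.argmax(ucb(mean, std, beta=2.0)))  # rewards uncertainty
```

With `beta=0` the rule picks the point with the highest predicted value; with a larger `beta` it shifts towards the point whose outcome is most uncertain.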

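model of \(f\) parametrized by \(w\)
\(\mathcal D\): data
Bayes' rule: \(p(w|\mathcal D)=p(\mathcal D|w)p(w)/p(\mathcal D)\)
- beliefs about \(w\) after observing data \(\mathcal D\)
- \(p(\mathcal D)\) is usually intractable, but it is only a normalizing constant
prior: with conjugacy, the posterior can be computed analytically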
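example: \(K\) drugs, effectiveness modeled independently
- optimize \(f\) over the \(K\) indices; \(f\) is fully parametrized, one parameter per drug
- Beta prior, conjugate to the Bernoulli likelihood
Thompson sampling (TS): the simplest strategy; chooses an arm according to its posterior probability of optimality, estimated by Monte Carlo (MC)
- \(a_{n+1}=\arg\max_a f_{\bar w}(a)\), where \(\bar w\) is a sample from the posterior \(p(w|\mathcal D)\)
- no parameters to tune other than the prior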
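The Beta-Bernoulli setting above can be sketched in a few lines. This is a minimal simulation under assumptions of my own: the function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the success probabilities in the usage line are illustrative, not from the paper.

```python
import numpy as np

def thompson_sampling(true_probs, n_rounds=2000, alpha0=1.0, beta0=1.0, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit with independent Beta priors."""
    rng = np.random.default_rng(seed)
    k = len(true_probs)
    alpha = np.full(k, alpha0)   # prior pseudo-count plus observed successes per arm
    beta = np.full(k, beta0)     # prior pseudo-count plus observed failures per arm
    pulls = np.zeros(k, dtype=int)
    for _ in range(n_rounds):
        w = rng.beta(alpha, beta)          # one posterior sample of all arm parameters
        a = int(np.argmax(w))              # act greedily with respect to the sample
        reward = int(rng.random() < true_probs[a])
        alpha[a] += reward                 # conjugate update: no numerical integration
        beta[a] += 1 - reward
        pulls[a] += 1
    return alpha, beta, pulls

# Hypothetical success probabilities for K = 3 drugs.
alpha, beta, pulls = thompson_sampling([0.2, 0.5, 0.8])
```

Each round draws a single sample from the posterior, so an arm is chosen with (approximately) its posterior probability of being optimal; the only inputs beyond the data are the prior counts.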
linear model: each arm \(a\) has a feature vector \(x_a\), and \(f_w(a)=x_a^T w\)
\(X\): input vectors, \(y\): outputs
nonlinear basis functions
- radial
- Fourier
- learned from data
- regardless of the feature map, the posterior over the weights can be computed analytically
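To make the "computed analytically" point concrete, here is a minimal sketch of the Gaussian weight posterior for a linear model on radial basis features. It assumes a known noise variance `sigma2` and a zero-mean Gaussian prior with covariance `V0`, which is a simplification; the helper names `rbf_features` and `weight_posterior`, the centers, and the data are all hypothetical.

```python
import numpy as np

def rbf_features(x, centers, lengthscale=0.5):
    """Map 1-D inputs to radial basis features, one column per center."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    centers = np.asarray(centers, dtype=float).reshape(1, -1)
    return np.exp(-0.5 * ((x - centers) / lengthscale) ** 2)

def weight_posterior(Phi, y, sigma2, V0):
    """Posterior N(w_n, V_n) over weights for y = Phi w + noise, noise ~ N(0, sigma2 I)."""
    V0_inv = np.linalg.inv(V0)
    Vn = np.linalg.inv(V0_inv + Phi.T @ Phi / sigma2)
    wn = Vn @ Phi.T @ np.asarray(y, dtype=float) / sigma2
    return wn, Vn

# Illustrative data: four noisy observations, three basis functions.
x = np.array([0.0, 0.5, 1.0, 1.5])
y = np.array([0.1, 0.6, 0.9, 0.4])
Phi = rbf_features(x, centers=[0.0, 0.75, 1.5])
wn, Vn = weight_posterior(Phi, y, sigma2=0.01, V0=np.eye(3))
```

Swapping `rbf_features` for Fourier or learned features changes only `Phi`; the two lines computing `Vn` and `wn` stay the same.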

start from the linear model with observation variance \(\sigma^2\) and a zero-mean Gaussian prior on \(w\) with covariance \(V_0\); Gaussianity is preserved
linear regression on basis functions induces a symmetric positive-semidefinite kernel
- the kernel expresses an intuitive similarity between pairs of points, rather than a feature map \(\Phi\)
- tractable with linear algebra; it is unnecessary to define \(\Phi\) explicitly
Gaussian process (GP): a nonparametric model, specified by a prior mean and a covariance (kernel) function
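A small numerical check of the step from basis functions to kernels: with a zero-mean Gaussian prior of covariance \(V_0\) on the weights, the prior covariance between function values is \(\phi(x)^T V_0\, \phi(x')\), which is a symmetric positive-semidefinite kernel. The polynomial feature map `poly_features` and the particular `V0` below are illustrative assumptions.

```python
import numpy as np

def poly_features(x):
    """Hypothetical feature map phi(x) = [1, x, x^2] for 1-D inputs."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x ** 2], axis=1)

def implied_kernel(xa, xb, V0):
    """Kernel matrix with entries k(x, x') = phi(x)^T V0 phi(x')."""
    return poly_features(xa) @ V0 @ poly_features(xb).T

x = np.linspace(-1.0, 1.0, 6)
V0 = np.diag([1.0, 0.5, 0.25])
K = implied_kernel(x, x, V0)
min_eig = np.linalg.eigvalsh(K).min()   # non-negative up to rounding error
```

Once `K` is available, nothing downstream needs the features themselves, which is why a kernel can be specified directly.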
\(f|X\sim \mathcal N(m,K)\)
\(y|f, \sigma^2\sim \mathcal N(f,\sigma^2 I)\)
posterior at a new \(x\): computed directly from \(x\) and all previous data (the data are not "abstracted by parameters")
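The GP posterior can be sketched with a few lines of linear algebra. This is a minimal version for 1-D inputs, assuming a constant prior mean and a squared exponential kernel chosen only for brevity; the function names, default hyperparameters, and data are illustrative.

```python
import numpy as np

def sq_exp_kernel(a, b, lengthscale=1.0, amplitude=1.0):
    """Squared exponential kernel between two sets of 1-D points."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    b = np.asarray(b, dtype=float).reshape(1, -1)
    return amplitude ** 2 * np.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def gp_posterior(X, y, X_new, sigma2=1e-2, mean=0.0):
    """Posterior mean and variance of f at X_new given noisy observations (X, y)."""
    y = np.asarray(y, dtype=float)
    K = sq_exp_kernel(X, X) + sigma2 * np.eye(len(X))   # covariance of noisy outputs
    K_s = sq_exp_kernel(X, X_new)                        # train/test cross-covariance
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - mean))
    mu = mean + K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(sq_exp_kernel(X_new, X_new)) - np.sum(v ** 2, axis=0)
    return mu, var

# Illustrative data: three observations of sin(x), predictions on a small grid.
X = np.array([0.0, 1.0, 2.0])
y = np.sin(X)
mu, var = gp_posterior(X, y, np.linspace(0.0, 3.0, 7))
```

The prediction depends on every stored pair in `(X, y)` through `K` and `K_s`, so the training data themselves play the role that a weight vector plays in a parametric model.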
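the kernel encodes structure, e.g. periodic or stationary
- Matérn kernels, parametrized by a diagonal matrix of length scales
kernel hyperparameters control smoothness and amplitude
prior mean: a possible offset, e.g. a constant; can encode expert knowledge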
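For the Matérn family noted above, here is a minimal sketch of the Matérn 5/2 member with one length scale per input dimension and an overall amplitude. The parameterization (dividing each coordinate difference by its length scale) and the names `lengthscales` and `amplitude` are one common convention that I am assuming here; the paper's notation may differ.

```python
import numpy as np

def matern52(A, B, lengthscales, amplitude=1.0):
    """Matern 5/2 kernel between rows of A (n, d) and rows of B (m, d)."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    ls = np.asarray(lengthscales, dtype=float)
    diff = (A[:, None, :] - B[None, :, :]) / ls      # per-dimension scaled differences
    r = np.sqrt(np.sum(diff ** 2, axis=-1))          # scaled distance
    s = np.sqrt(5.0) * r
    return amplitude ** 2 * (1.0 + s + s ** 2 / 3.0) * np.exp(-s)

# Illustrative 2-D inputs with a different length scale per dimension.
X = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]])
K = matern52(X, X, lengthscales=[1.0, 2.0], amplitude=2.0)
```

The kernel depends on the inputs only through their difference, so it is stationary; shorter length scales make sampled functions vary faster along that dimension, and `amplitude` sets their overall scale.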

Source: https://www.cnblogs.com/minor-second/p/15512655.html