Bayesian optimization has emerged as an exciting sub-field of machine learning and artificial intelligence concerned with optimization using probabilistic methods. Systems implementing Bayesian optimization techniques have been successfully used to solve difficult problems in a diverse set of applications, including automatic tuning of machine learning algorithms, experimental design, and many other systems. Several recent advances in the methodologies and theory underlying Bayesian optimization have extended the framework to new applications and provided greater insight into the behavior of these algorithms. Bayesian optimization is now increasingly being used in industrial settings, providing new and interesting challenges that require new algorithms and theoretical insights. A tutorial on Bayesian optimization is therefore timely, useful, and practical for the ACML audience, allowing both academia and industry to learn about recent advances in Bayesian optimization in a systematic manner. This tutorial consists of two main parts. In the first part, I will cover Bayesian optimization in detail in its standard, simple setting. In the second part, I will present recent advances in Bayesian optimization, including (1) batch Bayesian optimization, (2) high dimensional Bayesian optimization, and (3) mixed categorical-continuous optimization. At the end of the talk, I will also outline possible future research directions in Bayesian optimization.
Part I: Bayesian Optimization.
Bayesian optimization is a sequential, model-based approach to solving the global optimization problem for black-box functions. By black-box function, we assume the function f has no simple closed form but can be evaluated at any arbitrary query point x in the domain. The Bayesian optimization framework has two key ingredients. The first ingredient is a probabilistic surrogate model, which consists of a prior distribution that captures our beliefs about the behavior of the unknown objective function and an observation model that describes the data generation mechanism. The second ingredient is a loss function that describes how optimal a sequence of queries is; in practice, these loss functions often take the form of regret, either simple or cumulative. Ideally, the expected loss is then minimized to select an optimal sequence of queries. After observing the output of each query of the objective, the prior is updated to produce a more informative posterior distribution over the space of objective functions.
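As a concrete illustration of this loop, the following minimal sketch pairs a Gaussian process surrogate (via scikit-learn) with the Expected Improvement acquisition function on a toy one-dimensional maximization problem. The objective, bounds, and hyperparameters below are illustrative placeholders rather than part of the tutorial material.

```python
# Minimal sketch of a sequential Bayesian optimization loop:
# GP surrogate + Expected Improvement acquisition (maximization).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def f(x):
    # Hypothetical expensive black-box objective (1-D, for illustration only).
    return -np.sin(3 * x) - x**2 + 0.7 * x

def expected_improvement(X_cand, gp, y_best, xi=0.01):
    # EI(x) = E[max(f(x) - y_best - xi, 0)] under the GP posterior.
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - y_best - xi) / sigma
    return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
bounds = (-2.0, 2.0)

# Initial design: a few random evaluations of the objective.
X = rng.uniform(*bounds, size=(3, 1))
y = f(X).ravel()

for _ in range(10):                               # sequential BO iterations
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)                                  # posterior update with all data so far
    X_cand = np.linspace(*bounds, 1000).reshape(-1, 1)
    ei = expected_improvement(X_cand, gp, y.max())
    x_next = X_cand[np.argmax(ei)]                # query maximizing the acquisition
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next).item())

print("best x:", X[np.argmax(y)], "best f:", y.max())
```

Each iteration refits the surrogate on all observations, maximizes the acquisition over a candidate grid, and evaluates the objective at the selected point, mirroring the sequential procedure described above.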
Part II.1: Recent Advances in Bayesian Optimization - Batch Bayesian Optimization.
Standard Bayesian optimization approaches only allow the exploration of the parameter space to occur sequentially. Often, it is desirable to propose batches of parameter values to explore simultaneously. This is particularly the case when large parallel processing facilities are available; these could be either computational resources or physical facets of the process being optimized. Batch methods, however, require modeling the interaction between the different evaluations in the batch, which can be expensive in complex scenarios. In this section, I will summarize recent batch Bayesian optimization models and discuss the strengths and weaknesses of each approach.
Part II.2: Recent Advances in Bayesian Optimization - High Dimensional Bayesian Optimization.
Standard Bayesian optimization is limited to roughly 10 dimensions. Scaling BO methods to handle functions in high dimensions presents two main challenges. First, the number of observations required by the GP grows exponentially as the input dimension increases. This implies that more experimental evaluations are required, which is often expensive and infeasible in real applications. Second, globally optimizing the acquisition function in high dimensions is intrinsically a hard problem and can be prohibitively expensive. I will discuss recent advances in Bayesian optimization techniques for high dimensional settings.
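One illustrative line of work addresses both challenges with random embeddings (in the spirit of REMBO): a fixed random matrix maps a low-dimensional search space into the original high-dimensional one, so a standard BO loop only ever operates in the low-dimensional space. The dimensions, bounds, and mapping below are assumed placeholders, sketched only to show the mechanism.

```python
# Minimal sketch of BO via a random embedding: run any standard BO loop
# over the d-dimensional variable y, and evaluate f only through the
# fixed random map A.
import numpy as np

D, d = 100, 5                        # ambient and embedding dimensions (assumed)
rng = np.random.default_rng(0)
A = rng.normal(size=(D, d))          # fixed random embedding matrix

def to_high_dim(y_low, bounds=(-1.0, 1.0)):
    """Map a low-dimensional point into the original space and clip to the box."""
    x = A @ y_low
    return np.clip(x, *bounds)

def g(y_low, f):
    """Low-dimensional objective seen by the BO loop: g(y) = f(clip(A y))."""
    return f(to_high_dim(y_low))

# Any standard BO loop (e.g., the Part I sketch) can now optimize g over a
# d-dimensional box instead of f over the D-dimensional one.
```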
Part II.3: Recent Advances in Bayesian Optimization - Mixed Categorical-Continuous Bayesian Optimization.
Real-world optimization problems are typically of a mixed-variable nature, involving both continuous and categorical input variables. For example, tuning the hyperparameters of a deep neural network involves both continuous variables, e.g., learning rate and momentum, and categorical ones, e.g., optimizer type and activation function. Having a mixture of categorical and continuous variables presents unique challenges. If some inputs are categorical rather than continuous, then the common assumption that the BO acquisition function is differentiable and continuous over the input space, which allows it to be optimized efficiently, is no longer valid.
I will discuss recent advances in Bayesian optimization techniques for mixed categorical-continuous settings.
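As one simple baseline, a practitioner can one-hot encode the categorical variables into the surrogate's input and maximize the acquisition over the continuous variables separately for each categorical choice. The sketch below illustrates only this baseline; the encoding, helper names, and grid search are assumptions for illustration and are not the specific advances discussed in the tutorial.

```python
# Minimal sketch of a mixed categorical-continuous acquisition maximizer:
# enumerate categorical choices, one-hot encode them, and search the
# continuous variables within each choice.
import itertools
import numpy as np

def encode(cat_choice, x_cont, categories):
    """One-hot encode a categorical choice and concatenate the continuous part."""
    onehot = np.zeros(len(categories))
    onehot[categories.index(cat_choice)] = 1.0
    return np.concatenate([onehot, x_cont])

def propose_mixed(acquisition, categories, cont_grid):
    """Maximize `acquisition` over (category, continuous) pairs.

    `acquisition` scores an encoded input; `cont_grid` is a candidate grid
    for the continuous variables. Within a fixed category the encoding is
    constant, so a gradient-based optimizer could replace the grid search.
    """
    best = None
    for cat, x in itertools.product(categories, cont_grid):
        score = acquisition(encode(cat, x, categories))
        if best is None or score > best[0]:
            best = (score, cat, x)
    return best[1], best[2]
```

For instance, with categories = ["adam", "sgd"] and cont_grid a list of candidate learning rates, propose_mixed returns the optimizer choice and learning rate that jointly maximize the current acquisition function.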
Tutorial Outline and Motivation for Bayesian Optimization (5 min)
Part I. Bayesian Optimization (55 min)
Part II.1. Recent Advances in Bayesian Optimization - Batch Bayesian Optimization (15 min)
Part II.2. Recent Advances in Bayesian Optimization - High Dimensional Bayesian Optimization (15 min)
Part II.3. Recent Advances in Bayesian Optimization - Mixed Categorical-Continuous Optimization (20 min)
Future Research Directions and Q&A (20 min)
Tutorial slides can be downloaded here: BO_Part_1 and BO_Part_2
We do not require the audience to have strong background knowledge in Bayesian modelling. However, we expect the audience to already understand basic concepts and terminology in artificial intelligence, data mining, and machine learning.
Dr Vu Nguyen is currently a Senior Research Associate in the Machine Learning Research Group at the University of Oxford. He is working with Professor Michael Osborne and Professor Andrew Briggs on a machine learning project for tuning quantum devices using Bayesian optimization and deep reinforcement learning. Previously, he worked as a Research Scientist at Credit AI in Melbourne and was a postdoctoral researcher at Deakin University, where he obtained his PhD in 2015. He was the recipient of the ACML 2016 best paper award and IEEE ICDM 2017 best paper awards, and was one of 200 young researchers worldwide selected to attend the Heidelberg Laureate Forum 2015. His expertise is in Bayesian machine learning and Bayesian optimization, and he regularly publishes at ICML, NeurIPS, ICDM, IJCAI, AISTATS, and ACML.