Description

Bayesian optimization has emerged as an exciting sub-field of machine learning and artificial intelligence that is concerned with optimization using probabilistic methods. Systems implementing Bayesian optimization techniques have been successfully used to solve difficult problems in a diverse set of applications, including automatic tuning of machine learning algorithms, experimental design, and many other systems. Several recent advances in the methodology and theory underlying Bayesian optimization have extended the framework to new applications and provided greater insight into the behavior of these algorithms. Bayesian optimization is now increasingly used in industrial settings, providing new and interesting challenges that require new algorithms and theoretical insights. Therefore, I believe a tutorial on Bayesian optimization is timely, useful, and practical for the ACML audience, allowing both academia and industry to learn about recent advances in Bayesian optimization in a systematic manner. The tutorial consists of two main parts. In the first part, I will cover Bayesian optimization in the standard, simple setting in detail. In the second part, I will present recent advances in Bayesian optimization, including (1) batch Bayesian optimization, (2) high-dimensional Bayesian optimization, and (3) mixed categorical-continuous optimization. At the end of the talk, I will also outline possible future research directions in Bayesian optimization.

Part I: Bayesian Optimization.
Bayesian optimization is a sequential model-based approach to solving the global optimization problem for black-box functions. By a black-box function, we mean a function f that has no simple closed form but can be evaluated at any arbitrary query point x in the domain. The Bayesian optimization framework has two key ingredients. The first ingredient is a probabilistic surrogate model, which consists of a prior distribution that captures our beliefs about the behavior of the unknown objective function and an observation model that describes the data generation mechanism. The second ingredient is a loss function that describes how optimal a sequence of queries is; in practice, these loss functions often take the form of regret, either simple or cumulative. Ideally, the expected loss is then minimized to select an optimal sequence of queries. After observing the output of each query of the objective, the prior is updated to produce a more informative posterior distribution over the space of objective functions.
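
To make this loop concrete, below is a minimal sketch of standard sequential Bayesian optimization with a Gaussian process surrogate and the Expected Improvement acquisition function. It is illustrative only: the helper names (rbf_kernel, gp_posterior, expected_improvement), the toy objective, and the grid-based acquisition maximization are simplifications of my own and are not part of the MiniBO package.

```python
# A minimal sketch of the standard Bayesian optimization loop (maximization),
# assuming a zero-mean GP prior with an RBF kernel, a Gaussian observation model,
# and Expected Improvement maximized over a dense grid.
import numpy as np
from scipy.stats import norm

def rbf_kernel(A, B, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel between two sets of 1-D points."""
    d = A.reshape(-1, 1) - B.reshape(1, -1)
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """GP posterior mean and standard deviation at the query points Xs."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = rbf_kernel(Xs, Xs).diagonal() - np.sum(v ** 2, axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best_y):
    """Expected gain over the incumbent best_y (maximization form of EI)."""
    z = (mu - best_y) / sigma
    return (mu - best_y) * norm.cdf(z) + sigma * norm.pdf(z)

def objective(x):
    """Toy black-box function standing in for an expensive experiment."""
    return np.sin(3 * x) + 0.5 * np.cos(5 * x)

rng = np.random.default_rng(0)
X = rng.uniform(0, 2, size=3)      # small initial design
y = objective(X)
grid = np.linspace(0, 2, 500)      # acquisition maximized over this grid
for t in range(10):                # fit surrogate, maximize acquisition, evaluate, update
    mu, sigma = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X = np.append(X, x_next)
    y = np.append(y, objective(x_next))
print("Best input found:", X[np.argmax(y)], "best value:", y.max())
```

In higher dimensions the grid search over the acquisition function is replaced by a numerical optimizer, and the kernel hyperparameters are typically learned by maximizing the GP marginal likelihood rather than fixed as above.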

Part II.1: Recent Advances in Bayesian Optimization - Batch Bayesian Optimization.
Standard Bayesian optimization approaches only allow the exploration of the parameter space to occur sequentially. Often, it is desirable to propose batches of parameter values to explore simultaneously. This is particularly the case when large parallel processing facilities are available; these facilities may be either computational or physical parts of the process being optimized. Batch methods, however, require modeling the interaction between the different evaluations in the batch, which can be expensive in complex scenarios. In this section, I will summarize recent batch Bayesian optimization methods and discuss the strengths and weaknesses of each approach.
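
As an illustration of the peak-suppression idea, the sketch below implements a simplified Constant Liar heuristic in the spirit of [22]: each selected point is temporarily assigned a fictitious observation (a "lie"), so that the updated posterior suppresses the acquisition peak around it and the next batch member is pushed elsewhere. The surrogate and acquisition are passed in as functions (e.g., the gp_posterior and expected_improvement helpers from the Part I sketch); this is a simplified rendering, not a faithful reproduction of any published implementation.

```python
# A simplified Constant Liar batch-selection heuristic (in the spirit of [22]).
# gp_posterior(X, y, grid) must return the posterior mean and std on the grid;
# acquisition(mu, sigma, best_y) must return the acquisition values on the grid.
import numpy as np

def constant_liar_batch(X, y, grid, gp_posterior, acquisition, batch_size=4):
    """Select a batch of query points without evaluating the true objective."""
    X_fant, y_fant = X.copy(), y.copy()
    lie = y.max()                  # the constant fictitious observation value
    batch = []
    for _ in range(batch_size):
        mu, sigma = gp_posterior(X_fant, y_fant, grid)
        x_next = grid[np.argmax(acquisition(mu, sigma, y_fant.max()))]
        batch.append(x_next)
        # Pretend the lie was observed at x_next before choosing the next batch
        # member, which flattens the acquisition function around x_next.
        X_fant = np.append(X_fant, x_next)
        y_fant = np.append(y_fant, lie)
    return np.array(batch)
```

The returned batch can then be evaluated in parallel, and the real observations appended to the data before the next round of batch selection.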

Part II.2: Recent Advances in Bayesian Optimization - High Dimensional Bayesian Optimization
Standard Bayesian optimization is limited to roughly 10 dimensions. Scaling BO methods to functions in high dimensions presents two main challenges. First, the number of observations required by the GP grows exponentially as the input dimension increases; this implies more experimental evaluations, which are often expensive or infeasible in real applications. Second, globally optimizing a high-dimensional acquisition function is intrinsically hard and can be prohibitively expensive. I will discuss recent advances in Bayesian optimization techniques for high-dimensional settings.
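
To illustrate one of these directions, the sketch below shows the random low-dimensional embedding idea behind REMBO [36]: when the objective has low effective dimensionality, a random linear map from a small space R^d into the full space R^D lets standard BO run over only d variables. The box bounds, the clipping of projected points, and the toy objective here are simplifications of my own.

```python
# A minimal sketch of the random-embedding idea for high-dimensional BO [36].
import numpy as np

D, d = 100, 2                           # ambient and embedding dimensions
rng = np.random.default_rng(0)
A = rng.normal(size=(D, d))             # random embedding matrix, fixed up front

def embed(z, lower=-1.0, upper=1.0):
    """Map a low-dimensional point z into the box-constrained D-dimensional space."""
    return np.clip(A @ z, lower, upper)

def objective_high_dim(x):
    """Toy expensive objective in which only two of the D coordinates matter."""
    return -(x[0] - 0.3) ** 2 - (x[3] + 0.2) ** 2

# Standard low-dimensional BO (as in Part I) is now run over z in a small box,
# evaluating objective_high_dim(embed(z)) at each query.
z = rng.uniform(-1, 1, size=d)
print("Objective at a random embedded point:", objective_high_dim(embed(z)))
```

How to choose the low-dimensional box and how to handle points clipped at the boundary are the important practical questions; [36] discusses these choices in detail.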

Part II.3: Recent Advances in Bayesian Optimization - Mixed Categorical-Continuous Bayesian optimization
Real-world optimization problems are typically of a mixed-variable nature, involving both continuous and categorical input variables. For example, tuning the hyperparameters of a deep neural network involves both continuous variables, e.g., learning rate and momentum, and categorical ones, e.g., optimizer type and activation type. Having a mixture of categorical and continuous variables presents unique challenges. If some inputs are categorical rather than continuous, then the common assumption that the BO acquisition function is continuous and differentiable over the input space, which allows it to be optimized efficiently, no longer holds. I will discuss recent advances in Bayesian optimization techniques for mixed categorical-continuous settings.
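
As a simplified illustration of the two-level decomposition used by several of these methods (a bandit over the categorical choice, a continuous optimizer within each choice), the sketch below pairs a Thompson-sampling bandit over optimizer types with random search over the learning rate; the random search merely stands in for a per-category continuous BO step. The objective, the reward threshold, and all names are hypothetical, and this is not the exact CoCaBO algorithm of [2].

```python
# A toy illustration of the bandit-over-categories / continuous-search-within-category idea.
import numpy as np

rng = np.random.default_rng(0)
categories = ["adam", "sgd", "rmsprop"]           # categorical input (optimizer type)

def objective(category, lr):
    """Stand-in for validation accuracy as a function of optimizer type and learning rate."""
    peak = {"adam": 0.01, "sgd": 0.1, "rmsprop": 0.03}[category]
    return 1.0 - (np.log10(lr) - np.log10(peak)) ** 2 / 10.0

alpha = np.ones(len(categories))                  # Beta pseudo-counts of "good" outcomes
beta = np.ones(len(categories))                   # Beta pseudo-counts of "bad" outcomes
best = (-np.inf, None, None)
for t in range(30):
    arm = int(np.argmax(rng.beta(alpha, beta)))   # Thompson sampling over categories
    lr = 10 ** rng.uniform(-4, 0)                 # continuous proposal within that category
    score = objective(categories[arm], lr)
    if score > best[0]:
        best = (score, categories[arm], lr)
    # Crude binary reward: the arm succeeds if the score clears a fixed threshold.
    if score > 0.7:
        alpha[arm] += 1
    else:
        beta[arm] += 1
print("Best (score, optimizer, learning rate):", best)
```

Methods such as CoCaBO [2] make both levels principled, for example by using a bandit algorithm over the categorical choices and a GP surrogate whose kernel shares information across categories.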

Tutorial Outline

Tutorial Outline and Motivation for Bayesian Optimization (5 min)

  • Machine Learning Hyper-parameter Tuning and Experimental Design as Black-box Functions [3,18]
  • Bayesian Optimization for Optimizing Black-box Functions [23]

Part I. Bayesian Optimization (55 mins)

  • Bayesian Optimization [1,18,30,31]
  • Gaussian Processes [11,17]
  • Acquisition Function [5,7,15,19,20,25]
  • Applications of Bayesian Optimization [4,6,8,28,29]
  • Using Bayesian Optimization with the MiniBO package
  • Question and Answer

Part II.1. Recent Advances in Bayesian Optimization - Parallel Bayesian Optimization (15 min)

  • Introduction and Problem Statements
  • Existing Approaches in Batch Bayesian Optimization
  • Peak Suppression Approaches: Constant Liar, GP-BUCB, Local Penalization [21,22,23,24]
  • Budgeted Batch Bayesian Optimization for Unknown Batch Size [11]
  • Thompson Sampling for Batch Bayesian Optimization [27,31,32]
  • Asynchronous Batch Bayesian Optimization [33]

Part II.2. Recent Advances in Bayesian Optimization - High Dimensional Bayesian Optimization (15 min)

  • Introduction and Problem Statements
  • Existing Approaches in High Dimensional Bayesian Optimization [10,13]
  • High Dimensional Bayesian Optimization using Low Dimensional Embedding [36]
  • High Dimensional Bayesian Optimization using Additive Structure [35]
  • High Dimensional Bayesian Optimization using Local Optimization [34]

Part II.3. Recent Advances in Bayesian Optimization - Mixed Categorical-Continuous Optimization (20 min)

  • Introduction and Problem Statements
  • Existing Approaches in Mixed Categorical-Continuous Optimization [38]
  • Multi-armed Bandits [37]
  • Categorical-specific Continuous Optimization [14]
  • Categorical Continuous Optimization [2]

Future Research Directions and Q&A (20 mins)

  • Future Research Directions [3,4]
  • Question and Answer

Tutorial Slides and Video

Tutorial slides can be downloaded: BO_Part_1 and BO_Part_2

The video and discussion forum are available at: videolectures and ACML2020

MiniBO Package

The MiniBO package can be downloaded: Here or from Github

Prerequisites

We do not require the audience to have a strong background in Bayesian modelling. However, we expect the audience to already understand some basic concepts and terminology in artificial intelligence, data mining, and machine learning.

Presenters

Dr Vu Nguyen is currently a Senior Research Associate in the Machine Learning Research Group at the University of Oxford. He is working with Professor Michael Osborne and Professor Andrew Briggs on a machine learning project for tuning quantum devices using Bayesian optimization and deep reinforcement learning. Previously, he worked as a Research Scientist at Credit AI in Melbourne and was a postdoctoral researcher at Deakin University, where he obtained his PhD in 2015. He was the recipient of the ACML 2016 best paper award and an IEEE ICDM 2017 best paper award, and was one of 200 young researchers worldwide selected to attend the Heidelberg Laureate Forum 2015. His expertise is in Bayesian machine learning and Bayesian optimization, and he regularly publishes at ICML, NeurIPS, ICDM, IJCAI, AISTATS, and ACML.

References

  1. V. Nguyen, M. A. Osborne. Knowing The What But Not The Where in Bayesian Optimization. International Conference on Machine Learning (ICML), 2020.
  2. B. Ru, AS. Alvi, V. Nguyen, M. A. Osborne, SJ. Roberts. Bayesian Optimisation over Multiple Continuous and Categorical Inputs. International Conference on Machine Learning (ICML), 2020.
  3. J. Parker-Holder, V. Nguyen, SJ. Roberts. Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits. Advances in Neural Information Processing Systems (NeurIPS), 2020.
  4. V. Nguyen, S. Schulze, M. A. Osborne. Bayesian Optimization for Iterative Learning. Advances in Neural Information Processing Systems (NeurIPS) 2020.
  5. V. Nguyen, S. Gupta, S. Rana, M. Thai, C. Li, S. Venkatesh. Efficient Bayesian Optimization for Uncertainty Reduction over Perceived Optima Locations. Proceedings of the IEEE International Conference on Data Mining (ICDM), pp. 1270-1275, 2019.
  6. S. Gopakumar, S. Gupta, S. Rana, V. Nguyen, S. Venkatesh. Algorithmic Assurance: An Active Approach to Algorithmic Testing using Bayesian Optimisation. Advances in Neural Information Processing Systems (NeurIPS), pp. 5470-5478, 2018
  7. J. Berk, V. Nguyen, S. Gupta, S. Rana, S. Venkatesh. Exploration Enhanced Expected Improvement for Bayesian Optimization. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML), pp. 621-637, 2018.
  8. V. Nguyen, S. Gupta, S. Rana, C. Li, S. Venkatesh. Regret for Expected Improvement over the Best-Observed Value and Stopping Condition. Asian Conference on Machine Learning (ACML), 2017.
  9. V. Nguyen, S. Gupta, S. Rana, C. Li, S. Venkatesh. Bayesian Optimization in Weakly Specified Search Space. IEEE International Conference on Data Mining (ICDM), 2017.
  10. S. Rana, C. Li, S. Gupta, V. Nguyen, S. Venkatesh. High Dimensional Bayesian Optimization with Elastic Gaussian Process. International Conference on Machine Learning (ICML), pp 2883-2891, 2017.
  11. V. Nguyen, S. Rana, S. K. Gupta, C. Li, S. Venkatesh. Budgeted Batch Bayesian Optimization. IEEE International Conference on Data Mining (ICDM), pp 1107-1112, 2016.
  12. T. Le, K. Nguyen, V. Nguyen, T. D. Nguyen, D. Phung. GoGP: Fast Online Regression with Gaussian Processes. IEEE International Conference on Data Mining (ICDM), 2017.
  13. C. Li, S. Gupta, S. Rana, V. Nguyen, S. Venkatesh, A. Shilton. High Dimensional Bayesian Optimization Using Dropout. International Joint Conference on Artificial Intelligence (IJCAI), pp 2096-2102, 2017.
  14. S. Gopakumar, S. Gupta, S. Rana, V. Nguyen, S. Venkatesh. Algorithmic Assurance: An Active Approach to Algorithmic Testing using Bayesian Optimisation. Advances in Neural Information Processing Systems (NeurIPS), 2018.
  15. Srinivas, N., Krause, A., Kakade, S., & Seeger, M. Gaussian process optimization in the bandit setting: No regret and experimental design. International Conference on Machine Learning (ICML), pp. 1015–1022, 2010.
  16. Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., & de Freitas, N. Taking the human out of the loop: A review of bayesian optimization. Proceedings of the IEEE, 104(1), 148–175, 2016.
  17. Rasmussen, C. E., & Williams, C. K. I. Gaussian Processes for Machine Learning. MIT Press, 2006.
  18. Mockus, J., Tiesis, V., & Zilinskas, A. The application of Bayesian methods for seeking the extremum. Towards Global Optimization, 2, 117–129, 1978.
  19. Hernandez-Lobato, J. M., Hoffman, M. W., & Ghahramani, Z. Predictive entropy search for efficient global optimization of black-box functions. Advances in Neural Information Processing Systems (NIPS), pp. 918–926, 2014.
  20. Hennig, P., & Schuler, C. J. Entropy search for information-efficient global optimization. Journal of Machine Learning Research (JMLR), 13, 1809–1837, 2012.
  21. Gonzalez, J., Dai, Z., Hennig, P., & Lawrence, N. D. Batch bayesian optimization via local penalization. International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 648–657, 2016.
  22. Ginsbourger, D., Le Riche, R., & Carraro, L. A multi-points criterion for deterministic parallel global optimization based on gaussian processes, 2008.
  23. Desautels, T., Krause, A., & Burdick, J. W. Parallelizing exploration-exploitation tradeoffs in gaussian process bandit optimization. The Journal of Machine Learning Research (JMLR), 15(1), 3873–3923, 2014.
  24. Contal, E., Buffoni, D., Robicquet, A., & Vayatis, N. Parallel gaussian process optimization with upper confidence bound and pure exploration. In Machine Learning and Knowledge Discovery in Databases, pp. 225–240. Springer, 2013.
  25. Jones, D. R., Perttunen, C. D., & Stuckman, B. E. Lipschitzian optimization without the Lipschitz constant. Journal of Optimization Theory and Applications, 79(1), 157–181, 1993.
  26. Shahriari, B., Bouchard-Côté, A., & Freitas, N. Unbounded Bayesian optimization via regularization. International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 1168-1176, 2016.
  27. Rahimi, A. and Recht, B., Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems (NIPS), pp. 1177-1184, 2008.
  28. Li, C., de Celis Leal, D.R., Rana, S., Gupta, S., Sutti, A., Greenhill, S., Slezak, T., Height, M. and Venkatesh, S.. Rapid Bayesian optimisation for synthesis of short polymer fiber materials. Scientific reports, 7(1), p.5683, 2017.
  29. Snoek, J., Larochelle, H., & Adams, R. P. Practical bayesian optimization of machine learning algorithms. Advances in Neural Information Processing Systems (NIPS), pp. 2951-2959, 2012.
  30. Brochu, E., Cora, V. M., & De Freitas, N. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv:1012.2599, 2010.
  31. Hernández-Lobato, J. M., Requeima, J., Pyzer-Knapp, E. O., & Aspuru-Guzik, A. Parallel and Distributed Thompson Sampling for Large-scale Accelerated Exploration of Chemical Space. International Conference on Machine Learning (ICML) 2017.
  32. Kandasamy, K., Krishnamurthy, A., Schneider, J., & Póczos, B,. Parallelised Bayesian Optimisation via Thompson Sampling. International Conference on Artificial Intelligence and Statistics (AISTATS) 2018.
  33. Alvi, A., Ru, B., Calliess, J. P., Roberts, S., & Osborne, M. A. Asynchronous Batch Bayesian Optimisation with Improved Local Penalisation. In International Conference on Machine Learning (ICML) 2019.
  34. Eriksson, D., Pearce, M., Gardner, J., Turner, R. D., & Poloczek, M. (2019). Scalable global optimization via local bayesian optimization. In Advances in Neural Information Processing Systems (NeurIPS) 2019.
  35. Kandasamy, K., Schneider, J., & Póczos, B. High dimensional Bayesian optimisation and bandits via additive models. In International conference on machine learning (ICML) 2015.
  36. Wang, Z., Zoghi, M., Hutter, F., Matheson, D. and De Freitas, N. Bayesian optimization in high dimensions via random embeddings. In International Joint Conference on Artificial Intelligence (IJCAI) 2013.
  37. Auer, P., Cesa-Bianchi, N., Freund, Y., & Schapire, R. E. The non-stochastic multi-armed bandit problem. SIAM Journal on Computing, 32(1), 48–77, 2002.
  38. Daxberger, E., Makarova, A., Turchetta, M., & Krause, A. Mixed-Variable Bayesian Optimization. In International Joint Conference on Artificial Intelligence (IJCAI), 2020