R. Cohen Kadosh, T. Reed, V. Nguyen, and N. van Bueren Method for obtaining personalized parameters for transcranial stimulation, transcranial system, method of applying transcranial stimulation. UK Patent Application Number 2000874.4
Method for obtaining personalized parameters for transcranial stimulation, transcranial system, method of applying transcranial stimulation
A. Soen, H. Husain, P. Schulz, V. Nguyen Rejection via Learning Density Ratios Advances in Neural Information Processing Systems (NeurIPS), 2024. Abstract
Rejection via Learning Density Ratios
Classification with rejection emerges as a learning paradigm which allows models to
abstain from making predictions. The predominant approach is to alter the supervised
learning pipeline by augmenting typical loss functions, letting model rejection incur a lower
loss than an incorrect prediction. Instead, we propose a different distributional perspective,
where we seek to find an idealized data distribution which maximizes a pretrained model’s
performance. This can be formalized via the optimization of a loss’s risk with a φ-divergence
regularization term. Through this idealized distribution, a rejection decision can be made
by utilizing the density ratio between this distribution and the data distribution. We focus
on the setting where our φ-divergences are specified by the family of α-divergence. Our
framework is tested empirically over clean and noisy datasets
Y. Liu, T. Ajanthan, H. Husain, V. Nguyen Self-Supervision Improves Diffusion Models For Tabular Data Imputation ACM International Conference on Information and Knowledge Management (CIKM), 2024.
Preliminary version appears at Workshop on AI4DifferentialEquations in Science, (ICLR), 2024. Abstract
Self-Supervision Improves Diffusion Models For Tabular Data Imputation
Incomplete tabular datasets are ubiquitous in many applications
for a number of reasons such as human error in data collection or
privacy considerations. One would expect a natural solution for this
is to utilize powerful generative models such as diffusion models,
which have demonstrated great potential across image and continuous domains. However, vanilla diffusion models often exhibit
sensitivity to initialized noises. This, along with the natural sparsity inherent in the tabular domain, poses challenges for diffusion
models, thereby impacting the robustness of these models for data
imputation. In this work, we propose an advanced diffusion model
named Self-supervised imputation Diffusion Model (SimpDM for
brevity), specifically tailored for tabular data imputation tasks. To
mitigate sensitivity to noise, we introduce a self-supervised alignment mechanism that aims to regularize the model, ensuring consistent and stable imputation predictions. Furthermore, we introduce
a carefully devised state-dependent data augmentation strategy
within SimpDM, enhancing the robustness of the diffusion model
when dealing with limited data. Extensive experiments demonstrate
that SimpDM matches or outperforms state-of-the-art imputation
methods across various scenarios
M. Adachi, S. Hayakawa, M. Jørgensen, X. Wan, V. Nguyen, H. Oberhauser, M.A. Osborne Adaptive Batch Sizes in Active Learning: A Probabilistic Numerics Approach International Conference on Artificial Intelligence and Statistics (AISTATS), 2024. Abstract
Adaptive Batch Sizes in Active Learning: A Probabilistic Numerics Approach
Active learning parallelization is widely used, but typically relies on fixing the batch size throughout experimentation. This fixed approach is inefficient because of a dynamic trade-off between cost and speed---larger batches are more costly, smaller batches lead to slower wall-clock run-times---and the trade-off may change over the run (larger batches are often preferable earlier). To address this trade-off, we propose a novel Probabilistic Numerics framework that adaptively changes batch sizes.
We define batch construction as a \emph{quantization} task---a task of approximating a continuous target distribution (e.g. acquisition function), with a discrete distribution (batch samples). We measure the batch quality through the divergence between these two distributions. Instead of a set batch size, we fix the \emph{precision} of approximation, allowing dynamic adjustment of batch size and query locations. We also extend this to scenarios with probabilistic constraints, interpreting constraint violations as reductions in the precision requirement, to subsequently adapt batch construction.
Through extensive experiments, we demonstrate that our approach significantly enhances learning efficiency and flexibility in diverse Bayesian batch active learning and optimization applications.
H. Phan, B. Kim, V. Nguyen, A. Bydlon, Q. Tang, C.-C. Kao, C. Wang Cross-triggering Issue in Audio Event Detection and Mitigation IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024. Abstract
Cross-triggering Issue in Audio Event Detection and Mitigation
Cross-triggering is a critical problem for applications of audio event detection (AED), particularly in low-resource settings. However, not much attention (if not none) has been paid to this problem in the AED research community. In this work, we tackle this problem via a regularization approach. We propose a regularizer, namely mutual exclusivity regularizer, that is able to enforce pairwise exclusivity between two event classes when they do not co-occur. When the regularizer is added to the loss function for network training, in effect, the increase in the score of one event class will result in the decrease of the other and vice versa. To quantify the effectiveness of the proposed regularizer, we developed an AED system based on convolutional neural network (CNN) for the detection of hand clap and door knock, two transient audio events that share similar spectro-temporal profiles, and conducted experiments on a large-scale real-world dataset (around 274.2 hours). The experimental results show that the proposed approach is able to largely mitigate the cross-triggering issue in various experimental settings. Further-more, the reduction in cross-triggering, as a result, leads to improvement in the detection performance.
L. Ngo, H. Ha, J. Chan, V. Nguyen, H. Zhang High-dimensional bayesian optimization via covariance matrix adaptation strategy Transactions on Machine Learning Research (TMLR), 2024. Abstract
High-dimensional bayesian optimization via covariance matrix adaptation strategy
Bayesian Optimization (BO) is an effective method for finding the global optimum of expensive black-box functions. However, it is well known that applying BO to high-dimensional optimization problems is challenging. To address this issue, a promising solution is to use a local search strategy that partitions the search domain into local regions with high likelihood of containing the global optimum, and then use BO to optimize the objective function within these regions. In this paper, we propose a novel technique for defining the local regions using the Covariance Matrix Adaptation (CMA) strategy. Specifically, we use CMA to learn a search distribution that can estimate the probabilities of data points being the global optimum of the objective function. Based on this search distribution, we then define the local regions consisting of data points with high probabilities of being the global optimum. Our approach serves as a meta-algorithm as it can incorporate existing black-box BO optimizers, such as BO, TuRBO (Eriksson et al., 2019), and BAxUS (Papenmeier et al., 2022), to find the global optimum of the objective function within our derived local regions. We evaluate our proposed method on various benchmark synthetic and real-world problems. The results demonstrate that our method outperforms existing state-of-the-art techniques.
Distributionally Robust Bayesian Optimization with $\varphi$-divergences
The study of robustness has received much attention due to its inevitability in
data-driven settings where many systems face uncertainty. One such example of
concern is Bayesian Optimization (BO), where uncertainty is multi-faceted, yet
there only exists a limited number of works dedicated to this direction. In particular,
there is the work of Kirschner et al. [23], which bridges the existing literature of
Distributionally Robust Optimization (DRO) by casting the BO problem from the
lens of DRO. While this work is pioneering, it admittedly suffers from various
practical shortcomings such as finite contexts assumptions, leaving behind the
main question Can one devise a computationally tractable algorithm for solving
this DRO-BO problem? In this work, we tackle this question to a large degree of
generality by considering robustness against data-shift in ϕ-divergences, which
subsumes many popular choices, such as the χ2-divergence, Total Variation, and
the extant Kullback-Leibler (KL) divergence. We show that the DRO-BO problem
in this setting is equivalent to a finite-dimensional optimization problem which,
even in the continuous context setting, can be easily implemented with provable
sublinear regret bounds. We then show experimentally that our method surpasses
existing methods, attesting to the theoretical results.
Y. Zuo, V. Nguyen, A. Dezfouli, D. Alexander, B. Muir, I. Chades Mixed-Variable Black-Box Optimisation Using Value Proposal Trees Annual AAAI Conference on Artificial Intelligence (AAAI), 2023. Abstract
Mixed-Variable Black-Box Optimisation Using Value Proposal Trees
Many real-world optimisation problems are defined over both categorical and continuous variables, yet efficient optimisation methods such as Bayesian Optimisation (BO) are ill-equipped to handle such mixed-variable search spaces.
The optimisation breadth introduced by categorical variables in the mixed-input setting has seen recent approaches operating on local trust regions, but these methods can be greedy in suboptimal regions of the search space.
In this paper, we adopt a holistic view and aim to consolidate optimisation of the categorical and continuous sub-spaces under a single acquisition metric. We develop a tree-based method which retains a global view of the optimisation spaces by identifying regions in the search space with high potential candidates which we call value proposals.
Our method uses these proposals to make selections on both the categorical and continuous components of the input.
We show that this approach significantly outperforms existing mixed-variable optimisation approaches across several mixed-variable black-box optimisation tasks.
T. Bach, A. Tong, S. Hy, V. Nguyen, T. Nguyen Global Contrastive Learning for Long-Tailed Classification Transactions on Machine Learning Research (TMLR), 2023. Abstract
Global Contrastive Learning for Long-Tailed Classification
We consider the long-tailed classification problem in which a few classes in the training data dominate the majority of the other classes. For concreteness, we focus on the visual domain in this paper. Most current methods employ contrastive learning to learn a representation for long-tailed data. In this paper, first, we investigate
-positive sampling, a popular baseline method widely used to build contrastive learning models for imbalanced data. Previous works show that
-positive learning, which only chooses
positive samples (instead of all positive images) for each query image, suffers from inferior performance in long-tailed data. In this work, we further point out that k-positive learning limits the learning capability of both head and tail classes. Based on this perspective, we propose a novel contrastive learning framework that improves the limitation in k-positive learning by enlarging its positive selection space, so it can help the model learn more semantic discrimination features. Second, we analyze how the temperature (the hyperparameter used for tuning a concentration of samples on feature space) affects the gradients of each class in long-tailed learning, and propose a new method that can mitigate inadequate gradients between classes, which can help model learning easier. We name this framework as CoGloAT. Finally, we go on to introduce a new prototype learning framework namely ProCo based on coreset selection, which creates a global prototype for each cluster while keeping the computation cost within a reasonable time and show that combining CoGloAT with ProCo can further enhance the model learning ability on long-tailed data.
Semi-supervised learning is a critical tool in reducing machine learning’s dependence on labeled data. It has, however, been applied primarily to image and
language data, by exploiting the inherent spatial and semantic structure therein.
These methods do not apply to tabular data because these domain structures are not
available. Existing pseudo-labeling (PL) methods can be effective for tabular data
but are vulnerable to noise samples and to greedy assignments given a predefined
threshold which is unknown. This paper addresses this problem by proposing a
Confident Sinkhorn Allocation (CSA), which assigns labels to only samples with
high confidence scores and learns the best label allocation via optimal transport.
CSA outperforms the current state-of-the-art in this practically important area.
J. Parker-Holder*, R. Rajan*, X. Song*, A. Biedenkapp, Y. Miao, T. Eimer, B. Zhang, V. Nguyen, R. Calandra, A. Faust, F. Hutter, M. Lindauer Automated Reinforcement Learning (AutoRL): A Survey and Open Problems
Journal of Artificial Intelligence Research (JAIR), 2022. Abstract
Automated Reinforcement Learning (AutoRL): A Survey and Open Problems
The combination of Reinforcement Learning (RL) with deep learning has led to a series of impressive feats, with many believing (deep) RL provides a path towards generally capable agents. However, the success of RL agents is often highly sensitive to design choices in the training process, which may require tedious and error-prone manual tuning. This makes it challenging to use RL for new problems, while also limits its full potential. In many other areas of machine learning, AutoML has shown it is possible to automate such design choices and has also yielded promising initial results when applied to RL. However, Automated Reinforcement Learning (AutoRL) involves not only standard applications of AutoML but also includes additional challenges unique to RL, that naturally produce a different set of methods. As such, AutoRL has been emerging as an important area of research in RL, providing promise in a variety of applications from RNA design to playing games such as Go. Given the diversity of methods and environments considered in RL, much of the research has been conducted in distinct subfields, ranging from meta-learning to evolution. In this survey we seek to unify the field of AutoRL, we provide a common taxonomy, discuss each area in detail and pose open problems which would be of interest to researchers going forward.
X. Wan, C. Lu, J. Parker-Holder, P. J. Ball, V. Nguyen, B. Ru, M. Osborne Bayesian Generational Population-Based Training
International Conference on Automated Machine Learning (AutoML), 2022. Preliminary version at ALOE workshop, ICLR 2022. Abstract
Bayesian Generational Population-Based Training
Reinforcement learning (RL) offers the potential for training generally capable
agents that can interact autonomously in the real world. However, one key limitation
is the brittleness of RL algorithms to core hyperparameters and network architecture
choice. Furthermore, non-stationarities such as evolving training data and increased
agent complexity mean that different hyperparameters and architectures may be
optimal at different points of training. This motivates AutoRL, a class of methods
seeking to automate these design choices. One prominent class of AutoRL methods
is Population-Based Training (PBT), which have led to impressive performance
in several large scale settings. In this paper, we introduce two new innovations in
PBT-style methods. First, we employ trust-region based Bayesian Optimization,
enabling full coverage of the high-dimensional mixed hyperparameter search
space. Second, we show that using a generational approach, we can also learn
both architectures and hyperparameters jointly on-the-fly in a single training run.
Leveraging the new highly parallelizable Brax physics engine, we show that these
innovations lead to large performance gains, significantly outperforming the tuned
baseline while learning entire configurations on the fly.
A. Long, W. Yin, T. Ajanthan, V. Nguyen, P. Purkait, R. Garg, A. Blair, C. Shen, A. van den Hengel Retrieval Augmented Classification for Long-Tail Visual Recognition IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022. Abstract
Retrieval Augmented Classification for Long-Tail Visual Recognition
We introduce Retrieval Augmented Classification (RAC), a generic approach to augmenting standard image classification pipelines with an explicit retrieval module.
RAC consists of a standard base image encoder fused with a parallel retrieval branch that queries a non-parametric external memory of pre-encoded images and associated text snippets. We apply RAC to the problem of long-tail classification and demonstrate a significant improvement over previous state-of-the-art on Places365-LT and iNaturalist-2018 (14.5% and 6.7% respectively), despite using only the training datasets themselves as the external information source. We demonstrate that RAC's retrieval module, without prompting, learns a high level of accuracy on tail classes. This, in turn, frees the base encoder to focus on common classes, and improve its performance thereon. RAC represents an alternative approach to utilizing large, pretrained models without requiring fine-tuning, as well as a first step towards more effectively making use of external memory within common computer vision architectures.
J. Parker-Holder, V. Nguyen, S. Desai, S. Roberts Tuning Mixed Input Hyperparameters on the Fly for Efficient Population Based AutoRL
Advances in Neural Information Processing Systems (NeurIPS), 2021. Abstract
Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits
Despite a series of recent successes in reinforcement learning (RL), many RL algorithms remain sensitive to hyperparameters. As such, there has recently been interest in the field of AutoRL, which seeks to automate design decisions to create more general algorithms. Recent work suggests that population based approaches may be effective AutoRL algorithms, by learning hyperparameter schedules on the fly. In particular, the PB2 algorithm is able to achieve strong performance in RL tasks by formulating online hyperparameter optimization as time varying GP-bandit problem, while also providing theoretical guarantees. However, PB2 is only designed to work for continuous hyperparameters, which severely limits its utility in practice. In this paper we introduce a new (provably) efficient hierarchical approach for optimizing both continuous and categorical variables, using a new time-varying bandit algorithm specifically designed for the population based training regime. We evaluate our approach on the challenging Procgen benchmark, where we show that explicitly modelling dependence between data augmentation and other hyperparameters improves generalization..
M. Ahrens*, J. Ashwin*, J. Calliess, V. Nguyen Bayesian Topic Regression for Causal Inference
Empirical Methods in Natural Language Processing (EMNLP), 2021. Abstract
Bayesian Topic Regression for Causal Inference
Causal inference using observational text data
is becoming increasingly popular in many research areas. This paper presents the Bayesian
Topic Regression (BTR) model that uses both
text and numerical information to model an
outcome variable. It allows estimation of both
discrete and continuous treatment effects. Furthermore, it allows for the inclusion of additional numerical confounding factors next to
text data. To this end, we combine a supervised Bayesian topic model with a Bayesian
regression framework and perform supervised
representation learning for the text features
jointly with the regression parameter training, respecting the Frisch-Waugh-Lovell theorem. Our paper makes two main contributions. First, we provide a regression framework that allows causal inference in settings
when both text and numerical confounders are
of relevance. We show with synthetic and
semi-synthetic datasets that our joint approach
recovers ground truth with lower bias than any
benchmark model, when text and numerical
features are correlated. Second, experiments
on two real-world datasets demonstrate that
a joint and supervised learning strategy also
yields superior prediction results compared to
strategies that estimate regression weights for
text and non-text features separately, being
even competitive with more complex deep neural networks.
N. E. R. van Bueren ,T. L. Reed , V. Nguyen, J. G. Sheffield, S. H. G. van der Ven, M. A. Osborne, E. H. Kroesbergen, R. Cohen Kadosh Personalized brain stimulation for effective neurointervention across participants PLOS Computational Biology, 17(9), e1008886, 2021. Abstract
Personalized brain stimulation for effective neurointervention across participants
Accumulating evidence from human-based research has highlighted that the prevalent one-size-fits-all approach for neural and behavioral interventions is inefficient.
This approach can benefit one individual, but be ineffective or even detrimental for another.
Studying the efficacy of the large range of different parameters for different individuals is costly, time-consuming and requires a large sample size that makes such research impractical and hinders effective interventions. Here an active machine learning technique is presented across participants—personalized Bayesian optimization (pBO)—that searches available parameter combinations to optimize an intervention as a function of an individual’s ability. This novel technique was utilized to identify transcranial alternating current stimulation (tACS) frequency and current strength combinations most likely to improve arithmetic performance, based on a subject’s baseline arithmetic abilities. The pBO was performed across all subjects tested, building a model of subject performance, capable of recommending parameters for future subjects based on their baseline arithmetic ability. pBO successfully searches, learns, and recommends parameters for an effective neurointervention as supported by behavioral, stimulation, and neural data.
The application of pBO in human-based research opens up new avenues for personalized and more effective interventions, as well as discoveries of protocols for treatment and translation to other clinical and non-clinical domains.
V. Nguyen*, S. B. Orbell*, D. T. Lennon, H. Moon, F. Vigneau, L. C. Camenzind, L. Yu, D. M. Zumbühl, G. A. D. Briggs, M. A. Osborne, D. Sejdinovic, N. Ares Deep reinforcement learning for efficient measurement of quantum devices NPJ Quantum Information, 7(1), pp.1-9, 2021. Abstract
Deep reinforcement learning for efficient measurement of quantum devices
Deep reinforcement learning is an emerging machine-learning approach that can teach a computer to learn from their actions and rewards similar to the way humans learn from experience.
It offers many advantages in automating decision processes to navigate large parameter spaces. This paper proposes an approach to the efficient measurement of quantum devices based on deep reinforcement learning.
We focus on double quantum dot devices, demonstrating the fully automatic identification of specific transport features called bias triangles.
Measurements targeting these features are difficult to automate, since bias triangles are found in otherwise featureless regions of the parameter space. Our algorithm identifies bias triangles in a mean time of <30 min, and sometimes as little as 1 min. This approach, based on dueling deep Q-networks, can be adapted to a broad range of devices and target transport features.
This is a crucial demonstration of the utility of deep reinforcement learning for decision making in the measurement and operation of quantum devices.
M. C. Tran, V. Nguyen, R. Bruce, D. C. Crockett, F. Formenti, P.A. Phan, S. J. Payne, A. D. Farmery Simulation-based Optimisation to Quantify Heterogeneity of Specific Ventilation and Perfusion in the Lung by the Inspired Sinewave Test Scientific Reports, 11(1), pp.1-10, 2021. Abstract
Simulation-based optimisation to quantify heterogeneity of specific ventilation and perfusion in the lung by the Inspired Sinewave Test
The degree of specific ventilatory heterogeneity (spatial unevenness of ventilation) of the lung is a useful marker of early structural lung changes which has the potential to detect early-onset disease.
The Inspired Sinewave Test (IST) is an established noninvasive ‘gas-distribution’ type of respiratory test capable of measuring the cardiopulmonary parameters.
We developed a simulation-based optimisation for the IST, with a simulation of a realistic heterogeneous lung, namely a lognormal distribution of spatial ventilation and perfusion.
We tested this method in datasets from 13 anaesthetised pigs (pre and post-lung injury) and 104 human subjects (32 healthy and 72 COPD subjects).
The 72 COPD subjects were classified into four COPD phenotypes based on ‘GOLD’ classification.
This method allowed IST to identify and quantify heterogeneity of both ventilation and perfusion, permitting diagnostic distinction between health and disease states.
In healthy volunteers, we show a linear relationship between the ventilatory heterogeneity versus age (R^2=0.42).
In a mechanically ventilated pig, IST ventilatory heterogeneity in noninjured and injured lungs was significantly different (p<0.0001).
Additionally, measured indices could accurately identify patients with COPD (area under the receiver operating characteristic curve is 0.76, p<0.0001).
The IST also could distinguish different phenotypes of COPD with 73% agreement with spirometry.
S. Kessler, V. Nguyen, S. Zohren, S. Robert Hierarchical Indian Buffet Neural Networks for Bayesian Continual Learning
Uncertainty in Artificial Intelligence, (UAI), pp. 749-759, 2021. Preliminary version at Bayesian Deep Learning workshop,NeurIPS 2019. Abstract
Hierarchical Indian Buffet Neural Networks for Bayesian Continual Learning
We place an Indian Buffet process (IBP) prior
over the structure of a Bayesian Neural Network (BNN), thus allowing the complexity
of the BNN to increase and decrease automatically. We further extend this model such
that the prior on the structure of each hidden
layer is shared globally across all layers, using
a Hierarchical-IBP (H-IBP). We apply this
model to the problem of resource allocation in
Continual Learning (CL) where new tasks occur and the network requires extra resources.
Our model uses online variational inference
with reparameterisation of the Bernoulli and
Beta distributions, which constitute the IBP
and H-IBP priors. As we automatically learn
the number of weights in each layer of the
BNN, overfitting and underfitting problems
are largely overcome. We show empirically
that our approach offers a competitive edge
over existing methods in CL.
X. Wan, V. Nguyen, H. Ha, B. Ru, C. Lu, M. A. Osborne Think Global and Act Local: Bayesian Optimisation over
High-Dimensional Categorical and Mixed Search Spaces
International Conference on Machine Learning (ICML), pp. 10663-10674, 2021. Abstract
Think Global and Act Local: Bayesian Optimisation over
High-Dimensional Categorical and Mixed Search Spaces
High-dimensional black-box optimisation remains an important yet notoriously challenging problem. Despite the success of Bayesian optimisation methods on continuous domains, domains that are categorical, or that mix continuous and categorical variables, remain challenging. We propose a novel solution
– we combine local optimisation with a tailored kernel design, effectively handling highdimensional categorical and mixed search spaces, whilst retaining sample efficiency. We further derive convergence guarantee for the proposed approach. Finally, we demonstrate empirically that our method outperforms the current baselines on a variety of synthetic
and real-world tasks in terms of performance, computational costs, or both.
V. Nguyen*, T. Le*, M. Yamada, M. A. Osborne Optimal Transport Kernels for Sequential and Parallel Neural Architecture Search
International Conference on Machine Learning (ICML), pp. 8084-8095, 2021. Abstract
Optimal Transport Kernels for Sequential and Parallel Neural Architecture Search
Neural architecture search (NAS) automates the design of deep neural networks. One of the main challenges in searching complex and non-continuous architectures is to compare the similarity of networks that the conventional Euclidean metric may fail to capture. Optimal transport (OT) is resilient to such complex structure by considering the minimal cost for transporting a network into another. However, the OT is generally not negative definite which may limit its ability to build the positive-definite kernels required in many kernel-dependent frameworks. Building upon tree-Wasserstein (TW), which is a negative definite variant of OT, we develop a novel discrepancy for neural architectures, and demonstrate it within a Gaussian process surrogate model for the sequential NAS settings. Furthermore, we derive a novel parallel NAS, using quality k-determinantal point process on the GP posterior, to select diverse and high-performing architectures from a discrete set of candidates. Empirically, we demonstrate that our TW-based approaches outperform other baselines in both sequential and parallel NAS.
V. Nguyen, V. Masrani, R. Brekelmans, M. A. Osborne, F. Wood Gaussian Process Bandit Optimization of the Thermodynamic Variational Objective
Advances in Neural Information Processing Systems (NeurIPS), pp. 5764-5775, 2020. Abstract
Gaussian Process Bandit Optimization of the Thermodynamic Variational Objective
Achieving the full promise of the Thermodynamic Variational Objective (TVO), a recently proposed variational inference objective that lower-bounds the log evidence via one-dimensional Riemann integration, requires choosing a ``schedule'' of sorted discretization points. This paper introduces a bespoke Gaussian process bandit optimization method for automatically choosing these points. Our approach not only automates their one-time selection, but also dynamically adapts their positions over the course of optimization, leading to improved model learning and inference. We provide theoretical guarantees that our bandit optimization converges to the regret-minimizing choice of integration points. Empirical validation of our algorithm is provided in terms of improved learning and inference in Variational Autoencoders and sigmoid belief networks.
V. Nguyen*, S. Schulze*, M. A. Osborne Bayesian Optimisation for Iterative Learning
Advances in Neural Information Processing Systems (NeurIPS), pp. 9361-9371, 2020. Preliminary version at 7th AutoML Workshop at International Conference on Machine Learning (ICML), 2020. Abstract
Bayesian Optimization for Iterative Learning
The success of deep (reinforcement) learning systems crucially depends on the correct choice of hyperparameters which are notoriously sensitive and expensive to evaluate. Training these systems typically requires running iterative processes over multiple epochs or episodes. Traditional approaches only consider final performances of a hyperparameter although intermediate information from the learning curve is readily available. In this paper, we present a Bayesian optimization approach which exploits the iterative structure of learning algorithms for efficient hyperparameter tuning. First, we transform each training curve into a numeric score. Second, we selectively augment the data using the auxiliary information from the curve. This augmentation step enables modeling efficiency while preventing the ill-conditioned issue of Gaussian process covariance matrix happened when adding the whole curve. We demonstrate the efficiency of our algorithm by tuning hyperparameters for the training of deep reinforcement learning agents and convolutional neural networks. Our algorithm outperforms all existing baselines in identifying optimal hyperparameters in minimal time.
J. Parker-Holder, V. Nguyen, S. Roberts Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits
Advances in Neural Information Processing Systems (NeurIPS), pp. 17200-17211, 2020. Preliminary version at 7th AutoML Workshop at International Conference on Machine Learning (ICML), 2020. Abstract
Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits
Selecting optimal hyperparameters is a key challenge in machine learning. A recent approach to this problem (PBT) showed it is possible to achieve impressive performance by updating both weights and hyperparameters in a single training run of a population of agents. Despite it's success, PBT relies on heuristics to explore the hyperparameter space, thus lacks theoretical guarantees, requires vast computational resources and often suffers from mode collapse when this is not available. In this work we introduce Population-Based Bandits (PB2), the first provably efficient PBT-style algorithm. PB2 uses a probabilistic model to balance exploration and exploitation, thus it is able to discover high performing hyperparameter configurations with far fewer agents than typically required by PBT.
V. Nguyen, M. A. Osborne Knowing The What But Not The Where in Bayesian Optimization
International Conference on Machine Learning (ICML), pp 7317-7326, 2020. Abstract
Knowing The What But Not The Where in Bayesian Optimization
Bayesian optimization has demonstrated impressive success in finding the optimum location x* and value f*=f(x*)=max f(x) of the black-box function f.
In some applications, however, the optimum value is known in advance and the goal is to find the corresponding optimum location.
Existing work in Bayesian optimization (BO) has not effectively exploited the knowledge of f* for optimization.
In this paper, we consider a new setting in BO in which the knowledge of the optimum value is available. Our goal is to exploit the knowledge about f* to search for the location x* efficiently.
To achieve this goal, we first transform the Gaussian process surrogate using the information about the optimum value.
Then, we propose two acquisition functions, called confidence bound minimization and expected regret minimization, which exploit the knowledge about the optimum value to identify the optimum location efficiently.
We show that our approaches work both intuitively and quantitatively achieve better performance against standard BO methods.
We demonstrate real applications in tuning a deep reinforcement learning algorithm on the CartPole problem and XGBoost on Skin Segmentation dataset in which the optimum values are publicly available.
B. Ru*, AS. Alvi*, V. Nguyen, M. A. Osborne, SJ. Roberts Bayesian Optimisation over Multiple Continuous and Categorical Inputs
International Conference on Machine Learning (ICML), pp 8276-8285, 2020. Abstract
Bayesian Optimisation over Multiple Continuous and Categorical Inputs
Efficient optimisation of black-box problems that comprise both continuous and categorical inputs is important, yet poses significant challenges. We propose a new approach, Continuous and Categorical Bayesian Optimisation (CoCaBO), which combines the strengths of multi-armed bandits and Bayesian optimisation to select values for both categorical and continuous inputs. We model this mixed-type space using a Gaussian Process kernel, designed to allow sharing of information across multiple categorical variables, each with multiple possible values; this allows CoCaBO to leverage all available data efficiently. We extend our method to the batch setting and propose an efficient selection procedure that dynamically balances exploration and exploitation whilst encouraging batch diversity. We demonstrate empirically that our method outperforms existing approaches on both synthetic and real-world optimisation tasks with continuous and categorical inputs.
N. van Esbroeck, D. Lennon, H. Moon, V. Nguyen, F. Vigneau, LC. Camenzind, L. Yu, DM. Zumbühl, G. A. D. Briggs, D. Sejdinovic, N. Ares Quantum device fine-tuning using unsupervised embedding learning
New Journal of Physics, 22.9 (2020): 095003. Abstract
Quantum device fine-tuning using unsupervised embedding learning
Quantum devices with a large number of gate electrodes allow for precise control of device
parameters. This capability is hard to fully exploit due to the complex dependence of these
parameters on applied gate voltages. We experimentally demonstrate an algorithm capable of
fine-tuning several device parameters at once. The algorithm acquires a measurement and assigns
it a score using a variational auto-encoder. Gate voltage settings are set to optimise this score in
real-time in an unsupervised fashion. We report fine-tuning times of a double quantum dot device
within approximately 40 min.
V. Nguyen, D. Lennon, H. Moon, N. Esbroeck, D. Sejdinovic, M. A. Osborne, G. A. D. Briggs, N. Ares Controlling Quantum Device Measurement using Deep Reinforcement Learning
Deep Reinforcement Learning workshop,NeurIPS 2019. Abstract
Controlling Quantum Device Measurement using Deep Reinforcement Learning
Qubits based on semiconductor quantum dot devices are promising building blocks for the realisation of
quantum computers. However, measuring and characterising these quantum dot devices can be challenging and laborious for the experimentalists. In this paper, we develop an elegant application using deep
reinforcement learning for controlling the measurement of quantum dot devices. Specifically, we present
a computer-automated algorithm that measures a map of current flowing through a double quantum
dot device for different settings of its gate electrodes. The algorithm seeks particular features called
bias-triangles indicating the device is in the right operating regime of realising a qubit. Our approach
requires no human intervention and significantly reduces the measurement time. This work alleviates
the user effort required to measure multiple quantum dot devices, each with multiple gate electrodes.
V. Nguyen, S. Gupta, S. Rana, M. Thai, C. Li, S. Venkatesh Efficient Bayesian Optimization for Uncertainty Reduction over Perceived Optima Locations
Proceedings of the IEEE International Conference on Data Mining, (ICDM), pp. 1270-1275, 2019. [194/1046=18%]
Preliminary version appears at NIPS Workshop on Bayesian Optimization, (NIPSW), 2017. Abstract
Efficient Bayesian Optimization for Uncertainty Reduction over Perceived Optima Locations
S. Gopakumar, S. Gupta, S. Rana, V. Nguyen, S. Venkatesh Algorithmic Assurance: An Active Approach to Algorithmic Testing using Bayesian Optimisation Advances in Neural Information Processing Systems, (NeurIPS), pp. 5470-5478, 2018. Abstract
Algorithmic Assurance: An Active Approach to Algorithmic Testing using Bayesian Optimisation
We introduce algorithmic assurance, the problem of testing whether machine learning algorithms are conforming to their intended design goal. We address this problem by proposing an efficient framework for algorithmic testing. To provide assurance, we need to efficiently discover scenarios where an algorithm decision deviates maximally from its intended gold standard. We mathematically formulate this task as an optimisation problem of an expensive, black-box function. We use an active learning approach based on Bayesian optimisation to solve this optimisation problem. We extend this framework to algorithms with vector-valued outputs by making appropriate modification in Bayesian optimisation via the EXP3 algorithm. We theoretically analyse our methods for convergence. Using two real-world applications, we demonstrate the efficiency of our methods. The significance of our problem formulation and initial solutions is that it will serve as the foundation in assuring humans about machines making complex decisions.
C. Li, S. Rana, S. Gupta, V. Nguyen, S. Venkatesh, A. Sutti, D. Rubin, T. Slezak, M. Height, M. Mohammed, and I. Gibson Accelerating Experimental Design by Incorporating Experimenter Hunches Proceedings of the IEEE International Conference on Data Mining, (ICDM), pp. 257-266, 2018. [84/948=9%] Abstract
Accelerating Experimental Design by Incorporating Experimenter Hunches
Experimental design is a process of obtaining a product with target property via experimentation. Bayesian optimization offers a sample-efficient tool for experimental design when experiments are expensive. Often, expert experimenters have 'hunches' about the behavior of the experimental system, offering potentials to further improve the efficiency. In this paper, we consider per-variable monotonic trend in the underlying property that results in a unimodal trend in those variables for a target value optimization. For example, sweetness of a candy is monotonic to the sugar content. However, to obtain a target sweetness, the utility of the sugar content becomes a unimodal function, which peaks at the value giving the target sweetness and falls off both ways. In this paper, we propose a novel method to solve such problems that achieves two main objectives: a) the monotonicity information is used to the fullest extent possible, whilst ensuring that b) the convergence guarantee remains intact. This is achieved by a two-stage Gaussian process modeling, where the first stage uses the monotonicity trend to model the underlying property, and the second stage uses `virtual' samples, sampled from the first, to model the target value optimization function. The process is made theoretically consistent by adding appropriate adjustment factor in the posterior computation, necessitated because of using the `virtual' samples. The proposed method is evaluated through both simulations and real world experimental design problems of a) new short polymer fiber with the target length, and b) designing of a new three dimensional porous scaffolding with a target porosity. In all scenarios our method demonstrates faster convergence than the basic Bayesian optimization approach not using such `hunches'.
J. Berk, V. Nguyen, S. Gupta, S. Rana, S. Venkatesh Exploration Enhanced Expected Improvement for Bayesian Optimization European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, (ECML-PKDD), pp. 621-637, 2018. Abstract
Exploration Enhanced Expected Improvement for Bayesian Optimization
Bayesian optimization (BO) is a sample-efficient method for global
optimization of expensive, noisy, black-box functions using probabilistic methods. The performance of a BO method depends on its selection strategy through
an acquisition function. This must balance improving our understanding of the
function in unknown regions (exploration) with locally improving on known promising samples (exploitation). Expected improvement (EI) is one of the most widely
used acquisition functions for BO. Unfortunately, it has a tendency to over-exploit,
meaning that it can be slow in finding new peaks. We propose a modification to EI
that will allow for increased early exploration while providing similar exploitation once the system has been suitably explored. We also prove that our method
has a sub-linear convergence rate and test it on a range of functions to compare
its performance against the standard EI and other competing methods.
X. Zhang, W. Li, V. Nguyen, F. Zhuang, H. Xiong, S. Lu Label-Sensitive Task Grouping by Bayesian Nonparametric Approach for Multi-Task Multi-Label Learning International Joint Conference on Artificial Intelligence, (IJCAI), pp. 3125-3131, 2018. Abstract
Label-Sensitive Task Grouping by Bayesian Nonparametric Approach for Multi-Task Multi-Label Learning
Multi-label learning is widely applied in many realworld
applications, such as image and gene annotation.
While most of the existing multi-label
learning models focus on the single-task learning
problem, there are always some tasks that share
some commonalities, which can help each other to
improve the learning performances if the knowledge
in the similar tasks can be smartly shared.
In this paper, we propose a LABel-sensitive TAsk
Grouping framework, named LABTAG, based on
Bayesian nonparametric approach for multi-task
multi-label classification. The proposed framework
explores the label correlations to capture featurelabel
patterns, and clusters similar tasks into groups
with shared knowledge, which are learned jointly
to produce a strengthened multi-task multi-label
model. We evaluate the model performance on
three public multi-task multi-label data sets, and the
results show that LABTAG outperforms the compared
baselines with a significant margin.
S. Rana, C. Li, S. Gupta, V. Nguyen, S. Venkatesh High Dimensional Bayesian Optimization with Elastic Gaussian Process Proceedings of the 34th International Conference on Machine Learning, (ICML), pp 2883-2891, 2017. Abstract
High Dimensional Bayesian Optimization with Elastic Gaussian Process
Bayesian optimization is an efficient way to optimize expensive black-box functions such as designing a new product with highest quality or hyperparameter tuning of a machine learning algorithm. However, it has a serious limitation when the parameter space is high-dimensional as Bayesian optimization crucially depends on solving a global optimization of a surrogate utility function in the same sized dimensions. The surrogate utility function, known commonly as acquisition function is a continuous function but can be extremely sharp at high dimension - having only a few peaks marooned in a large terrain of almost flat surface. Global optimization algorithms such as DIRECT are infeasible at higher dimensions and gradient-dependent methods cannot move if initialized in the flat terrain. We propose an algorithm that enables local gradient-dependent algorithms to move through the flat terrain by using a sequence of gross-to-finer Gaussian process priors on the objective function as we leverage two underlying facts - a) there exists a large enough length-scales for which the acquisition function can be made to have a significant gradient at any location in the parameter space, and b) the extrema of the consecutive acquisition functions are close although they are different only due to a small difference in the length-scales. Theoretical guarantees are provided and experiments clearly demonstrate the utility of the proposed method at high dimension using both benchmark test functions and real-world case studies.
T. Le, T. D Nguyen, V. Nguyen, D. Phung, Approximation Vector Machines for Large-scale Online Learning Journal of Machine Learning Research, (JMLR), 2017. Abstract
Approximation Vector Machines for Large-scale Online Learning
One of the most challenging problems in kernel online learning is to bound the model size
and to promote model sparsity. Sparse models not only improve computation and memory
usage, but also enhance the generalization capacity – a principle that concurs with the law
of parsimony. However, inappropriate sparsity modeling may also significantly degrade the
performance. In this paper, we propose Approximation Vector Machine (AVM), a model
that can simultaneously encourage sparsity and safeguard its risk in compromising the performance.
In an online setting context, when an incoming instance arrives, we approximate
this instance by one of its neighbors whose distance to it is less than a predefined threshold.
Our key intuition is that since the newly seen instance is expressed by its nearby neighbor
the optimal performance can be analytically formulated and maintained. We develop
theoretical foundations to support this intuition and further establish an analysis for the
common loss functions including Hinge, smooth Hinge, and Logistic (i.e., for the classifi-
cation task) and `1, `2, and ε-insensitive (i.e., for the regression task) to characterize the
gap between the approximation and optimal solutions. This gap crucially depends on two
key factors including the frequency of approximation (i.e., how frequent the approximation
operation takes place) and the predefined threshold. We conducted extensive experiments
for classification and regression tasks in batch and online modes using several benchmark
datasets. The quantitative results show that our proposed AVM obtained comparable predictive
performances with current state-of-the-art methods while simultaneously achieving
significant computational speed-up due to the ability of the proposed AVM in maintaining
the model size.
V. Nguyen, S. Gupta, S. Rana, C. Li, S. Venkatesh Regret for Expected Improvement over the Best-Observed Value and Stopping Condition Proceedings of The 9th Asian Conference on Machine Learning, (ACML), pp. 279-294, 2017. Abstract
Regret for Expected Improvement over the Best-Observed Value and Stopping Condition
Bayesian optimization (BO) is a sample-efficient method for global optimization of expensive,
noisy, black-box functions using probabilistic methods. The performance of a BO method depends
on its selection strategy through the acquisition function. Expected improvement (EI) is one of
the most widely used acquisition functions for BO that finds the expectation of the improvement
function over the incumbent. The incumbent is usually selected as the best-observed value so far,
termed as y^max (for the maximizing problem). Recent work has studied the convergence rate for
EI under some mild assumptions or zero noise of observations. Especially, the work of Wang and
de Freitas (2014) has derived the sublinear regret for EI under a stochastic noise. However, due to
the difficulty in stochastic noise setting and to make the convergent proof feasible, they use an alternative
choice for the incumbent as the maximum of the Gaussian process predictive mean, \mu^max.
This modification makes the algorithm computationally inefficient because it requires an additional
global optimization step to estimate mmax that is costly and may be inaccurate. To address this
issue, we derive a sublinear convergence rate for EI using the commonly used ymax. Moreover, our
analysis is the first to study a stopping criteria for EI to prevent unnecessary evaluations. Our analysis
complements the results of Wang and de Freitas (2014) to theoretically cover two incumbent
settings for EI. Finally, we demonstrate empirically that EI using y^max is both more computationally
efficiency and more accurate than EI using \mu^max.
V. Nguyen, S. Gupta, S. Rana, C. Li, S. Venkatesh Bayesian Optimization in Weakly Specified Search Space Proceedings of the IEEE International Conference on Data Mining, (ICDM), pp 347-356, 2017. [72/778=9%] Abstract
Bayesian Optimization in Weakly Specified Search Space
Bayesian optimization (BO) has recently emerged
as a powerful and flexible tool for hyper-parameter tuning and
more generally for the efficient global optimization of expensive
black-box functions. Systems implementing BO has successfully
solved difficult problems in automatic design choices and machine
learning hyper-parameters tunings. Many recent advances in
the methodologies and theories underlying Bayesian optimization
have extended the framework to new applications and provided
greater insights into the behavior of these algorithms. Still, these
established techniques always require a user-defined space to
perform optimization. This pre-defined space specifies the ranges
of hyper-parameter values. In many situations, however, it can
be difficult to prescribe such spaces, as a prior knowledge is
often unavailable. Setting these regions arbitrarily can lead to
inefficient optimization - if a space is too large, we can miss the
optimum with a limited budget, on the other hand, if a space is
too small, it may not contain the optimum point that we want to
get. The unknown search space problem is intractable to solve in
practice. Therefore, in this paper, we narrow down to consider
specifically the setting of “weakly specified” search space for
Bayesian optimization. By weakly specified space, we mean that
the pre-defined space is placed at a sufficiently good region so that
the optimization can expand and reach to the optimum. However,
this pre-defined space need not include the global optimum.
We tackle this problem by proposing the filtering expansion
strategy for Bayesian optimization. Our approach starts from
the initial region and gradually expands the search space. We
develop an efficient algorithm for this strategy and derive its
regret bound. These theoretical results are complemented by an
extensive set of experiments on benchmark functions and two
real-world applications which demonstrate the benefits of our
proposed approach.
T. Le, K. Nguyen, V. Nguyen, T. D. Nguyen, D. Phung GoGP: Fast Online Regression with Gaussian Processes Proceedings of the IEEE International Conference on Data Mining, (ICDM), pp 257-266, 2017. [72/778=9%] Abstract
GoGP: Fast Online Regression with Gaussian Processes
One of the most current challenging problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive
irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that could scale with massive datasets. Our approach is formulated based on alternative
representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We developed theory to guarantee that with a good convergence
rate our proposed algorithm always produces a (sparse) solution which is close to the true optima to any arbitrary level of approximation accuracy specified a priori.
Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately with streaming data.
We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for online regression task.
The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving a magnitude of computational speedup compared
with its rivals under online setting.
More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors.
V. Nguyen, D. Phung, T. Le, H. Bui Discriminative Bayesian Nonparametric Clustering Proceedings of International Joint Conference on Artificial Intelligence, (IJCAI), pp 2550-2556, 2017. Abstract
Discriminative Bayesian Nonparametric Clustering
We propose a general framework for discriminative
Bayesian nonparametric clustering to promote
the inter-discrimination among the learned clusters
in a fully Bayesian nonparametric (BNP) manner.
Our method combines existing BNP clustering and
discriminative models by enforcing latent cluster
indices to be consistent with the predicted labels
resulted from probabilistic discriminative model.
This formulation results in a well-defined generative
process wherein we can use either logistic regression
or SVM for discrimination. Using the proposed
framework, we develop two novel discriminative
BNP variants: the discriminative Dirichlet
process mixtures, and the discriminative-state in-
finite HMMs for sequential data. We develop ef-
ficient data-augmentation Gibbs samplers for posterior
inference. Extensive experiments in image
clustering and dynamic location clustering demonstrate
that by encouraging discrimination between
induced clusters, our model enhances the quality of
clustering in comparison with the traditional generative
BNP models.
C. Li, S. Gupta, S. Rana, V. Nguyen, S. Venkatesh, A. Shilton High Dimensional Bayesian Optimization Using Dropout Proceedings of International Joint Conference on Artificial Intelligence , (IJCAI), pp 2096-2102, 2017. Abstract
High Dimensional Bayesian Optimization Using Dropout
Scaling Bayesian optimization to high dimensions
is challenging task as the global optimization of
high-dimensional acquisition function can be expensive
and often infeasible. Existing methods depend
either on limited “active” variables or the additive
form of the objective function. We propose
a new method for high-dimensional Bayesian optimization,
that uses a dropout strategy to optimize
only a subset of variables at each iteration. We derive
theoretical bounds for the regret and show how
it can inform the derivation of our algorithm. We
demonstrate the efficacy of our algorithms for optimization
on two benchmark functions and two realworld
applications - training cascade classifiers and
optimizing alloy composition
T. Le, T.D. Nguyen,V. Nguyen, D. Phung Dual Space Gradient Descent for Online Learning Advances in Neural Information Processing Systems, (NeurIPS), pp 4583-4591, 2016. Abstract
Dual Space Gradient Descent for Online Learning
One crucial goal in kernel online learning is to bound the model size. Common
approaches employ budget maintenance procedures to restrict the model sizes using
removal, projection, or merging strategies. Although projection and merging, in the
literature, are known to be the most effective strategies, they demand extensive com
putation whilst removal strategy fails to retain information of the removed vectors.
An alternative way to address the model size problem is to apply random features
to approximate the kernel function. This allows the model to be maintained directly
in the random feature space, hence effectively resolve the curse of kernelization.
However, this approach still suffers from a serious shortcoming as it needs to use a
high dimensional random feature space to achieve a sufficiently accurate kernel
approximation. Consequently, it leads to a significant increase in the computational
cost. To address all of these aforementioned challenges, we present in this paper
the Dual Space Gradient Descent (DualSGD), a novel framework that utilizes
random features as an auxiliary space to maintain information from data points
removed during budget maintenance. Consequently, our approach permits the
budget to be maintained in a simple, direct and elegant way while simultaneously
mitigating the impact of the dimensionality issue on learning performance. We
further provide convergence analysis and extensively conduct experiments on five
real-world datasets to demonstrate the predictive performance and scalability of
our proposed method in comparison with the state-of-the-art baselines.
V. Nguyen, S. K. Gupta, S. Rana, C. Li, S. Venkatesh A Bayesian Nonparametric Approach for Multi-label Classification Proceedings of The 8th Asian Conference on Machine Learning, (ACML), pp 254-269, 2016. Abstract
A Bayesian Nonparametric Approach for Multi-label Classification
Many real-world applications require multi-label classification where multiple target labels
are assigned to each instance. In multi-label classification, there exist the intrinsic correlations
between the labels and features. These correlations are beneficial for multi-label
classification task since they reflect the coexistence of the input and output spaces that can
be exploited for prediction. Traditional classification methods have attempted to reveal
these correlations in different ways. However, existing methods demand expensive computation
complexity for finding such correlation structures. Furthermore, these approaches
can not identify the suitable number of label-feature correlation patterns. In this paper, we
propose a Bayesian nonparametric (BNP) framework for multi-label classification that can
automatically learn and exploit the unknown number of multi-label correlation. We utilize
the recent techniques in stochastic inference to derive the cheap (but efficient) posterior
inference algorithm for the model. In addition, our model can naturally exploit the useful
information from missing label samples. Furthermore, we extend the model to update parameters
in an online fashion that highlights the flexibility of our model against the existing
approaches. We compare our method with the state-of-the-art multi-label classification algorithms
on real-world datasets using both complete and missing label settings. Our model
achieves better classification accuracy while our running time is consistently much faster
than the baselines in an order of magnitude.
V. Nguyen, S. Rana, S. K. Gupta, C. Li, S. Venkatesh Budgeted Batch Bayesian Optimization Proceedings of the IEEE International Conference on Data Mining, (ICDM), pp 1107-1112, 2016. [180/918=19%] Abstract
Budgeted Batch Bayesian Optimization
Parameter settings profoundly impact the performance
of machine learning algorithms and laboratory experiments.
The classical trial-error methods are exponentially expensive
in large parameter spaces, and Bayesian optimization (BO)
offers an elegant alternative for global optimization of black box
functions. In situations where the functions can be evaluated
at multiple points simultaneously, batch Bayesian optimization
is used. Current batch BO approaches are restrictive in fixing
the number of evaluations per batch, and this can be wasteful
when the number of specified evaluations is larger than the
number of real maxima in the underlying acquisition function.
We present the budgeted batch Bayesian optimization (B3O) for
hyper-parameter tuning and experimental design - we identify
the appropriate batch size for each iteration in an elegant
way. In particular, we use the infinite Gaussian mixture model
(IGMM) for automatically identifying the number of peaks in
the underlying acquisition functions. We solve the intractability
of estimating the IGMM directly from the acquisition function
by formulating the batch generalized slice sampling to efficiently
draw samples from the acquisition function. We perform extensive
experiments for benchmark functions and two real world
applications - machine learning hyper-parameter tuning and
experimental design for alloy hardening. We show empirically
that the proposed B3O outperforms the existing fixed batch BO
approaches in finding the optimum whilst requiring a fewer
number of evaluations, thus saving cost and time.
V. Nguyen, T. D. Nguyen, T. Le, S. Venkatesh, D. Phung One-pass Logistic Regression for Label-drift and Large-scale Classification on Distributed Systems Proceedings of the IEEE International Conference on Data Mining, (ICDM), pp 1113-1118, 2016. [180/918=19%] Abstract
One-pass Logistic Regression for Label-drift and Large-scale Classification on Distributed Systems
Logistic regression (LR) is at the cornerstone of classification. Its extension for multiclass classification is the
workhorse in industry, where a set of predefined classes is required. The model, however, fails to work in the case where
the class labels are not known in advance, a problem we term label-drift classification, in a similar spirit of the so-called conceptdrift
problem in the literature. Label-drift classification problem naturally occurs in many applications, especially in the context
of streaming and online settings where the incoming data may contain samples categorized with new classes that have not
been previously seen. Additionally, in the wave of big data, traditional LR methods may fail due to their expense of running
time and label-drift requirements. In this paper, we introduce a novel variant of LR, namely one-pass logistic regression (OLR)
to offer a principled treatment for large-scale and label-drift classifications. Our key contribution is the derivation of sufficient
statistic update for MAP estimation of Polya-Gamma augmentation for LR. Manipulating these sufficient statistics is convenient,
allowing our proposed method to efficiently perform the labeldrift classification under an online setting without retraining the
model from scratch. To handle large-scale classification for big data, we further extend our OLR to a distributed setting for
parallelization, termed sparkling OLR (Spark-OLR). We demonstrate the scalability of our proposed methods on large-scale
datasets with more than one hundred million data points. The experimental results show that the predictive performances of our
methods are comparable or better than those of state-of-the-art baselines whilst the execution time is much faster at an order of
magnitude. To measure the inherent trade-off between speed and accuracy, we propose quadrant visualization and quadrant score,
on which our proposed model outperforms other methods on all datasets. In addition, the OLR and Spark-OLR are invariant
to data shuffling and have no hyperparameter to tune that significantly benefits data practitioners and overcomes the curse
of big data cross-validation to select optimal hyperparameters.
T. Le, V. Nguyen, TD. Nguyen, D. Phung Nonparametric Budgeted Stochastic Gradient Descent Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, (AISTATS), pp 654-572, 2016. Abstract
One of the most challenging problems in kernel online learning is to bound the model size.
Budgeted kernel online learning addresses this issue by bounding the model size to a predefined budget.
However, determining an appropriate value for such predefined budget is arduous.
In this paper, we propose the Nonparametric Budgeted Stochastic Gradient Descent that allows the model size to automatically grow with data in a principled way.
We provide theoretical analysis to show that our framework is guaranteed to converge for a large collection of loss functions
(e.g. Hinge, Logistic, L2, L1, and ε-insensitive) which enables the proposed algorithm to perform both classification and regression tasks without hurting the ideal convergence rate O(1/T)
of the standard Stochastic Gradient Descent. We validate our algorithm on the real-world datasets to consolidate the theoretical claims.
T. Nguyen, V. Nguyen, FD. Salim, D. Phung SECC: Simultaneous Extraction of Context and Community from Pervasive Signals Proceedings of 2016 IEEE International Conference on Pervasive Computing and Communications, (PERCOM), pp 1-9, 2016. Abstract
SECC: Simultaneous Extraction of Context and Community from Pervasive Signals
Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the
wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics
or behaviors and then use them as the way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture highorder
and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities.
This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework.
Our Simultaneous Extraction of Context and Community (SECC) model roots in the nested Dirichlet process theory which allows nested structure to be built to explain data at multiple levels.
We demonstrate our framework on three public datasets where the advantages of the proposed approach are validated.
V. Nguyen, D. Phung, T. Le, S. Venkatesh Large Sample Asymptotic for Nonparametric Mixture Model with Count Data. Workshop on Advances in Approximate Bayesian Inference at Neural Information Processing Systems, (NIPSW), 2015. Abstract
Large Sample Asymptotic for Nonparametric Mixture Model with Count Data
Bayesian nonparametric models have become popular recently due to its flexibility
in identifying the unknown number of clusters. However, the flexibility comes at
a cost for learning. Thus, Small Variance Asymptotic (SVA) is one of the promising
approach for scalability in Bayesian nonparametric models. SVA approach for
count data is also developed in which the likelihood function is replaced by the
Kullback–Leibler divergence. In this paper, we present the Large Sample Asymptotic
for count data when the number of sample in Multinomial distribution goes
to infinity, we derive the similar result to SVA for scalable clustering.
V. Nguyen, D. Phung, D.S. Pham, and S. Venkatesh Bayesian Nonparametric Approaches to Abnormality Detection in Video Surveillance. Annals of Data Science, pp 1-21, 2015. Abstract
Bayesian Nonparametric Approaches to Abnormality Detection in Video Surveillance
In data science, anomaly detection is the process of identifying the items, events or observations which do not conform to expected patterns in a dataset. As widely acknowledged
in the computer vision community and security management, discovering suspicious events is the key issue for abnormal detection in video surveillance. The important steps in identifying
such events include stream data segmentation and hidden patterns discovery. However, the crucial challenge in stream data segmentation and hidden patterns discovery are the number of coherent
segments in surveillance stream and the number of traffic patterns are unknown and hard to specify. Therefore, in this paper we revisit the abnormality detection problem through the lens of
Bayesian nonparametric (BNP) and develop a novel usage of BNP methods for this problem. In particular, we employ the Infinite Hidden Markov Model and Bayesian Nonparametric Factor Analysis for
stream data segmentation and pattern discovery.
In addition, we introduce an interactive system allowing users to inspect and browse suspicious events.
V. Nguyen, D. Phung, S. Venkatesh, and H. Bui A Bayesian Nonparametric Approach to Multilevel Regression. Advances in Knowledge Discovery and Data Mining (PAKDD), pp 330-342, 2015. Abstract
A Bayesian Nonparametric Approach to Multilevel Regression
Regression is at the cornerstone of statistical analysis. Multilevel regression, on the other hand, receives little research attention, though it is prevalent in economics, biostatistics
and healthcare to name a few. We present a Bayesian nonparametric framework for multilevel regression where individuals including observations and outcomes are organized into groups.
Furthermore, our approach exploits additional group-specific context observations, we use Dirichlet Process with product-space base measure in a nested structure to model group-level context
distribution and the regression distribution to accommodate the multilevel structure of the data. The proposed model simultaneously partitions groups into cluster and perform regression.
We provide collapsed Gibbs sampler for posterior inference.
We perform extensive experiments on econometric panel data and healthcare longitudinal data to demonstrate the effectiveness of the proposed model.
V. Nguyen, D. Phung, L. Nguyen, S. Venkatesh and H. Bui Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts. Proceedings of The 31st International Conference on Machine Learning (ICML), pp. 288–296, 2014. Abstract
Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts
We present a Bayesian nonparametric framework for multilevel clustering which utilizes group-level context information to simultaneously discover low-dimensional structures of the group contents
and partitions groups into clusters. Using the Dirichlet process as the building block, our model constructs a product base-measure with a nested structure to accommodate content and context
observations at multiple levels. The proposed model possesses properties that link the nested Dirichlet processes (nDP) and the Dirichlet process mixture models (DPM) in an interesting way:
integrating out all contents results in the DPM over contexts, whereas integrating out group-specific contexts results in the nDP mixture over content variables. We provide a Polya-urn view of
the model and an efficient collapsed Gibbs inference procedure.
Extensive experiments on real-world datasets demonstrate the advantage of utilizing context information via our model in both text and image domains.
T.V. Nguyen, D. Phung, S. K. Gupta, and S. Venkatesh Interactive Browsing System for Anomaly Video Surveillance. IEEE Eighth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), pp 384-389, 2013. Abstract
Interactive Browsing System for Anomaly Video Surveillance
Existing anomaly detection methods in video surveillance exhibit lack of congruence between rare events detected by algorithms and what is considered anomalous by users.
This paper introduces a novel browsing model to address this issue, allowing users to interactively examine rare events in an intuitive manner. Introducing a novel way to compute rare
motion patterns, we estimate latent factors of foreground motion patterns through Bayesian Nonparametric Factor analysis. Each factor corresponds to a typical motion pattern.
A rarity score for each factor is computed, and ordered in decreasing order of rarity, permitting users to browse events using any proportion of rare factors. Rare events correspond to frames
that contain the rare factors chosen. We present the user with an interface to inspect events that incorporate these rarest factors in a spatial-temporal manner.
We demonstrate the system on a public video data set, showing key aspects of the browsing paradigm.