V. Nguyen, V. Masrani, R. Brekelmans, M. A. Osborne, F. Wood Gaussian Process Bandit Optimization of the Thermodynamic Variational Objective
Neural Information Processing Systems (NeurIPS), accepted, 2020. Abstract
Gaussian Process Bandit Optimization of the Thermodynamic Variational Objective
Achieving the full promise of the Thermodynamic Variational Objective (TVO), a recently proposed variational inference objective that lower-bounds the log evidence via one-dimensional Riemann integration, requires choosing a ``schedule'' of sorted discretization points. This paper introduces a bespoke Gaussian process bandit optimization method for automatically choosing these points. Our approach not only automates their one-time selection, but also dynamically adapts their positions over the course of optimization, leading to improved model learning and inference. We provide theoretical guarantees that our bandit optimization converges to the regret-minimizing choice of integration points. Empirical validation of our algorithm is provided in terms of improved learning and inference in Variational Autoencoders and sigmoid belief networks.
V. Nguyen*, S. Schulze*, M. A. Osborne Bayesian Optimisation for Iterative Learning
Neural Information Processing Systems (NeurIPS), accepted, 2020. Preliminary version at 7th AutoML Workshop at International Conference on Machine Learning (ICML), 2020. Abstract
Bayesian Optimization for Iterative Learning
The success of deep (reinforcement) learning systems crucially depends on the correct choice of hyperparameters which are notoriously sensitive and expensive to evaluate. Training these systems typically requires running iterative processes over multiple epochs or episodes. Traditional approaches only consider final performances of a hyperparameter although intermediate information from the learning curve is readily available. In this paper, we present a Bayesian optimization approach which exploits the iterative structure of learning algorithms for efficient hyperparameter tuning. First, we transform each training curve into a numeric score. Second, we selectively augment the data using the auxiliary information from the curve. This augmentation step enables modeling efficiency while preventing the ill-conditioned issue of Gaussian process covariance matrix happened when adding the whole curve. We demonstrate the efficiency of our algorithm by tuning hyperparameters for the training of deep reinforcement learning agents and convolutional neural networks. Our algorithm outperforms all existing baselines in identifying optimal hyperparameters in minimal time.
J. Parker-Holder, V. Nguyen, S. Roberts Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits
Neural Information Processing Systems (NeurIPS), accepted, 2020. Preliminary version at 7th AutoML Workshop at International Conference on Machine Learning (ICML), 2020. Abstract
Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits
Selecting optimal hyperparameters is a key challenge in machine learning. A recent approach to this problem (PBT) showed it is possible to achieve impressive performance by updating both weights and hyperparameters in a single training run of a population of agents. Despite it's success, PBT relies on heuristics to explore the hyperparameter space, thus lacks theoretical guarantees, requires vast computational resources and often suffers from mode collapse when this is not available. In this work we introduce Population-Based Bandits (PB2), the first provably efficient PBT-style algorithm. PB2 uses a probabilistic model to balance exploration and exploitation, thus it is able to discover high performing hyperparameter configurations with far fewer agents than typically required by PBT.
V. Nguyen, M. A. Osborne Knowing The What But Not The Where in Bayesian Optimization
International Conference on Machine Learning (ICML), 2020. Abstract
Knowing The What But Not The Where in Bayesian Optimization
Bayesian optimization has demonstrated impressive success in finding the optimum location x* and value f*=f(x*)=max f(x) of the black-box function f.
In some applications, however, the optimum value is known in advance and the goal is to find the corresponding optimum location.
Existing work in Bayesian optimization (BO) has not effectively exploited the knowledge of f* for optimization.
In this paper, we consider a new setting in BO in which the knowledge of the optimum value is available. Our goal is to exploit the knowledge about f* to search for the location x* efficiently.
To achieve this goal, we first transform the Gaussian process surrogate using the information about the optimum value.
Then, we propose two acquisition functions, called confidence bound minimization and expected regret minimization, which exploit the knowledge about the optimum value to identify the optimum location efficiently.
We show that our approaches work both intuitively and quantitatively achieve better performance against standard BO methods.
We demonstrate real applications in tuning a deep reinforcement learning algorithm on the CartPole problem and XGBoost on Skin Segmentation dataset in which the optimum values are publicly available.
B. Ru, AS. Alvi, V. Nguyen, M. A. Osborne, SJ. Roberts Bayesian Optimisation over Multiple Continuous and Categorical Inputs
International Conference on Machine Learning (ICML), 2020. Abstract
Bayesian Optimisation over Multiple Continuous and Categorical Inputs
Efficient optimisation of black-box problems that comprise both continuous and categorical inputs is important, yet poses significant challenges. We propose a new approach, Continuous and Categorical Bayesian Optimisation (CoCaBO), which combines the strengths of multi-armed bandits and Bayesian optimisation to select values for both categorical and continuous inputs. We model this mixed-type space using a Gaussian Process kernel, designed to allow sharing of information across multiple categorical variables, each with multiple possible values; this allows CoCaBO to leverage all available data efficiently. We extend our method to the batch setting and propose an efficient selection procedure that dynamically balances exploration and exploitation whilst encouraging batch diversity. We demonstrate empirically that our method outperforms existing approaches on both synthetic and real-world optimisation tasks with continuous and categorical inputs.
N. van Esbroeck, D. Lennon, H. Moon, V. Nguyen, F. Vigneau, LC. Camenzind, L. Yu, DM. Zumbühl, G. A. D. Briggs, D. Sejdinovic, N. Ares Quantum device fine-tuning using unsupervised embedding learning
New Journal of Physics 2020. Abstract
Quantum device fine-tuning using unsupervised embedding learning
Quantum devices with a large number of gate electrodes allow for precise control of device
parameters. This capability is hard to fully exploit due to the complex dependence of these
parameters on applied gate voltages. We experimentally demonstrate an algorithm capable of
fine-tuning several device parameters at once. The algorithm acquires a measurement and assigns
it a score using a variational auto-encoder. Gate voltage settings are set to optimise this score in
real-time in an unsupervised fashion. We report fine-tuning times of a double quantum dot device
within approximately 40 min.
V. Nguyen, D. Lennon, H. Moon, N. Esbroeck, D. Sejdinovic, M. A. Osborne, G. A. D. Briggs, N. Ares Controlling Quantum Device Measurement using Deep Reinforcement Learning
Deep Reinforcement Learning workshop, NeurIPS 2019. Abstract
Controlling Quantum Device Measurement using Deep Reinforcement Learning
Qubits based on semiconductor quantum dot devices are promising building blocks for the realisation of
quantum computers. However, measuring and characterising these quantum dot devices can be challenging and laborious for the experimentalists. In this paper, we develop an elegant application using deep
reinforcement learning for controlling the measurement of quantum dot devices. Specifically, we present
a computer-automated algorithm that measures a map of current flowing through a double quantum
dot device for different settings of its gate electrodes. The algorithm seeks particular features called
bias-triangles indicating the device is in the right operating regime of realising a qubit. Our approach
requires no human intervention and significantly reduces the measurement time. This work alleviates
the user effort required to measure multiple quantum dot devices, each with multiple gate electrodes.
V. Nguyen, S. Gupta, S. Rana, M. Thai, C. Li, S. Venkatesh Efficient Bayesian Optimization for Uncertainty Reduction over Perceived Optima Locations
Proceedings of the IEEE International Conference on Data Mining, (ICDM), pp. 1270-1275, 2019. [194/1046=18%]
Preliminary version appears at NIPS Workshop on Bayesian Optimization, (NIPSW), 2017. Abstract
Efficient Bayesian Optimization for Uncertainty Reduction over Perceived Optima Locations
S. Gopakumar, S. Gupta, S. Rana, V. Nguyen, S. Venkatesh Algorithmic Assurance: An Active Approach to Algorithmic Testing using Bayesian Optimisation Advances in Neural Information Processing Systems, (NeurIPS), pp. 5470-5478, 2018. Abstract
Algorithmic Assurance: An Active Approach to Algorithmic Testing using Bayesian Optimisation
We introduce algorithmic assurance, the problem of testing whether machine learning algorithms are conforming to their intended design goal. We address this problem by proposing an efficient framework for algorithmic testing. To provide assurance, we need to efficiently discover scenarios where an algorithm decision deviates maximally from its intended gold standard. We mathematically formulate this task as an optimisation problem of an expensive, black-box function. We use an active learning approach based on Bayesian optimisation to solve this optimisation problem. We extend this framework to algorithms with vector-valued outputs by making appropriate modification in Bayesian optimisation via the EXP3 algorithm. We theoretically analyse our methods for convergence. Using two real-world applications, we demonstrate the efficiency of our methods. The significance of our problem formulation and initial solutions is that it will serve as the foundation in assuring humans about machines making complex decisions.
C. Li, S. Rana, S. Gupta, V. Nguyen, S. Venkatesh, A. Sutti, D. Rubin, T. Slezak, M. Height, M. Mohammed, and I. Gibson Accelerating Experimental Design by Incorporating Experimenter Hunches Proceedings of the IEEE International Conference on Data Mining, (ICDM), pp. 257-266, 2018. [84/948=9%] Abstract
Accelerating Experimental Design by Incorporating Experimenter Hunches
Experimental design is a process of obtaining a product with target property via experimentation. Bayesian optimization offers a sample-efficient tool for experimental design when experiments are expensive. Often, expert experimenters have 'hunches' about the behavior of the experimental system, offering potentials to further improve the efficiency. In this paper, we consider per-variable monotonic trend in the underlying property that results in a unimodal trend in those variables for a target value optimization. For example, sweetness of a candy is monotonic to the sugar content. However, to obtain a target sweetness, the utility of the sugar content becomes a unimodal function, which peaks at the value giving the target sweetness and falls off both ways. In this paper, we propose a novel method to solve such problems that achieves two main objectives: a) the monotonicity information is used to the fullest extent possible, whilst ensuring that b) the convergence guarantee remains intact. This is achieved by a two-stage Gaussian process modeling, where the first stage uses the monotonicity trend to model the underlying property, and the second stage uses `virtual' samples, sampled from the first, to model the target value optimization function. The process is made theoretically consistent by adding appropriate adjustment factor in the posterior computation, necessitated because of using the `virtual' samples. The proposed method is evaluated through both simulations and real world experimental design problems of a) new short polymer fiber with the target length, and b) designing of a new three dimensional porous scaffolding with a target porosity. In all scenarios our method demonstrates faster convergence than the basic Bayesian optimization approach not using such `hunches'.
J. Berk, V. Nguyen, S. Gupta, S. Rana, S. Venkatesh Exploration Enhanced Expected Improvement for Bayesian Optimization European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, (ECML-PKDD), pp. 621-637, 2018. Abstract
Exploration Enhanced Expected Improvement for Bayesian Optimization
Bayesian optimization (BO) is a sample-efficient method for global
optimization of expensive, noisy, black-box functions using probabilistic methods. The performance of a BO method depends on its selection strategy through
an acquisition function. This must balance improving our understanding of the
function in unknown regions (exploration) with locally improving on known promising samples (exploitation). Expected improvement (EI) is one of the most widely
used acquisition functions for BO. Unfortunately, it has a tendency to over-exploit,
meaning that it can be slow in finding new peaks. We propose a modification to EI
that will allow for increased early exploration while providing similar exploitation once the system has been suitably explored. We also prove that our method
has a sub-linear convergence rate and test it on a range of functions to compare
its performance against the standard EI and other competing methods.
X. Zhang, W. Li, V. Nguyen, F. Zhuang, H. Xiong, S. Lu Label-Sensitive Task Grouping by Bayesian Nonparametric Approach for Multi-Task Multi-Label Learning International Joint Conference on Artificial Intelligence, (IJCAI), pp. 3125-3131, 2018. Abstract
Label-Sensitive Task Grouping by Bayesian Nonparametric Approach for Multi-Task Multi-Label Learning
Multi-label learning is widely applied in many realworld
applications, such as image and gene annotation.
While most of the existing multi-label
learning models focus on the single-task learning
problem, there are always some tasks that share
some commonalities, which can help each other to
improve the learning performances if the knowledge
in the similar tasks can be smartly shared.
In this paper, we propose a LABel-sensitive TAsk
Grouping framework, named LABTAG, based on
Bayesian nonparametric approach for multi-task
multi-label classification. The proposed framework
explores the label correlations to capture featurelabel
patterns, and clusters similar tasks into groups
with shared knowledge, which are learned jointly
to produce a strengthened multi-task multi-label
model. We evaluate the model performance on
three public multi-task multi-label data sets, and the
results show that LABTAG outperforms the compared
baselines with a significant margin.
V. Nguyen, S. Gupta, S. Rana, C. Li, S. Venkatesh Filtering Bayesian Optimization Approach in Weakly Specified Search Space Knowledge and Information Systems, (KAIS), 2018. Abstract
Filtering Bayesian Optimization Approach in Weakly Specified Search Space
Bayesian optimization (BO) has recently emerged as a powerful and flexible tool for hyper-parameter tuning and more generally for the efficient global optimization of expensive black-box functions.
Systems implementing BO have successfully solved difficult problems in automatic design choices and machine learning hyper-parameters tunings. Many recent advances in the methodologies and theories underlying Bayesian optimization have extended the framework to new applications and provided greater insights into the behavior of these algorithms. Still, these established techniques always require a user-defined space to perform optimization. This pre-defined space specifies the ranges of hyper-parameter values.
In many situations, however, it can be difficult to prescribe such spaces, as a prior knowledge is often unavailable. Setting these regions arbitrarily can lead to inefficient optimization—if a space is too large, we can miss the optimum with a limited budget, and on the other hand, if a space is too small, it may not contain the optimum point that we want to get. The unknown search space problem is intractable to solve in practice.
Therefore, in this paper, we narrow down to consider specifically the setting of “weakly specified” search space for Bayesian optimization. By weakly specified space, we mean that the pre-defined space is placed at a sufficiently good region so that the optimization can expand and reach to the optimum. However, this pre-defined space need not include the global optimum.
We tackle this problem by proposing the filtering expansion strategy for Bayesian optimization. Our approach starts from the initial region and gradually expands the search space. We develop an efficient algorithm for this strategy and derive its regret bound.
These theoretical results are complemented by an extensive set of experiments on benchmark functions and two real-world applications which demonstrate the benefits of our proposed approach.
V. Nguyen, S. Gupta, S. Rana, C. Li, S. Venkatesh Regret for Expected Improvement over the Best-Observed Value and Stopping Condition Proceedings of The 9th Asian Conference on Machine Learning, (ACML), pp. 279-294, 2017. Abstract
Regret for Expected Improvement over the Best-Observed Value and Stopping Condition
Bayesian optimization (BO) is a sample-efficient method for global optimization of expensive,
noisy, black-box functions using probabilistic methods. The performance of a BO method depends
on its selection strategy through the acquisition function. Expected improvement (EI) is one of
the most widely used acquisition functions for BO that finds the expectation of the improvement
function over the incumbent. The incumbent is usually selected as the best-observed value so far,
termed as y^max (for the maximizing problem). Recent work has studied the convergence rate for
EI under some mild assumptions or zero noise of observations. Especially, the work of Wang and
de Freitas (2014) has derived the sublinear regret for EI under a stochastic noise. However, due to
the difficulty in stochastic noise setting and to make the convergent proof feasible, they use an alternative
choice for the incumbent as the maximum of the Gaussian process predictive mean, \mu^max.
This modification makes the algorithm computationally inefficient because it requires an additional
global optimization step to estimate mmax that is costly and may be inaccurate. To address this
issue, we derive a sublinear convergence rate for EI using the commonly used ymax. Moreover, our
analysis is the first to study a stopping criteria for EI to prevent unnecessary evaluations. Our analysis
complements the results of Wang and de Freitas (2014) to theoretically cover two incumbent
settings for EI. Finally, we demonstrate empirically that EI using y^max is both more computationally
efficiency and more accurate than EI using \mu^max.
V. Nguyen, S. Gupta, S. Rana, C. Li, S. Venkatesh Bayesian Optimization in Weakly Specified Search Space Proceedings of the IEEE International Conference on Data Mining, (ICDM), pp 347-356, 2017. [72/778=9%] Abstract
Bayesian Optimization in Weakly Specified Search Space
Bayesian optimization (BO) has recently emerged
as a powerful and flexible tool for hyper-parameter tuning and
more generally for the efficient global optimization of expensive
black-box functions. Systems implementing BO has successfully
solved difficult problems in automatic design choices and machine
learning hyper-parameters tunings. Many recent advances in
the methodologies and theories underlying Bayesian optimization
have extended the framework to new applications and provided
greater insights into the behavior of these algorithms. Still, these
established techniques always require a user-defined space to
perform optimization. This pre-defined space specifies the ranges
of hyper-parameter values. In many situations, however, it can
be difficult to prescribe such spaces, as a prior knowledge is
often unavailable. Setting these regions arbitrarily can lead to
inefficient optimization - if a space is too large, we can miss the
optimum with a limited budget, on the other hand, if a space is
too small, it may not contain the optimum point that we want to
get. The unknown search space problem is intractable to solve in
practice. Therefore, in this paper, we narrow down to consider
specifically the setting of “weakly specified” search space for
Bayesian optimization. By weakly specified space, we mean that
the pre-defined space is placed at a sufficiently good region so that
the optimization can expand and reach to the optimum. However,
this pre-defined space need not include the global optimum.
We tackle this problem by proposing the filtering expansion
strategy for Bayesian optimization. Our approach starts from
the initial region and gradually expands the search space. We
develop an efficient algorithm for this strategy and derive its
regret bound. These theoretical results are complemented by an
extensive set of experiments on benchmark functions and two
real-world applications which demonstrate the benefits of our
T. Le, K. Nguyen, V. Nguyen, T. D. Nguyen, D. Phung GoGP: Fast Online Regression with Gaussian Processes Proceedings of the IEEE International Conference on Data Mining, (ICDM), pp 257-266, 2017. [72/778=9%] Abstract
GoGP: Fast Online Regression with Gaussian Processes
One of the most current challenging problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive
irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that could scale with massive datasets. Our approach is formulated based on alternative
representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We developed theory to guarantee that with a good convergence
rate our proposed algorithm always produces a (sparse) solution which is close to the true optima to any arbitrary level of approximation accuracy specified a priori.
Furthermore, our method is proven to scale seamlessly not only with large-scale datasets, but also to adapt accurately with streaming data.
We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for online regression task.
The experimental results show that our GoGP delivered comparable, or slightly better, predictive performance while achieving a magnitude of computational speedup compared
with its rivals under online setting.
More importantly, its convergence behavior is guaranteed through our theoretical analysis, which is rapid and stable while achieving lower errors.
T. Le, T. D Nguyen, V. Nguyen, D. Phung, Approximation Vector Machines for Large-scale Online Learning Journal of Machine Learning Research, (JMLR), 2017. Abstract
Approximation Vector Machines for Large-scale Online Learning
One of the most challenging problems in kernel online learning is to bound the model size
and to promote model sparsity. Sparse models not only improve computation and memory
usage, but also enhance the generalization capacity – a principle that concurs with the law
of parsimony. However, inappropriate sparsity modeling may also significantly degrade the
performance. In this paper, we propose Approximation Vector Machine (AVM), a model
that can simultaneously encourage sparsity and safeguard its risk in compromising the performance.
In an online setting context, when an incoming instance arrives, we approximate
this instance by one of its neighbors whose distance to it is less than a predefined threshold.
Our key intuition is that since the newly seen instance is expressed by its nearby neighbor
the optimal performance can be analytically formulated and maintained. We develop
theoretical foundations to support this intuition and further establish an analysis for the
common loss functions including Hinge, smooth Hinge, and Logistic (i.e., for the classifi-
cation task) and `1, `2, and ε-insensitive (i.e., for the regression task) to characterize the
gap between the approximation and optimal solutions. This gap crucially depends on two
key factors including the frequency of approximation (i.e., how frequent the approximation
operation takes place) and the predefined threshold. We conducted extensive experiments
for classification and regression tasks in batch and online modes using several benchmark
datasets. The quantitative results show that our proposed AVM obtained comparable predictive
performances with current state-of-the-art methods while simultaneously achieving
significant computational speed-up due to the ability of the proposed AVM in maintaining
the model size.
S. Rana, C. Li, S. Gupta, V. Nguyen, S. Venkatesh High Dimensional Bayesian Optimization with Elastic Gaussian Process Proceedings of the 34th International Conference on Machine Learning, (ICML), pp 2883-2891, 2017. Abstract
High Dimensional Bayesian Optimization with Elastic Gaussian Process
Bayesian optimization is an efficient way to optimize expensive black-box functions such as designing a new product with highest quality or hyperparameter tuning of a machine learning algorithm. However, it has a serious limitation when the parameter space is high-dimensional as Bayesian optimization crucially depends on solving a global optimization of a surrogate utility function in the same sized dimensions. The surrogate utility function, known commonly as acquisition function is a continuous function but can be extremely sharp at high dimension - having only a few peaks marooned in a large terrain of almost flat surface. Global optimization algorithms such as DIRECT are infeasible at higher dimensions and gradient-dependent methods cannot move if initialized in the flat terrain. We propose an algorithm that enables local gradient-dependent algorithms to move through the flat terrain by using a sequence of gross-to-finer Gaussian process priors on the objective function as we leverage two underlying facts - a) there exists a large enough length-scales for which the acquisition function can be made to have a significant gradient at any location in the parameter space, and b) the extrema of the consecutive acquisition functions are close although they are different only due to a small difference in the length-scales. Theoretical guarantees are provided and experiments clearly demonstrate the utility of the proposed method at high dimension using both benchmark test functions and real-world case studies.
V. Nguyen, D. Phung, T. Le, H. Bui Discriminative Bayesian Nonparametric Clustering Proceedings of International Joint Conference on Artificial Intelligence, (IJCAI), pp 2550-2556, 2017. Abstract
Discriminative Bayesian Nonparametric Clustering
We propose a general framework for discriminative
Bayesian nonparametric clustering to promote
the inter-discrimination among the learned clusters
in a fully Bayesian nonparametric (BNP) manner.
Our method combines existing BNP clustering and
discriminative models by enforcing latent cluster
indices to be consistent with the predicted labels
resulted from probabilistic discriminative model.
This formulation results in a well-defined generative
process wherein we can use either logistic regression
or SVM for discrimination. Using the proposed
framework, we develop two novel discriminative
BNP variants: the discriminative Dirichlet
process mixtures, and the discriminative-state in-
finite HMMs for sequential data. We develop ef-
ficient data-augmentation Gibbs samplers for posterior
inference. Extensive experiments in image
clustering and dynamic location clustering demonstrate
that by encouraging discrimination between
induced clusters, our model enhances the quality of
clustering in comparison with the traditional generative
C. Li, S. Gupta, S. Rana, V. Nguyen, S. Venkatesh, A. Shilton High Dimensional Bayesian Optimization Using Dropout Proceedings of International Joint Conference on Artificial Intelligence , (IJCAI), pp 2096-2102, 2017. Abstract
High Dimensional Bayesian Optimization Using Dropout
Scaling Bayesian optimization to high dimensions
is challenging task as the global optimization of
high-dimensional acquisition function can be expensive
and often infeasible. Existing methods depend
either on limited “active” variables or the additive
form of the objective function. We propose
a new method for high-dimensional Bayesian optimization,
that uses a dropout strategy to optimize
only a subset of variables at each iteration. We derive
theoretical bounds for the regret and show how
it can inform the derivation of our algorithm. We
demonstrate the efficacy of our algorithms for optimization
on two benchmark functions and two realworld
applications - training cascade classifiers and
optimizing alloy composition
V. Nguyen, S. K. Gupta, S. Rana, C. Li, S. Venkatesh A Bayesian Nonparametric Approach for Multi-label Classification Proceedings of The 8th Asian Conference on Machine Learning, (ACML), pp 254-269, 2016. Abstract
A Bayesian Nonparametric Approach for Multi-label Classification
Many real-world applications require multi-label classification where multiple target labels
are assigned to each instance. In multi-label classification, there exist the intrinsic correlations
between the labels and features. These correlations are beneficial for multi-label
classification task since they reflect the coexistence of the input and output spaces that can
be exploited for prediction. Traditional classification methods have attempted to reveal
these correlations in different ways. However, existing methods demand expensive computation
complexity for finding such correlation structures. Furthermore, these approaches
can not identify the suitable number of label-feature correlation patterns. In this paper, we
propose a Bayesian nonparametric (BNP) framework for multi-label classification that can
automatically learn and exploit the unknown number of multi-label correlation. We utilize
the recent techniques in stochastic inference to derive the cheap (but efficient) posterior
inference algorithm for the model. In addition, our model can naturally exploit the useful
information from missing label samples. Furthermore, we extend the model to update parameters
in an online fashion that highlights the flexibility of our model against the existing
approaches. We compare our method with the state-of-the-art multi-label classification algorithms
on real-world datasets using both complete and missing label settings. Our model
achieves better classification accuracy while our running time is consistently much faster
than the baselines in an order of magnitude.
V. Nguyen, S. Rana, S. K. Gupta, C. Li, S. Venkatesh Budgeted Batch Bayesian Optimization Proceedings of the IEEE International Conference on Data Mining, (ICDM), pp 1107-1112, 2016. [180/918=19%] Abstract
Budgeted Batch Bayesian Optimization
Parameter settings profoundly impact the performance
of machine learning algorithms and laboratory experiments.
The classical trial-error methods are exponentially expensive
in large parameter spaces, and Bayesian optimization (BO)
offers an elegant alternative for global optimization of black box
functions. In situations where the functions can be evaluated
at multiple points simultaneously, batch Bayesian optimization
is used. Current batch BO approaches are restrictive in fixing
the number of evaluations per batch, and this can be wasteful
when the number of specified evaluations is larger than the
number of real maxima in the underlying acquisition function.
We present the budgeted batch Bayesian optimization (B3O) for
hyper-parameter tuning and experimental design - we identify
the appropriate batch size for each iteration in an elegant
way. In particular, we use the infinite Gaussian mixture model
(IGMM) for automatically identifying the number of peaks in
the underlying acquisition functions. We solve the intractability
of estimating the IGMM directly from the acquisition function
by formulating the batch generalized slice sampling to efficiently
draw samples from the acquisition function. We perform extensive
experiments for benchmark functions and two real world
applications - machine learning hyper-parameter tuning and
experimental design for alloy hardening. We show empirically
that the proposed B3O outperforms the existing fixed batch BO
approaches in finding the optimum whilst requiring a fewer
number of evaluations, thus saving cost and time.
V. Nguyen, T. D. Nguyen, T. Le, S. Venkatesh, D. Phung One-pass Logistic Regression for Label-drift and Large-scale Classification on Distributed Systems Proceedings of the IEEE International Conference on Data Mining, (ICDM), pp 1113-1118, 2016. [180/918=19%] Abstract
One-pass Logistic Regression for Label-drift and Large-scale Classification on Distributed Systems
Logistic regression (LR) is at the cornerstone of classification. Its extension for multiclass classification is the
workhorse in industry, where a set of predefined classes is required. The model, however, fails to work in the case where
the class labels are not known in advance, a problem we term label-drift classification, in a similar spirit of the so-called conceptdrift
problem in the literature. Label-drift classification problem naturally occurs in many applications, especially in the context
of streaming and online settings where the incoming data may contain samples categorized with new classes that have not
been previously seen. Additionally, in the wave of big data, traditional LR methods may fail due to their expense of running
time and label-drift requirements. In this paper, we introduce a novel variant of LR, namely one-pass logistic regression (OLR)
to offer a principled treatment for large-scale and label-drift classifications. Our key contribution is the derivation of sufficient
statistic update for MAP estimation of Polya-Gamma augmentation for LR. Manipulating these sufficient statistics is convenient,
allowing our proposed method to efficiently perform the labeldrift classification under an online setting without retraining the
model from scratch. To handle large-scale classification for big data, we further extend our OLR to a distributed setting for
parallelization, termed sparkling OLR (Spark-OLR). We demonstrate the scalability of our proposed methods on large-scale
datasets with more than one hundred million data points. The experimental results show that the predictive performances of our
methods are comparable or better than those of state-of-the-art baselines whilst the execution time is much faster at an order of
magnitude. To measure the inherent trade-off between speed and accuracy, we propose quadrant visualization and quadrant score,
on which our proposed model outperforms other methods on all datasets. In addition, the OLR and Spark-OLR are invariant
to data shuffling and have no hyperparameter to tune that significantly benefits data practitioners and overcomes the curse
of big data cross-validation to select optimal hyperparameters.
T. Le, T.D. Nguyen,V. Nguyen, D. Phung Dual Space Gradient Descent for Online Learning Advances in Neural Information Processing Systems, (NIPS), pp 4583-4591, 2016. Abstract
Dual Space Gradient Descent for Online Learning
One crucial goal in kernel online learning is to bound the model size. Common
approaches employ budget maintenance procedures to restrict the model sizes using
removal, projection, or merging strategies. Although projection and merging, in the
literature, are known to be the most effective strategies, they demand extensive com
putation whilst removal strategy fails to retain information of the removed vectors.
An alternative way to address the model size problem is to apply random features
to approximate the kernel function. This allows the model to be maintained directly
in the random feature space, hence effectively resolve the curse of kernelization.
However, this approach still suffers from a serious shortcoming as it needs to use a
high dimensional random feature space to achieve a sufficiently accurate kernel
approximation. Consequently, it leads to a significant increase in the computational
cost. To address all of these aforementioned challenges, we present in this paper
the Dual Space Gradient Descent (DualSGD), a novel framework that utilizes
random features as an auxiliary space to maintain information from data points
removed during budget maintenance. Consequently, our approach permits the
budget to be maintained in a simple, direct and elegant way while simultaneously
mitigating the impact of the dimensionality issue on learning performance. We
further provide convergence analysis and extensively conduct experiments on five
real-world datasets to demonstrate the predictive performance and scalability of
our proposed method in comparison with the state-of-the-art baselines.
T. Le, V. Nguyen, TD. Nguyen, D. Phung Nonparametric Budgeted Stochastic Gradient Descent Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, (AISTATS), pp 654-572, 2016. Abstract
One of the most challenging problems in kernel online learning is to bound the model size.
Budgeted kernel online learning addresses this issue by bounding the model size to a predefined budget.
However, determining an appropriate value for such predefined budget is arduous.
In this paper, we propose the Nonparametric Budgeted Stochastic Gradient Descent that allows the model size to automatically grow with data in a principled way.
We provide theoretical analysis to show that our framework is guaranteed to converge for a large collection of loss functions
(e.g. Hinge, Logistic, L2, L1, and ε-insensitive) which enables the proposed algorithm to perform both classification and regression tasks without hurting the ideal convergence rate O(1/T)
of the standard Stochastic Gradient Descent. We validate our algorithm on the real-world datasets to consolidate the theoretical claims.
T. Nguyen, V. Nguyen, FD. Salim, D. Phung SECC: Simultaneous Extraction of Context and Community from Pervasive Signals Proceedings of 2016 IEEE International Conference on Pervasive Computing and Communications, (PERCOM), pp 1-9, 2016. Abstract
SECC: Simultaneous Extraction of Context and Community from Pervasive Signals
Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the
wild due to the unprecedented growth of data, noise, uncertainties and complexities. Typical existing approaches would first extract the latent patterns to explain the human dynamics
or behaviors and then use them as the way to consistently formulate numerical representations for community detection, often via a clustering method. While being able to capture highorder
and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities.
This paper presents an approach that seamlessly addresses these challenges to simultaneously discover latent patterns and communities in a unified Bayesian nonparametric framework.
Our Simultaneous Extraction of Context and Community (SECC) model roots in the nested Dirichlet process theory which allows nested structure to be built to explain data at multiple levels.
We demonstrate our framework on three public datasets where the advantages of the proposed approach are validated.
V. Nguyen, D. Phung, T. Le, S. Venkatesh Large Sample Asymptotic for Nonparametric Mixture Model with Count Data. Workshop on Advances in Approximate Bayesian Inference at Neural Information Processing Systems, (NIPSW), 2015. Abstract
Large Sample Asymptotic for Nonparametric Mixture Model with Count Data
Bayesian nonparametric models have become popular recently due to its flexibility
in identifying the unknown number of clusters. However, the flexibility comes at
a cost for learning. Thus, Small Variance Asymptotic (SVA) is one of the promising
approach for scalability in Bayesian nonparametric models. SVA approach for
count data is also developed in which the likelihood function is replaced by the
Kullback–Leibler divergence. In this paper, we present the Large Sample Asymptotic
for count data when the number of sample in Multinomial distribution goes
to infinity, we derive the similar result to SVA for scalable clustering.
V. Nguyen, D. Phung, D.S. Pham, and S. Venkatesh Bayesian Nonparametric Approaches to Abnormality Detection in Video Surveillance. Annals of Data Science, pp 1-21, 2015. Abstract
Bayesian Nonparametric Approaches to Abnormality Detection in Video Surveillance
In data science, anomaly detection is the process of identifying the items, events or observations which do not conform to expected patterns in a dataset. As widely acknowledged
in the computer vision community and security management, discovering suspicious events is the key issue for abnormal detection in video surveillance. The important steps in identifying
such events include stream data segmentation and hidden patterns discovery. However, the crucial challenge in stream data segmentation and hidden patterns discovery are the number of coherent
segments in surveillance stream and the number of traffic patterns are unknown and hard to specify. Therefore, in this paper we revisit the abnormality detection problem through the lens of
Bayesian nonparametric (BNP) and develop a novel usage of BNP methods for this problem. In particular, we employ the Infinite Hidden Markov Model and Bayesian Nonparametric Factor Analysis for
stream data segmentation and pattern discovery.
In addition, we introduce an interactive system allowing users to inspect and browse suspicious events.
V. Nguyen, D. Phung, S. Venkatesh, and H. Bui A Bayesian Nonparametric Approach to Multilevel Regression. Advances in Knowledge Discovery and Data Mining (PAKDD), pp 330-342, 2015. Abstract
A Bayesian Nonparametric Approach to Multilevel Regression
Regression is at the cornerstone of statistical analysis. Multilevel regression, on the other hand, receives little research attention, though it is prevalent in economics, biostatistics
and healthcare to name a few. We present a Bayesian nonparametric framework for multilevel regression where individuals including observations and outcomes are organized into groups.
Furthermore, our approach exploits additional group-specific context observations, we use Dirichlet Process with product-space base measure in a nested structure to model group-level context
distribution and the regression distribution to accommodate the multilevel structure of the data. The proposed model simultaneously partitions groups into cluster and perform regression.
We provide collapsed Gibbs sampler for posterior inference.
We perform extensive experiments on econometric panel data and healthcare longitudinal data to demonstrate the effectiveness of the proposed model.
V. Nguyen, D. Phung, L. Nguyen, S. Venkatesh and H. Bui Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts. Proceedings of The 31st International Conference on Machine Learning (ICML), pp. 288–296, 2014. Abstract
Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts
We present a Bayesian nonparametric framework for multilevel clustering which utilizes group-level context information to simultaneously discover low-dimensional structures of the group contents
and partitions groups into clusters. Using the Dirichlet process as the building block, our model constructs a product base-measure with a nested structure to accommodate content and context
observations at multiple levels. The proposed model possesses properties that link the nested Dirichlet processes (nDP) and the Dirichlet process mixture models (DPM) in an interesting way:
integrating out all contents results in the DPM over contexts, whereas integrating out group-speciﬁc contexts results in the nDP mixture over content variables. We provide a Polya-urn view of
the model and an efﬁcient collapsed Gibbs inference procedure.
Extensive experiments on real-world datasets demonstrate the advantage of utilizing context information via our model in both text and image domains.
T.V. Nguyen, D. Phung, S. K. Gupta, and S. Venkatesh Interactive Browsing System for Anomaly Video Surveillance. IEEE Eighth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), pp 384-389, 2013. Abstract
Interactive Browsing System for Anomaly Video Surveillance
Existing anomaly detection methods in video surveillance exhibit lack of congruence between rare events detected by algorithms and what is considered anomalous by users.
This paper introduces a novel browsing model to address this issue, allowing users to interactively examine rare events in an intuitive manner. Introducing a novel way to compute rare
motion patterns, we estimate latent factors of foreground motion patterns through Bayesian Nonparametric Factor analysis. Each factor corresponds to a typical motion pattern.
A rarity score for each factor is computed, and ordered in decreasing order of rarity, permitting users to browse events using any proportion of rare factors. Rare events correspond to frames
that contain the rare factors chosen. We present the user with an interface to inspect events that incorporate these rarest factors in a spatial-temporal manner.
We demonstrate the system on a public video data set, showing key aspects of the browsing paradigm.