
2020

V. Nguyen, V. Masrani, R. Brekelmans, M. A. Osborne, F. Wood
Gaussian Process Bandit Optimization of the Thermodynamic Variational Objective
Advances in Neural Information Processing Systems (NeurIPS), accepted, 2020.
Abstract

Achieving the full promise of the Thermodynamic Variational Objective (TVO), a recently proposed variational inference objective that lower-bounds the log evidence via one-dimensional Riemann integration, requires choosing a "schedule" of sorted discretization points. This paper introduces a bespoke Gaussian process bandit optimization method for automatically choosing these points. Our approach not only automates their one-time selection, but also dynamically adapts their positions over the course of optimization, leading to improved model learning and inference. We provide theoretical guarantees that our bandit optimization converges to the regret-minimizing choice of integration points. Empirical validation of our algorithm is provided in terms of improved learning and inference in Variational Autoencoders and sigmoid belief networks.
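For reference, a sketch of the identity that the schedule discretizes (notation here is mine and may differ from the paper): the thermodynamic variational identity writes the log evidence as a one-dimensional integral,

    \log p(x) \;=\; \int_0^1 \mathbb{E}_{\pi_\beta}\!\left[\log \tfrac{p(x,z)}{q(z\mid x)}\right] \mathrm{d}\beta,
    \qquad \pi_\beta(z) \;\propto\; q(z\mid x)^{1-\beta}\, p(x,z)^{\beta},

and the TVO is its left Riemann sum over a sorted schedule 0=\beta_0<\beta_1<\dots<\beta_K=1,

    \mathrm{TVO}(x) \;=\; \sum_{k=0}^{K-1} (\beta_{k+1}-\beta_k)\,\mathbb{E}_{\pi_{\beta_k}}\!\left[\log \tfrac{p(x,z)}{q(z\mid x)}\right] \;\le\; \log p(x),

so the bandit problem is over the placement of the interior points \beta_1,\dots,\beta_{K-1}.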
PDF Code
V. Nguyen*, S. Schulze*, M. A. Osborne
Bayesian Optimisation for Iterative Learning
Advances in Neural Information Processing Systems (NeurIPS), accepted, 2020.
Preliminary version at 7th AutoML Workshop at International Conference on Machine Learning (ICML), 2020.
Abstract

The success of deep (reinforcement) learning systems crucially depends on the correct choice of hyperparameters, which are notoriously sensitive and expensive to evaluate. Training these systems typically requires running iterative processes over multiple epochs or episodes. Traditional approaches only consider the final performance of a hyperparameter setting, although intermediate information from the learning curve is readily available. In this paper, we present a Bayesian optimization approach which exploits the iterative structure of learning algorithms for efficient hyperparameter tuning. First, we transform each training curve into a numeric score. Second, we selectively augment the data using auxiliary information from the curve. This augmentation step enables efficient modeling while avoiding the ill-conditioning of the Gaussian process covariance matrix that arises when the whole curve is added. We demonstrate the efficiency of our algorithm by tuning hyperparameters for the training of deep reinforcement learning agents and convolutional neural networks. Our algorithm outperforms all existing baselines in identifying optimal hyperparameters in minimal time.
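As a rough illustration of the framing (not the paper's exact scoring or augmentation procedure), one can compress each training curve into a scalar and fit a Gaussian process surrogate on those scores; the curve_score summary below is hypothetical:

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    def curve_score(rewards, tail=0.3):
        # Illustrative summary of a learning curve: mean performance over its final segment.
        n = max(1, int(len(rewards) * tail))
        return float(np.mean(rewards[-n:]))

    def fit_surrogate(hyperparams, curves):
        # hyperparams: list of hyperparameter vectors; curves: their training curves.
        y = np.array([curve_score(c) for c in curves])
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        gp.fit(np.asarray(hyperparams), y)
        return gp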
PDF Code Talk
J. Parker-Holder, V. Nguyen, S. Roberts
Provably Efficient Online Hyperparameter Optimization with Population-Based Bandits
Advances in Neural Information Processing Systems (NeurIPS), accepted, 2020.
Preliminary version at 7th AutoML Workshop at International Conference on Machine Learning (ICML), 2020.
Abstract

Selecting optimal hyperparameters is a key challenge in machine learning. A recent approach to this problem (PBT) showed it is possible to achieve impressive performance by updating both weights and hyperparameters in a single training run of a population of agents. Despite its success, PBT relies on heuristics to explore the hyperparameter space, so it lacks theoretical guarantees, requires vast computational resources, and often suffers from mode collapse when such resources are not available. In this work we introduce Population-Based Bandits (PB2), the first provably efficient PBT-style algorithm. PB2 uses a probabilistic model to balance exploration and exploitation, and is thus able to discover high-performing hyperparameter configurations with far fewer agents than typically required by PBT.
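A hypothetical, much-simplified sketch of a GP-driven exploration step in this style (PB2's actual model is time-varying and handles the parallel population; the function and parameter names below are mine):

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    def suggest_config(observed_configs, reward_changes, candidates, kappa=2.0):
        # Fit a GP to the reward change observed for each hyperparameter configuration,
        # then pick the candidate with the largest upper confidence bound; the bonus
        # kappa * sigma trades off exploration against exploitation.
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        gp.fit(np.asarray(observed_configs), np.asarray(reward_changes))
        mu, sigma = gp.predict(np.asarray(candidates), return_std=True)
        return np.asarray(candidates)[np.argmax(mu + kappa * sigma)]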
PDF Code Blog Contributed Talk [3% selected]
V. Nguyen, M. A. Osborne
Knowing The What But Not The Where in Bayesian Optimization
International Conference on Machine Learning (ICML), 2020.
Abstract

Bayesian optimization has demonstrated impressive success in finding the optimum location x* and value f*=f(x*)=max f(x) of the black-box function f. In some applications, however, the optimum value is known in advance and the goal is to find the corresponding optimum location. Existing work in Bayesian optimization (BO) has not effectively exploited the knowledge of f* for optimization. In this paper, we consider a new setting in BO in which the knowledge of the optimum value is available. Our goal is to exploit this knowledge about f* to search for the location x* efficiently. To achieve this goal, we first transform the Gaussian process surrogate using the information about the optimum value. Then, we propose two acquisition functions, called confidence bound minimization and expected regret minimization, which exploit the knowledge about the optimum value to identify the optimum location efficiently. We show that our approaches are intuitive and quantitatively achieve better performance than standard BO methods. We demonstrate real applications in tuning a deep reinforcement learning algorithm on the CartPole problem and XGBoost on the Skin Segmentation dataset, for which the optimum values are publicly available.
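As one concrete illustration of the idea (a sketch only; the paper's exact acquisitions may differ): under a Gaussian posterior f(x) ~ N(mu(x), sigma(x)^2) and a known optimum value f*, the expected absolute gap to f* has a closed form, and an expected-regret-style acquisition can simply minimize it over candidate locations.

    import numpy as np
    from scipy.stats import norm

    def expected_regret(mu, sigma, f_star):
        # E|f* - f(x)| for f(x) ~ N(mu, sigma^2): the mean absolute deviation of a
        # Gaussian from the point f_star. Minimize this over candidate x.
        t = (f_star - mu) / sigma
        return (f_star - mu) * (2.0 * norm.cdf(t) - 1.0) + 2.0 * sigma * norm.pdf(t)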
PDF Code Talk
B. Ru, A. S. Alvi, V. Nguyen, M. A. Osborne, S. J. Roberts
Bayesian Optimisation over Multiple Continuous and Categorical Inputs
International Conference on Machine Learning (ICML), 2020.
Abstract

Efficient optimisation of black-box problems that comprise both continuous and categorical inputs is important, yet poses significant challenges. We propose a new approach, Continuous and Categorical Bayesian Optimisation (CoCaBO), which combines the strengths of multi-armed bandits and Bayesian optimisation to select values for both categorical and continuous inputs. We model this mixed-type space using a Gaussian Process kernel, designed to allow sharing of information across multiple categorical variables, each with multiple possible values; this allows CoCaBO to leverage all available data efficiently. We extend our method to the batch setting and propose an efficient selection procedure that dynamically balances exploration and exploitation whilst encouraging batch diversity. We demonstrate empirically that our method outperforms existing approaches on both synthetic and real-world optimisation tasks with continuous and categorical inputs.
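A minimal sketch of one way to build such a mixed kernel, combining a categorical overlap kernel with a continuous RBF kernel through a convex sum/product mixture (illustrative only; the paper's kernel and its hyperparameters may differ):

    import numpy as np

    def overlap_kernel(h1, h2):
        # Categorical similarity: fraction of categorical variables with matching values.
        return float(np.mean(np.asarray(h1) == np.asarray(h2)))

    def rbf_kernel(x1, x2, lengthscale=1.0):
        diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
        return float(np.exp(-0.5 * np.dot(diff, diff) / lengthscale ** 2))

    def mixed_kernel(h1, x1, h2, x2, lam=0.5):
        # Convex combination of the sum and the product of the two kernels (lam in [0, 1]).
        kh, kx = overlap_kernel(h1, h2), rbf_kernel(x1, x2)
        return (1.0 - lam) * (kh + kx) + lam * kh * kx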
PDF Code Talk
N. van Esbroeck, D. Lennon, H. Moon, V. Nguyen, F. Vigneau, LC. Camenzind, L. Yu, DM. Zumbühl, G. A. D. Briggs, D. Sejdinovic, N. Ares
Quantum device fine-tuning using unsupervised embedding learning
New Journal of Physics, 2020.
Abstract

Quantum devices with a large number of gate electrodes allow for precise control of device parameters. This capability is hard to fully exploit due to the complex dependence of these parameters on the applied gate voltages. We experimentally demonstrate an algorithm capable of fine-tuning several device parameters at once. The algorithm acquires a measurement and assigns it a score using a variational auto-encoder. Gate voltage settings are adjusted in real time to optimise this score in an unsupervised fashion. We report fine-tuning of a double quantum dot device within approximately 40 minutes.
PDF

2019

V. Nguyen, D. Lennon, H. Moon, N. Esbroeck, D. Sejdinovic, M. A. Osborne, G. A. D. Briggs, N. Ares
Controlling Quantum Device Measurement using Deep Reinforcement Learning
Deep Reinforcement Learning workshop, NeurIPS 2019.
Abstract

Qubits based on semiconductor quantum dot devices are promising building blocks for the realisation of quantum computers. However, measuring and characterising these quantum dot devices can be challenging and laborious for experimentalists. In this paper, we develop an application of deep reinforcement learning for controlling the measurement of quantum dot devices. Specifically, we present a computer-automated algorithm that measures a map of the current flowing through a double quantum dot device for different settings of its gate electrodes. The algorithm seeks particular features, called bias triangles, indicating that the device is in the correct operating regime for realising a qubit. Our approach requires no human intervention and significantly reduces the measurement time. This work alleviates the user effort required to measure multiple quantum dot devices, each with multiple gate electrodes.
Workshop PDF Arxiv
S. Kessler, V. Nguyen, S. Zohren, S. Roberts
Indian Buffet Neural Networks for Continual Learning
Bayesian Deep Learning workshop, NeurIPS 2019.
PDF

V. Nguyen, S. Gupta, S. Rana, M. Thai, C. Li, S. Venkatesh
Efficient Bayesian Optimization for Uncertainty Reduction over Perceived Optima Locations
Proceedings of the IEEE International Conference on Data Mining, (ICDM), pp. 1270-1275, 2019. [194/1046=18%]
Preliminary version appears at NIPS Workshop on Bayesian Optimization, (NIPSW), 2017.
PDF Code

2018

S. Gopakumar, S. Gupta, S. Rana, V. Nguyen, S. Venkatesh
Algorithmic Assurance: An Active Approach to Algorithmic Testing using Bayesian Optimisation
Advances in Neural Information Processing Systems, (NeurIPS), pp. 5470-5478, 2018.
Abstract

We introduce algorithmic assurance, the problem of testing whether machine learning algorithms conform to their intended design goal. We address this problem by proposing an efficient framework for algorithmic testing. To provide assurance, we need to efficiently discover scenarios where an algorithm's decision deviates maximally from its intended gold standard. We mathematically formulate this task as an optimisation problem of an expensive, black-box function. We use an active learning approach based on Bayesian optimisation to solve this optimisation problem. We extend this framework to algorithms with vector-valued outputs by making appropriate modifications to Bayesian optimisation via the EXP3 algorithm. We theoretically analyse our methods for convergence. Using two real-world applications, we demonstrate the efficiency of our methods. The significance of our problem formulation and initial solutions is that they will serve as a foundation for assuring humans about machines that make complex decisions.
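Since the abstract mentions the EXP3 algorithm, here is the textbook EXP3 step for completeness (a generic form, not the paper's specific instantiation for vector-valued outputs):

    import numpy as np

    def exp3_step(weights, gamma, pull_arm):
        # One round of EXP3. weights: 1-D float array of length K; pull_arm(arm) is
        # assumed to return a reward in [0, 1]. Sample an arm from the mixed
        # distribution, then update its weight with an importance-weighted estimate.
        K = len(weights)
        probs = (1.0 - gamma) * weights / weights.sum() + gamma / K
        arm = np.random.choice(K, p=probs)
        reward = pull_arm(arm)
        weights[arm] *= np.exp(gamma * reward / (probs[arm] * K))
        return arm, weights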
PDF Code
C. Li, S. Rana, S. Gupta, V. Nguyen, S. Venkatesh, A. Sutti, D. Rubin, T. Slezak, M. Height, M. Mohammed, and I. Gibson
Accelerating Experimental Design by Incorporating Experimenter Hunches
Proceedings of the IEEE International Conference on Data Mining, (ICDM), pp. 257-266, 2018. [84/948=9%]
Abstract

Experimental design is the process of obtaining a product with a target property via experimentation. Bayesian optimization offers a sample-efficient tool for experimental design when experiments are expensive. Often, expert experimenters have 'hunches' about the behavior of the experimental system, offering the potential to further improve efficiency. In this paper, we consider a per-variable monotonic trend in the underlying property, which results in a unimodal trend in those variables when optimizing for a target value. For example, the sweetness of a candy is monotonic in the sugar content. However, to obtain a target sweetness, the utility of the sugar content becomes a unimodal function, which peaks at the value giving the target sweetness and falls off both ways. We propose a novel method to solve such problems that achieves two main objectives: a) the monotonicity information is used to the fullest extent possible, whilst ensuring that b) the convergence guarantee remains intact. This is achieved by two-stage Gaussian process modeling, where the first stage uses the monotonicity trend to model the underlying property, and the second stage uses 'virtual' samples, drawn from the first, to model the target-value optimization function. The process is made theoretically consistent by adding an appropriate adjustment factor to the posterior computation, necessitated by the use of the 'virtual' samples. The proposed method is evaluated through both simulations and real-world experimental design problems: a) designing a new short polymer fiber with a target length, and b) designing a new three-dimensional porous scaffold with a target porosity. In all scenarios our method demonstrates faster convergence than the basic Bayesian optimization approach, which does not use such 'hunches'.
PDF Code
J. Berk, V. Nguyen, S. Gupta, S. Rana, S. Venkatesh
Exploration Enhanced Expected Improvement for Bayesian Optimization
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, (ECML-PKDD), pp. 621-637, 2018.
Abstract

Bayesian optimization (BO) is a sample-efficient method for global optimization of expensive, noisy, black-box functions using probabilistic methods. The performance of a BO method depends on its selection strategy through an acquisition function. This must balance improving our understanding of the function in unknown regions (exploration) with locally improving on known promising samples (exploitation). Expected improvement (EI) is one of the most widely used acquisition functions for BO. Unfortunately, it has a tendency to over-exploit, meaning that it can be slow in finding new peaks. We propose a modification to EI that will allow for increased early exploration while providing similar exploitation once the system has been suitably explored. We also prove that our method has a sub-linear convergence rate and test it on a range of functions to compare its performance against the standard EI and other competing methods.
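For reference, the standard EI acquisition that the paper modifies, in its usual closed form under a Gaussian posterior (maximization convention):

    import numpy as np
    from scipy.stats import norm

    def expected_improvement(mu, sigma, y_max):
        # EI(x) = E[max(f(x) - y_max, 0)] for f(x) ~ N(mu, sigma^2).
        z = (mu - y_max) / sigma
        return (mu - y_max) * norm.cdf(z) + sigma * norm.pdf(z)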
PDF Code
X. Zhang, W. Li, V. Nguyen, F. Zhuang, H. Xiong, S. Lu
Label-Sensitive Task Grouping by Bayesian Nonparametric Approach for Multi-Task Multi-Label Learning
International Joint Conference on Artificial Intelligence, (IJCAI), pp. 3125-3131, 2018.
Abstract

Multi-label learning is widely applied in many real-world applications, such as image and gene annotation. While most existing multi-label learning models focus on the single-task learning problem, there are often tasks that share commonalities and can help each other improve learning performance if the knowledge in similar tasks is smartly shared. In this paper, we propose a LABel-sensitive TAsk Grouping framework, named LABTAG, based on a Bayesian nonparametric approach for multi-task multi-label classification. The proposed framework explores label correlations to capture feature-label patterns, and clusters similar tasks into groups with shared knowledge, which are learned jointly to produce a strengthened multi-task multi-label model. We evaluate the model performance on three public multi-task multi-label data sets, and the results show that LABTAG outperforms the compared baselines by a significant margin.
PDF
V. Nguyen, S. Gupta, S. Rana, C. Li, S. Venkatesh
Filtering Bayesian Optimization Approach in Weakly Specified Search Space
Knowledge and Information Systems, (KAIS), 2018.
Abstract

Bayesian optimization (BO) has recently emerged as a powerful and flexible tool for hyper-parameter tuning and, more generally, for the efficient global optimization of expensive black-box functions. Systems implementing BO have successfully solved difficult problems in automatic design choices and machine learning hyper-parameter tuning. Many recent advances in the methodologies and theories underlying Bayesian optimization have extended the framework to new applications and provided greater insight into the behavior of these algorithms. Still, these established techniques always require a user-defined space in which to perform optimization. This pre-defined space specifies the ranges of hyper-parameter values. In many situations, however, it can be difficult to prescribe such spaces, as prior knowledge is often unavailable. Setting these regions arbitrarily can lead to inefficient optimization: if a space is too large, we can miss the optimum with a limited budget; on the other hand, if a space is too small, it may not contain the optimum at all. The unknown search space problem is intractable to solve in practice. Therefore, in this paper, we narrow down to consider specifically the setting of a “weakly specified” search space for Bayesian optimization. By a weakly specified space, we mean that the pre-defined space is placed in a sufficiently good region so that the optimization can expand and reach the optimum; however, this pre-defined space need not include the global optimum. We tackle this problem by proposing a filtering expansion strategy for Bayesian optimization. Our approach starts from the initial region and gradually expands the search space. We develop an efficient algorithm for this strategy and derive its regret bound. These theoretical results are complemented by an extensive set of experiments on benchmark functions and two real-world applications which demonstrate the benefits of our proposed approach.
PDF Code

2017

S. Rana, C. Li, S. Gupta, V. Nguyen, S. Venkatesh
High Dimensional Bayesian Optimization with Elastic Gaussian Process
Proceedings of the 34th International Conference on Machine Learning, (ICML), pp 2883-2891, 2017.
Abstract

Bayesian optimization is an efficient way to optimize expensive black-box functions such as designing a new product with the highest quality or tuning the hyperparameters of a machine learning algorithm. However, it has a serious limitation when the parameter space is high-dimensional, as Bayesian optimization crucially depends on solving a global optimization of a surrogate utility function in the same number of dimensions. The surrogate utility function, commonly known as the acquisition function, is a continuous function but can be extremely sharp in high dimensions, having only a few peaks marooned in a large terrain of almost flat surface. Global optimization algorithms such as DIRECT are infeasible in higher dimensions and gradient-dependent methods cannot move if initialized in the flat terrain. We propose an algorithm that enables local gradient-dependent algorithms to move through the flat terrain by using a sequence of coarse-to-fine Gaussian process priors on the objective function. We leverage two underlying facts: a) there exists a large enough length-scale for which the acquisition function can be made to have a significant gradient at any location in the parameter space, and b) the extrema of consecutive acquisition functions are close when they differ only by a small change in the length-scale. Theoretical guarantees are provided and experiments clearly demonstrate the utility of the proposed method in high dimensions using both benchmark test functions and real-world case studies.
PDF
T. Le, T. D. Nguyen, V. Nguyen, D. Phung
Approximation Vector Machines for Large-scale Online Learning
Journal of Machine Learning Research, (JMLR), 2017.
Abstract

One of the most challenging problems in kernel online learning is to bound the model size and to promote model sparsity. Sparse models not only improve computation and memory usage, but also enhance generalization capacity – a principle that concurs with the law of parsimony. However, inappropriate sparsity modeling may also significantly degrade performance. In this paper, we propose the Approximation Vector Machine (AVM), a model that can simultaneously encourage sparsity and safeguard its risk of compromising performance. In an online setting, when an incoming instance arrives, we approximate it by one of its neighbors whose distance to it is less than a predefined threshold. Our key intuition is that since the newly seen instance is expressed by a nearby neighbor, the optimal performance can be analytically formulated and maintained. We develop theoretical foundations to support this intuition and further establish an analysis for common loss functions, including Hinge, smooth Hinge, and Logistic (for the classification task) and ℓ1, ℓ2, and ε-insensitive (for the regression task), to characterize the gap between the approximation and optimal solutions. This gap crucially depends on two key factors: the frequency of approximation (i.e., how frequently the approximation operation takes place) and the predefined threshold. We conducted extensive experiments for classification and regression tasks in batch and online modes using several benchmark datasets. The quantitative results show that our proposed AVM achieves predictive performance comparable to current state-of-the-art methods while simultaneously achieving significant computational speed-up, owing to its ability to maintain the model size.
PDF Code
V. Nguyen, S. Gupta, S. Rana, C. Li, S. Venkatesh
Regret for Expected Improvement over the Best-Observed Value and Stopping Condition
Proceedings of The 9th Asian Conference on Machine Learning, (ACML), pp. 279-294, 2017.
Abstract

Bayesian optimization (BO) is a sample-efficient method for global optimization of expensive, noisy, black-box functions using probabilistic methods. The performance of a BO method depends on its selection strategy through the acquisition function. Expected improvement (EI) is one of the most widely used acquisition functions for BO; it computes the expectation of the improvement function over the incumbent. The incumbent is usually selected as the best-observed value so far, termed y^max (for the maximization problem). Recent work has studied the convergence rate for EI under some mild assumptions or zero observation noise. In particular, Wang and de Freitas (2014) derived a sublinear regret for EI under stochastic noise. However, due to the difficulty of the stochastic noise setting and to make the convergence proof feasible, they use an alternative choice for the incumbent: the maximum of the Gaussian process predictive mean, \mu^max. This modification makes the algorithm computationally inefficient because it requires an additional global optimization step to estimate \mu^max, which is costly and may be inaccurate. To address this issue, we derive a sublinear convergence rate for EI using the commonly used y^max. Moreover, our analysis is the first to study a stopping criterion for EI to prevent unnecessary evaluations. Our analysis complements the results of Wang and de Freitas (2014) to theoretically cover the two incumbent settings for EI. Finally, we demonstrate empirically that EI using y^max is both more computationally efficient and more accurate than EI using \mu^max.
PDF
V. Nguyen, S. Gupta, S. Rana, C. Li, S. Venkatesh
Bayesian Optimization in Weakly Specified Search Space
Proceedings of the IEEE International Conference on Data Mining, (ICDM), pp 347-356, 2017. [72/778=9%]
Abstract

Bayesian optimization (BO) has recently emerged as a powerful and flexible tool for hyper-parameter tuning and, more generally, for the efficient global optimization of expensive black-box functions. Systems implementing BO have successfully solved difficult problems in automatic design choices and machine learning hyper-parameter tuning. Many recent advances in the methodologies and theories underlying Bayesian optimization have extended the framework to new applications and provided greater insight into the behavior of these algorithms. Still, these established techniques always require a user-defined space in which to perform optimization. This pre-defined space specifies the ranges of hyper-parameter values. In many situations, however, it can be difficult to prescribe such spaces, as prior knowledge is often unavailable. Setting these regions arbitrarily can lead to inefficient optimization: if a space is too large, we can miss the optimum with a limited budget; on the other hand, if a space is too small, it may not contain the optimum at all. The unknown search space problem is intractable to solve in practice. Therefore, in this paper, we narrow down to consider specifically the setting of a “weakly specified” search space for Bayesian optimization. By a weakly specified space, we mean that the pre-defined space is placed in a sufficiently good region so that the optimization can expand and reach the optimum; however, this pre-defined space need not include the global optimum. We tackle this problem by proposing a filtering expansion strategy for Bayesian optimization. Our approach starts from the initial region and gradually expands the search space. We develop an efficient algorithm for this strategy and derive its regret bound. These theoretical results are complemented by an extensive set of experiments on benchmark functions and two real-world applications which demonstrate the benefits of our proposed approach.
PDF Code Selected as Best Papers, Invited for KAIS
T. Le, K. Nguyen, V. Nguyen, T. D. Nguyen, D. Phung
GoGP: Fast Online Regression with Gaussian Processes
Proceedings of the IEEE International Conference on Data Mining, (ICDM), pp 257-266, 2017. [72/778=9%]
Abstract

One of the most challenging current problems in Gaussian process regression (GPR) is to handle large-scale datasets and to accommodate an online learning setting where data arrive irregularly on the fly. In this paper, we introduce a novel online Gaussian process model that can scale to massive datasets. Our approach is formulated based on an alternative representation of the Gaussian process under geometric and optimization views, hence termed geometric-based online GP (GoGP). We develop theory to guarantee that, with a good convergence rate, our proposed algorithm always produces a (sparse) solution which is close to the true optimum to any arbitrary level of approximation accuracy specified a priori. Furthermore, our method is proven to scale seamlessly not only to large-scale datasets, but also to adapt accurately to streaming data. We extensively evaluated our proposed model against state-of-the-art baselines using several large-scale datasets for the online regression task. The experimental results show that GoGP delivers comparable, or slightly better, predictive performance while achieving an order of magnitude of computational speedup compared with its rivals in the online setting. More importantly, its convergence behavior is guaranteed through our theoretical analysis, and is rapid and stable while achieving lower errors.
PDF Code Selected as Best Papers, Invited for KAIS
V. Nguyen, D. Phung, T. Le, H. Bui
Discriminative Bayesian Nonparametric Clustering
Proceedings of International Joint Conference on Artificial Intelligence, (IJCAI), pp 2550-2556, 2017.
Abstract

We propose a general framework for discriminative Bayesian nonparametric clustering to promote inter-discrimination among the learned clusters in a fully Bayesian nonparametric (BNP) manner. Our method combines existing BNP clustering and discriminative models by enforcing latent cluster indices to be consistent with the predicted labels produced by a probabilistic discriminative model. This formulation results in a well-defined generative process wherein we can use either logistic regression or SVM for discrimination. Using the proposed framework, we develop two novel discriminative BNP variants: the discriminative Dirichlet process mixtures, and the discriminative-state infinite HMMs for sequential data. We develop efficient data-augmentation Gibbs samplers for posterior inference. Extensive experiments in image clustering and dynamic location clustering demonstrate that, by encouraging discrimination between induced clusters, our model enhances the quality of clustering in comparison with traditional generative BNP models.
PDF
C. Li, S. Gupta, S. Rana, V. Nguyen, S. Venkatesh, A. Shilton
High Dimensional Bayesian Optimization Using Dropout
Proceedings of International Joint Conference on Artificial Intelligence , (IJCAI), pp 2096-2102, 2017.
Abstract

Scaling Bayesian optimization to high dimensions is a challenging task, as the global optimization of a high-dimensional acquisition function can be expensive and often infeasible. Existing methods depend either on a limited set of “active” variables or on an additive form of the objective function. We propose a new method for high-dimensional Bayesian optimization that uses a dropout strategy to optimize only a subset of variables at each iteration. We derive theoretical bounds for the regret and show how they inform the derivation of our algorithm. We demonstrate the efficacy of our algorithms for optimization on two benchmark functions and two real-world applications: training cascade classifiers and optimizing alloy composition.
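A minimal sketch of the dropout idea (illustrative only; the paper analyses several strategies for filling in the left-out dimensions, and the helper names here are mine):

    import numpy as np

    def dropout_candidate(best_x, d, optimize_subspace, rng=None):
        # Optimize the acquisition over a random subset of d coordinates and copy the
        # remaining coordinates from the best point observed so far ("fill from best").
        if rng is None:
            rng = np.random.default_rng()
        subset = rng.choice(len(best_x), size=d, replace=False)
        x = np.array(best_x, dtype=float)
        x[subset] = optimize_subspace(subset)  # user-supplied maximizer over the chosen dims
        return x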
PDF

2016 and before

T. Le, T. D. Nguyen, V. Nguyen, D. Phung
Dual Space Gradient Descent for Online Learning
Advances in Neural Information Processing Systems, (NIPS), pp 4583-4591, 2016.
Abstract

One crucial goal in kernel online learning is to bound the model size. Common approaches employ budget maintenance procedures to restrict the model size using removal, projection, or merging strategies. Although projection and merging are known in the literature to be the most effective strategies, they demand extensive computation, whilst the removal strategy fails to retain the information of the removed vectors. An alternative way to address the model size problem is to apply random features to approximate the kernel function. This allows the model to be maintained directly in the random-feature space, effectively resolving the curse of kernelization. However, this approach still suffers from a serious shortcoming: it needs a high-dimensional random-feature space to achieve a sufficiently accurate kernel approximation, which leads to a significant increase in computational cost. To address all of these challenges, we present in this paper the Dual Space Gradient Descent (DualSGD), a novel framework that utilizes random features as an auxiliary space to maintain information from data points removed during budget maintenance. Our approach thus permits the budget to be maintained in a simple, direct and elegant way while simultaneously mitigating the impact of the dimensionality issue on learning performance. We further provide a convergence analysis and conduct extensive experiments on five real-world datasets to demonstrate the predictive performance and scalability of our proposed method in comparison with state-of-the-art baselines.
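For context, the random-feature approximation the abstract refers to is typically a random Fourier feature map for a shift-invariant kernel; a standard sketch for the RBF kernel (generic Rahimi-Recht features, not DualSGD itself) is:

    import numpy as np

    def random_fourier_features(X, n_features=256, lengthscale=1.0, seed=0):
        # z(X) such that z(x) @ z(y) approximates exp(-||x - y||^2 / (2 * lengthscale^2)).
        rng = np.random.default_rng(seed)
        W = rng.normal(scale=1.0 / lengthscale, size=(X.shape[1], n_features))
        b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)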
PDF Code
V. Nguyen, S. K. Gupta, S. Rana, C. Li, S. Venkatesh
A Bayesian Nonparametric Approach for Multi-label Classification
Proceedings of The 8th Asian Conference on Machine Learning, (ACML), pp 254-269, 2016.
Abstract

Many real-world applications require multi-label classification, where multiple target labels are assigned to each instance. In multi-label classification, there exist intrinsic correlations between labels and features. These correlations are beneficial for the multi-label classification task since they reflect the coexistence of the input and output spaces and can be exploited for prediction. Traditional classification methods have attempted to reveal these correlations in different ways. However, existing methods demand expensive computation for finding such correlation structures, and they cannot identify a suitable number of label-feature correlation patterns. In this paper, we propose a Bayesian nonparametric (BNP) framework for multi-label classification that can automatically learn and exploit an unknown number of multi-label correlation patterns. We utilize recent techniques in stochastic inference to derive a cheap but efficient posterior inference algorithm for the model. In addition, our model can naturally exploit useful information from samples with missing labels. Furthermore, we extend the model to update parameters in an online fashion, highlighting the flexibility of our model over existing approaches. We compare our method with state-of-the-art multi-label classification algorithms on real-world datasets using both complete and missing label settings. Our model achieves better classification accuracy while its running time is consistently faster than the baselines by an order of magnitude.
PDF Code Youtube Demo Best Paper Runner Up Award Best Poster Award
V. Nguyen, S. Rana, S. K. Gupta, C. Li, S. Venkatesh
Budgeted Batch Bayesian Optimization
Proceedings of the IEEE International Conference on Data Mining, (ICDM), pp 1107-1112, 2016. [180/918=19%]
Abstract

Parameter settings profoundly impact the performance of machine learning algorithms and laboratory experiments. Classical trial-and-error methods are exponentially expensive in large parameter spaces, and Bayesian optimization (BO) offers an elegant alternative for global optimization of black-box functions. In situations where the function can be evaluated at multiple points simultaneously, batch Bayesian optimization is used. Current batch BO approaches are restrictive in fixing the number of evaluations per batch, and this can be wasteful when the number of specified evaluations is larger than the number of real maxima in the underlying acquisition function. We present budgeted batch Bayesian optimization (B3O) for hyper-parameter tuning and experimental design, identifying the appropriate batch size for each iteration in an elegant way. In particular, we use the infinite Gaussian mixture model (IGMM) to automatically identify the number of peaks in the underlying acquisition function. We solve the intractability of estimating the IGMM directly from the acquisition function by formulating batch generalized slice sampling to efficiently draw samples from the acquisition function. We perform extensive experiments on benchmark functions and two real-world applications: machine learning hyper-parameter tuning and experimental design for alloy hardening. We show empirically that the proposed B3O outperforms existing fixed-batch BO approaches in finding the optimum whilst requiring fewer evaluations, thus saving cost and time.
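A rough sketch of the batch-construction step, using scikit-learn's variational Dirichlet-process mixture as a stand-in for the paper's IGMM and assuming samples from the acquisition function are already available (the paper obtains them with a generalized slice sampler):

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    def build_batch(acq_samples, max_components=10, weight_threshold=0.05):
        # Fit a Dirichlet-process mixture to samples drawn from the acquisition function
        # and return one batch point per component that retains non-negligible weight,
        # so the batch size adapts to the number of peaks.
        dpgmm = BayesianGaussianMixture(
            n_components=max_components,
            weight_concentration_prior_type="dirichlet_process",
        ).fit(np.asarray(acq_samples))
        keep = dpgmm.weights_ > weight_threshold
        return dpgmm.means_[keep]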
PDF Code
V. Nguyen, T. D. Nguyen, T. Le, S. Venkatesh, D. Phung
One-pass Logistic Regression for Label-drift and Large-scale Classification on Distributed Systems
Proceedings of the IEEE International Conference on Data Mining, (ICDM), pp 1113-1118, 2016. [180/918=19%]
Abstract

Logistic regression (LR) is a cornerstone of classification. Its extension for multiclass classification is a workhorse in industry, where a set of predefined classes is required. The model, however, fails to work when the class labels are not known in advance, a problem we term label-drift classification, in a similar spirit to the so-called concept-drift problem in the literature. The label-drift classification problem naturally occurs in many applications, especially in streaming and online settings, where incoming data may contain samples categorized with new classes that have not previously been seen. Additionally, in the wave of big data, traditional LR methods may fail due to their running-time expense and the label-drift requirement. In this paper, we introduce a novel variant of LR, namely one-pass logistic regression (OLR), to offer a principled treatment of large-scale and label-drift classification. Our key contribution is the derivation of a sufficient statistic update for MAP estimation under Polya-Gamma augmentation for LR. Manipulating these sufficient statistics is convenient, allowing our proposed method to efficiently perform label-drift classification in an online setting without retraining the model from scratch. To handle large-scale classification for big data, we further extend OLR to a distributed setting for parallelization, termed Sparkling OLR (Spark-OLR). We demonstrate the scalability of our proposed methods on large-scale datasets with more than one hundred million data points. The experimental results show that the predictive performance of our methods is comparable to or better than that of state-of-the-art baselines, whilst the execution time is faster by an order of magnitude. To measure the inherent trade-off between speed and accuracy, we propose a quadrant visualization and quadrant score, on which our proposed model outperforms other methods on all datasets. In addition, OLR and Spark-OLR are invariant to data shuffling and have no hyperparameters to tune, which significantly benefits data practitioners and overcomes the burden of big-data cross-validation for selecting optimal hyperparameters.
PDF Code
T. Le, V. Nguyen, T. D. Nguyen, D. Phung
Nonparametric Budgeted Stochastic Gradient Descent
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, (AISTATS), pp 654-572, 2016.
Abstract

One of the most challenging problems in kernel online learning is to bound the model size. Budgeted kernel online learning addresses this issue by bounding the model size to a predefined budget. However, determining an appropriate value for such a predefined budget is arduous. In this paper, we propose Nonparametric Budgeted Stochastic Gradient Descent, which allows the model size to automatically grow with data in a principled way. We provide theoretical analysis to show that our framework is guaranteed to converge for a large collection of loss functions (e.g. Hinge, Logistic, L2, L1, and ε-insensitive), which enables the proposed algorithm to perform both classification and regression tasks without hurting the ideal convergence rate O(1/T) of standard Stochastic Gradient Descent. We validate our algorithm on real-world datasets to consolidate the theoretical claims.
PDF Code
T. Nguyen, V. Nguyen, FD. Salim, D. Phung
SECC: Simultaneous Extraction of Context and Community from Pervasive Signals
Proceedings of 2016 IEEE International Conference on Pervasive Computing and Communications, (PERCOM), pp 1-9, 2016.
Abstract

Understanding user contexts and group structures plays a central role in pervasive computing. These contexts and community structures are complex to mine from data collected in the wild due to the unprecedented growth of data, noise, uncertainty and complexity. Typical existing approaches first extract the latent patterns that explain the human dynamics or behaviors, and then use them to consistently formulate numerical representations for community detection, often via a clustering method. While able to capture high-order and complex representations, these two steps are performed separately. More importantly, they face a fundamental difficulty in determining the correct number of latent patterns and communities. This paper presents an approach that seamlessly addresses these challenges by simultaneously discovering latent patterns and communities in a unified Bayesian nonparametric framework. Our Simultaneous Extraction of Context and Community (SECC) model is rooted in nested Dirichlet process theory, which allows a nested structure to be built to explain data at multiple levels. We demonstrate our framework on three public datasets where the advantages of the proposed approach are validated.
PDF
V. Nguyen
Bayesian Nonparametric Multilevel Modelling and Applications
Deakin University, Thesis, December 2015.
PDF
V. Nguyen, D. Phung, T. Le, S. Venkatesh
Large Sample Asymptotic for Nonparametric Mixture Model with Count Data.
Workshop on Advances in Approximate Bayesian Inference at Neural Information Processing Systems, (NIPSW), 2015.
Abstract

Bayesian nonparametric models have become popular recently due to their flexibility in identifying the unknown number of clusters. However, this flexibility comes at a cost for learning. Small Variance Asymptotics (SVA) is thus a promising approach for scalability in Bayesian nonparametric models. An SVA approach for count data has also been developed, in which the likelihood function is replaced by the Kullback–Leibler divergence. In this paper, we present a Large Sample Asymptotic for count data, where the number of samples in the Multinomial distribution goes to infinity, and derive a result similar to SVA for scalable clustering.
PDF Code Poster
V. Nguyen, D. Phung, D.S. Pham, and S. Venkatesh
Bayesian Nonparametric Approaches to Abnormality Detection in Video Surveillance.
Annals of Data Science, pp 1-21, 2015.
Abstract

In data science, anomaly detection is the process of identifying items, events or observations which do not conform to expected patterns in a dataset. As widely acknowledged in the computer vision community and in security management, discovering suspicious events is the key issue in abnormality detection for video surveillance. The important steps in identifying such events include stream data segmentation and hidden pattern discovery. However, the crucial challenge is that the number of coherent segments in the surveillance stream and the number of traffic patterns are unknown and hard to specify. Therefore, in this paper we revisit the abnormality detection problem through the lens of Bayesian nonparametrics (BNP) and develop a novel usage of BNP methods for this problem. In particular, we employ the Infinite Hidden Markov Model and Bayesian Nonparametric Factor Analysis for stream data segmentation and pattern discovery. In addition, we introduce an interactive system allowing users to inspect and browse suspicious events.
PDF Code
V. Nguyen, D. Phung, S. Venkatesh, and H. Bui
A Bayesian Nonparametric Approach to Multilevel Regression.
Advances in Knowledge Discovery and Data Mining (PAKDD), pp 330-342, 2015.
Abstract

Regression is at the cornerstone of statistical analysis. Multilevel regression, on the other hand, receives little research attention, though it is prevalent in economics, biostatistics and healthcare, to name a few fields. We present a Bayesian nonparametric framework for multilevel regression where individuals, including their observations and outcomes, are organized into groups. Our approach also exploits additional group-specific context observations: we use a Dirichlet process with a product-space base measure in a nested structure to model the group-level context distribution and the regression distribution, accommodating the multilevel structure of the data. The proposed model simultaneously partitions groups into clusters and performs regression. We provide a collapsed Gibbs sampler for posterior inference. We perform extensive experiments on econometric panel data and healthcare longitudinal data to demonstrate the effectiveness of the proposed model.
PDF Slide
V. Nguyen, D. Phung, L. Nguyen, S. Venkatesh and H. Bui
Bayesian Nonparametric Multilevel Clustering with Group-Level Contexts.
Proceedings of The 31st International Conference on Machine Learning (ICML), pp. 288–296, 2014.
Abstract

We present a Bayesian nonparametric framework for multilevel clustering which utilizes group-level context information to simultaneously discover low-dimensional structures of the group contents and partitions groups into clusters. Using the Dirichlet process as the building block, our model constructs a product base-measure with a nested structure to accommodate content and context observations at multiple levels. The proposed model possesses properties that link the nested Dirichlet processes (nDP) and the Dirichlet process mixture models (DPM) in an interesting way: integrating out all contents results in the DPM over contexts, whereas integrating out group-specific contexts results in the nDP mixture over content variables. We provide a Polya-urn view of the model and an efficient collapsed Gibbs inference procedure. Extensive experiments on real-world datasets demonstrate the advantage of utilizing context information via our model in both text and image domains.
PDF Slide
T.V. Nguyen, D. Phung, S. K. Gupta, and S. Venkatesh
Interactive Browsing System for Anomaly Video Surveillance.
IEEE Eighth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), pp 384-389, 2013.
Abstract

Existing anomaly detection methods in video surveillance exhibit lack of congruence between rare events detected by algorithms and what is considered anomalous by users. This paper introduces a novel browsing model to address this issue, allowing users to interactively examine rare events in an intuitive manner. Introducing a novel way to compute rare motion patterns, we estimate latent factors of foreground motion patterns through Bayesian Nonparametric Factor analysis. Each factor corresponds to a typical motion pattern. A rarity score for each factor is computed, and ordered in decreasing order of rarity, permitting users to browse events using any proportion of rare factors. Rare events correspond to frames that contain the rare factors chosen. We present the user with an interface to inspect events that incorporate these rarest factors in a spatial-temporal manner. We demonstrate the system on a public video data set, showing key aspects of the browsing paradigm.
PDF Code Poster

Last updated Dec 14, 2020.
Based on the code of Adam Lopez.
Created with git, jekyll, bootstrap, and vim.