Imagine an alien drone suddenly landing in your backyard one night. After plucking up enough courage, you'd pop its hood to discover gooey technology beyond all comprehension. No logic, theorems, or instruments on this planet would help you command it. Your only option is to kick, shake, and turn its knobs repeatedly to see how it reacts, slowly gaining control through trial and error. Small children do this when they encounter a new toy, and data scientists do it while searching for better hyperparameters to train their deep-learning models.
The strategy you use to explore the unknown - whether an alien spacecraft, a new toy, or the performance of your machine learning model - is aptly called black-box optimization. David Eriksson is a Research Scientist at Meta with a black belt in unwrapping black boxes and pushing the limits of AutoML.
How it started
After a long journey in computer science, engineering, and mathematics, David got his first taste of machine learning in 2014 at Cornell University, working with Prof. Christine Shoemaker on global optimization - a branch of applied mathematics - during the first year of his Ph.D. studies. While not technically machine learning, this work served as the perfect gateway to Bayesian optimization.
The academic atmosphere around machine learning felt like a breath of fresh air: "I liked how open the ML community was with articles from the top conferences never being behind paywalls, how quickly the submission process moved for the top conferences, and how many interesting problems there were to solve with direct impact both in the sciences and in industry," he says.
Towards the end of his time at Cornell, David teamed up with his advisor David Bindel and with Andrew Wilson on scaling Gaussian processes to massive datasets, ideas that were eventually incorporated into the popular GPyTorch package. He also interned at Google, where he worked on improved algorithms for black-box optimization in the Google Vizier project.
After graduation, there were plenty of opportunities for a talented research scientist. David's next adventure at Uber AI Labs, led by Zoubin Ghahramani, was a deeper dive into Bayesian optimization and Gaussian processes. This work culminated in the development of the TuRBO algorithm, a scalable Bayesian optimization approach for high-dimensional problems. The paper was awarded a NeurIPS 2019 spotlight.
Illustration of the TuRBO algorithm utilizing local trust regions.
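To give a rough feel for the idea, here is a minimal sketch of the trust-region bookkeeping at the heart of TuRBO-style methods. It is illustrative only - the thresholds below are made up, and the full algorithm also fits a local Gaussian process model inside the region and draws candidates from it.

```python
import numpy as np

# Simplified sketch of TuRBO-style trust-region bookkeeping (illustrative only).
class TrustRegion:
    def __init__(self, dim, length=0.8, length_min=0.5 ** 7, length_max=1.6,
                 success_tol=3, failure_tol=5):
        self.dim = dim
        self.length = length          # side length of the box, in the unit cube [0, 1]^d
        self.length_min = length_min  # restart the region once it shrinks below this
        self.length_max = length_max
        self.success_tol = success_tol
        self.failure_tol = failure_tol
        self.successes = 0
        self.failures = 0

    def update(self, improved):
        """Expand after repeated successes, shrink after repeated failures."""
        if improved:
            self.successes += 1
            self.failures = 0
        else:
            self.successes = 0
            self.failures += 1
        if self.successes >= self.success_tol:
            self.length = min(2.0 * self.length, self.length_max)
            self.successes = 0
        elif self.failures >= self.failure_tol:
            self.length /= 2.0
            self.failures = 0
        return self.length >= self.length_min  # False -> time to restart the region

    def bounds(self, center):
        """Axis-aligned box around the current best point, clipped to [0, 1]^d."""
        lower = np.clip(center - self.length / 2.0, 0.0, 1.0)
        upper = np.clip(center + self.length / 2.0, 0.0, 1.0)
        return lower, upper
```

Proposing candidates only inside this box around the current best point keeps the surrogate model local and well behaved, even when the full search space has hundreds of dimensions.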
Pioneering at Meta
The success with TuRBO and an ever-growing passion for Bayesian methods eventually led David to his current Research Scientist position at Meta, where the team has earned a world-class reputation through a strong publication record at top venues like NeurIPS and through open-source tools like BoTorch and Ax.
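To give a flavor of what these libraries look like in practice, here is a minimal sketch of a single Bayesian optimization step with BoTorch - assuming a recent BoTorch release; the toy objective and all settings are purely illustrative:

```python
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition import ExpectedImprovement
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

# Toy black-box objective on [0, 1]^2 (a stand-in for, say, an expensive training run).
def objective(x):
    return -((x - 0.3) ** 2).sum(dim=-1, keepdim=True)

bounds = torch.tensor([[0.0, 0.0], [1.0, 1.0]], dtype=torch.double)
train_X = torch.rand(8, 2, dtype=torch.double)   # a few initial random evaluations
train_Y = objective(train_X)

# Fit a Gaussian process surrogate to the observations collected so far.
model = SingleTaskGP(train_X, train_Y)
mll = ExactMarginalLogLikelihood(model.likelihood, model)
fit_gpytorch_mll(mll)

# Propose the next point by maximizing Expected Improvement over the search space.
ei = ExpectedImprovement(model=model, best_f=train_Y.max())
candidate, _ = optimize_acqf(ei, bounds=bounds, q=1, num_restarts=10, raw_samples=256)
print("next point to evaluate:", candidate)
```

The pattern is always the same: fit a probabilistic surrogate to the evaluations so far, then maximize an acquisition function that trades off exploring uncertain regions against exploiting promising ones.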
The day-to-day work at Meta is a mix of scientific exploration and solving business-critical problems, and the team works closely with other teams such as Instagram and Reality Labs. Meta's global scale means that the problems are often challenging and that innovating on new state-of-the-art methods is a given. The team publishes ground-breaking papers and develops pioneering open-source tools for the entire machine learning community.
For example, feature selection, hyperparameter optimization, and online experimentation have traditionally been treated as separate problems, where it is common to optimize for a single metric such as accuracy. David's team has taken a more holistic approach and merged them into a single high-dimensional problem in which multiple competing metrics - such as latency, size, and power efficiency - are optimized at once. One particular example is their Bayesian multi-objective neural architecture search (Bayesian NAS) work, which has helped optimize large-scale machine learning models for on-device deployment.
The Bayesian NAS (right) is much more sample-efficient than Sobol (left) in exploring the trade-offs between accuracy and latency.
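As a rough sketch of what such a multi-objective setup can look like with Ax's Service API (assuming a recent Ax release; the parameters, metrics, thresholds, and dummy evaluation function below are made up for illustration - a real NAS run would train and benchmark an actual model):

```python
from ax.service.ax_client import AxClient, ObjectiveProperties

ax_client = AxClient()
ax_client.create_experiment(
    name="toy_multi_objective_nas",
    parameters=[
        {"name": "hidden_size", "type": "range", "bounds": [32, 512]},
        {"name": "num_layers", "type": "range", "bounds": [1, 8]},
    ],
    objectives={
        # Trade predictive quality off against on-device latency.
        "accuracy": ObjectiveProperties(minimize=False, threshold=0.75),
        "latency_ms": ObjectiveProperties(minimize=True, threshold=20.0),
    },
)

def evaluate(params):
    # Dummy stand-in: a real run would train the architecture and benchmark it.
    accuracy = 0.7 + 0.0004 * params["hidden_size"] + 0.01 * params["num_layers"]
    latency_ms = 2.0 * params["num_layers"] + 0.02 * params["hidden_size"]
    return {"accuracy": (accuracy, 0.0), "latency_ms": (latency_ms, 0.0)}

for _ in range(20):
    params, trial_index = ax_client.get_next_trial()
    ax_client.complete_trial(trial_index=trial_index, raw_data=evaluate(params))

print(ax_client.get_pareto_optimal_parameters())
```

Instead of a single best configuration, the optimizer returns a Pareto frontier of candidates, from which engineers can pick the accuracy/latency trade-off that fits their deployment target.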
Merging multiple optimization problems into a single large problem comes at a price, though. Wherever machine learning pioneers break new ground at unprecedented scale, the curse of dimensionality lurks around the corner. Even state-of-the-art Bayesian optimization algorithms start to misbehave when dealing with hundreds of parameters or too many objectives. David's team has recently fought back against this curse with new state-of-the-art approaches like SAASBO and MORBO.
The future of AutoML
Different forms of AutoML have become increasingly popular over the last few years. For example, automatic hyperparameter tuning is now a common way to squeeze extra performance out of a machine learning model. Bayesian optimization - David's true calling - is considered state-of-the-art and can be more than two orders of magnitude more sample-efficient than alternatives like random search. This was demonstrated at the black-box optimization challenge organized by Twitter, Meta, Valohai, and others at NeurIPS 2020, where David and his fellow organizers made dozens of black-box problems available in the cloud for benchmarking, with interesting results.
The black-box optimization competition at NeurIPS 2020.
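The sample-efficiency gap is easy to see even on a toy problem. The sketch below is illustrative only - it uses scikit-learn's Gaussian process rather than anything from the actual challenge - and compares plain random search against a simple expected-improvement loop on a one-dimensional function:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x) + 0.5 * (x - 1.5) ** 2   # toy function to minimize on [0, 4]
grid = np.linspace(0.0, 4.0, 500).reshape(-1, 1)
budget = 20

# Random search: spend the whole budget on uniformly random points.
random_best = f(rng.uniform(0, 4, budget)).min()

# Simple Bayesian optimization: GP surrogate + expected improvement.
X = rng.uniform(0, 4, 3).reshape(-1, 1)              # small random initial design
y = f(X).ravel()
for _ in range(budget - len(X)):
    gp = GaussianProcessRegressor(Matern(nu=2.5), alpha=1e-6, normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    imp = y.min() - mu                                # predicted improvement over best so far
    z = imp / np.maximum(sigma, 1e-9)
    ei = imp * norm.cdf(z) + sigma * norm.pdf(z)      # expected improvement (minimization)
    x_next = grid[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next))

print(f"best value found by random search:         {random_best:.3f}")
print(f"best value found by Bayesian optimization: {y.min():.3f}")
```

With the same evaluation budget, the surrogate-guided search typically lands much closer to the optimum, which is exactly what matters when every single evaluation is an expensive training run.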
Research in black-box optimization may feel overly academic to the average reader. However, even a 1% improvement can have a massive compounding impact across the industry, greasing the wheels of every machine learning project through faster training and more accurate, efficient models.
If you are interested in the science behind AutoML or want to tame your black boxes more efficiently, David recommends checking out the AutoML Seminars, which feature great talks on the latest research and state-of-the-art methods. You can also contact David through his webpage if you ever need help taming an alien spacecraft with a Bayesian optimizer.