Stochastic optimization comes in a wide variety of flavors under names such as stochastic search, dynamic programming, stochastic programming, stochastic control, ranking and selection, and multi-armed bandit problems (to name just a few), with variations that use names such as simulation optimization, approximate dynamic programming, reinforcement learning, and POMDPs. With careful notation, we can model all of these with one framework, which requires making the transition from finding the best decision (a deterministic scalar or vector) to finding the best policy (which is a function). An often overlooked distinction is the difference between offline learning (optimizing terminal reward) and online learning (optimizing cumulative reward). We then address the challenge of searching over policies. We identify two core strategies for designing policies, each of which divides into two classes, producing four classes of policies. These four classes cover all of the different subfields of stochastic optimization. By drawing ideas from other fields, we open up new approaches for solving problems, as well as new problems. Using a simple energy storage problem, we demonstrate that each of the four classes of policies may work best depending on the data, while hybrids may work even better.
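The offline/online distinction can be made concrete with a minimal sketch. The setup below is hypothetical (a Gaussian multi-armed bandit learned with an epsilon-greedy rule, with made-up means and parameters), not the energy storage problem from the abstract: the online (cumulative-reward) objective accrues every reward earned while learning, while the offline (terminal-reward) objective scores only the quality of the final choice.

```python
import random

def simulate(true_means, budget, epsilon=0.1, seed=0):
    """Run an epsilon-greedy bandit and return both learning objectives.

    Hypothetical illustration: true_means are the unknown arm values,
    budget is the number of experiments we may run.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms     # running sample means per arm
    cumulative = 0.0
    for _ in range(budget):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                            # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])   # exploit
        reward = true_means[arm] + rng.gauss(0, 1)                 # noisy observation
        cumulative += reward        # online objective: rewards earned while learning
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # update mean
    # offline (terminal) objective: true value of the arm we would pick now
    terminal = true_means[max(range(n_arms), key=lambda a: estimates[a])]
    return terminal, cumulative

terminal, cumulative = simulate([1.0, 1.5, 2.0], budget=200)
print(terminal, cumulative)
```

A policy tuned for the terminal objective can afford aggressive exploration (wasted rewards cost nothing), while one tuned for the cumulative objective must balance exploration against the rewards it forgoes.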
Warren B. Powell is a professor in the Department of Operations Research and Financial Engineering at Princeton University, where he has taught since 1981 after receiving his BSE from Princeton University and Ph.D. from MIT. He is the founder and director of the laboratory for Computational Stochastic Optimization and Learning (CASTLE Labs), which spans contributions to models and algorithms in stochastic optimization, with applications to energy systems, transportation, health and medical research, business analytics, and the laboratory sciences (see www.castlelab.princeton.edu). He has pioneered the use of approximate dynamic programming for high-dimensional applications in freight transportation, where his projects have twice been recognized as Edelman finalists, and one won the Daniel Wagner Prize. This research led him into the field of optimal learning for optimizing expensive functions using the knowledge gradient. A Fellow of INFORMS, he has served in a range of service positions spanning the Society for Transportation and Logistics, the INFORMS Computing Society, and the INFORMS Optimization Society. He has written two books and over 200 papers, and is working on a new book, "Optimization under Uncertainty: A Unified Framework." He has advised 50 graduate students and post-docs, and has supervised almost 200 undergraduate senior theses (see http://tinyurl.com/powellacademictree).