Algorithms For Decision Making
source: https://algorithmsbook.com

Algorithms For Decision Making

By Mykel J. Kochenderfer, Tim A. Wheeler, and Kyle H. Wray; MIT Press 2022.

Just a week back I received an alert from one of the authors of this work that the print edition is now on sale at MIT Press, and that a PDF version is also available for free download.

That was arm-twist enough, and I weak-elbow promised a mini review as penance. In a fortnight or so, I thought. But I was wrong: it's only been a week and I am ready for the cleansing!

The read was educational for me, as one raised in Engineering determinism, re-reading almost the same material cloaked in layers of new semantic context. A dominant one is the micro-economic one of Actors, Utility Functions and Rational Agents. Another, not much changed, was Operations Research optimization problem solving: little changed in form, but much changed in meaning and in applicability to a far broader range of contexts than was ever taught during my student days. The same with Linear Programming: unchanged, but showing up as insightful algorithms whenever circumstances allow; and were there not many such circumstances in this book?

Readers Who May Benefit Most

The Book inspired me to recite some quick mini-review sections in reverse. So here I answer my last question first: for whom has it been written? If I were to imagine the kind of reader that may benefit the most from this work (apart from the obvious coursework degree students who need to pass its course content to graduate), my first list of beneficiaries would be the practitioners from among whom the book draws most of its concepts: Econometricians, Game Theorists, Control Theorists, Operations Researchers and Planners, Dynamic Programmers included.

Then there are the non-numericists with their own set of knowledge-domain numeric problems to crack. I am thinking of really any kind of experimentalist who has a focused theoretical backing into which he or she has boxed themselves, and who needs help towards an eventual breakthrough in their work. I think those are the ones for whom this book is heaven-sent.

It is written amorally (the authors' term) enough not to interfere with their driving research questions, but it enhances their chances of an algorithmically assisted and keener look at their experimental data in which to find their proverbial haystack needle. Some of it can actually be programmed straight into experimental instrumentation (should Julia allow) to help, say, trap that elusive Schrödinger cat running through their laboratory lattice condensates.

But I would be reluctant to suggest this work to Empiricists in search of Evidence for a Candidate Theory already at Hand. The sheer number of innocuous-looking assumptions that go into any of the Online Policy Planners, for example, makes the tools valid in a surprisingly narrow field of view, and in even fewer convergence foci of relevance. The algorithms assume well-understood preconditions for their use; outside of them, the entire exercise is futile and certain to be stillborn. On to the book proper.

First Impressions: my quick-read impression (drawn from the Prelude, the Table of Contents, and Chapter 1).

It is a collective work, led purposefully, and contains something for every reader, although it might take a slow scan to pick out the nuggets of interest.

Algorithms for Decision Making is an apt title, for it spans a wide range of problem-solving use cases (to speak algorithmically), or problem domains in general.

The work is highly structured. One (or the Algorithm) receives Observations, and then responds to them with Actions. What the context is depends on the type of reader I just alluded to. The Structure is the wrapping around those two events of Observing and of Reacting.

Use Cases

But the Problem Domain suggested in the Prelude includes an interestingly wide variety; the thinking behind this book addresses a broad range of settings. The preface, for example, outlines applications such as:

Aircraft Collision Avoidance; Automated Driving; Cancer Screening; Financial Portfolio Allocation & Closing; Disaster Surveillance; and Science Exploration.

Methods suggested include:

Explicit Programming; Supervised Learning; Optimisation; Planning; and Reinforcement Learning.

It also incorporates historical trends in the subject, including the development of its specialized taxonomy across Economics, Psychology, Neuroscience, Computer Science, Engineering, Mathematics and Operations Research.

The Authors are further cognizant of the relative complexity of the types of solvable problems and introduce a didactic of digestion in the form of:

Probabilistic Point Decisions (Reasoning), Sequential Decision Making, Model Uncertainty, State Uncertainty, and eventually Concurrent (Multiagent) Decision Making. I think this is a good heuristic approach for the reader, as it lays out complexity in small enough digestible chunks; but perhaps at the cost of not providing de-simplifying disclaimers back into the cruel complex realities that the didactic avoids [this is yet to be decided]. Thus far, the work is highly readable.

Al Gorithm Julia

Every now and again there's an algorithmic inset that illustrates some procedure; perhaps this is the point of the book's title. But they just pop up, unannounced and unexplained. By the fifth one or so I begin to think: is this not Python-looking (indentation style)? So I search the text for Python and only get two hits: PythonTeX (a lark), and an almost-apology for Julia in a full Appendix G [pp. 627-650] specifying this recent Python-cum-MATLAB-cum-R programming language creation in which the book's raison d'être (Algorithms) is spelled out.

A search back for 'Julia' pops up right at the beginning, in the Preface, in a mention as the preferred expression of algorithms; a muted afterthought that does not display its critical cogency in the rest of this book, viz. that all Sonnets in this opus are an Ode in/to Al Gorithm Julia.

So a first side distraction for me is what inspired this choice of Julia. I remember MATLAB being good (sort of) at array indexing and manipulation; Python at dynamic runtime linking of object methods; and perhaps R for its statistical compute framework. So there, Julia is seemingly their optimal choice of the moment at Stanford. Perhaps someone there has started a countdown for MATLAB as closed source?

So, is Julia ready to assume MATLAB's mantle in lab instrumentation proper? It would certainly raise the pace, and hopefully the yield, of instrument-driven scientific exploration. But the converse argument of expiring annual seat licenses should already be prodding lax experimenters forwards too (wouldn't you think?).

PART I: Probabilistic Reasoning

Chapter 2 Representation - begins with "Plausibility" Relations, where concave, arrowhead-like curved symbols ≺ and ≻ denote less and more plausible propositions on the basis of a probability distribution. [No mention is made of a possible alternative convex, umbrella- or parachute-shaped inequality symbol; but that is for readers' speculation.]

Short work is made of Discrete Probability Distributions. This portion relies inordinately on refined Mathematics and does not give a reader new to the subject sufficient context on the distinction between a Distribution and, say, its Density; but it is true and accurate, with ample references for clarifying further reading. [My warning, in hindsight, is for the reader to keep referring to the Appendices for telltale keywords around one's comprehension problem areas.]

Next, Continuous Distributions provide a more colorful discussion of the types of distribution and enrich the reader who has breached that initial discrete leap of faith.

It also conceptually facilitates Joint Distributions (of multiple variables), the Marginal Distribution of one of them (as a Projection, as the others are summed to exhaustion), and then Discrete Joint Distributions, which provide a basis for the rest of this book of game choices thus far. A first title-giving Algorithm (2.1) also appears at this point, showing the savings to be had from Independent Probability Distributions.
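The marginalization-as-projection idea, and the storage savings independence buys, can be sketched in a few lines. The book's own code is in Julia; this is a minimal Python sketch with a made-up 2 x 3 joint table, not an excerpt from the text.

```python
import numpy as np

# Hypothetical discrete joint distribution P(X, Y) over 2 x 3 outcomes.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.15, 0.20]])

# Marginal of X: project onto X by summing Y to exhaustion.
marginal_x = joint.sum(axis=1)

# Marginal of Y likewise.
marginal_y = joint.sum(axis=0)

# If X and Y were independent, storing the two marginals (2 + 3 numbers)
# would suffice in place of the full 2 * 3 table - the kind of saving
# the book's Algorithm 2.1 exploits.
```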

The richness of content expands exponentially in Joint Continuous Distributions, which finally introduce the central Bayes' Rule (2.25) and thereafter Conditional Models around Gaussian processes: Linear, Sigmoid, Deterministic, Bayesian Networks, and Conditional Independence.
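Bayes' Rule itself is compact enough to verify by hand. A minimal Python sketch with illustrative (made-up) numbers, inverting a likelihood into a posterior via the law of total probability:

```python
# Bayes' Rule: P(H | E) = P(E | H) P(H) / P(E), with made-up numbers.
p_h = 0.01           # prior on hypothesis H
p_e_given_h = 0.9    # likelihood of evidence E under H
p_e_given_not_h = 0.05

# Total probability of the evidence.
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Posterior: a strong likelihood ratio still yields a modest posterior
# when the prior is small.
p_h_given_e = p_e_given_h * p_h / p_e
```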

Chapter 3 Inference - to me the content is standard statistics fare, and it leads to the notable "Gibbs Sampling", a 'kind of Markov chain Monte Carlo technique', on multivariate (joint distribution) Gaussian processes where Marginal Distributions feature prominently.
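To make the Gibbs idea concrete: each coordinate is resampled from its exact conditional given the others, and the resulting Markov chain drifts toward the joint distribution. Below is a Python sketch (not the book's Julia) for a bivariate Gaussian with an assumed correlation of 0.8, whose conditionals are known in closed form:

```python
import numpy as np

# Gibbs sampling sketch for a bivariate standard Gaussian with correlation
# rho: alternate draws from the exact conditionals x|y and y|x.
rng = np.random.default_rng(0)
rho, n = 0.8, 20000
sd = np.sqrt(1 - rho**2)   # conditional standard deviation

x = y = 0.0
samples = []
for _ in range(n):
    x = rng.normal(rho * y, sd)   # x | y ~ N(rho*y, 1 - rho^2)
    y = rng.normal(rho * x, sd)   # y | x ~ N(rho*x, 1 - rho^2)
    samples.append((x, y))
samples = np.array(samples)

# The empirical correlation of the chain should approach rho.
emp_rho = np.corrcoef(samples[:, 0], samples[:, 1])[0, 1]
```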

The Summary (3.10) of this chapter informally provides a grand Algorithm of Inference that arguably guides the rest of the book. The Exercises are also meatier, spanning four pages, as they illustrate a very rich tapestry of inference scenarios. Lecturers should love this chapter for its didactic value.

Chapter 4 Parameter Learning regards Maximum Likelihood Estimation of the parameters of a stochastic process (cumbersome for lack of data); Bayesian parameter learning with conditionally independent, uniformly distributed priors; non-parametric 'kernel' specification using analytical (Gaussian) engines with a standard deviation as sampling bandwidth; and the treatment of missing data and the meanings of missingness.
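For the maximum likelihood step, the Gaussian case reduces to closed-form estimators. A Python sketch on made-up data (the book works in Julia), showing the MLE of the mean and of the (biased) variance:

```python
import numpy as np

# Maximum likelihood sketch: for i.i.d. Gaussian data, the MLE of the mean
# is the sample mean and the MLE of the variance is the biased sample
# variance (divide by n, not n-1). Data values are illustrative.
data = np.array([2.0, 4.0, 6.0, 8.0])

mu_mle = data.mean()
var_mle = ((data - mu_mle) ** 2).mean()
```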

Chapter 5 Structure Learning regards problems where the structures assumed in Chapter 4, for example, are not known and have to be estimated themselves. The technique follows the same pattern - "A maximum a-posteriori approach to structure learning involves finding a G that maximizes P ( G | D )" - and then "searching the space of networks for the highest-scoring network".

Chapter 6 Simple Decisions - "decision making from the perspective of utility theory". Preference constraints; Utility Functions - analytical: Quadratic, Logarithmic, Exponential; rationality of the Utility Function; irrationality in measured observed behaviors; non-rationality of agents.

Part II Sequential Problems

Chapter 7 Exact Solution of Markov Decision Processes (MDPs): Policy Evaluation, Lookahead, Convergence, the Q-Function, and finding the maximal Policy.

It outlines nomenclature and terminology which at first seems obtuse but later turns meaningful when cast in contextual problem-solving situations: Action space (A), State space (S), Markov assumption, Stationarity, State Transition model (T), Reward Function (R), Horizon, Utility (U), Discount factor (γ), Returns, Policy (π) - Stochastic, Deterministic, Stationary, Optimal - Optimal value function, Dynamic Programming, Policy Evaluation, Bellman Expectation Equation, Value Function Policy (U[π]), Optimal Q-Function, Convergence, Iteration (U' = R + TU), Bellman backup or update iteration, Bellman Optimality equation, Bellman Residual, Gauss-Seidel iteration, Linear Programming formulation, and Linear Systems with Quadratic Reward - an example of Linear Gaussian dynamics.

A key inflection point in this chapter's narrative is on p. 139 and onwards, covering Value Function Policies: a maximal (greedy) policy,

π(s) = argmax_a [ R(s, a) + γ Σ_{s′} T(s′ | s, a) U(s′) ]
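The greedy extraction can be written directly from that formula. A Python sketch (the book's code is Julia) on a toy, made-up 2-state, 2-action MDP where action 0 stays put and action 1 swaps states:

```python
import numpy as np

def greedy_policy(T, R, U, gamma=0.9):
    """pi(s) = argmax_a [ R(s,a) + gamma * sum_s' T(s'|s,a) U(s') ]."""
    # T has shape (A, S, S'), R shape (S, A), U shape (S,).
    Q = R + gamma * np.einsum("asn,n->sa", T, U)   # Q(s, a)
    return Q.argmax(axis=1)

# Toy MDP: action 0 stays, action 1 swaps; state 1 holds all the value.
T = np.array([[[1.0, 0.0], [0.0, 1.0]],    # action 0: stay
              [[0.0, 1.0], [1.0, 0.0]]])   # action 1: swap
R = np.zeros((2, 2))
U = np.array([0.0, 10.0])

pi = greedy_policy(T, R, U)   # head to state 1, then stay there
```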

Chapter 8 Approximations - an interesting breakdown across Parametric methods, Nearest Neighbor, Kernel Smoothing, Linear Interpolation, Simplex Interpolation, Linear Regression, and Neural Network Regression - where the Basis Functions required by Linear Regression are provided via Depth/Width parameter dimensions. The rest of this Part II on Sequential Problems is better enumerated in reverse:

Chapter 14 Policy Validation - assuming one has been arrived at; using

Chapter 13 Actor Critic Methods - with policy reference (critic) evaluation; when

Chapter 12 Policy Gradient Optimization - arriving at some Policy result with

Chapter 11 Policy Gradient Estimation - when each can be arrived at along the way

Chapter 10 Policy Search - so long as there is a criterion to search by during

Chapter 9 Online Planning - the systematic 'groping about' for a best decision when there's not enough time for any certainty at each decision point, as required in

Chapter 8 Approximation nor in more luxurious circumstances of say:

Chapter 7 Exact Solutions with known parameters.

Which is all well and good so long as there is no uncertainty. Instead we get

Part III Model Uncertainty

This part revisits the Sequential problem solutions, where Reward and Transition are assumed well known; when that is not the case, there is the need for Reinforcement Learning.

Chapter 15 Exploration and Exploitation assesses the stand-off between the gambler's challenge and the wagerer's decision options.

Chapter 16 Model-Based Methods outlines how a player may 'count the cards' being dealt and the payoffs made, to refresh the next prospects.

Chapter 17 Model-Free Methods attempts the same without the benefit of 'structured card decks'.

Chapter 18 Imitation Learning adopts auxiliary Heuristics to overcome intrinsic optimum-search limitations, almost in the sense of simulated learning. A final development in the chapter summary says: "Generative adversarial imitation learning iteratively optimizes a discriminator and a policy; the discriminator tries to discriminate between decisions made by the policy and decisions made by the expert, and the policy attempts to deceive the discriminator". At times the context invokes human-like actors in a zero-sum contest.

Part IV State Uncertainty

This part seems to provide more redeeming titles:

Chapter 19 Beliefs - covers perhaps what I could characterise as 'Guessing Games', and the Kalman Filter is raised as the preferred guessing-game strawman. Extended and Unscented variants are also described. An interesting point is made that when the underlying model is unknown, the Kalman computational procedure relies upon the assumption that the Transition (T) and Observation (O) models are linear Gaussian and that the Belief is Gaussian.
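Under exactly those linear-Gaussian assumptions, one belief update is a two-line predict/update cycle. A scalar random-walk Python sketch with illustrative noise values (the book's implementations are in Julia, and in full matrix form):

```python
# One predict/update cycle of a scalar linear-Gaussian Kalman filter:
# the belief over the state stays Gaussian, so only its mean and
# variance need updating. Noise variances q, r are made-up.
def kalman_step(mu, var, z, q=0.1, r=0.5):
    # Predict: x' = x + w, with process noise w ~ N(0, q).
    mu_pred, var_pred = mu, var + q
    # Update with observation z = x + v, measurement noise v ~ N(0, r).
    k = var_pred / (var_pred + r)        # Kalman gain
    mu_new = mu_pred + k * (z - mu_pred)
    var_new = (1 - k) * var_pred         # variance shrinks after observing
    return mu_new, var_new
```

Note how the posterior variance never depends on the observed value z, only on the noise model, which is what keeps the belief update tractable.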

This is unlike the more familiar Engineering-context Kalman Filter use cases, where no stationarity assumptions are invoked, precisely because the processes are non-stationary. The chapter ends with "Partially Observable Markov Decision Processes (POMDP) which extend MDPs to include state uncertainty", where "uncertainty requires agents in a POMDP to maintain a belief over their state". Section 20.6 on Linear Policies clarifies this seemingly confusing point, where exact solutions are obtained entirely 'offline'.

As in the above part on Sequential Problems the remainder of Part IV is better enumerated in reverse order starting with:

Chapter 23 Controller Abstractions - the device for maintaining internal state - provides algorithms to construct controllers using policy iteration, nonlinear programming, and gradient ascent, in a manner that improves Online Planning as in

Chapter 22 Online Belief State Planning, from a current belief state, using Lookahead with Rollouts, Forward Search, Branch and Bound or Sparse Sampling - Monte Carlo Tree Search, Determinized Sparse Tree Search and Gap Heuristic Searches; unless of course there is adequate information to be better informed, as in

Chapter 21 Offline Belief State Planning, using observations to inform iteration steps for Fully Observable states and Bounded states - Informed bound or Range bound, Point Discrete, Heuristic and Triangulated; unless exact beliefs are available, as in

Chapter 20 Exact Belief State Planning, where Belief is considered a State and each iteration a Conditional Plan based on the utility of beliefs, factored on the state-variable Alpha-vector of utilities composed with the (dual) Belief vector.

In the Alpha-vector section (20.4), Pruning is discussed, but there is a break in the narrative, with a dangling sentence at the end of p. 412 - "We can check whether an alpha vector α is dominated by" - followed by a sudden appraisal of dominating Alpha vectors on p. 413. The process, though, iterates: first a one-step determination of Alpha vectors, dropping non-dominating ones before proceeding to the next step, and continuing in like fashion until the iteration horizon is reached. Dominating alpha vectors are those that result in maximal utility, U(b) = max_α α⊤b, i.e. the maximal α·b at each step.
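The max-of-dot-products structure is easy to sketch. A Python illustration with made-up alpha vectors over a two-state belief simplex, where the third vector never wins anywhere and is therefore prunable:

```python
import numpy as np

# Alpha-vector sketch: the utility of a belief b is the maximal dot
# product over the alpha-vector set, U(b) = max_alpha (alpha . b).
# A vector that never attains the maximum for any belief is dominated
# and may be pruned. All numbers are illustrative.
alphas = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [0.4, 0.4]])   # dominated everywhere on the simplex

def utility(b):
    """U(b) = max over alpha vectors of alpha . b."""
    return float((alphas @ b).max())

def best_alpha(b):
    """Index of the maximizing alpha vector at belief b."""
    return int((alphas @ b).argmax())
```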

Part V Multiagent Systems

By this point a similar pattern of treatment of the subject has emerged, with:

Chapter 24 Multiagent Reasoning - Simple Games for the most part, with Game Equilibria: Dominant Strategy, Nash, and Correlated; plus shortcut algorithms to the last of these via Iterated Best Response approaches, which do not always arrive at a Nash Equilibrium, or end in endless Cycles. Games can also be structured variously: Behavioral, Fictitious Play, and Utility-function Gradient Ascent.

Chapter 25 Sequential Problems runs through the same gamut of scenarios, but specific to Markov games. Of note is that with Gradient Ascent an optimal stochastic policy may be arrived at without having to assume a model structure.

Chapter 26 State Uncertainty here expounds the further features of Partially Observable Markov Games (POMGs) as Multiagent POMDPs.

Chapter 27 Collaborative Agents is a final chapter that attempts to classify the many Multiagent problem types covered in this book: initially categorized as Decentralised, then sub-classified by algorithm and structure, such as Factored, Transition Independent, Observation Independent, Reward Independent, Networked; by Cardinality of Agent numbers; by level of Observability; by Communication; and by Algorithm Model (the model being any of MDP, POMDP, MMDP, Dec-MDP, MPOMDP and Dec-POMDP).

The Appendices

The Appendices are also quite informative. In addition to

Appendix G on Julia, there is

Appendix F Problems

Appendix E Search Algorithms

Appendix D Neural Representations

Appendix C Computational Complexity

Appendix B Probability Density, and

Appendix A Mathematical Concepts

All the Appendices actually qualify as Chapters, crucial ones even, that serve the text well. I am sure the Authors were constrained by book publication custom to ferret them off to one side as non-body text; but they could serve the cause even more as annotation for context inside the body proper, to eliminate any doubt that might arise therein. I personally paused a few times to connect the rapid-fire body-text narrative, not knowing that the Appendices had the dots already connected for me.

I think that if one were to balance the content between Body Chapters and Appendices, the weighing scale might rest nearly level, with a lean leftwards for the practically minded and rightwards for the analytically inclined.

At this stage I decided that this reader-type metaphor might be elaborated further by trying to imagine the kind of reader that may benefit the most from this work. But that's already been said, in reverse order.
