Sparsity Methods for Systems and Control: Book Summary and Review
Sid Ahmed Attia
Utilities & Energy - Principal Engineer and Industry Manager @ MathWorks | Control and Optimization | Energy Systems | System Architectures | Power Systems Dynamics and Economics
The book Sparsity Methods for Systems and Control by Masaaki Nagahara is written for undergraduate students in electrical engineering, computer science, systems, and mathematics. The book is also helpful as a quick introduction to sparsity for more advanced students, researchers and industry practitioners. The book is easy to read, and I personally found myself quickly absorbed by its content.
The theory is well explained, and the examples are reproducible using the Matlab code included in the text. The book contains many useful references for researchers interested in diving deeper into the theory and its implications. Overall, the book is highly recommended for anyone curious about sparsity; the mathematics is kept at an undergraduate level.
An excellent point: the book is open access and can be freely downloaded.
Sparsity Methods for Systems and Control
By Masaaki Nagahara, The University of Kitakyushu, Japan
Hard copies can be ordered for $100.
The author structures the book into two parts. The first part focuses on the sparse representation of vectors, while the second focuses on sparsity methods for optimal control. The book contains ten chapters overall.
The topic is highly relevant for engineers in industry. Most engineering signals, once seen in the appropriate domain, are sparse; for example, vibration signals are usually sparse in the Fourier domain. Understanding the theory and its applications opens new areas in data compression (terabytes of data flow through SCADA systems daily) and control over networks.
Chapter 1: Introduction
Chapter 1 introduces and motivates the study of sparsity. The chapter discusses, in particular, the principle of parsimony, which William Kneale summarises as
It is futile to do with more things that which can be done with fewer.
As further motivation, the author introduces and formulates the group testing problem as a binary optimisation problem and briefly introduces the non-homogeneous l0 norm. Chapter 1 continues by introducing the l1 norm and additional motivational examples, including signal reconstruction, impulse response estimation in geophysics, overfitting in neural networks, LASSO (Least Absolute Shrinkage and Selection Operator) regularisation in regression and classification problems, and compressed sensing. The author adds further examples from control theory: the L1 fuel-optimal control problem, the L0 maximum hands-off control problem, discrete-valued control, specific optimization problems arising in H-infinity control, and networked control systems.
I find the author's selection of examples exciting and rich enough to keep the reader focused.
Chapter 2: What is Sparsity?
The author dedicates the whole of Chapter 2 to defining sparsity, an essential concept in many areas with strong links to volume in high-dimensional spaces (Big Data). The author focuses on vectors and their representations, where sparsity is measured by the non-homogeneous l0 norm of a vector, i.e. the number of its non-zero components. The author introduces concepts such as atom and dictionary from the sparse coding literature and thoroughly explains the differences between the norms involved. The author formulates, in a highly pedagogical way, the main problem, called sparse representation: find the vector with the least number of non-zero components subject to a set of linear equalities (the matrix appearing below is the dictionary, and its column vectors the atoms).
\min_{x} \|x\|_0 \quad \text{subject to} \quad Ax = b \qquad (1)
Despite its simple statement, the problem above is combinatorial. A solution can be obtained by brute force, testing all possible supports until one satisfying the constraints is found; the problem is NP-hard.
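To make the combinatorial nature concrete, here is a minimal sketch (mine, not from the book) of the brute-force approach to (1): enumerate candidate supports of growing size and solve a small least-squares problem on each until the constraints are met. The dictionary and measurement below are synthetic.

```python
# Brute-force l0 minimisation by support enumeration (exponential cost, illustration only).
import itertools
import numpy as np

def sparsest_by_brute_force(A, b, tol=1e-9):
    """Return the sparsest x with A x = b by testing all supports."""
    n = A.shape[1]
    for k in range(n + 1):                       # supports of size 0, 1, 2, ...
        for support in itertools.combinations(range(n), k):
            cols = list(support)
            x = np.zeros(n)
            if k > 0:
                # Least-squares fit restricted to the candidate support
                x_s, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
                x[cols] = x_s
            if np.linalg.norm(A @ x - b) < tol:  # constraints satisfied?
                return x                         # first hit has minimal l0 "norm"
    return None

# Toy example: a 3x5 dictionary and a measurement generated by a 1-sparse vector
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
b = 2.0 * A[:, 3]
print(sparsest_by_brute_force(A, b))
```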
Chapter 2 concludes with extensions of sparsity to functions and their representations in the Fourier and Haar domains.
Chapter 3: Curve Fitting and Sparse Optimisation
In Chapter 3 and Chapter 4, the author describes the theoretical and algorithmic foundations of optimization for the sparse representation problem (1). Chapter 3 is introductory and focuses on sparsity in Least Squares (LS) and polynomial regression. In (1), the constraints alone may admit many solutions; the l2 problem additionally selects the solution with the least l2 norm of the vector x (the sum of the squares of its elements), while the LS problem minimizes the l2 norm of the error between the two sides of the constraints in (1). The author then introduces important algorithmic techniques such as Ridge regularization (from the shrinkage family of methods; see, e.g. the Python scikit-learn routine from sklearn.linear_model import Ridge). The regularized LS problem is formulated as
\min_{x} \|Ax - b\|_2^2 + \lambda \|x\|_2^2 \qquad (2)
The regression coefficients are forced to shrink by imposing a quadratic penalty on their size. If the regression vector is sparse, or one wants to find the sparsest solution to (1), a regularised formulation of the same form as (2) is natural, with the penalty chosen to promote sparsity. The author introduces LASSO, or l1 regularization, another shrinkage method that is important in regression problems in general (see, e.g. the Python scikit-learn routine from sklearn.linear_model import Lasso):
\min_{x} \|Ax - b\|_2^2 + \lambda \|x\|_1 \qquad (3)
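As a small illustration of the two penalties (synthetic data, not taken from the book), the scikit-learn routines cited above can be run side by side: the ridge penalty of (2) shrinks the coefficients but rarely zeroes them, while the LASSO penalty of (3) produces genuinely sparse coefficient vectors.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20)
x_true[[2, 7, 15]] = [1.5, -2.0, 3.0]        # sparse ground-truth coefficient vector
b = A @ x_true + 0.01 * rng.standard_normal(50)

ridge = Ridge(alpha=1.0).fit(A, b)           # quadratic penalty: shrinks, rarely zeroes
lasso = Lasso(alpha=0.1).fit(A, b)           # l1 penalty: shrinks and zeroes many coefficients

print("non-zeros (ridge):", int(np.sum(np.abs(ridge.coef_) > 1e-6)))
print("non-zeros (lasso):", int(np.sum(np.abs(lasso.coef_) > 1e-6)))
```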
However, when a truly sparse solution is sought, these regularised approximations are not always satisfactory. Problem (1) can instead be better approximated by a convex optimization problem that is amenable to efficient solution. Indeed, under the Restricted Isometry Property, the solutions to the l0 and l1 problems coincide (see A Note on the Complexity of Lp Minimization). The l1 problem (the l1 norm being the sum of the absolute values of the vector elements) can be formulated as follows:
\min_{x} \|x\|_1 \quad \text{subject to} \quad Ax = b \qquad (4)
The l1 problem is convex and serves as a convex approximation of the initial l0 problem; the author explains, in detail and with examples, the important concept of a convex approximation (in fact, the challenge of l0 optimization stems from the fact that the non-homogeneous l0 norm is discontinuous at 0). The author recommends CVX for solving the problem above (see CVX for the Matlab package and CVX Python for the Python package). The CVX package is based on state-of-the-art interior-point methods. In Chapter 4, the author describes additional algorithms that are simpler to code and more suitable for real-time implementation.
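As a hedged illustration of (4), a few lines of CVXPY (the Python counterpart of the CVX package mentioned above) suffice to recover a sparse vector from an underdetermined system; the dictionary and the sparse ground truth below are synthetic.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 30))            # fat dictionary: more atoms than measurements
x_true = np.zeros(30)
x_true[[4, 11, 25]] = [1.0, -2.0, 0.5]
b = A @ x_true

x = cp.Variable(30)
problem = cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == b])
problem.solve()                              # conic / interior-point solver under the hood
print("recovered support:", np.flatnonzero(np.abs(x.value) > 1e-6))
```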
Chapter 4: Algorithms for Convex Optimisation
Chapter 4 is dedicated to algorithms for convex optimization. The author starts by introducing the theory of convex sets and convex functions and clearly explains all the needed material in a pedagogical way, covering convex sets and convex and strongly convex functions, among other definitions. The author then introduces the proximal operator, a pillar of convex optimization whose reach extends far beyond the book's scope. Many examples of proximable functions and algorithms are included. Important algorithms such as Douglas-Rachford and Dykstra splitting (the splitting refers to the fact that the algorithm iterates between two terms; see the Proximal Algorithms Python package), generalized LASSO, and ADMM (Alternating Direction Method of Multipliers) are introduced together with their convergence results. Application-oriented readers may find Chapter 4 particularly interesting because the different algorithms are summarized at the end of each section; the algorithm steps are easy to follow and to code. Matlab code for an image denoising example is provided at the end of the chapter.
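To give a flavour of the proximal machinery (this is my own minimal sketch, not one of the book's listings), a proximal-gradient iteration for the LASSO problem (3) alternates a gradient step on the least-squares term with the soft-thresholding proximal operator of the l1 norm. The step size, data, and iteration count below are assumptions for illustration only.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (element-wise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_proximal_gradient(A, b, lam, n_iter=500):
    """Proximal-gradient (iterative soft-thresholding) iterations for problem (3)."""
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2       # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                 # gradient of the smooth least-squares term
        x = soft_threshold(x - step * grad, step * lam)
    return x

rng = np.random.default_rng(3)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[[3, 30, 77]] = [1.0, -1.5, 2.0]
b = A @ x_true
print(np.flatnonzero(np.abs(lasso_proximal_gradient(A, b, lam=0.1)) > 1e-3))
```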
Chapter 5: Greedy Algorithms
Chapter 5 considers direct solutions to the initial l0 optimization problem; given its combinatorial nature, greedy algorithms are a natural choice. The author first introduces the restricted isometry property and the mutual coherence of a matrix (the maximum cosine between the vectors constituting the matrix), from which a sufficient condition for a solution to be the sparsest is stated. The author then explains the main algorithms, i.e. Matching Pursuit (MP) and Orthogonal Matching Pursuit (OMP), and does a pedagogical job of simplifying and making accessible the specialized literature on the topic (see, e.g. Sparse Lab and Emmanuel Candes's website). Essentially, exploiting the properties of the mutual coherence, the MP family of algorithms builds the sparsest solution iteratively from the residual error, growing the support by one element at each iteration. The author concludes Chapter 5 by introducing thresholding algorithms, which combine ideas from proximal and greedy algorithms, and provides readable Matlab code for the three variants; the algorithm steps are easy to follow and to code.
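Below is a minimal sketch of the standard OMP iteration described in the text (my own code, not the book's Matlab listing): pick the atom most correlated with the residual, then re-fit by least squares on the whole selected support. The dictionary and sparsity level are synthetic assumptions.

```python
import numpy as np

def omp(A, b, k):
    """Recover a k-sparse x with A x ~= b by Orthogonal Matching Pursuit."""
    residual = b.copy()
    support = []
    x = np.zeros(A.shape[1])
    for _ in range(k):
        correlations = A.T @ residual            # inner products of atoms with the residual
        support.append(int(np.argmax(np.abs(correlations))))
        x_s, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
        residual = b - A[:, support] @ x_s       # orthogonalise: refit on the whole support
    x[support] = x_s
    return x

rng = np.random.default_rng(4)
A = rng.standard_normal((20, 60))
A /= np.linalg.norm(A, axis=0)                   # unit-norm atoms, as in the coherence analysis
x_true = np.zeros(60)
x_true[[5, 17, 42]] = [2.0, -1.0, 1.5]
b = A @ x_true
print(np.flatnonzero(np.abs(omp(A, b, k=3)) > 1e-9))
```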
Chapter 6: Applications of Sparse Representation
Chapter 6 is devoted to applications of the concepts described in the previous chapters. The chapter starts with the interpolation problem using splines, where the following optimization is considered:
\min_{y} \sum_{i=1}^{N} \bigl(y(t_i) - d_i\bigr)^2 + \lambda \int \bigl(y''(t)\bigr)^2 \, dt \qquad (5)
where lambda is a smoothing parameter, y is the unknown function, and the (t_i, d_i) are the data points to be fitted. The optimization problem above is infinite-dimensional (the search is over functions) on a particular Sobolev space (square-integrable functions with derivatives up to the second order). The problem is known in the literature to have the cubic spline as an explicit, finite-dimensional, unique minimizer (see, e.g. The Elements of Statistical Learning). The author, following ideas from control-theoretic splines (see Control Splines I and Control Splines II), reformulates the problem as the control of a double-integrator system with a quadratic penalty on the control input (torque or acceleration) and on the output (position) deviation from a prescribed set of locations; think of (5) as finding a trajectory of a robot that must visit a set of locations while minimizing its energy consumption. Using a finite basis, the transformed system reduces to a finite-dimensional problem similar to those studied in the previous chapters; a sketch of such a discretized formulation is given below. The author concludes by introducing the maximum hands-off control problem, discussed further in Chapter 8 through Chapter 10.
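As a hedged illustration of that reformulation (not the author's code, and discretizing directly rather than using the book's basis construction), one can discretize the double integrator and solve the resulting finite-dimensional quadratic problem with CVXPY: minimize the squared deviation of the position from hypothetical waypoints plus lambda times the control energy. The grid, waypoints, and weight below are assumptions.

```python
import cvxpy as cp
import numpy as np

# Double integrator discretised with step h: state = (position, velocity), input = acceleration.
h, N = 0.1, 100
Ad = np.array([[1.0, h], [0.0, 1.0]])
Bd = np.array([[0.5 * h**2], [h]])

# Hypothetical waypoints (time index, desired position) the curve should pass close to.
waypoints = [(10, 1.0), (40, -0.5), (70, 0.8), (99, 0.0)]
lam = 1e-2                                   # smoothing weight on the control energy

x = cp.Variable((2, N + 1))                  # trajectory
u = cp.Variable((1, N))                      # piece-wise constant acceleration
constraints = [x[:, 0:1] == np.zeros((2, 1))]
constraints += [x[:, k + 1:k + 2] == Ad @ x[:, k:k + 1] + Bd @ u[:, k:k + 1] for k in range(N)]

fit = cp.sum_squares(cp.hstack([x[0, k] - d for k, d in waypoints]))  # waypoint deviation
energy = cp.sum_squares(u)                                            # quadratic control penalty
cp.Problem(cp.Minimize(fit + lam * energy), constraints).solve()
print("fitted positions at the waypoints:", [float(x[0, k].value) for k, _ in waypoints])
```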
Part II is dedicated to Sparsity Methods in Optimal Control and is the book's core part for the control system community. The maximum hands-off control chapters are also relevant for researchers.
Chapter 7: Dynamical Systems and Optimal Control
Chapter 7 is an introductory chapter on dynamical systems and optimal control. The author goes through fundamental concepts such as the state-space description and the solution of linear differential equations. Controllability is discussed using both set-theoretic and analytical approaches; the former is popular in the reachability community. The minimum-time control problem, a popular and fundamental problem that consists of steering a system from an initial state to the equilibrium in minimum time using a bounded control input, is discussed in detail using the set-theoretic approach (see Foundations of Optimal Control Theory for the theory). The author then discusses the necessary conditions of optimality, also known as the Minimum Principle (as a gross simplification, think of it as a generalization of the Lagrange-multiplier rule to dynamical systems), formulated in the late 1950s by Lev Pontryagin. The minimum principle is applied to the minimum-time problem for linear systems, followed by useful results on the existence and uniqueness of solutions. The author concludes the chapter with the academic example of a minimum-time problem for a double integrator and shows that the solution is a state feedback of the bang-bang type (the control takes extreme values, a principle that holds for a larger class of input-affine systems).
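A minimal simulation sketch of that bang-bang feedback (the classical switching-curve law for the double integrator, not code from the book) is shown below; the step size and stopping tolerance are assumptions.

```python
# Time-optimal, bang-bang feedback for x1' = x2, x2' = u with |u| <= 1 (forward-Euler simulation).
import numpy as np

def bang_bang_u(x1, x2):
    """Classical feedback: switch on the curve x1 = -0.5 * x2 * |x2|."""
    s = x1 + 0.5 * x2 * abs(x2)                  # signed distance to the switching curve
    return -np.sign(s) if abs(s) > 1e-9 else -np.sign(x2)

x = np.array([2.0, 0.0])                         # initial state: position 2, velocity 0
dt, t = 1e-3, 0.0
while np.linalg.norm(x) > 1e-2 and t < 10.0:     # integrate until (close to) the origin
    u = bang_bang_u(*x)
    x = x + dt * np.array([x[1], u])             # forward-Euler step of the dynamics
    t += dt
print(f"reached the origin in about {t:.2f} s (theory: 2*sqrt(2) ~ 2.83 s)")
```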
Chapter 8: Maximum Hands-Off Control
Chapter 8 is devoted to the L0 optimal control problem, called maximum hands-off control. Similar to its finite-dimensional counterpart (the parameter optimization introduced in the previous chapters), it consists of finding a bounded control input with the least L0 norm that satisfies the dynamics of the linear system describing the object to be controlled and a set of boundary conditions on the states. The L0 problem is stated as follows:
\min_{u} \|u\|_0 \quad \text{subject to} \quad \dot{x}(t) = Ax(t) + Bu(t), \quad x(0) = \xi, \quad x(T) = 0, \quad \|u\|_\infty \le 1 \qquad (6)
The difficulty lies in extending the theory developed in the previous chapters to the case where one deals with functions (control inputs). The non-homogeneous L0 norm of a function is defined as the length of the time interval on which the function is non-zero; e.g. the L0 norm of a nowhere-zero function equals the length of its interval of definition. The hands-off control problem consists of finding the bounded control with the least deviation from the zero function (or from the equilibrium for linearized systems). Under mild conditions, the author proves the existence of the L0 optimal control and its equivalence to the L1 optimal control problem (the L1 norm being the integral of the absolute value of the control over the time interval of interest). The latter is also known as the minimum-fuel problem and is well documented in the literature (see the monumental work of Michael Athans in that area). The author concludes the chapter with an academic example of a rocket control problem.
Chapter 9: Numerical Optimisation by Time Discretization
In Chapter 9, building on the formulation of Chapter 8, the author considers the algorithmic aspects and numerical solution of the L0 control problem. The linear dynamics are discretized, leading to a set of difference equations; the author discusses practical aspects of the discretization, namely avoiding the pathological sampling that might lead to a loss of controllability of the discretized system. The control input is assumed to be piece-wise constant, leading to N unknowns, where N is the number of steps. In contrast to a full discretization approach that treats both the control and the state as unknowns, the dynamics are here embedded in the optimization problem and the only unknowns are the piece-wise constant inputs. The problem then reduces to a finite-dimensional l1 problem (with the piece-wise constant inputs as the unknown parameters) that can be efficiently solved using the algorithms described in Chapter 4 and Chapter 5. The author provides two readable Matlab codes, one using a CVX call and the other an implementation of the ADMM algorithm (see the author's website, Nagahara Masaaki sample code). A sketch of the condensed l1 formulation is given below.
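Here is a hedged sketch of that condensed formulation in Python with CVXPY (the book's codes are in Matlab): discretize a hypothetical system with a zero-order hold, eliminate the states so the terminal condition becomes a linear equality in the inputs, and minimize the l1 norm of the bounded, piece-wise constant input. The system matrices, horizon, and input bound are assumptions.

```python
import cvxpy as cp
import numpy as np
from scipy.linalg import expm

# Hypothetical continuous-time system dx/dt = A x + B u, discretised with a zero-order hold
# over step h (choosing h to avoid the pathological sampling mentioned in the chapter).
A = np.array([[0.0, 1.0], [-1.0, -0.2]])
B = np.array([[0.0], [1.0]])
h, N = 0.1, 80
M = expm(np.block([[A, B], [np.zeros((1, 3))]]) * h)   # standard block-matrix discretisation
Ad, Bd = M[:2, :2], M[:2, 2:3]
x0 = np.array([1.0, 0.0])

# Eliminate the states: the terminal condition x_N = 0 becomes the linear equality Phi u = zeta,
# so the piece-wise constant inputs are the only unknowns, as described in the text.
Phi = np.hstack([np.linalg.matrix_power(Ad, N - 1 - k) @ Bd for k in range(N)])
zeta = -np.linalg.matrix_power(Ad, N) @ x0

u = cp.Variable(N)
cp.Problem(cp.Minimize(cp.norm1(u)), [Phi @ u == zeta, cp.abs(u) <= 1]).solve()
print("fraction of samples where the control is (numerically) off:",
      float(np.mean(np.abs(u.value) < 1e-6)))
```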
Chapter 10: Advanced Topics
Chapter 10 considers advanced and research-oriented topics such as mixed L1/L2 optimal control problems (similar to the elastic-net penalty in the machine-learning literature). Using the Minimum Principle, the author shows that the solution is a saturation composed with a soft-thresholding function and establishes its continuity. The author then discusses the case of discrete-valued control inputs and reformulates it nicely as a weighted L1 problem. Finally, the author explores the maximum hands-off time-optimal problem and gives some hints on future research directions.
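For intuition only, the "saturation composed with soft thresholding" structure is a simple, continuous nonlinearity; the sketch below (with an arbitrary threshold and bound of my choosing) merely evaluates that composition.

```python
import numpy as np

def soft_threshold(v, theta):
    """Dead zone of width 2*theta around zero, linear outside it."""
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def sat(v, umax=1.0):
    """Clip to the admissible control range [-umax, umax]."""
    return np.clip(v, -umax, umax)

v = np.linspace(-4, 4, 9)
print(sat(soft_threshold(v, theta=1.0)))     # dead zone, then linear, then saturated
```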