Safe Learning in Robotics#
Robot learning aims to enable autonomous operation in complex, uncertain environments.
Challenges include partial knowledge of dynamics, sensors, and other agents.
Safety guarantees are crucial but difficult with partial knowledge.
Control theory uses models to provide guarantees.
Reinforcement learning is data-driven and adaptable, but it typically lacks safety guarantees.
Combining model- and data-driven approaches leverages their complementary strengths.
Key directions are:
Robustness against worst-case scenarios.
Adaptation by learning from observations.
Leveraging models from domain knowledge and data.
Control theory provides the foundation for safety-critical applications.
Safe RL research has grown rapidly.
Simulation enables RL progress but transferring to real robots remains challenging.
Fig. 24 A comparison of model-driven, data-driven, and combined approaches. Taken from [BGH+22].#
The safe learning control problem is formulated as an optimization problem with three main components:
System model describing robot dynamics.
Cost function defining the control objective.
Constraints specifying safety requirements.
The goal is to find a policy fulfilling the task under the safety constraints.
Any of the three components may initially be unknown or only partially known.
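In symbols, a generic sketch of this optimization problem (assuming discrete-time dynamics \(x_{k+1} = f_k(x_k, u_k, w_k)\) with disturbance \(w_k\), a stage cost \(l\), and a policy \(\pi\); the formulation in [BGH+22] is more general) reads:

\[
\begin{aligned}
\min_{\pi} \quad & J(\pi) = E\left[ \sum\limits_{k=0}^{N-1} l(x_k, u_k) \right] \\
\text{s.t.} \quad & x_{k+1} = f_k(x_k, u_k, w_k), \quad u_k = \pi(x_k), \\
& c_k^j(x_k, u_k, w_k) \le 0 \quad \text{(or a relaxed version, see the safety levels below)},
\end{aligned}
\]

for all times \(k \in \{0, \dots, N\}\) and constraint indexes \(j \in \{1, \dots, n_c\}\).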
Fig. 25 Block diagram representing safe learning control approaches. Taken from [BGH+22].#
Safety Constraints#
Fig. 26 Illustration of Safety Levels. Taken from [BGH+22].#
Safety level III: constraint satisfaction guaranteed.#
The system satisfies hard constraints:

\[ c_k^j(x_k, u_k, w_k) \le 0, \]

for all times \(k \in \{0, \dots , N\}\) and constraint indexes \(j \in \{1, \dots, n_c\}\).
Safety level II: constraint satisfaction with high probability.#
The system satisfies probabilistic constraints:

\[ P\left[ c_k^j(x_k, u_k, w_k) \le 0 \right] \ge p^j, \]

where \(P[\cdot]\) denotes the probability and \(p^j \in (0, 1)\) defines the likelihood of the jth constraint being satisfied, for all times \(k \in \{0, \dots , N\}\) and constraint indexes \(j \in \{1, \dots, n_c\}\).
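In practice, such chance constraints are often checked empirically. Below is a minimal, hypothetical Python sketch (not from [BGH+22]) that estimates the satisfaction probability of a single constraint at one time step by sampling the disturbance \(w_k\):

```python
# Minimal sketch: Monte Carlo estimate of P[c(x_k, u_k, w_k) <= 0].
# The constraint, state, input, and noise model below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def constraint(x, u, w):
    # Hypothetical constraint: keep the disturbed position below 1.0,
    # i.e. c(x, u, w) = x + w - 1.0 <= 0.
    return x + w - 1.0

x_k, u_k = 0.8, 0.0                             # current state and input
w_samples = rng.normal(0.0, 0.1, size=10_000)   # sampled disturbances w_k
p_hat = np.mean(constraint(x_k, u_k, w_samples) <= 0.0)
print(f"Estimated satisfaction probability: {p_hat:.3f}")  # compare against p^j
```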
Safety level I: constraint satisfaction encouraged#
The system encourages constraint satisfaction. This can be achieved in different ways:
One way is to add a penalty term to the objective function that discourages the violation of constraints with a high cost. A non-negative \(\epsilon_j\) is added to the right-hand side of the inequality in Safety level III, for all times \(k \in \{0, \dots , N\}\) and constraint indexes \(j \in \{1, \dots, n_c\}\):
\[ c_k^j(x_k, u_k, w_k) \le \epsilon_j, \]

and an appropriate penalty term \(l(\epsilon) \ge 0\), with \(l(\epsilon) = 0 \iff \epsilon = 0\), is added to the objective function. The vector \(\epsilon\) includes all elements \(\epsilon_j\) and is an additional variable of the optimization problem.
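As an illustration, here is a minimal Python sketch of this soft-constraint mechanism (all names and weights are hypothetical; the slack \(\epsilon_j\) is collapsed to its optimal value \(\max(0, c_k^j)\) and penalized with a linear \(l(\epsilon)\)):

```python
# Minimal sketch of a safety level I penalty: each slack eps_j = max(0, c_j)
# measures the violation of the j-th constraint, and l(eps) = weight * sum(eps)
# is zero if and only if all constraints are satisfied.
import numpy as np

def penalized_cost(task_cost, constraint_values, weight=100.0):
    eps = np.maximum(0.0, np.asarray(constraint_values))  # slack per constraint
    return task_cost + weight * eps.sum()                 # objective + l(eps)

print(penalized_cost(task_cost=2.5, constraint_values=[-0.3, 0.1]))  # 2.5 + 100 * 0.1
```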
Another way is to provide guarantees on the expected value of the constraint cost, but only at the trajectory level:
\[ J_{c^j} = E\left[ \sum\limits_{k=0}^{N-1} c_k^j(x_k, u_k, w_k) \right] \le d_j, \]where \(J_{c^j}\) represents the expected total constraint cost, and \(d_j\) defines the constraint threshold.
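This trajectory-level form corresponds to the constrained-MDP setting commonly used in safe RL, and \(J_{c^j}\) is typically estimated from sampled rollouts. A minimal, hypothetical Python sketch (dynamics, policy, and constraint cost are placeholders):

```python
# Minimal sketch: Monte Carlo estimate of the expected total constraint cost
# J_c = E[sum_k c_k(x_k, u_k, w_k)], compared against the budget d_j.
import numpy as np

rng = np.random.default_rng(0)
N, n_rollouts, d_j = 50, 200, 1.0

def step(x, u, w):               # hypothetical scalar dynamics
    return 0.9 * x + u + w

def policy(x):                   # hypothetical policy
    return -0.5 * x

def constraint_cost(x, u, w):    # hypothetical per-step constraint cost
    return max(0.0, x - 1.0)

totals = []
for _ in range(n_rollouts):
    x, total = 1.5, 0.0
    for _ in range(N):
        u = policy(x)
        w = rng.normal(0.0, 0.05)
        total += constraint_cost(x, u, w)
        x = step(x, u, w)
    totals.append(total)

print(f"Estimated J_c = {np.mean(totals):.3f} (budget d_j = {d_j})")
```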
Safe Learning Control Approaches#
Learning uncertain dynamics to safely improve performance#
These works rely on an a priori model of the robot dynamics. The robot’s performance is improved by learning the uncertain dynamics from data. Safety is typically guaranteed based on standard control-theoretic frameworks, achieving safety level II or III.
Encouraging safety and robustness in RL#
These works encompass approaches that usually have no a priori knowledge of the robot model or the safety constraints. Rather than providing hard safety guarantees, these approaches encourage safe robot operation (safety level I), for example, by penalizing dangerous actions.
Certifying learning-based control under dynamics uncertainty#
These works aim to provide safety certificates for learning-based controllers that do not inherently consider safety constraints. Such approaches constrain the control policy, leverage a known safe backup controller, or modify the controller’s output directly to achieve stability and/or constraint satisfaction. They typically achieve safety level II or III.
Model Predictive Safety Filter#
General learning-based control, in particular reinforcement learning, has shown great success in solving complex and high-dimensional control tasks.
However, most techniques cannot ensure that safety constraints arising from physical limitations are satisfied, particularly during the learning iterations.
To address this limitation, safety frameworks rooted in control theory have emerged.
MPC techniques can be used to build such safety filters, turning a safety-critical dynamical system into an inherently safe system to which any learning-based controller, even one without safety certificates, can be applied out of the box.
Fig. 27 Based on the current state \(x\), a learning-based controller provides an input \(u_L = \pi_L(x) \in \mathbb{R}^m\), which is processed by the safety filter \(u = \pi_S(x, u_L)\) and applied to the real system. Taken from [HWMZ20].#
The idea is to let learning-based control methods address the solution of the underlying stochastic optimal control problem, while safety is enforced separately by the filter.
The proposed learning-based control input \(u_L(k)\) at time \(k\) is then verified in terms of safety by computing a safe backup trajectory from the one-step predicted state \(x_{1|k}\) to a safe terminal set \(X_f\), or by modifying \(u_L(k)\) as little as possible while still admitting such a safe backup trajectory.
The optimization problem necessary for validating safety of the input is computationally cheaper than a direct optimization of the task and can often be carried out over a reasonably short horizon.
The model predictive safety filter \(\pi_S\) is realized through an MPC-like optimization problem of the form:
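A sketch of its structure, assuming a nominal model \(f\), state and input constraint sets \(\mathbb{X}\) and \(\mathbb{U}\), and a safe terminal set \(X_f\) (the robust formulation in [HWMZ20] additionally tightens the constraints to account for model uncertainty):

\[
\begin{aligned}
\min_{u_{0|k}, \dots, u_{N-1|k}} \quad & \left\| u_L(k) - u_{0|k} \right\| \\
\text{s.t.} \quad & x_{0|k} = x(k), \quad x_{i+1|k} = f(x_{i|k}, u_{i|k}), \\
& x_{i|k} \in \mathbb{X}, \quad u_{i|k} \in \mathbb{U}, \quad i = 0, \dots, N-1, \\
& x_{N|k} \in X_f,
\end{aligned}
\]

and the filter applies the first element of the optimal backup input sequence, \(u = \pi_S(x(k), u_L(k)) = u^*_{0|k}\). If \(u_L(k)\) already admits a safe backup trajectory, it passes through (almost) unchanged.

The following Python sketch illustrates the idea for a hypothetical linear double-integrator with box constraints, using the convex optimization library cvxpy (system matrices, sets, and horizon are made-up placeholders, not the formulation from [HWMZ20]):

```python
# Minimal predictive safety filter sketch: keep u close to the learning-based
# input u_L while guaranteeing a feasible backup trajectory into a terminal set.
import cvxpy as cp
import numpy as np

# Hypothetical double-integrator dynamics, state x = [position, velocity]
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
N = 20  # filter horizon

def safety_filter(x0, u_learning):
    x = cp.Variable((2, N + 1))
    u = cp.Variable((1, N))
    constraints = [x[:, 0] == x0]
    for i in range(N):
        constraints += [x[:, i + 1] == A @ x[:, i] + B @ u[:, i]]
        constraints += [cp.abs(x[0, i]) <= 1.0,   # state constraint |position| <= 1
                        cp.abs(u[0, i]) <= 0.5]   # input constraint |u| <= 0.5
    # Hypothetical safe terminal set: inside the box with (near-)zero velocity
    constraints += [cp.abs(x[0, N]) <= 1.0, cp.abs(x[1, N]) <= 1e-3]
    objective = cp.Minimize(cp.sum_squares(u[:, 0] - u_learning))
    cp.Problem(objective, constraints).solve()
    return u[:, 0].value  # certified input applied to the real system

print(safety_filter(np.array([0.9, 0.2]), np.array([0.4])))
```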
Fig. 28 Summary of safe learning control approaches. Taken from [BGH+22].#