training_classical_control.inverted_pendulum

Original code taken from: Farama-Foundation/Gymnasium

MIT License: Farama-Foundation/Gymnasium

Module Contents

Classes

InvertedPendulumEnv

Inverted pendulum (cart-pole) environment with configurable parameters and a continuous action space.

Data

__all__

logger

API

training_classical_control.inverted_pendulum.__all__

['InvertedPendulumEnv']

training_classical_control.inverted_pendulum.logger

'getLogger(…)'

class training_classical_control.inverted_pendulum.InvertedPendulumEnv(render_mode: Optional[str] = None, *, masspole: float = 0.1, masscart: float = 1.0, length: float = 1.0, x_threshold: float = 3, theta_threshold: float = 24, force_max: float = 30.0)

Bases: gymnasium.envs.classic_control.cartpole.CartPoleEnv

Description

The inverted pendulum is a classic problem in control theory. The system consists of a pole attached at one end to a cart, with the other end free. The pole can rotate about its fixed point, and the cart can move horizontally. By default the pole starts in a near-upright position with small random perturbations, and the goal is to move the cart so as to keep the pole upright.

Note: This environment is a modified version of the CartPole environment.
It allows configuring most relevant parameters of the system (e.g. cart mass,
pole mass, pole length), and it uses a continuous action space instead of
a discrete one.
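Because the class subclasses Gymnasium's `CartPoleEnv`, one plausible reading is that it reuses CartPole's equations of motion with the configured masses and pole length, driven by a continuous force instead of a fixed-magnitude push. The sketch below shows those dynamics as one explicit-Euler step in plain Python. It is an assumption, not this module's code: `CartPoleParams` and `cartpole_step` are hypothetical names, and the gravity (9.8), time step (0.02 s) and the use of `length` as CartPole's half-length term are taken from Gymnasium's CartPole rather than from this page.

```python
import math
from dataclasses import dataclass
from typing import Tuple

State = Tuple[float, float, float, float]  # (x, x_dot, theta, theta_dot)


@dataclass(frozen=True)
class CartPoleParams:
    masspole: float = 0.1  # mirrors the documented constructor default
    masscart: float = 1.0  # mirrors the documented constructor default
    length: float = 1.0    # assumption: used as CartPole's half-length term
    gravity: float = 9.8   # assumption: Gymnasium CartPole constant
    tau: float = 0.02      # assumption: Gymnasium CartPole time step in seconds


def cartpole_step(state: State, force: float, p: CartPoleParams = CartPoleParams()) -> State:
    """One explicit-Euler step of the CartPole equations under a continuous force."""
    x, x_dot, theta, theta_dot = state
    total_mass = p.masspole + p.masscart
    polemass_length = p.masspole * p.length
    costheta, sintheta = math.cos(theta), math.sin(theta)

    # Angular and linear accelerations from the CartPole equations of motion.
    temp = (force + polemass_length * theta_dot**2 * sintheta) / total_mass
    thetaacc = (p.gravity * sintheta - costheta * temp) / (
        p.length * (4.0 / 3.0 - p.masspole * costheta**2 / total_mass)
    )
    xacc = temp - polemass_length * thetaacc * costheta / total_mass

    # Positions advance with the old velocities; velocities with the new accelerations.
    return (
        x + p.tau * x_dot,
        x_dot + p.tau * xacc,
        theta + p.tau * theta_dot,
        theta_dot + p.tau * thetaacc,
    )
```

Pushing the cart to the right from rest (`cartpole_step((0.0, 0.0, 0.0, 0.0), 10.0)`) gives the cart a positive velocity and the pole a negative angular velocity, i.e. the pole starts to tip the opposite way, which is what makes the balancing problem non-trivial.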

Action Space

The action is an ndarray with shape (1,) representing the force applied to the cart.

+-----+---------------------------+-------------+-------------+
| Num | Action                    | Control Min | Control Max |
+=====+===========================+=============+=============+
| 0   | Force applied on the cart | -10         | 10          |
+-----+---------------------------+-------------+-------------+
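The one-element action array has to become a scalar force before it enters the dynamics. A minimal sketch, assuming out-of-range actions are clipped to the bounds in the table above; this page does not say whether the class clips, rejects or rescales them, nor whether its bounds follow `force_max` (default 30.0) rather than the ±10 listed. `action_to_force` is a hypothetical helper.

```python
from typing import Sequence


def action_to_force(action: Sequence[float], low: float = -10.0, high: float = 10.0) -> float:
    """Convert a shape-(1,) action into a scalar force clipped to [low, high]."""
    if len(action) != 1:
        raise ValueError(f"expected an action of shape (1,), got length {len(action)}")
    force = float(action[0])
    # Assumption: clip out-of-range values instead of rejecting them.
    return min(max(force, low), high)
```

Because it only indexes and measures length, the helper accepts a list or a NumPy array alike.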

Observation Space

The observation is an ndarray with shape (4,) whose elements correspond to the following:

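+-----+-----------------------------------------------+------+-----+
| Num | Observation                                   | Min  | Max |
+=====+===============================================+======+=====+
| 0   | position of the cart along the linear surface | -3   | 3   |
+-----+-----------------------------------------------+------+-----+
| 1   | linear velocity of the cart                   | -Inf | Inf |
+-----+-----------------------------------------------+------+-----+
| 2   | vertical angle of the pole on the cart        | -24  | 24  |
+-----+-----------------------------------------------+------+-----+
| 3   | angular velocity of the pole on the cart      | -Inf | Inf |
+-----+-----------------------------------------------+------+-----+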
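To make the table concrete, the sketch below gives the four observation components names and checks a vector against the listed bounds. The field names and helpers are hypothetical; the bounds are copied from the table, and the angle is compared in whatever unit the table uses, since this page does not state it.

```python
import math
from typing import NamedTuple, Sequence


class Observation(NamedTuple):
    cart_position: float
    cart_velocity: float
    pole_angle: float
    pole_angular_velocity: float


# (low, high) for each field, in table order.
OBS_BOUNDS = (
    (-3.0, 3.0),
    (-math.inf, math.inf),
    (-24.0, 24.0),
    (-math.inf, math.inf),
)


def parse_observation(obs: Sequence[float]) -> Observation:
    """Name the components of a shape-(4,) observation."""
    if len(obs) != 4:
        raise ValueError(f"expected 4 values, got {len(obs)}")
    return Observation(*(float(v) for v in obs))


def within_bounds(obs: Observation) -> bool:
    # NaN fails every comparison, so a NaN entry is reported as out of bounds.
    return all(lo <= v <= hi for v, (lo, hi) in zip(obs, OBS_BOUNDS))
```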

Rewards

The goal is to keep the inverted pendulum upright (within the angle limit set by `theta_threshold`) for as long as possible, so a reward of +1 is awarded for each timestep the pole remains upright.

Starting State

All observations start in state (0.0, 0.0, 0.0, 0.0), with uniform noise in the range [-0.01, 0.01] added to each value for stochasticity.
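The starting-state rule amounts to four independent uniform draws around zero. This sketch uses Python's `random.Random`, seeded for reproducibility, as a stand-in for whatever generator the class actually uses (Gymnasium environments typically draw from a NumPy generator); `sample_initial_state` is a hypothetical name.

```python
import random
from typing import Optional, Tuple


def sample_initial_state(seed: Optional[int] = None, noise: float = 0.01) -> Tuple[float, ...]:
    """Start from (0, 0, 0, 0) and add independent U(-noise, noise) to each component."""
    rng = random.Random(seed)  # same seed -> same initial state
    return tuple(0.0 + rng.uniform(-noise, noise) for _ in range(4))
```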

Episode End

The episode ends when any of the following happens:

  1. Termination: Any of the state space values is no longer finite.

  2. Termination: The absolute value of the vertical angle between the pole and the cart is greater than a threshold value (`theta_threshold`, which defaults to 24 degrees).
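The two termination conditions translate directly into a predicate. A minimal sketch, assuming the state is ordered as in the observation table and that the pole angle is stored in radians while `theta_threshold` is given in degrees (as the 24-degree default suggests); `is_terminated` is a hypothetical helper.

```python
import math
from typing import Sequence


def is_terminated(state: Sequence[float], theta_threshold_deg: float = 24.0) -> bool:
    """Return True if either listed termination condition holds."""
    # Condition 1: some state value is NaN or infinite.
    if not all(math.isfinite(v) for v in state):
        return True
    # Condition 2: |pole angle| exceeds the threshold.
    # Assumption: the angle is in radians, so the degree threshold is converted.
    return abs(state[2]) > math.radians(theta_threshold_deg)
```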

:param masspole: mass of the pole.
:param masscart: mass of the cart.
:param length: length of the pole.
:param x_threshold: threshold for cart position.
:param theta_threshold: threshold for pole angle, in degrees.
:param force_max: maximum absolute value of the force applied to the cart.


step(action: float) → tuple[numpy.typing.NDArray, float, bool, bool, dict]

reset(*, seed: Optional[int] = None, options: Optional[dict] = None) → tuple[numpy.typing.NDArray, dict]