training_classical_control.inverted_pendulum
Original code taken from: Farama-Foundation/Gymnasium
MIT License: Farama-Foundation/Gymnasium
Module Contents
Classes
Data
API
- training_classical_control.inverted_pendulum.__all__
['InvertedPendulumEnv']
- training_classical_control.inverted_pendulum.logger
'getLogger(…)'
- class training_classical_control.inverted_pendulum.InvertedPendulumEnv(render_mode: Optional[str] = None, *, masspole: float = 0.1, masscart: float = 1.0, length: float = 1.0, x_threshold: float = 3, theta_threshold: float = 24, force_max: float = 30.0)
Bases: gymnasium.envs.classic_control.cartpole.CartPoleEnv
Description
The inverted pendulum problem is a classic problem in control theory. The system consists of a pole attached at one end to a cart, with the other end free. The pole can rotate about its attachment point and the cart can move horizontally. By default the pole starts close to upright, with a small random perturbation, and the goal is to move the cart so that the pole stays upright.
**Note** This environment is a modified version of the CartPole environment. It allows configuring most relevant parameters of the system (e.g. cart mass, pole mass, pole length) and it uses a continuous action space instead of a discrete one.
Action Space
The action is an ndarray with shape (1,) representing the force applied to the cart.

+-----+---------------------------+-------------+-------------+
| Num | Action                    | Control Min | Control Max |
+=====+===========================+=============+=============+
| 0   | Force applied on the cart | -10         | 10          |
+-----+---------------------------+-------------+-------------+
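A bounded continuous action is usually enforced by clipping. The snippet below is a minimal sketch of that idea, not the environment's own implementation, which this page does not show. The helper name `clip_force` is hypothetical, a plain one-element list stands in for the shape (1,) ndarray to keep the example dependency-free, and the bound is passed in explicitly.

```python
def clip_force(action, bound):
    """Clip a one-element action to the symmetric range [-bound, bound].

    `action` is a one-element sequence standing in for an ndarray of
    shape (1,); `bound` is a hypothetical stand-in for the control limit.
    """
    if len(action) != 1:
        raise ValueError("expected exactly one action component")
    force = float(action[0])
    return [max(-bound, min(bound, force))]


# Using the control limits listed in the table above (-10 to 10).
print(clip_force([25.0], 10.0))
print(clip_force([-3.5], 10.0))
```

A force outside the limits is pulled back to the nearest limit, while a force inside the limits passes through unchanged.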
Observation Space
The observation is an ndarray with shape (4,) where the elements correspond to the following:

+-----+-----------------------------------------------+------+-----+
| Num | Observation                                   | Min  | Max |
+=====+===============================================+======+=====+
| 0   | position of the cart along the linear surface | -3   | 3   |
| 1   | linear velocity of the cart                   | -Inf | Inf |
| 2   | vertical angle of the pole on the cart        | -24  | 24  |
| 3   | angular velocity of the pole on the cart      | -Inf | Inf |
+-----+-----------------------------------------------+------+-----+
Rewards
The goal is to keep the inverted pendulum upright (within the angle limit) for as long as possible, so a reward of +1 is awarded for every timestep in which the pole remains upright.
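With a reward of +1 per upright timestep, the undiscounted episode return is simply the number of timesteps the pole stayed upright. The sketch below illustrates that bookkeeping. The function name `episode_return` and the list of per-step booleans are hypothetical, and the sketch assumes the step on which the pole falls is not rewarded, which this page does not state either way.

```python
def episode_return(upright_flags):
    """Sum a +1 reward for each leading timestep in which the pole is upright.

    `upright_flags` is a hypothetical list of booleans, one per timestep.
    Counting stops at the first False, mirroring an episode that ends
    once the pole leaves the allowed angle range.
    """
    total = 0.0
    for upright in upright_flags:
        if not upright:
            break
        total += 1.0
    return total


# Three upright steps, then the pole falls: later flags are ignored.
print(episode_return([True, True, True, False, True]))
```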
Starting State
Every episode starts from the state (0.0, 0.0, 0.0, 0.0), with uniform noise in the range [-0.01, 0.01] added to each value for stochasticity.
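The starting-state rule can be sketched with the standard library alone. The function name `sample_initial_state` and the explicit `rng` argument are assumptions for illustration; the environment itself presumably draws from its own seeded generator.

```python
import random


def sample_initial_state(rng, noise=0.01):
    """Return four state values: 0.0 plus uniform noise in [-noise, noise].

    The order is assumed to follow the observation table: cart position,
    cart velocity, pole angle, pole angular velocity.
    """
    return [0.0 + rng.uniform(-noise, noise) for _ in range(4)]


rng = random.Random(0)  # seeded only to make the sketch reproducible
state = sample_initial_state(rng)
print(state)
```

Because the nominal state is all zeros, adding the noise is equivalent to sampling each component directly from the noise range.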
Episode End
The episode ends when any of the following happens:
Termination: Any of the state space values is no longer finite.
Termination: The absolute value of the vertical angle between the pole and the cart is greater than a threshold value (which defaults to 24 degrees).
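The two termination conditions listed above can be expressed as a small predicate. This is a sketch under stated assumptions rather than the environment's actual code: the name `is_terminated` is hypothetical, and the pole angle is assumed to be expressed in the same unit as the threshold.

```python
import math


def is_terminated(state, theta_threshold=24.0):
    """Check the two documented termination conditions.

    `state` is (cart position, cart velocity, pole angle, angular velocity).
    The angle is assumed to use the same unit as `theta_threshold`.
    """
    # Condition 1: any state value is NaN or infinite.
    if not all(math.isfinite(v) for v in state):
        return True
    # Condition 2: the pole angle exceeds the threshold in magnitude.
    return abs(state[2]) > theta_threshold


print(is_terminated((0.0, 0.0, 5.0, 0.0)))
print(is_terminated((0.0, 0.0, 25.0, 0.0)))
```

The comparison is strict, so an angle exactly equal to the threshold does not end the episode in this sketch.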
:param masspole: mass of the pole.
:param masscart: mass of the cart.
:param length: length of the pole.
:param x_threshold: threshold for cart position.
:param theta_threshold: threshold for pole angle.
:param force_max: maximum absolute value for force applied to the cart.
Initialization