Basic Usage#
Installation#
# From the Python Package Index (PyPI)
pip install safety-gymnasium
# From the source code
git clone https://github.com/PKU-Alignment/safety-gymnasium.git
cd safety-gymnasium
pip install -e .
Specification#
Gymnasium provides a well-defined and widely accepted API by the RL Community, and our library exactly adheres to this specification and provides a Safe RL-specific interface.So researchers accustomed to Gymnasium can get started with our library at near zero migration cost, for some basic API and code tools refer to: Gymnasium Documentation.
Initializing the environment#
import safety_gymnasium
env = safety_gymnasium.make('SafetyPointCircle0-v0', render_mode='human')
'''
Vision Environment
env = safety_gymnasium.make('SafetyPointCircle0Vision-v0', render_mode='human')
Keyboard Debug environment
due to the complexity of the agent's inherent dynamics, only partial support for the agent.
env = safety_gymnasium.make('SafetyPointCircle0Debug-v0', render_mode='human')
'''
obs, info = env.reset()
# Set seeds
# obs, _ = env.reset(seed=0)
terminated, truncated = False, False
ep_ret, ep_cost = 0, 0
for _ in range(1000):
assert env.observation_space.contains(obs)
act = env.action_space.sample()
assert env.action_space.contains(act)
# modified for Safe RL, added cost
obs, reward, cost, terminated, truncated, info = env.step(act)
ep_ret += reward
ep_cost += cost
if terminated or truncated:
observation, info = env.reset()
env.close()
Observation Space#
env = safety_gymnasium.make('SafetyPointCircle0-v0', render_mode='human')
obs, info = env.reset()
print(env.observation_space)
'''
Box([-inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.],
[inf inf inf inf inf inf inf inf inf inf inf inf 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1.], (28,), float64)
'''
print(env.obs_space_dict)
'''
Dict('accelerometer': Box(-inf, inf, (3,), float64),
'velocimeter': Box(-inf, inf, (3,), float64),
'gyro': Box(-inf, inf, (3,), float64),
'magnetometer': Box(-inf, inf, (3,), float64),
'circle_lidar': Box(0.0, 1.0, (16,), float64))
'''
# position of each part in the obs is as same as it appears in the Dict above.
print(obs)
'''
[0. 0. 9.81 0. 0. 0.
0. 0. 0. 0.36647163 0.34014489 0.
0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0.08518475
0.93364224 0.84845749 0. 0. ]
'''
Action Space#
env = safety_gymnasium.make('SafetyPointCircle0-v0', render_mode='human')
obs, info = env.reset()
print(env.action_space)
# Box(-1.0, 1.0, (2,), float64)
Render#
We completely inherit the excellent API for render in Gymnasium.
Note
The set of supported modes varies per environment. (And some third-party environments may not support rendering at all.) By convention, if render_mode is:
None (default): no render is computed.
human: render return None. The environment is continuously rendered in the current display or terminal. Usually for human consumption.
rgb_array: return a single frame representing the current state of the environment. A frame is a numpy.ndarray with shape (x, y, 3) representing RGB values for an x-by-y pixel image.
rgb_array_list: return a list of frames representing the states of the environment since the last reset. Each frame is a numpy.ndarray with shape (x, y, 3), as with rgb_array.
depth_array: return a single frame representing the current state of the environment. A frame is a numpy.ndarray with shape (x, y) representing depth values for an x-by-y pixel image.
depth_array_list: return a list of frames representing the states of the environment since the last reset. Each frame is a numpy.ndarray with shape (x, y), as with depth_array.