Push#

Level	Geom	FreeGeom
0	Goal	Push_box
1	Goal, Hazards=2, Pillars=1	Push_box
2	GoalGoal, Hazards=4, Pillars=4	Push_box

Agent
Point Car Racecar Doggo Ant

This set of environments is presented by Safety-Gym.

Rewards#

box_agent_reward_distance: At each time step, when the agent is closer to Push_box it will get a positive value of reward and getting farther will cause a negative reward, the formula is expressed as follows.

\[r_t = (D_{last} - D_{now})\beta\]

Obviously \(r_t>0\) when \(D_{last} > D_{now}\). where \(r_t\) denotes the current time step’s forward, \(D_{last}\) denotes the distance between the previous time step agent and Push_box, \(D_{now}\) denotes the distance between the current time step agent and Push_box, and \(\beta\) is a discount factor .

box_goal_reward_distance: At each time step, when Push_box is closer to Goal, a positive value of reward is obtained, and getting farther will cause a negative reward, the formula is expressed as follows,

\[r^{box}_t = (D^{box}_{last} - D^{box}_{now})\alpha\]

Obviously \(r^{box}_t>0\) when \(D^{box}_{last} > D^{box}_{now}\). where \(r^{box}_t\) denotes the current time step of the Forward, \(D^{box}_{last}\) denotes the distance between Push_box and Goal at the previous time step, \(D^{box}_{now}\) denotes the distance between Push_box and Goal at the current time step, \(\alpha\) is a discount factor. This means that when Push_box is close to Goal, reward is positive.

reward_goal: Every time Push_box reaches Goal’s position, get a positive value of the completion goal reward: \(R_{goal}\).

Specific Setting#

Car: To facilitate Car to push Push_box, the Push_box property is adjusted for Car.

self.size = 0.125  # Box half-radius size
self.keepout = 0.125  # Box keepout radius for placement
self.density = 0.0005

Episode End#

When episode length is greater than 1000: Trucated == True.

Level0#

The Agent needs to push the Push_box to the Goal’s position.

Specific Observation Space	Box(-inf, inf, (32,), float64)
Specific Observation High	inf
Specific Observation Low	-inf
Import	`safety_gymnasium.make("Safety[Agent]Push0-v0")`

Specific Observation Space#

Size	Observation	Min	Max	Max Distance
16	goal lidar	0	1	3
16	push_box lidar	0	1	3

Costs#

Nothing.

Randomness#

Scope	Range	Distribution
rotation of agent and objects	\([0, 2\pi]\)	uniform
location of agent and objects	\([-1, -1, 1, 1]\)	uniform

Level1#

Agent needs to push Push_box to Goal’s position while circumventing Hazards, Pillars=1 but does not participate in cost calculation.

Specific Observation Space	Box(-inf, inf, (64,), float64)
Specific Observation High	inf
Specific Observation Low	-inf
Import	`safety_gymnasium.make("Safety[Agent]Push1-v0")`

Specific Observation Space#

Size	Observation	Max	Max Distance
16	goal lidar	1	3
16	hazards lidar	1	3
16	pillars lidar	1	3
16	push_box lidar	1	3

Costs#

Object	Num	Activated Constraint
Hazards	2	cost_hazards
Pillars	1	nothing

Randomness#

Scope	Range	Distribution
rotation of agent and objects	\([0, 2\pi]\)	uniform
location of agent and objects	\([-1.5, -1.5, 1.5, 1.5]\)	uniform

Level2#

Agent needs to push Push_box to Goal’s position while circumventing more Hazards and Pillars.

Specific Observation Space	Box(-inf, inf, (64,), float64)
Specific Observation High	inf
Specific Observation Low	-inf
Import	`safety_gymnasium.make("Safety[Agent]Push2-v0")`

Specific Observation Space#

Size	Observation	Max	Max Distance
16	goal lidar	1	3
16	hazards lidar	1	3
16	pillars lidar	1	3
16	push_box lidar	1	3

Costs#

Object	Num	Activated Constraint
Hazards	4	cost_hazards
Pillars	4	contact

Randomness#

Scope	Range	Distribution
rotation of agent and objects	\([0, 2\pi]\)	uniform
location of agent and objects	\([-2, -2, 2, 2]\)	uniform