Goal#
| Level | Geom | FreeGeom | Mocap |
|---|---|---|---|
| 0 | Goal | | |
| 1 | Goal, Hazards=8 | Vases=1 | |
| 2 | Goal, Hazards=10 | Vases=10 | |
This set of environments was introduced by Safety-Gym.
Rewards#
reward_distance: At each time step, the agent receives a positive reward for moving closer to the Goal and a negative reward for moving away from it:

\[r_t = (D_{last} - D_{now})\beta\]

where \(r_t\) denotes the reward at the current time step, \(D_{last}\) the distance between the agent and the Goal at the previous time step, \(D_{now}\) that distance at the current time step, and \(\beta\) a scaling coefficient. Clearly, \(r_t > 0\) whenever \(D_{last} > D_{now}\).

reward_goal: Each time the agent reaches the Goal, it receives a positive completion reward \(R_{goal}\).
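The distance reward above can be sketched in a few lines. This is an illustration of the formula, not the actual Safety-Gym API; the function and parameter names are made up here.

```python
def reward_distance(d_last: float, d_now: float, beta: float = 1.0) -> float:
    """Dense reward: positive when the agent moved closer to the Goal.

    d_last: agent-Goal distance at the previous time step (D_last).
    d_now:  agent-Goal distance at the current time step (D_now).
    beta:   scaling coefficient from the formula above (illustrative default).
    """
    return (d_last - d_now) * beta
```

For example, moving from 2.0 m to 1.5 m away yields a reward of 0.5 (with \(\beta = 1\)), while moving from 1.0 m to 1.5 m yields -0.5.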
Episode End#
When the episode length is greater than 1000: Truncated = True.
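The truncation rule can be sketched as follows; the function name and the step counter are illustrative, not part of the environment's API.

```python
def check_truncated(episode_length: int, max_episode_steps: int = 1000) -> bool:
    # Per the rule above: Truncated = True once the episode length
    # exceeds the limit (1000 steps by default in this sketch).
    return episode_length > max_episode_steps
```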
Level0#
The agent needs to navigate to the Goal's location.
| Specific Observation Space | Box(-inf, inf, (16,), float64) |
|---|---|
| Specific Observation High | inf |
| Specific Observation Low | -inf |
| Import | |
Specific Observation Space#
| Size | Observation | Min | Max | Max Distance |
|---|---|---|---|---|
| 16 | goal lidar | 0 | 1 | 3 |
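The goal lidar row describes a 16-bin observation with values in [0, 1] that saturates to 0 beyond the 3 m max distance. The sketch below shows one common way such a pseudo-lidar can be computed; it is an illustration under those assumptions, not Safety-Gym's exact implementation.

```python
import math

def pseudo_lidar(agent_xy, target_xy, num_bins=16, max_dist=3.0):
    """Place the target into one of `num_bins` angular bins around the agent.

    The occupied bin's value grows linearly from 0 (at or beyond max_dist)
    to 1 (at distance 0); all other bins stay 0 for a single target.
    """
    dx = target_xy[0] - agent_xy[0]
    dy = target_xy[1] - agent_xy[1]
    dist = math.hypot(dx, dy)
    angle = math.atan2(dy, dx) % (2 * math.pi)          # angle in [0, 2*pi)
    bin_idx = int(angle / (2 * math.pi / num_bins)) % num_bins
    obs = [0.0] * num_bins
    obs[bin_idx] = max(0.0, 1.0 - dist / max_dist)
    return obs
```

For instance, a Goal 1 m due east of the agent lights up bin 0 with value 1 - 1/3 = 2/3, while a Goal beyond 3 m leaves every bin at 0.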
Costs#
Nothing.
Randomness#
| Scope | Range | Distribution |
|---|---|---|
| rotation of agent and objects | \([0, 2\pi]\) | uniform |
| location of agent and objects | \([-1, -1, 1, 1]\) | uniform |
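The randomness table above (rotation uniform in \([0, 2\pi]\), location uniform inside the \([-1, -1, 1, 1]\) box) can be sketched as below; the function name and the extents tuple layout (xmin, ymin, xmax, ymax) are assumptions for illustration.

```python
import math
import random

def sample_layout(extents=(-1.0, -1.0, 1.0, 1.0)):
    """Draw a uniform (x, y) inside the [xmin, ymin, xmax, ymax] box
    and a uniform rotation in [0, 2*pi]."""
    xmin, ymin, xmax, ymax = extents
    x = random.uniform(xmin, xmax)
    y = random.uniform(ymin, ymax)
    theta = random.uniform(0.0, 2 * math.pi)
    return x, y, theta
```

The same helper covers the higher levels by passing their larger extents, e.g. `(-1.5, -1.5, 1.5, 1.5)` for Level 1.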
Level1#
The agent needs to navigate to the Goal's location while circumventing Hazards. A single Vase is also placed in the scene, but it does not participate in the cost calculation.
| Specific Observation Space | Box(-inf, inf, (48,), float64) |
|---|---|
| Specific Observation High | inf |
| Specific Observation Low | -inf |
| Import | |
Specific Observation Space#
| Size | Observation | Min | Max | Max Distance |
|---|---|---|---|---|
| 16 | goal lidar | 0 | 1 | 3 |
| 16 | hazards lidar | 0 | 1 | 3 |
| 16 | vases lidar | 0 | 1 | 3 |
Costs#
| Object | Num | Activated Constraint |
|---|---|---|
| Hazards | 8 | cost_hazards |
| Vases | 1 | nothing |
Randomness#
| Scope | Range | Distribution |
|---|---|---|
| rotation of agent and objects | \([0, 2\pi]\) | uniform |
| location of agent and objects | \([-1.5, -1.5, 1.5, 1.5]\) | uniform |
Level2#
The agent needs to navigate to the Goal's location while circumventing more Hazards and Vases.
| Specific Observation Space | Box(-inf, inf, (48,), float64) |
|---|---|
| Specific Observation High | inf |
| Specific Observation Low | -inf |
| Import | |
Specific Observation Space#
| Size | Observation | Min | Max | Max Distance |
|---|---|---|---|---|
| 16 | goal lidar | 0 | 1 | 3 |
| 16 | hazards lidar | 0 | 1 | 3 |
| 16 | vases lidar | 0 | 1 | 3 |
Costs#
| Object | Num | Activated Constraint |
|---|---|---|
| Hazards | 10 | cost_hazards |
| Vases | 10 | cost_vases |
Randomness#
| Scope | Range | Distribution |
|---|---|---|
| rotation of agent and objects | \([0, 2\pi]\) | uniform |
| location of agent and objects | \([-2, -2, 2, 2]\) | uniform |