MultiGoal#

| Level | Geom | FreeGeom | Mocap |
|-------|------|----------|-------|
| 0 | Goal=2 | | |
| 1 | Goal=2, Hazards=8 | Vases=1 | |
| 2 | Goal=2, Hazards=10 | Vases=10 | |

This set of environments is similar to Goal.

Rewards#

Each agent must approach and reach the target that matches its own color. Approaching or reaching a target of a mismatched color yields no reward. Below is the reward computation for an agent with respect to the target of its own color.

  • reward_distance: At each time step, the agent receives a positive reward for moving closer to the Goal and a negative reward for moving farther away; the formula is given below (a code sketch follows this list).

\[r_t = (D_{last} - D_{now})\beta\]

Obviously, when \(D_{last} > D_{now}\), \(r_t > 0\). Here \(r_t\) denotes the current time step's reward, \(D_{last}\) the distance between the agent and the Goal at the previous time step, \(D_{now}\) that distance at the current time step, and \(\beta\) a discount factor.

  • reward_goal: Each time the Goal is reached, the agent receives a positive reward for completing the goal: \(R_{goal}\).
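To make the two reward terms concrete, here is a minimal Python sketch of the computation described above. The function name, the default \(\beta\), and the \(R_{goal}\) value are illustrative assumptions, not the library's actual code or defaults.

```python
import numpy as np

def multigoal_reward(agent_pos, goal_pos, dist_last, beta=1.0,
                     goal_reached=False, reward_goal=1.0):
    """Illustrative sketch of the MultiGoal reward terms.

    beta and reward_goal (R_goal) are assumed values, not the
    library's actual defaults.
    """
    dist_now = np.linalg.norm(np.asarray(agent_pos) - np.asarray(goal_pos))
    # reward_distance: positive when the agent moved closer (D_last > D_now).
    r_t = (dist_last - dist_now) * beta
    # reward_goal: a one-off bonus each time the Goal is reached.
    if goal_reached:
        r_t += reward_goal
    return r_t, dist_now  # carry dist_now forward as the next step's D_last
```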

Episode End#

  • When the episode length is greater than 1000: Truncated = True.

Level0#

[Image: ant_multi_goal0.jpeg]

Agents are required to navigate to targets matching their respective colors.

| Field | Value |
|-------|-------|
| Specific Observation Space | [Box(-inf, inf, (32,), float64), Box(-inf, inf, (32,), float64)] |
| Specific Observation High | inf |
| Specific Observation Low | -inf |
| Import | `safety_gymnasium.make("Safety[Agent]MultiGoal0-v0")` |
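As a usage sketch, the snippet below instantiates the environment and runs a random policy. Substituting "Ant" for the "[Agent]" placeholder is an assumption based on the image names on this page, and the loop uses safety_gymnasium's standard step signature (observation, reward, cost, terminated, truncated, info); since this is a multi-agent task, the returned values may in practice be per-agent tuples.

```python
import safety_gymnasium

# "Ant" substituted for the "[Agent]" placeholder (an assumption based on
# the figure names on this page); any supported agent name should work.
env = safety_gymnasium.make("SafetyAntMultiGoal0-v0")
obs, info = env.reset(seed=0)
for _ in range(1000):
    action = env.action_space.sample()
    # safety_gymnasium returns a cost signal alongside the reward.
    obs, reward, cost, terminated, truncated, info = env.step(action)
    if terminated or truncated:  # truncation triggers after 1000 steps
        obs, info = env.reset()
env.close()
```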

Specific Observation Space#

| Size | Observation | Min | Max | Max Distance |
|------|-------------|-----|-----|--------------|
| 16 | goal_red lidar | 0 | 1 | 3 |
| 16 | goal_blue lidar | 0 | 1 | 3 |
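The Min/Max/Max Distance columns describe pseudo-lidar readings in [0, 1] that saturate at a range of 3 distance units. A plausible mapping, assuming the linear decay used by Safety-Gym-style lidars (an assumption; an exponential variant may also exist), is sketched below.

```python
def lidar_reading(dist, max_dist=3.0):
    """Assumed linear pseudo-lidar: 1.0 when the object is at the sensor,
    decaying to 0.0 at max_dist (the "Max Distance" column) and beyond."""
    return max(0.0, (max_dist - dist) / max_dist)

# Example: an object 1.5 units away reads 0.5; anything past 3 units reads 0.
assert lidar_reading(1.5) == 0.5
assert lidar_reading(4.0) == 0.0
```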

Costs#

Nothing.

Randomness#

| Scope | Range | Distribution |
|-------|-------|--------------|
| rotation of agents and objects | \([0, 2\pi]\) | uniform |
| location of agents and objects | \([-1, -1, 1, 1]\) | uniform |
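For clarity, the location range \([-1, -1, 1, 1]\) reads as an axis-aligned box \([x_{min}, y_{min}, x_{max}, y_{max}]\). A sketch of how a layout could be resampled at reset under these distributions (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Rotation: uniform over [0, 2*pi).
rotation = rng.uniform(0.0, 2.0 * np.pi)

# Location: [-1, -1, 1, 1] read as [x_min, y_min, x_max, y_max].
x = rng.uniform(-1.0, 1.0)
y = rng.uniform(-1.0, 1.0)
```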

Level1#

[Image: ant_multi_goal1.jpeg]

Agents must navigate to the targets of their corresponding colors while avoiding collisions with other agents and stepping into hazards. Although Vases=1, vases do not contribute to the cost computation at this level.

| Field | Value |
|-------|-------|
| Specific Observation Space | [Box(-inf, inf, (64,), float64), Box(-inf, inf, (64,), float64)] |
| Specific Observation High | inf |
| Specific Observation Low | -inf |
| Import | `safety_gymnasium.make("Safety[Agent]MultiGoal1-v0")` |

Specific Observation Space#

| Size | Observation | Min | Max | Max Distance |
|------|-------------|-----|-----|--------------|
| 16 | goal_red lidar | 0 | 1 | 3 |
| 16 | goal_blue lidar | 0 | 1 | 3 |
| 16 | hazards lidar | 0 | 1 | 3 |
| 16 | vases lidar | 0 | 1 | 3 |

Costs#

| Object | Num | Activated Constraint |
|--------|-----|----------------------|
| Hazards | 8 | cost_hazards |
| Vases | 1 | nothing |
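A minimal sketch of the cost_hazards signal, assuming a fixed hazard radius and a unit cost per step spent inside any hazard (both the radius and the per-step cost are assumptions; the library's actual values may differ):

```python
import numpy as np

def cost_hazards(agent_pos, hazard_positions, hazard_radius=0.2,
                 hazard_cost=1.0):
    """Assumed form of cost_hazards: a per-step penalty whenever the
    agent's position lies within hazard_radius of any hazard center."""
    cost = 0.0
    for hazard in hazard_positions:
        if np.linalg.norm(np.asarray(agent_pos) - np.asarray(hazard)) <= hazard_radius:
            cost += hazard_cost
    return cost
```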

Randomness#

| Scope | Range | Distribution |
|-------|-------|--------------|
| rotation of agents and objects | \([0, 2\pi]\) | uniform |
| location of agents and objects | \([-1.5, -1.5, 1.5, 1.5]\) | uniform |

Level2#

[Image: ant_multi_goal2.jpeg]

Agents must navigate to the targets matching their designated colors while avoiding collisions with other agents, contact with vases, and stepping into hazards.

| Field | Value |
|-------|-------|
| Specific Observation Space | [Box(-inf, inf, (64,), float64), Box(-inf, inf, (64,), float64)] |
| Specific Observation High | inf |
| Specific Observation Low | -inf |
| Import | `safety_gymnasium.make("Safety[Agent]MultiGoal2-v0")` |

Specific Observation Space#

| Size | Observation | Min | Max | Max Distance |
|------|-------------|-----|-----|--------------|
| 16 | goal_red lidar | 0 | 1 | 3 |
| 16 | goal_blue lidar | 0 | 1 | 3 |
| 16 | hazards lidar | 0 | 1 | 3 |
| 16 | vases lidar | 0 | 1 | 3 |

Costs#

| Object | Num | Activated Constraint |
|--------|-----|----------------------|
| Hazards | 10 | cost_hazards |
| Vases | 10 | contact, velocity |
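At this level, vases activate two constraints: contact (touching a vase) and velocity (setting a vase in motion). A hedged sketch of that combination, with the velocity threshold and per-event costs as assumptions rather than library values:

```python
import numpy as np

def cost_vases(touched_vase, vase_velocities, vel_threshold=1e-2,
               contact_cost=1.0, velocity_cost=1.0):
    """Assumed form of the Level 2 vase constraints: `contact` penalizes
    touching a vase, `velocity` penalizes making one move."""
    cost = 0.0
    if touched_vase:
        cost += contact_cost
    for v in vase_velocities:
        if np.linalg.norm(v) > vel_threshold:
            cost += velocity_cost
    return cost
```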

Randomness#

| Scope | Range | Distribution |
|-------|-------|--------------|
| rotation of agents and objects | \([0, 2\pi]\) | uniform |
| location of agents and objects | \([-2, -2, 2, 2]\) | uniform |