FreightFrankaCloseDrawer(Multi-Agent)#

../../_images/freight_franka_close_drawer.gif

This task requires the agents to close the drawer safely: the robot should either keep a certain distance from the cabinet or close the drawer from the side of the cabinet.

Observations#

Agent0#

| Index | Description |
|---|---|
| 0 - 2 | Joint DOF values |
| 3 - 5 | Joint DOF velocities |
| 6 - 7 | Cabinet drawer DOF |
| 8 - 20 | Relative pose between the Franka robot’s root and the hand rigid body tensor |
| 21 - 32 | Actions taken by the robot in the joint space |
| 33 - 35 | Difference between the xyz pos of freight’s root tensor and the handle position |
| 36 - 38 | Difference between the handle position and the hand tip position |

Agent1#

| Index | Description |
|---|---|
| 0 - 8 | Joint DOF values |
| 9 - 17 | Joint DOF velocities |
| 18 - 19 | Cabinet drawer DOF |
| 20 - 32 | Relative pose between the Franka robot’s root and the hand rigid body tensor |
| 33 - 44 | Actions taken by the robot in the joint space |
| 45 - 47 | Difference between the xyz pos of freight’s root tensor and the handle position |
| 48 - 50 | Difference between the handle position and the hand tip position |
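The index ranges in the two observation tables can be turned into a small lookup for splitting a flat observation vector into named fields. The sketch below is illustrative only: the field names and the helpers `split_observation` and `obs_size` are hypothetical and not part of the environment's API, and the ranges are simply copied from the tables (inclusive bounds).

```python
# Hypothetical field names; inclusive index ranges copied from the tables above.
OBS_LAYOUT = {
    "agent0": {
        "dof_pos": (0, 2),
        "dof_vel": (3, 5),
        "drawer_dof": (6, 7),
        "root_to_hand_pose": (8, 20),
        "actions": (21, 32),
        "root_handle_diff": (33, 35),
        "handle_hand_tip_diff": (36, 38),
    },
    "agent1": {
        "dof_pos": (0, 8),
        "dof_vel": (9, 17),
        "drawer_dof": (18, 19),
        "root_to_hand_pose": (20, 32),
        "actions": (33, 44),
        "root_handle_diff": (45, 47),
        "handle_hand_tip_diff": (48, 50),
    },
}


def obs_size(agent):
    """Length of the flat observation implied by the table for `agent`."""
    return max(hi for _, hi in OBS_LAYOUT[agent].values()) + 1


def split_observation(obs, agent):
    """Map each field name to its slice of a flat, one-dimensional observation."""
    if len(obs) != obs_size(agent):
        raise ValueError(f"expected {obs_size(agent)} values, got {len(obs)}")
    # The tables use inclusive upper bounds, hence hi + 1 in the slice.
    return {name: list(obs[lo:hi + 1]) for name, (lo, hi) in OBS_LAYOUT[agent].items()}


if __name__ == "__main__":
    fake_obs = list(range(obs_size("agent0")))  # placeholder values, not real data
    print(split_observation(fake_obs, "agent0")["drawer_dof"])
```

Under these tables, the Agent0 observation has 39 entries and the Agent1 observation has 51.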

Actions#

Agent0#

| Index | Description |
|---|---|
| 0 | x_joint of freight |
| 1 | y_joint of freight |
| 2 | z_rotation_joint of freight |

Agent1#

| Index | Description |
|---|---|
| 0 | panda_joint1 |
| 1 | panda_joint2 |
| 2 | panda_joint3 |
| 3 | panda_joint4 |
| 4 | panda_joint5 |
| 5 | panda_joint6 |
| 6 | panda_joint7 |
| 7 | panda_finger_joint1 |
| 8 | panda_finger_joint2 |
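As a rough illustration of the action layout, the following sketch builds a flat action list for each agent from a mapping of joint names to commands. The helper `make_action` is hypothetical, and filling unspecified joints with 0.0 is an assumption made for the example; the tables above do not state the valid action range or what a zero command means.

```python
# Joint order copied from the Agent0 and Agent1 action tables.
AGENT0_JOINTS = ["x_joint", "y_joint", "z_rotation_joint"]
AGENT1_JOINTS = [f"panda_joint{i}" for i in range(1, 8)] + [
    "panda_finger_joint1",
    "panda_finger_joint2",
]


def make_action(joint_names, commands):
    """Build a flat action list; joints missing from `commands` get 0.0 (assumed default)."""
    unknown = set(commands) - set(joint_names)
    if unknown:
        raise KeyError(f"unknown joints: {sorted(unknown)}")
    return [float(commands.get(name, 0.0)) for name in joint_names]


if __name__ == "__main__":
    print(make_action(AGENT0_JOINTS, {"y_joint": 0.5}))
    print(make_action(AGENT1_JOINTS, {"panda_joint4": -0.2}))
```

The two action vectors have 3 and 9 entries, which together match the 12-wide action field in the observation tables.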

Rewards#

| State Variable | Notation |
|---|---|
| Hand tip position | \(p_{hand\_tip}\) |
| Drawer position | \(p_{drawer}\) |
| Direction of the hand grip | \(\vec{d_{grip}}\) |
| Direction of hand separation | \(\vec{d_{sep}}\) |
| Z-axis direction of the handle | \(\vec{d_{handle\_z}}\) |
| X-axis direction of the handle | \(\vec{d_{handle\_x}}\) |
| Drawer open dof value | \(d_c\) |

The distance between the hand tip and the drawer is:

\[d = \lVert p_{hand\_tip} - p_{drawer} \rVert_2\]

The reward based on this distance is:

\[\begin{split}d_{reward} = \left\{ \begin{array}{ll} 2 \times \left(\frac{1}{{1 + d^2}}\right)^2 & \text{if } d \leq 0.1 \\ \left(\frac{1}{{1 + d^2}}\right)^2 & \text{otherwise} \end{array} \right.\end{split}\]
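A minimal Python sketch of the distance term, written directly from the two formulas above. The function name `distance_reward` is hypothetical, and positions are assumed to be plain 3-element sequences in a common frame.

```python
import math


def distance_reward(p_hand_tip, p_drawer):
    """Distance reward: (1 / (1 + d^2))^2, doubled when d <= 0.1."""
    d = math.dist(p_hand_tip, p_drawer)  # Euclidean (L2) distance
    base = (1.0 / (1.0 + d * d)) ** 2
    return 2.0 * base if d <= 0.1 else base


if __name__ == "__main__":
    print(distance_reward((0.0, 0.0, 0.0), (0.0, 0.0, 0.0)))  # hand tip at the drawer
    print(distance_reward((1.0, 0.0, 0.0), (0.0, 0.0, 0.0)))  # one unit away
```

The term peaks at 2 when the hand tip coincides with the drawer position and drops discontinuously at \(d = 0.1\), where the doubling stops.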

The orientation match values are:

\[\begin{aligned} \omega_{1} &= \vec{d_{grip}} \cdot \vec{d_{handle\_z}} \\ \omega_{2} &= -\vec{d_{sep}} \cdot \vec{d_{handle\_x}} \end{aligned}\]

The reward for matching the orientation is:

\[r_{rot} = 0.5 \left( \text{sign}(\omega_{1}) \cdot \omega_{1}^2 + \text{sign}(\omega_{2}) \cdot \omega_{2}^2 \right)\]
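The orientation term can be sketched as follows. This is an illustration under assumptions: the helper names are hypothetical, and the direction vectors are assumed to be unit length, in which case \(r_{rot}\) lies in \([-1, 1]\).

```python
import math


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def signed_square(w):
    """sign(w) * w^2, which keeps the sign of w while squaring its magnitude."""
    return math.copysign(w * w, w)


def orientation_reward(d_grip, d_sep, d_handle_z, d_handle_x):
    """r_rot = 0.5 * (sign(w1) * w1^2 + sign(w2) * w2^2)."""
    w1 = dot(d_grip, d_handle_z)
    w2 = -dot(d_sep, d_handle_x)
    return 0.5 * (signed_square(w1) + signed_square(w2))


if __name__ == "__main__":
    # Grip aligned with the handle z-axis, separation opposite to the handle x-axis.
    print(orientation_reward((0, 0, 1), (1, 0, 0), (0, 0, 1), (-1, 0, 0)))
```

The signed square means a misaligned gripper contributes a negative value rather than a small positive one.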

The total reward is:

\[r = 1.0 \cdot d_{reward} + 0.5 \cdot r_{rot} - 10 \cdot d_c\]
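To make the weighting concrete, here is a small worked example of the total reward. The function name `total_reward` is hypothetical; the inputs are the distance reward, the orientation reward and the drawer open DOF value defined above, and the sample numbers are made up for illustration.

```python
def total_reward(d_reward, r_rot, d_c):
    """r = 1.0 * d_reward + 0.5 * r_rot - 10 * d_c."""
    return 1.0 * d_reward + 0.5 * r_rot - 10.0 * d_c


if __name__ == "__main__":
    # Hand tip at the drawer, perfectly aligned, drawer fully closed (d_c = 0).
    print(total_reward(2.0, 1.0, 0.0))
    # Same pose, but the drawer DOF still reads 0.1 (illustrative value).
    print(total_reward(2.0, 1.0, 0.1))
```

Because of the factor of 10, even a small remaining drawer opening outweighs the orientation term, so the reward is dominated by how far the drawer is closed.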

Costs#

| State Variable | Notation |
|---|---|
| Freight’s X-Y Position | \(f_p\) |

The cost depends on whether the Freight’s X-Y position lies within a rectangular zone, defined by:

| Axis | Range |
|---|---|
| X-axis | \([-0.25, 0.25]\) |
| Y-axis | \([-0.5, 0.5]\) |

The cost, \(c\), is:

\[\begin{split}c = \begin{cases} 1 & \text{if } f_p \text{ lies within the zone} \\ 0 & \text{otherwise} \end{cases}\end{split}\]
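The zone check can be sketched in a few lines. The function name `freight_cost` is hypothetical, the boundaries are treated as inclusive because the ranges are written as closed intervals, and \(f_p\) is assumed to be expressed in the same frame as the zone.

```python
# Zone bounds copied from the table above.
X_RANGE = (-0.25, 0.25)
Y_RANGE = (-0.5, 0.5)


def freight_cost(f_p):
    """Return 1.0 if the Freight's (x, y) position lies inside the zone, else 0.0."""
    x, y = f_p
    inside = X_RANGE[0] <= x <= X_RANGE[1] and Y_RANGE[0] <= y <= Y_RANGE[1]
    return 1.0 if inside else 0.0


if __name__ == "__main__":
    print(freight_cost((0.0, 0.0)))  # inside the zone
    print(freight_cost((0.3, 0.0)))  # outside along the x-axis
```

The cost is binary: it signals that the Freight is inside the zone but not how far inside it is.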