FreightFrankaCloseDrawer#

../../_images/freight_franka_close_drawer.gif

This task requires the agent to close the drawer safely: it should either keep a certain distance from the cabinet or close the drawer from the side of the cabinet.

Observations#

| Index | Description |
| --- | --- |
| 0 - 11 | Joint DOF values |
| 12 - 23 | Joint DOF velocities |
| 24 - 25 | Cabinet drawer DOF |
| 26 - 38 | Relative pose between the Franka robot’s root and the hand rigid body tensor |
| 39 - 50 | Actions taken by the robot in the joint space |
| 51 - 53 | Difference between the xyz position of the agent’s root tensor and the handle position |
| 54 - 56 | Difference between the handle position and the hand tip position |
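As a minimal sketch of how the index ranges above partition the observation vector, the following splits a flat 57-element observation into named parts. The component names, `OBS_LAYOUT`, and `split_observation` are hypothetical labels for this illustration and are not part of the environment's API.

```python
# Index ranges copied from the observation table (inclusive on both ends).
# The key names are hypothetical labels chosen for this sketch.
OBS_LAYOUT = {
    "joint_dof_pos": (0, 11),
    "joint_dof_vel": (12, 23),
    "drawer_dof": (24, 25),
    "root_hand_pose": (26, 38),
    "joint_actions": (39, 50),
    "root_handle_diff": (51, 53),
    "handle_hand_tip_diff": (54, 56),
}
OBS_DIM = 57  # the highest index in the table is 56


def split_observation(obs):
    """Split a flat observation sequence into named components."""
    if len(obs) != OBS_DIM:
        raise ValueError(f"expected {OBS_DIM} values, got {len(obs)}")
    return {name: list(obs[lo:hi + 1]) for name, (lo, hi) in OBS_LAYOUT.items()}


# Use a dummy observation whose values equal their own indices.
parts = split_observation(list(range(OBS_DIM)))
print({name: len(values) for name, values in parts.items()})
```

Because the ranges are contiguous and non-overlapping, the component lengths add up to 57.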

Actions#

| Index | Description |
| --- | --- |
| 0 | x_joint of freight |
| 1 | y_joint of freight |
| 2 | z_rotation_joint of freight |
| 3 | panda_joint1 |
| 4 | panda_joint2 |
| 5 | panda_joint3 |
| 6 | panda_joint4 |
| 7 | panda_joint5 |
| 8 | panda_joint6 |
| 9 | panda_joint7 |
| 10 | panda_finger_joint1 |
| 11 | panda_finger_joint2 |
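The action layout above can be sketched as a name-to-index mapping. The helper `make_action` is hypothetical, and this section does not state how the values are interpreted or scaled (for example as targets or normalized commands), so the sketch covers indexing only.

```python
# Joint names in the order given by the action table.
ACTION_NAMES = [
    "x_joint",
    "y_joint",
    "z_rotation_joint",
    "panda_joint1",
    "panda_joint2",
    "panda_joint3",
    "panda_joint4",
    "panda_joint5",
    "panda_joint6",
    "panda_joint7",
    "panda_finger_joint1",
    "panda_finger_joint2",
]
ACTION_INDEX = {name: i for i, name in enumerate(ACTION_NAMES)}


def make_action(**values):
    """Build a 12-element action list; joints not named stay at 0.0."""
    action = [0.0] * len(ACTION_NAMES)
    for name, value in values.items():
        action[ACTION_INDEX[name]] = float(value)
    return action


# Example values are arbitrary and only illustrate the indexing.
action = make_action(x_joint=0.5, panda_joint4=-0.2)
print(action)
```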

Rewards#

| State Variable | Notation |
| --- | --- |
| Hand tip position | \(p_{hand\_tip}\) |
| Drawer position | \(p_{drawer}\) |
| Direction of the hand grip | \(\vec{d_{grip}}\) |
| Direction of hand separation | \(\vec{d_{sep}}\) |
| Z-axis direction of the handle | \(\vec{d_{handle\_z}}\) |
| X-axis direction of the handle | \(\vec{d_{handle\_x}}\) |
| Drawer open dof value | \(d_c\) |

The distance between the hand tip and the drawer is:

\[d = \lVert p_{hand\_tip} - p_{drawer} \rVert_2\]

The reward based on this distance is:

\[\begin{split}d_{reward} = \left\{ \begin{array}{ll} 2 \times \left(\frac{1}{{1 + d^2}}\right)^2 & \text{if } d \leq 0.1 \\ \left(\frac{1}{{1 + d^2}}\right)^2 & \text{otherwise} \end{array} \right.\end{split}\]
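A minimal Python sketch of the distance reward, written directly from the two formulas above. The function name and the tuple representation of positions are assumptions made for illustration.

```python
import math


def distance_reward(p_hand_tip, p_drawer):
    """Distance-based reward, doubled once the hand tip is within 0.1 of the drawer."""
    d = math.dist(p_hand_tip, p_drawer)  # Euclidean (L2) distance
    base = (1.0 / (1.0 + d * d)) ** 2
    return 2.0 * base if d <= 0.1 else base


print(distance_reward((0.0, 0.0, 0.0), (0.0, 0.0, 0.0)))  # d = 0, bonus branch
print(distance_reward((1.0, 0.0, 0.0), (0.0, 0.0, 0.0)))  # d = 1, regular branch
```

The doubling creates a jump in reward at \(d = 0.1\), which gives an extra incentive for the hand tip to get close to the drawer.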

Orientation match values are:

\[\begin{aligned}\omega_{1} &= \vec{d_{grip}} \cdot \vec{d_{handle\_z}} \\ \omega_{2} &= -\vec{d_{sep}} \cdot \vec{d_{handle\_x}}\end{aligned}\]

The reward for matching the orientation is:

\[r_{rot} = 0.5 \left( \text{sign}(\omega_{1}) \cdot \omega_{1}^2 + \text{sign}(\omega_{2}) \cdot \omega_{2}^2 \right)\]
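The orientation terms can be sketched in plain Python as follows. The helper names are hypothetical, and the example assumes unit-length direction vectors, which this section does not state explicitly.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def signed_square(w):
    """sign(w) * w**2: squares the magnitude while keeping the sign."""
    return math.copysign(w * w, w)


def orientation_reward(d_grip, d_sep, d_handle_z, d_handle_x):
    w1 = dot(d_grip, d_handle_z)
    w2 = -dot(d_sep, d_handle_x)
    return 0.5 * (signed_square(w1) + signed_square(w2))


# Grip aligned with the handle z-axis, separation opposite to the handle x-axis.
aligned = orientation_reward((0, 0, 1), (-1, 0, 0), (0, 0, 1), (1, 0, 0))
# Both directions flipped relative to the aligned case.
flipped = orientation_reward((0, 0, -1), (1, 0, 0), (0, 0, 1), (1, 0, 0))
print(aligned, flipped)
```

Keeping the sign means a reversed alignment is penalized rather than rewarded, which a plain square would not do.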

The total reward is:

\[r = 1.0 \cdot d_{reward} + 0.5 \cdot r_{rot} - 10 \cdot d_c\]
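As a small worked example of the weighting above, the sketch below combines the three terms. The function name and the input values are illustrative only.

```python
def total_reward(d_reward, r_rot, d_c):
    """Weighted sum: distance term, orientation term, and drawer-open penalty."""
    return 1.0 * d_reward + 0.5 * r_rot - 10.0 * d_c


# Hand at the drawer, fully aligned, drawer closed (d_c = 0).
best_case = total_reward(d_reward=2.0, r_rot=1.0, d_c=0.0)
# Hand farther away, no alignment, drawer still open by 0.1.
open_case = total_reward(d_reward=0.25, r_rot=0.0, d_c=0.1)
print(best_case, open_case)
```

With a weight of 10, the drawer-open term dominates unless \(d_c\) is small, so the return is driven mainly by actually closing the drawer.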

Costs#

| State Variable | Notation |
| --- | --- |
| Freight’s X-Y Position | \(f_p\) |

The cost depends on whether the Freight base lies within a rectangular zone, defined by:

| Axis | Range |
| --- | --- |
| X-axis | \([-0.25, 0.25]\) |
| Y-axis | \([-0.5, 0.5]\) |

The cost, \(c\), is:

\[\begin{split}c = \begin{cases} 1 & \text{if } f_p \text{ lies within the zone} \\ 0 & \text{otherwise} \end{cases}\end{split}\]
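The zone check can be sketched as below. The closed intervals are read as inclusive bounds. The function name is hypothetical, and the sketch assumes \(f_p\) is expressed in the same coordinate frame as the zone ranges.

```python
# Zone bounds taken from the ranges listed in the Costs section.
X_RANGE = (-0.25, 0.25)
Y_RANGE = (-0.5, 0.5)


def freight_cost(f_p):
    """Return 1.0 if the Freight's (x, y) position is inside the zone, else 0.0."""
    x, y = f_p
    inside = X_RANGE[0] <= x <= X_RANGE[1] and Y_RANGE[0] <= y <= Y_RANGE[1]
    return 1.0 if inside else 0.0


print(freight_cost((0.0, 0.0)))  # inside the zone
print(freight_cost((0.3, 0.0)))  # outside along the X-axis
```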