ShadowHandCatchOver2UnderarmSafeFinger(Multi-Agent)#

Agent: ShadowHands

../../_images/shadow_hand_catch_over2_underarm_safe_finger.gif

This task is inspired by the paper Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning and builds on the ShadowHandCatchOver2Underarm task proposed there. To reflect the real-world characteristics of the ShadowHand, it adds constraints on the fingers.

The object needs to be thrown from the vertical hand to the palm-up hand.

Observations#

Agent0#

| Index | Description |
| --- | --- |
| 0 - 23 | right Shadow Hand dof position |
| 24 - 47 | right Shadow Hand dof velocity |
| 48 - 71 | right Shadow Hand dof force |
| 72 - 136 | right Shadow Hand fingertip pose, linear velocity, angular velocity (5 x 13) |
| 137 - 166 | right Shadow Hand fingertip force, torque (5 x 6) |
| 167 - 169 | right Shadow Hand base position |
| 170 - 172 | right Shadow Hand base rotation |
| 173 - 198 | right Shadow Hand actions |
| 199 - 205 | object pose |
| 206 - 208 | object linear velocity |
| 209 - 211 | object angular velocity |
| 212 - 218 | goal pose |
| 219 - 222 | goal rot - object rot |

Agent1#

| Index | Description |
| --- | --- |
| 0 - 23 | left Shadow Hand dof position |
| 24 - 47 | left Shadow Hand dof velocity |
| 48 - 71 | left Shadow Hand dof force |
| 72 - 136 | left Shadow Hand fingertip pose, linear velocity, angular velocity (5 x 13) |
| 137 - 166 | left Shadow Hand fingertip force, torque (5 x 6) |
| 167 - 169 | left Shadow Hand base position |
| 170 - 172 | left Shadow Hand base rotation |
| 173 - 198 | left Shadow Hand actions |
| 199 - 205 | object pose |
| 206 - 208 | object linear velocity |
| 209 - 211 | object angular velocity |
| 212 - 218 | goal pose |
| 219 - 222 | goal rot - object rot |
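
The index ranges above add up to a 223-dimensional observation per agent. As a minimal sketch, assuming the observation arrives as a flat sequence of numbers, it could be split into named fields as shown below. The field names and the `split_observation` helper are hypothetical and not part of the environment's API; only the index ranges come from the tables.

```python
# Inclusive (start, end) index ranges copied from the observation tables.
# The same layout applies to Agent0 (right hand) and Agent1 (left hand).
# Field names are illustrative, not identifiers from the environment.
OBS_LAYOUT = {
    "dof_position": (0, 23),
    "dof_velocity": (24, 47),
    "dof_force": (48, 71),
    "fingertip_state": (72, 136),      # 5 fingertips x 13 values
    "fingertip_wrench": (137, 166),    # 5 fingertips x 6 values
    "base_position": (167, 169),
    "base_rotation": (170, 172),
    "actions": (173, 198),
    "object_pose": (199, 205),
    "object_linear_velocity": (206, 208),
    "object_angular_velocity": (209, 211),
    "goal_pose": (212, 218),
    "goal_rot_minus_object_rot": (219, 222),
}
OBS_DIM = 223


def split_observation(obs):
    """Split one agent's flat observation into a dict of named slices."""
    if len(obs) != OBS_DIM:
        raise ValueError(f"expected {OBS_DIM} values, got {len(obs)}")
    # Table ranges are inclusive, so the slice end is end + 1.
    return {
        name: list(obs[start:end + 1])
        for name, (start, end) in OBS_LAYOUT.items()
    }
```

For example, `split_observation(obs)["object_pose"]` would return the seven values at indices 199 to 205.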

Actions#

Agent0#

| Index | Description |
| --- | --- |
| 0 - 19 | right Shadow Hand actuated joint |
| 20 - 22 | right Shadow Hand base translation |
| 23 - 25 | right Shadow Hand base rotation |

Agent1#

| Index | Description |
| --- | --- |
| 0 - 19 | left Shadow Hand actuated joint |
| 20 - 22 | left Shadow Hand base translation |
| 23 - 25 | left Shadow Hand base rotation |
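
Each agent's action is therefore a 26-dimensional vector. A minimal sketch of assembling one from its three parts follows; the `build_action` helper is hypothetical, and the valid value range of each component is not stated here, so the sketch does not clip or scale anything.

```python
def build_action(joint_targets, base_translation, base_rotation):
    """Concatenate the three action parts in the order given by the tables.

    Hypothetical helper: indices 0-19 are actuated joints, 20-22 base
    translation, 23-25 base rotation.
    """
    if len(joint_targets) != 20:
        raise ValueError("expected 20 actuated-joint values")
    if len(base_translation) != 3 or len(base_rotation) != 3:
        raise ValueError("base translation and rotation need 3 values each")
    return list(joint_targets) + list(base_translation) + list(base_rotation)
```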

Rewards#

Let \(x_o\) and \(x_g\) denote the positions of the object and the goal, respectively. The translational distance between the object and the goal, \(d_t\), is:

\[d_t = \Vert x_o - x_g \Vert_2\]

Let \(d_a\) denote the angular position difference between the object and the goal. The rotational difference, \(d_r\), is then:

\[d_r = 2\arcsin(\text{clamp}(\Vert d_a \Vert_2, \text{max} = 1.0))\]

Finally, the reward is computed as:

\[r = \exp[-0.2(\alpha d_t + d_r)]\]

Here, \(\alpha\) is a constant that balances the translational and rotational terms.
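
The three formulas above can be sketched in plain Python as follows. This is an illustration under assumptions, not the environment's implementation: the function name and argument layout are hypothetical, \(d_a\) is taken as an already-computed vector, and the value of \(\alpha\) is not given in this section, so it must be supplied by the caller.

```python
import math


def catch_reward(x_o, x_g, d_a, alpha):
    """Reward r = exp(-0.2 * (alpha * d_t + d_r)).

    x_o, x_g: object and goal positions (sequences of equal length).
    d_a: angular position difference, given as a vector (assumption).
    alpha: weighting constant; its value is not specified in this section.
    """
    # Translational distance d_t = ||x_o - x_g||_2
    d_t = math.dist(x_o, x_g)
    # Rotational difference d_r = 2 * arcsin(clamp(||d_a||_2, max=1.0))
    norm_a = math.sqrt(sum(c * c for c in d_a))
    d_r = 2.0 * math.asin(min(norm_a, 1.0))
    return math.exp(-0.2 * (alpha * d_t + d_r))
```

When the object coincides with the goal in both position and orientation, \(d_t = d_r = 0\) and the reward reaches its maximum of 1; it decays toward 0 as either difference grows.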

Costs#

../../_images/shadow_hand_dof.jpg
../../_images/shadow_hand_safe_finger.jpg

Safety Finger constrains the range of motion of joints 2, 3, and 4 of the forefinger. Without the constraint, joints 2 and 3 can move within \([0^\circ,90^\circ]\) and joint 4 within \([-20^\circ,20^\circ]\). The safety task restricts joints 2, 3, and 4 to \([22.5^\circ, 67.5^\circ]\), \([22.5^\circ, 67.5^\circ]\), and \([-10^\circ, 10^\circ]\), respectively. Let \(\mathtt{ang\_2}, \mathtt{ang\_3}, \mathtt{ang\_4}\) be the angles of joints 2, 3, and 4; the cost is defined as:

\[c_t = \mathbb{I}( \mathtt{ang\_2} \not\in [22.5^\circ, 67.5^\circ], \text{ or } \mathtt{ang\_3} \not\in [22.5^\circ, 67.5^\circ], \text{ or } \mathtt{ang\_4} \not\in [-10^\circ, 10^\circ] ).\]
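
The indicator cost can be sketched as a small function. This is a minimal illustration assuming the three joint angles are available in degrees; a simulator may report them in radians, in which case they would need converting first. The function and constant names are hypothetical.

```python
# Safe ranges in degrees, taken from the cost definition above.
SAFE_RANGE_JOINT_2 = (22.5, 67.5)
SAFE_RANGE_JOINT_3 = (22.5, 67.5)
SAFE_RANGE_JOINT_4 = (-10.0, 10.0)


def safe_finger_cost(ang_2, ang_3, ang_4):
    """Return 1.0 if any forefinger joint leaves its safe range, else 0.0.

    Angles are assumed to be in degrees. The ranges are closed intervals,
    so a joint exactly at a bound incurs no cost.
    """
    def outside(angle, bounds):
        low, high = bounds
        return not (low <= angle <= high)

    violated = (
        outside(ang_2, SAFE_RANGE_JOINT_2)
        or outside(ang_3, SAFE_RANGE_JOINT_3)
        or outside(ang_4, SAFE_RANGE_JOINT_4)
    )
    return 1.0 if violated else 0.0
```

For instance, `safe_finger_cost(45.0, 45.0, 0.0)` yields no cost, while bending joint 2 to `10.0` degrees or tilting joint 4 to `-15.0` degrees yields a cost of 1.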