ShadowHandCatchOver2UnderarmSafeFinger(Multi-Agent)#
This task is inspired by the paper Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning and is based on the ShadowHandCatchOver2Underarm task proposed there. To reflect the real-world characteristics of the ShadowHand, it adds constraints on the fingers.
The object needs to be thrown from the vertical hand to the palm-up hand.
Observations#
Agent0#
Index | Description |
---|---|
0 - 23 | right Shadow Hand dof position |
24 - 47 | right Shadow Hand dof velocity |
48 - 71 | right Shadow Hand dof force |
72 - 136 | right Shadow Hand fingertip pose, linear velocity, angular velocity (5 x 13) |
137 - 166 | right Shadow Hand fingertip force, torque (5 x 6) |
167 - 169 | right Shadow Hand base position |
170 - 172 | right Shadow Hand base rotation |
173 - 198 | right Shadow Hand actions |
199 - 205 | object pose |
206 - 208 | object linear velocity |
209 - 211 | object angular velocity |
212 - 218 | goal pose |
219 - 222 | goal rot - object rot |
Agent1#
Index | Description |
---|---|
0 - 23 | left Shadow Hand dof position |
24 - 47 | left Shadow Hand dof velocity |
48 - 71 | left Shadow Hand dof force |
72 - 136 | left Shadow Hand fingertip pose, linear velocity, angular velocity (5 x 13) |
137 - 166 | left Shadow Hand fingertip force, torque (5 x 6) |
167 - 169 | left Shadow Hand base position |
170 - 172 | left Shadow Hand base rotation |
173 - 198 | left Shadow Hand actions |
199 - 205 | object pose |
206 - 208 | object linear velocity |
209 - 211 | object angular velocity |
212 - 218 | goal pose |
219 - 222 | goal rot - object rot |
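Both agents share the same 223-element observation layout, so a single index map can split either agent's flat observation into named parts. The following is a minimal sketch in plain Python: the field names, `OBS_LAYOUT`, `split_observation`, and `per_fingertip` are hypothetical helpers introduced for illustration and are not part of the environment's API. Only the index ranges come from the tables above.

```python
# Hypothetical field names; the inclusive index ranges are copied from the tables above.
OBS_LAYOUT = {
    "dof_position": (0, 23),
    "dof_velocity": (24, 47),
    "dof_force": (48, 71),
    "fingertip_state": (72, 136),    # 5 fingertips x 13 values
    "fingertip_wrench": (137, 166),  # 5 fingertips x 6 values
    "base_position": (167, 169),
    "base_rotation": (170, 172),
    "actions": (173, 198),
    "object_pose": (199, 205),
    "object_linear_velocity": (206, 208),
    "object_angular_velocity": (209, 211),
    "goal_pose": (212, 218),
    "goal_rot_minus_object_rot": (219, 222),
}

OBS_DIM = 223  # last index (222) + 1


def split_observation(obs):
    """Split one agent's flat observation into named lists."""
    if len(obs) != OBS_DIM:
        raise ValueError(f"expected {OBS_DIM} values, got {len(obs)}")
    # The table ranges are inclusive, so the slice end is hi + 1.
    return {name: list(obs[lo:hi + 1]) for name, (lo, hi) in OBS_LAYOUT.items()}


def per_fingertip(values, width):
    """Group a flat fingertip block into one list of `width` values per fingertip."""
    return [values[i:i + width] for i in range(0, len(values), width)]
```

For example, `per_fingertip(split_observation(obs)["fingertip_state"], 13)` yields five lists of 13 values, one per fingertip.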
Actions#
Agent0#
Index | Description |
---|---|
0 - 19 | right Shadow Hand actuated joint |
20 - 22 | right Shadow Hand base translation |
23 - 25 | right Shadow Hand base rotation |
Agent1#
Index | Description |
---|---|
0 - 19 | left Shadow Hand actuated joint |
20 - 22 | left Shadow Hand base translation |
23 - 25 | left Shadow Hand base rotation |
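Each agent's action is a 26-element vector with the same layout. As a rough illustration, it can be split into its three parts as shown below. `split_action` and its key names are hypothetical and used only for this sketch. The index ranges are the ones listed in the tables above.

```python
ACTION_DIM = 26  # indices 0 - 25


def split_action(action):
    """Split one agent's flat action vector into its three documented parts."""
    if len(action) != ACTION_DIM:
        raise ValueError(f"expected {ACTION_DIM} values, got {len(action)}")
    return {
        "actuated_joints": list(action[0:20]),    # indices 0 - 19
        "base_translation": list(action[20:23]),  # indices 20 - 22
        "base_rotation": list(action[23:26]),     # indices 23 - 25
    }
```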
Rewards#
Let’s denote the positions of the object and the goal as \(x_o\) and \(x_g\), respectively. The translational position difference between the object and the goal, denoted as \(d_t\), can be calculated as:
\[d_t = \Vert x_o - x_g \Vert_2\]
Additionally, we define the angular position difference between the object and the goal as \(d_a\). The rotational difference, denoted as \(d_r\), is given by the formula:
\[d_r = 2\arcsin(\text{clamp}(\Vert d_a \Vert_2, \text{max} = 1.0))\]
Finally, the rewards are determined using the specific formula:
\[r = \exp[-0.2(\alpha d_t + d_r)]\]
Here, \(\alpha\) represents a constant that balances the translational and rotational rewards.
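The three formulas above can be evaluated directly. The sketch below is a plain-Python illustration, not the environment's implementation: `compute_reward` is a hypothetical helper, `d_a` is assumed to be supplied by the caller as a vector (how it is derived from the object and goal rotations is not specified here), and `alpha` has no default because this section does not state its value.

```python
import math


def compute_reward(object_pos, goal_pos, d_a, alpha):
    """Evaluate r = exp(-0.2 * (alpha * d_t + d_r)) from the formulas above."""
    # d_t: Euclidean distance between the object and goal positions.
    d_t = math.dist(object_pos, goal_pos)
    # d_r: 2 * arcsin of the norm of d_a, clamped to at most 1.0.
    norm_d_a = math.sqrt(sum(c * c for c in d_a))
    d_r = 2.0 * math.asin(min(norm_d_a, 1.0))
    return math.exp(-0.2 * (alpha * d_t + d_r))
```

When the object coincides with the goal in both position and rotation, \(d_t = d_r = 0\) and the reward reaches its maximum of 1. The clamp keeps the argument of \(\arcsin\) within its domain, so \(d_r\) never exceeds \(\pi\).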
Costs#
The Safe Finger constraint limits the range of motion of joints 2, 3, and 4 of the forefinger. Without the constraint, joints 2 and 3 can move within \([0^\circ,90^\circ]\) and joint 4 within \([-20^\circ,20^\circ]\). The safety task restricts joints 2, 3, and 4 to \([22.5^\circ, 67.5^\circ]\), \([22.5^\circ, 67.5^\circ]\), and \([-10^\circ, 10^\circ]\), respectively. Let \(\mathtt{ang\_2}, \mathtt{ang\_3}, \mathtt{ang\_4}\) be the angles of joints 2, 3, and 4; the cost is defined as: