rofunc.learning.RofuncRL.tasks.isaacgymenv.hands.qbhand_synergy_grasp#
1. Module Contents#
1.1. Classes#
This class corresponds to the GraspAndPlace task. This environment consists of dual-hands, an object and a bucket that requires us to pick up the object and put it into the bucket. |
1.2. Functions#
Compute the reward of all environment. |
|
1.3. API#
- class rofunc.learning.RofuncRL.tasks.isaacgymenv.hands.qbhand_synergy_grasp.QbSoftHandSynergyGraspTask(cfg, rl_device, sim_device, graphics_device_id, headless, virtual_screen_capture, force_render, agent_index=[[[0, 1, 2, 3, 4, 5]], [[0, 1, 2, 3, 4, 5]]], is_multi_agent=False)[source]#
Bases:
rofunc.learning.RofuncRL.tasks.isaacgymenv.base.vec_task.VecTaskThis class corresponds to the GraspAndPlace task. This environment consists of dual-hands, an object and a bucket that requires us to pick up the object and put it into the bucket.
Initialization
Initialise the VecTask.
- Args:
config: config dictionary for the environment. sim_device: the device to simulate physics on. eg. ‘cuda:0’ or ‘cpu’ graphics_device_id: the device ID to render with. headless: Set to False to disable viewer rendering. virtual_screen_capture: Set to True to allow the users get captured screen in RGB array via env.render(mode=’rgb_array’). force_render: Set to True to always force rendering in the steps (if the control_freq_inv is greater than 1 we suggest stting this arg to True)
- create_sim()[source]#
Allocates which device will simulate and which device will render the scene. Defines the simulation type to be used
- compute_reward(actions)[source]#
- Compute the reward of all environment. The core function is compute_hand_reward(
self.rew_buf, self.reset_buf, self.reset_goal_buf, self.progress_buf, self.successes, self.consecutive_successes, self.max_episode_length, self.object_pos, self.object_rot, self.goal_pos, self.goal_rot, self.block_right_handle_pos, self.block_left_handle_pos, self.left_hand_pos, self.right_hand_pos, self.right_hand_ff_pos, self.right_hand_mf_pos, self.right_hand_rf_pos, self.right_hand_lf_pos, self.right_hand_th_pos, self.left_hand_ff_pos, self.left_hand_mf_pos, self.left_hand_rf_pos, self.left_hand_lf_pos, self.left_hand_th_pos, self.dist_reward_scale, self.rot_reward_scale, self.rot_eps, self.actions, self.action_penalty_scale, self.success_tolerance, self.reach_goal_bonus, self.fall_dist, self.fall_penalty, self.max_consecutive_successes, self.av_factor, (self.object_type == “pen”)
) , which we will introduce in detail there
- Args:
actions (tensor): Actions of agents in the all environment
- compute_observations()[source]#
Compute the observations of all environment. The core function is self.compute_full_state(True), which we will introduce in detail there
- compute_full_state(asymm_obs=False)[source]#
Compute the observations of all environment. The observation is composed of three parts: the state values of the left and right hands, and the information of objects and target. The state values of the left and right hands were the same for each task, including hand joint and finger positions, velocity, and force information. The detail 361-dimensional observational space as shown in below:
Index Description 0 - 14 right shadow hand dof position 15 - 29 right shadow hand dof velocity 30 - 44 right shadow hand dof force 45 - 109 right shadow hand fingertip pose, linear velocity, angle velocity (5 x 13) 110 - 139 right shadow hand fingertip force, torque (5 x 6) 140 - 142 right shadow hand base position 143 - 145 right shadow hand base rotation 146 - 166 right shadow hand actions 167 - 181 left shadow hand dof position 182 - 196 left shadow hand dof velocity 197 - 211 left shadow hand dof force 212 - 276 left shadow hand fingertip pose, linear velocity, angle velocity (5 x 13) 277 - 306 left shadow hand fingertip force, torque (5 x 6) 307 - 309 left shadow hand base position 310 - 312 left shadow hand base rotation 313 - 333 left shadow hand actions 334 - 340 object pose 341 - 343 object linear velocity 344 - 346 object angle velocity 347 - 349 block right handle position 350 - 353 block right handle rotation 354 - 356 block left handle position 357 - 360 block left handle rotation
- reset_target_pose(env_ids, apply_reset=False)[source]#
Reset and randomize the goal pose
- Args:
env_ids (tensor): The index of the environment that needs to reset goal pose
apply_reset (bool): Whether to reset the goal directly here, usually used when the same task wants to complete multiple goals
- reset_idx(env_ids, goal_env_ids)[source]#
Reset and randomize the environment
- Args:
env_ids (tensor): The index of the environment that needs to reset
goal_env_ids (tensor): The index of the environment that only goals need reset
- pre_physics_step(actions)[source]#
The pre-processing of the physics step. Determine whether the reset environment is needed, and calculate the next movement of Shadowhand through the given action. The 52-dimensional action space as shown in below:
Index Description 0 - 2 right shadow hand base translation 3 - 5 right shadow hand base rotation 6 - 20 right shadow hand actuated joint 21 - 23 left shadow hand base translation 24 - 26 left shadow hand base rotation 27 - 41 left shadow hand actuated joint
- Args:
actions (tensor): Actions of agents in the all environment
- rofunc.learning.RofuncRL.tasks.isaacgymenv.hands.qbhand_synergy_grasp.depth_image_to_point_cloud_GPU(camera_tensor, camera_view_matrix_inv, camera_proj_matrix, u, v, width: float, height: float, depth_bar: float, device: rofunc.learning.RofuncRL.tasks.utils.torch_jit_utils.torch.device)#
- rofunc.learning.RofuncRL.tasks.isaacgymenv.hands.qbhand_synergy_grasp.compute_hand_reward(rew_buf, reset_buf, reset_goal_buf, progress_buf, successes, consecutive_successes, max_episode_length: float, object_pos, object_rot, hand_pos, hand_ff_pos, hand_mf_pos, hand_rf_pos, hand_lf_pos, hand_th_pos, dist_reward_scale: float, rot_reward_scale: float, rot_eps: float, actions, action_penalty_scale: float, success_tolerance: float, reach_goal_bonus: float, fall_dist: float, fall_penalty: float, max_consecutive_successes: int, av_factor: float, ignore_z_rot: bool, prev_synergy_actions)#
Compute the reward of all environment.
- Args:
rew_buf (tensor): The reward buffer of all environments at this time
reset_buf (tensor): The reset buffer of all environments at this time
reset_goal_buf (tensor): The only-goal reset buffer of all environments at this time
progress_buf (tensor): The porgress buffer of all environments at this time
successes (tensor): The successes buffer of all environments at this time
consecutive_successes (tensor): The consecutive successes buffer of all environments at this time
max_episode_length (float): The max episode length in this environment
object_pos (tensor): The position of the object
object_rot (tensor): The rotation of the object
target_pos (tensor): The position of the target
target_rot (tensor): The rotate of the target
block_right_handle_pos (tensor): The position of the right block handle
- right_hand_ff_pos, right_hand_mf_pos, right_hand_rf_pos, right_hand_lf_pos, right_hand_th_pos (tensor): The position of the five fingers
of the right hand
dist_reward_scale (float): The scale of the distance reward
rot_reward_scale (float): The scale of the rotation reward
rot_eps (float): The epsilon of the rotation calculate
actions (tensor): The action buffer of all environments at this time
action_penalty_scale (float): The scale of the action penalty reward
success_tolerance (float): The tolerance of the success determined
reach_goal_bonus (float): The reward given when the object reaches the goal
fall_dist (float): When the object is far from the Shadowhand, it is judged as falling
fall_penalty (float): The reward given when the object is fell
max_consecutive_successes (float): The maximum of the consecutive successes
av_factor (float): The average factor for calculate the consecutive successes
- ignore_z_rot (bool): Is it necessary to ignore the rot of the z-axis, which is usually used
for some specific objects (e.g. pen)
- rofunc.learning.RofuncRL.tasks.isaacgymenv.hands.qbhand_synergy_grasp.randomize_rotation(rand0, rand1, x_unit_tensor, y_unit_tensor)#
- rofunc.learning.RofuncRL.tasks.isaacgymenv.hands.qbhand_synergy_grasp.randomize_rotation_pen(rand0, rand1, max_angle, x_unit_tensor, y_unit_tensor, z_unit_tensor)#