rofunc.learning.RofuncRL.tasks.isaacgymenv.hands.shadow_hand_over#

1.  Module Contents#

1.1.  Classes#

ShadowHandOverTask

This class corresponds to the HandOver task. This environment consists of two shadow hands with palms facing up, opposite each other, and an object that needs to be passed. In the beginning, the object will fall randomly in the area of the shadow hand on the right side. Then the hand holds the object and passes the object to the other hand. Note that the base of the hand is fixed. More importantly, the hand which holds the object initially can not directly touch the target, nor can it directly roll the object to the other hand, so the object must be thrown up and stays in the air in the process

1.2.  Functions#

depth_image_to_point_cloud_GPU

compute_hand_reward

Compute the reward of all environment.

randomize_rotation

randomize_rotation_pen

1.3.  API#

class rofunc.learning.RofuncRL.tasks.isaacgymenv.hands.shadow_hand_over.ShadowHandOverTask(cfg, rl_device, sim_device, graphics_device_id, headless, virtual_screen_capture, force_render, agent_index=[[[0, 1, 2, 3, 4, 5]], [[0, 1, 2, 3, 4, 5]]], is_multi_agent=False)[source]#

Bases: rofunc.learning.RofuncRL.tasks.isaacgymenv.base.vec_task.VecTask

This class corresponds to the HandOver task. This environment consists of two shadow hands with palms facing up, opposite each other, and an object that needs to be passed. In the beginning, the object will fall randomly in the area of the shadow hand on the right side. Then the hand holds the object and passes the object to the other hand. Note that the base of the hand is fixed. More importantly, the hand which holds the object initially can not directly touch the target, nor can it directly roll the object to the other hand, so the object must be thrown up and stays in the air in the process

Initialization

Parameters:
  • cfg

  • rl_device

  • sim_device

  • graphics_device_id

  • headless

  • virtual_screen_capture

  • force_render

  • agent_index – Specifies how to divide the agents of the hands, useful only when using a multi-agent algorithm. It contains two lists, representing the left hand and the right hand. Each list has six numbers from 0 to 5, representing the palm, middle finger, ring finger, tail finger, index finger, and thumb. Each part can be combined arbitrarily, and if placed in the same list, it means that it is divided into the same agent. The default setting is [[[0, 1, 2, 3, 4, 5]], [[0, 1, 2, 3, 4, 5]]], which means that the two whole hands are regarded as one agent respectively.

  • is_meta – Specifies whether it is a multi-agent environment

create_sim()[source]#

Allocates which device will simulate and which device will render the scene. Defines the simulation type to be used

compute_reward(actions)[source]#
Compute the reward of all environment. The core function is compute_hand_reward(

self.rew_buf, self.reset_buf, self.reset_goal_buf, self.progress_buf, self.successes, self.consecutive_successes, self.max_episode_length, self.object_pos, self.object_rot, self.goal_pos, self.goal_rot, self.dist_reward_scale, self.rot_reward_scale, self.rot_eps, self.actions, self.action_penalty_scale, self.success_tolerance, self.reach_goal_bonus, self.fall_dist, self.fall_penalty, self.max_consecutive_successes, self.av_factor, (self.object_type == “pen”)

) , which we will introduce in detail there

Args:

actions (tensor): Actions of agents in the all environment

compute_observations()[source]#

Compute the observations of all environment. The core function is self.compute_full_state(True), which we will introduce in detail there

compute_full_state(asymm_obs=False)[source]#

Compute the observations of all environment. The observation is composed of three parts: the state values of the left and right hands, and the information of objects and target. The state values of the left and right hands were the same for each task, including hand joint and finger positions, velocity, and force information. The detail 422-dimensional observational space as shown in below:

Index Description 0 - 23 right shadow hand dof position 24 - 47 right shadow hand dof velocity 48 - 71 right shadow hand dof force 72 - 136 right shadow hand fingertip pose, linear velocity, angle velocity (5 x 13) 137 - 166 right shadow hand fingertip force, torque (5 x 6) 167 - 169 right shadow hand base position 170 - 172 right shadow hand base rotation 173 - 198 right shadow hand actions 199 - 222 left shadow hand dof position 223 - 246 left shadow hand dof velocity 247 - 270 left shadow hand dof force 271 - 335 left shadow hand fingertip pose, linear velocity, angle velocity (5 x 13) 336 - 365 left shadow hand fingertip force, torque (5 x 6) 366 - 368 left shadow hand base position 369 - 371 left shadow hand base rotation 372 - 397 left shadow hand actions 398 - 404 object pose 405 - 407 object linear velocity 408 - 410 object angle velocity 411 - 417 goal pose 418 - 421 goal rot - object rot

compute_point_cloud_observation(collect_demonstration=False)[source]#

Compute the observations of all environment. The observation is composed of three parts: the state values of the left and right hands, and the information of objects and target. The state values of the left and right hands were the same for each task, including hand joint and finger positions, velocity, and force information. The detail 422-dimensional observational space as shown in below:

Index Description 0 - 23 right shadow hand dof position 24 - 47 right shadow hand dof velocity 48 - 71 right shadow hand dof force 72 - 136 right shadow hand fingertip pose, linear velocity, angle velocity (5 x 13) 137 - 166 right shadow hand fingertip force, torque (5 x 6) 167 - 169 right shadow hand base position 170 - 172 right shadow hand base rotation 173 - 198 right shadow hand actions 199 - 222 left shadow hand dof position 223 - 246 left shadow hand dof velocity 247 - 270 left shadow hand dof force 271 - 335 left shadow hand fingertip pose, linear velocity, angle velocity (5 x 13) 336 - 365 left shadow hand fingertip force, torque (5 x 6) 366 - 368 left shadow hand base position 369 - 371 left shadow hand base rotation 372 - 397 left shadow hand actions 398 - 404 object pose 405 - 407 object linear velocity 408 - 410 object angle velocity 411 - 417 goal pose 418 - 421 goal rot - object rot

reset_target_pose(env_ids, apply_reset=False)[source]#

Reset and randomize the goal pose

Args:

env_ids (tensor): The index of the environment that needs to reset goal pose

apply_reset (bool): Whether to reset the goal directly here, usually used when the same task wants to complete multiple goals

reset_idx(env_ids, goal_env_ids)[source]#

Reset and randomize the environment

Args:

env_ids (tensor): The index of the environment that needs to reset

goal_env_ids (tensor): The index of the environment that only goals need reset

pre_physics_step(actions)[source]#

The pre-processing of the physics step. Determine whether the reset environment is needed, and calculate the next movement of Shadowhand through the given action. The 40-dimensional action space as shown in below:

Index Description 0 - 19 right shadow hand actuated joint 20 - 39 left shadow hand actuated joint

Args:

actions (tensor): Actions of agents in the all environment

post_physics_step()[source]#

The post-processing of the physics step. Compute the observation and reward, and visualize auxiliary lines for debug when needed

rand_row(tensor, dim_needed)[source]#
sample_points(points, sample_num=1000, sample_mathed='furthest')[source]#
camera_visulization(is_depth_image=False)[source]#
rofunc.learning.RofuncRL.tasks.isaacgymenv.hands.shadow_hand_over.depth_image_to_point_cloud_GPU(camera_tensor, camera_view_matrix_inv, camera_proj_matrix, u, v, width: float, height: float, depth_bar: float, device: rofunc.learning.RofuncRL.tasks.utils.torch_jit_utils.torch.device)#
rofunc.learning.RofuncRL.tasks.isaacgymenv.hands.shadow_hand_over.compute_hand_reward(rew_buf, reset_buf, reset_goal_buf, progress_buf, successes, consecutive_successes, max_episode_length: float, object_pos, object_rot, target_pos, target_rot, dist_reward_scale: float, rot_reward_scale: float, rot_eps: float, actions, action_penalty_scale: float, success_tolerance: float, reach_goal_bonus: float, fall_dist: float, fall_penalty: float, max_consecutive_successes: int, av_factor: float, ignore_z_rot: bool)#

Compute the reward of all environment.

Args:

rew_buf (tensor): The reward buffer of all environments at this time

reset_buf (tensor): The reset buffer of all environments at this time

reset_goal_buf (tensor): The only-goal reset buffer of all environments at this time

progress_buf (tensor): The porgress buffer of all environments at this time

successes (tensor): The successes buffer of all environments at this time

consecutive_successes (tensor): The consecutive successes buffer of all environments at this time

max_episode_length (float): The max episode length in this environment

object_pos (tensor): The position of the object

object_rot (tensor): The rotation of the object

target_pos (tensor): The position of the target

target_rot (tensor): The rotate of the target

dist_reward_scale (float): The scale of the distance reward

rot_reward_scale (float): The scale of the rotation reward

rot_eps (float): The epsilon of the rotation calculate

actions (tensor): The action buffer of all environments at this time

action_penalty_scale (float): The scale of the action penalty reward

success_tolerance (float): The tolerance of the success determined

reach_goal_bonus (float): The reward given when the object reaches the goal

fall_dist (float): When the object is far from the Shadowhand, it is judged as falling

fall_penalty (float): The reward given when the object is fell

max_consecutive_successes (float): The maximum of the consecutive successes

av_factor (float): The average factor for calculate the consecutive successes

ignore_z_rot (bool): Is it necessary to ignore the rot of the z-axis, which is usually used

for some specific objects (e.g. pen)

rofunc.learning.RofuncRL.tasks.isaacgymenv.hands.shadow_hand_over.randomize_rotation(rand0, rand1, x_unit_tensor, y_unit_tensor)#
rofunc.learning.RofuncRL.tasks.isaacgymenv.hands.shadow_hand_over.randomize_rotation_pen(rand0, rand1, max_angle, x_unit_tensor, y_unit_tensor, z_unit_tensor)#