Robotic manipulation remains one of the most important research directions in embodied AI. While traditional kinematics-based controllers provide stable motion execution, they often struggle in unstructured environments where adaptability is required.
Recent advances in Reinforcement Learning (RL) have enabled robotic arms to learn task-oriented behaviors directly from interaction, making it possible to achieve robust control policies without manually designing every motion strategy.
In this project, we extend the original SO-ARM101 Isaac Lab implementation by integrating the AgileX Dual-Arm Nero Manipulator, allowing developers to quickly train and validate dual-arm RL policies using NVIDIA Isaac Lab.
ReposityOpen-source implementation:
https://github.com/smalleha/isaac_so_arm101.git
βββ robotsβ βββ dual_neroβ β βββ dual_nero.pyβ β βββ __init__.pyβ β βββ meshesβ β βββ urdfβ β βββ dual_nero.urdfβββ scriptsβ βββ rl_gamesβ β βββ play.pyβ β βββ train.pyβ βββ zero_agent.pyβββ tasksβ βββ __init__.pyβ βββ reachβ βββ agentsβ βββ dual_nero_joint_pos_env_cfg.pyβ βββ dual_nero_reach_env_cfg.pyβ βββ __init__.pyβ βββ mdpβββ ui_extension_example.py
Key additions include:
Before importing the robot into Isaac Lab, mesh references inside the URDF should be converted to relative paths.
For example:
<link name="base_link"> <inertial> <origin rpy="0 0 0" xyz="-0.0592395620981769 -0.068642440505388 0.0562764736144042"/> <mass value="4.46524857458863"/> <inertia ixx="0.0608289280191989" ixy="-2.54649959130438E-06" ixz="1.11851948046933E-07" iyy="0.0218722514454004" iyz="-0.000689477252402357" izz="0.0680524540174318"/> </inertial> <visual> <origin rpy="0 0 0" xyz="0 0 0"/> <geometry> <mesh filename="../meshes/base_link.dae"/> </geometry> <material name=""> <color rgba="0.776470588235294 0.756862745098039 0.737254901960784 1"/> </material> </visual> <collision> <origin rpy="0 0 0" xyz="0 0 0"/> <geometry> <mesh filename="../meshes/base_link.dae"/> </geometry> </collision> </link>
This ensures that the asset can correctly locate mesh resources during simulation.
3.2 Creating the Robot ConfigurationNext, create a robot configuration file:
isaac_so_arm101/robots/dual_nero/dual_nero.py
This file defines:
- Robot articulation properties
- Joint stiffness and damping
- Actuator configuration
- Initial joint states
- Gripper settings
The resulting DUAL_NERO_CFG
object becomes the robot asset used by Isaac Lab during training.
from pathlib import Pathimport isaaclab.sim as sim_utilsfrom isaaclab.actuators import ImplicitActuatorCfgfrom isaaclab.assets.articulation import ArticulationCfgTEMPLATE_ASSETS_DATA_DIR = Path(__file__).resolve().parentDUAL_NERO_CFG = ArticulationCfg( spawn=sim_utils.UrdfFileCfg( fix_base=True, merge_fixed_joints=False, replace_cylinders_with_capsules=True, asset_path=f"{TEMPLATE_ASSETS_DATA_DIR}/urdf/dual_nero.urdf", activate_contact_sensors=False, # set as false while waiting for capsule implementation rigid_props=sim_utils.RigidBodyPropertiesCfg( disable_gravity=False, max_depenetration_velocity=5.0, ), articulation_props=sim_utils.ArticulationRootPropertiesCfg( enabled_self_collisions=True, solver_position_iteration_count=8, solver_velocity_iteration_count=0, ), joint_drive=sim_utils.UrdfConverterCfg.JointDriveCfg( gains=sim_utils.UrdfConverterCfg.JointDriveCfg.PDGainsCfg(stiffness=0, damping=0) ), ), init_state=ArticulationCfg.InitialStateCfg( rot=(1.0, 0.0, 0.0, 0.0), joint_pos={ "left_joint.*": 0.0, "right_joint.*": 0.0, "left_gripper_joint.*": 0.0, "right_gripper_joint.*": 0.0, }, # Set initial joint velocities to zero joint_vel={".*": 0.0}, ), actuators={ "arm": ImplicitActuatorCfg( joint_names_expr=["left_joint.*", "right_joint.*"], effort_limit=25.0, velocity_limit=1.5, stiffness={ "left_joint1": 200.0, "left_joint2": 170.0, "left_joint3": 120.0, "left_joint4": 80.0, "left_joint5": 50.0, "left_joint6": 20.0, "left_joint7": 10.0, "right_joint1": 200.0, "right_joint2": 170.0, "right_joint3": 120.0, "right_joint4": 80.0, "right_joint5": 50.0, "right_joint6": 20.0, "right_joint7": 10.0 }, damping={ "left_joint1": 100.0, "left_joint2": 60.0, "left_joint3": 70.0, "left_joint4": 24.0, "left_joint5": 20.0, "left_joint6": 10.0, "left_joint7": 5, "right_joint1": 100.0, "right_joint2": 60.0, "right_joint3": 70.0, "right_joint4": 24.0, "right_joint5": 20.0, "right_joint6": 10.0, "right_joint7": 5, }, ), "gripper": ImplicitActuatorCfg( joint_names_expr=["left_gripper_joint.*","right_gripper_joint.*"], effort_limit_sim=22, # Increased from 1.9 to 2.5 for stronger grip velocity_limit_sim=1.5, stiffness=800.0, # Increased from 25.0 to 60.0 for more reliable closing damping=20.0, # Increased from 10.0 to 20.0 for stability ), }, soft_joint_pos_limit_factor=0.9,)
Create an **init**.py file inside the dual_nero directory so that Python recognizes it as a package.
Two environment configuration files are required:
tasks/reach/βββ dual_nero_joint_pos_env_cfg.pyβββ dual_nero_reach_env_cfg.py
4.1 Joint Position Environment Configuration
dual_nero_joint_pos_env_cfg.py
specifies:
- Controlled joints
- End-effector links
- Action spaces
- Command targets
import mathimport isaaclab_tasks.manager_based.manipulation.reach.mdp as mdpfrom isaaclab.utils import configclassfrom isaac_so_arm101.robots import DUAL_NERO_CFG # noqa: F401 from isaac_so_arm101.tasks.reach.dual_nero_reach_env_cfg import Dual_NeroReachEnvCfgfrom isaaclab.assets.articulation import ArticulationCfg@configclassclass Dual_Nero_ReachEnvCfg(Dual_NeroReachEnvCfg): def __post_init__(self): # post init of parent super().__post_init__() # switch robot to OpenArm self.scene.robot = DUAL_NERO_CFG.replace(prim_path="{ENV_REGEX_NS}/Robot",) # override rewards self.rewards.left_end_effector_position_tracking.params["asset_cfg"].body_names = ["left_gripper_base"] self.rewards.left_end_effector_position_tracking_fine_grained.params["asset_cfg"].body_names = [ "left_gripper_base" ] self.rewards.left_end_effector_orientation_tracking.params["asset_cfg"].body_names = ["left_gripper_base"] self.rewards.right_end_effector_position_tracking.params["asset_cfg"].body_names = ["right_gripper_base"] self.rewards.right_end_effector_position_tracking_fine_grained.params["asset_cfg"].body_names = [ "right_gripper_base" ] self.rewards.right_end_effector_orientation_tracking.params["asset_cfg"].body_names = ["right_gripper_base"] # override actions self.actions.left_arm_action = mdp.JointPositionActionCfg( asset_name="robot", joint_names=[ "left_joint.*", ], scale=0.5, use_default_offset=True, ) self.actions.right_arm_action = mdp.JointPositionActionCfg( asset_name="robot", joint_names=[ "right_joint.*", ], scale=0.5, use_default_offset=True, ) # override command generator body # end-effector is along z-direction self.commands.left_ee_pose.body_name = "left_gripper_base" self.commands.right_ee_pose.body_name = "right_gripper_base"@configclass class Dual_Nero_ReachEnvCfg_PLAY(Dual_Nero_ReachEnvCfg): def __post_init__(self): # post init of parent super().__post_init__() # make a smaller scene for play self.scene.num_envs = 50 self.scene.env_spacing = 2.5 # disable randomization for play self.observations.policy.enable_corruption = False
4.2 Reach Task Definitiondual_nero_reach_env_cfg.py
contains the full RL environment definition.
This includes:
1.Scene Configuration
- Ground plane
- Lighting
- Robot asset
2.Command Generation
3.Observation Space
4.Reward Design
5.Curriculum Learning
6.Episode Settings
import mathfrom dataclasses import MISSINGimport isaaclab.sim as sim_utilsfrom isaaclab.assets import ArticulationCfg, AssetBaseCfgfrom isaaclab.envs import ManagerBasedRLEnvCfgfrom isaaclab.managers import ActionTermCfg as ActionTermfrom isaaclab.managers import CurriculumTermCfg as CurrTermfrom isaaclab.managers import EventTermCfg as EventTermfrom isaaclab.managers import ObservationGroupCfg as ObsGroupfrom isaaclab.managers import ObservationTermCfg as ObsTermfrom isaaclab.managers import RewardTermCfg as RewTermfrom isaaclab.managers import SceneEntityCfgfrom isaaclab.managers import TerminationTermCfg as DoneTermfrom isaaclab.scene import InteractiveSceneCfgfrom isaaclab.utils import configclassfrom isaaclab.utils.noise import AdditiveUniformNoiseCfg as Unoiseimport isaaclab_tasks.manager_based.manipulation.reach.mdp as mdp### Scene definition##@configclassclass Dual_NeroReachSceneCfg(InteractiveSceneCfg): """Configuration for the scene with a robotic arm.""" # world ground = AssetBaseCfg( prim_path="/World/ground", spawn=sim_utils.GroundPlaneCfg(), init_state=AssetBaseCfg.InitialStateCfg(pos=(0.0, 0.0, 0)), ) # robots robot: ArticulationCfg = MISSING # lights light = AssetBaseCfg( prim_path="/World/light", spawn=sim_utils.DomeLightCfg(color=(0.75, 0.75, 0.75), intensity=2500.0), )### MDP settings##@configclassclass CommandsCfg: """Command terms for the MDP.""" left_ee_pose = mdp.UniformPoseCommandCfg( asset_name="robot", body_name=MISSING, resampling_time_range=(4.0, 4.0), debug_vis=True, ranges=mdp.UniformPoseCommandCfg.Ranges( pos_x=(0.5, 0.5), pos_y=(0.15, 0.25), pos_z=(0.3, 0.5), roll=(-math.pi / 6, math.pi / 6), pitch=(3 * math.pi / 2, 3 * math.pi / 2), yaw=(8 * math.pi / 9, 10 * math.pi / 9), ), ) right_ee_pose = mdp.UniformPoseCommandCfg( asset_name="robot", body_name=MISSING, resampling_time_range=(4.0, 4.0), debug_vis=True, ranges=mdp.UniformPoseCommandCfg.Ranges( pos_x=(0.5, 0.5), pos_y=(-0.25, -0.15), pos_z=(0.3, 0.5), roll=(-math.pi / 6, math.pi / 6), pitch=(3 * math.pi / 2, 3 * math.pi / 2), yaw=(8 * math.pi / 9, 10 * math.pi / 9), ), )@configclassclass ActionsCfg: """Action specifications for the MDP.""" left_arm_action: ActionTerm = MISSING right_arm_action: ActionTerm = MISSING@configclassclass ObservationsCfg: """Observation specifications for the MDP.""" @configclass class PolicyCfg(ObsGroup): """Observations for policy group.""" # observation terms (order preserved) left_joint_pos = ObsTerm( func=mdp.joint_pos_rel, params={ "asset_cfg": SceneEntityCfg( "robot", joint_names=[ "left_joint.*", ], ) }, noise=Unoise(n_min=-0.01, n_max=0.01), ) right_joint_pos = ObsTerm( func=mdp.joint_pos_rel, params={ "asset_cfg": SceneEntityCfg( "robot", joint_names=[ "right_joint.*", ], ) }, noise=Unoise(n_min=-0.01, n_max=0.01), ) left_joint_vel = ObsTerm( func=mdp.joint_vel_rel, params={ "asset_cfg": SceneEntityCfg( "robot", joint_names=[ "left_joint.*", ], ) }, noise=Unoise(n_min=-0.01, n_max=0.01), ) right_joint_vel = ObsTerm( func=mdp.joint_vel_rel, params={ "asset_cfg": SceneEntityCfg( "robot", joint_names=[ "right_joint.*", ], ) }, noise=Unoise(n_min=-0.01, n_max=0.01), ) left_pose_command = ObsTerm(func=mdp.generated_commands, params={"command_name": "left_ee_pose"}) right_pose_command = ObsTerm(func=mdp.generated_commands, params={"command_name": "right_ee_pose"}) left_actions = ObsTerm(func=mdp.last_action, params={"action_name": "left_arm_action"}) right_actions = ObsTerm(func=mdp.last_action, params={"action_name": "right_arm_action"}) def __post_init__(self): self.enable_corruption = True self.concatenate_terms = True # observation groups policy: PolicyCfg = PolicyCfg()@configclassclass EventCfg: """Configuration for events.""" reset_robot_joints = EventTerm( func=mdp.reset_joints_by_scale, mode="reset", params={ "position_range": (0.5, 1.5), "velocity_range": (0.0, 0.0), }, )@configclassclass RewardsCfg: """Reward terms for the MDP.""" left_end_effector_position_tracking = RewTerm( func=mdp.position_command_error, weight=-0.2, params={ "asset_cfg": SceneEntityCfg("robot", body_names=MISSING), "command_name": "left_ee_pose", }, ) right_end_effector_position_tracking = RewTerm( func=mdp.position_command_error, weight=-0.25, params={ "asset_cfg": SceneEntityCfg("robot", body_names=MISSING), "command_name": "right_ee_pose", }, ) left_end_effector_position_tracking_fine_grained = RewTerm( func=mdp.position_command_error_tanh, weight=0.1, params={ "asset_cfg": SceneEntityCfg("robot", body_names=MISSING), "std": 0.1, "command_name": "left_ee_pose", }, ) right_end_effector_position_tracking_fine_grained = RewTerm( func=mdp.position_command_error_tanh, weight=0.2, params={ "asset_cfg": SceneEntityCfg("robot", body_names=MISSING), "std": 0.1, "command_name": "right_ee_pose", }, ) left_end_effector_orientation_tracking = RewTerm( func=mdp.orientation_command_error, weight=-0.1, params={ "asset_cfg": SceneEntityCfg("robot", body_names=MISSING), "command_name": "left_ee_pose", }, ) right_end_effector_orientation_tracking = RewTerm( func=mdp.orientation_command_error, weight=-0.1, params={ "asset_cfg": SceneEntityCfg("robot", body_names=MISSING), "command_name": "right_ee_pose", }, ) # action penalty action_rate = RewTerm(func=mdp.action_rate_l2, weight=-0.0001) left_joint_vel = RewTerm( func=mdp.joint_vel_l2, weight=-0.0001, params={ "asset_cfg": SceneEntityCfg( "robot", joint_names=[ "left_joint.*", ], ) }, ) right_joint_vel = RewTerm( func=mdp.joint_vel_l2, weight=-0.0001, params={ "asset_cfg": SceneEntityCfg( "robot", joint_names=[ "right_joint.*", ], ) }, )@configclassclass TerminationsCfg: """Termination terms for the MDP.""" time_out = DoneTerm(func=mdp.time_out, time_out=True)@configclassclass CurriculumCfg: """Curriculum terms for the MDP.""" action_rate = CurrTerm( func=mdp.modify_reward_weight, params={"term_name": "action_rate", "weight": -0.005, "num_steps": 4500}, ) left_joint_vel = CurrTerm( func=mdp.modify_reward_weight, params={"term_name": "left_joint_vel", "weight": -0.001, "num_steps": 4500}, ) right_joint_vel = CurrTerm( func=mdp.modify_reward_weight, params={"term_name": "right_joint_vel", "weight": -0.001, "num_steps": 4500}, )### Environment configuration##@configclassclass Dual_NeroReachEnvCfg(ManagerBasedRLEnvCfg): """Configuration for the reach end-effector pose tracking environment.""" # Scene settings scene: Dual_NeroReachSceneCfg = Dual_NeroReachSceneCfg(num_envs=4096, env_spacing=2.5) # Basic settings observations: ObservationsCfg = ObservationsCfg() actions: ActionsCfg = ActionsCfg() commands: CommandsCfg = CommandsCfg() # MDP settings rewards: RewardsCfg = RewardsCfg() terminations: TerminationsCfg = TerminationsCfg() events: EventCfg = EventCfg() curriculum: CurriculumCfg = CurriculumCfg() def __post_init__(self): """Post initialization.""" # general settings self.decimation = 2 self.sim.render_interval = self.decimation self.episode_length_s = 24.0 self.viewer.eye = (3.5, 3.5, 3.5) # simulation settings self.sim.dt = 1.0 / 60.0\`\`\`**### 4.3 Registering the Environment** Register the task inside:\`src/isaac_so_arm101/tasks/reach/\__init_\_.py\`\`\`\` PYTHONgym.register( id="Isaac-Dual-Nero-Reach-v0", entry_point="isaaclab.envs:ManagerBasedRLEnv", kwargs={ "env_cfg_entry_point":f"{\__name_\_}.dual_nero_joint_pos_env_cfg:Dual_Nero_ReachEnvCfg", "rsl_rl_cfg_entry_point": f"{agents.\__name_\_}.rsl_rl_ppo_cfg:ReachPPORunnerCfg", "rl_games_cfg_entry_point": f"{agents.\__name_\_}:rl_games_ppo_cfg.yaml", }, disable_env_checker=True,)
- Training the Reach PolicyStep 1.Activate the Isaac Lab Environment
conda activate env_isaaclab
Step 2.Navigate to the project directory:
cd isaac_so_arm101
Step 3.Train the whole project
Option 1:RSL-RL
- Train:
uv run train \ --task Isaac-Dual-Nero-Reach-v0 \ --headless
- Evaluate:
uv run play \ --task Isaac-Dual-Nero-Reach-v0
- Training result:
dual_nero_rsl_rl.gif
Option 2: RL-Games
- Train:
python3 scripts/rl_games/train.py \ --task Isaac-Dual-Nero-Reach-v0 \ --headless
- Evaluate:
python3 scripts/rl_games/play.py \ --task Isaac-Dual-Nero-Reach-v0
- Training result:
dual_nero_rl_games.gif
Both frameworks successfully learn the dual-arm reaching task.
However, in our experiments:
- RL-Games converges faster
- Motion trajectories appear smoother
- Final reaching accuracy is generally higher
For relatively complex robot morphologies such as dual-arm manipulators, RL-Games currently provides more stable performance and is recommended as the default training backend.
FAQQ1: Why is my Dual-Nero robot collapsing or shaking violently after the URDF?
This is usually caused by incorrect actuator parameters or unrealistic inertial properties.
Common causes include:
- Joint stiffness set too high
- Damping values too low
- Incorrect mass distribution in the URDF
- Self-collision configuration issues
- Unstable simulation timestep
Before starting RL training, verify that the robot can remain stable under gravity using only PD control.
Quick check: If the robot cannot stand still without RL, the issue is likely in the robot model rather than the training algorithm.
Q2: Why does the reward improve, but the robot never reaches the target accurately?
A rising reward does not always indicate successful task completion.
Typical reasons include:
- Reward weights are unbalanced
- Orientation rewards dominate position rewards
- Action penalties are too strong
- Command sampling range is too large
- End-effector link is incorrectly configured
In most reach tasks, incorrect reward shaping is the primary reason for poor final accuracy.
Q3: How do I verify that the end-effector link is configured correctly?
One of the most common mistakes in Isaac Lab reach tasks is assigning the wrong end-effector body.
For Dual-Nero, the target link should be:
left_gripper_baseright_gripper_base
Symptoms of an incorrect configuration include:
- Reward remains low
- Robot moves randomly
- Training appears to converge but fails visually
- End-effector does not move toward the target marker
Always verify the body name in Isaac Sim before launching large-scale training.
Q4: Why does RL-Games perform better than RSL-RL for this task?
Both frameworks are PPO-based, but their implementations differ.
For large-scale manipulation environments:
RL-Games generally scales better with thousands of parallel environments
PPO updates are often more stable
Training throughput is higher on modern GPUs
For Dual-Nero reach experiments, RL-Games typically achieves smoother trajectories and faster convergence.
However, results may vary depending on reward design and task complexity.
Q5: My policy works in simulation but fails on the real robot. Why?
This is the most common Sim-to-Real issue.
Possible causes include:
- Joint friction mismatch
- Encoder noise
- Latency differences
- Payload variations
- Inaccurate motor models
- Unmodeled cable effects
To improve transfer performance:
- Apply domain randomization
- Add observation noise
- Randomize dynamics parameters
- Validate trajectories at low speed first
Successful simulation training is only the first step toward real-world deployment.
π· Have Question?If you encounter any issues with environment installation, parameter configuration, or RL training, feel free to leave your questions for further discussion.
[Read more](javascript:void(0))