Train a Dual-Arm Nero Reach Task in Isaac Lab

wpnews.pro

Robotic manipulation remains one of the most important research directions in embodied AI. While traditional kinematics-based controllers provide stable motion execution, they often struggle in unstructured environments where adaptability is required.

Recent advances in Reinforcement Learning (RL) have enabled robotic arms to learn task-oriented behaviors directly from interaction, making it possible to achieve robust control policies without manually designing every motion strategy.

In this project, we extend the original SO-ARM101 Isaac Lab implementation by integrating the AgileX Dual-Arm Nero Manipulator, allowing developers to quickly train and validate dual-arm RL policies using NVIDIA Isaac Lab.

ReposityOpen-source implementation:

https://github.com/smalleha/isaac_so_arm101.git

├── robots│   ├── dual_nero│   │   ├── dual_nero.py│   │   ├── __init__.py│   │   ├── meshes│   │   └── urdf│   │       └── dual_nero.urdf├── scripts│   ├── rl_games│   │   ├── play.py│   │   └── train.py│   └── zero_agent.py├── tasks│   ├── __init__.py│   └── reach│       ├── agents│       ├── dual_nero_joint_pos_env_cfg.py│       ├── dual_nero_reach_env_cfg.py│       ├── __init__.py│       └── mdp└── ui_extension_example.py

Key additions include:

Before importing the robot into Isaac Lab, mesh references inside the URDF should be converted to relative paths.

For example:

<link name="base_link">    <inertial>      <origin rpy="0 0 0" xyz="-0.0592395620981769 -0.068642440505388 0.0562764736144042"/>      <mass value="4.46524857458863"/>      <inertia ixx="0.0608289280191989" ixy="-2.54649959130438E-06" ixz="1.11851948046933E-07" iyy="0.0218722514454004" iyz="-0.000689477252402357" izz="0.0680524540174318"/>    </inertial>    <visual>      <origin rpy="0 0 0" xyz="0 0 0"/>      <geometry>        <mesh filename="../meshes/base_link.dae"/>      </geometry>      <material name="">        <color rgba="0.776470588235294 0.756862745098039 0.737254901960784 1"/>      </material>    </visual>    <collision>      <origin rpy="0 0 0" xyz="0 0 0"/>      <geometry>        <mesh filename="../meshes/base_link.dae"/>      </geometry>    </collision>  </link>

This ensures that the asset can correctly locate mesh resources during simulation.

3.2 Creating the Robot ConfigurationNext, create a robot configuration file:

isaac_so_arm101/robots/dual_nero/dual_nero.py

This file defines:

Robot articulation properties
Joint stiffness and damping
Actuator configuration
Initial joint states
Gripper settings

The resulting DUAL_NERO_CFG

object becomes the robot asset used by Isaac Lab during training.

from pathlib import Pathimport isaaclab.sim as sim_utilsfrom isaaclab.actuators import ImplicitActuatorCfgfrom isaaclab.assets.articulation import ArticulationCfgTEMPLATE_ASSETS_DATA_DIR = Path(__file__).resolve().parentDUAL_NERO_CFG = ArticulationCfg(    spawn=sim_utils.UrdfFileCfg(        fix_base=True,        merge_fixed_joints=False,        replace_cylinders_with_capsules=True,        asset_path=f"{TEMPLATE_ASSETS_DATA_DIR}/urdf/dual_nero.urdf",        activate_contact_sensors=False, # set as false while waiting for capsule implementation        rigid_props=sim_utils.RigidBodyPropertiesCfg(            disable_gravity=False,            max_depenetration_velocity=5.0,        ),        articulation_props=sim_utils.ArticulationRootPropertiesCfg(            enabled_self_collisions=True,            solver_position_iteration_count=8,            solver_velocity_iteration_count=0,        ),        joint_drive=sim_utils.UrdfConverterCfg.JointDriveCfg(            gains=sim_utils.UrdfConverterCfg.JointDriveCfg.PDGainsCfg(stiffness=0, damping=0)        ),    ),    init_state=ArticulationCfg.InitialStateCfg(        rot=(1.0, 0.0, 0.0, 0.0),        joint_pos={                "left_joint.*": 0.0,                "right_joint.*": 0.0,                "left_gripper_joint.*": 0.0,                  "right_gripper_joint.*": 0.0,          },        # Set initial joint velocities to zero        joint_vel={".*": 0.0},    ),    actuators={            "arm": ImplicitActuatorCfg(                joint_names_expr=["left_joint.*", "right_joint.*"],                effort_limit=25.0,                 velocity_limit=1.5,                               stiffness={                    "left_joint1": 200.0,                     "left_joint2": 170.0,                    "left_joint3": 120.0,                    "left_joint4": 80.0,                    "left_joint5": 50.0,                    "left_joint6": 20.0,                    "left_joint7": 10.0,                    "right_joint1": 200.0,                     "right_joint2": 170.0,                    "right_joint3": 120.0,                    "right_joint4": 80.0,                    "right_joint5": 50.0,                    "right_joint6": 20.0,                    "right_joint7": 10.0                },                               damping={                    "left_joint1": 100.0,                    "left_joint2": 60.0,                    "left_joint3": 70.0,                    "left_joint4": 24.0,                    "left_joint5": 20.0,                    "left_joint6": 10.0,                    "left_joint7": 5,                    "right_joint1": 100.0,                    "right_joint2": 60.0,                    "right_joint3": 70.0,                    "right_joint4": 24.0,                    "right_joint5": 20.0,                    "right_joint6": 10.0,                    "right_joint7": 5,                },            ),        "gripper": ImplicitActuatorCfg(            joint_names_expr=["left_gripper_joint.*","right_gripper_joint.*"],            effort_limit_sim=22,  # Increased from 1.9 to 2.5 for stronger grip            velocity_limit_sim=1.5,            stiffness=800.0,  # Increased from 25.0 to 60.0 for more reliable closing            damping=20.0,  # Increased from 10.0 to 20.0 for stability        ),        },    soft_joint_pos_limit_factor=0.9,)

Create an **init**.py file inside the dual_nero directory so that Python recognizes it as a package.

Two environment configuration files are required:

tasks/reach/├── dual_nero_joint_pos_env_cfg.py└── dual_nero_reach_env_cfg.py

4.1 Joint Position Environment Configuration

dual_nero_joint_pos_env_cfg.py

specifies:

Controlled joints
End-effector links
Action spaces
Command targets

import mathimport isaaclab_tasks.manager_based.manipulation.reach.mdp as mdpfrom isaaclab.utils import configclassfrom isaac_so_arm101.robots import DUAL_NERO_CFG # noqa: F401   from isaac_so_arm101.tasks.reach.dual_nero_reach_env_cfg import Dual_NeroReachEnvCfgfrom isaaclab.assets.articulation import ArticulationCfg@configclassclass Dual_Nero_ReachEnvCfg(Dual_NeroReachEnvCfg):    def __post_init__(self):        # post init of parent        super().__post_init__()        # switch robot to OpenArm        self.scene.robot = DUAL_NERO_CFG.replace(prim_path="{ENV_REGEX_NS}/Robot",)        # override rewards        self.rewards.left_end_effector_position_tracking.params["asset_cfg"].body_names = ["left_gripper_base"]        self.rewards.left_end_effector_position_tracking_fine_grained.params["asset_cfg"].body_names = [            "left_gripper_base"        ]        self.rewards.left_end_effector_orientation_tracking.params["asset_cfg"].body_names = ["left_gripper_base"]        self.rewards.right_end_effector_position_tracking.params["asset_cfg"].body_names = ["right_gripper_base"]        self.rewards.right_end_effector_position_tracking_fine_grained.params["asset_cfg"].body_names = [            "right_gripper_base"        ]        self.rewards.right_end_effector_orientation_tracking.params["asset_cfg"].body_names = ["right_gripper_base"]        # override actions        self.actions.left_arm_action = mdp.JointPositionActionCfg(            asset_name="robot",            joint_names=[                "left_joint.*",            ],            scale=0.5,            use_default_offset=True,        )        self.actions.right_arm_action = mdp.JointPositionActionCfg(            asset_name="robot",            joint_names=[                "right_joint.*",            ],            scale=0.5,            use_default_offset=True,        )        # override command generator body        # end-effector is along z-direction        self.commands.left_ee_pose.body_name = "left_gripper_base"        self.commands.right_ee_pose.body_name = "right_gripper_base"@configclass    class Dual_Nero_ReachEnvCfg_PLAY(Dual_Nero_ReachEnvCfg):    def __post_init__(self):        # post init of parent        super().__post_init__()        # make a smaller scene for play        self.scene.num_envs = 50        self.scene.env_spacing = 2.5        # disable randomization for play        self.observations.policy.enable_corruption = False

4.2 Reach Task Definitiondual_nero_reach_env_cfg.py

contains the full RL environment definition.

This includes:

1.Scene Configuration

Ground plane
Lighting
Robot asset

2.Command Generation

3.Observation Space

4.Reward Design

5.Curriculum Learning

6.Episode Settings

import mathfrom dataclasses import MISSINGimport isaaclab.sim as sim_utilsfrom isaaclab.assets import ArticulationCfg, AssetBaseCfgfrom isaaclab.envs import ManagerBasedRLEnvCfgfrom isaaclab.managers import ActionTermCfg as ActionTermfrom isaaclab.managers import CurriculumTermCfg as CurrTermfrom isaaclab.managers import EventTermCfg as EventTermfrom isaaclab.managers import ObservationGroupCfg as ObsGroupfrom isaaclab.managers import ObservationTermCfg as ObsTermfrom isaaclab.managers import RewardTermCfg as RewTermfrom isaaclab.managers import SceneEntityCfgfrom isaaclab.managers import TerminationTermCfg as DoneTermfrom isaaclab.scene import InteractiveSceneCfgfrom isaaclab.utils import configclassfrom isaaclab.utils.noise import AdditiveUniformNoiseCfg as Unoiseimport isaaclab_tasks.manager_based.manipulation.reach.mdp as mdp### Scene definition##@configclassclass Dual_NeroReachSceneCfg(InteractiveSceneCfg):    """Configuration for the scene with a robotic arm."""    # world    ground = AssetBaseCfg(        prim_path="/World/ground",        spawn=sim_utils.GroundPlaneCfg(),        init_state=AssetBaseCfg.InitialStateCfg(pos=(0.0, 0.0, 0)),    )    # robots    robot: ArticulationCfg = MISSING    # lights    light = AssetBaseCfg(        prim_path="/World/light",        spawn=sim_utils.DomeLightCfg(color=(0.75, 0.75, 0.75), intensity=2500.0),    )### MDP settings##@configclassclass CommandsCfg:    """Command terms for the MDP."""    left_ee_pose = mdp.UniformPoseCommandCfg(        asset_name="robot",        body_name=MISSING,        resampling_time_range=(4.0, 4.0),        debug_vis=True,        ranges=mdp.UniformPoseCommandCfg.Ranges(            pos_x=(0.5, 0.5),            pos_y=(0.15, 0.25),            pos_z=(0.3, 0.5),            roll=(-math.pi / 6, math.pi / 6),            pitch=(3 * math.pi / 2, 3 * math.pi / 2),            yaw=(8 * math.pi / 9, 10 * math.pi / 9),        ),    )    right_ee_pose = mdp.UniformPoseCommandCfg(        asset_name="robot",        body_name=MISSING,        resampling_time_range=(4.0, 4.0),        debug_vis=True,        ranges=mdp.UniformPoseCommandCfg.Ranges(            pos_x=(0.5, 0.5),            pos_y=(-0.25, -0.15),            pos_z=(0.3, 0.5),            roll=(-math.pi / 6, math.pi / 6),            pitch=(3 * math.pi / 2, 3 * math.pi / 2),            yaw=(8 * math.pi / 9, 10 * math.pi / 9),        ),    )@configclassclass ActionsCfg:    """Action specifications for the MDP."""    left_arm_action: ActionTerm = MISSING    right_arm_action: ActionTerm = MISSING@configclassclass ObservationsCfg:    """Observation specifications for the MDP."""    @configclass    class PolicyCfg(ObsGroup):        """Observations for policy group."""        # observation terms (order preserved)        left_joint_pos = ObsTerm(            func=mdp.joint_pos_rel,            params={                "asset_cfg": SceneEntityCfg(                    "robot",                    joint_names=[                        "left_joint.*",                    ],                )            },            noise=Unoise(n_min=-0.01, n_max=0.01),        )        right_joint_pos = ObsTerm(            func=mdp.joint_pos_rel,            params={                "asset_cfg": SceneEntityCfg(                    "robot",                    joint_names=[                        "right_joint.*",                    ],                )            },            noise=Unoise(n_min=-0.01, n_max=0.01),        )        left_joint_vel = ObsTerm(            func=mdp.joint_vel_rel,            params={                "asset_cfg": SceneEntityCfg(                    "robot",                    joint_names=[                        "left_joint.*",                    ],                )            },            noise=Unoise(n_min=-0.01, n_max=0.01),        )        right_joint_vel = ObsTerm(            func=mdp.joint_vel_rel,            params={                "asset_cfg": SceneEntityCfg(                    "robot",                    joint_names=[                        "right_joint.*",                    ],                )            },            noise=Unoise(n_min=-0.01, n_max=0.01),        )        left_pose_command = ObsTerm(func=mdp.generated_commands, params={"command_name": "left_ee_pose"})        right_pose_command = ObsTerm(func=mdp.generated_commands, params={"command_name": "right_ee_pose"})        left_actions = ObsTerm(func=mdp.last_action, params={"action_name": "left_arm_action"})        right_actions = ObsTerm(func=mdp.last_action, params={"action_name": "right_arm_action"})        def __post_init__(self):            self.enable_corruption = True            self.concatenate_terms = True    # observation groups    policy: PolicyCfg = PolicyCfg()@configclassclass EventCfg:    """Configuration for events."""    reset_robot_joints = EventTerm(        func=mdp.reset_joints_by_scale,        mode="reset",        params={            "position_range": (0.5, 1.5),            "velocity_range": (0.0, 0.0),        },    )@configclassclass RewardsCfg:    """Reward terms for the MDP."""    left_end_effector_position_tracking = RewTerm(        func=mdp.position_command_error,        weight=-0.2,        params={            "asset_cfg": SceneEntityCfg("robot", body_names=MISSING),            "command_name": "left_ee_pose",        },    )    right_end_effector_position_tracking = RewTerm(        func=mdp.position_command_error,        weight=-0.25,        params={            "asset_cfg": SceneEntityCfg("robot", body_names=MISSING),            "command_name": "right_ee_pose",        },    )    left_end_effector_position_tracking_fine_grained = RewTerm(        func=mdp.position_command_error_tanh,        weight=0.1,        params={            "asset_cfg": SceneEntityCfg("robot", body_names=MISSING),            "std": 0.1,            "command_name": "left_ee_pose",        },    )    right_end_effector_position_tracking_fine_grained = RewTerm(        func=mdp.position_command_error_tanh,        weight=0.2,        params={            "asset_cfg": SceneEntityCfg("robot", body_names=MISSING),            "std": 0.1,            "command_name": "right_ee_pose",        },    )    left_end_effector_orientation_tracking = RewTerm(        func=mdp.orientation_command_error,        weight=-0.1,        params={            "asset_cfg": SceneEntityCfg("robot", body_names=MISSING),            "command_name": "left_ee_pose",        },    )    right_end_effector_orientation_tracking = RewTerm(        func=mdp.orientation_command_error,        weight=-0.1,        params={            "asset_cfg": SceneEntityCfg("robot", body_names=MISSING),            "command_name": "right_ee_pose",        },    )    # action penalty    action_rate = RewTerm(func=mdp.action_rate_l2, weight=-0.0001)    left_joint_vel = RewTerm(        func=mdp.joint_vel_l2,        weight=-0.0001,        params={            "asset_cfg": SceneEntityCfg(                "robot",                joint_names=[                    "left_joint.*",                ],            )        },    )    right_joint_vel = RewTerm(        func=mdp.joint_vel_l2,        weight=-0.0001,        params={            "asset_cfg": SceneEntityCfg(                "robot",                joint_names=[                    "right_joint.*",                ],            )        },    )@configclassclass TerminationsCfg:    """Termination terms for the MDP."""    time_out = DoneTerm(func=mdp.time_out, time_out=True)@configclassclass CurriculumCfg:    """Curriculum terms for the MDP."""    action_rate = CurrTerm(        func=mdp.modify_reward_weight,        params={"term_name": "action_rate", "weight": -0.005, "num_steps": 4500},    )    left_joint_vel = CurrTerm(        func=mdp.modify_reward_weight,        params={"term_name": "left_joint_vel", "weight": -0.001, "num_steps": 4500},    )    right_joint_vel = CurrTerm(        func=mdp.modify_reward_weight,        params={"term_name": "right_joint_vel", "weight": -0.001, "num_steps": 4500},    )### Environment configuration##@configclassclass Dual_NeroReachEnvCfg(ManagerBasedRLEnvCfg):    """Configuration for the reach end-effector pose tracking environment."""    # Scene settings    scene: Dual_NeroReachSceneCfg = Dual_NeroReachSceneCfg(num_envs=4096, env_spacing=2.5)    # Basic settings    observations: ObservationsCfg = ObservationsCfg()    actions: ActionsCfg = ActionsCfg()    commands: CommandsCfg = CommandsCfg()    # MDP settings    rewards: RewardsCfg = RewardsCfg()    terminations: TerminationsCfg = TerminationsCfg()    events: EventCfg = EventCfg()    curriculum: CurriculumCfg = CurriculumCfg()    def __post_init__(self):        """Post initialization."""        # general settings        self.decimation = 2        self.sim.render_interval = self.decimation        self.episode_length_s = 24.0        self.viewer.eye = (3.5, 3.5, 3.5)        # simulation settings        self.sim.dt = 1.0 / 60.0\`\`\`**### 4.3 Registering the Environment** Register the task inside:\`src/isaac_so_arm101/tasks/reach/\__init_\_.py\`\`\`\` PYTHONgym.register(    id="Isaac-Dual-Nero-Reach-v0",    entry_point="isaaclab.envs:ManagerBasedRLEnv",    kwargs={        "env_cfg_entry_point":f"{\__name_\_}.dual_nero_joint_pos_env_cfg:Dual_Nero_ReachEnvCfg",        "rsl_rl_cfg_entry_point": f"{agents.\__name_\_}.rsl_rl_ppo_cfg:ReachPPORunnerCfg",        "rl_games_cfg_entry_point": f"{agents.\__name_\_}:rl_games_ppo_cfg.yaml",    },    disable_env_checker=True,)

Training the Reach PolicyStep 1.Activate the Isaac Lab Environment

conda activate env_isaaclab

Step 2.Navigate to the project directory:

cd isaac_so_arm101

Step 3.Train the whole project

Option 1:RSL-RL

Train:

uv run train \    --task Isaac-Dual-Nero-Reach-v0 \    --headless

Evaluate:

uv run play \    --task Isaac-Dual-Nero-Reach-v0

Training result:

dual_nero_rsl_rl.gif

Option 2: RL-Games

Train:

python3 scripts/rl_games/train.py \    --task Isaac-Dual-Nero-Reach-v0 \    --headless

Evaluate:

python3 scripts/rl_games/play.py \    --task Isaac-Dual-Nero-Reach-v0

Training result:

dual_nero_rl_games.gif

Both frameworks successfully learn the dual-arm reaching task.

However, in our experiments:

RL-Games converges faster
Motion trajectories appear smoother
Final reaching accuracy is generally higher

For relatively complex robot morphologies such as dual-arm manipulators, RL-Games currently provides more stable performance and is recommended as the default training backend.

FAQQ1: Why is my Dual-Nero robot collapsing or shaking violently after the URDF?

This is usually caused by incorrect actuator parameters or unrealistic inertial properties.

Common causes include:

Joint stiffness set too high
Damping values too low
Incorrect mass distribution in the URDF
Self-collision configuration issues
Unstable simulation timestep

Before starting RL training, verify that the robot can remain stable under gravity using only PD control.

Quick check: If the robot cannot stand still without RL, the issue is likely in the robot model rather than the training algorithm.

Q2: Why does the reward improve, but the robot never reaches the target accurately?

A rising reward does not always indicate successful task completion.

Typical reasons include:

Reward weights are unbalanced
Orientation rewards dominate position rewards
Action penalties are too strong
Command sampling range is too large
End-effector link is incorrectly configured

In most reach tasks, incorrect reward shaping is the primary reason for poor final accuracy.

Q3: How do I verify that the end-effector link is configured correctly?

One of the most common mistakes in Isaac Lab reach tasks is assigning the wrong end-effector body.

For Dual-Nero, the target link should be:

left_gripper_baseright_gripper_base

Symptoms of an incorrect configuration include:

Reward remains low
Robot moves randomly
Training appears to converge but fails visually
End-effector does not move toward the target marker

Always verify the body name in Isaac Sim before launching large-scale training.

Q4: Why does RL-Games perform better than RSL-RL for this task?

Both frameworks are PPO-based, but their implementations differ.

For large-scale manipulation environments:

RL-Games generally scales better with thousands of parallel environments

PPO updates are often more stable

Training throughput is higher on modern GPUs

For Dual-Nero reach experiments, RL-Games typically achieves smoother trajectories and faster convergence.

However, results may vary depending on reward design and task complexity.

Q5: My policy works in simulation but fails on the real robot. Why?

This is the most common Sim-to-Real issue.

Possible causes include:

Joint friction mismatch
Encoder noise
Latency differences
Payload variations
Inaccurate motor models
Unmodeled cable effects

To improve transfer performance:

Apply domain randomization
Add observation noise
Randomize dynamics parameters
Validate trajectories at low speed first

Successful simulation training is only the first step toward real-world deployment.

📷 Have Question?If you encounter any issues with environment installation, parameter configuration, or RL training, feel free to leave your questions for further discussion.

[Read more](javascript:void(0))

source & further reading

hackster.io — original article A Robot Arm with Eyes, Ears & Brain — Runs on MacBook RK3576 + Hailo-8: 12x AI Performance Boost Edge AI Power 6: Measuring the Power Efficiency of MemryX

Train a Dual-Arm Nero Reach Task in Isaac Lab

Run your AI side-project on zahid.host