Demonstrations are captured using an OptiTrack motion capture system (10 cameras, 24 markers) mapped to a kinematic hand model inside an Unreal Engine VR environment. At every timestep we record the tuple (B_t, V_t, C_t) — hand body pose, object pose, and the full contact set. Each contact in C_t contains a surface location c ∈ ℝ³ and a force vector f ∈ ℝ³. Collecting in simulation yields precise, noise-free contact measurements that are difficult to obtain in the real world.
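A minimal sketch of one recorded frame. The container names and the flat-list pose encodings are illustrative assumptions, not the paper's exact data format; only the (B_t, V_t, C_t) structure and the per-contact location/force pair come from the text above.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class Contact:
    location: Vec3   # surface point c ∈ R^3 on the hand mesh
    force: Vec3      # contact force vector f ∈ R^3

@dataclass
class Frame:
    body_pose: List[float]    # hand body pose B_t (joint configuration; length illustrative)
    object_pose: List[float]  # object pose V_t (position + quaternion, assumed layout)
    contacts: List[Contact]   # full contact set C_t

# One timestep with a single fingertip contact pressing 1.5 N along +z
frame = Frame(
    body_pose=[0.0] * 26,
    object_pose=[0.0, 0.0, 0.1, 0.0, 0.0, 0.0, 1.0],
    contacts=[Contact(location=(0.01, 0.02, 0.0), force=(0.0, 0.0, 1.5))],
)
```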
The goal is to learn a robot control policy π(a_t | o_t) that achieves functionally equivalent object interactions, using the rich contact and force information captured during human demonstration.
Human demonstrations across the six manipulation tasks, captured in VR with contact force heatmaps overlaid on the hand.
Human demonstrations of the manipulation tasks performed in the VR environment.
Contact information must propagate along the hand surface, not through free space — Euclidean distance would incorrectly couple physically unrelated regions. We measure surface distances using mesh geodesics and diffuse contact forces over the hand mesh with a geodesic heat kernel: each vertex accumulates a heat value weighted by nearby contact force magnitudes and geodesic proximity.
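The diffusion step above can be sketched as follows. This assumes the geodesic distances from each mesh vertex to each contact point are precomputed (e.g. with a heat-method solver on the hand mesh), and uses a Gaussian kernel in geodesic distance with an illustrative bandwidth σ; the paper's exact kernel may differ.

```python
import math

def heat_values(geodesic_dist, force_mags, sigma=0.01):
    """Diffuse contact forces over the hand mesh with a geodesic heat kernel.

    geodesic_dist[v][j]: geodesic distance (m) from vertex v to contact j
    force_mags[j]:       force magnitude ||f_j|| (N) of contact j
    Returns one accumulated heat value per vertex.
    """
    heats = []
    for dists in geodesic_dist:
        # Each contact contributes its force magnitude, attenuated by
        # geodesic (not Euclidean) proximity along the hand surface.
        h = sum(f * math.exp(-(d ** 2) / (2.0 * sigma ** 2))
                for d, f in zip(dists, force_mags))
        heats.append(h)
    return heats

# Two vertices, one 2 N contact: the vertex at the contact gets full heat,
# a vertex 2 cm away along the surface gets strongly attenuated heat.
h = heat_values([[0.0], [0.02]], [2.0], sigma=0.01)
```

Because distances are measured along the mesh, two fingertips that are close in space but far along the surface stay decoupled, which is exactly the property the Euclidean metric would violate.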
Heat values are summed within each skeleton finger region to estimate the total contact load per finger. Robot fingers are then allocated by solving a minimax assignment — distributing load as evenly as possible across the available robot fingers, allowing many-to-one mappings that remain fixed throughout execution.
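With only a handful of fingers on each side, the minimax assignment can be solved by exhaustive search. This sketch assumes the per-human-finger loads have already been summed from the vertex heat values; it is a brute-force illustration of the load-balancing objective, not necessarily the paper's solver.

```python
from itertools import product

def minimax_assignment(loads, n_robot_fingers):
    """Map each human finger to a robot finger (many-to-one allowed),
    minimizing the maximum total contact load on any robot finger.

    loads: per-human-finger contact load (summed heat values)
    Brute force over n_robot_fingers ** len(loads) assignments —
    trivially fast for 5 human fingers and 4 robot fingers (4^5 = 1024).
    """
    best_cost, best_assign = float("inf"), None
    for assign in product(range(n_robot_fingers), repeat=len(loads)):
        per_finger = [0.0] * n_robot_fingers
        for human, robot in enumerate(assign):
            per_finger[robot] += loads[human]
        worst = max(per_finger)
        if worst < best_cost:
            best_cost, best_assign = worst, assign
    return best_assign, best_cost

# A heavily loaded thumb (5.0) plus four lighter fingers: the optimum
# isolates the thumb, since no grouping can beat its solo load.
assignment, worst = minimax_assignment([5.0, 1.0, 1.0, 1.0, 2.0], 4)
```

The returned mapping is then held fixed for the rest of the execution, matching the static allocation described above.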
During execution each soft finger tracks its assigned human fingertip while incorporating local contact geometry. Each demonstrated contact contributes a geodesic-weighted influence based on surface proximity and force magnitude. The fingertip target is then adjusted by the force-weighted mean displacement toward nearby contacts, clamped to a maximum step size, producing contact-informed trajectories used for imitation learning.
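The per-fingertip adjustment can be sketched as a force-weighted, geodesically attenuated mean displacement with a step-size clamp. The kernel bandwidth `sigma`, the clamp `max_step`, and the flat `(location, force magnitude, geodesic distance)` contact encoding are illustrative assumptions.

```python
import math

def refine_tip(tip, contacts, sigma=0.01, max_step=0.005):
    """Shift a robot fingertip target toward nearby demonstrated contacts.

    tip:      current fingertip target (x, y, z) in metres
    contacts: list of (location, force_magnitude, geodesic_distance_to_tip)
    Returns the adjusted fingertip target.
    """
    num, den = [0.0, 0.0, 0.0], 0.0
    for loc, fmag, dist in contacts:
        # Influence = force magnitude x geodesic proximity kernel
        w = fmag * math.exp(-(dist ** 2) / (2.0 * sigma ** 2))
        for k in range(3):
            num[k] += w * (loc[k] - tip[k])
        den += w
    if den == 0.0:
        return tuple(tip)  # no nearby contacts: leave the target unchanged
    delta = [n / den for n in num]          # force-weighted mean displacement
    norm = math.sqrt(sum(d * d for d in delta))
    if norm > max_step:                      # clamp to the maximum step size
        delta = [d * max_step / norm for d in delta]
    return tuple(t + d for t, d in zip(tip, delta))

# One 1 N contact 2 cm away along +x: the raw pull of 0.02 m
# is clamped to a 0.005 m step toward the contact.
new_tip = refine_tip((0.0, 0.0, 0.0), [((0.02, 0.0, 0.0), 1.0, 0.0)])
```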
(Left) Average contact force distribution on the human hand for each task; warmer colors indicate higher contact load. (Right) Representative demonstration frames from the VR environment.
We use a custom non-anthropomorphic pneumatic soft robot hand with four three-chambered fingers arranged in a square configuration. Each finger is fabricated from elastomer with embedded rigid rings, and bends via differential pressurization of its three chambers — producing planar constant-curvature deformation. The hand mounts on a 7-DoF robot arm and is driven by a pneumatic pressure controller.
Single Finger Module: elastomer body, rigid rings, and three-chamber pressurization.
Simulation model: virtual spine structure with torque-driven actuation used for policy training.
Pressure–deformation relationships in soft fingers are highly nonlinear and vary across the workspace. We learn a per-finger forward kinematic MLP mapping chamber pressures to fingertip displacement, then invert it at runtime via gradient descent to find the pressures that achieve a desired tip position. Actuator limits are enforced with a sigmoid reparameterization, and each solve is warm-started from the previous solution. This yields significantly more accurate tracking than direct prediction baselines.
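The inversion loop can be sketched as below. A toy linear map stands in for the learned per-finger MLP, the pressure limits are illustrative, and gradients are taken by finite differences rather than autodiff; only the overall recipe (sigmoid reparameterization for limits, gradient descent on tip error, warm starting) mirrors the text above.

```python
import math

P_MIN, P_MAX = 0.0, 100.0  # chamber pressure limits (kPa), illustrative

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def pressures(z):
    # Sigmoid reparameterization: unconstrained z -> pressures inside limits
    return [P_MIN + (P_MAX - P_MIN) * sigmoid(zi) for zi in z]

def forward(p):
    # Stand-in for the learned forward-kinematic MLP:
    # three chamber pressures -> planar fingertip displacement (m)
    return (0.001 * (p[0] - p[1]),
            0.001 * (p[2] - 0.5 * (p[0] + p[1])))

def solve_pressures(target, z0=(0.0, 0.0, 0.0), lr=300.0, iters=1000, eps=1e-4):
    """Invert the forward model by gradient descent on squared tip error.

    z0 is the warm start — in a control loop, the previous step's solution.
    """
    def loss(z):
        tip = forward(pressures(z))
        return sum((a - b) ** 2 for a, b in zip(tip, target))
    z = list(z0)
    for _ in range(iters):
        grad = []
        for k in range(len(z)):            # central finite differences
            zp, zm = list(z), list(z)
            zp[k] += eps
            zm[k] -= eps
            grad.append((loss(zp) - loss(zm)) / (2.0 * eps))
        z = [zk - lr * gk for zk, gk in zip(z, grad)]
    return pressures(z)

# Solve for pressures reaching a 2 cm x 1 cm tip displacement
p_cmd = solve_pressures((0.02, 0.01))
```

The sigmoid keeps every iterate feasible by construction, so no projection step is needed, and warm starting makes each runtime solve cheap because consecutive tip targets are close together.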
Select a task and a retargeting stage to see the 3D scene at the key frame with maximum contact density.
Demo Human Key Frame
Human hand point cloud at the timestep with peak contact density. Color encodes contact force magnitude from the geodesic heat kernel h_t^v. Bright yellow markers show individual contact force locations and their magnitudes.
Stage 1 Finger Assignment
Dashed colored lines connect human fingertips to their assigned robot fingers. Each color group shows which human finger regions are covered by a single robot finger, based on the minimax load-balancing optimization.
Stage 2 Contact Refinement
Colored arrows show the geodesic-weighted adjustment δ_i applied to each robot fingertip. The arrow direction and magnitude reflect the contact-informed correction that aligns the robot's contact surface with the demonstrated contact geometry.
Full SoftAct Final Policy
The soft robot hand in its final retargeted configuration, ready for policy execution. The three-chambered soft fingers (shown with chamber color coding) are positioned to reproduce the functional intent of the human demonstration despite extreme morphological mismatch.
Retargeting stages. From recorded contact forces on the demonstrator hand (left), through kinematic retargeting with soft fingers, to the contact-informed SoftAct refinement and final real-robot execution (right).
Low-Level Trajectory Tracking. SoftAct achieves the lowest RMSE (2.28 mm) across planar reference trajectories compared to Direct KNN, MLP, and Linear baselines.
Translational (m) and rotational (°) error vs. ground-truth demonstrations. Lower is better. Bold = best per task.
| Task | Method | Pos. (m) | Rot. (°) |
|---|---|---|---|
| Light Bulb Insertion | Kinematic | 1.52 ± 0.61 | **11.8 ± 3.4** |
| | SoftAct-stage 2 only | 0.97 ± 0.42 | 25.1 ± 7.9 |
| | SoftAct (Ours) | **0.48 ± 0.21** | 12.4 ± 8.1 |
| Light Bulb Twisting | Kinematic | 0.05 ± 0.07 | 18.2 ± 6.5 |
| | SoftAct-stage 2 only | 0.07 ± 0.03 | 9.4 ± 3.4 |
| | SoftAct (Ours) | **0.02 ± 0.005** | **2.9 ± 1.3** |
| Cup Pouring | Kinematic | 1.29 ± 0.50 | 12.5 ± 4.1 |
| | SoftAct-stage 2 only | 0.81 ± 0.35 | 6.4 ± 2.2 |
| | SoftAct (Ours) | **0.51 ± 0.21** | **3.5 ± 1.3** |
| Marker Grasping | Kinematic | 0.97 ± 0.41 | 14.3 ± 5.0 |
| | SoftAct-stage 2 only | 0.56 ± 0.25 | 7.3 ± 2.5 |
| | SoftAct (Ours) | **0.31 ± 0.13** | **3.9 ± 1.5** |
| Bottle Unscrewing | Kinematic | 0.26 ± 0.11 | 21.3 ± 7.2 |
| | SoftAct-stage 2 only | 0.11 ± 0.05 | 11.1 ± 3.8 |
| | SoftAct (Ours) | **0.07 ± 0.12** | **6.7 ± 2.6** |
| Box Reorienting | Kinematic | 1.34 ± 0.54 | 17.6 ± 6.0 |
| | SoftAct-stage 2 only | 0.89 ± 0.37 | 9.0 ± 3.1 |
| | SoftAct (Ours) | **0.62 ± 0.26** | **5.1 ± 2.0** |
Real-world rollouts. SoftAct policy executing paper cup pouring, box reorientation, and light bulb screwing on the physical soft robot hand. Zero-shot transfer from simulation.
Qualitative comparison. Task progression frames for (top) the virtual human demonstration, (middle) kinematic retargeting baseline, and (bottom) SoftAct. SoftAct achieves more natural contact configurations that better replicate the functional intent of the human demonstration.
Task success rates over 30 rollouts (simulation) / 20 rollouts (real world) per method per task. Higher is better.
| Task | Domain | SoftAct | Kinematic Baseline |
|---|---|---|---|
| Paper Cup Pouring | Simulation | 90% | 40% |
| Light Bulb Insertion | Simulation | 47% | 0% |
| Marker Grasping | Simulation | 97% | 73% |
| Bottle Unscrewing | Simulation | 80% | 47% |
| Box Reorienting | Simulation | 90% | 30% |
| Light Bulb Screwing | Simulation | 100% | 77% |
| Paper Cup Pouring | Real World | 85% | 35% |
| Light Bulb Screwing | Real World | 95% | 30% |
| Box Reorienting | Real World | 70% | 10% |
@article{softact2026,
title={Functional Force-Aware Retargeting from Virtual Human Demos to Soft Robot Policies},
author={Yoo, Uksang and Zhu, Mengjia and Pezent, Evan and Preechayasomboon, Jom
and Oh, Jean and Ichnowski, Jeffrey and Memar, Amir and Abbatematteo, Ben
and Bharadhwaj, Homanga and Deshpande, Ashish and Prahlad, Harsha},
year={2026},
eprint={2604.01224},
archivePrefix={arXiv},
}