Comparison with RTC
We evaluate Legato and RTC across five real-world manipulation tasks under strictly controlled settings. Both methods are initialized from the same pretrained checkpoint, trained on identical datasets, and optimized with the same hyperparameters. We report task score, completion time, and three smoothness metrics (NLDLJ, NSPARC, and overlap RMSE). Values are mean ± standard error; arrows mark whether higher (↑) or lower (↓) is better.
| Task | Score ↑ (RTC) | Score ↑ (Legato) | Time (s) ↓ (RTC) | Time (s) ↓ (Legato) | NLDLJ ↓ (RTC) | NLDLJ ↓ (Legato) | NSPARC ↓ (RTC) | NSPARC ↓ (Legato) | Overlap RMSE (×10³) ↓ (RTC) | Overlap RMSE (×10³) ↓ (Legato) |
|---|---|---|---|---|---|---|---|---|---|---|
| Bowls | 8.68 ± 0.35 | 9.08 ± 0.33 | 52.88 ± 3.54 | 42.66 ± 2.68 | 36.00 ± 0.34 | 35.86 ± 0.38 | 1.82 ± 0.04 | 1.63 ± 0.02 | 6.83 ± 0.50 | 4.58 ± 0.17 |
| Pour | 9.34 ± 0.18 | 9.72 ± 0.13 | 95.07 ± 2.86 | 75.73 ± 1.51 | 39.82 ± 0.15 | 39.50 ± 0.13 | 2.85 ± 0.24 | 1.65 ± 0.08 | 7.64 ± 0.70 | 5.14 ± 0.17 |
| PickPlace | 9.47 ± 0.15 | 9.53 ± 0.12 | 35.53 ± 1.24 | 30.37 ± 0.65 | 34.42 ± 0.18 | 34.34 ± 0.14 | 2.10 ± 0.08 | 1.89 ± 0.05 | 10.17 ± 0.66 | 5.98 ± 0.40 |
| Drawer | 9.20 ± 0.16 | 9.50 ± 0.13 | 25.97 ± 0.74 | 21.80 ± 0.72 | 32.73 ± 0.13 | 28.55 ± 0.26 | 2.24 ± 0.05 | 1.99 ± 0.08 | 12.11 ± 0.66 | 11.74 ± 0.55 |
| Towel | 7.33 ± 0.62 | 8.17 ± 0.56 | 25.93 ± 0.98 | 20.00 ± 0.78 | 32.79 ± 0.20 | 32.43 ± 0.24 | 2.17 ± 0.07 | 1.97 ± 0.05 | 11.28 ± 0.55 | 6.22 ± 0.66 |
Legato attains a better mean than RTC on every task and every metric, although a few gaps (for example, the PickPlace score and the Bowls NLDLJ) fall within one standard error. It achieves shorter task completion time by suppressing spurious multimodal switching, and produces smoother trajectories as measured by NLDLJ, NSPARC, and overlap RMSE.
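To make the completion-time gap concrete, the relative reduction can be computed directly from the mean times reported in the table. The following is a minimal sketch: the dictionary and function names are illustrative assumptions rather than part of any released code, and the standard errors are ignored.

```python
# Mean completion times in seconds, copied from the table: (RTC, Legato).
MEAN_TIME_S = {
    "Bowls": (52.88, 42.66),
    "Pour": (95.07, 75.73),
    "PickPlace": (35.53, 30.37),
    "Drawer": (25.97, 21.80),
    "Towel": (25.93, 20.00),
}


def relative_reduction(baseline: float, candidate: float) -> float:
    """Return the fractional reduction of `candidate` relative to `baseline`."""
    return (baseline - candidate) / baseline


# Hypothetical helper mapping: task name -> fractional time reduction.
reductions = {
    task: relative_reduction(rtc, legato)
    for task, (rtc, legato) in MEAN_TIME_S.items()
}

for task, r in reductions.items():
    print(f"{task}: {100 * r:.1f}% shorter mean completion time")
```

On these means, the reduction is smallest for PickPlace and largest for Towel, ranging from roughly 15% to 23%.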
Comparison with Training-Time RTC
We also compare Legato with Training-Time RTC on the Pour task. The Legato column repeats the Pour results from the RTC comparison. Values are mean ± standard error.
| Metric | Training-Time RTC | Legato |
|---|---|---|
| Score ↑ | 9.46 ± 0.16 | 9.72 ± 0.13 |
| Completion Time (s) ↓ | 81.73 ± 1.12 | 75.73 ± 1.51 |
| NSPARC ↓ | 2.46 ± 0.14 | 1.65 ± 0.08 |
| NLDLJ ↓ | 39.95 ± 0.13 | 39.50 ± 0.13 |
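One rough way to read this table is to express each gap in units of the combined standard error. The sketch below assumes the two methods' trials are independent, which is an assumption since the evaluation protocol is not described here. The names are hypothetical, and the result is a quick sanity check, not a substitute for a proper significance test.

```python
import math

# (mean, standard error) pairs copied from the table:
# metric -> (Training-Time RTC, Legato, higher_is_better)
POUR_RESULTS = {
    "Score": ((9.46, 0.16), (9.72, 0.13), True),
    "Completion Time (s)": ((81.73, 1.12), (75.73, 1.51), False),
    "NSPARC": ((2.46, 0.14), (1.65, 0.08), False),
    "NLDLJ": ((39.95, 0.13), (39.50, 0.13), False),
}


def standardized_gap(baseline, candidate, higher_is_better):
    """Improvement of candidate over baseline, divided by the combined SE.

    Assumes independent samples, so the standard errors add in quadrature.
    """
    (mean_b, se_b), (mean_c, se_c) = baseline, candidate
    improvement = (mean_c - mean_b) if higher_is_better else (mean_b - mean_c)
    return improvement / math.hypot(se_b, se_c)


gaps = {
    name: standardized_gap(base, cand, hib)
    for name, (base, cand, hib) in POUR_RESULTS.items()
}

for name, gap in gaps.items():
    print(f"{name}: {gap:.2f} combined standard errors in Legato's favor")
```

Under that independence assumption, every gap favors Legato. The score gap is the smallest, at a little over one combined standard error, and the NSPARC gap is the largest.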