IROS 2026

Geometric Entropy: When Trajectory Diversity Helps and Hurts in Imitation Learning

Qian Luo1,2, Ruizhe Liu1, Pei Zhou1,2, Xunzhe Zhou1,2, and Yanchao Yang1,2

1 The University of Hong Kong, 2 Transcengram

Figure: Trajectory manifolds expanding from low diversity to balanced diversity and high ambiguity. Geometric diversity helps only up to the learner's current tolerance for strategy ambiguity.

Abstract

We study how trajectory-shape diversity in demonstrations affects imitation learning (IL) performance across models, tasks, and data scales. We introduce Geometric Entropy (HG), a task-agnostic metric that quantifies the intrinsic diversity of transit trajectories after normalizing away extrinsic variation such as goal pose and workspace scale. Across multiple IL architectures and both simulated and real-robot contact-rich manipulation tasks, success follows an inverted-U pattern: increasing geometric diversity improves robustness in low-diversity regimes, but degrades performance once diversity induces strategy ambiguity.

The optimal entropy shifts lower as task mastery increases through more data, easier tasks, or stronger model priors. For a pretrained VLA, the trend becomes effectively monotonically decreasing, suggesting that HG can act as a fast pre-training audit for calibrating demonstrations toward the learnable regime.

Core Idea

More diverse demonstrations help only while the policy can disambiguate them.

Metric

Geometric Entropy

HG measures trajectory-shape diversity after target-frame alignment, focusing on motion geometry rather than extrinsic pose.
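
The page does not spell out the formula for HG, so the following is only a minimal Python sketch of the two ingredients named above: target-frame alignment, followed by a diversity score computed in the aligned space. The function names `align_to_target` and `shape_entropy_proxy`, the uniform-time resampling, the translation-and-scale-only alignment (rotation is omitted), and the log-determinant score are all assumptions made for illustration. They are not the authors' definition of HG.

```python
import numpy as np

def align_to_target(traj, n_points=32):
    """Resample a (T, D) position trajectory, move its final point (taken
    here as the target) to the origin, and divide by start-to-target distance.

    Assumption: translation and scale are the only extrinsic factors removed.
    """
    traj = np.asarray(traj, dtype=float)
    t_src = np.linspace(0.0, 1.0, len(traj))
    t_dst = np.linspace(0.0, 1.0, n_points)
    # Linear interpolation per coordinate onto a common time grid.
    resampled = np.stack(
        [np.interp(t_dst, t_src, traj[:, d]) for d in range(traj.shape[1])],
        axis=1,
    )
    centered = resampled - resampled[-1]      # target at the origin
    scale = np.linalg.norm(centered[0])       # start-to-target distance
    return centered / scale if scale > 0 else centered

def shape_entropy_proxy(trajs, n_points=32, eps=1e-6):
    """Hypothetical stand-in for a shape-diversity score (NOT the paper's HG).

    Computes 0.5 * log det(I + Sigma / eps), where Sigma is the sample
    covariance of the flattened, aligned trajectories. The score grows with
    both the number of active shape modes and the spread along each mode.
    """
    X = np.stack([align_to_target(t, n_points).ravel() for t in trajs])
    X = X - X.mean(axis=0)
    # Squared singular values of the centered data give covariance eigenvalues.
    sv = np.linalg.svd(X, compute_uv=False)
    eig = sv**2 / max(len(trajs) - 1, 1)
    return float(0.5 * np.sum(np.log1p(eig / eps)))
```

Under these assumptions, shifting or uniformly rescaling a demonstration leaves its aligned shape unchanged, so only differences in motion geometry contribute to the score.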

Definition

Task mastery

Task mastery is the learner's current proficiency on a task, shaped by model capacity, dataset size, task difficulty, and pretrained priors.

Low mastery

Diversity helps

Additional geometric coverage improves recovery when demonstrations are too narrow.

High mastery

Diversity hurts

Once the motion corridor is already learnable, extra modes mostly add ambiguity.

Simulation Evidence

Success peaks where diversity is useful but still learnable.

StackCube illustrates the inverted-U effect, while PegInsertion reaches a higher-mastery regime where added diversity turns into interference earlier.
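
One simple way to locate the peak of such a curve from a handful of (entropy, success rate) measurements is a least-squares quadratic fit. The sketch below is an illustrative assumption, not the analysis used in the paper, and `fitted_optimum` is a hypothetical helper name. It returns the interior vertex when the fit is concave and the vertex lies within the sampled range, and otherwise the better endpoint, which covers the case where success only decreases with entropy.

```python
import numpy as np

def fitted_optimum(entropy, success):
    """Entropy value that maximizes a quadratic least-squares fit of success.

    Requires at least three distinct entropy values. The search is restricted
    to the sampled entropy range, so no extrapolation is performed.
    """
    entropy = np.asarray(entropy, dtype=float)
    success = np.asarray(success, dtype=float)
    a, b, c = np.polyfit(entropy, success, deg=2)
    lo, hi = entropy.min(), entropy.max()
    candidates = [lo, hi]
    if a < 0:  # concave fit: the vertex is a candidate maximum
        vertex = -b / (2.0 * a)
        if lo <= vertex <= hi:
            candidates.append(vertex)
    values = [a * x**2 + b * x + c for x in candidates]
    return float(candidates[int(np.argmax(values))])
```

An interior result corresponds to the inverted-U regime, while a result at the lowest sampled entropy corresponds to the high-mastery regime in which added diversity only hurts.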

StackCube: inverted-U

Low entropy is brittle, moderate entropy improves robustness, and high entropy mixes incompatible strategies.

Figure: Diffusion Policy success rate versus Geometric Entropy on StackCube.

PegInsertion: high mastery

Consistent low-entropy demonstrations already cover the narrow insertion corridor, so extra geometric variation primarily adds interference.

Figure: Diffusion Policy success rate versus Geometric Entropy on PegInsertion.

Stack low · fail: Too narrow
Stack mid · success: Best trade-off
Stack high · fail: Too ambiguous
Peg low · success: Consistent
Peg high · fail: Interference

Model Priors

π0.5 shifts the useful diversity range toward lower entropy.

We fine-tune the Physical Intelligence π0.5 VLA on StackCube-v1. Its pretrained priors induce a high-mastery regime where added trajectory-shape diversity becomes redundant or conflicting.

Figure: π0.5 VLA success rate versus Geometric Entropy on StackCube.

Fast Audit

HG stabilizes with practical sample sizes.

The metric converges quickly enough to audit a pilot collection before training a policy, making it useful for data curation rather than only post-hoc analysis.
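
A convergence check of this kind can be sketched as follows: evaluate a diversity metric on random subsets of increasing size and find where the estimate settles. The helper names `convergence_curve` and `stable_size`, the subsampling scheme, and the 5% tolerance are assumptions for illustration. The `metric` argument is any callable that maps an array of samples to a scalar; the sketch does not implement HG itself.

```python
import numpy as np

def convergence_curve(samples, metric, sizes, n_repeats=20, seed=0):
    """Mean and standard deviation of `metric` over random subsets.

    samples: array of shape (N, ...) with one demonstration per row.
    sizes:   subset sizes to evaluate, each at most N.
    """
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples)
    means, stds = [], []
    for n in sizes:
        vals = [
            metric(samples[rng.choice(len(samples), size=n, replace=False)])
            for _ in range(n_repeats)
        ]
        means.append(np.mean(vals))
        stds.append(np.std(vals))
    return np.array(means), np.array(stds)

def stable_size(sizes, means, rel_tol=0.05):
    """Smallest size from which every estimate stays within rel_tol
    (relative) of the estimate at the largest size."""
    means = np.asarray(means, dtype=float)
    final = means[-1]
    ok = np.abs(means - final) <= rel_tol * abs(final)
    for i, n in enumerate(sizes):
        if ok[i:].all():
            return n
    return sizes[-1]
```

If `stable_size` returns a value well below the size of the pilot collection, the metric estimate is unlikely to change much with further demonstrations from the same collection process.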

Figure: Geometric Entropy convergence curves on StackCube.

Real Robot Validation

The same entropy-mastery behavior appears on hardware.

ACT policies are evaluated on an ARX arm across StackCube, PlacePanda, and OpenDrawer.

Figure: Real-world success rate versus HG for StackCube, PlacePanda, and OpenDrawer.

ARX · StackCube
ARX · PlacePanda
ARX · OpenDrawer

Metric Robustness

Common diversity proxies miss the geometry that matters.

Mean variance, covariance volume, effective rank, and local-density entropy can alias distinct strategy organizations or drift with sample size. HG combines shape-mode dimensionality with spread in aligned trajectory space.
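
To make the aliasing concrete, here is a minimal sketch of three of the baseline proxies using their standard textbook forms; the exact variants compared in the paper may differ, and local-density entropy is omitted. The function names are hypothetical. Effective rank is taken here as the exponential of the Shannon entropy of the normalized covariance eigenvalues.

```python
import numpy as np

def covariance_spectrum(X):
    """Eigenvalues of the sample covariance of X with shape (N, D)."""
    cov = np.atleast_2d(np.cov(np.asarray(X, dtype=float), rowvar=False))
    # Clip tiny negative values caused by floating-point round-off.
    return np.clip(np.linalg.eigvalsh(cov), 0.0, None)

def mean_variance(X):
    """Average per-dimension variance (trace of the covariance over D)."""
    return float(covariance_spectrum(X).mean())

def log_covariance_volume(X, eps=1e-8):
    """log det(Sigma + eps * I); eps keeps degenerate directions finite."""
    return float(np.sum(np.log(covariance_spectrum(X) + eps)))

def effective_rank(X):
    """exp of the entropy of the normalized eigenvalue distribution."""
    eig = covariance_spectrum(X)
    total = eig.sum()
    if total == 0:
        return 0.0
    p = eig / total
    p = p[p > 0]
    return float(np.exp(-np.sum(p * np.log(p))))
```

Two small point sets show the problem. `[[-2, 0], [2, 0], [-2, 0], [2, 0]]` varies along one axis and `[[-2, 0], [2, 0], [0, -2], [0, 2]]` varies along two, yet both have the same mean variance, so that proxy cannot tell one mode from two. Effective rank separates them, but it is unchanged when every point is multiplied by a constant, so it ignores spread.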

Figure: Comparison of baseline diversity metrics on StackCube. Baseline metrics can mis-order learning-relevant trajectory diversity.

Citation

BibTeX

@misc{luo2026geometricentropy,
  title     = {Geometric Entropy: When Trajectory Diversity Helps and Hurts in Imitation Learning},
  author    = {Qian Luo and Ruizhe Liu and Pei Zhou and Xunzhe Zhou and Yanchao Yang},
  year      = {2026},
  eprint    = {2606.20871},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO},
  url       = {https://arxiv.org/abs/2606.20871},
  note      = {Accepted to IROS 2026}
}