Existing methods for unsupervised physical parameter estimation from video lack standardized
evaluation protocols and rely on non-overlapping synthetic datasets or limited real-world data.
We address this gap by introducing IRIS, a new benchmark comprising 220 high-resolution
videos capturing both single- and multi-body physical dynamics with measured ground-truth
parameters. IRIS establishes evaluation criteria spanning parameter accuracy, identifiability,
extrapolation, robustness, and equation-family selection. We test multiple baseline approaches —
including physics-informed loss functions and four equation-identification strategies — and release
the dataset, annotations, evaluation toolkit, and all baseline implementations publicly.