IRIS: A Real-World Benchmark for Inverse Recovery and Identification of Physical Dynamic Systems from Monocular Video

We introduce IRIS — a real-world benchmark of 220 high-resolution videos across 8 physical dynamics classes, each with precisely measured ground-truth parameters. IRIS enables standardized evaluation of unsupervised physical parameter estimation from monocular video, covering single and multi-body systems.

Example videos, one per dynamics class: Dropping Ball, Falling Ball, Sliding Cone, Pendulum, Rotation, Hitting Cones, Two Moving Pendulums, Two Pendulums (One Static)

Abstract

Existing methods for unsupervised physical parameter estimation from video lack standardized evaluation protocols and rely on non-overlapping synthetic datasets or limited real-world data. We address this gap by introducing IRIS, a new benchmark comprising 220 high-resolution videos capturing both single and multi-body physical dynamics with measured ground-truth parameters. IRIS establishes evaluation criteria spanning parameter accuracy, identifiability, extrapolation, robustness, and equation-family selection. We test multiple baseline approaches — including physics-informed loss functions and four equation-identification strategies — and release the dataset, annotations, evaluation toolkit, and all baseline implementations publicly.

Key Contributions

📹

220-Video Real-World Benchmark

High-resolution videos of single and multi-body physical systems with precisely measured ground-truth parameters — not estimated from simulations.

📐

Standardized Evaluation Protocol

Evaluation criteria for parameter accuracy, identifiability, extrapolation capability, robustness to noise, and equation-family selection accuracy.
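
As a rough illustration of the parameter-accuracy criterion, the sketch below computes a mean relative error between estimated and measured parameters. The benchmark's exact metric definitions are not stated on this page, so the choice of relative error, the function names, and the sample values are all assumptions made for illustration.

```python
def relative_error(estimate: float, ground_truth: float) -> float:
    """Absolute relative error |estimate - ground_truth| / |ground_truth|."""
    if ground_truth == 0:
        raise ValueError("ground truth must be non-zero for a relative error")
    return abs(estimate - ground_truth) / abs(ground_truth)


def mean_relative_error(estimates, ground_truths) -> float:
    """Average the per-video relative errors."""
    errors = [relative_error(e, g) for e, g in zip(estimates, ground_truths)]
    return sum(errors) / len(errors)


# Hypothetical estimates of g (m/s^2) from three videos; not benchmark results.
g_estimates = [9.6, 9.9, 10.1]
g_measured = [9.81, 9.81, 9.81]
mre = mean_relative_error(g_estimates, g_measured)
print(f"mean relative error: {mre:.4f}")
```

A relative error makes parameters with different units (metres, friction coefficients) comparable across dynamics classes, which is one plausible reason to prefer it over an absolute error.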

🧠

Equation-Family Selection

Four strategies for automatically identifying which ODE governs the observed dynamics — including VLM-based and classifier-based approaches — benchmarked head-to-head.

🔬

Physics-Informed Baselines

Full baseline implementations with 1-step and multi-step physics loss, unified single/multi-body model, and a corrected gradient-compatible Euler integrator.

The IRIS Dataset

220 Real-World Videos
8 Dynamics Classes
GT Measured Parameters
HD High Resolution
| Dynamics Class | Physical System | Estimated Parameters |
| --- | --- | --- |
| dropping_ball | Ball released from rest under gravity | gravitational acceleration g |
| falling_ball | Projectile / free-falling ball | gravitational acceleration g |
| sliding_cone | Cone sliding on an inclined surface | friction coefficient μ |
| pendulum | Single pendulum oscillation | rope length, damping coefficient |
| rotation | Rotating object (fixed camera) | camera-to-object distance |
| hitting_cones | Collision between cones | friction / restitution params |
| two_moving_pendulums | Two independently swinging pendulums | rope lengths, damping |
| two_moving_pendulum_one_static | Two-pendulum system, one at rest | rope lengths, coupling |
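
For orientation, textbook forms of the governing equations for three of the classes above can be written as follows. These are standard idealizations and not necessarily the exact ODEs in the IRIS library. The symbols y (height, positive upward), θ (pendulum angle), s (distance along the incline), c (damping coefficient), and φ (incline angle) are introduced here for illustration; L denotes the rope length.

```latex
\begin{align}
  \text{dropping\_ball:} \quad & \ddot{y} = -g \\
  \text{pendulum:}       \quad & \ddot{\theta} = -\frac{g}{L}\sin\theta - c\,\dot{\theta} \\
  \text{sliding\_cone:}  \quad & \ddot{s} = g\,(\sin\varphi - \mu\cos\varphi)
\end{align}
```

In each case the parameter listed in the table (g, rope length and damping, or μ) is the unknown that must be recovered from the observed motion.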

Method Overview

The pipeline operates in two stages without requiring manual labelling or prior knowledge of the governing ODE.

🔍

Stage 1 — Equation-Family Selection

A Vision-Language Model (VLM) or a fine-tuned ResNet-18 video classifier inspects the video frames and selects the governing ODE from the IRIS library. Four strategies are benchmarked: path-based keywords, basic VLM, enhanced VLM (5-frame temporal reasoning), and the fine-tuned classifier, which achieves 100% accuracy on the evaluation set.
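
The simplest of the four strategies, path-based keywords, might look like the sketch below. The class names come from the dataset table, but the matching rule, the function name, and the example paths are assumptions for illustration, not the released implementation.

```python
from typing import Optional

# Class names as listed in the IRIS dataset table.
IRIS_CLASSES = [
    "dropping_ball",
    "falling_ball",
    "sliding_cone",
    "pendulum",
    "rotation",
    "hitting_cones",
    "two_moving_pendulums",
    "two_moving_pendulum_one_static",
]


def select_family_from_path(video_path: str) -> Optional[str]:
    """Return the dynamics class whose name appears in the path, if any."""
    normalized = video_path.lower().replace("-", "_").replace(" ", "_")
    # Check longer names first so that "pendulum" does not shadow the
    # two-pendulum classes, whose names contain it as a substring.
    for name in sorted(IRIS_CLASSES, key=len, reverse=True):
        if name in normalized:
            return name
    return None


# Hypothetical paths; the real directory layout may differ.
print(select_family_from_path("data/two_moving_pendulums/clip_03.mp4"))
print(select_family_from_path("videos/Pendulum/001.mp4"))
```

A rule like this only works when the file layout already encodes the class, which is presumably why the visual strategies (VLM and classifier) are benchmarked alongside it.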

⚙️

Stage 2 — Parameter Estimation

An MLP encoder maps video frames into a latent state. A physics block rolls out the selected ODE and optimizes physical parameters (α, β) by minimizing a multi-step physics loss over rollout horizons 1–5. A unified graph-structured model handles both single-body and multi-body systems with symplectic or corrected Euler integration.
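
The multi-step physics loss can be illustrated with a small, dependency-free sketch. It makes several simplifying assumptions that differ from the pipeline described above: the state (angle and angular velocity) is taken as directly observed rather than produced by an encoder, the system is a damped pendulum with known g and damping, and the rope length is found by a grid search instead of gradient-based optimization. All names and numeric values are hypothetical.

```python
import math

G = 9.81  # m/s^2, assumed known in this sketch


def step(theta, omega, length, damping, dt):
    """One semi-implicit (symplectic) Euler step of a damped pendulum."""
    omega = omega + dt * (-(G / length) * math.sin(theta) - damping * omega)
    theta = theta + dt * omega  # uses the updated angular velocity
    return theta, omega


def simulate(theta0, omega0, length, damping, dt, n_steps):
    """Roll the pendulum forward and return the list of (theta, omega) states."""
    states = [(theta0, omega0)]
    for _ in range(n_steps):
        states.append(step(*states[-1], length, damping, dt))
    return states


def multi_step_loss(states, length, damping, dt, max_horizon=5):
    """Mean squared angle error over rollouts of horizon 1..max_horizon."""
    total, count = 0.0, 0
    for start in range(len(states) - max_horizon):
        theta, omega = states[start]
        for h in range(1, max_horizon + 1):
            theta, omega = step(theta, omega, length, damping, dt)
            total += (theta - states[start + h][0]) ** 2
            count += 1
    return total / count


# Synthetic "observations" generated with a known rope length (not IRIS data).
true_length, true_damping, dt = 0.8, 0.1, 0.01
observed = simulate(0.5, 0.0, true_length, true_damping, dt, 300)

# Grid search over candidate rope lengths from 0.50 m to 1.10 m.
candidates = [0.5 + 0.05 * i for i in range(13)]
best_length = min(
    candidates,
    key=lambda length: multi_step_loss(observed, length, true_damping, dt),
)
print(f"recovered rope length: {best_length:.2f} m")
```

Summing errors over horizons 1 to 5 penalizes parameter values whose one-step predictions look acceptable but drift when rolled out further, which is the usual motivation for a multi-step loss over a purely one-step loss.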

BibTeX

@misc{khanbayov2026iris,
  title         = {{IRIS}: A Real-World Benchmark for Inverse Recovery and
                   Identification of Physical Dynamic Systems from Monocular Video},
  author        = {Rasul Khanbayov and Mohamed Rayan Barhdadi and
                   Erchin Serpedin and Hasan Kurban},
  year          = {2026},
  eprint        = {2603.16432},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2603.16432},
}