IRIS: A Real-World Benchmark for Inverse Recovery and Identification of Physical Dynamic Systems from Monocular Video

Rasul Khanbayov, Mohamed Rayan Barhdadi, Dr. Erchin Serpedin, Dr. Hasan Kurban

We introduce IRIS — a real-world benchmark of 220 high-resolution videos across 8 physical dynamics classes, each with precisely measured ground-truth parameters. IRIS enables standardized evaluation of unsupervised physical parameter estimation from monocular video, covering single and multi-body systems.

Dropping Ball

Falling Ball

Sliding Cone

Pendulum

Rotation

Hitting Cones

Two Moving Pendulums

Two Pendulums (One Static)

Abstract

Existing methods for unsupervised physical parameter estimation from video lack standardized evaluation protocols and rely on non-overlapping synthetic datasets or limited real-world data. We address this gap by introducing IRIS, a new benchmark comprising 220 high-resolution videos capturing both single and multi-body physical dynamics with measured ground-truth parameters. IRIS establishes evaluation criteria spanning parameter accuracy, identifiability, extrapolation, robustness, and equation-family selection. We test multiple baseline approaches — including physics-informed loss functions and four equation-identification strategies — and release the dataset, annotations, evaluation toolkit, and all baseline implementations publicly.

Key Contributions

220-Video Real-World Benchmark

High-resolution videos of single and multi-body physical systems with precisely measured ground-truth parameters — not estimated from simulations.

Standardized Evaluation Protocol

Evaluation criteria for parameter accuracy, identifiability, extrapolation capability, robustness to noise, and equation-family selection accuracy.
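The page does not spell out how each criterion is computed. The sketch below is one possible way to score two of them. It assumes that parameter accuracy is measured as relative error against the measured ground truth, and that robustness is probed by adding Gaussian noise to observed trajectories. All function names are hypothetical and are not part of the released IRIS toolkit.

```python
import random

def relative_error(estimate, ground_truth):
    """|estimate - ground_truth| / |ground_truth| for one scalar parameter."""
    if ground_truth == 0:
        raise ValueError("relative error is undefined for a zero ground truth")
    return abs(estimate - ground_truth) / abs(ground_truth)

def mean_relative_error(estimates, ground_truths):
    """Average relative error over paired estimates (e.g., one per video)."""
    if len(estimates) != len(ground_truths) or not estimates:
        raise ValueError("need equally sized, non-empty lists")
    total = sum(relative_error(e, g) for e, g in zip(estimates, ground_truths))
    return total / len(estimates)

def add_noise(trajectory, sigma, seed=0):
    """Return a copy of a 1-D trajectory with i.i.d. Gaussian noise (std sigma) added."""
    rng = random.Random(seed)  # seeded so robustness sweeps are reproducible
    return [x + rng.gauss(0.0, sigma) for x in trajectory]

# Example: estimated vs. measured g for two clips (made-up numbers for illustration).
print(mean_relative_error([9.5, 10.1], [9.81, 9.81]))
```

A robustness curve would then re-run the estimator on `add_noise(trajectory, sigma)` for increasing `sigma` and track how `mean_relative_error` grows.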

Equation-Family Selection

Four strategies for automatically identifying which ODE governs the observed dynamics — including VLM-based and classifier-based approaches — benchmarked head-to-head.

Physics-Informed Baselines

Full baseline implementations with 1-step and multi-step physics losses, a unified single/multi-body model, and a corrected, gradient-compatible Euler integrator.
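The page mentions a corrected Euler integrator but does not describe the correction. The sketch below therefore does not reproduce it. It only shows why the choice of Euler variant matters for long rollouts. On an undamped pendulum with illustrative values g = 9.81 m/s² and a 1 m rope, standard forward Euler steadily injects energy, while semi-implicit (symplectic) Euler keeps the energy error bounded. Function names are hypothetical.

```python
import math

G, LENGTH = 9.81, 1.0  # illustrative values: gravity (m/s^2) and rope length (m)

def explicit_euler_step(theta, omega, dt):
    """Forward Euler: position and velocity are both updated from the old state."""
    alpha = -(G / LENGTH) * math.sin(theta)
    return theta + dt * omega, omega + dt * alpha

def semi_implicit_euler_step(theta, omega, dt):
    """Symplectic (semi-implicit) Euler: update velocity, then position with the new velocity."""
    alpha = -(G / LENGTH) * math.sin(theta)
    omega_new = omega + dt * alpha
    return theta + dt * omega_new, omega_new

def energy(theta, omega):
    """Mechanical energy per unit mass of an undamped pendulum."""
    return 0.5 * (LENGTH * omega) ** 2 + G * LENGTH * (1.0 - math.cos(theta))

def relative_energy_drift(step, steps=1000, dt=0.01, theta0=0.5):
    """|E_final - E_0| / E_0 after integrating from rest at angle theta0 (radians)."""
    theta, omega = theta0, 0.0
    e0 = energy(theta, omega)
    for _ in range(steps):
        theta, omega = step(theta, omega, dt)
    return abs(energy(theta, omega) - e0) / e0

print("forward Euler drift:      ", relative_energy_drift(explicit_euler_step))
print("semi-implicit Euler drift:", relative_energy_drift(semi_implicit_euler_step))
```

Both steps use only differentiable operations, so either could sit inside an autodiff framework. The difference shows up only in how errors accumulate over many steps, which is exactly what a multi-step physics loss is sensitive to.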

The IRIS Dataset

220 Real-World Videos

8 Dynamics Classes

Measured Ground-Truth Parameters

High Resolution

Dynamics Class	Physical System	Estimated Parameters
dropping_ball	Ball released from rest under gravity	gravitational acceleration g
falling_ball	Projectile / free-falling ball	gravitational acceleration g
sliding_cone	Cone sliding on an inclined surface	friction coefficient μ
pendulum	Single pendulum oscillation	rope length, damping coefficient
rotation	Rotating object (fixed camera)	camera-to-object distance
hitting_cones	Collision between cones	friction / restitution parameters
two_moving_pendulums	Two independently swinging pendulums	rope lengths, damping
two_moving_pendulum_one_static	Two-pendulum system, one at rest	rope lengths, coupling
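For reference, textbook forms of three of these dynamics are given below. They are illustrative and may not match the exact parameterization used in the IRIS ODE library. The symbols are defined as follows:

- y is height, measured upward.
- s is distance along an incline of angle φ. The sliding_cone form assumes kinetic friction while the cone moves down the slope.
- θ is the pendulum angle, L the rope length, and b a damping coefficient.

```latex
\begin{aligned}
\text{dropping\_ball:}\quad & \ddot{y} = -g \\
\text{sliding\_cone:}\quad  & \ddot{s} = g\,(\sin\varphi - \mu\cos\varphi) \\
\text{pendulum:}\quad       & \ddot{\theta} = -\frac{g}{L}\sin\theta - b\,\dot{\theta}
\end{aligned}
```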

The dataset is available for download on Hugging Face.

Method Overview

The pipeline operates in two stages without requiring manual labelling or prior knowledge of the governing ODE.

Stage 1 — Equation-Family Selection

A Vision-Language Model (VLM) or a fine-tuned ResNet-18 video classifier processes the video frames and selects the governing ODE from the IRIS library. Four strategies are benchmarked: path-based keyword matching, a basic VLM, an enhanced VLM with 5-frame temporal reasoning, and the fine-tuned classifier. The fine-tuned classifier achieves 100% accuracy on the evaluation set.
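The page does not specify how the path-based keyword strategy works. The following is a minimal sketch that assumes each video's file path contains its class folder name. It matches class names as substrings and keeps the longest hit, so that two_moving_pendulums is not misread as pendulum. The function name and the matching rule are hypothetical.

```python
from typing import Optional

# Class names taken from the IRIS dataset table; the matching rule is a hypothetical sketch.
IRIS_CLASSES = [
    "dropping_ball",
    "falling_ball",
    "sliding_cone",
    "pendulum",
    "rotation",
    "hitting_cones",
    "two_moving_pendulums",
    "two_moving_pendulum_one_static",
]

def select_family_from_path(video_path: str) -> Optional[str]:
    """Return the longest class name occurring in the path, or None if nothing matches."""
    text = video_path.lower()
    matches = [name for name in IRIS_CLASSES if name in text]
    # Longest match wins, so "two_moving_pendulums" beats its substring "pendulum".
    return max(matches, key=len) if matches else None

print(select_family_from_path("iris/two_moving_pendulums/clip_07.mp4"))  # -> two_moving_pendulums
```

A baseline like this reads labels from the directory layout rather than from pixels. It is therefore useful mainly as a reference point for the VLM and classifier strategies.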

Stage 2 — Parameter Estimation

An MLP encoder maps video frames into a latent state. A physics block rolls out the selected ODE and optimizes physical parameters (α, β) by minimizing a multi-step physics loss over rollout horizons 1–5. A unified graph-structured model handles both single-body and multi-body systems with symplectic or corrected Euler integration.
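Below is a minimal sketch of a multi-step physics loss over rollout horizons 1 to 5. It makes several simplifying assumptions:

- The encoder is skipped, and latent states are taken to be the observed (θ, ω) of a single damped pendulum.
- α is read as g/L and β as a damping coefficient. This is an illustrative interpretation of (α, β), not necessarily the paper's.
- A grid search over α stands in for gradient-based optimization.

All names are hypothetical.

```python
import math

def step(state, alpha, beta, dt):
    """One semi-implicit Euler step of a damped pendulum.

    Hypothetical parameterization: theta'' = -alpha * sin(theta) - beta * theta'.
    """
    theta, omega = state
    omega = omega + dt * (-alpha * math.sin(theta) - beta * omega)  # velocity first
    theta = theta + dt * omega  # then position, using the new velocity
    return theta, omega

def multi_step_loss(states, alpha, beta, dt, horizons=range(1, 6)):
    """Mean squared state error of k-step rollouts vs. observations, pooled over k in horizons."""
    total, count = 0.0, 0
    for k in horizons:
        for t in range(len(states) - k):
            pred = states[t]
            for _ in range(k):  # roll the ODE forward k steps from the observed state
                pred = step(pred, alpha, beta, dt)
            target = states[t + k]
            total += (pred[0] - target[0]) ** 2 + (pred[1] - target[1]) ** 2
            count += 1
    return total / count

# Synthetic "observations" from known parameters (illustrative values, 30 fps).
true_alpha, true_beta, dt = 9.81 / 0.8, 0.3, 1 / 30
states = [(0.6, 0.0)]
for _ in range(90):
    states.append(step(states[-1], true_alpha, true_beta, dt))

# Grid search over alpha stands in for gradient-based optimization of the loss.
candidates = [8.0, 10.0, true_alpha, 14.0, 16.0]
best = min(candidates, key=lambda a: multi_step_loss(states, a, true_beta, dt))
print(f"best alpha: {best:.4f}")
```

Averaging over several horizons penalizes parameters that fit one-step transitions but drift over longer rollouts. Because the synthetic data here is generated by the same integrator, the loss vanishes at the true parameters.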

BibTeX

@misc{khanbayov2026iris,
  title         = {{IRIS}: A Real-World Benchmark for Inverse Recovery and
                   Identification of Physical Dynamic Systems from Monocular Video},
  author        = {Rasul Khanbayov and Mohamed Rayan Barhdadi and
                   Erchin Serpedin and Hasan Kurban},
  year          = {2026},
  eprint        = {2603.16432},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2603.16432},
}