VideoPDE: Unified Generative PDE Solving via Video Inpainting Diffusion Models

University of Michigan, Ann Arbor
*Denotes Equal Contribution

TL;DR: We present a unified framework for predicting forward/inverse/partial PDE solutions using a video inpainting diffusion model.

Input Video (1% pixels)
PINO
Ours
GT

Flexible PDE solution predictions. From sparse spatiotemporal observations (left), our method predicts past and future frames and reconstructs the full-field solution more flexibly and accurately than existing state-of-the-art methods, e.g., PINO (Li et al.).

VideoPDE pipeline. We cast PDE solving as a video inpainting task. Our Hierarchical Video Diffusion Transformer (HV-DiT) denoises initial noise into a full video, conditioned on pixel-level sparse measurements. Its ability to handle arbitrary input patterns enables flexible application to diverse PDE scenarios, including forward, inverse, and continuous measurement tasks.
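To make the pixel-level conditioning concrete, here is a minimal sketch (in NumPy, with invented shapes and a hypothetical `build_condition` helper) of one common way to feed sparse measurements to an inpainting denoiser: concatenating the masked observations and the binary observation mask to the noisy video along the channel axis. The paper's HV-DiT may condition differently; this only illustrates the idea.

```python
import numpy as np

def build_condition(video_obs, mask, noisy_video):
    """Illustrative conditioning: concatenate the masked observations and the
    binary mask to the noisy video along the channel axis, so the denoiser
    can see which pixels are measured and what their values are.
    Shapes: video_obs/noisy_video are (T, H, W, C); mask is (T, H, W, 1)."""
    masked = video_obs * mask  # zero out unobserved pixels
    return np.concatenate([noisy_video, masked, mask], axis=-1)

# Toy example: an 8-frame, 16x16, 1-channel video with ~1% of pixels observed.
rng = np.random.default_rng(0)
video = rng.standard_normal((8, 16, 16, 1))
mask = (rng.random((8, 16, 16, 1)) < 0.01).astype(np.float32)
noise = rng.standard_normal((8, 16, 16, 1))
cond = build_condition(video, mask, noise)
assert cond.shape == (8, 16, 16, 3)
```

Because the mask is an explicit input channel, the same network handles arbitrary observation patterns (a dense first frame, scattered sensors, or a last frame), which is what makes the forward, inverse, and continuous-measurement settings interchangeable at inference time.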

Conceptual comparison of PDE-solving methods. Neural operator methods struggle with partial inputs. Only PINN and VideoPDE handle forward, inverse, and continuous measurements flexibly. Generative baselines focus on reconstructing one or two frames (instead of dense temporal frames) and are often not designed for forward prediction, where VideoPDE excels. The forward error is measured on the Navier-Stokes dataset.

Kolmogorov Flow Forward Prediction

Using our video inpainting framework, we can predict future frames from the first-frame initial condition. On the complex Kolmogorov flow, VideoPDE performs noticeably better than prior ML-based methods.
First image
Input (First Frame)
DeepONet
FNO
PINO
Ours
GT

Kolmogorov Flow Forward Prediction from 3% Observation

Thanks to our flexible video inpainting framework, VideoPDE can predict the full-field future frames from partial pixel observations of the initial-condition frame (3% shown here). PINO, the state of the art in forward modeling, is given an interpolated first frame and performs significantly worse than our generative approach.
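Since dense-input baselines like PINO need a full first frame, the sparse observations must first be densified. A minimal sketch of such preprocessing, using nearest-neighbor interpolation via `scipy.interpolate.griddata` (the paper's exact interpolation scheme is not specified here, so this is only an assumed stand-in):

```python
import numpy as np
from scipy.interpolate import griddata

def interpolate_sparse_frame(values, mask):
    """Densify a sparsely observed frame by nearest-neighbor interpolation.
    `values` is an (H, W) field; `mask` is a boolean (H, W) array marking
    observed pixels. Illustrative preprocessing for baselines that require
    a dense input frame."""
    h, w = values.shape
    ys, xs = np.nonzero(mask)
    grid_y, grid_x = np.mgrid[0:h, 0:w]
    return griddata((ys, xs), values[ys, xs], (grid_y, grid_x), method="nearest")

# Toy example: a 32x32 frame with ~3% of pixels observed.
rng = np.random.default_rng(1)
frame = rng.standard_normal((32, 32))
mask = rng.random((32, 32)) < 0.03
dense = interpolate_sparse_frame(frame, mask)
assert dense.shape == (32, 32)
assert np.allclose(dense[mask], frame[mask])  # observed pixels are preserved
```

The gap in the comparison comes from what happens after this step: the interpolated frame discards fine structure that a deterministic operator cannot recover, whereas the generative model conditions on the raw sparse pixels directly.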
First image
Input (First Frame)
PINO
Ours
GT

Wave-Equation Inverse Modeling

Similarly, our unified framework allows for inverse prediction, where we predict the past from future observations. Here, given only the last frame, we accurately recover the preceding frames.
First image
Input (Last Frame)
DeepONet
FNO
PINO
Ours
GT

Navier–Stokes Continuous 1% Observations

In this Navier–Stokes experiment, similar to the teaser videos, 1% of the pixels provide continuous sensor readings, from which VideoPDE reconstructs the full-field solution almost perfectly, noticeably better than state-of-the-art methods for this task.
Input Video (1% pixels)
DiffusionPDE (Ext.)
Shu et al.
Zhuang et al.
Ours
GT

BibTeX

@article{li2025videopde,
    author    = {Edward Li and Zichen Wang and Jiahe Huang and Jeong Joon Park},
    title     = {VideoPDE: Unified Generative PDE Solving via Video Inpainting Diffusion Models},
    journal   = {arXiv},
    year      = {2025},
}