TL;DR: We present a unified framework for predicting forward/inverse/partial PDE solutions using a video inpainting diffusion model.
Flexible PDE solution predictions. From sparse spatiotemporal observations (left), our method predicts future/past states and reconstructs the full field solutions more flexibly and accurately than existing state-of-the-art methods, e.g., PINO (Li et al.).
VideoPDE pipeline. We cast PDE solving as a video inpainting task. Our Hierarchical Video Diffusion Transformer (HV-DiT) denoises initial noise into a full video, conditioned on pixel-level sparse measurements. Its ability to handle arbitrary input patterns enables flexible application to diverse PDE scenarios, including forward, inverse, and continuous measurement tasks.
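To make the pixel-level conditioning concrete, here is a minimal sketch of how sparse measurements can be fed to a video denoiser. The function name, channel layout, and shapes are illustrative assumptions, not the paper's exact interface: the observed values are zeroed outside the measurement mask and concatenated with the noisy video and the mask itself, so the network sees both where and what was measured.

```python
import numpy as np

def make_inpainting_input(noisy_video, observations, mask):
    """Assemble a denoiser input for measurement-conditioned video inpainting.

    Illustrative sketch (not the paper's exact interface): sparse observations
    are zeroed outside the mask and concatenated with the noisy video and the
    mask along the channel axis. Videos have shape (T, H, W, C); the mask has
    shape (T, H, W, 1).
    """
    conditioned_obs = observations * mask  # zero out unmeasured pixels
    return np.concatenate([noisy_video, conditioned_obs, mask], axis=-1)

# Toy example: 8 frames of a 16x16 single-channel field, ~5% of pixels observed.
rng = np.random.default_rng(0)
video = rng.normal(size=(8, 16, 16, 1))   # ground-truth field (hypothetical)
noise = rng.normal(size=(8, 16, 16, 1))   # initial diffusion noise
mask = (rng.random((8, 16, 16, 1)) < 0.05).astype(np.float64)
x = make_inpainting_input(noise, video, mask)
print(x.shape)  # (8, 16, 16, 3)
```

Because the mask is an explicit input channel, the same network can be queried with any observation pattern, which is what allows one model to cover forward, inverse, and continuous-measurement setups.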
Conceptual comparison of PDE-solving methods. Neural operator methods struggle with partial inputs. Only PINN and VideoPDE handle forward, inverse, and continuous-measurement tasks flexibly. Generative baselines focus on reconstructing one or two frames (rather than dense temporal sequences) and are often not designed for forward prediction, where VideoPDE excels. The forward error is measured on the Navier-Stokes dataset.
@article{li2025videopde,
author = {Edward Li and Zichen Wang and Jiahe Huang and Jeong Joon Park},
title = {VideoPDE: Unified Generative PDE Solving via Video},
journal = {arXiv},
year = {2025},
}