Neural Atlas Graphs
for Dynamic Scene Decomposition and Editing
NeurIPS 2025 (spotlight)
Abstract
Learning editable, high-resolution representations of dynamic scenes is an open problem with applications ranging from autonomous driving to creative editing. The most successful approaches today trade editability against the scene complexity they can support: neural atlases represent a dynamic scene as two deforming image layers, foreground and background, which are editable in 2D but break down when multiple objects occlude and interact. Scene graph models, in contrast, use annotated data such as masks and bounding boxes from autonomous-driving datasets to capture complex 3D spatial relationships, but their implicit volumetric node representations are challenging to edit view-consistently. We propose Neural Atlas Graphs (NAGs), a hybrid high-resolution scene representation in which every graph node is a view-dependent neural atlas, enabling both 2D appearance editing and 3D ordering and positioning of scene elements. Fit at test time, NAGs achieve state-of-the-art quantitative results on the Waymo Open Dataset, improving PSNR by 5 dB over existing methods, and enable environmental editing at high resolution and visual quality, creating counterfactual driving scenarios with new backgrounds and edited vehicle appearance. We find that the method also generalizes beyond driving scenes and compares favorably, by more than 7 dB in PSNR, to recent matting and video editing baselines on the DAVIS video dataset across a diverse set of human- and animal-centric scenes.
Neural Atlas Graphs
Neural Atlas Graphs (NAGs) are a hybrid scene representation that decomposes a dynamic scene into a graph of moving planes in 3D space, supporting 3D ordering and flexible 2D texture editing while maintaining view-consistent temporal coherence.
Neural Atlas Graphs (NAGs) Concept. A NAG represents dynamic scenes as a graph of moving 3D planes (one per object/background). Each plane acts as an editable, view-dependent Neural Atlas encoding appearance and flow $f_i$ along a learned trajectory $g_i$ within dedicated neural fields, enabling view-consistent rendering via depth-ordered ray casting.
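To make the rendering model concrete, the Python sketch below composites a set of textured 3D planes along camera rays in depth order. It is a simplified stand-in for the representation described above, not the paper's implementation: the names (AtlasNode, render_pixel) are illustrative, and plain RGBA textures with nearest-neighbor lookup replace the view-dependent neural fields and learned trajectories $g_i$.

# Minimal sketch of NAG-style depth-ordered compositing of planar atlas nodes.
# Names and interfaces are illustrative; per-node neural fields are replaced
# here by plain RGBA textures.
import numpy as np

class AtlasNode:
    """One graph node: a textured 3D plane with a fixed pose."""
    def __init__(self, rgba, origin, normal, u_axis, v_axis, size):
        self.rgba = rgba          # (H, W, 4) atlas texture, alpha in [0, 1]
        self.origin = origin      # plane center in world space, (3,)
        self.normal = normal      # unit plane normal, (3,)
        self.u_axis = u_axis      # unit in-plane axes spanning the atlas
        self.v_axis = v_axis
        self.size = size          # (width, height) of the plane in world units

    def intersect(self, ray_o, ray_d):
        """Return (depth, rgba) of the ray hit, or None if the ray misses."""
        denom = np.dot(self.normal, ray_d)
        if abs(denom) < 1e-8:
            return None                      # ray parallel to the plane
        t = np.dot(self.origin - ray_o, self.normal) / denom
        if t <= 0:
            return None                      # plane behind the camera
        p = ray_o + t * ray_d - self.origin
        u = np.dot(p, self.u_axis) / self.size[0] + 0.5   # atlas coords in [0, 1]
        v = np.dot(p, self.v_axis) / self.size[1] + 0.5
        if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
            return None                      # hit outside the plane extent
        h, w = self.rgba.shape[:2]
        texel = self.rgba[int(v * (h - 1)), int(u * (w - 1))]  # nearest-neighbor lookup
        return t, texel

def render_pixel(nodes, ray_o, ray_d):
    """Depth-sort all plane hits and alpha-composite them front to back."""
    hits = [h for h in (n.intersect(ray_o, ray_d) for n in nodes) if h is not None]
    hits.sort(key=lambda h: h[0])            # nearest plane first
    color, transmittance = np.zeros(3), 1.0
    for _, texel in hits:
        alpha = texel[3]
        color += transmittance * alpha * texel[:3]
        transmittance *= (1.0 - alpha)
    return color

In the full method, the texture lookup would be replaced by querying the node's view-dependent neural field at the atlas coordinates, and the plane pose would be evaluated along the learned trajectory $g_i$ for the current frame.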
Texture Edits
A key advantage of decomposing the dynamic scene into editable Neural Atlas nodes is the ease of direct appearance modification. Unlike implicit volumetric representations, which are challenging to edit, our method lets us treat each atlas node as a 2D texture map for foreground or background elements.
We showcase the system's capacity to perform complex, realistic texture transfers and color alterations on major environmental components. Given a new texture for a single frame of the input video, we propagate this information to the corresponding Neural Atlas node. Thanks to the learned planar flow, the texture is then accurately projected and deformed across the object's planar surface for all novel views. This fine-grained control allows for both illustrative edits, such as recoloring the swan white or rainbow-colored, and more practical, realistic edits, such as placing speed limits or traffic-control instructions directly onto the road. This capability is especially valuable for creating counterfactual driving scenarios and for testing safety and robustness in autonomous driving simulators.
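The sketch below illustrates how such a single-frame edit could be baked into a node's atlas texture. It is a minimal illustration under simplifying assumptions, not the paper's implementation: bake_edit_into_atlas and pixel_to_atlas_uv are hypothetical names, and the pixel-to-atlas mapping stands in for the learned planar flow and plane projection of the edited frame.

# Illustrative sketch: write a one-frame texture edit into an atlas node so that
# all other frames and novel views inherit the edit.
import numpy as np

def bake_edit_into_atlas(atlas_rgba, edit_rgba, edit_mask, pixel_to_atlas_uv):
    """Bake edited pixels of a single frame into the node's atlas texture.

    atlas_rgba:  (Ha, Wa, 4) current atlas texture of the node.
    edit_rgba:   (Hf, Wf, 4) edited frame (e.g. recolored swan, road markings).
    edit_mask:   (Hf, Wf) bool mask of pixels the user actually changed.
    pixel_to_atlas_uv: function (y, x) -> (u, v) in [0, 1], mapping a frame pixel
                       onto the node's atlas (stand-in for projection and flow).
    """
    ha, wa = atlas_rgba.shape[:2]
    hf, wf = edit_rgba.shape[:2]
    atlas = atlas_rgba.copy()
    for y in range(hf):
        for x in range(wf):
            if not edit_mask[y, x]:
                continue
            u, v = pixel_to_atlas_uv(y, x)
            if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
                continue                      # pixel does not land on this node
            ay, ax = int(v * (ha - 1)), int(u * (wa - 1))
            atlas[ay, ax] = edit_rgba[y, x]   # overwrite the atlas texel
    return atlas

Once baked, every frame and novel view that samples this node's atlas through the learned planar flow shows the edit consistently.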
Ground Truth
White Edit
Rainbow Edit
Ground Truth
Red Edit
Rainbow Edit
Ground Truth
Edited Scene
Ground Truth
Edited Scene
Qualitative Comparisons
We evaluate our NAGs qualitatively against four state-of-the-art baselines. To demonstrate our method's versatility, we use two distinct datasets for our comparisons. For automotive scenes with many actors, we use a subset of the Waymo Open Dataset and compare our approach with OmniRe, a recent 3D Gaussian Splatting method, and EmerNeRF, a Neural Radiance Field method. For single-object videos, we evaluate on the DAVIS dataset, where we compare against OmnimatteRF and Layered Neural Atlases. As the videos below illustrate, our approach captures fine, high-resolution details in both dynamic automotive scenarios and single-object scenes.
Our NAG formulation precisely models complex elements such as
spinning wheels, pedestrian motion, and reflections, even under
rapid movement. The following carousels showcase these
qualitative differences, with each video segment representing a
single reconstruction from the respective dataset.
Automotive Scenes - Waymo Open Dataset
Ground Truth
Ours
OmniRe
EmerNeRF
Object-centric Scenes - DAVIS Dataset
Ground Truth
Ours
OmnimatteRF
Layered Neural Atlases
Scene Decomposition
By rendering the atlas nodes of our NAG individually, we obtain a detailed scene decomposition; for the object-centric videos, we refer to these rendered layers as foreground and background. We also compare these decompositions qualitatively against the OmnimatteRF and Layered Neural Atlases baselines. As the videos below demonstrate, our NAG achieves similar decomposition performance while capturing a greater level of fine detail. Some flickering may remain due to imperfect positional initialization and flow discontinuities, as discussed in our limitations.
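As a minimal illustration of this per-node rendering, the sketch below renders each node in isolation to produce one RGBA layer per node. It reuses the hypothetical AtlasNode helper from the sketch in the Neural Atlas Graphs section above and is not the paper's implementation.

# Sketch: produce per-node decomposition layers by rendering each atlas node
# on its own (assumes the AtlasNode class sketched earlier).
import numpy as np

def decompose(nodes, rays):
    """Render every node in isolation; returns one RGBA layer per node.

    rays: list of (ray_o, ray_d) pairs, one per output pixel.
    """
    layers = []
    for node in nodes:
        layer = []
        for ray_o, ray_d in rays:
            hit = node.intersect(ray_o, ray_d)
            if hit is None:
                layer.append(np.zeros(4))     # transparent where the node is absent
            else:
                layer.append(hit[1])          # the node's own RGBA texel
        layers.append(np.asarray(layer))
    return layers  # e.g. layers[0] = foreground plane, layers[1] = background plane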
Ours Foreground
Ours Background
OmnimatteRF Foreground
OmnimatteRF Background
Layered Neural Atlases Foreground
Layered Neural Atlases Background
BibTeX
@inproceedings{Schneider2025NAG,
  author    = {Jan Philipp Schneider and
               Pratik Singh Bisht and
               Ilya Chugunov and
               Andreas Kolb and
               Michael Moeller and
               Felix Heide},
  title     = {Neural Atlas Graphs for Dynamic Scene Decomposition and Editing},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  volume    = {38},
  url       = {https://neurips.cc/virtual/2025/poster/115926},
}