Neural Atlas Graphs
for Dynamic Scene Decomposition and Editing

NeurIPS 2025 (spotlight)

Neural Atlas Graphs (NAGs) are a hybrid 2.5D representation for high-resolution, editable dynamic scenes. Features include counterfactual autonomous-driving edits (e.g., retextured backgrounds), quantitative quality comparisons against OmniRe and EmerNeRF, and robust 3D scene manipulation (scene decomposition, vehicle manipulation). Crucially, appearance changes are applied by editing a texture in a single frame and are propagated with view-consistent temporal coherence.

Abstract

Learning editable high-resolution scene representations for dynamic scenes is an open problem with applications across domains, from autonomous driving to creative editing. The most successful approaches today trade editability against supported scene complexity: neural atlases represent dynamic scenes as two deforming image layers, foreground and background, which are editable in 2D but break down when multiple objects occlude and interact. In contrast, scene graph models make use of annotated data, such as masks and bounding boxes from autonomous-driving datasets, to capture complex 3D spatial relationships, but their implicit volumetric node representations are challenging to edit view-consistently. We propose Neural Atlas Graphs (NAGs), a hybrid high-resolution scene representation in which every graph node is a view-dependent neural atlas, facilitating both 2D appearance editing and 3D ordering and positioning of scene elements. Fit at test time, NAGs achieve state-of-the-art quantitative results on the Waymo Open Dataset, with a 5 dB PSNR increase over existing methods, and enable environmental editing at high resolution and visual quality, creating counterfactual driving scenarios with new backgrounds and edited vehicle appearance. We find that the method also generalizes beyond driving scenes and compares favorably, by more than 7 dB in PSNR, to recent matting and video editing baselines on the DAVIS video dataset with its diverse set of human- and animal-centric scenes.

Neural Atlas Graphs

Neural Atlas Graphs (NAGs) are a hybrid scene representation that decomposes a dynamic scene into a graph of moving planes in 3D space, supporting 3D ordering and flexible 2D texture editing while maintaining view-consistent temporal coherence.

Figure: Neural Atlas Graphs Concept

Neural Atlas Graphs (NAGs) Concept. A NAG represents dynamic scenes as a graph of moving 3D planes (one per object/background). Each plane acts as an editable, view-dependent Neural Atlas encoding appearance and flow $f_i$ along a learned trajectory $g_i$ within dedicated neural fields, enabling view-consistent rendering via depth-ordered ray casting.
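
To make the rendering model concrete, the following is a minimal sketch of depth-ordered ray casting over planar atlas nodes. It uses simplified stand-ins: a constant color replaces the view-dependent neural atlas (appearance and flow $f_i$), and a static pose replaces the learned trajectory $g_i$. The names AtlasNode, plane_at, sample_atlas, and render_ray are illustrative and not the released code.

# Sketch of NAG-style rendering with simplified stand-ins (see lead-in above).
import numpy as np

class AtlasNode:
    def __init__(self, center, normal, u_axis, v_axis, color, alpha):
        self.center, self.normal = np.asarray(center, float), np.asarray(normal, float)
        self.u_axis, self.v_axis = np.asarray(u_axis, float), np.asarray(v_axis, float)
        self.color, self.alpha = np.asarray(color, float), float(alpha)

    def plane_at(self, t):
        # Stand-in for the learned trajectory g_i: here the plane is static in time.
        return self.center, self.normal

    def sample_atlas(self, u, v, view_dir):
        # Stand-in for the view-dependent neural atlas (appearance + flow f_i):
        # constant color and opacity inside the unit square, empty outside.
        inside = (abs(u) <= 1.0) and (abs(v) <= 1.0)
        return (self.color, self.alpha) if inside else (np.zeros(3), 0.0)

def render_ray(origin, direction, nodes, t):
    """Depth-ordered ray casting: intersect every node plane at time t,
    sort the hits by depth, then alpha-composite front to back."""
    hits = []
    for node in nodes:
        center, normal = node.plane_at(t)
        denom = float(direction @ normal)
        if abs(denom) < 1e-8:
            continue                           # ray parallel to the plane
        depth = float((center - origin) @ normal) / denom
        if depth <= 0:
            continue                           # plane behind the camera
        p = origin + depth * direction
        u, v = float((p - center) @ node.u_axis), float((p - center) @ node.v_axis)
        rgb, a = node.sample_atlas(u, v, direction)
        if a > 0:
            hits.append((depth, rgb, a))
    hits.sort(key=lambda h: h[0])              # nearest plane first
    color, transmittance = np.zeros(3), 1.0
    for _, rgb, a in hits:
        color += transmittance * a * rgb
        transmittance *= (1.0 - a)
    return color

# Toy usage: a semi-transparent red foreground plane in front of a gray background plane.
fg = AtlasNode([0, 0, 2], [0, 0, -1], [1, 0, 0], [0, 1, 0], [1, 0, 0], 0.8)
bg = AtlasNode([0, 0, 5], [0, 0, -1], [1, 0, 0], [0, 1, 0], [0.5, 0.5, 0.5], 1.0)
print(render_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]), [fg, bg], t=0.0))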

Texture Edits

A key advantage of decomposing the dynamic scene into editable Neural Atlas nodes is the ease of direct appearance modification. Unlike implicit volumetric representations, which are challenging to edit, our method treats each atlas node as a 2D texture map for a foreground or background element.
We showcase the system's capacity to perform complex, realistic texture transfers and color alterations on major environmental components. Given a new texture for a single frame of the input video, we propagate this information to the corresponding Neural Atlas node. The texture is then accurately projected and deformed across the object's planar surface for all novel views, thanks to the learned planar flow. This fine-grained control allows for both illustrative edits, such as recoloring the swan white or rainbow-colored, and more practical, realistic edits, such as painting speed limits or traffic-control instructions directly onto the street. This capability is especially valuable for creating counterfactual driving scenarios and for testing the safety and robustness of autonomous driving systems in simulation.
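
To sketch how a single-frame edit propagates, the toy example below assumes each frame already comes with a per-pixel map into the node's canonical atlas coordinates (in the method, this role is played by the learned planar flow). The edit is baked into the atlas texture once and then re-sampled through every frame's own UV map, so it deforms consistently across frames. The function names bake_edit_into_atlas and render_frame_from_atlas are hypothetical.

# Sketch of single-frame texture-edit propagation via a canonical atlas texture.
import numpy as np

def bake_edit_into_atlas(atlas, edit_rgb, edit_mask, uv):
    """Project an edit painted in one frame into the canonical atlas texture."""
    H_a, W_a, _ = atlas.shape
    ys, xs = np.nonzero(edit_mask)
    u = np.clip((uv[ys, xs, 0] * (W_a - 1)).round().astype(int), 0, W_a - 1)
    v = np.clip((uv[ys, xs, 1] * (H_a - 1)).round().astype(int), 0, H_a - 1)
    atlas = atlas.copy()
    atlas[v, u] = edit_rgb[ys, xs]             # nearest-neighbour splat of the edit
    return atlas

def render_frame_from_atlas(atlas, uv):
    """Resample the (edited) atlas through a frame's own UV map, so the edit
    follows the object's deformation in every frame and view."""
    H_a, W_a, _ = atlas.shape
    u = np.clip((uv[..., 0] * (W_a - 1)).round().astype(int), 0, W_a - 1)
    v = np.clip((uv[..., 1] * (H_a - 1)).round().astype(int), 0, H_a - 1)
    return atlas[v, u]

# Toy usage: two 4x4 frames whose UV maps are shifted copies of the same surface.
atlas = np.zeros((8, 8, 3))
uv_f0 = np.stack(np.meshgrid(np.linspace(0, 0.5, 4), np.linspace(0, 0.5, 4)), -1)
uv_f1 = uv_f0 + 1.0 / 6.0                      # same surface, shifted by one pixel in frame 1
edit_rgb = np.ones((4, 4, 3))                  # paint white ...
edit_mask = np.zeros((4, 4), bool)
edit_mask[1, 1] = True                         # ... on one pixel of frame 0
atlas = bake_edit_into_atlas(atlas, edit_rgb, edit_mask, uv_f0)
print(render_frame_from_atlas(atlas, uv_f1)[..., 0])   # the edit reappears, one pixel shifted, in frame 1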

Qualitative Comparisons

We evaluate our NAGs qualitatively against four state-of-the-art baselines. To demonstrate our method's versatility, we utilize two distinct datasets for our comparisons. For automotive scenes with many actors, we use a subset of the Waymo Open Dataset. Here, we compare our approach with OmniRe, a recent 3D Gaussian Splatting method, and EmerNeRF, a Neural Radiance Field method.
For single-object videos, we evaluate on the DAVIS dataset, where we provide comparisons against OmnimatteRF and Layered Neural Atlases. As the videos below illustrate, our approach is able to capture fine, high-resolution details in both dynamic automotive scenarios and single-object scenes.
Our NAG formulation precisely models complex elements such as spinning wheels, pedestrian motion, and reflections, even under rapid movement. The following carousels showcase these qualitative differences, with each video segment representing a single reconstruction from the respective dataset.

Automotive Scenes - Waymo Open Dataset

Object-centric Scenes - DAVIS Dataset

Scene Decomposition

By rendering the atlas nodes of our NAG individually, we obtain a detailed scene decomposition. For the object-centric videos, we refer to these nodes as foreground and background. We also qualitatively compare the decompositions against the OmnimatteRF and Layered Neural Atlases baselines. As the videos below demonstrate, our NAG shows decomposition performance similar to these methods while capturing an increased level of fine detail. Some flickering may remain due to imperfect positional initialization and flow discontinuities, as discussed in our limitations.
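
As a minimal illustration of this decomposition, the sketch below performs front-to-back compositing while keeping each node's premultiplied contribution as its own layer. The per-node RGB images and alpha mattes are toy inputs standing in for rendered atlas nodes, and composite_and_decompose is a hypothetical helper, not part of the released code.

# Sketch of per-node decomposition from already-rendered per-node RGB and alpha images.
import numpy as np

def composite_and_decompose(layers):
    """Front-to-back compositing that also returns each node's visible layer,
    i.e. its premultiplied contribution to the final frame."""
    H, W, _ = layers[0][0].shape
    full = np.zeros((H, W, 3))
    transmittance = np.ones((H, W, 1))
    per_node = []
    for rgb, alpha in layers:                  # nearest node first
        contribution = transmittance * alpha[..., None] * rgb
        per_node.append(contribution)          # e.g. "foreground", "background" layer
        full += contribution
        transmittance = transmittance * (1.0 - alpha[..., None])
    return full, per_node

# Toy usage: an opaque gray background behind a half-transparent red foreground disc.
H = W = 64
yy, xx = np.mgrid[0:H, 0:W]
fg_rgb = np.zeros((H, W, 3))
fg_rgb[..., 0] = 1.0
fg_alpha = (((yy - 32) ** 2 + (xx - 32) ** 2) < 15 ** 2).astype(float) * 0.8
bg_rgb = np.full((H, W, 3), 0.5)
bg_alpha = np.ones((H, W))
full, (fg_layer, bg_layer) = composite_and_decompose([(fg_rgb, fg_alpha), (bg_rgb, bg_alpha)])
print(full.shape, fg_layer.max(), bg_layer.min())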

BibTeX

@inproceedings{Schneider2025NAG,
  author    = {Jan Philipp Schneider and
               Pratik Singh Bisht and
               Ilya Chugunov and
               Andreas Kolb and
               Michael Moeller and
               Felix Heide},
  title     = {Neural Atlas Graphs for Dynamic Scene Decomposition and Editing},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  volume    = {38},
  url       = {https://neurips.cc/virtual/2025/poster/115926},
}