End-to-end Fusion3DGS
Researchers from Southwest Jiaotong University have unveiled Fusion3DGS, a new end-to-end framework that shakes up the world of 3D instance segmentation. Moving away from the industry’s heavy reliance on expensive, densely annotated 3D point clouds, this novel approach harnesses the power of 3D Gaussian Splatting (3DGS) combined with multi-view RGB images. Using only 2D instance masks as supervision, the system optimizes a compact scene representation. This method effectively “lifts” widely available 2D data into the 3D realm, creating a geometry-aware substrate that identifies and segments objects without needing a single manual 3D label.
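The lifting idea can be illustrated with a toy sketch: each 3D primitive is projected into the views that see it, collects the instance ID of the 2D mask it lands in, and takes a majority vote. This is a minimal illustration of the concept, not the paper's actual optimization; the data layout (per-view lists of point-to-mask hits) is assumed here.

```python
import numpy as np

def lift_masks_to_3d(views, num_points, num_instances):
    """Assign each 3D point an instance label by majority vote over
    the 2D instance masks it projects into across all views.

    views: list (one entry per camera) of (point_index, mask_id) pairs.
    Hypothetical layout for illustration; the real method optimizes
    labels jointly with the 3DGS scene representation.
    """
    votes = np.zeros((num_points, num_instances), dtype=int)
    for view in views:
        for point_idx, inst_id in view:
            votes[point_idx, inst_id] += 1
    return votes.argmax(axis=1)  # per-point instance label

# Toy example: 3 points observed in 2 views.
view_a = [(0, 1), (1, 0), (2, 1)]
view_b = [(0, 1), (1, 1), (2, 1)]
labels = lift_masks_to_3d([view_a, view_b], num_points=3, num_instances=2)
```

Points that different views disagree on simply keep their most frequently observed label, which is why multi-view consistency matters so much for the real system.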


Traditional pipelines often struggle with the high computational cost and massive data requirements of pure 3D supervision. This framework bridges that gap by employing a “weight-sharing lock” that ties early 2D and 3D neural kernels together, ensuring spatial consistency while preventing the model from drifting. It also features an occlusion-aware cross-attention mechanism that fuses features only where they physically align, filtering out noise from blocked views. The result is a system that rivals state-of-the-art pure 3D methods on benchmarks but operates with significantly lower memory and hardware demands.
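The occlusion-aware fusion described above behaves like a masked cross-attention: 3D primitives query 2D image features, but attention weights are zeroed for any 2D token that is occluded from that primitive. A minimal NumPy sketch, assuming the renderer supplies a boolean visibility mask (the paper's exact formulation may differ):

```python
import numpy as np

def occlusion_aware_cross_attention(q3d, k2d, v2d, visible):
    """Fuse 2D features into 3D queries, attending only to 2D tokens
    that are visible (not occluded) from each 3D primitive.

    q3d: (N, d) 3D queries; k2d, v2d: (M, d) 2D keys/values;
    visible: (N, M) boolean mask, assumed given by the renderer.
    Assumes every primitive sees at least one 2D token.
    """
    d = q3d.shape[1]
    scores = q3d @ k2d.T / np.sqrt(d)            # (N, M) similarity
    scores = np.where(visible, scores, -np.inf)  # drop occluded pairs
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)            # softmax over visible tokens
    return w @ v2d                               # (N, d) fused features
```

Because occluded pairs get a score of negative infinity before the softmax, their weight is exactly zero, so noise from blocked views never leaks into the fused 3D features.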
For those diving into the technical weeds, the pipeline starts with sparse point clouds from structure-from-motion (SfM). These are densified using a Voronoi-based method and refined with bilateral filtering to smooth high-frequency noise while preserving sharp object boundaries. To keep things efficient, the system groups primitives into “SuperPoints” to reduce the token count before fusion. The team also released OIRGB, a new in-house dataset for RGB-only indoor scenes, demonstrating the model’s robustness in natural light conditions where 2D evidence often degrades. This blend of rendering-coupled optimization and lightweight architecture marks a promising step forward for autonomous navigation and robotics.
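The SuperPoint grouping can be pictured as spatial pooling: nearby primitives are clustered and their features averaged, so fusion operates on far fewer tokens. The voxel-hash clustering below is an illustrative stand-in; the paper's actual grouping rule may be more sophisticated.

```python
import numpy as np

def group_superpoints(xyz, feats, voxel_size=0.25):
    """Group primitives into coarse 'SuperPoints' by voxel hashing and
    average their features, shrinking the token count before fusion.
    (Illustrative clustering; not the paper's exact algorithm.)

    xyz: (P, 3) primitive positions; feats: (P, d) per-primitive features.
    """
    keys = np.floor(xyz / voxel_size).astype(int)      # voxel index per point
    _, inverse, counts = np.unique(keys, axis=0,
                                   return_inverse=True,
                                   return_counts=True)
    pooled = np.zeros((len(counts), feats.shape[1]))
    np.add.at(pooled, inverse, feats)                  # sum features per voxel
    pooled /= counts[:, None]                          # mean-pool
    return pooled, inverse  # (S, d) SuperPoint tokens and point-to-group map
```

With attention cost quadratic in token count, pooling thousands of primitives into a few hundred SuperPoints is what makes the cross-modal fusion affordable on modest hardware.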
Read More: https://www.nature.com/articles/s41598-025-33840-8
Written by Adam Clark. Adam has spent the past 13 years exploring the world from above, using drones, satellites, and mapping tools to better understand our landscapes. Connect with him on LinkedIn: Adam Clark















