TL;DR: TripleSplat is a feed-forward framework that generates 3D Gaussians from a 3D-native adaptive triplane representation, enabling sparse-view large-scale scene reconstruction beyond camera frustums.
Reconstructing large-scale scenes from sparse and widely spaced viewpoints remains challenging. Existing feed-forward approaches rely on 2D-centric depth-unprojection pipelines that constrain geometry to camera frustums and assume densely sampled views for reliable pixel-wise matching, leading to degraded performance under sparse inputs. We introduce TripleSplat, a feed-forward framework that generates 3D Gaussians from a 3D-native triplane representation. Our key contribution is an adaptive triplane that predicts the optimal position and scale for each view, enabling reconstruction beyond camera frustums and handling large camera translations and scale variations. Extensive experiments demonstrate that TripleSplat achieves state-of-the-art performance on sparse-view large-scale scene benchmarks, while also generalizing to object-level reconstruction.
TripleSplat is a feed-forward framework that predicts 3D Gaussians directly from sparse input views. Standard pixel-aligned depth unprojection (a) ties the reconstructed geometry to back-projected depth, confining it to the camera frustums. In contrast, TripleSplat (b) dynamically adjusts the position and scale of its triplane, allowing it to handle large camera translations and scale variations.