Generalizable Human Gaussians (GHG) for Sparse View Synthesis

ECCV 2024

Generalizable Human Gaussians (GHG) can synthesize high-quality novel view renderings of an arbitrary human subject from sparse multi-view images.

Abstract

Recent progress in neural rendering has brought forth pioneering methods, such as NeRF and Gaussian Splatting, which revolutionize view rendering across various domains like AR/VR, gaming, and content creation. While these methods excel at interpolating within the training data, generalizing to new scenes and objects from very sparse views remains a challenge. In particular, modeling 3D humans from sparse views presents formidable hurdles due to the inherent complexity of human geometry, resulting in inaccurate reconstructions of both geometry and texture. To tackle this challenge, this paper leverages recent advancements in Gaussian splatting and introduces a new method to learn generalizable human Gaussians that allow photorealistic and accurate view rendering of a new human subject from a limited set of sparse views in a feed-forward manner. A pivotal innovation of our approach is reformulating the learning of 3D Gaussian parameters as a regression process defined on the 2D UV space of a human template, which allows leveraging the strong geometry prior and the advantages of 2D convolutions. Our method outperforms recent methods in both within-dataset and cross-dataset generalization settings.
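To make the UV-space reformulation concrete, below is a minimal sketch (not the authors' released code) of regressing 3D Gaussian parameters as 2D maps over the template's UV atlas with an ordinary 2D CNN. The module layout, channel counts, and activation choices are illustrative assumptions.

```python
import torch
import torch.nn as nn

class UVGaussianRegressor(nn.Module):
    """Illustrative 2D CNN that regresses Gaussian parameter maps in UV space."""
    def __init__(self, in_ch: int = 32, hidden: int = 64):
        super().__init__()
        # Per-texel Gaussian parameters, packed into 14 channels:
        # 3 (position offset) + 3 (scale) + 4 (rotation quaternion)
        # + 1 (opacity) + 3 (RGB color).
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 14, 1),
        )

    def forward(self, uv_feats: torch.Tensor) -> dict:
        # uv_feats: (B, in_ch, H, W) features gathered into template UV space.
        out = self.net(uv_feats)
        offset, scale, rot, opacity, color = torch.split(out, [3, 3, 4, 1, 3], dim=1)
        return {
            "offset": offset,                                  # displacement from the scaffold surface
            "scale": torch.exp(scale),                         # strictly positive scales
            "rotation": nn.functional.normalize(rot, dim=1),   # unit quaternion
            "opacity": torch.sigmoid(opacity),                 # in (0, 1)
            "color": torch.sigmoid(color),                     # RGB in (0, 1)
        }

# Example: a 256x256 UV map yields one Gaussian per valid texel.
params = UVGaussianRegressor()(torch.randn(1, 32, 256, 256))
```

Because every map lives on the fixed UV layout of the template, the network inherits the template's geometry prior for free and can reuse standard image-to-image architectures.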

Overview

(a) We focus on generalizable human rendering under a very sparse view setting. (b) We first construct multi-scaffolds by dilating the human template surface. The 2D UV space of each scaffold serves to collect geometry and appearance information from the corresponding 3D locations. (c) The aggregated multi-scaffold input is fed into the network, which generates multi-Gaussian parameter maps. (d) Finally, Gaussians are anchored on the corresponding surface of each scaffold and rasterized into novel views.

Figure: GHG pipeline.
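A minimal sketch of step (b), assuming a template mesh with per-vertex unit normals: the surface is dilated outward to form one scaffold per level. The offset values and helper names below are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def build_scaffolds(vertices: np.ndarray,
                    normals: np.ndarray,
                    offsets=(0.0, 0.01, 0.02)) -> list:
    """Dilate the template surface along vertex normals.

    vertices: (V, 3) template vertex positions.
    normals:  (V, 3) unit vertex normals.
    offsets:  dilation distance per scaffold level (illustrative, in meters).
    Returns one (V, 3) vertex array per scaffold; all levels share the
    template's faces and UV parameterization, so each scaffold carries its
    own 2D UV space for gathering geometry and appearance features.
    """
    return [vertices + d * normals for d in offsets]

# Example with a dummy mesh: three scaffold levels at 0, 1, and 2 cm.
verts = np.random.rand(6890, 3).astype(np.float32)  # e.g., an SMPL-sized vertex count
norms = np.ones_like(verts) / np.sqrt(3.0)          # placeholder unit normals
scaffolds = build_scaffolds(verts, norms)
```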

In-domain Generalization - Trained and Tested on THuman 2.0

Our method utilizes a 3D human prior to achieve robust and multi-view consistent novel view renderings under a very sparse view setting. In addition, our method collects visual information from the multi-scaffold representations and thus recovers sharp, high-frequency details such as hair, wrinkles, and logos.
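For intuition on how the scaffolds carry this visual information into 3D, here is a minimal sketch of anchoring the regressed Gaussians on one scaffold surface: a UV position map gives each texel a 3D anchor, and the predicted offset map displaces the Gaussian from that anchor. Tensor shapes and names are illustrative assumptions.

```python
import torch

def anchor_gaussians(position_map: torch.Tensor,
                     offset_map: torch.Tensor,
                     valid_mask: torch.Tensor) -> torch.Tensor:
    """position_map: (3, H, W) scaffold surface positions rasterized into UV space.
    offset_map:   (3, H, W) predicted per-texel displacements.
    valid_mask:   (H, W) boolean mask of texels covered by the UV atlas.
    Returns (N, 3) Gaussian centers for the N valid texels.
    """
    means = (position_map + offset_map).permute(1, 2, 0)  # (H, W, 3)
    return means[valid_mask]                              # keep atlas texels only

# Example: one 256x256 scaffold level; in practice the Gaussians from all
# scaffold levels are concatenated before rasterization into the novel view.
H = W = 256
centers = anchor_gaussians(torch.randn(3, H, W),
                           torch.randn(3, H, W) * 0.01,
                           torch.rand(H, W) > 0.5)
```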

Cross-domain Generalization - Trained on THuman 2.0, Tested on RenderPeople

We achieve high-quality synthesis under the challenging cross-domain generalization setting.

Effectiveness of Multi-Scaffold

Each column shows a different scaffold level, with the last column illustrating their combined effect. The top row shows the RGB representation, while the bottom row highlights the affected regions, with grey indicating unaffected areas.

Main Video

This video shows in-domain and cross-domain generalization results. Note that GPS-Gaussian* is trained and tested with 5 input views due to its rectification requirement, whereas NHP, NIA, and our method are trained and tested with 3 input views.