Traditional methods for constructing high-quality, personalized head avatars from monocular videos demand extensive face captures and training time, posing a significant challenge for scalability. This paper introduces a novel approach to create high quality head avatar utilizing only a single or a few images per user. We learn a generative model for 3D animatable photo-realistic head avatar from a multi-view dataset of expressions from 2407 subjects, and leverage it as a prior for creating personalized avatar from few-shot images. Different from previous 3D-aware face generative models, our prior is built with a 3DMManchored neural radiance field backbone, which we show to be more effective for avatar creation through auto-decoding based on few-shot inputs. We also handle unstable 3DMM fitting by jointly optimizing the 3DMM fitting and camera calibration that leads to better few-shot adaptation. Our method demonstrates compelling results and outperforms existing state-of-the-art methods for few-shot avatar adaptation, paving the way for more efficient and personalized avatar creation.
Labels - From left to right: (1) Next3D [1], (2) MonoAvatar [2], (3) Ours, (4) Ground Truth (Input Driving Video)
(Since Next3D is trained with cropped facial images, we apply the same cropping for all methods here for a fair comparison. Note that the "jitters" in the videos below is caused by inconsistent cropping bounding boxes generated during data processing of Next3D, which are compared based on five estimated facial landmarks.)
Comparison on Five-shot avatar creation:
Comparison on One-shot avatar creation:
Avatar generation in novel identity by interpolating latent code of two subjects.
[1] Jingxiang Sun, Xuan Wang, Lizhen Wang, Xiaoyu Li, Yong Zhang, Hongwen Zhang, and Yebin Liu. "Next3d: Generative neural texture rasterization for 3d-aware head avatars." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[2] Ziqian Bai, Feitong Tan, Zeng Huang, Kripasindhu Sarkar, Danhang Tang, Di Qiu, Abhimitra Meka et al. "Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.