We introduce GarmentCrafter, a new approach that enables non-professional users to create and modify 3D garments from a single-view image. While recent advances in image generation have facilitated 2D garment design, creating and editing 3D garments remains challenging for non-professional users. Existing methods for single-view 3D reconstruction often rely on pre-trained generative models to hallucinate novel views conditioned on the reference image and camera pose, yet they lack cross-view consistency and fail to capture the internal relationships across different views. In this paper, we tackle this challenge through progressive depth prediction and image warping to approximate novel views. Subsequently, we train a multi-view diffusion model to complete occluded and unknown clothing regions, informed by the evolving camera pose. By jointly inferring RGB and depth, GarmentCrafter enforces inter-view coherence and reconstructs precise geometries and fine details. Extensive experiments demonstrate that our method achieves superior visual fidelity and inter-view coherence compared to state-of-the-art single-view 3D garment reconstruction methods. Our model will be publicly available.
Given a garment image, our method performs depth-aware novel view synthesis along a predefined zigzag camera trajectory. At each camera rotation, the current point cloud is projected into image space, yielding incomplete RGB and depth images. Our diffusion model completes the RGB image using the warped view, the input image, and the camera pose as conditions, while a depth completion network refines the depth map using the completed RGB, the warped depth, and the camera pose. The re-projected point cloud is then merged with the previous one to produce an updated point cloud. This iterative process continues until a full 3D representation of the garment is obtained.
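To make the warping step concrete, the following is a minimal sketch (not the authors' implementation) of how a colored point cloud can be projected into a target view to produce the incomplete RGB and depth images that condition the completion networks. It assumes a pinhole camera with intrinsics K and a world-to-camera pose (R, t); all function and variable names are illustrative.

```python
# Hypothetical sketch of the point-cloud warping step: project a colored
# point cloud into a new camera view with z-buffering, producing partial
# RGB, partial depth, and a validity mask of warped pixels.
import numpy as np

def warp_point_cloud(points, colors, K, R, t, height, width):
    """Project a colored point cloud into a target view.

    points: (N, 3) world-space coordinates
    colors: (N, 3) RGB values in [0, 1]
    K:      (3, 3) pinhole intrinsics (assumed)
    R, t:   world-to-camera rotation (3, 3) and translation (3,)
    Returns partial RGB (H, W, 3), depth (H, W), and a validity mask (H, W).
    """
    cam = points @ R.T + t                      # transform to camera frame
    in_front = cam[:, 2] > 1e-6                 # keep points in front of the camera
    cam, colors = cam[in_front], colors[in_front]

    proj = cam @ K.T                            # perspective projection
    u = np.round(proj[:, 0] / proj[:, 2]).astype(int)
    v = np.round(proj[:, 1] / proj[:, 2]).astype(int)
    z = cam[:, 2]

    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z, colors = u[inside], v[inside], z[inside], colors[inside]

    rgb = np.zeros((height, width, 3), dtype=np.float32)
    depth = np.full((height, width), np.inf, dtype=np.float32)

    # z-buffer via painter's algorithm: draw far-to-near so nearer points win
    order = np.argsort(-z)
    rgb[v[order], u[order]] = colors[order]
    depth[v[order], u[order]] = z[order]

    mask = np.isfinite(depth)                   # pixels covered by the warp
    depth[~mask] = 0.0
    return rgb, depth, mask
```

In this reading of the pipeline, the pixels left empty by the warp are exactly what the multi-view diffusion model and the depth completion network fill in, after which the completed view is re-projected to 3D and merged into the running point cloud before the next camera step.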