PointInfinity: Resolution-Invariant Point Diffusion Models

Zixuan Huang¹,², Justin Johnson¹, Shoubhik Debnath¹, James M. Rehg², Chao-Yuan Wu¹

¹FAIR at Meta, ²UIUC

CVPR 2024

PointInfinity is a point diffusion model that trains on low-resolution point clouds yet generates faithful high-resolution point clouds. Performance improves continuously as the inference resolution increases.

Abstract

We present PointInfinity, an efficient family of point cloud diffusion models. Our core idea is to use a transformer-based architecture with a fixed-size, resolution-invariant latent representation. This enables efficient training with low-resolution point clouds, while allowing high-resolution point clouds to be generated during inference. More importantly, we show that scaling the test-time resolution beyond the training resolution improves the fidelity of generated point clouds and surfaces. We analyze this phenomenon and draw a link to classifier-free guidance commonly used in diffusion models, demonstrating that both allow trading off fidelity and variability during inference. Experiments on CO3D show that PointInfinity can efficiently generate high-resolution point clouds (up to 131k points, 31 times more than Point-E) with state-of-the-art quality.

How Does PointInfinity Work?

PointInfinity is a conditional point diffusion model that generates point clouds based on RGBD images. During training, PointInfinity learns to denoise low-resolution point clouds, while during inference it generates point clouds of much higher resolution. The figure below is an overview of PointInfinity's setup.
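To make the train-low, sample-high setup concrete, here is a minimal sketch of a sampling loop in which the number of generated points is simply an argument chosen at inference time. It is not the released implementation. The sampler is standard DDPM ancestral sampling with a linear noise schedule, which is an assumption on our part, and `toy_denoiser`, `sample`, and the `cond` array (imagined as 3D points unprojected from an RGBD image) are hypothetical names introduced only for illustration.

```python
import numpy as np

def toy_denoiser(x_t, alpha_bar_t, cond):
    """Hypothetical stand-in for a trained noise predictor.

    It pretends the clean shape is a single point at the centroid of `cond`
    and returns the noise implied by that guess. A real denoiser would be a
    learned network; this one only keeps the loop runnable.
    """
    x0_hat = cond.mean(axis=0)  # (3,), broadcast over all points
    return (x_t - np.sqrt(alpha_bar_t) * x0_hat) / np.sqrt(1.0 - alpha_bar_t)

def sample(num_points, cond, num_steps=1000, seed=0):
    """DDPM-style ancestral sampling (assumed sampler) for `num_points` points.

    `num_points` is free at inference time: nothing in the loop depends on
    the resolution that was used for training.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, num_steps)  # assumed linear schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.normal(size=(num_points, 3))  # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        eps = toy_denoiser(x, alpha_bars[t], cond)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added on the final step.
        noise = rng.normal(size=x.shape) if t > 0 else np.zeros_like(x)
        x = mean + np.sqrt(betas[t]) * noise
    return x

if __name__ == "__main__":
    cond = np.array([[0.0, 0.0, 1.0], [2.0, 0.0, 1.0]])  # made-up condition points
    for n in (1024, 8192):  # e.g. a "training" and a higher "test" resolution
        print(n, sample(n, cond).shape)
```

Only the shape of the initial noise changes between the two calls; whether quality holds up at the higher resolution depends entirely on the denoiser architecture.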

The core of PointInfinity is the two-stream block. It is the main building block of our denoiser and decouples the underlying surface representation from the raw point cloud representation. As shown below, the two-stream block consists of 1) a fixed-size latent surface stream, 2) a variable-size raw data stream, and 3) lightweight read/write modules that exchange information between the two streams. The read module loads information from the data stream into the latent surface stream, where the main computation is performed. The per-point noise prediction is then obtained by writing back to the data stream. The key idea is to execute most of the computation in the resolution-invariant latent space, which makes the denoiser robust to changes in resolution.
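The read, compute, and write pattern described above can be sketched in a few lines of NumPy. This is a simplified illustration under our own assumptions, not the actual denoiser: we assume single-head cross-attention for the read and write modules and a single self-attention layer for the latent computation, with random untrained weights and no normalization or MLP layers. The names `TwoStreamBlock`, `cross_attention`, `dim`, and `num_latents` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head attention: each query row attends over all context rows."""
    q, k, v = queries @ w_q, context @ w_k, context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

class TwoStreamBlock:
    """Toy two-stream block with random (untrained) weights."""

    def __init__(self, dim=32, num_latents=16, seed=0):
        rng = np.random.default_rng(seed)

        def init():
            return rng.normal(scale=dim ** -0.5, size=(dim, dim))

        # Fixed-size latent stream: its shape never depends on the point count.
        self.latents = rng.normal(size=(num_latents, dim))
        self.read_w = [init() for _ in range(3)]
        self.latent_w = [init() for _ in range(3)]
        self.write_w = [init() for _ in range(3)]

    def __call__(self, data):
        z = self.latents
        # Read: latents query the variable-size data stream.
        z = z + cross_attention(z, data, *self.read_w)
        # Main computation happens among the latents only.
        z = z + cross_attention(z, z, *self.latent_w)
        # Write: every point queries the latents to get its own output.
        out = data + cross_attention(data, z, *self.write_w)
        return out, z

if __name__ == "__main__":
    block = TwoStreamBlock()
    rng = np.random.default_rng(1)
    for n in (1024, 8192):
        out, z = block(rng.normal(size=(n, 32)))
        print(n, out.shape, z.shape)
```

In this sketch the latent self-attention cost is independent of the number of points, while the read and write steps grow only linearly with it, which is one way to see why most of the computation can stay resolution-invariant.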

Results

We evaluate PointInfinity on the CO3D dataset. The figure below shows qualitative results. PointInfinity's performance improves as it generates higher-resolution point clouds. We hypothesize that this is because more information is carried across the diffusion steps. Please refer to our paper for a thorough analysis, together with more quantitative results and ablations.

BibTeX

@inproceedings{huang2024pointinfinity,
  author    = {Huang, Zixuan and Johnson, Justin and Debnath, Shoubhik and Rehg, James M and Wu, Chao-Yuan},
  title     = {PointInfinity: Resolution-Invariant Point Diffusion Models},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages     = {10050--10060},
  year      = {2024},
}