Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation

Yuxuan Wang*1, Xuanyu Yi*1, Haohan Weng*2, Qingshan Xu1, Xiaokang Wei3, Xianghui Yang2, Chunchao Guo2, Long Chen4, Hanwang Zhang1
1Nanyang Technological University, 2Tencent Hunyuan,
3The Hong Kong Polytechnic University, 4Hong Kong University of Science and Technology
(*Equal Contribution)

Mesh assets generated by Nautilus: given a point cloud or a single image as input, Nautilus directly generates aesthetic, artist-like mesh assets.

Abstract

We propose Nautilus, a locality-aware autoregressive autoencoder for artist-like mesh generation. By leveraging the local properties of manifold meshes, it achieves structural fidelity and efficient representation, enabling the generation of meshes with an unprecedented scale of up to 5,000 faces. Extensive experiments demonstrate that Nautilus significantly outperforms state-of-the-art methods in both fidelity and scalability.

Mesh assets generated by Nautilus: we added basic textures to highlight the geometry and to showcase potential 3D applications such as AR/VR, gaming, and digital design.

Autoregressive Modeling

Tokenize Meshes into Sequence

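Most previous methods employ vanilla tokenization, which processes each face in isolation and simply flattens the three vertices of each triangle face into a 1D sequence. Because neighboring faces share vertices, every shared vertex is written out again for each face that uses it.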
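To make the cost of vanilla tokenization concrete, here is a minimal sketch. The coordinate range of [-0.5, 0.5], the 128-bin quantization, and the function names are assumptions made for illustration; they are not taken from the Nautilus implementation.

```python
def quantize(vertex, resolution=128):
    """Map coordinates in [-0.5, 0.5] to integer bins in [0, resolution - 1].

    The range and the resolution are assumed values for this illustration.
    """
    return tuple(
        min(resolution - 1, max(0, int((c + 0.5) * resolution)))
        for c in vertex
    )


def vanilla_tokenize(vertices, faces, resolution=128):
    """Flatten every triangle independently: 3 vertices x 3 coordinates each."""
    tokens = []
    for face in faces:
        for vertex_index in face:
            tokens.extend(quantize(vertices[vertex_index], resolution))
    return tokens


# A square split along a diagonal: two triangles sharing the edge (0, 2).
vertices = [(-0.5, -0.5, 0.0), (0.5, -0.5, 0.0), (0.5, 0.5, 0.0), (-0.5, 0.5, 0.0)]
faces = [(0, 1, 2), (0, 2, 3)]

tokens = vanilla_tokenize(vertices, faces)
print(len(tokens))  # 18 tokens for 2 faces; vertices 0 and 2 each appear twice
```

Each face always costs nine tokens in this scheme, regardless of how many of its vertices were already emitted by a neighboring face.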

We introduce a novel Nautilus-style Tokenization that traverses mesh faces in the form of Nautilus shells. Each shell organizes faces around a central vertex O with an ordered sequence of surrounding vertices P. This representation keeps neighboring vertices adjacent in the sequence and effectively compresses the sequence length.
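The shell idea can be illustrated with a small sketch that builds a single shell from the faces around one center vertex. This is a simplified illustration only: it assumes consistently oriented triangles forming a single fan around the center, and the function names are hypothetical. The actual traversal order, the assignment of faces to shells, and the coordinate encoding used by Nautilus are not described on this page.

```python
def rotate_to_center(face, center):
    """Cyclically rotate an oriented triangle so that `center` comes first."""
    i = face.index(center)
    return face[i:] + face[:i]


def build_shell(faces, center):
    """Return the ordered ring P of vertices surrounding `center` (vertex O)."""
    # For each triangle (center, a, b), record that b follows a in the ring.
    next_vertex = {}
    for face in faces:
        if center in face:
            _, a, b = rotate_to_center(tuple(face), center)
            next_vertex[a] = b
    if not next_vertex:
        return []
    # An open fan starts at the vertex with no predecessor;
    # a closed fan can start anywhere, so pick the smallest index.
    ends = set(next_vertex.values())
    starts = [a for a in next_vertex if a not in ends]
    start = starts[0] if starts else min(next_vertex)
    ring = [start]
    while ring[-1] in next_vertex and len(ring) <= len(next_vertex):
        ring.append(next_vertex[ring[-1]])
    return ring


# Four triangles fanning around vertex 0, given in scrambled order.
faces = [(0, 3, 4), (1, 2, 0), (4, 5, 0), (2, 3, 0)]

ring = build_shell(faces, center=0)
shell_vertices = 1 + len(ring)     # O plus the ring P
vanilla_vertices = 3 * len(faces)  # three vertices per face
print(ring, shell_vertices, vanilla_vertices)  # [1, 2, 3, 4, 5] 6 12
```

In this toy case the shell stores 6 vertices where face-by-face flattening stores 12, and consecutive ring entries are always connected by a mesh edge.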



Autoregressive Generation

Given the input condition, the transformer decoder autoregressively generates the mesh sequences tokenized by the Nautilus-style algorithm. During detokenization, the output tokens are converted into interconnected mesh faces, which form the generated mesh assets. For point cloud conditioning, we introduce a Dual-stream Point Conditioner that captures both global shape information and fine-grained local geometry, ensuring global shape consistency while enhancing local structure fidelity.
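As a rough sketch of the detokenization step, each shell can be expanded into a fan of triangles (O, P_i, P_{i+1}). The shell format used here, a pair of a center index and a ring of vertex indices, is an assumption for illustration, as is the defensive removal of duplicate and degenerate faces; the real decoder works on coordinate tokens and may differ in detail.

```python
def canonical(face):
    """Rotate a triangle so its smallest index comes first, keeping orientation."""
    i = face.index(min(face))
    return face[i:] + face[:i]


def detokenize_shells(shells):
    """Expand (center, ring) shells into a list of triangle faces."""
    faces, seen = [], set()
    for center, ring in shells:
        # Consecutive ring vertices plus the center form one triangle.
        for a, b in zip(ring, ring[1:]):
            face = (center, a, b)
            if len(set(face)) < 3:
                continue  # skip degenerate triangles
            key = canonical(face)
            if key not in seen:  # skip faces already produced by another shell
                seen.add(key)
                faces.append(face)
    return faces


# Hypothetical decoder output: the second shell only repeats existing faces.
shells = [(0, [1, 2, 3, 4, 5]), (3, [4, 0, 2])]

faces = detokenize_shells(shells)
print(faces)  # [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 5)]
```

Because neighboring triangles in a shell share the center and one ring vertex, the reconstructed faces are connected by construction.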

Achieve Artist-like Mesh Results

Conditioned on Point cloud

Our default input condition for mesh generation is a point cloud, given its easy accessibility and rich geometric information. The videos below present the point-cloud-conditioned generation results of Nautilus.

Our test conditions are highly challenging, including thin structures, anisotropic faces, and intricate geometric details. Nautilus handles these cases successfully and achieves superior quality.



Conditioned on Image

To further expand practical applications, we extend Nautilus to support image-conditioned mesh generation, where it generates detailed, manifold meshes with sharp features that accurately align with the input image.

BibTeX

@article{wang2025nautilus,
  title={Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation},
  author={Wang, Yuxuan and Yi, Xuanyu and Weng, Haohan and Xu, Qingshan and Wei, Xiaokang and Yang, Xianghui and Guo, Chunchao and Chen, Long and Zhang, Hanwang},
  journal={arXiv preprint arXiv:2501.14317},
  year={2025}
}