HQ-SAM: A high-quality upgrade to SAM for accurate zero-shot segmentation of intricate objects.
HQ-SAM (High-Quality Segment Anything Model) is an upgrade to the Segment Anything Model (SAM) designed to accurately segment objects with intricate structures. It preserves SAM's promptable design and zero-shot generalization while adding minimal extra parameters and computation. HQ-SAM introduces a learnable High-Quality Output Token that is injected into SAM's mask decoder and fused with both early and final ViT features to recover fine mask details. The model was trained on a dataset of 44K fine-grained masks and evaluated on a suite of 9 diverse segmentation datasets spanning different downstream tasks, including zero-shot transfer protocols.

The code requires python>=3.8, pytorch>=1.7, and torchvision>=0.8, and can be installed by following the provided instructions. Pretrained checkpoints are available for download, and usage examples are provided in the demo and Colab notebook. If you use HQ-SAM in your research, the authors ask that you cite their paper.
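Since HQ-SAM keeps SAM's promptable interface, point- or box-prompted prediction looks much like it does with the original SAM. The sketch below is illustrative only: the package name `segment_anything_hq`, the registry key `vit_l`, and the checkpoint filename are assumptions based on the SAM-style API, and the checkpoint must be downloaded separately.

```python
import os
import numpy as np

# Hypothetical checkpoint path -- download a pretrained HQ-SAM
# checkpoint separately before running the prompted-prediction part.
checkpoint = "sam_hq_vit_l.pth"

# A single foreground point prompt: (x, y) coordinates plus a label,
# where 1 marks foreground and 0 marks background.
point_coords = np.array([[320, 240]])
point_labels = np.array([1])

if os.path.exists(checkpoint):
    # Assumed SAM-style API; exact names may differ in the actual repo.
    from segment_anything_hq import sam_model_registry, SamPredictor

    sam = sam_model_registry["vit_l"](checkpoint=checkpoint)
    predictor = SamPredictor(sam)

    image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in RGB image
    predictor.set_image(image)

    # HQ-SAM reuses SAM's promptable interface for zero-shot prediction.
    masks, scores, logits = predictor.predict(
        point_coords=point_coords,
        point_labels=point_labels,
        multimask_output=False,
    )
```

In practice you would replace the stand-in array with an image loaded via OpenCV or PIL (converted to RGB), and inspect `masks` and `scores` to pick the best prediction.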