PilotANN: A Hybrid CPU-GPU System for Graph-Based ANNS


Approximate Nearest Neighbor Search (ANNS) is a fundamental vector search technique that efficiently identifies similar items in high-dimensional vector spaces. Traditionally, ANNS has served as the backbone for retrieval engines and recommendation systems. However, it struggles to keep pace with modern Transformer architectures that employ higher-dimensional embeddings and larger datasets. Unlike deep learning systems that can be horizontally scaled due to their stateless nature, ANNS remains centralized, creating a severe single-machine throughput bottleneck. Empirical testing with 100-million-scale datasets reveals that even state-of-the-art CPU implementations of the Hierarchical Navigable Small World (HNSW) algorithm can’t maintain adequate performance as vector dimensions increase.

Previous research on large-scale ANNS has explored two optimization paths: index structure improvements and hardware acceleration. The Inverted MultiIndex (IMI) enhanced space partitioning through multi-codebook quantization, while PQFastScan improved performance with SIMD and cache-aware optimizations. DiskANN and SPANN introduced disk-based indexing for billion-scale datasets, addressing memory hierarchy challenges through different approaches. SONG and CAGRA achieved impressive speedups through GPU parallelization but remain constrained by GPU memory capacity. BANG handled billion-scale datasets via hybrid CPU-GPU processing but lacked critical CPU baseline comparisons. These methods often sacrifice compatibility or accuracy, or require specialized hardware.

Researchers from the Chinese University of Hong Kong, Centre for Perceptual and Interactive Intelligence, and Theory Lab of Huawei Technologies have proposed PilotANN, a hybrid CPU-GPU system designed to overcome the limitations of existing ANNS implementations. PilotANN addresses a two-sided challenge: CPU-only implementations struggle with computational demands, while GPU-only solutions are constrained by limited memory capacity. It resolves this by using both the abundant RAM of CPUs and the parallel processing capabilities of GPUs. It employs a three-stage graph traversal process: GPU-accelerated subgraph traversal using dimensionally-reduced vectors, CPU refinement, and precise search with complete vectors.

PilotANN fundamentally reimagines the vector search process through a “staged data ready processing” paradigm. It minimizes data movement across processing stages rather than adhering to traditional “move data for computation” models. The pipeline consists of three stages: GPU piloting with a subgraph and dimensionally-reduced vectors, residual refinement using the subgraph with full vectors, and final traversal employing the full graph and complete vectors. The design is cost-effective with only a single commodity GPU while scaling effectively across vector dimensions and graph complexity. Data transfer overhead is limited to the initial query vector moving to the GPU and a small candidate set returning to the CPU after GPU piloting.
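The three stages can be sketched in a few lines of pure Python. This is a toy illustration under loud assumptions, not PilotANN's implementation: everything runs on the CPU, a brute-force k-NN graph stands in for a real index such as HNSW, dimensionality reduction is approximated by simply keeping the first few coordinates, and all names (`pilot_search`, `greedy_search`, `knn_graph`, `n_seeds`) are hypothetical.

```python
import heapq
import random

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_graph(vectors, node_ids, degree):
    """Brute-force k-NN graph over node_ids (toy stand-in for a real index build)."""
    graph = {}
    for i in node_ids:
        ranked = sorted((sqdist(vectors[i], vectors[j]), j) for j in node_ids if j != i)
        graph[i] = [j for _, j in ranked[:degree]]
    return graph

def greedy_search(graph, vectors, query, entries, ef):
    """Best-first traversal; returns up to ef (distance, node) pairs, nearest first."""
    visited = set(entries)
    frontier = [(sqdist(vectors[n], query), n) for n in entries]
    heapq.heapify(frontier)
    best = [(-d, n) for d, n in frontier]  # max-heap on distance via negation
    heapq.heapify(best)
    while len(best) > ef:
        heapq.heappop(best)
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(best) >= ef and d > -best[0][0]:
            break  # closest open candidate is worse than the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = sqdist(vectors[nb], query)
            if len(best) < ef or dn < -best[0][0]:
                heapq.heappush(frontier, (dn, nb))
                heapq.heappush(best, (-dn, nb))
                if len(best) > ef:
                    heapq.heappop(best)
    return sorted((-nd, n) for nd, n in best)

def pilot_search(query, full_vecs, low_vecs, full_graph, sub_graph, entry, k,
                 ef=16, n_seeds=4):
    d_low = len(low_vecs[0])
    # Stage 1 (done on the GPU in PilotANN; plain Python here):
    # subgraph traversal over reduced vectors.
    stage1 = greedy_search(sub_graph, low_vecs, query[:d_low], [entry], ef)
    seeds = [n for _, n in stage1[:n_seeds]]  # small candidate set handed onward
    # Stage 2: refinement on the same subgraph, now with full vectors.
    stage2 = greedy_search(sub_graph, full_vecs, query, seeds, ef)
    seeds = [n for _, n in stage2[:n_seeds]]
    # Stage 3: final traversal on the full graph with full vectors.
    return greedy_search(full_graph, full_vecs, query, seeds, ef)[:k]

# Toy data: sizes and the every-third-node subgraph are arbitrary assumptions.
rng = random.Random(0)
DIM, D_LOW, N = 12, 4, 300
data = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N)]
low = [v[:D_LOW] for v in data]  # assumed reduction: keep the first D_LOW coordinates
all_ids = list(range(N))
sub_ids = all_ids[::3]
full_graph = knn_graph(data, all_ids, degree=8)
sub_graph = knn_graph(data, sub_ids, degree=8)
query = [rng.gauss(0, 1) for _ in range(DIM)]
hits = pilot_search(query, data, low, full_graph, sub_graph, entry=sub_ids[0], k=5)
print(hits)
```

Only the query and the short `seeds` list cross stage boundaries here, which mirrors the article's point that data movement is limited to the query going in and a small candidate set coming back.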

Experimental results show PilotANN’s performance advantages across diverse large-scale datasets. PilotANN achieves a 3.9 times throughput speedup on the 96-dimensional DEEP dataset compared to the HNSW-CPU baseline, with even more impressive gains of 5.1-5.4 times on higher-dimensional datasets. PilotANN delivers significant speedups even on the notoriously challenging T2I dataset despite no specific optimizations for this benchmark. Moreover, it shows remarkable cost-effectiveness despite using more expensive hardware. While the GPU-based platform costs 2.81 USD/hour compared to the CPU-only solution at 1.69 USD/hour, PilotANN achieves 2.3 times cost-effectiveness for DEEP and 3.0-3.2 times for T2I, WIKI, and LAION datasets when measuring throughput per dollar.
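The cost-effectiveness figures follow from the speedups and hourly prices quoted above. As a quick check, assuming cost-effectiveness means relative throughput per dollar (speedup multiplied by the CPU-to-GPU price ratio), a hypothetical helper `cost_effectiveness` reproduces them to within rounding:

```python
GPU_COST_PER_HOUR = 2.81  # USD/hour, GPU-based platform (from the article)
CPU_COST_PER_HOUR = 1.69  # USD/hour, CPU-only solution (from the article)

def cost_effectiveness(speedup):
    """Throughput-per-dollar gain over the CPU baseline for a given throughput speedup."""
    return speedup * CPU_COST_PER_HOUR / GPU_COST_PER_HOUR

# Speedups quoted in the article: 3.9x on DEEP, 5.1x to 5.4x on higher-dimensional data.
for label, speedup in [("DEEP", 3.9), ("range low", 5.1), ("range high", 5.4)]:
    print(f"{label}: {cost_effectiveness(speedup):.2f}x")
```

The 3.9 times speedup works out to about 2.3 times per dollar, and the 5.1-5.4 range to a little over 3 times, consistent with the reported numbers.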

In conclusion, the researchers introduced PilotANN, an advancement in graph-based ANNS that effectively uses CPU and GPU resources for emerging workloads. It shows strong performance over existing CPU-only approaches through the intelligent decomposition of top-k search into a multi-stage CPU-GPU pipeline and the implementation of efficient entry selection. It democratizes high-performance nearest neighbor search by achieving competitive results with a single commodity GPU, making advanced search capabilities accessible to researchers and organizations with limited computing resources. Unlike alternative solutions requiring expensive high-end GPUs, PilotANN enables efficient ANNS deployment on common hardware configurations while maintaining search accuracy.


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

Sajjad Ansari is a final year undergraduate from IIT Kharagpur. As a tech enthusiast, he delves into the practical applications of AI with a focus on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.
