Flash-KMeans

Flash-KMeans: The GPU-Powered Clustering Algorithm That Beats FAISS by 200x

17 June 2026 by Daniel

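Flash-KMeans is an open-source, IO-aware implementation of standard Lloyd’s k-means, written as Triton GPU kernels. It leaves the math unchanged and makes no approximations, so its results are exact. Two kernel-level techniques account for the speedup: FlashAssign performs the assignment step without materializing the full point-to-centroid distance matrix, and Sort-Inverse Update removes atomic contention from the centroid update. On an NVIDIA H200, the authors report a 17.9× end-to-end speedup, 33× over cuML, and more than 200× over FAISS.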
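To make the two ideas concrete, here is a minimal NumPy sketch of one way they can be expressed on a CPU. It is not the project's Triton code and makes no claim about how the real kernels are organized. The function names (`assign_tiled`, `update_sorted`, `lloyd`) and the `tile` parameter are hypothetical, chosen only for illustration. The assignment step walks the centroids in tiles and keeps a running minimum, so only an N × tile block of distances exists at any time. The update step sorts points by label and sums each contiguous run, which stands in for the many-writers-to-one-centroid pattern that atomics would otherwise serialize on a GPU.

```python
import numpy as np


def assign_tiled(x, centroids, tile=256):
    """Nearest-centroid assignment without building the full N x K matrix."""
    n = x.shape[0]
    best_dist = np.full(n, np.inf)
    best_idx = np.zeros(n, dtype=np.int64)
    x_sq = (x * x).sum(axis=1)
    for start in range(0, centroids.shape[0], tile):
        c = centroids[start:start + tile]
        # Squared distances to this tile of centroids only: shape (N, tile).
        d = x_sq[:, None] - 2.0 * (x @ c.T) + (c * c).sum(axis=1)[None, :]
        local = d.argmin(axis=1)
        local_dist = d[np.arange(n), local]
        # Keep a running minimum across tiles instead of storing every distance.
        better = local_dist < best_dist
        best_dist[better] = local_dist[better]
        best_idx[better] = start + local[better]
    return best_idx


def update_sorted(x, labels, centroids):
    """Centroid update via sort + segmented sum instead of scattered adds."""
    order = np.argsort(labels, kind="stable")
    sorted_labels = labels[order]
    # After sorting, each non-empty cluster is one contiguous run of rows.
    ids, starts, counts = np.unique(
        sorted_labels, return_index=True, return_counts=True
    )
    sums = np.add.reduceat(x[order], starts, axis=0)
    new_centroids = centroids.copy()  # assumption: empty clusters keep old value
    new_centroids[ids] = sums / counts[:, None]
    return new_centroids


def lloyd(x, k, iters=10, seed=0):
    """Plain Lloyd iterations built from the two steps above (iters >= 1)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(x.shape[0], size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign_tiled(x, centroids)
        centroids = update_sorted(x, labels, centroids)
    return centroids, labels
```

Calling `lloyd(x, k)` on a float array `x` of shape `(N, d)` returns the centroids and the labels from the final assignment pass. Both helper functions compute the same quantities as a textbook Lloyd iteration; only the order of memory traffic differs, which is the sense in which an IO-aware implementation can stay exact.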

The post Meet Flash-KMeans: An IO-Aware, Exact K-Means That Runs Over 200× Faster Than FAISS on GPUs appeared first on MarkTechPost.