
NexusQuant is now on PyPI, HuggingFace, and 9 awesome lists
This week we shipped everything. Here is the full list.

What went out the door

PyPI package: pip install nexusquant works. One line, no retraining, drop-in KV cache compression for any HuggingFace model.
HuggingFace Space: live interactive demo at huggingface.co/spaces/jagmarques/nexusquant. Upload a model, pick a compression ratio, and see perplexity in real time.
Google Colab notebook: a zero-setup walkthrough. Run the full pipeline in your browser without a local GPU.
13 blog posts: everything from E8 lattice quantization to attention-aware eviction, each with reproducible numbers and code.
9 awesome list PRs: submitted to awesome-llm, awesome-efficient-transformers, awesome-kv-cache, and six others. Four are already merged.
5 GitHub issues: filed against PyTorch, vLLM, HuggingFace Transformers, LiteLLM, and llama.cpp to track upstream integration roadmap items.
NeurIPS paper draft: the research that underpins all of this. NSN + Hadamard + E8 Lattice VQ + TCC giving 7x compre
Continue reading on Dev.to


