
# Distributed LLM Inference Across NVIDIA Blackwell and Apple Silicon Over 10GbE
I connected an NVIDIA DGX Spark to a Mac Studio with a direct 10-gigabit Ethernet cable and split a large language model across both GPUs. Here's what actually happened.

## The Problem

I have two machines that are excellent at different things:

- NVIDIA DGX Spark (GB10 Blackwell, 120 GB unified memory) — screaming fast tensor cores, CUDA 13
- Mac Studio (M2 Ultra, 128 GB unified memory) — great Metal GPU, massive memory bandwidth

Combined: 248 GB of GPU-accessible memory. Enough to run models that don't fit on either machine alone — 100B+ parameter models at reasonable quantization levels.

The question: can you actually get useful performance by splitting a model across heterogeneous GPUs over a network link?

## The Physical Setup

I connected both machines with a direct 10GbE cable — no switch, no router. Just a CAT6A cable between:

- DGX: Realtek 10GbE NIC (`enP7s7`) → 192.168.100.2/24
- Mac Studio: 10GbE port (`en0`) → 192.168.100.1/24

Measured throughput: 9.41 Gbps. Both machines keep WiFi for
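Whether a network split can work at all comes down to how much data must cross the wire per token. In a pipeline-style split, each generated token sends one hidden-state vector across the boundary, so the cost is easy to bound. The sketch below is a back-of-envelope estimate, not a measurement: the hidden size (12,288) and fp16 activation width are assumptions standing in for a generic ~100B dense model; only the 9.41 Gbps figure comes from the link test above.

```python
# Back-of-envelope: per-token wire cost of a pipeline split over the 10GbE link.
# Assumed numbers (hypothetical model, not measured):
#   HIDDEN_SIZE = 12288  -- plausible for a ~100B-parameter dense transformer
#   BYTES_PER_ACT = 2    -- fp16 activations at the split point
LINK_GBPS = 9.41          # measured usable throughput of the direct cable
HIDDEN_SIZE = 12_288
BYTES_PER_ACT = 2

# One hidden-state vector crosses the boundary per generated token.
payload_bytes = HIDDEN_SIZE * BYTES_PER_ACT

# Serialization time on the wire (ignores NIC/stack latency, which adds more).
transfer_s = payload_bytes * 8 / (LINK_GBPS * 1e9)

print(f"{payload_bytes} B per token -> {transfer_s * 1e6:.1f} us on the wire")
# -> 24576 B per token -> 20.9 us on the wire
```

At roughly 21 microseconds of wire time per token, raw bandwidth is not the bottleneck for single-stream decoding; round-trip latency of the network stack and any per-request synchronization matter far more.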




