
# Distributed LLM Inference Across NVIDIA Blackwell and Apple Silicon Over 10GbE
I connected an NVIDIA DGX Spark to a Mac Studio with a direct 10-gigabit Ethernet cable and split a large language model across both GPUs. Here's what actually happened.

## The Problem

I have two machines that are excellent at different things:

- NVIDIA DGX Spark (GB10 Blackwell, 120 GB unified memory) — screaming fast tensor cores, CUDA 13
- Mac Studio (M2 Ultra, 128 GB unified memory) — great Metal GPU, massive memory bandwidth

Combined: 248 GB of GPU-accessible memory. Enough to run models that don't fit on either machine alone — 100B+ parameter models at reasonable quantization levels.

The question: can you actually get useful performance by splitting a model across heterogeneous GPUs over a network link?

## The Physical Setup

I connected both machines with a direct 10GbE cable — no switch, no router. Just a CAT6A cable between:

- DGX: Realtek 10GbE NIC (`enP7s7`) → 192.168.100.2/24
- Mac Studio: 10GbE port (`en0`) → 192.168.100.1/24

Measured throughput: 9.41 Gbps. Both machines keep WiFi for
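Whether a network split can work at all comes down to how much data must cross the wire per token. In a pipeline-style split, each generated token sends one hidden-state vector across the boundary, so the cost is easy to bound. The sketch below is a back-of-envelope estimate, not a measurement: the hidden size (12,288) and fp16 activation width are assumptions standing in for a generic ~100B dense model; only the 9.41 Gbps figure comes from the link test above.

```python
# Back-of-envelope: per-token wire cost of a pipeline split over the 10GbE link.
# Assumed numbers (hypothetical model, not measured):
#   HIDDEN_SIZE = 12288  -- plausible for a ~100B-parameter dense transformer
#   BYTES_PER_ACT = 2    -- fp16 activations at the split point
LINK_GBPS = 9.41          # measured usable throughput of the direct cable
HIDDEN_SIZE = 12_288
BYTES_PER_ACT = 2

# One hidden-state vector crosses the boundary per generated token.
payload_bytes = HIDDEN_SIZE * BYTES_PER_ACT

# Serialization time on the wire (ignores NIC/stack latency, which adds more).
transfer_s = payload_bytes * 8 / (LINK_GBPS * 1e9)

print(f"{payload_bytes} B per token -> {transfer_s * 1e6:.1f} us on the wire")
# -> 24576 B per token -> 20.9 us on the wire
```

At roughly 21 microseconds of wire time per token, raw bandwidth is not the bottleneck for single-stream decoding; round-trip latency of the network stack and any per-request synchronization matter far more.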




