
Understanding Cache Coherency
Modern high-performance devices communicate with the CPU through shared memory structures such as DMA Rings. When one side updates memory, the other side must see the latest value. On cache-coherent systems this happens automatically. On many ARM platforms it does not. This post explains what breaks, why it breaks, and how the Linux DMA API solves it.

Why DMA Fails on Non-Coherent Systems

Consider the completion flow from the earlier ring design in "How Hardware and Software Share a Queue: Understanding DMA Rings":

1. Device DMA-writes a completion entry
2. Device updates WR_IDX
3. CPU reads WR_IDX and processes new entries

On a non-coherent system the driver may:

- read an old WR_IDX
- read a partially updated descriptor
- never observe new completions

This happens because the CPU and the DMA engine do not observe memory through the same path.

System Hardware View

+----------------------+
|         CPU          |
|   Driver (load/store)|
+----------+-----------+
           |
      +----v----+
      |  Cache  |  (L1/L2)
      +----+----+
           |
           |
    +------v
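
The stale WR_IDX read described above can be modeled in a few lines of user-space C. This is only a toy sketch, not kernel code: the struct toy_system type and the helpers dma_write, cpu_load, and cpu_invalidate are hypothetical names invented for illustration, and the "cache" is a plain array standing in for the CPU's L1/L2. It assumes the device writes straight to RAM while the CPU keeps reading its cached copy until that copy is explicitly invalidated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MEM_WORDS 4u
#define WR_IDX    0u  /* assumption: word 0 holds the device's write index */

/* Toy model: "ram" is what the DMA engine sees, "cache" is what the CPU sees. */
struct toy_system {
    uint32_t ram[MEM_WORDS];
    uint32_t cache[MEM_WORDS];
    bool     cached[MEM_WORDS];  /* true if the CPU holds a cached copy */
};

/* Device side: a DMA write goes straight to RAM and never touches the cache. */
static void dma_write(struct toy_system *s, uint32_t addr, uint32_t val)
{
    assert(addr < MEM_WORDS);
    s->ram[addr] = val;
}

/* CPU side: a load hits the cache if a copy exists, otherwise fills from RAM. */
static uint32_t cpu_load(struct toy_system *s, uint32_t addr)
{
    assert(addr < MEM_WORDS);
    if (!s->cached[addr]) {
        s->cache[addr] = s->ram[addr];
        s->cached[addr] = true;
    }
    return s->cache[addr];
}

/* Explicit invalidate: drop the cached copy so the next load refetches RAM. */
static void cpu_invalidate(struct toy_system *s, uint32_t addr)
{
    assert(addr < MEM_WORDS);
    s->cached[addr] = false;
}
```

In this model, if the CPU loads WR_IDX once, the device then calls dma_write to advance it, and the CPU loads again, the second load still returns the old value. Only after cpu_invalidate does the CPU observe the update. In a real driver that step is the job of the Linux DMA API (for streaming mappings, calls such as dma_sync_single_for_cpu()), not a hand-written helper.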



