Memory Coalescing: Same Computation, 6x Performance Difference

By Myoungho Shin, via Dev.to

In software engineering, if two approaches are both O(n), that is often good enough for the discussion. But in low-level or performance engineering, that is not the end of the story. Even when two algorithms have the same time complexity, actual performance can differ widely depending on how they access memory. A simple example is iterating through an array versus a linked list. Both are O(n), but arrays are usually much faster in practice because their memory layout is contiguous, which lets the CPU use its caches far more efficiently. The same idea applies on GPUs, but the effect is often much larger because many threads are accessing memory at the same time.

What is Memory Coalescing?

On NVIDIA GPUs, threads execute in groups of 32 called warps. When the threads of a warp access memory in a well-structured way, the GPU can combine their requests into a small number of memory transactions. That is called memory coalescing. When the access pattern is poor, th
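On the GPU side, the access patterns being described can be sketched as a pair of CUDA kernels (the kernel names and the stride parameter are illustrative, not from the article). In the first, the 32 threads of a warp read 32 consecutive floats, which the hardware can service with a few wide transactions; in the second, neighboring threads are `stride` floats apart, so their loads fall in different cache lines and the same O(n) work generates many more transactions.

```cuda
// Coalesced: adjacent threads in a warp touch adjacent addresses, so one
// warp's 32 loads can be combined into a small number of transactions.
__global__ void copy_coalesced(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];
}

// Strided: each thread still copies one element, but threads in the same
// warp are 'stride' floats apart, so their loads hit different cache lines
// and cannot be combined.
__global__ void copy_strided(const float *in, float *out, int n, int stride) {
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (i < n)
        out[i] = in[i];
}
```

Launched over the same data, both kernels perform the same per-element work; a profiler such as Nsight Compute would surface the difference as far fewer global-memory transactions per request for the coalesced version.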
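The array-versus-list contrast above can be sketched in plain C (the function and type names here are illustrative, not from the article): two O(n) traversals whose memory behavior differs sharply.

```c
// Two O(n) traversals with very different memory behavior. The array walk
// touches consecutive addresses, so each cache line loaded from DRAM serves
// several iterations; the list walk chases pointers, so when nodes are
// scattered across the heap almost every step can miss the cache.
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

// Contiguous traversal: caches and the hardware prefetcher work well here.
long long sum_array(const int *a, int n) {
    long long s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

// Pointer-chasing traversal: same O(n), but each load depends on the
// previous one, and node addresses need not be contiguous.
long long sum_list(const struct node *head) {
    long long s = 0;
    for (const struct node *p = head; p != NULL; p = p->next)
        s += p->value;
    return s;
}
```

Note that a list allocated node-by-node with `malloc` tends to scatter across the heap, which is what makes the second loop slow in practice even though both loops do identical arithmetic.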

Continue reading on Dev.to


