Memory Coalescing: Same Computation, 6x Performance Difference

By Myoungho Shin, via Dev.to

In software engineering, if two approaches are both O(n), that is often good enough for the discussion. But in low-level or performance engineering, that is not the end of the story. Even when two algorithms have the same time complexity, actual performance can differ widely depending on how they access memory. A simple example is iterating through an array versus a linked list. Both are O(n), but arrays are usually much faster in practice because their memory layout is contiguous, which lets the CPU use its caches far more efficiently. The same idea applies on GPUs, but the effect is often much larger because many threads are accessing memory at the same time.

What is Memory Coalescing?

On NVIDIA GPUs, threads execute in groups of 32 called warps. When the threads of a warp access memory in a well-structured way, the GPU can combine their requests into a small number of memory transactions. That is called memory coalescing. When the access pattern is poor, th
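On the GPU side, the access patterns being described can be sketched as a pair of CUDA kernels (the kernel names and the stride parameter are illustrative, not from the article). In the first, the 32 threads of a warp read 32 consecutive floats, which the hardware can service with a few wide transactions; in the second, neighboring threads are `stride` floats apart, so their loads fall in different cache lines and the same O(n) work generates many more transactions.

```cuda
// Coalesced: adjacent threads in a warp touch adjacent addresses, so one
// warp's 32 loads can be combined into a small number of transactions.
__global__ void copy_coalesced(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];
}

// Strided: each thread still copies one element, but threads in the same
// warp are 'stride' floats apart, so their loads hit different cache lines
// and cannot be combined.
__global__ void copy_strided(const float *in, float *out, int n, int stride) {
    int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
    if (i < n)
        out[i] = in[i];
}
```

Launched over the same data, both kernels perform the same per-element work; a profiler such as Nsight Compute would surface the difference as far fewer global-memory transactions per request for the coalesced version.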
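The array-versus-list contrast above can be sketched in plain C (the function and type names here are illustrative, not from the article): two O(n) traversals whose memory behavior differs sharply.

```c
// Two O(n) traversals with very different memory behavior. The array walk
// touches consecutive addresses, so each cache line loaded from DRAM serves
// several iterations; the list walk chases pointers, so when nodes are
// scattered across the heap almost every step can miss the cache.
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

// Contiguous traversal: caches and the hardware prefetcher work well here.
long long sum_array(const int *a, int n) {
    long long s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

// Pointer-chasing traversal: same O(n), but each load depends on the
// previous one, and node addresses need not be contiguous.
long long sum_list(const struct node *head) {
    long long s = 0;
    for (const struct node *p = head; p != NULL; p = p->next)
        s += p->value;
    return s;
}
```

Note that a list allocated node-by-node with `malloc` tends to scatter across the heap, which is what makes the second loop slow in practice even though both loops do identical arithmetic.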

Continue reading on Dev.to


