definition - In CUDA, what is memory coalescing, and how is it achieved?


What does "coalesced" mean in a CUDA global memory transaction? I couldn't understand it even after going through the CUDA guide. How do I do it? In the CUDA programming guide's matrix example, is accessing the matrix row by row called "coalesced", or is accessing it column by column called coalesced? Which is correct, and why?

It's likely that this information applies only to compute capability 1.x, or CUDA 2.0. More recent architectures and CUDA 3.0 have more sophisticated global memory access, and in fact "coalesced global loads" are not even profiled for these chips.

Also, this logic can be applied to shared memory to avoid bank conflicts.


A coalesced memory transaction is one in which all of the threads in a half-warp access global memory at the same time. That is an oversimplification, but the correct way to do it is simply to have consecutive threads access consecutive memory addresses.

So, if threads 0, 1, 2, and 3 read global memory at 0x0, 0x4, 0x8, and 0xc, it should be a coalesced read.
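To make the "consecutive threads, consecutive addresses" rule concrete, here is a small host-side Python model. It is not GPU code, and the helper name `is_consecutive` and the 4-byte word size are assumptions made for this sketch. Real hardware has more detailed rules (alignment, segment size) that this check ignores.

```python
WORD_SIZE = 4  # assumed element size in bytes (e.g. a 32-bit value)

def is_consecutive(addresses, word_size=WORD_SIZE):
    """Return True if each thread's address is exactly one word after
    the previous thread's address (thread order = list order)."""
    return all(b - a == word_size for a, b in zip(addresses, addresses[1:]))

if __name__ == "__main__":
    # Threads 0..3 read 0x0, 0x4, 0x8, 0xc: neighbouring words.
    print(is_consecutive([0x0, 0x4, 0x8, 0xC]))
    # Threads 0..3 read every third word: a strided pattern.
    print(is_consecutive([0x0, 0xC, 0x18, 0x24]))
```

The first pattern passes the check and the strided one does not, which matches the rule of thumb above.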

In the matrix example, keep in mind that you want your matrix to reside linearly in memory. You can lay it out however you want, and your memory access should reflect how the matrix is laid out. So, the 3x4 matrix below

    0 1 2 3
    4 5 6 7
    8 9 a b

could be stored row after row, like this, so that (r,c) maps to memory location (r*4 + c)

    0 1 2 3 4 5 6 7 8 9 a b
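The (r*4 + c) mapping can be sketched in a few lines of Python. The function name `linear_index` is made up for illustration; the only thing taken from the example is the 3x4 shape and the row-after-row layout.

```python
ROWS, COLS = 3, 4  # shape of the example matrix

def linear_index(r, c, cols=COLS):
    """Row-major mapping: element (r, c) lives at offset r * cols + c."""
    return r * cols + c

if __name__ == "__main__":
    # Print the offset of every element, row by row, in hex
    # so the output lines up with the matrix shown above.
    for r in range(ROWS):
        print(" ".join(format(linear_index(r, c), "x") for c in range(COLS)))
```

With this layout, moving along a row changes the offset by 1, while moving down a column changes it by 4.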

Suppose you need to access each element once, and say you have four threads. Which threads will be used for which elements? Probably either

    thread 0:  0, 1, 2
    thread 1:  3, 4, 5
    thread 2:  6, 7, 8
    thread 3:  9, a, b

or

    thread 0:  0, 4, 8
    thread 1:  1, 5, 9
    thread 2:  2, 6, a
    thread 3:  3, 7, b

Which is better? Which will result in coalesced reads, and which will not?

Either way, each thread makes three accesses. Let's look at the first access and see whether the threads access memory consecutively. In the first option, the first access is 0, 3, 6, 9. Not consecutive, not coalesced. In the second option, it's 0, 1, 2, 3. Consecutive! Coalesced! Yay!
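The same comparison can be checked for all three accesses, not just the first, with a small host-side Python model. This is only a sketch of the index arithmetic, not a kernel. The names `blocked` and `interleaved` are labels invented here for the first and second option.

```python
NUM_THREADS = 4
ELEMS_PER_THREAD = 3

def blocked(thread, step):
    """First option: each thread owns a contiguous chunk of three elements."""
    return thread * ELEMS_PER_THREAD + step

def interleaved(thread, step):
    """Second option: at each step, neighbouring threads read neighbouring elements."""
    return step * NUM_THREADS + thread

def accesses_at_step(scheme, step):
    """Element indices touched by threads 0..3 during one access step."""
    return [scheme(t, step) for t in range(NUM_THREADS)]

def is_consecutive(indices):
    """True if thread i reads the element right after thread i-1's element."""
    return all(b - a == 1 for a, b in zip(indices, indices[1:]))

if __name__ == "__main__":
    for name, scheme in (("blocked", blocked), ("interleaved", interleaved)):
        for step in range(ELEMS_PER_THREAD):
            idx = accesses_at_step(scheme, step)
            print(name, "step", step, idx, "consecutive:", is_consecutive(idx))
```

In this model every step of the second option touches four neighbouring elements, while no step of the first option does.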

The best way is probably to write your kernel and then profile it to see whether you have non-coalesced global loads and stores.

