cuda - Increasing block size decreases performance



In my CUDA code, if I increase blocksizeX and blocksizeY, it actually takes more time (so I run it at 1x1). Also, a large chunk of my execution time (e.g. 7 out of 9 s) is taken by just the call to the kernel. In fact, I am quite amazed that even if I comment out the entire kernel body, the time is about the same. Any suggestions on where and how to optimize?

P.S. I have edited this post with my actual code. I am downsampling an image so that every 4 neighboring pixels (e.g. pixels 1 and 2 from row 1 and pixels 1 and 2 from row 2) give one output pixel. I get an effective bandwidth of 0.5 GB/s compared to the theoretical maximum of 86.4 GB/s. The time I use is the difference between calling the kernel with its instructions and calling an empty kernel. It looks pretty bad to me right now, but I can't figure out what I am doing wrong.

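    __global__ void streamkernel(int *r_d, int *g_d, int *b_d, int height, int width,
                                 int *f_r, int *f_g, int *f_b)
    {
        // Flatten the 2D grid of 2D blocks into one linear output index.
        int id = blockIdx.x * blockDim.x * blockDim.y
               + threadIdx.y * blockDim.x + threadIdx.x
               + blockIdx.y * gridDim.x * blockDim.x * blockDim.y;
        // Index of the top-left input pixel of this output pixel's 2x2 neighborhood.
        int number = 2 * (id % (width / 2)) + (id / (width / 2)) * width * 2;

        if (id < height * width / 4)
        {
            // Average the four neighboring pixels of each channel.
            f_r[id] = (r_d[number] + r_d[number + 1] + r_d[number + width] + r_d[number + width + 1]) / 4;
            f_g[id] = (g_d[number] + g_d[number + 1] + g_d[number + width] + g_d[number + width + 1]) / 4;
            f_b[id] = (b_d[number] + b_d[number + 1] + b_d[number + width] + b_d[number + width + 1]) / 4;
        }
    }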

Try looking at the matrix multiplication example in the CUDA SDK examples to see how to use shared memory.

The problem with your current kernel is that it does 4 global memory reads and 1 global memory write for every 3 additions and 1 division. Each global memory access costs roughly 400 cycles. This means you are spending the vast majority of your time on memory access (what GPUs are bad at) rather than on compute (what GPUs are good at).

Shared memory in effect lets you cache these accesses so that, amortized, you get roughly 1 read and 1 write at each pixel for the 3 additions and 1 division. That is still not great on the CGMA ratio (compute to global memory access ratio, the holy grail of GPU computing).
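To make the shared-memory idea concrete, here is a minimal sketch of a tiled version for a single channel. It is not taken from the SDK sample: the names `TILE`, `downsampleShared` and `downsampleChannel` are made up for illustration, and the code assumes that `width` and `height` are even and that the kernel is launched with `TILE` x `TILE` threads per block. Whether it actually beats the direct version depends on the hardware, so treat it as a starting point for measurement.

```cuda
#include <cuda_runtime.h>

// Hypothetical tile size: each block produces TILE x TILE output pixels.
#define TILE 16

// Downsample one channel by averaging each 2x2 neighborhood.
// Assumes even width/height and a launch with dim3(TILE, TILE) threads per block.
__global__ void downsampleShared(const int *in, int *out, int width, int height)
{
    // Staging buffer for the (2*TILE) x (2*TILE) input region of this block.
    __shared__ int tile[2 * TILE][2 * TILE];

    int outW = width / 2;
    int outH = height / 2;
    int inX0 = blockIdx.x * 2 * TILE;  // top-left input pixel of this block
    int inY0 = blockIdx.y * 2 * TILE;

    // Cooperative load: every thread copies four input pixels. Threads with
    // consecutive threadIdx.x read consecutive addresses within a row.
    for (int dy = threadIdx.y; dy < 2 * TILE; dy += TILE) {
        for (int dx = threadIdx.x; dx < 2 * TILE; dx += TILE) {
            int x = inX0 + dx;
            int y = inY0 + dy;
            tile[dy][dx] = (x < width && y < height) ? in[y * width + x] : 0;
        }
    }
    __syncthreads();  // the whole tile must be loaded before anyone reads it

    int ox = blockIdx.x * TILE + threadIdx.x;
    int oy = blockIdx.y * TILE + threadIdx.y;
    if (ox < outW && oy < outH) {
        int tx = 2 * threadIdx.x;
        int ty = 2 * threadIdx.y;
        out[oy * outW + ox] = (tile[ty][tx] + tile[ty][tx + 1] +
                               tile[ty + 1][tx] + tile[ty + 1][tx + 1]) / 4;
    }
}

// Hypothetical host helper: copies one channel to the device, runs the kernel
// and copies the result back. Returns the first CUDA error encountered.
cudaError_t downsampleChannel(const int *hIn, int *hOut, int width, int height)
{
    int *dIn = nullptr, *dOut = nullptr;
    size_t inBytes = (size_t)width * height * sizeof(int);
    size_t outBytes = inBytes / 4;

    cudaError_t err = cudaMalloc((void **)&dIn, inBytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc((void **)&dOut, outBytes);
    if (err != cudaSuccess) { cudaFree(dIn); return err; }

    err = cudaMemcpy(dIn, hIn, inBytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(TILE, TILE);
        dim3 grid((width / 2 + TILE - 1) / TILE, (height / 2 + TILE - 1) / TILE);
        downsampleShared<<<grid, block>>>(dIn, dOut, width, height);
        err = cudaGetLastError();  // catches launch configuration errors
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(hOut, dOut, outBytes, cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    return err;
}
```

Calling `downsampleChannel` once each for the red, green and blue arrays reproduces what `streamkernel` computes. Note that the block size here is fixed by the tile size rather than set to 1x1, so each block has enough threads to load a full tile.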

Overall, I think that for a simple kernel like this, a CPU implementation is likely to be faster, given the overhead of transferring the data across the PCI-E bus.
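One way to check this on your own machine is to time the host-to-device copy by itself and compare it with a plain CPU loop over the same data. The sketch below assumes one channel with even `width` and `height`. The helper names `downsampleCpu`, `timeHostToDeviceMs` and `timeCpuMs` are hypothetical, and the numbers you get will depend entirely on your hardware.

```cuda
#include <cuda_runtime.h>
#include <chrono>
#include <cstddef>

// CPU reference: average each 2x2 neighborhood of one channel.
// Assumes width and height are even.
void downsampleCpu(const int *in, int *out, int width, int height)
{
    int outW = width / 2;
    for (int oy = 0; oy < height / 2; ++oy) {
        for (int ox = 0; ox < outW; ++ox) {
            int n = 2 * oy * width + 2 * ox;  // top-left pixel of the 2x2 block
            out[oy * outW + ox] =
                (in[n] + in[n + 1] + in[n + width] + in[n + width + 1]) / 4;
        }
    }
}

// Hypothetical helper: milliseconds spent copying `bytes` from host to device,
// measured with CUDA events. Returns a negative value on failure.
float timeHostToDeviceMs(const void *hostSrc, size_t bytes)
{
    void *dDst = nullptr;
    if (cudaMalloc(&dDst, bytes) != cudaSuccess) return -1.0f;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaError_t err = cudaMemcpy(dDst, hostSrc, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dDst);
    return err == cudaSuccess ? ms : -1.0f;
}

// Wall-clock milliseconds for the CPU reference on the same data.
double timeCpuMs(const int *in, int *out, int width, int height)
{
    auto t0 = std::chrono::steady_clock::now();
    downsampleCpu(in, out, width, height);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

If `timeHostToDeviceMs` for the three input channels is already comparable to `timeCpuMs` for the whole job, the GPU cannot win on this step alone, no matter how fast the kernel is. It only pays off when the image stays on the device for further processing.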

