How to write a CUDA global function for this?
I want to convert the following function to CUDA.

    void fun() {
        for (i = 0; i < terraingridlength; i++) {
            for (j = 0; j < terraingridwidth; j++) {
                // code of function
            }
        }
    }
I wrote this function:

    __global__ void fun() {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = blockIdx.y * blockDim.y + threadIdx.y;
        if ((i < terraingridlength) && (j < terraingridwidth)) {
            // code of function
        }
    }
I declared both terraingridlength and terraingridwidth as constants and assigned the value 120 to both. I am calling the function like this:

    fun<<<30,500>>>();

But I am not getting the correct output.
Is the code I wrote correct? I don't understand how the code executes in parallel. Please explain how the code works, and correct me if I have made mistakes.
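You use the y dimension, which means you are using a 2D array of threads. With a one-dimensional launch, blockIdx.y and threadIdx.y are always 0, so j is always 0 and only one row is processed. That is why you cannot invoke the kernel with only:

    int numblock = 30;
    int numthreadsperblock = 500;
    fun<<<numblock, numthreadsperblock>>>();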
The invocation should be (note that the blocks now have 2D threads):

    dim3 dimgrid(grid_size, grid_size);      // 2D grid: grid_size * grid_size blocks
    dim3 dimblocks(block_size, block_size);  // 2D block: block_size * block_size threads
    fun<<<dimgrid, dimblocks>>>();
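To make the difference between the two launches concrete, here is a minimal sketch that counts how many (i, j) cells actually pass the bounds check. The kernel name count_cells, the helper cells_covered, and the atomic counter are assumptions added for illustration; only the index computation and the check come from the question.

```cuda
#include <cuda_runtime.h>

// Same index computation and bounds check as the question's kernel.
// Every thread that passes the check stands for one (i, j) cell and
// increments a shared counter.
__global__ void count_cells(int length, int width, int *count) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < length && j < width) {
        atomicAdd(count, 1);
    }
}

// Hypothetical helper: launches count_cells with the given configuration
// and returns the number of cells handled, or -1 on a CUDA error.
int cells_covered(dim3 grid, dim3 block, int length, int width) {
    int *d_count = nullptr;
    int h_count = 0;
    if (cudaMalloc(&d_count, sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(d_count, 0, sizeof(int));

    count_cells<<<grid, block>>>(length, width, d_count);

    // cudaGetLastError reports an invalid launch configuration;
    // cudaMemcpy waits for the kernel to finish before copying.
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaMemcpy(&h_count, d_count, sizeof(int),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_count);
    return ok ? h_count : -1;
}
```

Called from main on a machine with a CUDA device, cells_covered(dim3(30), dim3(500), 120, 120) should return 120, because j stays 0 in a one-dimensional launch, while cells_covered(dim3(8, 8), dim3(16, 16), 120, 120) should return 14400, one thread per cell of the 120 x 120 grid.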
Refer to the CUDA Programming Guide for further information, and if you want a 2D or 3D array, it is better to use cudaMalloc3D or cudaMallocPitch.
As for your code, I think this will work (I haven't tried it, though; I hope you can grab the idea from this):
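    // main
    // A block can hold at most 1024 threads on current GPUs, so a single
    // width x height block fails for 120 x 120. Use small blocks and
    // enough of them to cover the whole grid.
    dim3 dimblocks(16, 16);              // 2D block: 16 * 16 = 256 threads
    dim3 dimgrid((width + 15) / 16,
                 (height + 15) / 16);    // 2D grid, rounded up
    fun<<<dimgrid, dimblocks>>>(width, height);

    // kernel
    __global__ void fun(int width, int height) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = blockIdx.y * blockDim.y + threadIdx.y;
        if ((i < width) && (j < height)) {
            // code of function
        }
    }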
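For something that can be compiled and checked end to end, the following is a sketch under stated assumptions, not the original poster's code. The loop body is unknown, so a placeholder assignment (cells[i * width + j] = i + j) stands in for it. The cells buffer, the run_fun wrapper, and the 16 x 16 block size are likewise assumptions. The block size is kept small because a block is limited to 1024 threads, and the grid is rounded up so that every cell gets a thread.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Placeholder body: each thread writes one cell. The real "code of
// function" from the question would replace the assignment.
__global__ void fun(int *cells, int length, int width) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // index along length
    int j = blockIdx.y * blockDim.y + threadIdx.y;  // index along width
    if (i < length && j < width) {
        cells[i * width + j] = i + j;
    }
}

// Hypothetical host wrapper: runs fun over a length x width grid and
// returns the result. An empty vector signals a CUDA error.
std::vector<int> run_fun(int length, int width) {
    std::vector<int> host(static_cast<size_t>(length) * width);
    size_t bytes = host.size() * sizeof(int);
    int *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return {};

    dim3 block(16, 16);  // 256 threads per block (assumed size)
    dim3 grid((length + block.x - 1) / block.x,   // round up so partial
              (width + block.y - 1) / block.y);   // tiles are covered
    fun<<<grid, block>>>(dev, length, width);

    // cudaMemcpy waits for the kernel to finish before copying back.
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaMemcpy(host.data(), dev, bytes,
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dev);
    if (!ok) host.clear();
    return host;
}
```

For the 120 x 120 case this launches an 8 x 8 grid of 16 x 16 blocks, that is 128 x 128 threads, and the bounds check in the kernel discards the extra ones. The threads run in no particular order, so this pattern only works as-is when each (i, j) iteration is independent of the others.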