CUDA Tutorial

Here is a good introductory article on GPU computing that’s oriented toward CUDA: The GPU Computing Era.

Below is a list of my blog entries that discuss developing parallel programs using CUDA. These are listed in the proper sequence so you can just click through them instead of having to search through the entire blog.

  1. Getting started with CUDA
  2. My first CUDA program!
  3. Threads and blocks and grids, oh my!
  4. Updating to CUDA 2.3
  5. CUDA Gets Easier!

I would be remiss if I didn’t also point you to the great series of articles written by Rob Farber and published in Dr. Dobb’s Journal:

  1. CUDA, Supercomputing for the Masses: Part 1
  2. CUDA, Supercomputing for the Masses: Part 2
  3. CUDA, Supercomputing for the Masses: Part 3
  4. CUDA, Supercomputing for the Masses: Part 4
  5. CUDA, Supercomputing for the Masses: Part 5
  6. CUDA, Supercomputing for the Masses: Part 6
  7. CUDA, Supercomputing for the Masses: Part 7
  8. CUDA, Supercomputing for the Masses: Part 8
  9. CUDA, Supercomputing for the Masses: Part 9
  10. CUDA, Supercomputing for the Masses: Part 10
  11. CUDA, Supercomputing for the Masses: Part 11
  12. CUDA, Supercomputing for the Masses: Part 12
  13. CUDA, Supercomputing for the Masses: Part 13
  14. CUDA, Supercomputing for the Masses: Part 14
  15. CUDA, Supercomputing for the Masses: Part 15
  16. CUDA, Supercomputing for the Masses: Part 16
  17. CUDA, Supercomputing for the Masses: Part 17
  18. CUDA, Supercomputing for the Masses: Part 18

20 Responses to CUDA Tutorial

  1. Anir says:

    Very informative tutorial and very helpful tips. Thanks for sharing, guys; it's so helpful for a newbie like me. :)

  2. Hey, thanks yar.. here I came to know what CUDA is.. and have started studying now! :)

  3. Pingback: Elsewhere, on January 28th - Once a nomad, always a nomad

  5. Manoj says:

    Hi,

    Thanks for the excellent article. Have you written any article on CUDA for C++? If so, please give me the link.

    Manoj

  6. Vivek Thakur says:

    A perfect guide for beginners.
    The main concept of CUDA.

    http://www.techrefined.com/progamming/cuda-way/

  7. Vivek Thakur says:

    Those who are unaware of what CUDA is may visit this link:

    http://www.techrefined.com/progamming/parallel-computing/cuda/

  8. sriram says:

    awesome tutorial..

  9. manan dedhia says:

    what the f*** is CUDA

  10. Pingback: O que há de novo no Mathematica 8? « INTEGRALDX

  11. Deepak says:

    Hello,
    I am facing a strange problem. I am very new to CUDA.
    I tried the first example from CUDA by Example.
    I ran the program on the CPU and then changed it to use CUDA.
    Strangely, my CUDA program takes 8 times longer than the CPU version.

    It's very strange.
    My CPU program is:
    #include <stdio.h>
    #include <time.h>
    #define N 10
    void add( float *a, float *b, float *c ) {
    int tid = 0;
    while (tid < N) {
    c[tid] = (a[tid]/(a[tid]*a[tid])) + (b[tid]/(b[tid]*b[tid]));
    tid += 1;
    }
    }
    int main( void ) {
    float elapsed;
    float a[N], b[N], c[N];
    int test;
    int i;
    clock_t timerStart, timerStop;
    for (i=0; i<N; i++) {
    a[i] = (float) (i)/(i+1);
    b[i] = (float) (i)/(i+1);
    c[i] = 0;
    }
    timerStart = clock();
    add( a, b, c );
    timerStop = clock();
    elapsed = (float) ( timerStop - timerStart ) / CLOCKS_PER_SEC;
    printf( "Time elapsed: %f ", elapsed);
    return 0;
    }

    My CUDA version is:

    #include "Common.h"
    #include "cutil.h"
    #include <stdio.h>
    #include <time.h>

    //#define TIMECUDA
    #define TIMECPU
    #define N 10

    __global__ void add( float *a, float *b, float *c ) {
    int tid = blockIdx.x; // TID is the block ID
    if (tid < N) {
    c[tid] = (a[tid]/(a[tid]*a[tid])) + (b[tid]/(b[tid]*b[tid]));
    }
    }

    int main( void ) {
    float a[N], b[N], c[N];
    float *temp_a,*temp_b,*temp_c;
    long i;
    #ifdef TIMECUDA
    float elapsed_time_cpu_gpu,elapsed_time_add,elapsed_time_gpu_cpu;
    #else
    #endif
    #ifdef TIMECPU
    float elapsed_time;
    #else
    #endif
    #ifdef TIMECPU
    clock_t timerStart, timerStop;
    #else
    #endif

    #ifdef TIMECUDA
    cudaEvent_t start,stop,startadd,stopadd,startback,stopback;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventCreate(&startadd);
    cudaEventCreate(&stopadd);
    cudaEventCreate(&startback);
    cudaEventCreate(&stopback);
    #else
    #endif

    cudaMalloc((void**)&temp_a,N*sizeof(float));
    cudaMalloc((void**)&temp_b,N*sizeof(float));
    cudaMalloc((void**)&temp_c,N*sizeof(float));

    for (i=0; i<N; i++) {
    a[i] = (float) (i)/(i+1);
    b[i] = (float) (i)/(i+1);
    c[i] = 0;
    }
    #ifdef TIMECUDA
    cudaEventRecord(start,0);
    #else
    #endif
    cudaMemcpy(temp_a,a,N*sizeof(float),cudaMemcpyHostToDevice);
    cudaMemcpy(temp_b,b,N*sizeof(float),cudaMemcpyHostToDevice);
    #ifdef TIMECUDA
    cudaEventRecord(stop,0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&elapsed_time_cpu_gpu,start,stop);
    printf("Time taken CUDA : %f \n",elapsed_time_cpu_gpu);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);

    #else
    #endif

    #ifdef TIMECPU
    timerStart = clock();
    #else
    #endif

    #ifdef TIMECUDA
    cudaEventRecord(startadd,0);
    #else
    #endif
    add<<<N,1>>>(temp_a,temp_b,temp_c); // N blocks of one thread each, matching tid = blockIdx.x
    #ifdef TIMECUDA
    cudaEventRecord(stopadd,0);
    cudaEventSynchronize(stopadd);
    cudaEventElapsedTime(&elapsed_time_add,startadd,stopadd);
    printf("Time taken CUDA : %f \n",elapsed_time_add);

    cudaEventDestroy(startadd);
    cudaEventDestroy(stopadd);

    #else
    #endif
    #ifdef TIMECPU
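    cudaDeviceSynchronize(); // kernel launches are asynchronous; wait for the kernel before stopping the timer
    timerStop = clock();
    elapsed_time = (float) ( timerStop - timerStart ) / CLOCKS_PER_SEC;
    printf("Time taken CPU : %f \n",elapsed_time);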
    #else
    #endif
    #ifdef TIMECUDA
    cudaEventRecord(startback,0);
    #else
    #endif
    cudaMemcpy(c,temp_c, N*sizeof(float),cudaMemcpyDeviceToHost);
    #ifdef TIMECUDA
    cudaEventRecord(stopback,0);
    cudaEventSynchronize(stopback);
    cudaEventElapsedTime(&elapsed_time_gpu_cpu,startback,stopback);
    printf("Time taken CUDA : %f ",elapsed_time_gpu_cpu);

    cudaEventDestroy(startback);
    cudaEventDestroy(stopback);

    #else
    #endif

    /* for (i=0; i<N; i++) {
    printf ("%f %f %f\n", a[i], b[i], c[i] );
    }
    */

    cudaFree(temp_a);
    cudaFree(temp_b);
    cudaFree(temp_c);
    return 0;
    }

    Also, I see that copying from host to device and from device to host takes the major part of the time.
    Can you please look into this and help me understand why it shows such strange behavior?

  12. Pingback: Tutorial de CUDA « Cómo aprender cuda en un mes…

  13. mia85 says:

    Hello everybody,
    I'm here once again to ask for your help.
    The goal of my project is to use parallelism to reduce program latency.
    I started with data parallelism and succeeded in implementing it with CUDA, so the idea now is to improve the program with task parallelism. My first idea is to use two threads: the first does the processing on the GPU, and the second handles the input (CPU memory to GPU memory) and the output (GPU memory to CPU memory).
    Unfortunately, I don't get the desired result, only 0000, which means I have problems passing the GPU parameters to the thread function, even though I do all the allocation and copying!

    Do you have any ideas?

  14. Pingback: Pengalaman pertama dengan CUDA « Rudy ngeBlog

  15. Pingback: Coding With GPGPU: Useful Links

  16. jpt says:

    I love CUDA.

  17. Pingback: Coding With GPGPU: CUDA

  18. Pingback: Coding With GPGPU: Useful Links
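
A note on the timing comparison in comment 11 above: a single call to add() over 10 elements is very little work, and on the CPU it will typically finish in less than one tick of clock(), so the reported time tends to be zero or noise. Any fixed cost on the GPU side (the kernel launch and the memory copies) then looks enormous by comparison. Below is a minimal sketch, in plain C, of one way to get a steadier CPU baseline: repeat the call many times and divide by the repeat count. The helper names fill_inputs and seconds_per_call are hypothetical, and the inputs start at (i+1)/(i+2) instead of i/(i+1) so that the first element does not evaluate 0/0; both are assumptions made for illustration, not part of the original program.

```c
#include <time.h>

#define N 10

/* Same arithmetic as add() in comment 11. */
static void add(const float *a, const float *b, float *c) {
    for (int tid = 0; tid < N; tid++) {
        c[tid] = (a[tid] / (a[tid] * a[tid])) + (b[tid] / (b[tid] * b[tid]));
    }
}

/* Hypothetical helper: fill a and b with (i+1)/(i+2) so no element is zero
   (an assumption; the original uses i/(i+1), which makes a[0] = 0). */
static void fill_inputs(float *a, float *b, float *c) {
    for (int i = 0; i < N; i++) {
        a[i] = (float)(i + 1) / (i + 2);
        b[i] = (float)(i + 1) / (i + 2);
        c[i] = 0.0f;
    }
}

/* Hypothetical helper: average seconds per add() call over `repeats` calls.
   clock() is read once before and once after the whole loop. */
static double seconds_per_call(long repeats) {
    float a[N], b[N], c[N];
    volatile float sink = 0.0f; /* discourages the compiler from dropping the loop */

    fill_inputs(a, b, c);

    clock_t start = clock();
    for (long r = 0; r < repeats; r++) {
        add(a, b, c);
        sink = sink + c[r % N];
    }
    clock_t stop = clock();

    return ((double)(stop - start) / CLOCKS_PER_SEC) / (double)repeats;
}
```

Calling seconds_per_call with a large count (say, one million) from main and printing the result with %e gives a per-call figure that is less dominated by timer resolution. An optimizing compiler may still simplify the loop, so treat the number as a rough baseline only. The same idea applies on the GPU side: a problem size of 10 is too small for the device to amortize its launch and transfer costs.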
