CUDA Gets Easier!

Several of my readers have had problems creating CUDA projects in Visual Studio, so I thought I’d update my instructions for the current version of CUDA (3.0 at the time of this writing).  The main point: it’s a lot easier than the procedure I outlined two years ago.

For hardware, I’m now using a Zotac GeForce GT 240 card with 96 stream processors that I purchased last year for $90. For my software development environment, I downloaded and installed the Microsoft SDK for Windows Server 2008 and Microsoft Visual C++ 2008 Express Edition. Then I downloaded and installed the NVIDIA Driver 197.13, the CUDA Toolkit 3.0 and the CUDA SDK 3.0 for 32-bit Windows XP.

Once everything was set up, the first thing I did was to recompile and run the deviceQuery example in

C:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\src\deviceQuery

I just double-clicked the deviceQuery_vc90.sln file and the project popped up in the Visual Studio IDE. I hit F7 to rebuild the program, and then I pressed Ctrl+F5 to run it. The program ran and reported the presence of a GeForce GT 240 in my PC. So far, so good.

Next, I created an empty Win32 console application called cuda_example3. I renamed cuda_example3.cpp to cuda_example3.cu because that’s where the CUDA kernel source is going. Then I copied the source from my first CUDA program into the file and saved it. Here’s the code so you can see it:

// cuda_example3.cu : Defines the entry point for the console application.
//

#include "stdafx.h"

#include <stdio.h>
#include <stdlib.h> // malloc, free
#include <cuda.h>

// Kernel that executes on the CUDA device
__global__ void square_array( float *a, int N )
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if ( idx < N )
        a[idx] = a[idx] * a[idx];
}



// main routine that executes on the host
int main( void )
{
    float *a_h, *a_d; // Pointer to host & device arrays
    const int N = 10; // Number of elements in arrays
    size_t size = N * sizeof( float );
    a_h = (float *)malloc( size );    // Allocate array on host
    cudaMalloc( (void **)&a_d, size ); // Allocate array on device
    // Initialize host array and copy it to CUDA device
    for ( int i = 0; i < N; i++ )
        a_h[i] = (float)i;
    cudaMemcpy( a_d, a_h, size, cudaMemcpyHostToDevice );
    // Do calculation on device:
    int block_size = 4;
    int n_blocks   = N / block_size + ( N % block_size == 0 ? 0 : 1 );
    square_array <<< n_blocks, block_size >>> ( a_d, N );
    // Retrieve result from device and store it in host array
    cudaMemcpy( a_h, a_d, sizeof( float ) * N, cudaMemcpyDeviceToHost );
    // Print results
    for ( int i = 0; i < N; i++ )
        printf( "%d %f\n", i, a_h[i] );
    // Cleanup
    free( a_h );
    cudaFree( a_d );
}
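
The launch arithmetic in main can be checked on the host without a GPU. What follows is only a minimal sketch in plain C, not part of the original program: the helper names blocks_needed, global_index and count_active_threads are hypothetical and do not exist in CUDA. It emulates how blockIdx.x, blockDim.x and threadIdx.x combine into idx, and shows why the kernel needs its idx < N guard when N is not a multiple of the block size.

```c
#include <assert.h>

/* Hypothetical helper: number of blocks needed to cover n elements.
   This is a ceiling division, equivalent to the n_blocks expression in main. */
int blocks_needed(int n, int block_size)
{
    assert(n >= 0 && block_size > 0);
    return (n + block_size - 1) / block_size;
}

/* Hypothetical helper: the global index a CUDA thread would compute
   from blockIdx.x, blockDim.x and threadIdx.x. */
int global_index(int block_idx, int block_dim, int thread_idx)
{
    return block_idx * block_dim + thread_idx;
}

/* Emulate the launch on the host: visit every (block, thread) pair and
   count how many threads would pass the idx < n guard in the kernel. */
int count_active_threads(int n, int block_size)
{
    int active = 0;
    int n_blocks = blocks_needed(n, block_size);
    for (int b = 0; b < n_blocks; b++)
        for (int t = 0; t < block_size; t++)
            if (global_index(b, block_size, t) < n)
                active++;
    return active;
}
```

With N = 10 and block_size = 4, blocks_needed returns 3, so 12 threads are launched. count_active_threads returns 10: the two threads that compute idx 10 and 11 are stopped by the guard and never touch memory past the end of the array.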

At this point, Visual Studio had no idea how to compile a .cu file. In the past, I crafted a Custom Build Step in the Project Properties page that invoked NVIDIA’s nvcc tool with the appropriate compiler options. No more need for that! Instead, I highlighted cuda_example3 in the Solution Explorer pane, and then selected Project→Custom Build Rules… from the menu. Then I clicked on the Find Existing… button in the Custom Build Rule Files window and steered it to this file:

C:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\common\Cuda.rules

Cuda.rules contains all the rules and options needed to merge .cu files into the Visual Studio C++ compilation flow.

The only other changes I needed to make were to indicate the locations of the CUDA libraries in the project properties (I did this for both the Debug and Release configurations):

Configuration Properties → Linker → General:
Additional Library Directories = C:\CUDA\lib;"C:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\common\lib"

Configuration Properties → Linker → Input:
Additional Dependencies = cudart.lib

After doing this, the program compiled and produced the following correct result:

0 0.000000
1 1.000000
2 4.000000
3 9.000000
4 16.000000
5 25.000000
6 36.000000
7 49.000000
8 64.000000
9 81.000000
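
Rather than checking output like this by eye, you could compare each element against a value computed on the CPU. Here is a minimal sketch in plain C, assuming a hypothetical helper called first_mismatch that is not in the original program. It could be called on a_h after the device-to-host copy.

```c
#include <assert.h>

/* Hypothetical helper: returns the index of the first element of result
   that differs from i * i by more than tol, or -1 if all n elements match. */
int first_mismatch(const float *result, int n, float tol)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        float expected = (float)i * (float)i;
        float diff = result[i] - expected;
        if (diff < 0.0f)
            diff = -diff; /* absolute difference without needing math.h */
        if (diff > tol)
            return i;
    }
    return -1;
}
```

For example, first_mismatch( a_h, N, 1e-6f ) returning -1 would indicate that every element came back from the device squared as expected. The tolerance is an assumption: the small integers used here square exactly in single precision, but other inputs may need a looser bound.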

For those of you who want to try CUDA but don’t have a CUDA-enabled GPU card, there is a way to link to a CUDA device emulator. Simply replace cudart.lib with cudartemu.lib in the project properties as follows:

Configuration Properties → Linker → Input:
Additional Dependencies = cudartemu.lib

This supplants the use of the -deviceemu compiler option in earlier versions of CUDA.

Finally, you may want C++ syntax coloring and IntelliSense to work on your .cu source files. To get syntax coloring, click on the Tools→Options menu. Then in the Options window under Text Editor→File Extension, enter the .cu and .cuh file extensions and select Microsoft Visual C++ as the editor. To enable IntelliSense, you’ll have to edit the Windows registry by adding the .cu and .cuh file extensions to the key HKEY_CURRENT_USER\Software\Microsoft\VisualStudio\9.0\Languages\Language Services\C/C++\NCB Default C/C++ Extensions. That should do it.

Here’s the source code for this example if you want to try it.
