CUDA Gets Easier!

Several of my readers have had problems creating CUDA projects in Visual Studio, so I thought I’d update how to do it using the current version of CUDA (3.0 at the time of this writing).  The main point: it’s a lot easier than the procedure I outlined two years ago.

For hardware, I’m now using a Zotac GeForce GT 240 card with 96 stream processors that I purchased last year for $90. For my software development environment, I downloaded and installed the Microsoft SDK for Windows Server 2008 and Microsoft Visual C++ 2008 Express Edition. Then I downloaded and installed the NVIDIA Driver 197.13, the CUDA Toolkit 3.0 and the CUDA SDK 3.0 for 32-bit Windows XP.

Once everything was set up, the first thing I did was to recompile and run the deviceQuery example in

C:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\src\deviceQuery

I just double-clicked the deviceQuery_vc90.sln file and the project popped up in the Visual Studio IDE. I hit F7 to rebuild the program, and then I pressed Ctrl+F5 to run it. The program ran and reported the presence of a GeForce GT 240 in my PC. So far, so good.

Next, I created an empty Win32 console application called cuda_example3. I renamed cuda_example3.cpp to cuda_example3.cu because that’s where the CUDA kernel source is going. Then I copied the source from my first CUDA program into the file and saved it. Here’s the code so you can see it:

// cuda_example3.cu : Defines the entry point for the console application.

#include "stdafx.h"

#include <stdio.h>
#include <cuda.h>

// Kernel that executes on the CUDA device
__global__ void square_array( float *a, int N )
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if ( idx < N )
        a[idx] = a[idx] * a[idx];
}

// main routine that executes on the host
int main( void )
{
    float *a_h, *a_d; // Pointers to host & device arrays
    const int N = 10; // Number of elements in arrays
    size_t size = N * sizeof( float );
    a_h = (float *)malloc( size );     // Allocate array on host
    cudaMalloc( (void **)&a_d, size ); // Allocate array on device
    // Initialize host array and copy it to CUDA device
    for ( int i = 0; i < N; i++ )
        a_h[i] = (float)i;
    cudaMemcpy( a_d, a_h, size, cudaMemcpyHostToDevice );
    // Do calculation on device:
    int block_size = 4;
    int n_blocks   = N / block_size + ( N % block_size == 0 ? 0 : 1 );
    square_array <<< n_blocks, block_size >>> ( a_d, N );
    // Retrieve result from device and store it in host array
    cudaMemcpy( a_h, a_d, sizeof( float ) * N, cudaMemcpyDeviceToHost );
    // Print results
    for ( int i = 0; i < N; i++ )
        printf( "%d %f\n", i, a_h[i] );
    // Cleanup
    free( a_h );
    cudaFree( a_d );
    return 0;
}
At this point, Visual Studio had no idea how to compile a .cu file. In the past, I crafted a Custom Build Step in the Project Properties page that invoked NVIDIA’s nvcc tool with the appropriate compiler options. No more need for that! Instead, I highlighted cuda_example3 in the Solution Explorer pane, and then selected Project→Custom Build Rules… from the menu. Then I clicked on the Find Existing… button in the Custom Build Rule Files window and steered it to this file:

C:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\common\Cuda.rules

Cuda.rules contains all the rules and options needed to merge .cu files into the Visual Studio C++ compilation flow.
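To give you an idea of what the rule file saves you from, a hand-rolled custom build step had to spell out an nvcc command line along these lines (the exact paths and options below are illustrative, not copied from my old project):

```
nvcc -c cuda_example3.cu -o cuda_example3.obj ^
     -I"C:\CUDA\include" ^
     -I"C:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\common\inc"
```

With Cuda.rules in place, Visual Studio generates the equivalent command for every .cu file automatically, and the nvcc options become editable properties in the project settings.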

The only other changes I needed to make were to indicate the locations of the CUDA libraries in the project properties (I did this for both the Debug and Release configurations):

Configuration Properties → Linker → General:
Additional Library Directories = C:\CUDA\lib;"C:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\common\lib"

Configuration Properties → Linker → Input:
Additional Dependencies = cudart.lib

After doing this, the program compiled and produced the following correct result:

0 0.000000
1 1.000000
2 4.000000
3 9.000000
4 16.000000
5 25.000000
6 36.000000
7 49.000000
8 64.000000
9 81.000000

For those of you who want to try CUDA but don’t have a CUDA-enabled GPU card, there is a way to link to a CUDA device emulator. Simply replace cudart.lib with cudartemu.lib in the project properties as follows:

Configuration Properties → Linker → Input:
Additional Dependencies = cudartemu.lib

This supplants the use of the -deviceemu compiler option in earlier versions of CUDA.

Finally, you may want C++ syntax-coloring and IntelliSense to work on your .cu source files. To get syntax-coloring, click on the Tools→Options menu. Then in the Options window under Text Editor→File Extension, enter the .cu and .cuh file extensions and select Microsoft Visual C++ as the editor. To enable IntelliSense, you’ll have to edit the Windows registry by adding the .cu and .cuh file extensions to the key HKEY_CURRENT_USER\Software\Microsoft\VisualStudio\9.0\Languages\Language Services\C/C++\NCB Default C/C++ Extensions. That should do it.
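If you’d rather script the registry change than click through regedit, something along these lines should work from a command prompt. Query the existing value first, because reg add overwrites the data: keep whatever extension list your machine already has and just append .cu and .cuh to it (the list shown in the /d argument below is only an example, not necessarily what’s on your system):

```
reg query "HKCU\Software\Microsoft\VisualStudio\9.0\Languages\Language Services\C/C++" /v "NCB Default C/C++ Extensions"

reg add "HKCU\Software\Microsoft\VisualStudio\9.0\Languages\Language Services\C/C++" /v "NCB Default C/C++ Extensions" /d ".cpp;.cxx;.c;.cc;.h;.hh;.hpp;.hxx;.hdr;.inl;.cu;.cuh" /f
```

Restart Visual Studio afterward so it picks up the new extension list.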

Here’s the source code for this example if you want to try it.
