CUDA Gets Easier!
June 18, 2010
Several of my readers have had problems creating CUDA projects in Visual Studio, so I thought I’d update how to do it using the current version of CUDA (3.0 at the time of this writing). The main point: it’s a lot easier than the procedure I outlined two years ago.
For hardware, I’m now using a Zotac GeForce GT240 card with 96 stream processors that I purchased last year for $90. For my software development environment, I downloaded and installed the Microsoft SDK for Windows Server 2008 and Microsoft Visual C++ 2008 Express Edition. Then I downloaded and installed the NVIDIA Driver 197.13, the CUDA Toolkit 3.0 and the CUDA SDK 3.0 for 32-bit Windows XP.
Once everything was set up, the first thing I did was to recompile and run the deviceQuery example in
C:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\src\deviceQuery
I just double-clicked the deviceQuery_vc90.sln file, and the project popped up in the Visual Studio IDE. I hit F7 to rebuild the program and then pressed Ctrl+F5 to run it. The program ran and reported the presence of a GeForce GT 240 in my PC. So far, so good.
Next, I created an empty Win32 console application called cuda_example3. I renamed cuda_example3.cpp to cuda_example3.cu because that’s where the CUDA kernel source is going. Then I copied the source from my first CUDA program into the file and saved it. Here’s the code so you can see it:
// cuda_example3.cu : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <stdio.h>
#include <stdlib.h>   // for malloc/free
#include <cuda.h>

// Kernel that executes on the CUDA device
__global__ void square_array( float *a, int N )
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if ( idx < N )
        a[idx] = a[idx] * a[idx];
}

// main routine that executes on the host
int main( void )
{
    float *a_h, *a_d;   // Pointers to host & device arrays
    const int N = 10;   // Number of elements in arrays
    size_t size = N * sizeof( float );
    a_h = (float *)malloc( size );        // Allocate array on host
    cudaMalloc( (void **)&a_d, size );    // Allocate array on device

    // Initialize host array and copy it to CUDA device
    for ( int i = 0; i < N; i++ )
        a_h[i] = (float)i;
    cudaMemcpy( a_d, a_h, size, cudaMemcpyHostToDevice );

    // Do calculation on device:
    int block_size = 4;
    int n_blocks = N / block_size + ( N % block_size == 0 ? 0 : 1 );
    square_array <<< n_blocks, block_size >>> ( a_d, N );

    // Retrieve result from device and store it in host array
    cudaMemcpy( a_h, a_d, sizeof( float ) * N, cudaMemcpyDeviceToHost );

    // Print results
    for ( int i = 0; i < N; i++ )
        printf( "%d %f\n", i, a_h[i] );

    // Cleanup
    free( a_h );
    cudaFree( a_d );
}
At this point, Visual Studio had no idea how to compile a .cu file. In the past, I crafted a Custom Build Step in the Project Properties page that invoked Nvidia’s nvcc tool with the appropriate compiler options. No more need for that! Instead, I highlighted cuda_example3 in the Solution Explorer pane, and then selected Project→Custom Build Rules… from the menu. Then I clicked on the Find Existing… button in the Custom Build Rule Files window and steered it to this file:
C:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\common\Cuda.rules
Cuda.rules contains all the rules and options needed to merge .cu files into the Visual Studio C++ compilation flow.
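Behind the scenes, the build rule just runs Nvidia's nvcc compiler on each .cu file before the normal link step. The exact command depends on your project configuration, but it looks roughly like this (illustrative only; the paths and host-compiler flags here are assumptions, not what the rule literally emits):

```shell
# Roughly what the Cuda.rules build step runs for a Debug build
# (illustrative sketch; actual flags vary with project settings):
nvcc -c cuda_example3.cu -o Debug\cuda_example3.obj ^
     -I"C:\CUDA\include" ^
     -Xcompiler "/EHsc /W3 /nologo /Od /Zi /MTd"
```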
The only other changes I needed to make were to indicate the locations of the CUDA libraries in the project properties (I did this for both the Debug and Release configurations):
Configuration Properties → Linker → General:
Additional Library Directories = C:\CUDA\lib;"C:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\common\lib"

Configuration Properties → Linker → Input:
Additional Dependencies = cudart.lib
After doing this, the program compiled and produced the following correct result:
0 0.000000
1 1.000000
2 4.000000
3 9.000000
4 16.000000
5 25.000000
6 36.000000
7 49.000000
8 64.000000
9 81.000000
For those of you who want to try CUDA but don't have a CUDA-enabled GPU card, there is a way to link to a CUDA device emulator. Simply replace cudart.lib with cudartemu.lib in the project properties as follows:
Configuration Properties → Linker → Input:
Additional Dependencies = cudartemu.lib
This supplants the use of the -deviceemu compiler option in earlier versions of CUDA.
Finally, you may want C++ syntax coloring and IntelliSense to work on your .cu source files. To get syntax coloring, click on the Tools→Options menu. Then, in the Options window under Text Editor→File Extension, enter the .cu and .cuh file extensions and select Microsoft Visual C++ as the editor. To enable IntelliSense, you'll have to edit the Windows registry, adding the .cu and .cuh file extensions to the NCB Default C/C++ Extensions value under the key HKEY_CURRENT_USER\Software\Microsoft\VisualStudio\9.0\Languages\Language Services\C/C++. That should do it.
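If you'd rather script the registry change than click through regedit, a .reg file along these lines works. Be warned that the extension list below is only an illustration — export your existing value first and append ;.cu;.cuh to whatever extensions are already there, since importing this file replaces the value outright:

```reg
Windows Registry Editor Version 5.00

; Illustrative only: keep whatever extensions your existing
; "NCB Default C/C++ Extensions" value already lists, and just
; append ;.cu;.cuh to the end of that string.
[HKEY_CURRENT_USER\Software\Microsoft\VisualStudio\9.0\Languages\Language Services\C/C++]
"NCB Default C/C++ Extensions"=".cpp;.cxx;.c;.cc;.h;.hh;.hpp;.hxx;.inl;.cu;.cuh"
```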
Here’s the source code for this example if you want to try it.