CUDA Gets Easier!

Several of my readers have had problems creating CUDA projects in Visual Studio, so I thought I’d update how to do it using the current version of CUDA (3.0 at the time of this writing).  The main point: it’s a lot easier than the procedure I outlined two years ago.

For hardware, I’m now using a Zotac GeForce GT240 card with 96 stream processors that I purchased last year for $90. For my software development environment, I downloaded and installed the Microsoft SDK for Windows Server 2008 and Microsoft Visual C++ 2008 Express Edition. Then I downloaded and installed the NVIDIA Driver 197.13, the CUDA Toolkit 3.0 and the CUDA SDK 3.0 for 32-bit Windows XP.

Once everything was set up, the first thing I did was to recompile and run the deviceQuery example in

C:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\src\deviceQuery

I just double-clicked the deviceQuery_vc90.sln file and the project popped up in the Visual Studio IDE. I hit F7 to rebuild the program, and then I pressed Ctrl+F5 to run it. The program ran and reported the presence of a GeForce GT 240 in my PC. So far, so good.

Next, I created an empty Win32 console application called cuda_example3. I renamed cuda_example3.cpp so that it has a .cu extension, because that’s where the CUDA kernel source is going. Then I copied the source from my first CUDA program into the file and saved it. Here’s the code so you can see it:

// Defines the entry point for the console application.

#include "stdafx.h"

#include <stdio.h>
#include <stdlib.h> // malloc, free
#include <cuda.h>

// Kernel that executes on the CUDA device
__global__ void square_array( float *a, int N )
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if ( idx < N ) // Guard: the last block may have threads past the end of the array
        a[idx] = a[idx] * a[idx];
}

// main routine that executes on the host
int main( void )
{
    float *a_h, *a_d; // Pointer to host & device arrays
    const int N = 10; // Number of elements in arrays
    size_t size = N * sizeof( float );
    a_h = (float *)malloc( size );    // Allocate array on host
    cudaMalloc( (void **)&a_d, size ); // Allocate array on device
    // Initialize host array and copy it to CUDA device
    for ( int i = 0; i < N; i++ )
        a_h[i] = (float)i;
    cudaMemcpy( a_d, a_h, size, cudaMemcpyHostToDevice );
    // Do calculation on device:
    int block_size = 4;
    int n_blocks   = N / block_size + ( N % block_size == 0 ? 0 : 1 );
    square_array <<< n_blocks, block_size >>> ( a_d, N );
    // Retrieve result from device and store it in host array
    cudaMemcpy( a_h, a_d, sizeof( float ) * N, cudaMemcpyDeviceToHost );
    // Print results
    for ( int i = 0; i < N; i++ )
        printf( "%d %f\n", i, a_h[i] );
    // Cleanup
    free( a_h );
    cudaFree( a_d );
    return 0;
}
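
The launch configuration in the listing rounds the block count up so that every array element gets a thread, and the kernel's idx < N test switches off the leftover threads in the last block. Here is a small host-only sketch of that arithmetic in plain C++ (no GPU required). The function names blocks_for and active_indices are my own inventions for illustration and are not part of CUDA:

```cpp
#include <cassert>
#include <vector>

// Ceiling division: the smallest block count with n_blocks * block_size >= n.
// Gives the same value as the listing's
//   N / block_size + ( N % block_size == 0 ? 0 : 1 )
int blocks_for(int n, int block_size)
{
    assert(n >= 0 && block_size > 0);
    return (n + block_size - 1) / block_size;
}

// Hypothetical helper: visit every (block, thread) pair the launch would
// create and collect the global indices that pass the kernel's idx < n guard.
std::vector<int> active_indices(int n, int block_size)
{
    std::vector<int> active;
    const int n_blocks = blocks_for(n, block_size);
    for (int block = 0; block < n_blocks; ++block) {           // stands in for blockIdx.x
        for (int thread = 0; thread < block_size; ++thread) {  // stands in for threadIdx.x
            const int idx = block * block_size + thread;       // blockDim.x == block_size
            if (idx < n)
                active.push_back(idx);
        }
    }
    return active;
}
```

With N = 10 and block_size = 4, blocks_for returns 3, so the launch creates 12 threads. Indices 0 through 9 pass the guard and the threads with indices 10 and 11 do nothing.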

At this point, Visual Studio had no idea how to compile a .cu file. In the past, I crafted a Custom Build Step in the Project Properties page that invoked Nvidia’s nvcc tool with the appropriate compiler options. No more need for that! Instead, I highlighted cuda_example3 in the Solution Explorer pane, and then selected Project→Custom Build Rules… from the menu. Then I clicked on the Find Existing… button in the Custom Build Rule Files window and steered it to this file:

C:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\common\Cuda.rules

Cuda.rules contains all the rules and options needed to merge .cu files into the Visual Studio C++ compilation flow.

The only other changes I needed to make were to indicate the locations of the CUDA libraries in the project properties (I did this for both the Debug and Release configurations):

Configuration Properties → Linker → General:
Additional Library Directories = C:\CUDA\lib;"C:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\common\lib"

Configuration Properties → Linker → Input:
Additional Dependencies = cudart.lib

After doing this, the program compiled and produced the following correct result:

0 0.000000
1 1.000000
2 4.000000
3 9.000000
4 16.000000
5 25.000000
6 36.000000
7 49.000000
8 64.000000
9 81.000000
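
If you want something to compare the GPU output against, a plain CPU version of the same calculation is easy to write. This is only a sketch: square_array_host, expected_squares and matches are hypothetical helper names of mine, and the tolerance value is an arbitrary assumption (the squares of 0 through 9 are exact in single precision anyway).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for the square_array kernel:
// one loop iteration does the work of one GPU thread.
void square_array_host(std::vector<float>& a)
{
    for (std::size_t i = 0; i < a.size(); ++i)
        a[i] = a[i] * a[i];
}

// Build the input 0, 1, ..., n-1 the same way the listing does,
// then square it on the CPU to get the expected result.
std::vector<float> expected_squares(std::size_t n)
{
    std::vector<float> a(n);
    for (std::size_t i = 0; i < n; ++i)
        a[i] = static_cast<float>(i);
    square_array_host(a);
    return a;
}

// Element-wise comparison of a result copied back from the device
// against the CPU reference, within an absolute tolerance.
bool matches(const std::vector<float>& got,
             const std::vector<float>& want,
             float tol = 1e-6f)
{
    assert(tol >= 0.0f);
    if (got.size() != want.size())
        return false;
    for (std::size_t i = 0; i < got.size(); ++i)
        if (std::fabs(got[i] - want[i]) > tol)
            return false;
    return true;
}
```

In the example program, you could copy a_h into a std::vector<float> after the second cudaMemcpy and pass it to matches along with expected_squares(N).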

For those of you who want to try CUDA but don’t have a CUDA-enabled GPU card, there is a way to link to a CUDA device emulator. Simply replace cudart.lib with cudartemu.lib in the project properties as follows:

Configuration Properties → Linker → Input:
Additional Dependencies = cudartemu.lib

This supplants the use of the -deviceemu compiler option in earlier versions of CUDA.

Finally, you may want C++ syntax-coloring and Intellisense to work on your .cu source files. To get syntax-coloring, click on the Tools→Options menu. Then in the Options window under Text Editor→File Extension, enter the .cu and .cuh file extensions and select Microsoft Visual C++ as the editor. To enable Intellisense, you’ll have to edit the Windows registry by adding the .cu and .cuh file extensions to the key HKEY_CURRENT_USER\Software\Microsoft\VisualStudio\9.0\Languages\Language Services\C/C++\NCB Default C/C++ Extensions. That should do it.

Here’s the source code for this example if you want to try it.


About dave_vandenbout
President of XESS Corp, a manufacturer of FPGA development boards.

24 Responses to CUDA Gets Easier!

  1. Marek says:

    Hi, Dave!
    Thank you a lot, I followed your instructions and everything works fine.
    GPU computing is part of my final exams. If you have any good articles or other resources about it, I would be grateful.
    Thanks a lot again

  2. Hi Dave! Your excellent post inspired me to write my own version of how to set up CUDA for Visual C++. Mine is slightly different from your approach but it boils down to the same thing. Anyway, your post got me started well. Thanks for that! Of course I included a link to your post on mine.

    • Thanks! I haven’t been active here because the subject has become peripheral to how I make my living. I still seem to get a lot of hits even though I consider most of my posts outdated. I hope your blog becomes a magnet for CUDA programmers and presents them with more modern info.

  3. Pingback: Setting up CUDA toolkit 4.0 in Visual C++ 2010

  4. amit says:

    I am using VS 2010 Ultimate. When I build this program, it shows the following error.
    Please help me.

    1>------ Rebuild All started: Project: exap, Configuration: Release Win32 ------
    1>Error: The result "" of evaluating the value "$(CudaBuildTasksPath)" of the "AssemblyFile" attribute in element is not valid. C:\Program Files\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations\CUDA 4.0.targets
    ========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========

  5. fli10 says:

    Thanks. This helps.

  6. CD says:

    Thank you for your post. It helped me get started.
    I am using VS 2008 on Win 7 64 bit machine. I have installed CUDA GPUkit ver 3.2. I have tested the sample applications. nvcc has been correctly installed and I checked the bandwidth test sample and that also works correctly.
    Now when I start a new project I have done the following
    1. Added CUDA rules as Custom build rules.
    2. My runtime library is in multithreaded debug mode.
    3. Under linker options I have added additional dependencies cudart.lib
    4. Under additional library directories I have added the following 2
    C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK 3.2\C\common\lib
    C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\lib\x64

    For my first program I am using your code in an array. My program compiles fine but throws a linker error with inadequate information.

    This is what I get
    1>LIBCMTD.lib(crt0.obj) : error LNK2019: unresolved external symbol _main referenced in function ___tmainCRTStartup
    1>C:\Users\CD\Documents\Visual Studio 2008\Projects\cuda_example3\Debug\cuda_example3.exe : fatal error LNK1120: 1 unresolved externals

    Any help will be appreciated.

  7. Hemanth says:

    I’m able to compile the program, but unable to execute it. I got the following link error. Please help me.

    1>------ Build started: Project: cuda_example3, Configuration: Debug Win32 ------
    1>MSVCRTD.lib(crtexe.obj) : error LNK2019: unresolved external symbol _main referenced in function ___tmainCRTStartup
    1>U:\Visual Studio 2008\Projects\cuda_example3\Debug\cuda_example3.exe : fatal error LNK1120: 1 unresolved externals
    1>Build log was saved at "file://u:\Visual Studio 2008\Projects\cuda_example3\cuda_example3\Debug\BuildLog.htm"
    1>cuda_example3 - 2 error(s), 0 warning(s)
    ========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

  8. Hemanth says:

    I got a fatal error when I try to debug the project, but I was able to compile the .cu file successfully. Please help me.

    LINK : fatal error LNK1104: cannot open file ‘C:\Program.obj’

    • Hemanth says:

      This particular issue is caused by specifying a dependency to a lib file that had spaces in its path. The path needs to be surrounded by quotes for the project to compile correctly.

      On the Configuration Properties -> Linker -> Input tab of the project’s properties, there is an Additional Dependencies property. This issue was fixed by changing this property from:



      ” C:\CUDA\lib;”

  9. mariola says:

    thank you very much for your example. and for your orientation on how to integrate cuda with visual studio 2008. your example was the first example that i compiled and run successfully after following your steps. and that helped me a lot for figuring out how cuda works. thanks thanks thanks

  10. Gautham says:

    Your post was very helpful! But even when I include the .rules file, I end up with a link error stating “utils32d file not found”. I seriously don’t know what to do. I have searched a lot of places for information about this. I am presently using
    Win7 64-bit
    GeForce 420M
    Visual Studio 2008

    Being a beginner, I am finding it really tough to even run a simple program. Can you explain how to sort out this issue?

    Even when I use the command prompt, I’m not ending up with desired result. Please help!

  11. Paul says:

    Nvidia have dropped the cudartemu.lib file from recent releases of the CUDA Toolkit. Anyone who doesn’t have a CUDA GPU on their system will need to get a hold of the older CUDA Toolkit 3.0.
    The emu.lib trick is great. It allows me to show simple CUDA samples on my laptop; short of that, I was going to have to purchase a new one! Thanks a heap, llpanorama.

  12. Raju says:

    Thank you for making this easy for beginners like me. Nice explanation.
    After executing this program on
    OS: Windows Server 2008 R2
    CUDA: SDK 3.1
    Device: Tesla S1070
    I am getting these errors. Can you please sort them out?
    Error 1 error LNK2019: unresolved external symbol _main referenced in function ___tmainCRTStartup MSVCRTD.lib cuda_example3
    Error 2 fatal error LNK1120: 1 unresolved externals C:\Users\Administrator\Documents\Visual Studio 2008\Projects\cuda_example3\Debug\cuda_example3.exe cuda_example3

  13. Daniel says:

    Thank you very much. This helped a lot.

  14. John says:

    Hello there llpanorama.
    This post was very useful, thank you very much!

    I didn’t even know NVIDIA shipped the build rules with the SDK!

    One issue though.
    With my source files renamed to the .cu extension, everything runs well, but then I get no Intellisense.

    I switched the kernel on your example to a “” file, and renamed do .cpp.

    But then there’s no recognition to cudaMalloc and the other functions. How would one integrate a CUDA code into a regular Visual Studio project?

    • llpanorama says:

      You’re right – Intellisense doesn’t seem to work with files having the .cu extension even though the C++ syntax-coloring does work. I did a web search and found out how to fix that:

      In the registry, add .cu and .cuh extensions to the NCB Default C/C++ Extensions key under
      HKEY_CURRENT_USER\Software\Microsoft\VisualStudio\9.0\Languages\Language Services\C/C++ path.

      I don’t know how you could call the kernel from a .cpp file since the kernel call uses the <<< >>> construct to pass configuration parameters to the kernel. The .cpp file would bypass the CUDA compiler (nvcc) and the <<< >>> wouldn’t get translated into anything the standard C compiler could handle.

      • John says:

        Yes, the registry entry worked very well, better than I thought, actually. It seems like the parser doesn’t even care about the <<< >>> construct.

        About the use of .cpp: I have all my cuda calls inside a .cu file.
        I made a call, from a .cpp file, to the __host__ function (which is in a .cu file) that launches the kernels.
        It didn’t compile, but I’ll get the hang of it and eventually it will get there.

        The nBody example shipped with the sdk has only one .cu file, very well integrated into the project. I understand the rest of the code is actually compiled by cl (VC compiler), which is important.

        I’ll let you know of progress, if you like AND if I get there.

        Thanks for the help.

  15. Maurice says:

    Thank you for this great post!
    This is my first time working with C++ and CUDA. I’m trying to improve the speed of my data fitting procedure by implementing it on my GPU. I got everything up and running, but when I try to compile this example MSVS says it can’t find the necessary header files like stdafx.h. A search tells me I don’t seem to have them on my computer.
    Anyone know where I can get these header files? Are they supposed to be standard or do I have to download them?

    • llpanorama says:

      When you make the Win32 console application, Visual Studio should create an stdafx.h file inside the project directory where your .cpp source file is stored. What version of Visual Studio are you using and how did you setup the project?

      • Maurice says:

        Thank you for your fast reaction. I realise now that MSVS (2008 Express Edition) should create stdafx.h. I had just made an empty project and put a .cu file with your code in it. No wonder it didn’t work.
        After I did it the right way, your program still did not work. It asked for library files, and every time I added the directory containing one, a new library file turned up missing when compiling.

        I have solved the problem, though in a strange way. I downloaded the CUDA programming template from NVIDIA, removed the code in it and put your code in it. Now it is working like a charm.

        Now comes the hard part, rewriting my fitting procedure such that parts can be calculated on the gpu. I need more understanding of cuda for that.
        If you have any tips, or recommendations for literature on this please let me know.
        Thanks in advance!

  16. chang says:

    I followed your instructions using MSVS 2010. I got the error below.
    fatal error LNK1181: cannot open input file ‘.\Debug\’
    I downloaded your project zip file, opened it with MSVS 2010, and got the same error.

  17. Pingback: My first CUDA program! « /// Parallel Panorama ///
