My first CUDA program!

Note: Check out “CUDA Gets Easier” for a simpler way to create CUDA projects in Visual Studio.

I got CUDA set up and running with Visual C++ 2005 Express Edition in my previous post. Now I’ll write my first CUDA program. It’s a modification of an example program from a great series of articles on CUDA by Rob Farber published in Dr. Dobb’s Journal. Rob does his examples in a make-based build environment; I’ll show how to build a CUDA program in the Visual C++ IDE.

Simple CUDA programs have a basic flow:

  1. The host initializes an array with data.
  2. The array is copied from the host to the memory on the CUDA device.
  3. The CUDA device operates on the data in the array.
  4. The array is copied back to the host.

My first CUDA program, shown below, follows this flow. It takes an array and squares each element. I can barely contain my excitement.

// example1.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"

#include <stdio.h>
#include <cuda.h>

// Kernel that executes on the CUDA device
__global__ void square_array(float *a, int N)
{
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx<N) a[idx] = a[idx] * a[idx];
}

// main routine that executes on the host
int main(void)
{
  float *a_h, *a_d;  // Pointer to host & device arrays
  const int N = 10;  // Number of elements in arrays
  size_t size = N * sizeof(float);
  a_h = (float *)malloc(size);        // Allocate array on host
  cudaMalloc((void **) &a_d, size);   // Allocate array on device
  // Initialize host array and copy it to CUDA device
  for (int i=0; i<N; i++) a_h[i] = (float)i;
  cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);
  // Do calculation on device:
  int block_size = 4;
  int n_blocks = N/block_size + (N%block_size == 0 ? 0:1);
  square_array <<< n_blocks, block_size >>> (a_d, N);
  // Retrieve result from device and store it in host array
  cudaMemcpy(a_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
  // Print results
  for (int i=0; i<N; i++) printf("%d %f\n", i, a_h[i]);
  // Cleanup
  free(a_h); cudaFree(a_d);
}

Line numbers in this walkthrough count from the top of the listing. Two pointers are declared on line 19, in the main routine: a_h points to the array that is stored on the host, while a_d points to the array on the CUDA device. The a_h array is allocated in the host memory on line 22 using the standard malloc subroutine, but a_d is allocated in the CUDA device memory using the cudaMalloc subroutine found in the CUDA API (line 23). (Note that a pointer to the a_d pointer is passed to cudaMalloc so it can store the address of the array in a_d.)

In order to create some values to operate upon, each element in the host array is initialized with its array index (line 25). Then the cudaMemcpy subroutine is used to copy a_h from the host into a_d on the CUDA device. (The cudaMemcpyHostToDevice flag, defined in the API, indicates the direction of the transfer.)

In lines 28-30, the host initiates the execution of the kernel function, square_array, on the CUDA device. A CUDA device contains individual processing elements, each of which can execute a thread. Threads are grouped together to form a block, and a number of blocks constitutes a grid. In this example, the number of threads per block is set to four (line 28). Then the total number of blocks needed to get enough threads to square each array element is calculated on line 29. (For ten array elements, three blocks of four threads each are needed.) On line 30, the host initiates the kernel function on the CUDA device. The number of blocks and the number of threads in each block are indicated between the <<<…>>> following the kernel name. (This information is picked up by the Nvidia compiler, nvcc, and is used when generating the instructions that start the kernel on the CUDA device. More on nvcc later.) Following that, the standard argument list to square_array contains a pointer to the array in the CUDA device memory and the number of elements in the array.
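The block-count arithmetic on line 29 is just an integer ceiling division. Here is a minimal host-only sketch in plain C++ (no CUDA needed) that puts the listing's formula next to the more common one-expression form. The function names blocks_needed and blocks_needed_listing are mine, not part of the CUDA API.

```cpp
#include <cassert>

// Number of blocks needed so that n_blocks * block_size >= n.
// Written as an integer ceiling division.
int blocks_needed(int n, int block_size)
{
    assert(n >= 0 && block_size > 0);
    return (n + block_size - 1) / block_size;
}

// The formula exactly as it appears on line 29 of the listing.
int blocks_needed_listing(int n, int block_size)
{
    assert(n >= 0 && block_size > 0);
    return n / block_size + (n % block_size == 0 ? 0 : 1);
}
```

For the values used in this post, both blocks_needed(10, 4) and blocks_needed_listing(10, 4) return 3. Either form works; the ceiling-division form is only a matter of taste.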

The kernel is shown on lines 10-14. The __global__ keyword indicates that this is a kernel function that should be processed by nvcc to create machine code that executes on the CUDA device, not the host. In this example, each thread will execute the same kernel function and will operate upon only a single array element. Each thread is distinguished from all the others by block and thread indices that can be used to determine the array element the thread will access. On line 12, the array index is found by multiplying the thread’s block index (blockIdx.x) by the number of threads in each block (blockDim.x) and then adding the index of the thread within the block (threadIdx.x). If the index is within the bounds of the array, then the corresponding array element is squared (line 13).
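To see why the bounds check matters, it can help to walk through the index calculation on the CPU. The sketch below is plain C++ and only imitates the launch: it visits every (block, thread) pair in a loop, whereas the real device runs them in parallel. The names square_array_thread and emulate_launch are hypothetical helpers I made up for illustration.

```cpp
#include <cassert>
#include <vector>

// CPU-only stand-in for the kernel body: block_idx, block_dim and
// thread_idx play the roles of blockIdx.x, blockDim.x and threadIdx.x.
void square_array_thread(std::vector<float>& a, int n,
                         int block_idx, int block_dim, int thread_idx)
{
    int idx = block_idx * block_dim + thread_idx;
    if (idx < n) a[idx] = a[idx] * a[idx];
}

// Hypothetical helper: visits every (block, thread) pair one after another.
// Returns how many threads fell outside the array and did nothing.
int emulate_launch(std::vector<float>& a, int n_blocks, int block_size)
{
    assert(n_blocks >= 0 && block_size > 0);
    int n = static_cast<int>(a.size());
    int idle = 0;
    for (int b = 0; b < n_blocks; ++b) {
        for (int t = 0; t < block_size; ++t) {
            if (b * block_size + t >= n) ++idle;  // would index past the end
            square_array_thread(a, n, b, block_size, t);
        }
    }
    return idle;
}
```

With a ten-element array holding 0 through 9, emulate_launch(a, 3, 4) covers twelve thread indices. Ten of them square an element and two (indices 10 and 11) are stopped by the idx<N test, so the function returns 2.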

Immediately after starting the kernel, the host begins a transfer of the data from the array in the CUDA device memory back to the array in the host memory (line 32). This transfer is delayed until the CUDA device has finished executing the kernel, so there is no chance of getting data that has not been processed yet. Then the host displays the contents of the array (line 34) and frees the array memory on both itself and the CUDA device (line 36).
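The host and device arrays are separate copies, and that is easy to lose track of. Here is a rough CPU-only sketch of the four-step flow in plain C++. It assumes an ordinary std::vector can stand in for the device array and std::copy for cudaMemcpy. The function name square_via_copy is invented for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// CPU-only sketch: 'device' is an ordinary vector standing in for the
// array that cudaMalloc would place in CUDA device memory.
std::vector<float> square_via_copy(int n)
{
    assert(n >= 0);
    std::vector<float> host(n), device(n);

    // Step 1: the host initializes its array.
    for (int i = 0; i < n; ++i) host[i] = static_cast<float>(i);

    // Step 2: host -> device (the cudaMemcpyHostToDevice transfer).
    std::copy(host.begin(), host.end(), device.begin());

    // Step 3: the "device" squares its own copy; host is still unchanged.
    for (float& x : device) x = x * x;

    // Step 4: device -> host (the cudaMemcpyDeviceToHost transfer).
    std::copy(device.begin(), device.end(), host.begin());
    return host;
}
```

Until step 4 runs, the host array still holds the unsquared values. That is the same thing you see in the real program if the kernel never executes.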

At this point, I have a CUDA-enabled program, but I don’t have it integrated into a Visual C++ project. It actually takes a bit of work to do that. To start, I brought up the Visual C++ 2005 Express Edition IDE and clicked on the New Project button (you can also use File→New→Project… from the menu). In the New Project window, I selected Win32 as the project type and Win32 Console Application as the template. I gave the project the creative name of example1 and set its location to the C:\llpanorama\CUDA\examples directory. After clicking OK in the New Project window, and then clicking Finish in the Win32 Application Wizard window, a window opened with a simple code skeleton. I replaced the code skeleton with the code shown above.

After saving the code, I right-clicked the example1.cpp file, selected Rename from the drop-down menu and renamed the file to example1.cu. Files with the .cu extension are intended to be processed by nvcc. nvcc will extract the kernel portion of example1.cu and compile it for execution on the CUDA device while using the Visual C++ compiler to compile the remainder of the file for execution on the host.

In its default configuration, Visual C++ doesn’t know how to compile a .cu file. It has to be told explicitly how to do this using a Custom Build Step. This is done by right-clicking on the example1.cu file and selecting Properties from the drop-down menu. In the Property Pages window that appears, set the Custom Build Step command line as follows:

Configuration Properties → Custom Build Step → General:
Command Line =
"$(CUDA_BIN_PATH)\nvcc.exe" -ccbin "$(VCInstallDir)bin" -c -D_DEBUG -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Wp64,/Od,/Zi,/MTd -I"$(CUDA_INC_PATH)" -I./ -o $(ConfigurationName)\example1.obj example1.cu

What does this command line do? Let’s break it down piece-by-piece:

"$(CUDA_BIN_PATH)\nvcc.exe": The location of the nvcc compiler.

-ccbin "$(VCInstallDir)bin": The location of the Visual C++ compiler.

-c: Compile only; the build stops after generating an object file (.obj extension) and does not link.

-D_DEBUG -DWIN32 -D_CONSOLE -D_MBCS: Various macro definitions.

-Xcompiler /EHsc,/W3,/nologo,/Wp64,/Od,/Zi,/MTd: Various options that are passed by nvcc directly to the Visual C++ compiler.

-I"$(CUDA_INC_PATH)": Look in the CUDA include directories for needed header files.

-I./: Look in the current directory for needed header files.

-o $(ConfigurationName)\example1.obj: The location and name of the resulting object file.

example1.cu: The source file that the compiler will work on.
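If it helps to see how the pieces fit together, here is a small plain C++ sketch that assembles the same command line from its parts. The function nvcc_command is a hypothetical helper, not something Visual Studio or nvcc provides. The Visual Studio macros such as $(CUDA_BIN_PATH) are kept as literal text because the IDE expands them when the build step runs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper that joins the pieces described above into one line.
// 'defines' are macro names without the -D prefix, 'host_flags' is the
// comma-separated list handed to -Xcompiler, and 'source_stem' is the
// file name without its extension.
std::string nvcc_command(const std::vector<std::string>& defines,
                         const std::string& host_flags,
                         const std::string& source_stem)
{
    assert(!source_stem.empty());
    std::string cmd = "\"$(CUDA_BIN_PATH)\\nvcc.exe\"";
    cmd += " -ccbin \"$(VCInstallDir)bin\" -c";
    for (const std::string& d : defines) cmd += " -D" + d;
    cmd += " -Xcompiler " + host_flags;
    cmd += " -I\"$(CUDA_INC_PATH)\" -I./";
    cmd += " -o $(ConfigurationName)\\" + source_stem + ".obj";
    cmd += " " + source_stem + ".cu";
    return cmd;
}
```

Calling nvcc_command({"_DEBUG", "WIN32", "_CONSOLE", "_MBCS"}, "/EHsc,/W3,/nologo,/Wp64,/Od,/Zi,/MTd", "example1") reproduces the Debug command line shown above. Only the macro list and the host compiler flags vary between configurations.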

In addition to setting the command line for the example1.cu file, the location of the output file is specified as follows:

Configuration Properties → Custom Build Step → General:
Outputs = $(ConfigurationName)\example1.obj

After setting the file properties, the properties for the example1 project have to be modified. Here are the project property settings I used for the Debug configuration:

Configuration Properties → C/C++ → General:
Additional Include Directories = $(CUDA_INC_PATH);"C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\common\inc"

Configuration Properties → C/C++ → General:
Debug Information Format = Program Database (/Zi)

Configuration Properties → C/C++ → Code Generation:
Runtime Library = Multi-threaded Debug (/MTd)

Configuration Properties → Linker → General:
Enable incremental linking = No (/INCREMENTAL:NO)

Configuration Properties → Linker → General:
Additional Library Directories = "C:\CUDA\lib";"C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\common\lib"

Configuration Properties → Linker → Input:
Additional Dependencies = cudart.lib cutil32D.lib

Configuration Properties → Linker → Optimization:
Enable COMDAT folding = Do Not Remove Redundant COMDATs (/OPT:NOICF)

Now the project can be compiled and run. Here’s the result:

0 0.000000
1 1.000000
2 4.000000
3 9.000000
4 16.000000
5 25.000000
6 36.000000
7 49.000000
8 64.000000
9 81.000000

I told you it was exciting! Well, at least it’s right.

In order to compile the Release configuration, a few changes need to be made to the file and project properties. For the example1.cu file, the Custom Build Step command line has to be changed to remove the _DEBUG macro definition, enable compiler optimization (/O2 instead of /Od), and link with the Release runtime library (/MT instead of /MTd):

Configuration Properties → Custom Build Step → General:
Command Line =
"$(CUDA_BIN_PATH)\nvcc.exe" -ccbin "$(VCInstallDir)bin" -c -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Wp64,/O2,/Zi,/MT -I"$(CUDA_INC_PATH)" -I./ -o $(ConfigurationName)\example1.obj example1.cu

The project properties that have to be changed in the Release configuration are the linking for the runtime library and the use of the non-debug version of the CUDA utilities library:

Configuration Properties → C/C++ → Code Generation:
Runtime Library = Multi-threaded (/MT)

Configuration Properties → Linker → Input:
Additional Dependencies = cudart.lib cutil32.lib

Once those changes are made, the Release version of the example1 project can be compiled and run. It will output the same exciting result.

Here’s the source code for this example if you want to try it.

Don’t have a CUDA-capable GPU board in your PC but still want to try running this program? Easy! Just add the following option to the Custom Build Step command line: -deviceemu. This will link in a CUDA device emulator that runs on the host. The emulator becomes the target for all the CUDA API calls and executes the kernel. The program will run just as if a CUDA device were there, except slower. (Here is the project file with the -deviceemu option.)

So I’ve written my first CUDA program and gotten it to compile using Visual C++ 2005 Express Edition. Setting up the compilation options was as much (more?) work as writing the program, so you might be interested in a CUDA template for Visual C++ 2005 written by kyzhao. The installer doesn’t work for me (maybe because I’m using the free Express Edition), but it might help you.

About dave_vandenbout
President of XESS Corp, a manufacturer of FPGA development boards.

163 Responses to My first CUDA program!

  1. obviously like your website but you have to test
    the spelling on several of your posts. Many of them are rife with spelling problems and I find it very troublesome to tell the reality nevertheless
    I will certainly come again again.

  2. Pingback: Ejemplos de programación de GPGPU en CUDA | Todo lo que buscabas en Programación

  3. regions says:

    garcinia Cambogia Reviews – GCE likewise assists to boost
    the serotonin material in the brain. It decreases manufacturing of terrible cholesterol
    and enhances the production of superb cholesterol levels.

  4. Nell says:

    Generally I do not read post on blogs, however I wish to say that this write-up very forced me to try and
    do it! Your writing taste has been amazed me. Thank you,
    very great article.

  5. I’m gone to inform my little brother, that he should also visit this blog on regular basis to get updated from latest information.

  6. Hakan says:

    Thank you for article. It is written very clear

  7. menoannuams says:

    replica watches
    reputable replica watch sites

  8. Pingback: CUDA+Visual Studio-NVidea NSight | Мои IT-заметки

  9. tassameh says:

    HI every one:
    i am new in CUDA programming i need the help for my graduated application
    thanks

  10. Pingback: _ITERATOR_DEBUG_LEVEL of Cuda (NVCC) compiler? | PHP Developer Resource

  11. csb says:

    i m having cuda toolkit 2.2 ,windows 7, visual studio 8
    so what changes do i have to do?

  12. Vijay says:

    Hi. Many many thanks for the article. I have been experimenting with CUDA on VS for while now. I have had to reformat my system many times in the past 2 years and everytime I have to set up VStudio and CUDA, I have this page saved in my bookmarks. I just wanted to add a little more to this info. Recently Microsoft launched VStudio 2011 beta. My project involves using managed windows and cuda. While Vstudio 2010 does not have intellisense for managed systems, Vstudio 2011 beta is not supported by Nvidia yet. So what I did was to set up my project using VS 2010 and then copied the contents of ‘C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations’ into ‘C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V110\BuildCustomizations’. Now I can open my VS 2010 projects in VS 2011 and enjoy the comfort of intellisense. I have not invented any new kinda rocket here but thought will help somebody on the way.

  13. NL says:

    Hi there,

    thanks for the posting. It really helps. But I thought since
    I was new to cuda and visual studio and I manage to run the above program using
    Visual studio prof 2010 ( I think express will work as well)
    Cuda 4.1 without the Nsight parallel, it may helps the other users of CUDA and VS 2010.

    once you have created the example1.cpp as described at the beginning of this blog.
    Right click example1 (the project on your left under Solution explorer)
    Build customization and check Cuda 4.1

    Then right click example1.cpp (C++ file now) ->rename example1.cu
    right click again example1.cu properties->general ->

    item properties change to Cuda C/C++, click on apply.

    Expand Cuda C\C++ (on the left under general) , click common ,
    in additional directories put
    ./
    ../../common/inc
    ../../../shared/inc

    Now go back to example1 project
    right click the project expand configuration properties, in custom build step-change outputs to
    $(Configuration Name)\example1.obj
    right click the project expand configuration properties, in VC++ directories-add to include directories
    the pathe where the inc are , for me it is

    $(Configuration Name)\example1.obj

  14. M says:

    Hi,

    I am trying to compile my program via nvcc from the command line. I tried different ways, using cygwin or DOS, and I only managed to create the .obj file. It doesn’t create the .exe. I don’t know how to link them. I am doing this using Visual Studio as well.

    Could you please help me .

  15. Sawela says:

    Free Legal Consultation http://jurspravki.ru/ – Legal Initiative.

  16. Thanks for the article. Very helpful and clear.

  17. vinayaka says:

    i want to use CUDA in emulation, how can i do that? can anyone help me

    thank you

  18. N says:

    I am working on non GPU based machine

    with VC++ 2005 Express edition
    CUDA 1.1 toolkit
    CUDA 1.1 SDK

    It compile successfully, but gives following o/p when I run the solution.
    Please help

    ‘deviceQuery.exe': Loaded ‘C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\bin\win32\Debug\deviceQuery.exe’, Symbols loaded.
    ‘deviceQuery.exe': Loaded ‘C:\Windows\System32\ntdll.dll’, No symbols loaded.
    ‘deviceQuery.exe': Loaded ‘C:\Windows\System32\kernel32.dll’, No symbols loaded.
    ‘deviceQuery.exe': Loaded ‘C:\Program Files\AVAST Software\Avast\snxhk.dll’, No symbols loaded.
    ‘deviceQuery.exe': Loaded ‘C:\CUDA\bin\cudart.dll’, Binary was not built with debug information.
    ‘deviceQuery.exe': Loaded ‘C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\bin\win32\Debug\cutil32D.dll’, No symbols loaded.
    First-chance exception at 0x7760fc56 in deviceQuery.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0012fcf0..
    First-chance exception at 0x7760fc56 in deviceQuery.exe: Microsoft C++ exception: cudaError at memory location 0x0012fd40..
    The program ‘[5372] deviceQuery.exe: Native’ has exited with code 1 (0x1).

    • N says:

      I am not seeing CUDA C/C++ itemtype to select in cudafile property pages

      I have either C/C++ or I have CUDA as item type.

      I have included nvcc.rules in custom build rules.

      Please help !!!

      Thanks,
      N

  19. Nick says:

    It’s working perfectly. Thank you

  20. gjhgjhg says:

    nice one! those settings completely ruined every other c++ project! thanks

  21. Can any one say me how to start from python orange library in CUDA ?

  22. Pingback: Informatikdienstleistungen.de » Archive » Tutorial: Parallele Berechnungen mit CUDA, ein Beispiel

  23. Sonar says:

    SONAR – spare parts for machinery. http://www.sonarr.ru/

  24. Guillermo says:

    I got:
    0 0.000000
    1 1.000000
    2 2.000000
    3 3.000000
    4 16.000000
    5 5.000000
    6 6.000000
    7 7.000000
    8 64.000000
    9 9.000000
    but i don´t understand.. any idea please? (i’m beginner, sorry)

    • Guillermo says:

      Sorry, I was wrong to write the array index calculation in the kernel f’n

      • Guillermo says:

        We have 3 blocks of 4 threads each. Let B(x,y) a block, then we have B(0,0), B(1,0), B(2,0) and if let Ti(x,y) a thread of Bi then for B(0,0) we have T(0,0), T(1,0), T(2,0), T(3,0) .. and so on. Note that we have 12 threads where we can calculate the square of up to 12 array’s elements. For that reason we must apply the (idx <N) condition in the kernel function. e.g. the 5th array element is one of idx = 1 (second block) * 4 (number of threads per block) + 1 (second thread of the block).

  25. Moise Munderere says:

    I am a beginner with Visual Studio. Can anyone tell me why I cannot run the program? I always see this message: ”fatal error LNK1104: cannot open file ‘.\Debug\example1.obj’”

  26. Pingback: CUDA e o Visual Studio 2008 « Diário de Pesquisas – Pesquisas em Computação

  27. Max says:

    Very comprehensive article. Thank you very much

  28. Kay says:

    I am just starting with CUDA. I have a GeForce 6150SE nForce 430 on my desktop. All the samples in the toolkit say min req is GeForce 8. Although, everything seems to be installed well for 64 bit.

    Here is my problem:
    I opened bandwidthTest_vc90.sln in Visual c++2008. It needs cutil32D.lib and shrUtils32D.lib.
    I do not have these files anywhere on my computer. I do have cutil32.lib, cUtil64.lib and cutil64D.lib.
    Is there anything that I should change on the properties in VC project, that would help me compile?

  29. It’s a pity I can’t comment right now; I’m late for a meeting. But once I’m free, I’ll be sure to write what I think.

  30. I had started programming in CUDA but dont know how to decompose RANDOM FOREST algorithm in CUDA so please help me .

  31. Looshu says:

    Can you please post the CUDA code for dijkstra’s algorithm..!
    Pleashe saar.

  32. Rashmi says:

    can anyone tell me the procedure to run cuda programs on the emulator mode??

    • llpanorama says:

      The -deviceemu option isn’t used any more. Now you have to link against a CUDA library that implements the emulator. This is described in the “CUDA Gets Easier” entry of this blog (although I’m not sure this really is easier).

  33. Pingback: CUDA troubles

  34. trap15 says:

    First off, I have to say this was fairly helpful, and helped me get started.
    Secondly, you write some fuck-ugly code :)

  35. sumit says:

    somebody plz tel me, how do we calculate optimization vs cpu? r there any cuda instructions for it?

  36. Avers says:

    I have a question for you right away: has anyone actually used this in real life? Are there any results? Otherwise there’s no telling how all this will end, so I’d like to hear from someone with hands-on experience.

  37. raju says:

    Nice post.
    i am getting the error
    LINK : fatal error LNK1181: cannot open input file ‘cutil32D.lib’
    then i just deleted that file from the linker and now i am getting error
    LINK : fatal error LNK1181: cannot open input file .obj

    can any plz solve my problem
    i have seen all the comments many programmers got the problem like this it may help many oh them plz sort it out.
    Thak you.

    • quest says:

      Did you installed CUDA toolkit & CUDA SDK properly? CUDA compatible display driver?

    • sumit says:

      1>the path for .lib file has to be given. and it should be short. so save both .lib files mentioned above on c drive. and then paste the address in additional dependencies col.
      2> .obj file is formed after successful build of code. so look upon the project properties steps once again.

  38. Sree says:

    HI,

    I am trying to invoke the asyncAPI from C# in VS2008. My program asyncAPI.cu is compiling without issues in VS2008 C++ and I have exported the dll.

    But when I invoke the function in C# exe, it is throwing an exception that it is not finding the entrypoint.

    Any example code of invoking a CUDA source from C# would be very helpful.

  39. Anwarul says:

    i m very beginning to CUDA programming : i have CUDA toolkit 3.1 for 64 bit and windows 7 64-bit . and i m trying run a CUDA program on visual studio 2008. and i m getting following error :

    1>—— Rebuild All started: Project: newCuda, Configuration: Debug Win32 ——
    1>Deleting intermediate and output files for project ‘newCuda’, configuration ‘Debug|Win32′
    1>Compiling with CUDA Build Rule…
    1>”C:\CUDA\bin64\nvcc.exe” -arch sm_10 -ccbin “C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin” -Xcompiler “/EHsc /W3 /nologo /O2 /Zi /MT ” -maxrregcount=32 –compile -o “Debug\cuda.cu.obj” cuda.cu
    1>nvcc fatal : Visual Studio configuration file ‘(null)’ could not be found for installation at ‘C:/Program Files (x86)/Microsoft Visual Studio 9.0/VC/bin/../..’
    1>Linking…
    1>LINK : fatal error LNK1104: cannot open file ‘.\Debug\cuda.cu.obj’
    1>Build log was saved at “file://c:\Users\Anwarul\Desktop\newCuda\newCuda\Debug\BuildLog.htm”
    1>newCuda – 1 error(s), 0 warning(s)
    ========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========

    anybody please help , i would really appreciate you help . thanks in advance

    • llpanorama says:

      Have you tried any of the examples packaged with your CUDA distribution?

      • Mary says:

        Yes
        but this example produce following error:
        1>—— Rebuild All started: Project: MyFirstCUDA, Configuration: Debug Win32 ——
        1>Deleting intermediate and output files for project ‘MyFirstCUDA’, configuration ‘Debug|Win32′
        1>Compiling with CUDA Build Rule…
        1>”C:\CUDA\bin64\nvcc.exe” -arch sm_10 -ccbin “C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin” -Xcompiler “/EHsc /W3 /nologo /O2 /Zi /MT ” -maxrregcount=32 –compile -o “Debug\example1.cu.obj” “d:\Project\GPU\example1.cu”
        1>nvcc fatal : Visual Studio configuration file ‘(null)’ could not be found for installation at ‘C:/Program Files (x86)/Microsoft Visual Studio 9.0/VC/bin/../..’
        1>Compiling manifest to resources…
        1>Microsoft (R) Windows (R) Resource Compiler Version 6.1.6723.1
        1>Copyright (C) Microsoft Corporation. All rights reserved.
        1>Linking…
        1>LINK : fatal error LNK1104: cannot open file ‘.\Debug\example1.cu.obj’
        1>Build log was saved at “file://c:\Users\Rosa\Documents\Visual Studio 2008\Projects\MyFirstCUDA\MyFirstCUDA\Debug\BuildLog.htm”
        1>MyFirstCUDA – 1 error(s), 0 warning(s)
        ========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========

    • Mike says:

      This was the error that was plaguing me for a long time, but I finally fixed it. I think this needs to be done when you’re on a 64 bit OS but compiling in Win32 because Express limits you to that. Anyway, I changed a build rule option as follows…

      Right click on your .cu file(s). Select properties. Under “CUDA Build Rule v3.xxx”, highlight “General”. Toward the bottom there should be a field called “Extra Options”. In this field, type “-m32″ (without the quotation marks). I believe this tells nvcc to use the 32 bit compiler, but I’m not really sure. All I know is it eliminated that error for me.

      Note that after I fixed this, I had to go through and make sure all files were compiled using the same code generation run time library (such as MTd) so everything linked properly. A good thing to do is to open a sample project (such as SobelFilter_vc90 in 2008 or SobelFilter in 2005) in one window, open up your project in a second window, and match up all the properties.

      Hope this helps somebody…. it took me about 24 frustrating hours to compile my own CUDA program!

  40. Pingback: CUDA Gets Easier! « /// Parallel Panorama ///

  41. Pingback: Updating to CUDA 2.3 « /// Parallel Panorama ///

  42. Ankit says:

    Thanks for the gr8 article…It really saved me lot of time. I am stuck in a problem , if I copy cuda kernel in .cpp file it says that the commands are undefined. If I rename the file as .cu then it says following error

    1>NVCC Compiling…
    1>cuda.cu
    1>c1xx : fatal error C1083: Cannot open source file: ‘cuda.cu': No such file or directory
    1>Project : error PRJ0019: A tool returned an error code from “NVCC Compiling…”
    1>Build log was saved at “file://c:\Users\ankit\Documents\Visual Studio 2008\Projects\ankit\ankit\Debug\BuildLog.htm”
    1>ankit – 2 error(s), 0 warning(s)
    ========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

    I m using Visual C++ 2008 express edition.
    Also when I click on Properties>Configuration and enable emulatiion mode and then when I compile I get the following statement “>NOTE: device emulation mode is deprecated in this release
    and will be removed in a future release.”

    I m really stuck with this. Pls Help!! thanks

  43. muni says:

    hi,
    i am a beginner of cuda programming.actually i had GeForce 6150SE nForce 430 on my desktop but it is not cuda enabled.then how can i go ahead without cuda enabled gpu.

  44. Kelson says:

    Hi Sir,

    I’m very new to cuda.

    Thank you for you helpful tutorial.
    I downloaded the files indicated (not the same version) and try to execute the programs.

    The build was ok but when I start the debug (emuDebug, debug, release) I got an exception from VC++ as following:
    First-chance exception at 0x7c812afb in matrixMul.exe: Microsoft C++ exception: cudaError at memory location 0x0012fe20..
    Microsoft C++ exception: cudaError at memory location 0x0012fc24..
    Microsoft C++ exception: cudaError at memory location 0x0012fc80..

    I try to investigate for the solution but could not solve it.
    Your help will be welcome.
    Thank you.

    • MengYu says:

      sir maybe you have use too much of the shared memory which exceed the limit

      low the tile_width and try

  45. Abhishek Dey says:

    Whether programs compiled with -deviceemu mode will utilize CUDA device fully if run on?

    • llpanorama says:

      I doubt it. Emulation mode can be used to debug your CUDA program on the main CPU and that would be thwarted if the GPU were automatically enabled whenever it is present.

    • cirus says:

      CUDA is not used properly when -deviceemu is on. In this mode, multiple threads are simulated by a single CPU thread and hence it makes process very slow.

  46. kavita sharma says:

    if i have to add 1000 numbers then
    how many threads, blocks, warps and threads per block will there be?

  47. Rohit says:

    Sir can u help me to providing cuda programee to find out the workload characterization of cuda application.

  48. Krixt says:

    Thanks for the complete information. It’s give me a lot of help.

  49. sappy says:

    Hey,
    Thank you so much for a nicely explained tutorial. The thing is I hate using VStudios whatnot. So, I was wondering if it is possible to use a MinGW (on Codeblocks or NetBeans or even DevCPP) to compile and run CUDA programs with nvcc. If so, any idea on how I can configure them ?

    Thanks in advance

    PS : Sorry for the repeated post, I accidentally posted in the wrong page.

  50. gkarlis says:

    For people with unresolved externals like:
    error LNK2019: unresolved external symbol cudaFree referenced in function main

    If you are working on an x64 compilation make sure that instead of “C:\Cuda\lib” you have added “C:\Cuda\lib64” in the Additional Library Directories.

  51. Pingback: CUDA dan VS 2008 « notes

  52. PCruz86 says:

    Very helpful article! This cleared all my doubts about starting to use CUDA in VC. I always wanted to use it in my apps. Thanks a lot!

  53. Balaji says:

    Building CUDA programs has been a pain for me using Visual Studio IDE. Your tutorial very clearly explains the steps. Great article ! I will recommend this for anybody starting to use CUDA.

    Thanks.

  54. Jahanzeb says:

    Thank you so much for such an outstanding article….
    It really helped me out to set up and configure my CUDA on VC 2005.

    Thanks a lot once again…

  55. Reply says:

    “C:\CUDA\bin\nvcc.exe” ccbin “$ (VCInstallDir) bin”-c-D_DEBUG-DWIN32-D_CONSOLE-D_MBCS-Xcompiler/EHsc,/W3,/nologo,/Wp64,/Od,/ Zi,/MTD-I “C:\CUDA\include\”-I./-o$ (ConfigurationName)\Example_1_CUDA.obj Example_1_CUDA.cu

    Corect the probléme plz

  56. OMG loved reading your blogpost. I added your feed to my blogreader.

  57. Pingback: Is anyone using CUDA? - Overclock.net - Overclocking.net

  58. theaky says:

    thanks for this article and the comments below. It’s very helpful for me to build my first cuda program. I use X64 machine with window7 and VS2008 compiler.

    I have summarized some solutions when I built my first program with x64 machine.

    To built with x64, plz make sure that:

    properties -> linker -> general -> Additional Library Directories:
    C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\common\lib;C:\CUDA\lib64

    properties -> linker -> input -> Additional Dependencies: cudart.lib cutil64D.lib

    properties -> linker -> Advanced -> Target Machine: MachineX64 (/MACHINE:X64)

    and make a new configuration manager as llpanorama mentioned: “* open the solution explorer, select solution, right click->Configuration Manager.
    * go to ‘Active Solution Platform’, click New.
    * in the ‘New Solution Platform’ dialog that comes up select the new platform x64. Set ‘Copy Settings From’ to ‘Win32′
    * click OK.”

    • Jay says:

      Very helpful!

    • firuze says:

      hi, i’m new in cuda. i have windows 7 64bit. visual studio 2008 professional. i did your instructions. but it can’t run. an error and warning occured.

      LINK : fatal error LNK1181: cannot open input file ‘cutil64D.lib’

      would you please help me?

  59. Pascal says:

    Thanks for this page!!!! Never found such clear explanation, even from NVidia pages or guides.

  60. Soeren says:

    Great article!

    Are you working on other Cuda / OpenCL projects ? I am trying to compile an openFoam solver to OpenCL. If there is more interest, how about a blog/forum for sharing experiences ?

  61. Whitchurch says:

    To solve linker error LNK1181, this is what I did.

    If you get the linker error after following all the steps in this tutorial, do the following:

    * go to Tools -> Options.
    * in the Options window, select Projects and Solutions, then choose VC++ Directories.
    * click on the new line icon, then manually locate and add the directories that contain cudart.lib and cutil32.lib.

    Once those paths are included, the program builds and runs just fine.

  62. Ursil says:

    How can I calculate Lennard-Jones (LJ) forces with CUDA?

  63. maxphil says:

    I just compiled it successfully using SDK 2.3 on Ubuntu Linux 9.04 (jaunty) with emulation:

    nvcc -deviceemu example_1.cpp

  64. Jia says:

    Thanks for such a great tutorial. Very detailed. The comments left by everybody helped me solve a few problems as well. Thanks.

  65. Abhishek Dey says:

    Many, many thanks for this article. It runs smoothly, and it’s the only tutorial on the internet with such a detailed, line-by-line explanation. Looking forward to more such tutorials from you.

    Thanks again.

    By the way, alex: check whether you are using an NVIDIA driver that is compatible with your CUDA version. I’m using 2.3 with the 190.16 beta drivers. The 185.x drivers didn’t work for me.

  66. alex says:

    Using it in Ubuntu. I am compiling with nvcc -o out vekadd.cu and running it with ./out, and the result is as follows (no squares):
    0 0.000000
    1 1.000000
    2 2.000000
    3 3.000000
    4 4.000000
    5 5.000000
    6 6.000000
    7 7.000000
    8 8.000000
    9 9.000000

    What am I doing wrong?

    Greets

    • llpanorama says:

      Alex, this is exactly the result I get if I disable my GPU card. Essentially, the a_h array gets initialized with 0..9 but never gets the squared results because the GPU is not running.

      I don’t know how to enable/disable your card under linux. You might try running the deviceQuery example program to see if it picks up your GPU card. (See my previous blog entry which does this.)

  67. tom says:

    Are you sure your example really runs on the GPU instead of on the CPU?

    I think you are happy too soon.

    Just increase N and/or run it repeatedly, and you will see that the GPU stays cool while the CPU reports load.

    T

    • llpanorama says:

      Yes, I believe this program runs on the GPU and not the CPU. I can disable my NVIDIA 8600 card and the program computes incorrect results when I do so. The correct results are output once the GPU is re-enabled.

      I also compiled the program for CPU-only operation using the -deviceemu option and it computes correct answers regardless of whether the GPU is enabled or disabled.

      This program makes very little use of the GPU, even with large N or repeated use (just a single multiplication for each array element). Most of the work involves moving the data from the PC to the GPU card and back under the direction of the CPU. Therefore, it is not surprising that the GPU stays cool and the CPU shows a large load.

      I could be wrong, but I would need to see more compelling evidence than you have provided that the program is not actually running in the GPU.

    • Vincent says:

      If you increase N, then you increase the number of lines displayed in the command window using printf. That’s where the CPU load happens.

      Put N = 10000 (for example) and comment the line with “printf”, you won’t see CPU load.

  68. Chetan Khaladkar says:

    Hi,
    Thank you for providing me your Gmail ID. Yesterday CUDA was working fine on VC++ 2005, but it has developed a problem.
    When I compile the file, it gives this message:

    1>—— Build started: Project: example1, Configuration: Debug Win32 ——
    1>Performing Custom Build Step
    1>nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified
    1>Build log was saved at “file://c:\Users\chetan\Desktop\example1_emu1\example1_emu\example1\Debug\BuildLog.htm”
    1>example1 – 0 error(s), 0 warning(s)
    ========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========

    and when I build the project, it gives:

    >—— Build started: Project: example1, Configuration: Debug Win32 ——
    1>Performing Custom Build Step
    1>nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified
    1>Linking…
    1>LINK : fatal error LNK1181: cannot open input file ‘.\Debug\example1.obj’
    1>Build log was saved at “file://c:\Users\chetan\Desktop\example1_emu1\example1_emu\example1\Debug\BuildLog.htm”
    1>example1 – 1 error(s), 0 warning(s)
    ========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

    I think there is some problem in the Custom Build Step. I tried putting quotes (“) around every path I specify, but it is still not working. Can you please help me?

    with regards
    chetan

    • llpanorama says:

      I don’t know. Maybe it’s the version of CUDA you are using. (I used 1.1 for my example.)

    • Ahmad says:

      I use the following custom build command string:

      “$(CUDA_BIN_PATH)\nvcc.exe” -ccbin “$(VCInstallDir)bin” -deviceemu -c -D_DEBUG -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Wp64,/Od,/Zi,/MTd -I “$(CUDA_INC_PATH)” -o $(ConfigurationName)\example1.obj example1.cu

      Then remove the #include “stdafx.h” from the example1.cu source file.

      These steps removed the above error in my case.

  69. Chetan Khaladkar says:

    Hi,
    amazing tutorial! I really appreciate your effort. Please do continue this.

    I followed your steps and took your program for the emulator. But when I build, it gives “cannot read input file: cutil32D.lib”. I have checked, and I have both cutil32.lib and cutil32D.lib. I also tried adding the file location manually, but it is not working. Then I deleted the entry under Configuration Properties->Linker->Input, but now I am not able to compile your program, although simple CUDA programs do compile. So what can I do? Please guide me.

    thank you in advance

  70. Sonal says:

    I’m facing the following problem:

    1>—— Build started: Project: hope, Configuration: Emudebug Win32 ——
    1>Linking…
    1>.\Emudebug\stdafx.obj : fatal error LNK1112: module machine type ‘X86’ conflicts with target machine type ‘x64’

    Please help me out!

    • llpanorama says:

      You are compiling for a Windows 32 machine, but you are running on a 64-bit Windows machine. Either move your development to a 32-bit Windows PC, or upgrade your CUDA to a version that supports 64-bit Windows.

  71. Alessandro says:

    I still get an error (WIN XP 64bit, VS2008)

    Error 1 fatal error LNK1181: cannot open input file ‘.\Debug\example1.obj’ CUDA_ex1b CUDA_ex1b

    Do you know what it means?

  72. kkapron says:

    I’ve tried it on linux and it works, just simple: nvcc example.cu

  73. cakrud says:

    I have an error:

    —— Build started: Project: example1, Configuration: Debug Win32 ——
    Performing Custom Build Step
    Project : error PRJ0002 : Error result -1073741510 returned from ‘C:\WINDOWS\system32\cmd.exe’.
    Build log was saved at “file://c:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\projects\example1_emu\example1\Debug\BuildLog.htm”
    example1 – 1 error(s), 0 warning(s)
    ========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

  74. cakrud says:

    I have an error:

    —— Build started: Project: example1, Configuration: Debug Win32 ——
    Performing Custom Build Step
    Project : error PRJ0002 : Error result -1073741510 returned from ‘C:\WINDOWS\system32\cmd.exe’.
    Build log was saved at “file://c:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\projects\example1_emu\example1\Debug\BuildLog.htm”
    example1 – 1 error(s), 0 warning(s)
    ========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

  75. lam says:

    Hi, I am trying to compile your example on Vista 64 and I got the error: “>nvcc fatal : Visual Studio configuration file ‘(null)’ could not be found for installation at ‘c:/Program File”. Do you have any advice?

    • llpanorama says:

      Do a search for CUDA, Vista and 64-bits. I’m sure I’ve seen people talk about running CUDA on 64-bit XP.

    • aldebaran says:

      Here’s a solution for Visual Studio:

      nvcc fatal : Visual Studio configuration file ‘(null)’ could not be found for installation at ‘C:/Program Files (x86)/Microsoft Visual Studio 8/VC/bin

      This is probably because Visual Studio doesn’t install “X64 Compilers and Tools” by default, so you should go to Control Panel, Programs and Features, select visual studio, add or remove features, and select “X64 Compilers and Tools” under Visual C++.

      Note that I think this solution doesn’t work for the Express Editions because they don’t support x64.

      I hope this info helps.

  76. MIA says:

    Hello, thanks a lot for this helpful article.
    I followed your steps one by one, but when I build the project, it stalls at this step:

    1>—— Build started: Project: example1, Configuration: Debug x32 ——
    1>Performing Custom Build Step
    1>example1.cu
    1>tmpxft_00000be0_00000000-3_example1.cudafe1.gpu
    1>tmpxft_00000be0_00000000-8_example1.cudafe2.gpu

    and I don’t see the cause!
    When I build the CUDA program directly from cmd with “nvcc -deviceemu -o exp1 exemple1.cu”, it generates the .exe file and gives the same result. Do you have any ideas about this problem?

  77. llpanorama says:

    Up in the main article, I have added a link to a project zip file with the -deviceemu option.

  78. robert says:

    would it be possible that you post your program as a zip file with the visual studio 2005 set up with emulation and debug?

    perhaps this would be a simple solution for all of us here.

    thanks

  79. robert says:

    Hi;

    I have the same problem as one of your other users. I switched on the emumode as suggested, but I still get
    the output
    1 1.0000
    2 2.0000
    3 3.0000
    etc, etc etc.

    Any suggestions?

  80. Chathuri says:

    My program runs fine in emudebug mode, but when I try to run it on Debug (using the actual GPU, because my goal is to run the program for more than 1,000,000 threads) I end up with following two problems:
    1. I have memcopy from host to device, but when I debug the structure doesn’t get copied.
    cutilSafeCall( cudaMemcpy(layerIni,layer, 1 * sizeof(TissueStruct), cudaMemcpyHostToDevice) );
    2. The kernel execution fails with cudaThreadSynchronize error : unspecified launch failure.
    Can you please help me in this case.
    The device I am using is GeForce GTX 260.

  81. Ho Xung Lenh says:

    I am getting the following message:

    cudaSafeCall() Runtime API error in file , line 59: feature is not yet implemented.

    line 59 of that piece of code says

    cutilSafeCall(cudaGetDeviceProperties(&deviceProp, dev));

  82. Ho Xung Lenh says:

    Thanks for your advice.
    Actually, I do not have a CUDA graphics card in my machine, so I must use the emulator mode. I also tried to follow the steps in your previous post, but I can only do steps 1 and 2. I cannot do step 3, which is about installing the driver. It says that it could not locate any drivers compatible with the current hardware. The DeviceQuery example compiles fine but it does not run. The debugger shows the following information when running:

    ‘deviceQuery.exe’: Loaded ‘C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\bin\win32\Debug\deviceQuery.exe’, Symbols loaded.
    ‘deviceQuery.exe’: Loaded ‘C:\WINDOWS\system32\ntdll.dll’, No symbols loaded.
    ‘deviceQuery.exe’: Loaded ‘C:\WINDOWS\system32\kernel32.dll’, No symbols loaded.
    ‘deviceQuery.exe’: Loaded ‘C:\CUDA\bin\cudart.dll’, Binary was not built with debug information.
    ‘deviceQuery.exe’: Loaded ‘C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\bin\win32\Debug\cutil32D.dll’, No symbols loaded.
    First-chance exception at 0x7c812a5b in deviceQuery.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0012fd2c..
    First-chance exception at 0x7c812a5b in deviceQuery.exe: Microsoft C++ exception: cudaError at memory location 0x0012fd7c..
    The program ‘[464] deviceQuery.exe: Native’ has exited with code 1 (0x1).

    Also note: I use Visual Studio .NET 2005.

  83. llpanorama says:

    Have you downloaded my source files and tried to compile and run those? Have you tried the DeviceQuery example in my previous post about setting-up the CUDA tools?

  84. Ho Xung Lenh says:

    PS: The debug mode shows the following information:

    ‘example2.exe’: Loaded ‘C:\Documents and Settings\Tuan Anh NGUYEN\My Documents\Visual Studio 2005\Projects\example2\debug\example2.exe’, Symbols loaded.
    ‘example2.exe’: Loaded ‘C:\WINDOWS\system32\ntdll.dll’, No symbols loaded.
    ‘example2.exe’: Loaded ‘C:\WINDOWS\system32\kernel32.dll’, No symbols loaded.
    ‘example2.exe’: Loaded ‘C:\CUDA\bin\cudart.dll’, Binary was not built with debug information.
    First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0012fe5c..
    First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError at memory location 0x0012feac..
    First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0012fe54..
    First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError at memory location 0x0012fea4..
    First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0012fe44..
    First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError at memory location 0x0012fe94..
    First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0012fe54..
    First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError at memory location 0x0012fea4..
    First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0012fe60..
    First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError at memory location 0x0012feb0..
    The program ‘[1932] example2.exe: Native’ has exited with code 0 (0x0).

    Please help me in this case.

  85. Ho Xung Lenh says:

    Hi,

    I followed your steps (with the -deviceemu option). It compiled fine but the result is wrong:

    0 0.000000
    1 1.000000
    2 2.000000
    3 3.000000
    4 4.000000
    5 5.000000
    6 6.000000
    7 7.000000
    8 8.000000
    9 9.000000

    The full command is:

    “$(CUDA_BIN_PATH)\nvcc.exe” -ccbin “$(VCInstallDir)bin” -c -D_DEBUG -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Wp64,/Od,/Zi,/MTd -I ” $(CUDA_INC_PATH)” -I./ -o -deviceemu $(ConfigurationName)\example1.obj example1.cu

    Can you help me for this problem ? I use SDK and Toolkit 1.1

    Thanks

  86. llpanorama says:

    Marc:

    I believe the indices in the program all start at zero, so the lowest array index is 0 * 4 + 0 = 0.
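
To make the zero-based indexing concrete, here is a minimal host-side C++ sketch (plain C++, not CUDA, and not part of the original program) that replays the kernel's index formula for every block and thread. The helper names `global_index` and `covered_indices` are hypothetical and exist only for this illustration; the block-count expression is the one used in the article's main routine.

```cpp
#include <cassert>
#include <vector>

// Host-side stand-in for the kernel's index formula:
//   idx = blockIdx.x * blockDim.x + threadIdx.x
// blockIdx.x and threadIdx.x both start at 0.
int global_index(int block_idx, int block_dim, int thread_idx)
{
    return block_idx * block_dim + thread_idx;
}

// Hypothetical helper: lists every array index below n that a launch
// with the given block size would touch, in block/thread order.
std::vector<int> covered_indices(int n, int block_size)
{
    assert(block_size > 0);
    // Same rounding-up block count as in the article's main routine.
    int n_blocks = n / block_size + (n % block_size == 0 ? 0 : 1);
    std::vector<int> out;
    for (int b = 0; b < n_blocks; ++b) {
        for (int t = 0; t < block_size; ++t) {
            int idx = global_index(b, block_size, t);
            if (idx < n) out.push_back(idx);  // mirrors the kernel's idx<N guard
        }
    }
    return out;
}
```

With N = 10 and a block size of 4, `covered_indices(10, 4)` yields the indices 0 through 9: block 0 covers 0 to 3, block 1 covers 4 to 7, and block 2 covers 8 and 9, with its last two threads skipped by the idx < N test.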

  87. Marc says:

    First of all, thank you for an awesome article!

    I’m a bit confused about the inner workings of the kernel function.

    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx<N) a[idx] = a[idx] * a[idx];

    so blockIdx ranges from 1 to 3 (3 blocks, given by n_blocks), and blockDim is 4.

    so 1 * 4 + 1 = 5 is the lowest array index you can get.

    What am I not understanding correctly here?

  88. Asim says:

    Ivan Dj …

    Please use the following: I just removed the quotation marks from this statement and it worked :)

    Configuration Properties → Linker -> General:
    Additional Library Directories = C:\CUDA\lib;C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\common\lib

  89. llpanorama says:

    Ivan:

    I used Visual Studio 2005 and CUDA 1.1. You’re using VS 2008. Go back and use VS 2005 and maybe then the example will work for you. Or find the cudart.lib file on your system and update the linkage paths so it will be found.

  90. Ivan Dj says:

    Hello!
    please help quickly. I have the following problem:
    1>—— Build started: Project: example1, Configuration: Debug Win32 ——
    1>Linking…
    1>LINK : fatal error LNK1181: cannot open input file ‘cudart.lib’
    1>Build log was saved at “file://e:\Software Projects\Visual Studio 2008 projects\C++\CUDA\example1\example1\Debug\BuildLog.htm”
    1>example1 – 1 error(s), 0 warning(s)
    ========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

    What else do I need to do? I did everything as you said in the tutorial.

  91. Jon says:

    Thanks for this! With just a little bit of tweaking, I was able to get this code to work under linux without issue. Matter of fact, it was easier to do, I think.

    In my case, all I had to do was comment out the stdafx.h include, rename the source to a .cu file, then compile it with nvcc. This created an a.out file that worked first time through!

  92. Krishna says:

    it was really useful…
    thanks a lot :)

  93. Song says:

    Many thanks!

  94. Pingback: Desenvolvimendo com CUDA no Ubuntu 8.04 « John Tortugo

  95. llpanorama says:

    I would do the obvious and install CUDA 1.1 and see if the error still occurs. If it does, then there is a problem when you setup the project. Otherwise, there is a problem when using CUDA 2.0.

    • binay kumar pandey says:

      I’m getting the error “a tool returned an error code” from “compiling”, and I’m stuck there. Please help me, and send your suggestion to my email as well.

  96. sky says:

    The tutorial is wonderful. Unfortunately, I got this error from VS2005:
    Error 1 error PRJ0019: A tool returned an error code from “Performing Custom Build Step”
    I don’t know how to figure it out. Could you help me if possible? Thanks a lot. By the way, I used CUDA 2.0. Is there a problem with that version?

  97. N says:

    Thank you, wonderful article.

  98. Thanks for this great tutorial. I used Vista x64 and works very well.

    Thanks a lot.

  99. Robert Evrae says:

    When you are messing around with the properties, it might be advantageous to replace all uses of “example1” with “$(InputName)” (without quotes).

    This means that the project created can easily be reused just by renaming the files involved, and not requiring that you manually fiddle with the properties every time.

    Great guide by the way! I just got bugged having to constantly change those variables, so I went hunting for an alternative.

  100. Josue says:

    Thanks for the help, GREAT TUTORIAL
    The project can be compiled and run. Here’s the result:

    0 0.000000
    1 1.000000
    2 4.000000
    3 9.000000
    4 16.000000
    5 25.000000
    6 36.000000
    7 49.000000
    8 64.000000
    9 81.000000

    :)))))))Thanks for article!!!!!!

  101. llpanorama says:

    Here is some info I found about compiling for 64-bit Windows on the Nvidia forums:

    I ended up getting it to work by following the instructions under “How To Create 64-bit apps” at http://blogs.msdn.com/deeptanshuv/archive/…/11/573795.aspx

    In summary I had to:

    * List cutil64D.lib instead of cutil32D.lib under Project Properties -> Configuration Properties -> Linker -> Input -> Additional Dependencies
    * Change from the MachineX86 to MachineX64 option under Project Properties -> Configuration Properties -> Linker -> Advanced -> Target Machine
    * open the solution explorer, select solution, right click->Configuration Manager.
    * go to ‘Active Solution Platform’, click New.
    * in the ‘New Solution Platform’ dialog that comes up select the new platform x64. Set ‘Copy Settings From’ to ‘Win32’
    * click OK.

    And if I do this before writing a project it seems to build properly.

  102. Josue says:

    OK, my bad. I changed the Active Solution Platform from Win32 to x64. But now there is a different problem:

    1>—— Build started: Project: example1, Configuration: Debug x64 ——
    1>Performing Custom Build Step
    1>example1.cu
    1>tmpxft_00000be0_00000000-3_example1.cudafe1.gpu
    1>tmpxft_00000be0_00000000-8_example1.cudafe2.gpu
    1>tmpxft_00000be0_00000000-3_example1.cudafe1.cpp
    1>tmpxft_00000be0_00000000-12_example1.ii
    1>Linking…
    1>LINK : fatal error LNK1181: cannot open input file ‘cutil32D.lib’
    1>Build log was saved at “file://c:\Documents and Settings\jacevedo\Desktop\example1\example1\example1\x64\Debug\BuildLog.htm”
    1>example1 – 1 error(s), 0 warning(s)
    ========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

    and this library is not in C:\CUDA\lib

    Can somebody give me advice on how to resolve this problem? Thanks.

  103. Josue says:

    Hi, great tutorial.
    I did all the steps,
    but I got a fatal error when I tried to build (compile) it using VS2005 on XP x64.
    This is what it shows:

    1>—— Build started: Project: example1, Configuration: Debug Win32 ——
    1>Compiling…
    1>stdafx.cpp
    1>Linking…
    1>LINK : fatal error LNK1181: cannot open input file ‘cudart.lib’
    1>Build log was saved at “file://c:\Documents and Settings\jacevedo\Desktop\example1\example1\example1\Debug\BuildLog.htm”
    1>example1 – 1 error(s), 0 warning(s)
    ========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

    Can somebody give me advice on how to resolve this problem? Thanks.

  104. llpanorama says:

    The result you are getting is the same thing that happens if I try to run the program in non-emulated mode with my 8600 card disabled. That is similar to trying to run the code on a non-CUDA device such as your Radeon. Are you sure you are running the emulated version of the program?

  105. Zebiloute says:

    Great tutorial.

    I have a question. I am trying to use this cool sample without an NVIDIA GPU (I use an ATI Radeon). It compiles fine (with the -deviceemu option), but when I launch the exe file the result is wrong. See below:

    0 0.000000
    1 1.000000
    2 2.000000
    3 3.000000
    4 4.000000
    5 5.000000
    6 6.000000
    7 7.000000
    8 8.000000
    9 9.000000

    There is no square computing. Thanks.

  106. llpanorama says:

    Marek:

    When you install CUDA, the installer should create all the CUDA… environment variables. Then, in the Visual Studio project, you have to create all the configuration properties so the correct compiler is called as I did in the example shown above.

    Also, I’m not sure if CUDA supports VS2008. I know the version I am using (1.1) doesn’t. That may have changed. Check the Nvidia forums for more information.

  107. Marek says:

    Hi. I have a problem compiling CUDA programs in Visual Studio 2008. It doesn’t know the CUDA… variables, so, for example, it doesn’t find the CUDA compiler. Can somebody give me advice on how to resolve this problem? Thanks.

  108. llpanorama says:

    Peter:

    Use the -deviceemu compiler option as shown in the second-to-last paragraph of this article. This will create an executable that uses the CUDA emulator instead of a graphics card.

  109. Hyunhojo says:

    Thank you

    I made a mistake because I copied the option directly into Visual Studio:

    “$(CUDA_BIN_PATH)\nvcc.exe” -ccbin “$(VCInstallDir)bin” -c -D_DEBUG -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Wp64,/Od,/Zi,/MTd -I”$(CUDA_INC_PATH)” -I./ -o $(ConfigurationName)\example1.obj example1.cu

    but the double quotes in the option above weren’t copied correctly.

  110. peter says:

    I was wondering if you knew how to set up the emulator on XP so I don’t have to buy a new graphics card. I can’t seem to find anything on the internet.

  111. Sumesh says:

    This article is very helpful :)

  112. llpanorama says:

    Satakarni:

    I know that a block of threads will be executed on a single multiprocessor and multiple blocks can be assigned to each multiprocessor. Other than that, I don’t know of any static relation between grid elements and multiprocessors.

    I don’t see anything in the CUDA API that lets you select the number of processors that will be used to run your code. You might ask on the CUDA forum and see if anyone knows of a switch to do this.

  113. Satakarni says:

    As we know, “<<<Dg, Db, Ns>>>” is required for calling a kernel to execute on the GPU (or device), where Dg is the grid size, Db is the block size (the number of threads per block), and the optional Ns is for shared memory allocation.

    However, I would like to know how the number of processors and the grid are related.

    For example, I am using a Tesla C870, which has 16 multiprocessors, each with 8 processors, for a total of 128 processors. I want to test how my program scales by running it on 16, 32, 48, and so on up to 128 processors. How can I achieve this with CUDA programming?

    ( I thought that there must be some relation with grid and/or block size used in the program with the number of processors in the GPU card.)

    Kindly let me know.

    With Regards,
    Satakarni

  114. m ravi kuar says:

    Explanation was excellent, and I found interesting and helpful for my Work. Keep writing my dear friend.

  115. kyzhao says:

    You can download new version of the CUDA wizard for the VS Express.

    url:

    http://forums.nvidia.com/index.php?showtopic=69183

  116. samsam99 says:

    Thank you for the article.

  117. Fatih says:

    Thank you for this article,it is very helpful.

  118. Pingback: Threads and blocks and grids, oh my! « /// Parallel Panorama ///

  119. amput says:

    Thank you very much for this article. It really helped me. Continue writing.

  120. kurt says:

    This is extremely helpful for those of us wanting to start from scratch (which is the only way I can learn anything).
    Great work buddy! Particularly the painstaking details given on how to configure VS for cu, etc.

  121. Anton says:

    Many thanks for the article. It is written very clearly and is easy to read and understand!
    I am very interested in this area, but there is not enough good information! Please do not stop, and write more!
    I would be very glad to see an equally detailed example using MersenneTwister and Monte Carlo!

    Thanks for article!!!!!!
