My first CUDA program!

May 21, 2008 184 Comments

Note: Check out “CUDA Gets Easier” for a simpler way to create CUDA projects in Visual Studio.

I got CUDA set up and running with Visual C++ 2005 Express Edition in my previous post. Now I’ll write my first CUDA program. It’s a modification of an example program from a great series of articles on CUDA by Rob Farber published in Dr. Dobb’s Journal. Rob does his examples in a make-based build environment; I’ll show how to build a CUDA program in the Visual C++ IDE.

Simple CUDA programs have a basic flow:

The host initializes an array with data.
The array is copied from the host to the memory on the CUDA device.
The CUDA device operates on the data in the array.
The array is copied back to the host.

My first CUDA program, shown below, follows this flow. It takes an array and squares each element. I can barely contain my excitement.

// example1.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <stdio.h>
#include <stdlib.h>   // malloc, free
#include <cuda.h>

// Kernel that executes on the CUDA device
__global__ void square_array(float *a, int N)
{
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx<N) a[idx] = a[idx] * a[idx];
}

// main routine that executes on the host
int main(void)
{
  float *a_h, *a_d;  // Pointer to host & device arrays
  const int N = 10;  // Number of elements in arrays
  size_t size = N * sizeof(float);
  a_h = (float *)malloc(size);        // Allocate array on host
  cudaMalloc((void **) &a_d, size);   // Allocate array on device
  // Initialize host array and copy it to CUDA device
  for (int i=0; i<N; i++) a_h[i] = (float)i;
  cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);
  // Do calculation on device:
  int block_size = 4;
  int n_blocks = N/block_size + (N%block_size == 0 ? 0:1);
  square_array <<< n_blocks, block_size >>> (a_d, N);
  // Retrieve result from device and store it in host array
  cudaMemcpy(a_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
  // Print results
  for (int i=0; i<N; i++) printf("%d %f\n", i, a_h[i]);
  // Cleanup
  free(a_h); cudaFree(a_d);
}

Two pointers are declared on line 19 of the listing (line numbers count from the first line of the file): a_h points to the array that is stored on the host, while a_d points to the array on the CUDA device. The a_h array is allocated in the host memory on line 22 using the standard malloc subroutine, but a_d is allocated in the CUDA device memory using the cudaMalloc subroutine found in the CUDA API (line 23). (Note that a pointer to the a_d pointer is passed to cudaMalloc so it can store the address of the array in a_d.)

In order to create some values to operate upon, each element in the host array is initialized with its array index (line 25). Then the cudaMemcpy subroutine is used to copy a_h from the host into a_d on the CUDA device. (The cudaMemcpyHostToDevice flag, defined in the API, indicates the direction of the transfer.)

In lines 28-30, the host initiates the execution of the kernel function, square_array, on the CUDA device. A CUDA device contains individual processing elements, each of which can execute a thread. Threads are grouped together to form a block, and a number of blocks constitutes a grid. In this example, the number of threads per block is set to four (line 28). Then the total number of blocks needed to get enough threads to square each array element is calculated on line 29. (For ten array elements, three blocks of four threads each are needed, which gives twelve threads; the last two have no element to work on.) On line 30, the host launches the kernel function on the CUDA device. The number of blocks and the number of threads in each block are given between the <<<…>>> following the kernel name. (This information is picked up by the NVIDIA compiler, nvcc, and is used when generating the instructions that start the kernel on the CUDA device. More on nvcc later.) Following that, the standard argument list to square_array contains a pointer to the array in the CUDA device memory and the number of elements in the array.

The kernel is shown on lines 10-14. The __global__ keyword indicates that this is a kernel function that should be processed by nvcc to create machine code that executes on the CUDA device, not the host. In this example, each thread will execute the same kernel function and will operate upon only a single array element. Each thread is distinguished from all the others by block and thread indices that can be used to determine the array element the thread will access. On line 12, the array index is found by multiplying the thread’s block index (blockIdx.x) by the number of threads in each block (blockDim.x) and then adding the index of the thread within the block (threadIdx.x). If the index is within the bounds of the array, then the corresponding array element is squared (line 13).

Immediately after starting the kernel, the host begins a transfer of the data from the array in the CUDA device memory back to the array in the host memory (line 32). This transfer is delayed until the CUDA device has finished executing the kernel, so there is no chance of getting data that has not been processed yet. Then the host displays the contents of the array (line 34) and frees the array memory on both itself and the CUDA device (line 36).

At this point, I have a CUDA-enabled program, but I don’t have it integrated into a Visual C++ project. It actually takes a bit of work to do that. To start, I brought up the Visual C++ 2005 Express Edition IDE and clicked on the New Project button (you can also use File→New→Project… from the menu). In the New Project window, I selected Win32 as the project type and Win32 Console Application as the template. I gave the project the creative name of example1 and set its location to the C:\llpanorama\CUDA\examples directory. After clicking OK in the New Project window, and then clicking Finish in the Win32 Application Wizard window, a window opened with a simple code skeleton. I replaced the code skeleton with the code shown above.

After saving the code, I right-clicked the example1.cpp file, selected Rename from the drop-down menu and renamed the file to example1.cu. Files with the .cu extension are intended to be processed by nvcc. nvcc will extract the kernel portion of example1.cu and compile it for execution on the CUDA device while using the Visual C++ compiler to compile the remainder of the file for execution on the host.

In its default configuration, Visual C++ doesn’t know how to compile .cu files. It has to be told explicitly how to do this using a Custom Build Step. This is done by right-clicking on the example1.cu file and selecting Properties from the drop-down menu. In the Property Pages window that appears, set the Custom Build Step command line as follows:

Configuration Properties → Custom Build Step → General:
Command Line =
"$(CUDA_BIN_PATH)\nvcc.exe" -ccbin "$(VCInstallDir)bin" -c -D_DEBUG -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Wp64,/Od,/Zi,/MTd -I"$(CUDA_INC_PATH)" -I./ -o $(ConfigurationName)\example1.obj example1.cu

What does this command line do? Let’s break it down piece-by-piece:

"$(CUDA_BIN_PATH)\nvcc.exe": The location of the nvcc compiler.

-ccbin "$(VCInstallDir)bin": The location of the Visual C++ compiler.

-c: Compile the source file to an object file (.obj extension) and stop, without linking.

-D_DEBUG -DWIN32 -D_CONSOLE -D_MBCS: Various macro definitions.

-Xcompiler /EHsc,/W3,/nologo,/Wp64,/Od,/Zi,/MTd: Various options that are passed by nvcc directly to the Visual C++ compiler.

-I"$(CUDA_INC_PATH)": Look in the CUDA include directories for needed header files.

-I./: Look in the current directory for needed header files.

-o $(ConfigurationName)\example1.obj: The location and name of the resulting object file.

example1.cu: The source file that the compiler will work on.

In addition to setting the command line for the example1.cu file, the location of the output file is specified as follows:

Configuration Properties → Custom Build Step → General:
Outputs = $(ConfigurationName)\example1.obj

After setting the file properties, the properties for the example1 project have to be modified. Here are the project property settings I used for the Debug configuration:

Configuration Properties → C/C++ → General:
Additional Include Directories = $(CUDA_INC_PATH);"C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\common\inc"

Configuration Properties → C/C++ → General:
Debug Information Format = Program Database (/Zi)

Configuration Properties → C/C++ → Code Generation:
Runtime Library = Multi-threaded Debug (/MTd)

Configuration Properties → Linker → General:
Enable incremental linking = No (/INCREMENTAL:NO)

Configuration Properties → Linker → General:
Additional Library Directories = "C:\CUDA\lib";"C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\common\lib"

Configuration Properties → Linker → Input:
Additional Dependencies = cudart.lib cutil32D.lib

Configuration Properties → Linker → Optimization:
Enable COMDAT folding = Do Not Remove Redundant COMDATs (/OPT:NOICF)

Now the project can be compiled and run. Here’s the result:

0 0.000000
1 1.000000
2 4.000000
3 9.000000
4 16.000000
5 25.000000
6 36.000000
7 49.000000
8 64.000000
9 81.000000

I told you it was exciting! Well, at least it’s right.

In order to compile the Release configuration, a few changes need to be made to the file and project properties. For the example1.cu file, the Custom Build Step command line has to be changed to remove the _DEBUG macro definition, enable compiler optimization, and link with the Release runtime library:

Configuration Properties → Custom Build Step → General:
Command Line =
"$(CUDA_BIN_PATH)\nvcc.exe" -ccbin "$(VCInstallDir)bin" -c -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Wp64,/O2,/Zi,/MT -I"$(CUDA_INC_PATH)" -I./ -o $(ConfigurationName)\example1.obj example1.cu

The project properties that have to be changed in the Release configuration are the linking for the runtime library and the use of the non-debug version of the CUDA utilities library:

Configuration Properties → C/C++ → Code Generation:
Runtime Library = Multi-threaded (/MT)

Configuration Properties → Linker → Input:
Additional Dependencies = cudart.lib cutil32.lib

Once those changes are made, the Release version of the example1 project can be compiled and run. It will output the same exciting result.

Here’s the source code for this example if you want to try it.

Don’t have a CUDA-capable GPU board in your PC but still want to try running this program? Easy! Just add the following option to the Custom Build Step command line: -deviceemu. This will link in a CUDA device emulator that runs on the host. The emulator becomes the target for all the CUDA API calls and executes the kernel. The program will run as if a CUDA device were present, only slower. (Here is the project file with the -deviceemu option.)

So I’ve written my first CUDA program and gotten it to compile using Visual C++ 2005 Express Edition. Setting up the compilation options was as much (more?) work as writing the program, so you might be interested in a CUDA template for Visual C++ 2005 written by kyzhao. The installer doesn’t work for me (maybe because I’m using the free Express Edition), but it might help you.


184 Responses to My first CUDA program!

aswathy says:

June 26, 2018 at 12:31 am

fatal error: stdafx.h: No such file or directory
#include "stdafx.h"

Michael says:

March 16, 2017 at 3:58 pm

Very very good intro to CUDA. Your step by step instructions are so clear I could picture everything as I was reading.You will be a very good author of a book on this. If you have the desire,don’t look back.
I am getting interested in CUDA because I am a great fun of writing high performance applications. I have made so many sql server applications run fast inside sql server and now I feel the I need to step outside sql server. I also feel I have hit the roof with multithread applications in C#.Something more challenging. Major hurdle is C++.!!

Gianfranco Levati says:

November 11, 2016 at 3:38 am

Kudos!

Rory Bievenue says:

August 14, 2016 at 1:50 pm

Nice!

Kazuko says:

October 6, 2014 at 1:21 pm

Spot on with this write-up, I really believe this web site needs much more attention. I’ll probably be
back again to see more, thanks for the advice!

Carmen Peralta says:

September 17, 2014 at 10:29 pm

Hello again. I tried to get the source program using firefox . It did not work¡¡¡ Please.

Carmen Peralta says:

September 16, 2014 at 6:26 pm

Hi, ¿can i get source code for this example? The link is no longer.

Tyree says:

September 6, 2014 at 6:15 pm

Hey there! Do you know if they make any plugins to safeguard
against hackers? I’m kinda paranoid about losing everything I’ve worked hard on. Any tips?

Nell says:

August 19, 2013 at 8:56 am

Generally I do not read post on blogs, however I wish to say that this write-up very forced me to try and
do it! Your writing taste has been amazed me. Thank you,
very great article.

Axl Hazarika Hum Badal Gaye says:

July 29, 2013 at 12:00 pm

I’m gone to inform my little brother, that he should also visit this blog on regular basis to get updated from latest information.

Hakan says:

July 27, 2013 at 5:38 am

Thank you for article. It is written very clear

tassameh says:

May 24, 2012 at 3:57 pm

HI every one:
i am new in CUDA programming i need the help for my graduated application
thanks

csb says:

April 10, 2012 at 6:03 am

i m having cuda toolkit 2.2 ,windows 7, visual studio 8
so what changes do i have to do?

Vijay says:

April 9, 2012 at 1:01 pm

Hi. Many many thanks for the article. I have been experimenting with CUDA on VS for while now. I have had to reformat my system many times in the past 2 years and everytime I have to set up VStudio and CUDA, I have this page saved in my bookmarks. I just wanted to add a little more to this info. Recently Microsoft launched VStudio 2011 beta. My project involves using managed windows and cuda. While Vstudio 2010 does not have intellisense for managed systems, Vstudio 2011 beta is not supported by Nvidia yet. So what I did was to set up my project using VS 2010 and then copied the contents of ‘C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations’ into ‘C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V110\BuildCustomizations’. Now I can open my VS 2010 projects in VS 2011 and enjoy the comfort of intellisense. I have not invented any new kinda rocket here but thought will help somebody on the way.

NL says:

March 28, 2012 at 8:16 am

Hi there,

thanks for the posting. It really helps. But I thought since
I was new to cuda and visual studio and I manage to run the above program using
Visual studio prof 2010 ( I think express will work as well)
Cuda 4.1 without the Nsight parallel, it may helps the other users of CUDA and VS 2010.

once you have created the example1.cpp as described at the beginning of this blog.
Right click example1 (the project on your left under Solution explorer)
Build customization and check Cuda 4.1

Then right click example1.cpp (C++ file now) ->rename example1.cu
right click again example1.cu properties->general ->

item properties change to Cuda C/C++, click on apply.

Expand Cuda C\C++ (on the left under general) , click common ,
in additional directories put
./
../../common/inc
../../../shared/inc

Now go back to example1 project
right click the project expand configuration properties, in custom build step-change outputs to
$(Configuration Name)\example1.obj
right click the project expand configuration properties, in VC++ directories-add to include directories
the pathe where the inc are , for me it is

$(Configuration Name)\example1.obj

M says:

March 16, 2012 at 5:49 am

Hi,

I am trying to compile my program via nvcc by command line.I tried different ways to use cygwin or dos,I only managed to create the .obj file.It doesn’t create the .exe.I don’t know how to link them.I am doing this by using visual studio as well.

Could you please help me .

Amitabh Modak says:

November 10, 2011 at 12:02 am

Thanks for the article. Very helpful and clear.

vinayaka says:

October 15, 2011 at 8:12 am

i want to use juda in emulation how can i do that, can any one help me

thank you

N says:

October 11, 2011 at 10:43 pm

I am working on non GPU based machine

with VC++ 2005 Express edition
CUDA 1.1 toolkit
CUDA 1.1 SDK

It compile successfully, but gives following o/p when I run the solution.
Please help

'deviceQuery.exe': Loaded 'C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\bin\win32\Debug\deviceQuery.exe', Symbols loaded.
'deviceQuery.exe': Loaded 'C:\Windows\System32\ntdll.dll', No symbols loaded.
'deviceQuery.exe': Loaded 'C:\Windows\System32\kernel32.dll', No symbols loaded.
'deviceQuery.exe': Loaded 'C:\Program Files\AVAST Software\Avast\snxhk.dll', No symbols loaded.
'deviceQuery.exe': Loaded 'C:\CUDA\bin\cudart.dll', Binary was not built with debug information.
'deviceQuery.exe': Loaded 'C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\bin\win32\Debug\cutil32D.dll', No symbols loaded.
First-chance exception at 0x7760fc56 in deviceQuery.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0012fcf0..
First-chance exception at 0x7760fc56 in deviceQuery.exe: Microsoft C++ exception: cudaError at memory location 0x0012fd40..
The program '[5372] deviceQuery.exe: Native' has exited with code 1 (0x1).

- N says:
  
  October 12, 2011 at 5:38 pm
  
  I am not seeing CUDA C/C++ itemtype to select in cudafile property pages
  
  I have either C/C++ or I have CUDA as item type.
  
  I have included nvcc.rules in custom build rules.
  
  Please help !!!
  
  Thanks,
  N
  
Nick says:

August 29, 2011 at 2:32 am

It’s working perfectly. Thank you

gjhgjhg says:

July 27, 2011 at 6:14 pm

nice one! those settings completely ruined every other c++ project! thanks

- dave_vandenbout says:
  
  July 27, 2011 at 8:18 pm
  
  You’re welcome! That’s me – always trying to help!
  
  Your other c++ projects probably sucked, anyway. You’re better off without them.
  
karthikselvakumar says:

July 21, 2011 at 9:08 am

Can any one say me how to start from python orange library in CUDA ?

Guillermo says:

February 17, 2011 at 5:21 pm

I got:
0 0.000000
1 1.000000
2 2.000000
3 3.000000
4 16.000000
5 5.000000
6 6.000000
7 7.000000
8 64.000000
9 9.000000
but i don´t understand.. any idea please? (i’m beginner, sorry)

- Guillermo says:
  
  February 17, 2011 at 10:30 pm
  
  Sorry, I was wrong to write the array index calculation in the kernel f’n
  
  - Guillermo says:
    
    February 17, 2011 at 10:40 pm
    
    We have 3 blocks of 4 threads each. Let B(x,y) a block, then we have B(0,0), B(1,0), B(2,0) and if let Ti(x,y) a thread of Bi then for B(0,0) we have T(0,0), T(1,0), T(2,0), T(3,0) .. and so on. Note that we have 12 threads where we can calculate the square of up to 12 array’s elements. For that reason we must apply the (idx <N) condition in the kernel function. e.g. the 5th array element is one of idx = 1 (second block) * 4 (number of threads per block) + 1 (second thread of the block).
Moise Munderere says:

January 31, 2011 at 12:17 am

I am biginner on Visual studio , can ony one tell why I can not run a program , I always see that message ” fatal error LNK1104: cannot open file ‘.\Debug\example1.obj”

- sumit says:
  
  January 31, 2011 at 1:09 am
  
  create a new project under win 32 console application.i had the same problem, but v solved it. give me ur email id, i can mail u the whole procedure
  
  - Zhiyi says:
    
    March 20, 2011 at 8:21 am
    
    Hi, I have the same problem. I tried to solve the problem several days, however I could not find the solution. Could you mail me the procedure! Thanks a lot! my E-mail address is wzymail@vip.qq.com
    
    Hoping your reply!
    
    best wishes!
  - tiru says:
    
    August 1, 2011 at 1:05 am
    
    my email id : manovancha@gmail.com
    please send me whole procedure how to compile and run the above program using visual c++ IDE.
  - Bolor says:
    
    September 14, 2011 at 3:05 am
    
    my email ID: enkhbolora@gmail.com
    please send me too whole procedure from how to create new project in visual studio 2005.
  - Akhtari Zameel says:
    
    December 17, 2017 at 12:34 pm
    
    I am new in cuda. please send me whole procedure how to compile and run the above program.my email id : akhtari09.aj@gmail.com
Max says:

January 11, 2011 at 6:15 am

Very comprehensive article. Thank you very much

Kay says:

January 7, 2011 at 2:13 am

I am just starting with CUDA. I have a GeForce 6150SE nForce 430 on my desktop. All the samples in the toolkit say min req is GeForce 8. Although, everything seems to be installed well for 64 bit.

Here is my problem:
I opened bandwidthTest_vc90.sln in Visual c++2008. It needs cutil32D.lib and shrUtils32D.lib.
I do not have these files anywhere on my computer. I do have cutil32.lib, cUtil64.lib and cutil64D.lib.
Is there anything that I should change on the properties in VC project, that would help me compile?

pusHeibimesit says:

January 6, 2011 at 7:38 am

It’s a pity I can’t comment right now; I’m running late for a meeting. But once I’m free, I’ll definitely write what I think.

karthikselvakumar says:

November 21, 2010 at 5:15 am

I had started programming in CUDA but dont know how to decompose RANDOM FOREST algorithm in CUDA so please help me .

Looshu says:

November 12, 2010 at 2:19 pm

Can you please post the CUDA code for dijkstra’s algorithm..!
Pleashe saar.

Rashmi says:

October 16, 2010 at 7:12 am

can anyone tell me the procedure to run cuda programs on the emulator mode??

- llpanorama says:
  
  October 16, 2010 at 7:53 am
  
  The -deviceemu option isn’t used any more. Now you have to link against a CUDA library that implements the emulator. This is described in the “CUDA Gets Easier” entry of this blog (although I’m not sure this really is easier).
  
trap15 says:

September 20, 2010 at 1:32 am

First off, I have to say this was fairly helpful, and helped me get started.
Secondly, you write some fuck-ugly code 🙂

- llpanorama says:
  
  September 20, 2010 at 3:48 pm
  
  Thanks! I’ll let you know when I start my “Social Etiquette for Coders” blog.
  
  - sumit says:
    
    September 21, 2010 at 4:57 am
    
    somebody plz tel me, how do we calculate optimization vs cpu? r there any cuda instructions for it?
sumit says:

September 10, 2010 at 5:07 am

somebody plz tel me, how do we calculate optimization vs cpu? r there any cuda instructions for it?

- llpanorama says:
  
  September 21, 2010 at 8:32 pm
  
  Time the operation on your CPU. Then time it using the GPU. The ratio of the times is your speedup.
  
Avers says:

August 28, 2010 at 9:37 pm

I have a question for you right away: has anyone actually used this in real life? Are there any results? Otherwise there’s no telling how it will all end; I’d like to hear from someone with hands-on experience.

raju says:

August 23, 2010 at 9:12 am

Nice post.
i am getting the error
LINK : fatal error LNK1181: cannot open input file ‘cutil32D.lib’
then i just deleted that file from the linker and now i am getting error
LINK : fatal error LNK1181: cannot open input file .obj

can any plz solve my problem
i have seen all the comments many programmers got the problem like this it may help many oh them plz sort it out.
Thak you.

- quest says:
  
  August 30, 2010 at 11:18 am
  
  Did you installed CUDA toolkit & CUDA SDK properly? CUDA compatible display driver?
  
- sumit says:
  
  September 10, 2010 at 5:13 am
  
  1>the path for .lib file has to be given. and it should be short. so save both .lib files mentioned above on c drive. and then paste the address in additional dependencies col.
  2> .obj file is formed after successful build of code. so look upon the project properties steps once again.
  
Sree says:

August 13, 2010 at 6:25 am

HI,

I am trying to invoke the asyncAPI from C# in VS2008. My program asyncAPI.cu is compiling without issues in VS2008 C++ and I have exported the dll.

But when I invoke the function in C# exe, it is throwing an exception that it is not finding the entrypoint.

Any example code of invoking a CUDA source from C# would be very helpful.

Anwarul says:

July 29, 2010 at 7:44 am

i m very beginning to CUDA programming : i have CUDA toolkit 3.1 for 64 bit and windows 7 64-bit . and i m trying run a CUDA program on visual studio 2008. and i m getting following error :

1>------ Rebuild All started: Project: newCuda, Configuration: Debug Win32 ------
1>Deleting intermediate and output files for project 'newCuda', configuration 'Debug|Win32'
1>Compiling with CUDA Build Rule...
1>"C:\CUDA\bin64\nvcc.exe" -arch sm_10 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT " -maxrregcount=32 --compile -o "Debug\cuda.cu.obj" cuda.cu
1>nvcc fatal : Visual Studio configuration file '(null)' could not be found for installation at 'C:/Program Files (x86)/Microsoft Visual Studio 9.0/VC/bin/../..'
1>Linking...
1>LINK : fatal error LNK1104: cannot open file '.\Debug\cuda.cu.obj'
1>Build log was saved at "file://c:\Users\Anwarul\Desktop\newCuda\newCuda\Debug\BuildLog.htm"
1>newCuda - 1 error(s), 0 warning(s)
========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========

anybody please help , i would really appreciate you help . thanks in advance

- llpanorama says:
  
  July 30, 2010 at 6:56 am
  
  Have you tried any of the examples packaged with your CUDA distribution?
  
  - Mary says:
    
    August 20, 2010 at 7:09 am
    
    Yes
    but this example produce following error:
    1>------ Rebuild All started: Project: MyFirstCUDA, Configuration: Debug Win32 ------
    1>Deleting intermediate and output files for project 'MyFirstCUDA', configuration 'Debug|Win32'
    1>Compiling with CUDA Build Rule...
    1>"C:\CUDA\bin64\nvcc.exe" -arch sm_10 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin" -Xcompiler "/EHsc /W3 /nologo /O2 /Zi /MT " -maxrregcount=32 --compile -o "Debug\example1.cu.obj" "d:\Project\GPU\example1.cu"
    1>nvcc fatal : Visual Studio configuration file '(null)' could not be found for installation at 'C:/Program Files (x86)/Microsoft Visual Studio 9.0/VC/bin/../..'
    1>Compiling manifest to resources...
    1>Microsoft (R) Windows (R) Resource Compiler Version 6.1.6723.1
    1>Copyright (C) Microsoft Corporation. All rights reserved.
    1>Linking...
    1>LINK : fatal error LNK1104: cannot open file '.\Debug\example1.cu.obj'
    1>Build log was saved at "file://c:\Users\Rosa\Documents\Visual Studio 2008\Projects\MyFirstCUDA\MyFirstCUDA\Debug\BuildLog.htm"
    1>MyFirstCUDA - 1 error(s), 0 warning(s)
    ========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========
- Mike says:
  
  August 30, 2010 at 11:04 pm
  
  This was the error that was plaguing me for a long time, but I finally fixed it. I think this needs to be done when you’re on a 64 bit OS but compiling in Win32 because Express limits you to that. Anyway, I changed a build rule option as follows…
  
  Right click on your .cu file(s). Select properties. Under “CUDA Build Rule v3.xxx”, highlight “General”. Toward the bottom there should be a field called “Extra Options”. In this field, type “-m32” (without the quotation marks). I believe this tells nvcc to use the 32 bit compiler, but I’m not really sure. All I know is it eliminated that error for me.
  
  Note that after I fixed this, I had to go through and make sure all files were compiled using the same code generation run time library (such as MTd) so everything linked properly. A good thing to do is to open a sample project (such as SobelFilter_vc90 in 2008 or SobelFilter in 2005) in one window, open up your project in a second window, and match up all the properties.
  
  Hope this helps somebody…. it took me about 24 frustrating hours to compile my own CUDA program!
  
Ankit says:

June 17, 2010 at 1:59 am

Thanks for the great article. It really saved me a lot of time. I am stuck on a problem: if I put the CUDA kernel in a .cpp file, the compiler says the commands are undefined. If I rename the file to .cu, I get the following error:

1>NVCC Compiling…
1>cuda.cu
1>c1xx : fatal error C1083: Cannot open source file: ‘cuda.cu’: No such file or directory
1>Project : error PRJ0019: A tool returned an error code from “NVCC Compiling…”
1>Build log was saved at “file://c:\Users\ankit\Documents\Visual Studio 2008\Projects\ankit\ankit\Debug\BuildLog.htm”
1>ankit – 2 error(s), 0 warning(s)
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

I’m using Visual C++ 2008 Express Edition.
Also, when I go to Properties > Configuration, enable emulation mode, and compile, I get the following message: “NOTE: device emulation mode is deprecated in this release and will be removed in a future release.”

I’m really stuck on this. Please help! Thanks.

muni says:

June 6, 2010 at 2:25 am

Hi,
I am a beginner at CUDA programming. I have a GeForce 6150SE nForce 430 in my desktop, but it is not CUDA-enabled. How can I go ahead without a CUDA-enabled GPU?

- llpanorama says:
  
  June 6, 2010 at 10:00 am
  
  This post actually links to a project file that uses the -deviceemu option to run on a non-GPU machine. Why not use that?
  
  Here’s the link again: ftp://ftp.drivehq.com/llpanorama/CUDA/example1_emu.zip
  
Kelson says:

June 3, 2010 at 9:44 pm

Hi Sir,

I’m very new to cuda.

Thank you for your helpful tutorial.
I downloaded the files indicated (not the same version) and tried to run the programs.

The build was OK, but when I start debugging (EmuDebug, Debug, Release) I get an exception from VC++ as follows:
First-chance exception at 0x7c812afb in matrixMul.exe: Microsoft C++ exception: cudaError at memory location 0x0012fe20..
Microsoft C++ exception: cudaError at memory location 0x0012fc24..
Microsoft C++ exception: cudaError at memory location 0x0012fc80..
…
I tried to find a solution but could not solve it.
Your help would be welcome.
Thank you.

- MengYu says:
  
  June 1, 2011 at 11:44 pm
  
  Maybe you are using more shared memory than the limit allows.
  
  Lower the tile_width and try again.
  
Abhishek Dey says:

April 22, 2010 at 11:39 am

Will programs compiled in -deviceemu mode fully utilize a CUDA device if they are run on a machine that has one?

- llpanorama says:
  
  April 22, 2010 at 1:11 pm
  
  I doubt it. Emulation mode can be used to debug your CUDA program on the main CPU and that would be thwarted if the GPU were automatically enabled whenever it is present.
  
- cirus says:
  
  May 28, 2010 at 12:33 pm
  
  CUDA is not used properly when -deviceemu is on. In this mode, multiple threads are simulated by a single CPU thread, which makes the process very slow.
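  
  As a mental model only (this is not how the emulator is actually implemented), running the kernel on the CPU amounts to visiting every block and thread in a loop. A host-only C++ sketch, where `square_array_on_host` is a name made up for this illustration:
  
  ```cpp
  #include <cassert>
  #include <vector>

  // Hypothetical serial stand-in for the square_array kernel: every
  // (block, thread) pair is visited one after another on the CPU.
  void square_array_on_host(std::vector<float>& a, int block_size)
  {
    assert(block_size > 0);
    int n = static_cast<int>(a.size());
    // Round up so a partially filled last block is still visited.
    int n_blocks = n / block_size + (n % block_size == 0 ? 0 : 1);
    for (int block = 0; block < n_blocks; block++) {
      for (int thread = 0; thread < block_size; thread++) {
        int idx = block * block_size + thread;
        if (idx < n) a[idx] = a[idx] * a[idx];  // same guard as the kernel
      }
    }
  }
  ```
  
  The loop performs the N multiplications one at a time, where the GPU would run many of them at once.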
  
kavita sharma says:

March 29, 2010 at 12:44 pm

If I have to add 1000 numbers, then
how many threads, blocks, and warps will there be, and how many threads per block?

Rohit says:

March 28, 2010 at 9:15 am

Sir, can you help me by providing a CUDA program to find the workload characterization of a CUDA application?

Krixt says:

March 9, 2010 at 6:45 am

Thanks for the complete information. It gave me a lot of help.

sappy says:

March 7, 2010 at 12:59 am

Hey,
Thank you so much for a nicely explained tutorial. The thing is I hate using VStudios whatnot. So, I was wondering if it is possible to use a MinGW (on Codeblocks or NetBeans or even DevCPP) to compile and run CUDA programs with nvcc. If so, any idea on how I can configure them ?

Thanks in advance

PS : Sorry for the repeated post, I accidentally posted in the wrong page.

gkarlis says:

February 27, 2010 at 3:27 pm

For people with unresolved externals like:
error LNK2019: unresolved external symbol cudaFree referenced in function main

If you are building for x64, make sure that you have added “C:\Cuda\lib64” instead of “C:\Cuda\lib” to the Additional Library Directories.

PCruz86 says:

January 21, 2010 at 9:12 am

Very helpful article! This cleared all my doubts about starting to use CUDA in VC. I always wanted to use it in my apps. Thanks a lot!

Balaji says:

January 7, 2010 at 2:01 am

Building CUDA programs has been a pain for me using Visual Studio IDE. Your tutorial very clearly explains the steps. Great article ! I will recommend this for anybody starting to use CUDA.

Thanks.

Jahanzeb says:

December 9, 2009 at 11:55 am

Thank you so much for such an outstanding article….
It really helped me out to set up and configure my CUDA on VC 2005.

Thanks a lot once again…

Reply says:

November 26, 2009 at 9:19 pm

“C:\CUDA\bin\nvcc.exe” ccbin “$ (VCInstallDir) bin”-c-D_DEBUG-DWIN32-D_CONSOLE-D_MBCS-Xcompiler/EHsc,/W3,/nologo,/Wp64,/Od,/ Zi,/MTD-I “C:\CUDA\include\”-I./-o$ (ConfigurationName)\Example_1_CUDA.obj Example_1_CUDA.cu

Please correct the problem.

Sweawlsishele says:

November 24, 2009 at 5:24 pm

OMG loved reading your blogpost. I added your feed to my blogreader.

theaky says:

November 14, 2009 at 10:11 am

Thanks for this article and the comments below. They were very helpful for building my first CUDA program. I use an x64 machine with Windows 7 and the VS2008 compiler.

I have summarized some solutions from building my first program on an x64 machine.

To build for x64, please make sure that:

properties -> linker -> general -> Additional Library Directories:
C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\common\lib;C:\CUDA\lib64

properties -> linker -> input -> Additional Dependencies: cudart.lib cutil64D.lib

properties -> linker -> Advanced -> Target Machine: MachineX64 (/MACHINE:X64)

and make a new configuration manager as llpanorama mentioned: “* open the solution explorer, select solution, right click->Configuration Manager.
* go to ‘Active Solution Platform’, click New.
* in the ‘New Solution Platform’ dialog that comes up select the new platform x64. Set ‘Copy Settings From’ to ‘Win32’
* click OK.”

- Jay says:
  
  January 20, 2010 at 11:50 am
  
  Very helpful!
  
- firuze says:
  
  July 18, 2011 at 2:38 pm
  
  Hi, I’m new to CUDA. I have Windows 7 64-bit and Visual Studio 2008 Professional. I followed your instructions, but it won’t run. An error and a warning occurred.
  
  LINK : fatal error LNK1181: cannot open input file ‘cutil64D.lib’
  
  would you please help me?
  
Pascal says:

October 14, 2009 at 8:47 am

Thanks for this page!!!! Never found such clear explanation, even from NVidia pages or guides.

Soeren says:

October 12, 2009 at 3:14 am

Great article!

Are you working on other Cuda / OpenCL projects ? I am trying to compile an openFoam solver to OpenCL. If there is more interest, how about a blog/forum for sharing experiences ?

Whitchurch says:

September 13, 2009 at 6:47 pm

To solve linker error LNK1181, this is what I did.

After following all the steps in this tutorial, if you get the linker error, do the following:

Go to Tools -> Options.
In the Options window, select Projects and Solutions, then choose VC++ Directories.
Click the new line icon, then manually locate and add the paths for cudart.lib and cutil32.lib.

Once those paths are included, the program builds and runs just fine.

Ursil says:

September 7, 2009 at 10:20 am

How can I calculate the Lennard-Jones (LJ) forces with CUDA?

maxphil says:

August 31, 2009 at 8:26 am

I just compiled it successfully using SDK 2.3 on Ubuntu Linux 9.04 (jaunty) with emulation:

nvcc -deviceemu example_1.cpp

Jia says:

August 1, 2009 at 2:51 pm

Thanks for such a great tutorial. very detailed. The comments left by everybody helped me solved a few problems as well. Thanks.

- Jia says:
  
  August 1, 2009 at 2:53 pm
  
  Great to see so many people developing on cuda. I would like to make some contacts with any of you. If you are interested, my e-mail is jam0cam@yahoo.com
  
Abhishek Dey says:

July 29, 2009 at 5:49 am

Many, many thanks for this article. It runs smoothly, and it’s the only tutorial on the internet with such a detailed line-by-line explanation. Sitting tight for more such tutorials from you.

Thanks again.

By the way, Alex: just check whether you are using an NVIDIA driver that is compatible with your CUDA version. I’m using 2.3 with the 190.16 beta drivers. The 185.x drivers didn’t work for me.

alex says:

July 9, 2009 at 4:17 am

Using it in Ubuntu. I am compiling with nvcc -o out vekadd.cu and running it with ./out, and the result is as follows (no squares):
0 0.000000
1 1.000000
2 2.000000
3 3.000000
4 4.000000
5 5.000000
6 6.000000
7 7.000000
8 8.000000
9 9.000000

What am I doing wrong?

Greets

- llpanorama says:
  
  July 9, 2009 at 8:30 am
  
  Alex, this is exactly the result I get if I disable my GPU card. Essentially, the a_h array gets initialized with 0..9 but never gets the squared results because the GPU is not running.
  
  I don’t know how to enable/disable your card under linux. You might try running the deviceQuery example program to see if it picks up your GPU card. (See my previous blog entry which does this.)
  
tom says:

July 6, 2009 at 10:49 pm

Are you sure your example really runs on the GPU instead of on the CPU?

I think you are happy too soon.

Just increase N and/or run it repeatedly; then you will see that the GPU stays cool while the CPU reports load.

T

- llpanorama says:
  
  July 7, 2009 at 1:20 pm
  
  Yes, I believe this program runs on the GPU and not the CPU. I can disable my NVIDIA 8600 card and the program computes incorrect results when I do so. The correct results are output once the GPU is re-enabled.
  
  I also compiled the program for CPU-only operation using the -deviceemu option and it computes correct answers regardless of whether the GPU is enabled or disabled.
  
  This program makes very little use of the GPU, even with large N or repeated use (just a single multiplication for each array element). Most of the work involves moving the data from the PC to the GPU card and back under the direction of the CPU. Therefore, it is not surprising that the GPU stays cool and the CPU shows a large load.
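  
  A rough back-of-the-envelope version of that argument, as host-only C++. The function names are made up for this sketch, and it only counts bytes and multiplications; it says nothing about actual timings.
  
  ```cpp
  #include <cassert>
  #include <cstddef>

  // Bytes copied for n floats: once host-to-device, once device-to-host.
  std::size_t bytes_transferred(std::size_t n)
  {
    return 2 * n * sizeof(float);
  }

  // The kernel does one multiplication per array element.
  std::size_t multiplications(std::size_t n)
  {
    return n;
  }

  // Bytes moved per multiplication; constant, so a larger N does not
  // change the balance between copying and computing.
  std::size_t bytes_per_multiplication(std::size_t n)
  {
    assert(n > 0);
    return bytes_transferred(n) / multiplications(n);
  }
  ```
  
  Assuming 4-byte floats, N = 10 means 80 bytes moved for 10 multiplications, and the ratio stays at 8 bytes per multiplication however large N gets.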
  
  I could be wrong, but I would need to see more compelling evidence than you have provided that the program is not actually running in the GPU.
  
- Vincent says:
  
  July 7, 2009 at 2:38 pm
  
  If you increase N, then you increase the number of lines displayed in the command window using printf. That’s where the CPU load happens.
  
  Put N = 10000 (for example) and comment the line with “printf”, you won’t see CPU load.
  
Chetan Khaladkar says:

May 26, 2009 at 1:03 am

Hi,
Thank you for providing me your Gmail ID. Yesterday CUDA was working fine on VC++ 2005, but now it has developed a problem.
When I compile the file it gives this message:

1>—— Build started: Project: example1, Configuration: Debug Win32 ——
1>Performing Custom Build Step
1>nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified
1>Build log was saved at “file://c:\Users\chetan\Desktop\example1_emu1\example1_emu\example1\Debug\BuildLog.htm”
1>example1 – 0 error(s), 0 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========

and when I build the project it gives:

>—— Build started: Project: example1, Configuration: Debug Win32 ——
1>Performing Custom Build Step
1>nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified
1>Linking…
1>LINK : fatal error LNK1181: cannot open input file ‘.\Debug\example1.obj’
1>Build log was saved at “file://c:\Users\chetan\Desktop\example1_emu1\example1_emu\example1\Debug\BuildLog.htm”
1>example1 – 1 error(s), 0 warning(s)
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

I think there is some problem in the Custom Build Step. I tried putting quotes (“) around every path I specify, but it is still not working. Can you please help me?

with regards
chetan

- llpanorama says:
  
  May 26, 2009 at 8:16 am
  
  I don’t know. Maybe it’s the version of CUDA you are using. (I used 1.1 for my example.)
  
- Ahmad says:
  
  January 31, 2011 at 4:55 am
  
  I use the following custom build command string:
  
  “$(CUDA_BIN_PATH)\nvcc.exe” -ccbin “$(VCInstallDir)bin” -deviceemu -c -D_DEBUG -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Wp64,/Od,/Zi,/MTd -I “$(CUDA_INC_PATH)” -o $(ConfigurationName)\example1.obj example1.cu
  
  Then remove the #include “stdafx.h” from the example.cu source file.
  
  These steps removed the above error in my case.
  
Chetan Khaladkar says:

May 22, 2009 at 10:15 pm

Hi,
amazing tutorial! I really appreciate your effort. Please do continue this.

I followed your steps and took your program for the emulator. But when I build, it gives “cannot read input file: cutil32D.lib”. I have checked that I have both cutil32.lib and cutil32D.lib, and I also tried manually adding the file location, but it is not working. Then I deleted that entry under Configuration Properties -> Linker -> Input, but then I am not able to compile your program, although simple CUDA programs compile. So what can I do? Please guide me.

thank you in advance

Sonal says:

May 20, 2009 at 5:48 am

I’m facing the following problem:

1>—— Build started: Project: hope, Configuration: Emudebug Win32 ——
1>Linking…
1>.\Emudebug\stdafx.obj : fatal error LNK1112: module machine type ‘X86’ conflicts with target machine type ‘x64’

Please help me out!

- llpanorama says:
  
  May 20, 2009 at 8:47 am
  
  The linker is targeting x64, but stdafx.obj was compiled for 32-bit x86, so the two don’t match. Either build the whole project for Win32, or upgrade your CUDA to a version that supports 64-bit Windows and build everything for x64.
  
Alessandro says:

May 8, 2009 at 9:06 am

I still get an error (WIN XP 64bit, VS2008)

Error 1 fatal error LNK1181: cannot open input file ‘.\Debug\example1.obj’ CUDA_ex1b CUDA_ex1b

Do you know what it means?

kkapron says:

April 29, 2009 at 2:34 pm

I’ve tried it on linux and it works, just simple: nvcc example.cu

cakrud says:

April 6, 2009 at 9:44 pm

I get an error:

—— Build started: Project: example1, Configuration: Debug Win32 ——
Performing Custom Build Step
Project : error PRJ0002 : Error result -1073741510 returned from ‘C:\WINDOWS\system32\cmd.exe’.
Build log was saved at “file://c:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\projects\example1_emu\example1\Debug\BuildLog.htm”
example1 – 1 error(s), 0 warning(s)
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

lam says:

March 27, 2009 at 10:16 am

Hi, I am trying to compile your example in Vista 64 and I got the error: “>nvcc fatal : Visual Studio configuration file ‘(null)’ could not be found for installation at ‘c:/Program File”. Do you have any advice?

- llpanorama says:
  
  March 28, 2009 at 12:07 am
  
  Do a search for CUDA, Vista and 64-bits. I’m sure I’ve seen people talk about running CUDA on 64-bit XP.
  
- aldebaran says:
  
  June 2, 2009 at 10:37 pm
  
  Here’s a solution for Visual Studio:
  
  nvcc fatal : Visual Studio configuration file ‘(null)’ could not be found for installation at ‘C:/Program Files (x86)/Microsoft Visual Studio 8/VC/bin
  
  This is probably because Visual Studio doesn’t install “X64 Compilers and Tools” by default, so you should go to Control Panel, Programs and Features, select visual studio, add or remove features, and select “X64 Compilers and Tools” under Visual C++.
  
  Note that I think this solution doesn’t work for the Express Editions because they don’t have support for x64.
  
  I hope this info helps.
  
  - siddharth says:
    
    October 1, 2009 at 4:44 am
    
    hey thanks! been banging my head.. installing the x64 components compiled instantly.. 😉
  - Anton says:
    
    November 28, 2009 at 2:23 am
    
    Thanks!
    
    it helped me too!
MIA says:

March 19, 2009 at 11:13 am

Hello, thanks a lot for this helpful article.
I followed your steps one by one, but when I build the project, it stalls at this step:
”
1>—— Build started: Project: example1, Configuration: Debug x32 ——
1>Performing Custom Build Step
1>example1.cu
1>tmpxft_00000be0_00000000-3_example1.cudafe1.gpu
1>tmpxft_00000be0_00000000-8_example1.cudafe2.gpu
”
and I don’t see the cause!
When I build the CUDA program directly from cmd with “nvcc -deviceemu -o exp1 exemple1.cu”, it generates the .exe file and it gives the same result! Do you have any ideas about this problem?

- llpanorama says:
  
  March 19, 2009 at 9:24 pm
  
  Have you tried getting the source and project files and compiling it directly from those? Are you using CUDA 1.1?
  
- MIA says:
  
  March 20, 2009 at 9:08 pm
  
  Thanks for your answer. Yes, I already tried the project source; it works perfectly in emulation mode (debugemu in VC++ 8, or using cmd “nvcc -deviceemu…”).
  I use CUDA 2.0 / MSVC 8 / GeForce 8600 😦
  
  - llpanorama says:
    
    March 20, 2009 at 10:27 pm
    
    You should try CUDA 1.1 since that is the version used with my example.
  - MIA says:
    
    March 21, 2009 at 4:25 am
    
    OK, I will try your advice. Thanks a lot for your answer.
  - MIA says:
    
    March 23, 2009 at 10:18 am
    
    Thanks for your cooperation. I tried CUDA 1.1 and now the program works perfectly.
llpanorama says:

March 14, 2009 at 7:22 pm

Up in the main article, I have added a link to a project zip file with the -deviceemu option.

robert says:

March 14, 2009 at 1:08 pm

would it be possible that you post your program as a zip file with the visual studio 2005 set up with emulation and debug?

perhaps this would be a simple solution for all of us here.

thanks

robert says:

March 14, 2009 at 1:06 pm

Hi;

I have the same problem as one of your other users. I switched on the emumode as suggested, but I still get
the output
1 1.0000
2 2.0000
3 3.0000
etc, etc etc.

Any suggestions?

- vo says:
  
  March 28, 2009 at 11:37 pm
  
  me too…any ideas what’s causing this?
  
Chathuri says:

March 11, 2009 at 10:00 am

My program runs fine in emudebug mode, but when I try to run it on Debug (using the actual GPU, because my goal is to run the program for more than 1,000,000 threads) I end up with following two problems:
1. I have memcopy from host to device, but when I debug the structure doesn’t get copied.
cutilSafeCall( cudaMemcpy(layerIni,layer, 1 * sizeof(TissueStruct), cudaMemcpyHostToDevice) );
2. The kernel execution fails with cudaThreadSynchronize error : unspecified launch failure.
Can you please help me in this case.
The device I am using is GeForce GTX 260.

- llpanorama says:
  
  March 16, 2009 at 9:28 am
  
  No idea what the problem is. I suggest you try asking on the CUDA forums.
  
Ho Xung Lenh says:

February 19, 2009 at 1:00 am

I am getting the following message:

cudaSafeCall() Runtime API error in file , line 59: feature is not yet implemented.

line 59 of that piece of code says

cutilSafeCall(cudaGetDeviceProperties(&deviceProp, dev));

Ho Xung Lenh says:

February 19, 2009 at 12:56 am

Thanks for your advice.
Actually, I do not have a CUDA graphics card in my machine, so I must use emulation mode. I also tried to follow your steps in the previous post; I can do steps 1 and 2, but I cannot do step 3, which is about installing the driver. It says that it could not locate any drivers compatible with the current hardware. DeviceQuery compiles fine but it does not run. The debugger shows the following information when running:

‘deviceQuery.exe’: Loaded ‘C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\bin\win32\Debug\deviceQuery.exe’, Symbols loaded.
‘deviceQuery.exe’: Loaded ‘C:\WINDOWS\system32\ntdll.dll’, No symbols loaded.
‘deviceQuery.exe’: Loaded ‘C:\WINDOWS\system32\kernel32.dll’, No symbols loaded.
‘deviceQuery.exe’: Loaded ‘C:\CUDA\bin\cudart.dll’, Binary was not built with debug information.
‘deviceQuery.exe’: Loaded ‘C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\bin\win32\Debug\cutil32D.dll’, No symbols loaded.
First-chance exception at 0x7c812a5b in deviceQuery.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0012fd2c..
First-chance exception at 0x7c812a5b in deviceQuery.exe: Microsoft C++ exception: cudaError at memory location 0x0012fd7c..
The program ‘[464] deviceQuery.exe: Native’ has exited with code 1 (0x1).

Also note: I use Visual Studio .NET 2005.

llpanorama says:

February 18, 2009 at 11:36 am

Have you downloaded my source files and tried to compile and run those? Have you tried the DeviceQuery example in my previous post about setting-up the CUDA tools?

Ho Xung Lenh says:

February 18, 2009 at 1:02 am

PS: The debug mode shows the following information:

‘example2.exe’: Loaded ‘C:\Documents and Settings\Tuan Anh NGUYEN\My Documents\Visual Studio 2005\Projects\example2\debug\example2.exe’, Symbols loaded.
‘example2.exe’: Loaded ‘C:\WINDOWS\system32\ntdll.dll’, No symbols loaded.
‘example2.exe’: Loaded ‘C:\WINDOWS\system32\kernel32.dll’, No symbols loaded.
‘example2.exe’: Loaded ‘C:\CUDA\bin\cudart.dll’, Binary was not built with debug information.
First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0012fe5c..
First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError at memory location 0x0012feac..
First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0012fe54..
First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError at memory location 0x0012fea4..
First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0012fe44..
First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError at memory location 0x0012fe94..
First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0012fe54..
First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError at memory location 0x0012fea4..
First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0012fe60..
First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError at memory location 0x0012feb0..
The program ‘[1932] example2.exe: Native’ has exited with code 0 (0x0).

Please help me in this case.

Ho Xung Lenh says:

February 18, 2009 at 12:56 am

Hi,

I followed your steps (with the -deviceemu option): it compiled fine but the result is wrong:

0 0.000000
1 1.000000
2 2.000000
3 3.000000
4 4.000000
5 5.000000
6 6.000000
7 7.000000
8 8.000000
9 9.000000

The full command is:

“$(CUDA_BIN_PATH)\nvcc.exe” -ccbin “$(VCInstallDir)bin” -c -D_DEBUG -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Wp64,/Od,/Zi,/MTd -I ” $(CUDA_INC_PATH)” -I./ -o -deviceemu $(ConfigurationName)\example1.obj example1.cu

Can you help me with this problem? I use SDK and Toolkit 1.1.

Thanks

llpanorama says:

January 29, 2009 at 5:08 pm

Marc:

I believe the indices in the program all start at zero, so the lowest array index is 0 * 4 + 0 = 0.
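
To make the index arithmetic concrete, here is a small host-only C++ sketch (no GPU needed) that replays `blockIdx.x * blockDim.x + threadIdx.x` for every block and thread. The names `global_index` and `touched_indices` are made up for this illustration; they are not part of CUDA.

```cpp
#include <cassert>
#include <vector>

// Same arithmetic as the kernel: blockIdx.x * blockDim.x + threadIdx.x.
// All three values are zero-based.
int global_index(int block_idx, int block_dim, int thread_idx)
{
  return block_idx * block_dim + thread_idx;
}

// Hypothetical helper: lists the array elements the kernel would touch
// for n elements and the given block size, in block/thread order.
std::vector<int> touched_indices(int n, int block_size)
{
  assert(n >= 0 && block_size > 0);
  // Round up so a partially filled last block is still launched.
  int n_blocks = n / block_size + (n % block_size == 0 ? 0 : 1);
  std::vector<int> indices;
  for (int b = 0; b < n_blocks; b++) {
    for (int t = 0; t < block_size; t++) {
      int idx = global_index(b, block_size, t);
      if (idx < n) indices.push_back(idx);  // same guard as the kernel
    }
  }
  return indices;
}
```

With N = 10 and a block size of 4, blocks 0, 1 and 2 are launched. The first thread of block 0 gets index 0, and threads 2 and 3 of block 2 (indices 10 and 11) are skipped by the `idx<N` guard.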

Marc says:

January 29, 2009 at 4:08 pm

First of all, thank you for an awesome article!

I’m a bit confused about the inner workings of the kernel function.

int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx<N) a[idx] = a[idx] * a[idx];

so blockIdx ranges from 1 to 3 (3 blocks, given by n_blocks), and blockDim is 4.

so 1 * 4 + 1 = 5 is the lowest array index you can get.

What am I not understanding correctly here?

Asim says:

January 6, 2009 at 11:46 pm

Ivan Dj …

Please use the following: I just removed the quotation marks from this statement and it worked 🙂 …

Configuration Properties → Linker -> General:
Additional Library Directories = C:\CUDA\lib;C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\common\lib

llpanorama says:

December 20, 2008 at 10:46 am

Ivan:

I used Visual Studio 2005 and CUDA 1.1. You’re using VS 2008. Go back and use VS 2005 and maybe then the example will work for you. Or find the cudart.lib file on your system and update the linkage paths so it will be found.

Ivan Dj says:

December 20, 2008 at 9:19 am

Hello!
please help quickly. I have the following problem:
1>—— Build started: Project: example1, Configuration: Debug Win32 ——
1>Linking…
1>LINK : fatal error LNK1181: cannot open input file ‘cudart.lib’
1>Build log was saved at “file://e:\Software Projects\Visual Studio 2008 projects\C++\CUDA\example1\example1\Debug\BuildLog.htm”
1>example1 – 1 error(s), 0 warning(s)
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

What else do I need to do? I did everything like you said in the tutorial.

Jon says:

December 11, 2008 at 3:22 pm

Thanks for this! With just a little bit of tweaking, I was able to get this code to work under linux without issue. Matter of fact, it was easier to do, I think.

In my case, all I had to do was comment out the stdafx.h include, rename the source to a .cu file, then compile it with nvcc. This created an a.out file that worked first time through!

Krishna says:

November 4, 2008 at 8:12 am

It was really useful…
Thanks a lot 🙂

Song says:

October 30, 2008 at 12:47 pm

Many thanks!

llpanorama says:

October 21, 2008 at 6:26 am

I would do the obvious and install CUDA 1.1 and see if the error still occurs. If it does, then there is a problem with how you set up the project. Otherwise, there is a problem with using CUDA 2.0.

- binay kumar pandey says:
  
  September 6, 2010 at 1:49 am
  
  I’m getting the error: a tool returned an error code from “compiling”. I am stuck there. Please help me, and send your suggestion to my email as well.
  
sky says:

October 21, 2008 at 12:05 am

The tutorial is wonderful. Unfortunately, I got this error from VS2005:
Error 1 error PRJ0019: A tool returned an error code from “Performing Custom Build Step”
I don’t know how to figure it out. Could you help me if possible? Thank you a lot. By the way, I used CUDA 2.0. Is there trouble with that version?

N says:

September 26, 2008 at 7:58 am

Thank you, wonderful article.

J.F. Garamendi says:

September 10, 2008 at 10:36 am

Thanks for this great tutorial. I used Vista x64 and works very well.

Thanks a lot.

Robert Evrae says:

September 9, 2008 at 3:03 pm

When you are messing around with the properties, it might be advantageous to replace all uses of “example1” with “$(InputName)” (without quotes).

This means that the project created can easily be reused just by renaming the files involved, and not requiring that you manually fiddle with the properties every time.

Great guide by the way! I just got bugged having to constantly change those variables, so I went hunting for an alternative.

Josue says:

August 18, 2008 at 1:58 pm

Thanks for the help, GREAT TUTORIAL
The project can be compiled and run. Here’s the result:

0 0.000000
1 1.000000
2 4.000000
3 9.000000
4 16.000000
5 25.000000
6 36.000000
7 49.000000
8 64.000000
9 81.000000

:)))))))Thanks for article!!!!!!

llpanorama says:

August 18, 2008 at 7:59 am

Here is some info I found about compiling for 64-bit Windows on the Nvidia forums:

I ended up getting it to work by following the instructions under “How To Create 64-bit apps” at http://blogs.msdn.com/deeptanshuv/archive/…/11/573795.aspx

In summary I had to:

* List cutil64D.lib instead of cutil32D.lib under Project Properties -> Configuration Properties -> Linker -> Input -> Additional Dependencies
* Change from the MachineX86 to MachineX64 option under Project Properties -> Configuration Properties -> Linker -> Advanced -> Target Machine
* open the solution explorer, select solution, right click->Configuration Manager.
* go to ‘Active Solution Platform’, click New.
* in the ‘New Solution Platform’ dialog that comes up select the new platform x64. Set ‘Copy Settings From’ to ‘Win32’
* click OK.

And if I do this before writing a project it seems to build properly.

Josue says:

August 15, 2008 at 9:13 am

OK, my bad. I changed the Active Solution Platform from Win32 to x64. But now there is another problem:

1>—— Build started: Project: example1, Configuration: Debug x64 ——
1>Performing Custom Build Step
1>example1.cu
1>tmpxft_00000be0_00000000-3_example1.cudafe1.gpu
1>tmpxft_00000be0_00000000-8_example1.cudafe2.gpu
1>tmpxft_00000be0_00000000-3_example1.cudafe1.cpp
1>tmpxft_00000be0_00000000-12_example1.ii
1>Linking…
1>LINK : fatal error LNK1181: cannot open input file ‘cutil32D.lib’
1>Build log was saved at “file://c:\Documents and Settings\jacevedo\Desktop\example1\example1\example1\x64\Debug\BuildLog.htm”
1>example1 – 1 error(s), 0 warning(s)
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

and this library is not in C:\CUDA\lib

Can somebody give me advice on how to resolve this problem? Thanks.

Josue says:

August 15, 2008 at 8:38 am

Hi, great tutorial.
I did all the steps,
but I got a fatal error when I tried to build it using VS2005 on XP x64.
This is what it shows:

1>—— Build started: Project: example1, Configuration: Debug Win32 ——
1>Compiling…
1>stdafx.cpp
1>Linking…
1>LINK : fatal error LNK1181: cannot open input file ‘cudart.lib’
1>Build log was saved at “file://c:\Documents and Settings\jacevedo\Desktop\example1\example1\example1\Debug\BuildLog.htm”
1>example1 – 1 error(s), 0 warning(s)
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

Can somebody give me advice on how to resolve this problem? Thanks.

llpanorama says:

August 1, 2008 at 2:36 pm

The result you are getting is the same thing that happens if I try to run the program in non-emulated mode with my 8600 card disabled. That is similar to trying to run the code on a non-CUDA device such as your Radeon. Are you sure you are running the emulated version of the program?

Zebiloute says:

August 1, 2008 at 8:13 am

Great tutorial.

I have a question. I am trying to use this cool sample without an NVIDIA GPU (I use an ATI Radeon). It compiles fine (with the -deviceemu option), but when I launch the exe file the result is wrong. See below:

0 0.000000
1 1.000000
2 2.000000
3 3.000000
4 4.000000
5 5.000000
6 6.000000
7 7.000000
8 8.000000
9 9.000000

There is no square computing. Thanks.

llpanorama says:

July 31, 2008 at 9:59 am

Marek:

When you install CUDA, the installer should create all the CUDA… environment variables. Then, in the Visual Studio project, you have to create all the configuration properties so the correct compiler is called as I did in the example shown above.

Also, I’m not sure if CUDA supports VS2008. I know the version I am using (1.1) doesn’t. That may have changed. Check the Nvidia forums for more information.

Marek says:

July 29, 2008 at 10:57 am

Hi. I have a problem compiling CUDA programs in Visual Studio 2008. It doesn’t know the CUDA… variables, so for example it doesn’t find the CUDA compiler. Can somebody give me advice on how to resolve this problem? Thanks.

llpanorama says:

July 17, 2008 at 6:23 am

Peter:

Use the -deviceemu compiler option as shown in the second-to-last paragraph of this article. This will create an executable that uses the CUDA emulator instead of a graphics card.

Hyunhojo says:

July 17, 2008 at 6:05 am

Thank you

I made a mistake because I copied the compiler options directly into Visual Studio:

“$(CUDA_BIN_PATH)\nvcc.exe” -ccbin “$(VCInstallDir)bin” -c -D_DEBUG -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Wp64,/Od,/Zi,/MTd -I”$(CUDA_INC_PATH)” -I./ -o $(ConfigurationName)\example1.obj example1.cu

but the double quotes in the options above weren’t copied correctly.

peter says:

July 16, 2008 at 9:12 pm

I was wondering if you knew how to set up the emulator on XP so I don’t have to buy a new graphics card. I can’t seem to find anything on the internet.

Sumesh says:

July 14, 2008 at 11:27 pm

This article is very helpful 🙂

llpanorama says:

July 14, 2008 at 8:23 am

Satakarni:

I know that a block of threads will be executed on a single multiprocessor and multiple blocks can be assigned to each multiprocessor. Other than that, I don’t know of any static relation between grid elements and multiprocessors.

I don’t see anything in the CUDA API that lets you select the number of processors that will be used to run your code. You might ask on the CUDA forum and see if anyone knows of a switch to do this.
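
To make that a bit more concrete, here is a rough host-side C++ sketch. Since a block runs on a single multiprocessor, launching fewer blocks than there are multiprocessors is one indirect way to limit how many of them can be busy. The sketch assumes the Tesla C870 figures from Satakarni’s question (16 multiprocessors with 8 processors each) rather than querying the device, and the function names are made up for illustration. The results are only upper bounds, because the hardware decides where blocks actually run; treat this as untested reasoning, not a documented CUDA feature.

```cpp
#include <algorithm>

// Figures quoted in the question for a Tesla C870 (assumed, not queried).
const int kMultiprocessors = 16;
const int kProcessorsPerMultiprocessor = 8;

// Same round-up division as the n_blocks line in the example program.
int blocks_needed(int n, int block_size)
{
    return n / block_size + (n % block_size == 0 ? 0 : 1);
}

// Upper bound on multiprocessors that can be busy at once: each block
// runs on exactly one multiprocessor, so n_blocks blocks can occupy at
// most n_blocks of them.
int max_busy_multiprocessors(int n_blocks)
{
    return std::min(n_blocks, kMultiprocessors);
}

// Upper bound on processors that can be doing work for a given grid size.
int max_busy_processors(int n_blocks)
{
    return max_busy_multiprocessors(n_blocks) * kProcessorsPerMultiprocessor;
}
```

By this reasoning a grid of 2 blocks can keep at most 16 processors busy, 4 blocks at most 32, and so on up to 16 blocks for all 128. With fewer blocks, each block has to cover more of the array.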

Satakarni says:

July 13, 2008 at 7:37 pm

As we know, <<<Dg, Db, Ns>>> is required to launch a kernel on the GPU (the device), where Dg is the grid size, Db is the block size (the number of threads per block), and the optional Ns is the amount of shared memory to allocate.

However, I would like to know how the number of processors and the grid size are related.

For example, I am using a Tesla C870, which has 16 multiprocessors with 8 processors each, for a total of 128 processors. I want to scale my program by testing it on 16, 32, 48, and so on up to 128 processors. How can I achieve this with CUDA programming?

(I thought that there must be some relation between the grid and/or block size used in the program and the number of processors on the GPU card.)

Kindly let me know.

With Regards,
Satakarni

m ravi kuar says:

June 24, 2008 at 6:55 am

The explanation was excellent, and I found it interesting and helpful for my work. Keep writing, my dear friend.

kyzhao says:

June 21, 2008 at 5:54 pm

You can download a new version of the CUDA wizard for VS Express.

url:
http://forums.nvidia.com/index.php?showtopic=69183

samsam99 says:

June 19, 2008 at 12:59 pm

Thank you for the article.

Fatih says:

June 15, 2008 at 6:21 am

Thank you for this article, it is very helpful.

Pingback: Threads and blocks and grids, oh my! « /// Parallel Panorama ///
amput says:

May 28, 2008 at 12:17 pm

Thank you very much for this article. It really helped me. Continue writing.

kurt says:

May 23, 2008 at 6:21 pm

This is extremely helpful for those of us wanting to start from scratch (which is the only way I can learn anything).
Great work buddy! Particularly the painstaking details given on how to configure VS for cu, etc.

Anton says:

May 22, 2008 at 8:05 am

Many thanks for the article. It is written very clearly and is easy to read and understand!
I am very interested in this area, but there is not enough good information! Please do not stop, and write more!
I would be very glad to see the same kind of detailed example using MersenneTwister and Monte Carlo!

Thanks for the article!
