My first CUDA program!
May 21, 2008
Note: Check out “CUDA Gets Easier” for a simpler way to create CUDA projects in Visual Studio.
I got CUDA set up and running with Visual C++ 2005 Express Edition in my previous post. Now I’ll write my first CUDA program. It’s a modification of an example program from a great series of articles on CUDA by Rob Farber published in Dr. Dobb’s Journal. Rob does his examples in a make-based build environment; I’ll show how to build a CUDA program in the Visual C++ IDE.
Simple CUDA programs have a basic flow:
- The host initializes an array with data.
- The array is copied from the host to the memory on the CUDA device.
- The CUDA device operates on the data in the array.
- The array is copied back to the host.
My first CUDA program, shown below, follows this flow. It takes an array and squares each element. I can barely contain my excitement.
// example1.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"

#include <stdio.h>
#include <cuda.h>

// Kernel that executes on the CUDA device
__global__ void square_array(float *a, int N)
{
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx<N) a[idx] = a[idx] * a[idx];
}

// main routine that executes on the host
int main(void)
{
  float *a_h, *a_d;  // Pointer to host & device arrays
  const int N = 10;  // Number of elements in arrays
  size_t size = N * sizeof(float);
  a_h = (float *)malloc(size);        // Allocate array on host
  cudaMalloc((void **) &a_d, size);   // Allocate array on device
  // Initialize host array and copy it to CUDA device
  for (int i=0; i<N; i++) a_h[i] = (float)i;
  cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);
  // Do calculation on device:
  int block_size = 4;
  int n_blocks = N/block_size + (N%block_size == 0 ? 0:1);
  square_array <<< n_blocks, block_size >>> (a_d, N);
  // Retrieve result from device and store it in host array
  cudaMemcpy(a_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
  // Print results
  for (int i=0; i<N; i++) printf("%d %f\n", i, a_h[i]);
  // Cleanup
  free(a_h); cudaFree(a_d);
}
Two pointers are declared on line 19 of the main routine: a_h points to the array that is stored on the host, while a_d points to the array on the CUDA device. The a_h array is allocated in the host memory on line 22 using the standard malloc subroutine, but a_d is allocated in the CUDA device memory using the cudaMalloc subroutine found in the CUDA API (line 23). (Note that a pointer to the a_d pointer is passed to cudaMalloc so it can store the address of the array in a_d.)
In order to create some values to operate upon, each element in the host array is initialized with its array index (line 25). Then the cudaMemcpy subroutine is used to copy a_h from the host into a_d on the CUDA device. (The cudaMemcpyHostToDevice flag, defined in the API, indicates the direction of the transfer.)
In lines 28-30, the host initiates the execution of the kernel function, square_array, on the CUDA device. A CUDA device contains many processing elements, each of which can execute a thread. Threads are grouped together to form a block, and a number of blocks constitutes a grid. In this example, the number of threads per block is set to four (line 28). Then the total number of blocks needed to get enough threads to square each array element is calculated on line 29. (For ten array elements, three blocks of four threads each are needed, which gives twelve threads, two of which have nothing to do.) On line 30, the host launches the kernel function on the CUDA device. The number of blocks and the number of threads in each block are given between the <<<…>>> following the kernel name. (This information is picked up by the Nvidia compiler, nvcc, and is used when generating the instructions that start the kernel on the CUDA device. More on nvcc later.) Following that, the standard argument list to square_array contains a pointer to the array in the CUDA device memory and the number of elements in the array.
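The block count on line 29 is a ceiling division. Here is a minimal host-only C++ sketch of it; blocks_needed_post and blocks_needed are hypothetical helper names made up for this illustration and are not part of the CUDA API. The first function wraps the expression from the example, and the second is a common alternative spelling that should give the same value:

```cpp
#include <cassert>

// The expression used on line 29 of the example, wrapped in a function.
// (blocks_needed_post is a hypothetical name chosen for this sketch.)
int blocks_needed_post(int n, int block_size)
{
    assert(n >= 0 && block_size > 0);
    return n / block_size + (n % block_size == 0 ? 0 : 1);
}

// A common alternative spelling of the same ceiling division:
// round n up to the next multiple of block_size, then divide.
int blocks_needed(int n, int block_size)
{
    assert(n >= 0 && block_size > 0);
    return (n + block_size - 1) / block_size;
}
```

With ten elements and four threads per block, both functions return 3. With eight elements they return 2, because no partially filled block is needed.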
The kernel is shown on lines 10-14. The __global__ keyword indicates that this is a kernel function that should be processed by nvcc to create machine code that executes on the CUDA device, not the host. In this example, each thread will execute the same kernel function and will operate upon only a single array element. Each thread is distinguished from all the others by block and thread indices that can be used to determine the array element the thread will access. On line 12, the array index is found by multiplying the thread’s block index (blockIdx.x) by the number of threads in each block (blockDim.x) and then adding the index of the thread within the block (threadIdx.x). If the index is within the bounds of the array, then the corresponding array element is squared (line 13).
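To see the index arithmetic without a GPU, the kernel body can be replayed on the host. The following is only a sketch under stated assumptions: Dim, square_array_emulated and launch_emulated are hypothetical names invented here, the index values are passed in as ordinary parameters rather than being CUDA built-ins, and the loops visit the threads one after another, whereas a real device runs them in parallel and in no guaranteed order.

```cpp
#include <cassert>
#include <vector>

// Host-side stand-in for CUDA's built-in index variables (hypothetical type).
struct Dim { int x; };

// Same body as square_array, with the indices passed in explicitly.
void square_array_emulated(std::vector<float>& a, int n,
                           Dim blockIdx, Dim blockDim, Dim threadIdx)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) a[idx] = a[idx] * a[idx];   // guard skips the surplus threads
}

// Visit every (block, thread) pair once, the way a launch with
// <<< n_blocks, block_size >>> creates one thread per pair.
void launch_emulated(std::vector<float>& a, int n_blocks, int block_size)
{
    assert(n_blocks >= 0 && block_size > 0);
    const int n = static_cast<int>(a.size());
    for (int b = 0; b < n_blocks; ++b)
        for (int t = 0; t < block_size; ++t)
            square_array_emulated(a, n, Dim{b}, Dim{block_size}, Dim{t});
}
```

Calling launch_emulated on a ten-element vector holding 0 through 9, with three blocks of four threads, squares each element exactly once. The two surplus threads (idx 10 and 11) fail the bounds check and touch nothing.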
Immediately after starting the kernel, the host begins a transfer of the data from the array in the CUDA device memory back to the array in the host memory (line 32). This transfer is delayed until the CUDA device has finished executing the kernel, so there is no chance of getting data that has not been processed yet. Then the host displays the contents of the array (line 34) and frees the array memory on both itself and the CUDA device (line 36).
At this point, I have a CUDA-enabled program, but I don’t have it integrated into a Visual C++ project. It actually takes a bit of work to do that. To start, I brought up the Visual C++ 2005 Express Edition IDE and clicked on the New Project button (you can also use File→New→Project… from the menu). In the New Project window, I selected Win32 as the project type and Win32 Console Application as the template. I gave the project the creative name of example1 and set its location to the C:\llpanorama\CUDA\examples directory. After clicking OK in the New Project window, and then clicking Finish in the Win32 Application Wizard window, a window opened with a simple code skeleton. I replaced the code skeleton with the code shown above.
After saving the code, I right-clicked the example1.cpp file, selected Rename from the drop-down menu and renamed the file to example1.cu. Files with the .cu extension are intended to be processed by nvcc. nvcc will extract the kernel portion of example1.cu and compile it for execution on the CUDA device while using the Visual C++ compiler to compile the remainder of the file for execution on the host.
In its default configuration, Visual C++ doesn’t know how to compile .cu files. It has to be told explicitly how to do this using a Custom Build Step. This is done by right-clicking on the example1.cu file and selecting Properties from the drop-down menu. In the Property Pages window that appears, set the Custom Build Step command line as follows:
Configuration Properties → Custom Build Step → General:
Command Line =
"$(CUDA_BIN_PATH)\nvcc.exe" -ccbin "$(VCInstallDir)bin" -c -D_DEBUG -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Wp64,/Od,/Zi,/MTd -I"$(CUDA_INC_PATH)" -I./ -o $(ConfigurationName)\example1.obj example1.cu
What does this command line do? Let’s break it down piece-by-piece:
"$(CUDA_BIN_PATH)\nvcc.exe": The location of the nvcc compiler.
-ccbin "$(VCInstallDir)bin": The location of the Visual C++ compiler.
-c: Compile the source into an object file (.obj extension) without linking.
-D_DEBUG -DWIN32 -D_CONSOLE -D_MBCS: Various macro definitions.
-Xcompiler /EHsc,/W3,/nologo,/Wp64,/Od,/Zi,/MTd: Various options that are passed by nvcc directly to the Visual C++ compiler.
-I"$(CUDA_INC_PATH)": Look in the CUDA include directories for needed header files.
-I./: Look in the current directory for needed header files.
-o $(ConfigurationName)\example1.obj: The location and name of the resulting object file.
example1.cu: The source file that the compiler will work on.
In addition to setting the command line for the example1.cu file, the location of the output file is specified as follows:
Configuration Properties → Custom Build Step → General:
Outputs = $(ConfigurationName)\example1.obj
After setting the file properties, the properties for the example1 project have to be modified. Here are the project property settings I used for the Debug configuration:
Configuration Properties → C/C++ → General:
Additional Include Directories = $(CUDA_INC_PATH);"C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\common\inc"
Configuration Properties → C/C++ → General:
Debug Information Format = Program Database (/Zi)
Configuration Properties → C/C++ → Code Generation:
Runtime Library = Multi-threaded Debug (/MTd)
Configuration Properties → Linker → General:
Enable incremental linking = No (/INCREMENTAL:NO)
Configuration Properties → Linker → General:
Additional Library Directories = "C:\CUDA\lib";"C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\common\lib"
Configuration Properties → Linker → Input:
Additional Dependencies = cudart.lib cutil32D.lib
Configuration Properties → Linker → Optimization:
Enable COMDAT folding = Do Not Remove Redundant COMDATs (/OPT:NOICF)
Now the project can be compiled and run. Here’s the result:
0 0.000000
1 1.000000
2 4.000000
3 9.000000
4 16.000000
5 25.000000
6 36.000000
7 49.000000
8 64.000000
9 81.000000
I told you it was exciting! Well, at least it’s right.
In order to compile the Release configuration, a few changes need to be made to the file and project properties. For the example1.cu file, the Custom Build Step command line has to be changed to remove the _DEBUG macro definition, enable compiler optimization, and link with the Release runtime library:
Configuration Properties → Custom Build Step → General:
Command Line =
"$(CUDA_BIN_PATH)\nvcc.exe" -ccbin "$(VCInstallDir)bin" -c -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Wp64,/O2,/Zi,/MT -I"$(CUDA_INC_PATH)" -I./ -o $(ConfigurationName)\example1.obj example1.cu
The project properties that have to be changed in the Release configuration are the linking for the runtime library and the use of the non-debug version of the CUDA utilities library:
Configuration Properties → C/C++ → Code Generation:
Runtime Library = Multi-threaded (/MT)
Configuration Properties → Linker → Input:
Additional Dependencies = cudart.lib cutil32.lib
Once those changes are made, the Release version of the example1 project can be compiled and run. It will output the same exciting result.
Here’s the source code for this example if you want to try it.
Don’t have a CUDA-capable GPU board on your PC but still want to try running this program? Easy! Just add the following option to the Custom Build Step command line: -deviceemu. This will link-in a CUDA device emulator that runs on the host. The emulator becomes the target for all the CUDA API calls and executes the kernel. The program will run just like a CUDA device is there, except slower. (Here is the project file with the -deviceemu option.)
So I’ve written my first CUDA program and gotten it to compile using Visual C++ 2005 Express Edition. Setting up the compilation options was as much (more?) work as writing the program, so you might be interested in a CUDA template for Visual C++ 2005 written by kyzhao. The installer doesn’t work for me (maybe because I’m using the free Express Edition), but it might help you.
fatal error: stdafx.h: No such file or directory
#include "stdafx.h"
Very good intro to CUDA. Your step-by-step instructions are so clear that I could picture everything as I was reading. You would be a very good author of a book on this. If you have the desire, don’t look back.
I am getting interested in CUDA because I am a great fan of writing high-performance applications. I have made many SQL Server applications run fast inside SQL Server, and now I feel I need to step outside SQL Server. I also feel I have hit the roof with multithreaded applications in C#. I want something more challenging. The major hurdle is C++!
Kudos!
Nice!
Hello again. I tried to get the source program using Firefox. It did not work! Please help.
Hi, can I get the source code for this example? The link no longer works.
Thank you for article. It is written very clear
Hi everyone:
I am new to CUDA programming and need help with my graduation application.
Thanks
I have CUDA toolkit 2.2, Windows 7, and Visual Studio 8.
So what changes do I have to make?
Hi. Many thanks for the article. I have been experimenting with CUDA on VS for a while now. I have had to reformat my system many times in the past 2 years, and every time I have to set up Visual Studio and CUDA, I come back to this page, which I have saved in my bookmarks. I just wanted to add a little more to this info. Recently Microsoft launched the Visual Studio 2011 beta. My project involves using managed windows and CUDA. While Visual Studio 2010 does not have IntelliSense for managed systems, the Visual Studio 2011 beta is not supported by Nvidia yet. So what I did was set up my project using VS 2010 and then copy the contents of ‘C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\BuildCustomizations’ into ‘C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V110\BuildCustomizations’. Now I can open my VS 2010 projects in VS 2011 and enjoy the comfort of IntelliSense. I have not invented a new kind of rocket here, but I thought it might help somebody along the way.
Hi there,
Thanks for the posting. It really helps. I was new to CUDA and Visual Studio, and I managed to run the above program using Visual Studio Professional 2010 (I think Express will work as well) and CUDA 4.1 without Parallel Nsight, so I thought my steps might help other users of CUDA and VS 2010.
Once you have created example1.cpp as described at the beginning of this post:
Right-click example1 (the project, on the left under Solution Explorer), select Build Customization and check CUDA 4.1.
Then right-click example1.cpp (a C++ file for now) and rename it to example1.cu.
Right-click example1.cu again and select Properties → General.
Change the item type to CUDA C/C++ and click Apply.
Expand CUDA C/C++ (on the left, under General), click Common,
and in the additional directories put
./
../../common/inc
../../../shared/inc
Now go back to the example1 project.
Right-click the project, expand Configuration Properties, and in Custom Build Step change Outputs to
$(Configuration Name)\example1.obj
Right-click the project, expand Configuration Properties, and in VC++ Directories add to Include Directories
the path where the include files are. For me it is
$(Configuration Name)\example1.obj
Hi,
I am trying to compile my program with nvcc from the command line. I tried different ways, using Cygwin or DOS, but I only managed to create the .obj file. It doesn’t create the .exe, and I don’t know how to link them. I am trying this with Visual Studio as well.
Could you please help me?
Thanks for the article. Very helpful and clear.
I want to use CUDA in emulation mode. How can I do that? Can anyone help me?
Thank you
I am working on non GPU based machine
with VC++ 2005 Express edition
CUDA 1.1 toolkit
CUDA 1.1 SDK
It compiles successfully, but gives the following output when I run the solution.
Please help
‘deviceQuery.exe’: Loaded ‘C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\bin\win32\Debug\deviceQuery.exe’, Symbols loaded.
‘deviceQuery.exe’: Loaded ‘C:\Windows\System32\ntdll.dll’, No symbols loaded.
‘deviceQuery.exe’: Loaded ‘C:\Windows\System32\kernel32.dll’, No symbols loaded.
‘deviceQuery.exe’: Loaded ‘C:\Program Files\AVAST Software\Avast\snxhk.dll’, No symbols loaded.
‘deviceQuery.exe’: Loaded ‘C:\CUDA\bin\cudart.dll’, Binary was not built with debug information.
‘deviceQuery.exe’: Loaded ‘C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\bin\win32\Debug\cutil32D.dll’, No symbols loaded.
First-chance exception at 0x7760fc56 in deviceQuery.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0012fcf0..
First-chance exception at 0x7760fc56 in deviceQuery.exe: Microsoft C++ exception: cudaError at memory location 0x0012fd40..
The program ‘[5372] deviceQuery.exe: Native’ has exited with code 1 (0x1).
I am not seeing CUDA C/C++ itemtype to select in cudafile property pages
I have either C/C++ or I have CUDA as item type.
I have included nvcc.rules in custom build rules.
Please help !!!
Thanks,
N
It’s working perfectly. Thank you
nice one! those settings completely ruined every other c++ project! thanks
You’re welcome! That’s me – always trying to help!
Your other c++ projects probably sucked, anyway. You’re better off without them.
Can anyone tell me how to get started using the Python Orange library with CUDA?
I got:
0 0.000000
1 1.000000
2 2.000000
3 3.000000
4 16.000000
5 5.000000
6 6.000000
7 7.000000
8 64.000000
9 9.000000
but i don´t understand.. any idea please? (i’m beginner, sorry)
Sorry, I had written the array index calculation in the kernel function incorrectly.
We have 3 blocks of 4 threads each. Let B(x,y) be a block; then we have B(0,0), B(1,0), B(2,0). If T(x,y) is a thread of a block, then B(0,0) has T(0,0), T(1,0), T(2,0), T(3,0), and so on. Note that we have 12 threads, so we could calculate the square of up to 12 array elements. For that reason we must apply the (idx < N) condition in the kernel function. For example, the array element with index 5 is handled by the thread with idx = 1 (second block) * 4 (number of threads per block) + 1 (second thread of the block).
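The mapping described in this comment can also be run backwards: given an array index, which block and thread handle it? Below is a small host-only C++ sketch of that. ThreadCoord, owner_of and is_active are hypothetical names made up for this illustration, and the sketch assumes the one-dimensional launch used in the post.

```cpp
#include <cassert>

// Position of a thread in a one-dimensional launch (hypothetical helper type).
struct ThreadCoord { int block; int thread; };

// Inverse of idx = block * block_size + thread.
ThreadCoord owner_of(int idx, int block_size)
{
    assert(idx >= 0 && block_size > 0);
    return ThreadCoord{ idx / block_size, idx % block_size };
}

// True when the thread at c passes the kernel's (idx < n) bounds check.
bool is_active(ThreadCoord c, int block_size, int n)
{
    return c.block * block_size + c.thread < n;
}
```

With four threads per block, owner_of(5, 4) is block 1, thread 1, which matches the example in the comment. For N = 10, the last two threads of block 2 are not active because their indices would be 10 and 11.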
I am a beginner with Visual Studio. Can anyone tell me why I cannot run the program? I always see the message “fatal error LNK1104: cannot open file ‘.\Debug\example1.obj’”
Create a new project as a Win32 console application. I had the same problem, but I solved it. Give me your email address and I can mail you the whole procedure.
Hi, I have the same problem. I tried to solve the problem several days, however I could not find the solution. Could you mail me the procedure! Thanks a lot! my E-mail address is wzymail@vip.qq.com
Hoping your reply!
best wishes!
my email id : manovancha@gmail.com
please send me whole procedure how to compile and run the above program using visual c++ IDE.
my email ID: enkhbolora@gmail.com
please send me too whole procedure from how to create new project in visual studio 2005.
I am new in cuda. please send me whole procedure how to compile and run the above program.my email id : akhtari09.aj@gmail.com
Very comprehensive article. Thank you very much
I am just starting with CUDA. I have a GeForce 6150SE nForce 430 on my desktop. All the samples in the toolkit say min req is GeForce 8. Although, everything seems to be installed well for 64 bit.
Here is my problem:
I opened bandwidthTest_vc90.sln in Visual c++2008. It needs cutil32D.lib and shrUtils32D.lib.
I do not have these files anywhere on my computer. I do have cutil32.lib, cUtil64.lib and cutil64D.lib.
Is there anything that I should change on the properties in VC project, that would help me compile?
I had started programming in CUDA but dont know how to decompose RANDOM FOREST algorithm in CUDA so please help me .
Can you please post the CUDA code for dijkstra’s algorithm..!
Pleashe saar.
can anyone tell me the procedure to run cuda programs on the emulator mode??
The -deviceemu option isn’t used any more. Now you have to link against a CUDA library that implements the emulator. This is described in the “CUDA Gets Easier” entry of this blog (although I’m not sure this really is easier).
First off, I have to say this was fairly helpful, and helped me get started.
Secondly, you write some fuck-ugly code 🙂
Thanks! I’ll let you know when I start my “Social Etiquette for Coders” blog.
Somebody please tell me: how do we calculate the speedup versus the CPU? Are there any CUDA instructions for it?
Time the operation on your CPU. Then time it using the GPU. The ratio of the times is your speedup.
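One way to take those two timings is sketched below in host-only C++. The helper names seconds_taken and speedup are hypothetical, invented for this sketch. It assumes you wrap the CPU version and the GPU version each in a callable. Because a kernel launch returns before the device has finished, the GPU callable should end with something that waits for the result, such as the cudaMemcpy back to the host used in the post.

```cpp
#include <cassert>
#include <chrono>

// Wall-clock seconds taken by one call to work().
template <typename F>
double seconds_taken(F&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Speedup as described in the reply: CPU time divided by GPU time.
double speedup(double cpu_seconds, double gpu_seconds)
{
    assert(cpu_seconds >= 0.0 && gpu_seconds > 0.0);
    return cpu_seconds / gpu_seconds;
}
```

If the CPU version takes 2.0 seconds and the GPU version 0.5 seconds, speedup(2.0, 0.5) is 4. For an array as small as the ten elements in this post, the copies to and from the device are likely to dominate, so a ratio below 1 would not be surprising.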
Nice post.
I am getting the error
LINK : fatal error LNK1181: cannot open input file ‘cutil32D.lib’
Then I deleted that file from the linker inputs, and now I am getting the error
LINK : fatal error LNK1181: cannot open input file .obj
Can anyone please solve my problem?
I have seen in the comments that many programmers have had a problem like this, so sorting it out may help many of them.
Thank you.
Did you install the CUDA toolkit and CUDA SDK properly? A CUDA-compatible display driver?
1> The path for the .lib files has to be given, and it should be short. So save both .lib files mentioned above on the C drive and then paste the address into the Additional Dependencies field.
2> The .obj file is created after a successful build of the code, so go over the project properties steps once again.
HI,
I am trying to invoke the asyncAPI from C# in VS2008. My program asyncAPI.cu is compiling without issues in VS2008 C++ and I have exported the dll.
But when I invoke the function in C# exe, it is throwing an exception that it is not finding the entrypoint.
Any example code of invoking a CUDA source from C# would be very helpful.
I am just beginning CUDA programming. I have CUDA toolkit 3.1 for 64-bit and Windows 7 64-bit, and I am trying to run a CUDA program in Visual Studio 2008. I am getting the following error:
1>—— Rebuild All started: Project: newCuda, Configuration: Debug Win32 ——
1>Deleting intermediate and output files for project ‘newCuda’, configuration ‘Debug|Win32’
1>Compiling with CUDA Build Rule…
1>”C:\CUDA\bin64\nvcc.exe” -arch sm_10 -ccbin “C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin” -Xcompiler “/EHsc /W3 /nologo /O2 /Zi /MT ” -maxrregcount=32 –compile -o “Debug\cuda.cu.obj” cuda.cu
1>nvcc fatal : Visual Studio configuration file ‘(null)’ could not be found for installation at ‘C:/Program Files (x86)/Microsoft Visual Studio 9.0/VC/bin/../..’
1>Linking…
1>LINK : fatal error LNK1104: cannot open file ‘.\Debug\cuda.cu.obj’
1>Build log was saved at “file://c:\Users\Anwarul\Desktop\newCuda\newCuda\Debug\BuildLog.htm”
1>newCuda – 1 error(s), 0 warning(s)
========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========
Anybody please help; I would really appreciate your help. Thanks in advance.
Have you tried any of the examples packaged with your CUDA distribution?
Yes
but this example produce following error:
1>—— Rebuild All started: Project: MyFirstCUDA, Configuration: Debug Win32 ——
1>Deleting intermediate and output files for project ‘MyFirstCUDA’, configuration ‘Debug|Win32’
1>Compiling with CUDA Build Rule…
1>”C:\CUDA\bin64\nvcc.exe” -arch sm_10 -ccbin “C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\bin” -Xcompiler “/EHsc /W3 /nologo /O2 /Zi /MT ” -maxrregcount=32 –compile -o “Debug\example1.cu.obj” “d:\Project\GPU\example1.cu”
1>nvcc fatal : Visual Studio configuration file ‘(null)’ could not be found for installation at ‘C:/Program Files (x86)/Microsoft Visual Studio 9.0/VC/bin/../..’
1>Compiling manifest to resources…
1>Microsoft (R) Windows (R) Resource Compiler Version 6.1.6723.1
1>Copyright (C) Microsoft Corporation. All rights reserved.
1>Linking…
1>LINK : fatal error LNK1104: cannot open file ‘.\Debug\example1.cu.obj’
1>Build log was saved at “file://c:\Users\Rosa\Documents\Visual Studio 2008\Projects\MyFirstCUDA\MyFirstCUDA\Debug\BuildLog.htm”
1>MyFirstCUDA – 1 error(s), 0 warning(s)
========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========
This was the error that was plaguing me for a long time, but I finally fixed it. I think this needs to be done when you’re on a 64 bit OS but compiling in Win32 because Express limits you to that. Anyway, I changed a build rule option as follows…
Right click on your .cu file(s). Select properties. Under “CUDA Build Rule v3.xxx”, highlight “General”. Toward the bottom there should be a field called “Extra Options”. In this field, type “-m32” (without the quotation marks). I believe this tells nvcc to use the 32 bit compiler, but I’m not really sure. All I know is it eliminated that error for me.
Note that after I fixed this, I had to go through and make sure all files were compiled using the same code generation run time library (such as MTd) so everything linked properly. A good thing to do is to open a sample project (such as SobelFilter_vc90 in 2008 or SobelFilter in 2005) in one window, open up your project in a second window, and match up all the properties.
Hope this helps somebody…. it took me about 24 frustrating hours to compile my own CUDA program!
Thanks for the gr8 article…It really saved me lot of time. I am stuck in a problem , if I copy cuda kernel in .cpp file it says that the commands are undefined. If I rename the file as .cu then it says following error
1>NVCC Compiling…
1>cuda.cu
1>c1xx : fatal error C1083: Cannot open source file: ‘cuda.cu’: No such file or directory
1>Project : error PRJ0019: A tool returned an error code from “NVCC Compiling…”
1>Build log was saved at “file://c:\Users\ankit\Documents\Visual Studio 2008\Projects\ankit\ankit\Debug\BuildLog.htm”
1>ankit – 2 error(s), 0 warning(s)
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
I m using Visual C++ 2008 express edition.
Also, when I click on Properties>Configuration and enable emulation mode and then compile, I get the following statement: “NOTE: device emulation mode is deprecated in this release and will be removed in a future release.”
I’m really stuck with this. Please help! Thanks
hi,
i am a beginner of cuda programming.actually i had GeForce 6150SE nForce 430 on my desktop but it is not cuda enabled.then how can i go ahead without cuda enabled gpu.
This post actually links to a project file that uses the -deviceemu option to run on a non-GPU machine. Why not use that?
Here’s the link again: ftp://ftp.drivehq.com/llpanorama/CUDA/example1_emu.zip
Hi Sir,
I’m very new to cuda.
Thank you for you helpful tutorial.
I downloaded the files indicated (not the same version) and try to execute the programs.
The build was ok but when I start the debug (emuDebug, debug, release) I got an exception from VC++ as following:
First-chance exception at 0x7c812afb in matrixMul.exe: Microsoft C++ exception: cudaError at memory location 0x0012fe20..
Microsoft C++ exception: cudaError at memory location 0x0012fc24..
Microsoft C++ exception: cudaError at memory location 0x0012fc80..
…
I try to investigate for the solution but could not solve it.
Your help will be welcome.
Thank you.
Sir, maybe you have used too much shared memory, exceeding the limit.
Lower the tile_width and try again.
Will programs compiled in -deviceemu mode make full use of a CUDA device if they are run on a machine that has one?
I doubt it. Emulation mode can be used to debug your CUDA program on the main CPU and that would be thwarted if the GPU were automatically enabled whenever it is present.
CUDA is not used properly when -deviceemu is on. In this mode, multiple threads are simulated by a single CPU thread and hence it makes process very slow.
If I have to add 1000 numbers,
how many threads, blocks, and warps will there be, and how many threads per block?
Sir, can you help me by providing a CUDA program to find the workload characterization of a CUDA application?
Thanks for the complete information. It gave me a lot of help.
Hey,
Thank you so much for a nicely explained tutorial. The thing is I hate using VStudios whatnot. So, I was wondering if it is possible to use a MinGW (on Codeblocks or NetBeans or even DevCPP) to compile and run CUDA programs with nvcc. If so, any idea on how I can configure them ?
Thanks in advance
PS : Sorry for the repeated post, I accidentally posted in the wrong page.
For people with unresolved externals like:
error LNK2019: unresolved external symbol cudaFree referenced in function main
If you are working on an x64 compilation, make sure that instead of “C:\Cuda\lib” you have added “C:\Cuda\lib64” in the Additional Library Directories.
Very helpful article! This cleared all my doubts about starting to use CUDA in VC. I always wanted to use it in my apps. Thanks a lot!
Building CUDA programs has been a pain for me using Visual Studio IDE. Your tutorial very clearly explains the steps. Great article ! I will recommend this for anybody starting to use CUDA.
Thanks.
Thank you so much for such an outstanding article….
It really helped me out to set up and configure my CUDA on VC 2005.
Thanks a lot once again…
“C:\CUDA\bin\nvcc.exe” ccbin “$ (VCInstallDir) bin”-c-D_DEBUG-DWIN32-D_CONSOLE-D_MBCS-Xcompiler/EHsc,/W3,/nologo,/Wp64,/Od,/ Zi,/MTD-I “C:\CUDA\include\”-I./-o$ (ConfigurationName)\Example_1_CUDA.obj Example_1_CUDA.cu
Please correct the problem.
OMG loved reading your blogpost. I added your feed to my blogreader.
Thanks for this article and the comments below. They were very helpful for building my first CUDA program. I use an x64 machine with Windows 7 and the VS2008 compiler.
I have summarized some solutions from building my first program on an x64 machine.
To build for x64, please make sure that:
properties -> linker -> general -> Additional Library Directories:
C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\common\lib;C:\CUDA\lib64
properties -> linker -> input -> Additional Dependencies: cudart.lib cutil64D.lib
properties -> linker -> Advanced -> Target Machine: MachineX64 (/MACHINE:X64)
and make a new configuration manager as llpanorama mentioned: “* open the solution explorer, select solution, right click->Configuration Manager.
* go to ‘Active Solution Platform’, click New.
* in the ‘New Solution Platform’ dialog that comes up select the new platform x64. Set ‘Copy Settings From’ to ‘Win32’
* click OK.”
Very helpful!
Hi, I’m new to CUDA. I have Windows 7 64-bit and Visual Studio 2008 Professional. I followed your instructions, but it won’t run. An error and a warning occurred.
LINK : fatal error LNK1181: cannot open input file ‘cutil64D.lib’
would you please help me?
Thanks for this page!!!! Never found such clear explanation, even from NVidia pages or guides.
Great article!
Are you working on other Cuda / OpenCL projects ? I am trying to compile an openFoam solver to OpenCL. If there is more interest, how about a blog/forum for sharing experiences ?
To solve linker error LNK1181, this is what I did after following all the steps in this tutorial. If you get the linker error, do the following:
* go to Tools -> Options.
* in the Options window, select ‘Projects and Solutions’.
* choose ‘VC++ Directories’.
* click on the new line icon.
* manually locate and add the paths to cudart.lib and cutil32.lib.
Once those paths are included, the program builds and runs just fine.
How can I calculate Lennard-Jones (LJ) forces with CUDA?
I just compiled it successfully using SDK 2.3 on Ubuntu Linux 9.04 (jaunty) with emulation:
nvcc -deviceemu example_1.cpp
Thanks for such a great tutorial. very detailed. The comments left by everybody helped me solved a few problems as well. Thanks.
Great to see so many people developing on cuda. I would like to make some contacts with any of you. If you are interested, my e-mail is jam0cam@yahoo.com
Many, many thanks for this article. It runs smoothly, and it’s the only tutorial on the internet with such a detailed, line-by-line explanation. Sitting tight for more such tutorials from you.
Thanks again.
Btw alex…Jus check whether u are using a compatible NVIDIA driver with ur CUDA Version. I’m using 2.3 with 190.16 Beta Drivers. 185.x drivers didnt work for me….
Using it in Ubuntu. I am compiling with nvcc -o out vekadd.cu and running it with ./out, and the result is as follows (no squares):
0 0.000000
1 1.000000
2 2.000000
3 3.000000
4 4.000000
5 5.000000
6 6.000000
7 7.000000
8 8.000000
9 9.000000
What am I doing wrong?
Greets
Alex, this is exactly the result I get if I disable my GPU card. Essentially, the a_h array gets initialized with 0..9 but never gets the squared results because the GPU is not running.
I don’t know how to enable/disable your card under linux. You might try running the deviceQuery example program to see if it picks up your GPU card. (See my previous blog entry which does this.)
Are you sure your example really runs on the GPU instead of on the CPU?
I think you are happy too soon.
Just increase N and/or run it repeatedly, and you will see that the GPU stays cool while the CPU reports load.
T
Yes, I believe this program runs on the GPU and not the CPU. I can disable my NVIDIA 8600 card and the program computes incorrect results when I do so. The correct results are output once the GPU is re-enabled.
I also compiled the program for CPU-only operation using the -deviceemu option and it computes correct answers regardless of whether the GPU is enabled or disabled.
This program makes very little use of the GPU, even with large N or repeated use (just a single multiplication for each array element). Most of the work involves moving the data from the PC to the GPU card and back under the direction of the CPU. Therefore, it is not surprising that the GPU stays cool and the CPU shows a large load.
I could be wrong, but I would need to see more compelling evidence than you have provided that the program is not actually running in the GPU.
If you increase N, then you increase the number of lines displayed in the command window using printf. That’s where the CPU load happens.
Put N = 10000 (for example) and comment the line with “printf”, you won’t see CPU load.
Hi,
thank you for providing me your gmail ID. Last day CUDA was working fine on VC++ 2005 but it had developed some problem.
when I compile the file it gives this message.
1>—— Build started: Project: example1, Configuration: Debug Win32 ——
1>Performing Custom Build Step
1>nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified
1>Build log was saved at “file://c:\Users\chetan\Desktop\example1_emu1\example1_emu\example1\Debug\BuildLog.htm”
1>example1 – 0 error(s), 0 warning(s)
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
and when i build the project it gives .
>—— Build started: Project: example1, Configuration: Debug Win32 ——
1>Performing Custom Build Step
1>nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified
1>Linking…
1>LINK : fatal error LNK1181: cannot open input file ‘.\Debug\example1.obj’
1>Build log was saved at “file://c:\Users\chetan\Desktop\example1_emu1\example1_emu\example1\Debug\BuildLog.htm”
1>example1 – 1 error(s), 0 warning(s)
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
I think there is some problem in the Custom Build Step. I tried putting quotes (“) around every path I specify, but it is still not working. Can you please help me?
with regards
chetan
I don’t know. Maybe it’s the version of CUDA you are using. (I used 1.1 for my example.)
I use the following custom build command string:
“$(CUDA_BIN_PATH)\nvcc.exe” -ccbin “$(VCInstallDir)bin” -deviceemu -c -D_DEBUG -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Wp64,/Od,/Zi,/MTd -I “$(CUDA_INC_PATH)” -o $(ConfigurationName)\example1.obj example1.cu
Then remove the #include “stdafx.h” from the example.cu source file.
These steps removed the above error in my case.
Hi,
amazing tutorial! I really appreciate your effort. Please do continue this.
I followed your steps and took your program for the emulator. But when I build, it gives “cannot read input file: cutil32D.lib”. I have checked that I have both cutil32.lib and cutil32D.lib, and I also tried manually adding the file location, but it is not working. Then I deleted that entry under Configuration Properties -> Linker -> Input, but now I am not able to compile your program, although simple CUDA programs do compile. So what can I do? Please guide me.
thank you in advance
I’m facing the following problem:
1>—— Build started: Project: hope, Configuration: Emudebug Win32 ——
1>Linking…
1>.\Emudebug\stdafx.obj : fatal error LNK1112: module machine type ‘X86’ conflicts with target machine type ‘x64’
plz help me out!!
You are compiling for a Windows 32 machine, but you are running on a 64-bit Windows machine. Either move your development to a 32-bit Windows PC, or upgrade your CUDA to a version that supports 64-bit Windows.
I still get an error (WIN XP 64bit, VS2008)
Error 1 fatal error LNK1181: cannot open input file ‘.\Debug\example1.obj’ CUDA_ex1b CUDA_ex1b
Do you know what it means?
I’ve tried it on linux and it works, just simple: nvcc example.cu
I have a error:
—— Build started: Project: example1, Configuration: Debug Win32 ——
Performing Custom Build Step
Project : error PRJ0002 : Error result -1073741510 returned from ‘C:\WINDOWS\system32\cmd.exe’.
Build log was saved at “file://c:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\projects\example1_emu\example1\Debug\BuildLog.htm”
example1 – 1 error(s), 0 warning(s)
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
Hi, I am trying to compile your example on Vista 64 and I got the error: “>nvcc fatal : Visual Studio configuration file ‘(null)’ could not be found for installation at ‘c:/Program File”. Do you have any advice?
Do a search for CUDA, Vista and 64-bits. I’m sure I’ve seen people talk about running CUDA on 64-bit XP.
Here’s a solution for Visual Studio:
nvcc fatal : Visual Studio configuration file ‘(null)’ could not be found for installation at ‘C:/Program Files (x86)/Microsoft Visual Studio 8/VC/bin
This is probably because Visual Studio doesn’t install “X64 Compilers and Tools” by default, so you should go to Control Panel, Programs and Features, select visual studio, add or remove features, and select “X64 Compilers and Tools” under Visual C++.
Note that I think this solution doesn’t work for the Express Editions because they don’t have support for x64.
I hope this info helps.
hey thanks! been banging my head.. installing the x64 components compiled instantly.. 😉
Thanks!
it helped me too!
Hello thx a lot for this helpful article.
I followed your steps one by one, but when I build the project, it blocks at this step:
”
1>—— Build started: Project: example1, Configuration: Debug x32 ——
1>Performing Custom Build Step
1>example1.cu
1>tmpxft_00000be0_00000000-3_example1.cudafe1.gpu
1>tmpxft_00000be0_00000000-8_example1.cudafe2.gpu
”
and I don’t see the cause!
When I build the CUDA program directly from cmd with “nvcc -deviceemu -o exp1 exemple1.cu”, it can generate the .exe file and it gives the same result! Do you have any ideas about this problem?
Have you tried getting the source and project files and compiling it directly from those? Are you using CUDA 1.1?
Thanks for your answer. Yes, I already tried getting the source of the project, and it works perfectly in emulation mode (debugemu in VC++ 8, or using cmd “nvcc -deviceemu….”).
I use CUDA 2.0 / MSVC 8 / GeForce 8600 😦
You should try CUDA 1.1 since that is the version used with my example.
OK, I will try your advice. Thanks a lot for your answer.
Thanks for your cooperation. I tried CUDA 1.1 and now the program works perfectly.
Up in the main article, I have added a link to a project zip file with the -deviceemu option.
would it be possible that you post your program as a zip file with the visual studio 2005 set up with emulation and debug?
perhaps this would be a simple solution for all of us here.
thanks
Hi;
I have the same problem as one of your other users. I switched on the emumode as suggested, but I still get
the output
1 1.0000
2 2.0000
3 3.0000
etc, etc etc.
Any suggestions?
me too…any ideas what’s causing this?
My program runs fine in emudebug mode, but when I try to run it on Debug (using the actual GPU, because my goal is to run the program for more than 1,000,000 threads) I end up with following two problems:
1. I have memcopy from host to device, but when I debug the structure doesn’t get copied.
cutilSafeCall( cudaMemcpy(layerIni,layer, 1 * sizeof(TissueStruct), cudaMemcpyHostToDevice) );
2. The kernel execution fails with cudaThreadSynchronize error : unspecified launch failure.
Can you please help me in this case.
The device I am using is GeForce GTX 260.
No idea what the problem is. I suggest you try asking on the CUDA forums.
I am getting the following message:
cudaSafeCall() Runtime API error in file , line 59: feature is not yet implemented.
line 59 of that piece of code says
cutilSafeCall(cudaGetDeviceProperties(&deviceProp, dev));
Thanks for your advice.
Actually, I do not have a CUDA graphics card in my machine, so I must use emulation mode. I also tried to follow the steps in your previous post, but I could complete only steps 1 and 2. I cannot do step 3, which is about installing the driver. It says that it could not locate any drivers compatible with the current hardware. DeviceQuery compiles fine but does not run. The debugger shows the following information when running:
‘deviceQuery.exe’: Loaded ‘C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\bin\win32\Debug\deviceQuery.exe’, Symbols loaded.
‘deviceQuery.exe’: Loaded ‘C:\WINDOWS\system32\ntdll.dll’, No symbols loaded.
‘deviceQuery.exe’: Loaded ‘C:\WINDOWS\system32\kernel32.dll’, No symbols loaded.
‘deviceQuery.exe’: Loaded ‘C:\CUDA\bin\cudart.dll’, Binary was not built with debug information.
‘deviceQuery.exe’: Loaded ‘C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\bin\win32\Debug\cutil32D.dll’, No symbols loaded.
First-chance exception at 0x7c812a5b in deviceQuery.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0012fd2c..
First-chance exception at 0x7c812a5b in deviceQuery.exe: Microsoft C++ exception: cudaError at memory location 0x0012fd7c..
The program ‘[464] deviceQuery.exe: Native’ has exited with code 1 (0x1).
Also note: I use Visual Studio .NET 2005.
Have you downloaded my source files and tried to compile and run those? Have you tried the DeviceQuery example in my previous post about setting-up the CUDA tools?
PS: The debug mode shows the following information:
‘example2.exe’: Loaded ‘C:\Documents and Settings\Tuan Anh NGUYEN\My Documents\Visual Studio 2005\Projects\example2\debug\example2.exe’, Symbols loaded.
‘example2.exe’: Loaded ‘C:\WINDOWS\system32\ntdll.dll’, No symbols loaded.
‘example2.exe’: Loaded ‘C:\WINDOWS\system32\kernel32.dll’, No symbols loaded.
‘example2.exe’: Loaded ‘C:\CUDA\bin\cudart.dll’, Binary was not built with debug information.
First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0012fe5c..
First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError at memory location 0x0012feac..
First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0012fe54..
First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError at memory location 0x0012fea4..
First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0012fe44..
First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError at memory location 0x0012fe94..
First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0012fe54..
First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError at memory location 0x0012fea4..
First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0012fe60..
First-chance exception at 0x7c812a5b in example2.exe: Microsoft C++ exception: cudaError at memory location 0x0012feb0..
The program ‘[1932] example2.exe: Native’ has exited with code 0 (0x0).
Please help me in this case.
Hi,
I followed your steps (with the -deviceemu option). It compiled fine, but the result is wrong:
0 0.000000
1 1.000000
2 2.000000
3 3.000000
4 4.000000
5 5.000000
6 6.000000
7 7.000000
8 8.000000
9 9.000000
The full command is:
“$(CUDA_BIN_PATH)\nvcc.exe” -ccbin “$(VCInstallDir)bin” -c -D_DEBUG -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Wp64,/Od,/Zi,/MTd -I ” $(CUDA_INC_PATH)” -I./ -o -deviceemu $(ConfigurationName)\example1.obj example1.cu
Can you help me for this problem ? I use SDK and Toolkit 1.1
Thanks
Marc:
I believe the indices in the program all start at zero, so the lowest array index is 0 * 4 + 0 = 0.
First of all, thank you for an awesome article!
I’m a bit confused about the inner workings of the kernel function.
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx<N) a[idx] = a[idx] * a[idx];
so blockIdx ranges from 1 to 3 (3 blocks, given by n_blocks), and blockDim is 4.
so 1 * 4 + 1 = 5 is the lowest array index you can get.
What am I not understanding correctly here?
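To make the index arithmetic in this exchange concrete, here is a host-only sketch that replays the launch from the article (N = 10, block_size = 4) with ordinary loops standing in for blockIdx.x and threadIdx.x. It is an emulation written for illustration, not how the GPU schedules threads, and emulate_indices is a hypothetical helper name.

```cpp
#include <vector>

// Host-only emulation of the square_array launch configuration.
// Returns every array index that passes the kernel's (idx < N) test,
// in block-major order. emulate_indices is a hypothetical helper.
std::vector<int> emulate_indices(int n, int block_size) {
    // Same rounding-up formula as in the article's main():
    int n_blocks = n / block_size + (n % block_size == 0 ? 0 : 1);
    std::vector<int> touched;
    for (int block = 0; block < n_blocks; ++block) {           // blockIdx.x: 0 .. n_blocks-1
        for (int thread = 0; thread < block_size; ++thread) {  // threadIdx.x: 0 .. blockDim.x-1
            int idx = block * block_size + thread;             // blockIdx.x * blockDim.x + threadIdx.x
            if (idx < n) {
                touched.push_back(idx);                        // this thread would square a[idx]
            }
        }
    }
    return touched;
}
```

Because both counters start at 0, the first index is 0 * 4 + 0 = 0. For N = 10 and block_size = 4, emulate_indices(10, 4) yields the indices 0 through 9 exactly once each. The last block computes 8, 9, 10 and 11, and the idx < N test discards 10 and 11.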
Ivan Dj …
Please use the following: I just removed the quotation marks from this statement and it worked 🙂 …
Configuration Properties -> Linker -> General:
Additional Library Directories = C:\CUDA\lib;C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\common\lib
Ivan:
I used Visual Studio 2005 and CUDA 1.1. You’re using VS 2008. Go back and use VS 2005 and maybe then the example will work for you. Or find the cudart.lib file on your system and update the linkage paths so it will be found.
Hello!
please help quickly. I have the following problem:
1>—— Build started: Project: example1, Configuration: Debug Win32 ——
1>Linking…
1>LINK : fatal error LNK1181: cannot open input file ‘cudart.lib’
1>Build log was saved at “file://e:\Software Projects\Visual Studio 2008 projects\C++\CUDA\example1\example1\Debug\BuildLog.htm”
1>example1 – 1 error(s), 0 warning(s)
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
What else do I need to do, I did everything like you said in tutorial
Thanks for this! With just a little bit of tweaking, I was able to get this code to work under linux without issue. Matter of fact, it was easier to do, I think.
In my case, all I had to do was comment out the stdafx.h include, rename the source to a .cu file, then compile it with nvcc. This created an a.out file that worked first time through!
it was really useful…
thanks alot 🙂
Many thanks!
I would do the obvious and install CUDA 1.1 and see if the error still occurs. If it does, then there is a problem when you setup the project. Otherwise, there is a problem when using CUDA 2.0.
I’m getting the error “a tool returned an error code” from “compiling” and I’m stuck there. Please help me, and send your suggestion to my email as well.
The tutorial is wonderful. Unfortunately, I got this error from VS2005:
Error 1 error PRJ0019: A tool returned an error code from “Performing Custom Build Step”
I don’t know how to figure it out. Could you help me if possible? Thank you a lot. By the way, I used CUDA 2.0. Is there trouble with that version?
Thank you, wonderful article.
Thanks for this great tutorial. I used Vista x64 and works very well.
Thanks a lot.
When you are messing around with the properties, it might be advantageous to replace all uses of “example1” with “$(InputName)” (without quotes).
This means that the project created can easily be reused just by renaming the files involved, and not requiring that you manually fiddle with the properties every time.
Great guide by the way! I just got bugged having to constantly change those variables, so I went hunting for an alternative.
Thanks for the help, GREAT TUTORIAL
The project can be compiled and run. Here’s the result:
0 0.000000
1 1.000000
2 4.000000
3 9.000000
4 16.000000
5 25.000000
6 36.000000
7 49.000000
8 64.000000
9 81.000000
:)))))))Thanks for article!!!!!!
Here is some info I found about compiling for 64-bit Windows on the Nvidia forums:
I ended up getting it to work by following the instructions under “How To Create 64-bit apps” at http://blogs.msdn.com/deeptanshuv/archive/…/11/573795.aspx
In summary I had to:
* List cutil64D.lib instead of cutil32D.lib under Project Properties -> Configuration Properties -> Linker -> Input -> Additional Dependencies
* Change from the MachineX86 to MachineX64 option under Project Properties -> Configuration Properties -> Linker -> Advanced -> Target Machine
* open the solution explorer, select solution, right click->Configuration Manager.
* go to ‘Active Solution Platform’, click New.
* in the ‘New Solution Platform’ dialog that comes up select the new platform x64. Set ‘Copy Settings From’ to ‘Win32’
* click OK.
And if I do this before writing a project it seems to build properly.
OK, my bad. I changed the active solution platform from Win32 to x64. But now there is a different problem:
1>—— Build started: Project: example1, Configuration: Debug x64 ——
1>Performing Custom Build Step
1>example1.cu
1>tmpxft_00000be0_00000000-3_example1.cudafe1.gpu
1>tmpxft_00000be0_00000000-8_example1.cudafe2.gpu
1>tmpxft_00000be0_00000000-3_example1.cudafe1.cpp
1>tmpxft_00000be0_00000000-12_example1.ii
1>Linking…
1>LINK : fatal error LNK1181: cannot open input file ‘cutil32D.lib’
1>Build log was saved at “file://c:\Documents and Settings\jacevedo\Desktop\example1\example1\example1\x64\Debug\BuildLog.htm”
1>example1 – 1 error(s), 0 warning(s)
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
and this library is not in C:\CUDA\lib
Can somebody give me advice on how to resolve this problem? Thanks.
Hi, great tutorial.
I did all the steps.
But I got a fatal error when I tried to build (compile) it using VS2005 on XP x64.
This is what shows:
1>—— Build started: Project: example1, Configuration: Debug Win32 ——
1>Compiling…
1>stdafx.cpp
1>Linking…
1>LINK : fatal error LNK1181: cannot open input file ‘cudart.lib’
1>Build log was saved at “file://c:\Documents and Settings\jacevedo\Desktop\example1\example1\example1\Debug\BuildLog.htm”
1>example1 – 1 error(s), 0 warning(s)
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
Can somebody give me advice on how to resolve this problem? Thanks.
The result you are getting is the same thing that happens if I try to run the program in non-emulated mode with my 8600 card disabled. That is similar to trying to run the code on a non-CUDA device such as your Radeon. Are you sure you are running the emulated version of the program?
Great tutorial.
I have a question. I am trying to use this cool sample without an NVIDIA GPU (I use an ATI Radeon). It compiles fine (with the -deviceemu option), but when I launch the exe file the result is wrong. See below:
0 0.000000
1 1.000000
2 2.000000
3 3.000000
4 4.000000
5 5.000000
6 6.000000
7 7.000000
8 8.000000
9 9.000000
No squares are computed. Thanks.
Marek:
When you install CUDA, the installer should create all the CUDA… environment variables. Then, in the Visual Studio project, you have to create all the configuration properties so the correct compiler is called as I did in the example shown above.
Also, I’m not sure if CUDA supports VS2008. I know the version I am using (1.1) doesn’t. That may have changed. Check the Nvidia forums for more information.
Hi. I have a problem compiling CUDA programs in Visual Studio 2008. It doesn’t know the CUDA… variables, so, for example, it doesn’t find the CUDA compiler. Can somebody give me advice on how to resolve this problem? Thanks.
Peter:
Use the -deviceemu compiler option as shown in the second-to-last paragraph of this article. This will create an executable that uses the CUDA emulator instead of a graphics card.
Thank you
I made a mistake, because I copied the options directly into Visual Studio:
“$(CUDA_BIN_PATH)\nvcc.exe” -ccbin “$(VCInstallDir)bin” -c -D_DEBUG -DWIN32 -D_CONSOLE -D_MBCS -Xcompiler /EHsc,/W3,/nologo,/Wp64,/Od,/Zi,/MTd -I”$(CUDA_INC_PATH)” -I./ -o $(ConfigurationName)\example1.obj example1.cu
but the double quotes in the options above weren’t copied correctly.
I was wondering if you knew how to set up the emulator on XP so I don’t have to buy a new graphics card. I can’t seem to find anything on the internet.
This article is very helpful 🙂
Satakarni:
I know that a block of threads will be executed on a single multiprocessor and multiple blocks can be assigned to each multiprocessor. Other than that, I don’t know of any static relation between grid elements and multiprocessors.
I don’t see anything in the CUDA API that lets you select the number of processors that will be used to run your code. You might ask on the CUDA forum and see if anyone knows of a switch to do this.
As we know, ‘<<<Dg, Db, Ns>>>’ is required when calling a kernel to execute on the GPU (the device), where Dg is the grid size, Db is the block size (the number of threads per block), and the optional Ns is for dynamic shared memory allocation.
However I would like to know how the number of processors and grids are related?
For example, I am using a Tesla C870, which has 16 multiprocessors with 8 processors each, for a total of 128 processors. I want to scale my program by testing it on 16, 32, 48, and so on up to 128 processors. How can I achieve this with CUDA programming?
( I thought that there must be some relation with grid and/or block size used in the program with the number of processors in the GPU card.)
Kindly let me know.
With Regards,
Satakarni
Explanation was excellent, and I found interesting and helpful for my Work. Keep writing my dear friend.
You can download new version of the CUDA wizard for the VS Express.
url:
http://forums.nvidia.com/index.php?showtopic=69183
Thank you for the article.
Thank you for this article,it is very helpful.
Thank you very much for this article. It really helped me. Continue writing.
This is extremely helpful for those of us wanting to start from scratch (which is the only way I can learn anything).
Great work buddy! Particularly the painstaking details given on how to configure VS for cu, etc.
Many thanks for the article. It is written very clearly and is easy to read and understand!
I am very interested in this direction, but there is not enough good information! Please do not stop, and write more!
I would be very glad to see the same kind of detailed example using MersenneTwister and MonteCarlo!
Thanks for article!!!!!!