<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Threads and blocks and grids, oh my!</title>
	<atom:link href="http://llpanorama.wordpress.com/2008/06/11/threads-and-blocks-and-grids-oh-my/feed/" rel="self" type="application/rss+xml" />
	<link>http://llpanorama.wordpress.com/2008/06/11/threads-and-blocks-and-grids-oh-my/</link>
	<description>Snicker and snort as I explore parallel programming techniques...</description>
	<lastBuildDate>Sat, 14 Nov 2009 15:11:43 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Dominik</title>
		<link>http://llpanorama.wordpress.com/2008/06/11/threads-and-blocks-and-grids-oh-my/#comment-391</link>
		<dc:creator>Dominik</dc:creator>
		<pubDate>Mon, 26 Oct 2009 09:15:25 +0000</pubDate>
		<guid isPermaLink="false">http://llpanorama.wordpress.com/?p=25#comment-391</guid>
		<description>You mention an implementation of CUDA on multi-cores. If you&#039;re still interested you can find a paper about it here: http://www.ldc.usb.ve/~vtheok/cursos/ci6323/pdf/lecturas/MCUDA:%20An%20Eficient%20Implementation%20of%20CUDA%20Kernels%20on%20Multi-cores.pdf</description>
		<content:encoded><![CDATA[<p>You mention an implementation of CUDA on multi-cores. If you&#8217;re still interested you can find a paper about it here: <a href="http://www.ldc.usb.ve/~vtheok/cursos/ci6323/pdf/lecturas/MCUDA:%20An%20Eficient%20Implementation%20of%20CUDA%20Kernels%20on%20Multi-cores.pdf" rel="nofollow">http://www.ldc.usb.ve/~vtheok/cursos/ci6323/pdf/lecturas/MCUDA:%20An%20Eficient%20Implementation%20of%20CUDA%20Kernels%20on%20Multi-cores.pdf</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark King</title>
		<link>http://llpanorama.wordpress.com/2008/06/11/threads-and-blocks-and-grids-oh-my/#comment-390</link>
		<dc:creator>Mark King</dc:creator>
		<pubDate>Sat, 24 Oct 2009 21:36:47 +0000</pubDate>
		<guid isPermaLink="false">http://llpanorama.wordpress.com/?p=25#comment-390</guid>
		<description>If the Visual C++ 2008 Express Edition compiler cannot find the cutil.h header file, go to Tools-&gt;Options, Projects and Solutions, VC++ Directories and make sure the directory that contains it is in the list of include directories.</description>
		<content:encoded><![CDATA[<p>If the Visual C++ 2008 Express Edition compiler cannot find the cutil.h header file, go to Tools-&gt;Options, Projects and Solutions, VC++ Directories and make sure the directory that contains it is in the list of include directories.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bharat Subedi</title>
		<link>http://llpanorama.wordpress.com/2008/06/11/threads-and-blocks-and-grids-oh-my/#comment-362</link>
		<dc:creator>Bharat Subedi</dc:creator>
		<pubDate>Sat, 15 Aug 2009 11:47:42 +0000</pubDate>
		<guid isPermaLink="false">http://llpanorama.wordpress.com/?p=25#comment-362</guid>
		<description>I changed the kernel to the version below and measured 27.624880 ms on a GeForce 8600M GT:
  __global__ void square_array(float *a, int N)
 {
	// Total number of threads launched across the whole grid.
	int totalThreads = blockDim.x * gridDim.x ;
	// Elements per thread, rounded up so that all N elements are covered.
	int tasksPerThread = (N % totalThreads) == 0 ? N / totalThreads : N/totalThreads + 1;
	// Global index of this thread.
	int myThreadIdx = blockIdx.x * blockDim.x + threadIdx.x ;
	int startIdx = myThreadIdx ;
	// Upper bound for this thread, clamped so it never runs past the array.
	int endIdx = myThreadIdx +  totalThreads * tasksPerThread ;
	if( endIdx &gt; N)
	{
		endIdx = N;
	}
	// Each thread steps through the array in strides of totalThreads.
	const int stride = totalThreads;
	for(int i = startIdx ; i &lt; endIdx ; i += stride)
	{
		a[i] = a[i] * a[i];
	}
 }</description>
		<content:encoded><![CDATA[<p>I changed the kernel to the version below and measured 27.624880 ms on a GeForce 8600M GT:<br />
  __global__ void square_array(float *a, int N)<br />
 {<br />
	// Total number of threads launched across the whole grid.<br />
	int totalThreads = blockDim.x * gridDim.x ;<br />
	// Elements per thread, rounded up so that all N elements are covered.<br />
	int tasksPerThread = (N % totalThreads) == 0 ? N / totalThreads : N/totalThreads + 1;<br />
	// Global index of this thread.<br />
	int myThreadIdx = blockIdx.x * blockDim.x + threadIdx.x ;<br />
	int startIdx = myThreadIdx ;<br />
	// Upper bound for this thread, clamped so it never runs past the array.<br />
	int endIdx = myThreadIdx +  totalThreads * tasksPerThread ;<br />
	if( endIdx &gt; N)<br />
	{<br />
		endIdx = N;<br />
	}<br />
	// Each thread steps through the array in strides of totalThreads.<br />
	const int stride = totalThreads;<br />
	for(int i = startIdx ; i &lt; endIdx ; i += stride)<br />
	{<br />
		a[i] = a[i] * a[i];<br />
	}<br />
 }</p>
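<p>A rough way to check that this indexing touches every element exactly once is to replay it on the CPU. The sketch below is plain C, not CUDA, and the names visit_strided and covers_each_once are made up for illustration. It relies on one observation about the kernel above: totalThreads * tasksPerThread is never smaller than N, so endIdx always ends up clamped to N and the loop is equivalent to running while i &lt; N.</p>

```c
#include <assert.h>
#include <stdlib.h>

/* Replays the kernel's loop for one simulated thread: start at the
   thread's global index and advance by the total thread count. */
static void visit_strided(int *visits, int n, int thread_idx, int total_threads)
{
    for (int i = thread_idx; i < n; i += total_threads) {
        visits[i] += 1;
    }
}

/* Returns 1 if every index in [0, n) is visited exactly once when
   grid_dim blocks of block_dim threads run the strided loop, else 0. */
int covers_each_once(int n, int block_dim, int grid_dim)
{
    assert(n >= 0 && block_dim > 0 && grid_dim > 0);

    int total_threads = block_dim * grid_dim;
    /* Allocate n + 1 counters so that n == 0 is still a valid request. */
    int *visits = calloc((size_t)n + 1, sizeof *visits);
    if (visits == NULL) {
        return 0;
    }

    /* Simulate every thread of every block, one after another. */
    for (int block = 0; block < grid_dim; ++block) {
        for (int thread = 0; thread < block_dim; ++thread) {
            visit_strided(visits, n, block * block_dim + thread, total_threads);
        }
    }

    int ok = 1;
    for (int i = 0; i < n; ++i) {
        if (visits[i] != 1) {
            ok = 0;
        }
    }
    free(visits);
    return ok;
}
```

<p>For example, covers_each_once(1000, 32, 4) should return 1, and so should a launch with more threads than elements, such as covers_each_once(7, 4, 4).</p>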
]]></content:encoded>
	</item>
	<item>
		<title>By: llpanorama</title>
		<link>http://llpanorama.wordpress.com/2008/06/11/threads-and-blocks-and-grids-oh-my/#comment-150</link>
		<dc:creator>llpanorama</dc:creator>
		<pubDate>Wed, 21 Jan 2009 14:31:23 +0000</pubDate>
		<guid isPermaLink="false">http://llpanorama.wordpress.com/?p=25#comment-150</guid>
		<description>Josh, the squares computed by the program differ from the ones on your calculator because the Nvidia GPU does all its calculations in single-precision floating-point (24-bit mantissa, 8-bit exponent), while your calculator uses BCD integers (at least for the smaller integers). Single precision holds only 6 or 7 significant decimal digits, so the low-order digits of the larger squares are rounded off.</description>
		<content:encoded><![CDATA[<p>Josh, the squares computed by the program differ from the ones on your calculator because the Nvidia GPU does all its calculations in single-precision floating-point (24-bit mantissa, 8-bit exponent), while your calculator uses BCD integers (at least for the smaller integers). Single precision holds only 6 or 7 significant decimal digits, so the low-order digits of the larger squares are rounded off.</p>
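<p>To make the rounding concrete, here is a small host-side C sketch (not the original program; square_single, square_exact and rounding_error are hypothetical names) that squares an integer once in single precision and once with 64-bit integers. It assumes IEEE 754 single-precision floats. Near 4.5e11 adjacent floats are 32768 apart, so the exact square 450359103744 cannot be stored and the nearest representable value is used instead.</p>

```c
#include <assert.h>

/* Square n in single precision, mirroring a[i] * a[i] in the kernel.
   The volatile store forces the product to be rounded to float even if
   the compiler evaluates the multiplication in a wider format. */
float square_single(int n)
{
    assert(n >= 0);                  /* this sketch only considers non-negative inputs */
    float x = (float)n;              /* exact for n below 2^24 */
    volatile float product = x * x;
    return product;
}

/* Exact square using 64-bit integer arithmetic. */
long long square_exact(int n)
{
    return (long long)n * (long long)n;
}

/* Exact square minus the single-precision result. */
long long rounding_error(int n)
{
    return square_exact(n) - (long long)square_single(n);
}
```

<p>Under those assumptions, rounding_error(671088) should come out as 4352, the difference Josh reported.</p>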
]]></content:encoded>
	</item>
	<item>
		<title>By: Josh</title>
		<link>http://llpanorama.wordpress.com/2008/06/11/threads-and-blocks-and-grids-oh-my/#comment-148</link>
		<dc:creator>Josh</dc:creator>
		<pubDate>Wed, 21 Jan 2009 04:06:28 +0000</pubDate>
		<guid isPermaLink="false">http://llpanorama.wordpress.com/?p=25#comment-148</guid>
		<description>Never mind about the 671088: I see that it is just 2^25 (N) / 50, rounded down, since the for loop only prints every (N/50)th element... sigh, it&#039;s late...

Anyway, why is the math wrong?</description>
		<content:encoded><![CDATA[<p>Never mind about the 671088: I see that it is just 2^25 (N) / 50, rounded down, since the for loop only prints every (N/50)th element&#8230; sigh, it&#8217;s late&#8230;</p>
<p>Anyway, why is the math wrong?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Josh</title>
		<link>http://llpanorama.wordpress.com/2008/06/11/threads-and-blocks-and-grids-oh-my/#comment-147</link>
		<dc:creator>Josh</dc:creator>
		<pubDate>Wed, 21 Jan 2009 04:03:56 +0000</pubDate>
		<guid isPermaLink="false">http://llpanorama.wordpress.com/?p=25#comment-147</guid>
		<description>Clarification: I understand that the number to the right is the approximate square of the number to the left (I say approximate because my computer calc shows the first answer should be 450359103744 when this program shows 450359099392 - a difference of 4352).

What I&#039;m not sure of is where &quot;671088&quot; as the first number came from (I also can tell that the following numbers are multiples of the first one).

Basically, does this number have any significance and why is it incorrect?</description>
		<content:encoded><![CDATA[<p>Clarification: I understand that the number to the right is the approximate square of the number to the left (I say approximate because my computer calc shows the first answer should be 450359103744 when this program shows 450359099392 &#8211; a difference of 4352).</p>
<p>What I&#8217;m not sure of is where &#8220;671088&#8221; as the first number came from (I also can tell that the following numbers are multiples of the first one).</p>
<p>Basically, does this number have any significance and why is it incorrect?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Josh</title>
		<link>http://llpanorama.wordpress.com/2008/06/11/threads-and-blocks-and-grids-oh-my/#comment-146</link>
		<dc:creator>Josh</dc:creator>
		<pubDate>Wed, 21 Jan 2009 03:57:18 +0000</pubDate>
		<guid isPermaLink="false">http://llpanorama.wordpress.com/?p=25#comment-146</guid>
		<description>Could you please show what the output from the code above should be? When I run your program, I get the following and I&#039;m not sure I understand what is going on:

0 0.000000
671088 450359099392.000000
1342176 1801436397568.000000
2013264 4053231992832.000000
2684352 7205745590272.000000
3355440 11258977845248.000000
4026528 16212927971328.000000
4697616 22067596492800.000000
5368704 28822982361088.000000
6039792 36479086624768.000000
6710880 45035911380992.000000
7381968 54493450338304.000000
8053056 64851711885312.000000
8724144 76110687633408.000000
9395232 88270385971200.000000
10066320 101330794315776.000000
10737408 115291929444352.000000
11408496 130153782968320.000000
12079584 145916346499072.000000
12750672 162579628425216.000000
13421760 180143645523968.000000
14092848 198608364240896.000000
14763936 217973801353216.000000
15435024 238239973638144.000000
16106112 259406847541248.000000
16777200 281474439839744.000000
17448288 304442750533632.000000
18119376 328311796400128.000000
18790464 353081543884800.000000
19461552 378751992987648.000000
20132640 405323177263104.000000
20803728 432795096711168.000000
21474816 461167717777408.000000
22145904 490441074016256.000000
22816992 520615131873280.000000
23488080 551689891348480.000000
24159168 583665385996288.000000
24830256 616541615816704.000000
25501344 650318513700864.000000
26172432 684996213866496.000000
26843520 720574582095872.000000
27514608 757053685497856.000000
28185696 794433456963584.000000
28856784 832713963601920.000000
29527872 871895205412864.000000
30198960 911977182396416.000000
30870048 952959894552576.000000
31541136 994843274772480.000000
32212224 1037627390164992.000000
32883312 1081312240730112.000000
33554400 1125897759358976.000000
CUDA execution time = 4.708800 ms</description>
		<content:encoded><![CDATA[<p>Could you please show what the output from the code above should be? When I run your program, I get the following and I&#8217;m not sure I understand what is going on:</p>
<p>0 0.000000<br />
671088 450359099392.000000<br />
1342176 1801436397568.000000<br />
2013264 4053231992832.000000<br />
2684352 7205745590272.000000<br />
3355440 11258977845248.000000<br />
4026528 16212927971328.000000<br />
4697616 22067596492800.000000<br />
5368704 28822982361088.000000<br />
6039792 36479086624768.000000<br />
6710880 45035911380992.000000<br />
7381968 54493450338304.000000<br />
8053056 64851711885312.000000<br />
8724144 76110687633408.000000<br />
9395232 88270385971200.000000<br />
10066320 101330794315776.000000<br />
10737408 115291929444352.000000<br />
11408496 130153782968320.000000<br />
12079584 145916346499072.000000<br />
12750672 162579628425216.000000<br />
13421760 180143645523968.000000<br />
14092848 198608364240896.000000<br />
14763936 217973801353216.000000<br />
15435024 238239973638144.000000<br />
16106112 259406847541248.000000<br />
16777200 281474439839744.000000<br />
17448288 304442750533632.000000<br />
18119376 328311796400128.000000<br />
18790464 353081543884800.000000<br />
19461552 378751992987648.000000<br />
20132640 405323177263104.000000<br />
20803728 432795096711168.000000<br />
21474816 461167717777408.000000<br />
22145904 490441074016256.000000<br />
22816992 520615131873280.000000<br />
23488080 551689891348480.000000<br />
24159168 583665385996288.000000<br />
24830256 616541615816704.000000<br />
25501344 650318513700864.000000<br />
26172432 684996213866496.000000<br />
26843520 720574582095872.000000<br />
27514608 757053685497856.000000<br />
28185696 794433456963584.000000<br />
28856784 832713963601920.000000<br />
29527872 871895205412864.000000<br />
30198960 911977182396416.000000<br />
30870048 952959894552576.000000<br />
31541136 994843274772480.000000<br />
32212224 1037627390164992.000000<br />
32883312 1081312240730112.000000<br />
33554400 1125897759358976.000000<br />
CUDA execution time = 4.708800 ms</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Final Year Project Diary &#187; Blog Archive &#187; Terminology research</title>
		<link>http://llpanorama.wordpress.com/2008/06/11/threads-and-blocks-and-grids-oh-my/#comment-127</link>
		<dc:creator>Final Year Project Diary &#187; Blog Archive &#187; Terminology research</dc:creator>
		<pubDate>Tue, 28 Oct 2008 13:13:15 +0000</pubDate>
		<guid isPermaLink="false">http://llpanorama.wordpress.com/?p=25#comment-127</guid>
		<description>[...]   [...]</description>
		<content:encoded><![CDATA[<p>[...]   [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>
