Skeeloo's Laboratory: June 2010

Monday, June 21, 2010

CUDA: Know your limits on global memory

I was coding away on an assignment when I ran into a conundrum: copying data onto the device gave inconsistent results. An array copied onto the device would be accessible during one run, yet inaccessible during another.

Coding CUDA kernels on OS X creates a problem: cuda-gdb is not available (yet) on OS X. I have to rely on old-school debugging techniques ... a code walkthrough and print statements! After numerous tests and frustrations ... I figured out I was running into a problem with global memory:

marklagatuz$ /Developer/GPU\ Computing/C/bin/darwin/release/deviceQuery

Device 0: "GeForce 9400M"
Total amount of global memory: 265945088 bytes

The above reads approximately 265MB of global memory. I had 4 arrays of 67MB each being copied onto the device, which adds up to 268MB, more than the device has in total. I was clearly running out of global memory. This would explain why a different array would cause problems each time.
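The arithmetic can be written down as a quick host-side check. This is only a sketch: fits_on_device, kGlobalMemBytes and kArrayBytes are names made up for illustration, and "67MB" is assumed to mean 67 * 1,000,000 bytes. On a real device the amount that is actually free is usually less than the total; the CUDA runtime call cudaMemGetInfo reports both numbers, and checking the error code returned by cudaMalloc and cudaMemcpy would have surfaced the failure directly.

```cpp
#include <cstddef>

// Hypothetical helper: true when `count` arrays of `bytes_each` bytes
// fit into `capacity` bytes of global memory.
constexpr bool fits_on_device(std::size_t count, std::size_t bytes_each,
                              std::size_t capacity) {
    // Divide instead of multiplying so count * bytes_each cannot overflow.
    return count == 0 || bytes_each <= capacity / count;
}

// Total global memory reported by deviceQuery for the GeForce 9400M.
constexpr std::size_t kGlobalMemBytes = 265945088;

// Assumption: "67MB" is read as 67 * 1,000,000 bytes.
constexpr std::size_t kArrayBytes = 67u * 1000u * 1000u;

// Four arrays exceed the total; three would fit on paper.
static_assert(!fits_on_device(4, kArrayBytes, kGlobalMemBytes),
              "four 67MB arrays do not fit in 265945088 bytes");
static_assert(fits_on_device(3, kArrayBytes, kGlobalMemBytes),
              "three 67MB arrays fit in 265945088 bytes");
```

The static_assert lines are evaluated at compile time, so simply compiling the file confirms the numbers.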

Lesson learned: Check your device's limitations before coding away! Then again ... you should be doing that anyway!

Tuesday, June 15, 2010

Learned Something New (or actually a review of something old)!

Since I'm forcing myself to think in terms of OO (Object Orientation), I forgot that computers are still 1's and 0's! As I'm reading code to understand the design patterns, algorithms, and methods other folks are using, I came across something I've never used before (at least in my own code): shift operators.

I've always thought of the chevrons as output redirection in scripting or stream operators in C++. I'd forgotten they actually shift the bits to the left or right:

(1 << 24) == 0001 0000 0000 0000 0000 0000 0000 (binary) == 16,777,216
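A small sketch of the same idea in C++. The helper names bit and low_mask are made up for this example. Shifting 1 left by n sets only bit n; subtracting 1 from that value gives a run of n one-bits.

```cpp
#include <cstdint>

// Hypothetical helper: a value with only bit n set, i.e. 2^n (valid for n < 32).
constexpr std::uint32_t bit(unsigned n) {
    return std::uint32_t{1} << n;
}

// Hypothetical helper: the n lowest bits set, i.e. 2^n - 1 (valid for n < 32).
constexpr std::uint32_t low_mask(unsigned n) {
    return (std::uint32_t{1} << n) - 1;
}

// 1 << 24 is a single one-bit followed by 24 zero-bits.
static_assert(bit(24) == 0x01000000u, "1 << 24 in hex");
static_assert(bit(24) == 16777216u, "1 << 24 in decimal");

// (1 << 24) - 1 is 24 one-bits.
static_assert(low_mask(24) == 0x00FFFFFFu, "24 low bits set");

// Shifting right undoes a left shift: 2^24 / 2^4 == 2^20.
static_assert((bit(24) >> 4) == bit(20), "right shift divides by a power of two");
```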

CUDA + THRUST + Eclipse

Quickstart

Assumptions: A working CUDA environment (I'm using OS X for this example).

nvcc --version --> should display CUDA Version, built date, and version of tools installed.
./deviceQuery from /Developer/GPU Computing/C/bin/darwin/release (for OS X) should produce output for your device

1. Download the current library from the Thrust Project (currently 1.3.0) - http://code.google.com/p/thrust/downloads/list

2. Select a location and unzip the thrust library. You can unzip the library into the default cuda include location (/usr/local/cuda/include). I prefer to unzip the library in my home directory (specifically the Downloads directory), but it's up to the user!

unzip thrust-v1.3.0.zip

This will create a directory called thrust

3. Add the libraries within your project in Eclipse

Project Name --> Properties
C/C++ Build --> Settings
CUDA NVCC Compiler --> Includes
Add (On the same line as Include Paths - green + button)

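I originally added /Users/marklagatuz/Downloads/thrust, but was receiving the following error:

error: thrust/host_vector.h: No such file or directory

The code compiled after removing /thrust from the -I on the command line. The header is included as thrust/host_vector.h, which is looked up relative to the -I directory, so the include path has to be the absolute path of the directory that contains the thrust directory, not the thrust directory itself.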

--

References:

1. Thrust QuickStartGuide

http://code.google.com/p/thrust/wiki/QuickStartGuide

Thursday, June 10, 2010

CUDA Quick Tips, Reference, and Cheat Sheets

Here are some quick tips and references I strung together while I'm learning CUDA

A. Size of a Grid:

gridDim.x (1Dimensional)
gridDim.x (2Dimensional, assuming a N x N Grid)

B. Size of a Block:

blockDim.x (1Dimensional)
blockDim.x (2Dimensional, assuming a N x N Block)

C. Thread Local Index within its block (assuming a 1Dimensional Block):

threadIdx.x

D. Block Index within the grid

blockIdx.x (1Dimensional)
blockIdx.x (2Dimensional) --> Current Column Index (Length) of a N x N Grid
blockIdx.y (2Dimensional) --> Current Row Index (Height) of a N x N Grid

E. Thread Global Index across the entire grid (assuming a 1 Dimensional Grid):

(blockDim.x * blockIdx.x) + threadIdx.x
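Here is the same formula as a host-side sketch, so the numbers can be checked without a GPU. The function global_index and its parameters are stand-ins made up for this example; inside a real kernel the values come from the built-ins blockDim.x, blockIdx.x and threadIdx.x.

```cpp
// Hypothetical host-side stand-in for the kernel expression
// (blockDim.x * blockIdx.x) + threadIdx.x.
constexpr unsigned global_index(unsigned block_dim_x,
                                unsigned block_idx_x,
                                unsigned thread_idx_x) {
    return block_dim_x * block_idx_x + thread_idx_x;
}

// With 256 threads per block:
// the first thread of block 0 is global thread 0,
static_assert(global_index(256, 0, 0) == 0, "first thread");
// thread 5 of block 2 skips two full blocks (2 * 256) and adds 5,
static_assert(global_index(256, 2, 5) == 517, "thread 5 of block 2");
// and the last thread of block 3 is global thread 1023.
static_assert(global_index(256, 3, 255) == 1023, "last thread of block 3");
```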

F. Thread Global Index across the entire grid (assuming a 2Dimensional Grid of 2Dimensional Blocks):

F-1. Obtain current column index (assuming you have a N x N Block):

(blockIdx.x * blockDim.x) + threadIdx.x

F-2. Obtain current row index (assuming you have a N x N Block):

(blockIdx.y * blockDim.y) + threadIdx.y

Since you have a N x N Block, the Length and Height are the same, so blockDim.x and blockDim.y are interchangeable here.

Quick Example

N = 1024. You have to process N x N elements (1024 x 1024). You could decompose the grid like so: set the blockSize to 16 (threads per block in each dimension). Then gridSize = N / blockSize --> gridSize = 1024 / 16 = 64 (blocks in each dimension). Maybe not the most efficient way, but since it's only an example it will do!

So your grid is composed of 4096 Blocks (64 x 64), and each Block is composed of 256 threads (16 x 16).

Total Blocks * Total Threads per Block = 4096 * 256 = 1,048,576 = N * N = 1024 * 1024.
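The decomposition can be double-checked with a few lines of host-side C++. This is a sketch only: blocks_per_side and the constant names are made up for this example. The round-up division is an addition of mine; it makes no difference here because 1024 divides evenly by 16, but it would add one extra block per side if N were not a multiple of the block side.

```cpp
// Hypothetical helper: number of blocks needed along one side of the grid,
// rounded up so that a partial block at the edge is still counted.
constexpr unsigned blocks_per_side(unsigned n, unsigned block_side) {
    return (n + block_side - 1) / block_side;
}

constexpr unsigned kN = 1024;        // elements per side
constexpr unsigned kBlockSide = 16;  // threads per block side (16 x 16 block)
constexpr unsigned kGridSide = blocks_per_side(kN, kBlockSide);

static_assert(kGridSide == 64, "64 blocks per grid side");
static_assert(kGridSide * kGridSide == 4096, "4096 blocks in the grid");
static_assert(kBlockSide * kBlockSide == 256, "256 threads per block");
static_assert(kGridSide * kGridSide * kBlockSide * kBlockSide == kN * kN,
              "one thread per element");
```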

To process each element serially, you would probably have a nested for loop:

for (each col)
    for (each row)
        process element

To access each element for processing in CUDA (assuming you are storing results in a 1D array):

(Global Row * Number of Elements) + Global Column
Global Row = (blockIdx.y * blockDim.y + threadIdx.y)
Global Column = (blockIdx.x * blockDim.x + threadIdx.x)
Number of Elements = N = Number of elements per row (1024 in my example)
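A host-side sketch of this mapping, using the 16 x 16 block and N = 1024 from the example. The functions global_coord and linear_index are made up for illustration; in a kernel the arguments would be the built-ins blockIdx, blockDim and threadIdx.

```cpp
// Hypothetical stand-in for (blockIdx * blockDim + threadIdx) along one axis.
constexpr unsigned global_coord(unsigned block_idx, unsigned block_dim,
                                unsigned thread_idx) {
    return block_idx * block_dim + thread_idx;
}

// Position of (row, col) in a 1D array that stores n elements per row.
constexpr unsigned linear_index(unsigned row, unsigned col, unsigned n) {
    return row * n + col;
}

// Thread (0, 0) of block (0, 0) handles element 0.
static_assert(linear_index(global_coord(0, 16, 0),
                           global_coord(0, 16, 0), 1024) == 0,
              "first element");

// Thread (x=3, y=2) of block (x=1, y=0): row 2, column 16 + 3 = 19.
static_assert(linear_index(global_coord(0, 16, 2),
                           global_coord(1, 16, 3), 1024) == 2 * 1024 + 19,
              "row 2, column 19");

// Thread (15, 15) of block (63, 63) handles the last element, N * N - 1.
static_assert(linear_index(global_coord(63, 16, 15),
                           global_coord(63, 16, 15), 1024) == 1024 * 1024 - 1,
              "last element");
```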

More quick tips in the future ...

Tuesday, June 1, 2010

Quickstart: CUDA using Bayreuth University CUDA Toolchain for Eclipse

I've been trawling through Google for a simple way to integrate CUDA with Eclipse, and found a university that built an Eclipse plugin. This is a fantastic solution because my previous attempts required me to create my own Makefile (which partially defeats the purpose of using Eclipse!)

Here is my Quickstart for the plugin

Assumptions:

A fully functional C/C++ working environment (within the Eclipse IDE and on the command line)
A fully functional CUDA environment (including the CUDA Driver, Toolkit, and SDK)
This assumes you are using OS X (Linux should be quite similar)

1. Install the Plugin (Trivial)

Help --> Install New Software -->
Name = ( for me)
Location = http://www.ai3.inf.uni-bayreuth.de/software/eclipsecudaqt/updates
Click on Uncategorized --> Toolchains for CUDA and QT Development
Accept the License Agreement
Restart Eclipse

2. Add nvcc to your Path

Go to Eclipse --> Preferences
Click on C/C++ --> Environment
Under Environment variables to set --> click Add
Name = PATH (Note: Make sure PATH is all upper case)
Value = /usr/local/cuda/bin
Apply and OK

3. Create a new CUDA Project and Setup Compile and Build Environment

Ctrl + mouse click --> New --> C++ project
Under Project type box --> Executable --> select Empty Project
Name your project
Uncheck the following: Show project types and toolchains only if they are supported on the platform
Under Toolchains --> select CUDA Toolchain
Click Next
Click on Advanced Settings
Under C/C++ Build -->Environment --> Confirm PATH is set from previous step (should be USER: PREFS under Origin Column)
Under C/C++ Build --> Settings --> Tool Settings Tab --> CUDA NVCC Compiler --> Includes --> add /usr/local/cuda/include
Under C/C++ Build --> C++ Linker --> change Command from g++ to nvcc
Under C/C++ Build --> C++ Linker --> Libraries --> add cudart to Libraries (-l) and add /usr/local/cuda/lib to Library search path (-L)
Apply and OK

At this point you should have a fully functional CUDA Eclipse environment to develop CUDA applications. Drop some pre-built, non-SDK-dependent code into the project and build it. If you want to run some of the SDK dependent code (located in /Developer/GPU Computing/C/bin/darwin/release), please follow the instructions located at Life Of A Programmer Geek.

*** UPDATE ***

When attempting to build my project, I was getting the following error message during the build phase:

make all
Building target: CUDAToolchainProject
ld: unknown option: -oCUDAToolchainProject

I tracked the problem down to not having "whitespaces" in between the following:

${OUTPUT_FLAG}${OUTPUT_PREFIX}${OUTPUT}
This is located at --> --> Properties --> C/C++ Build --> Settings --> C++ Linker
Under Expert Settings --> Command line pattern

To mitigate the problem ... just add "whitespaces" in between the following:

${COMMAND} ${FLAGS} ${OUTPUT_FLAG} ${OUTPUT_PREFIX} ${OUTPUT} ${INPUTS}

However, I came across another error during the build phase:

Invoking: C++ Linker
g++ -L/usr/local/cuda/lib -o "CUDAToolchainProject" ./src/cu_mandelbrotCUDA_D.o ./src/cu_mandelbrotCUDA_H.o -lcudart
ld: warning: in ./src/cu_mandelbrotCUDA_D.o, file is not of required architecture
ld: warning: in ./src/cu_mandelbrotCUDA_H.o, file is not of required architecture
ld: warning: in /usr/local/cuda/lib/libcudart.dylib, file is not of required architecture
Undefined symbols:
"_main", referenced from:
start in crt1.10.6.o
ld: symbol(s) not found
collect2: ld returned 1 exit status
make: *** [CUDAToolchainProject] Error 1

To mitigate this problem ... I changed the C++ Linker from g++ to nvcc

Properties --> C/C++ Build --> Settings --> C++ Linker
Command --> change from g++ to nvcc

The build phase completed successfully and an executable was generated!

The next steps are optional. If you want to follow Eclipse's general project structure, follow the next steps.

4. Create Source Folders (Trivial)

Ctrl + mouse click --> New --> Source Folder
Name your folder

--

Resources

1. Bayreuth University Website

http://www.ai3.inf.uni-bayreuth.de/software/eclipsecudaqt/updates

2. NVIDIA CUDA forum: thread 160564

http://forums.nvidia.com/index.php?showtopic=160564

3. Life Of A Programmer Geek

http://lifeofaprogrammergeek.blogspot.com/2008/07/using-eclipse-for-cuda-development.html

4. Trial & Error
