About Me

A Computer Enthusiast who's learning as much as I can, while attempting to share and re-distribute the knowledge! Enjoy! :D

Monday, June 21, 2010

CUDA: Know your limits on global memory

I was coding away on an assignment when I ran into a conundrum: I was getting weird results when attempting to copy data onto the device. There would be instances when arrays copied onto the device would be accessible, yet inaccessible during another run.

The fact I'm coding CUDA kernels on OS X gives way to a dilemma: cuda-gdb is not available (yet) on OS X. I have to rely on old school debugging techniques ... a code walk through and print statements! After numerous tests and frustrations ... I figured out I was running into a problem with global memory:

marklagatuz$ /Developer/GPU\ Computing/C/bin/darwin/release/deviceQuery

Device 0: "GeForce 9400M"
Total amount of global memory: 265945088 bytes

The above reads approximately 265MB of global memory. I had 4 arrays consisting of 67MB each being copied onto the device. I was clearly running into memory issues. This would explain why each time a different array would cause problems.

Lesson learned: Check your device(s) limitations before coding away! Then again ... you should be doing that anyways!

Tuesday, June 15, 2010

Learned Something New (or actually a review of something old)!

Since I'm forcing myself to think in terms of OO (Object Orientation), I forgot that computers are still 1 and 0's! As I'm reading code to understand design patterns, algorithms, and methods others folks are using, I came across something I've never used before (at least in my own code): shift operators.

  • <<
  • >>
I've always thought of the chevrons as output redirection in scripting or in C++. I've forgotten they actually shift the bits either to the left or right:

(1 << 24) == 0001 1111 1111 1111 1111 1111 1111

CUDA + THRUST + Eclipse

Quickstart

Assumptions: A working CUDA environment (I'm using OS X for this example).
  • nvcc --version --> should display CUDA Version, built date, and version of tools installed.
  • ./deviceQuery from /Developer/GPU Computing/C/bin/darwin/release (for OS X) should produce output for your device
1. Download the current library from the Thrust Project (currently 1.3.0) - http://code.google.com/p/thrust/downloads/list

2. Select a location and unzip the thrust library. You can unzip the library into the default cuda include location (/usr/local/cuda/include). I prefer to unzip the library in my home directory, (specifically the Downloads directory) but it's up to the user!
  • unzip thrust-v1.3.0.zip
This will create a directory called thrust

3. Add the libraries within your project in Eclipse
  • Project Name --> Properties
  • C/C++ Build --> Settings
  • CUDA NVCC Compiler --> Includes
  • Add (On the same line as Include Paths - green + button)
I originally added /Users/marklagatuz/Downloads/thrust, but was receiving the following
errors: error: thrust/host_vector.h: No such file or directory

The code compiled after removing /thrust from the -I on the command line (absolute path up to the thrust library).

--

References:

1. Thrust QuickStartGuide
  • http://code.google.com/p/thrust/wiki/QuickStartGuide

Thursday, June 10, 2010

CUDA Quick Tips, Reference, and Cheat Sheets

Here are some quick tips and references I strung together while I'm learning CUDA

A. Size of a Grid:
  • gridDim.x (1Dimensional)
  • gridDim.x (2Dimensional, assuming a N x N Grid)

B. Size of a Block:
  • blockDim.x (1Dimensional)
  • blockDim.x (2Dimensional, assuming a N x N Block)

C. Thread Local Index within its block (assuming a 1Dimensional Block):
  • threadIdx.x

D. Block Local Index
  • blockIdx.x (1Dimensional)
  • blockIdx.x (2Dimensional) --> Current Column Index (Length) of a N x N Block
  • blockIdx.y (2Dimensional) --> Current Row Index (Height) of a N x N Block

E. Thread Global Index across the entire grid (assuming a 1 Dimensional Grid):
  • (blockDim.x * blockIdx.x) + threadIdx.x

F. Thread Local Index within its block (assuming a 2Dimensional Block):

F-1.Obtain current column index (assuming you have a N x N Block):
  • (blockIdx.x * blockDimx.x) + threadIdx.x
F-2. Obtain current row index (assuming you have a N x N Block):
  • (blockIdx.y * blockDimx.x) + threadIdx.y
Since you have a N x N Block, the Length and Height are the same.

Quick Example

N = 1024. You have to process N x N elements (1024 x 1024). You could decompose the grid as so: You could set the blockSize to 64. Then gridSize = numElements / blockSize --> gridSize = 1024 / 64 = 16. Maybe not the most efficient way, but since it's only an example it will do!

So your grid is composed of 4096 Blocks (64 x 64), and each Block is composed of 256 threads (16 x 16).

Total Blocks * Total Threasd per Block = 4096 * 256 = 1,048576 = N * N = 1024 * 1024.

To process each element serially, you would probably have a nested for loop:
for (each col)
for (each row)
process element

To access each element for processing in CUDA (assuming you are storing results in a 1D array):

  • (Global Row * Number of Elements) + Global Column
  • Global Row = (blockIdx.y * blockDimx.x + threadIdx.y)
  • Global Column = (blockIdx.x * blockDimx.x + threadIdx.x)
  • Number of Elements = N = Number of elements Length wise (1024 in my example)

More quick tips in the future ...

Tuesday, June 1, 2010

Quickstart: CUDA using Bayreuth University CUDA Toolchain for Eclipse

I've been trolling through Google for a simple solution in integrating CUDA with Eclipse, and found a University which built an Eclipse plugin. This is a fantastic solution because my previous attempts required me to create my own Makefile (which partially defeats the purpose of using Eclipse!)

Here is my Quickstart for the plugin

Assumptions:
  • A fully functional C/C++ working environment (within the Eclipse IDE and on the command line)
  • A fully functional CUDA environment (including the CUDA Driver, Toolkit, and SDK
  • This assumes you are using OS X (Linux should be quite similar)
1. Install the Plugin (Trivial)
2. Add nvcc to your Path
  • Go to Eclipse --> Preferences
  • Click on C/C++ --> Environment
  • Under Environment variables to set --> click Add
  • Name = PATH (Note: Make sure PATH are all upper case)
  • Value = /usr/local/cuda/bin
  • Apply and OK
3. Create a new CUDA Project and Setup Compile and Build Environment
  • Ctrl + mouse click --> New --> C++ project
  • Under Project type box --> Executable --> select Empty Project
  • Name your project
  • Uncheck the following: Show project types and toolchains only if they are supported on the platform
  • Under Toolchains --> select CUDA Toolchain
  • Click Next
  • Click on Advanced Settings
  • Under C/C++ Build -->Environment --> Confirm PATH is set from previous step (should be USER: PREFS under Origin Column)
  • Under C/C++ Build --> Settings --> Tool Settings Tab --> CUDA NVCC Compiler --> Includes --> add /usr/local/cuda/include
  • Under C/C++ Build --> C++ Linker --> change Command from g++ to nvcc
  • Under C/C++ Build --> C++ Linker --> Libraries --> add cudart to Libraries (-l) and add /usr/local/cuda/lib to Library search path (-L)
  • Apply and OK
At this point you should have a fully functional CUDA Eclipse environment to develop CUDA Applications. Drop in some pre-built (non SDK dependent code) into the project and build it. If you want to run some of the SDK dependent code (located in /Developer/GPU Computing/C/bin/darwin/release), please follow the instructions located at Life Of A Programmer Geek.

*** UPDATE ***

When attempting to build my project, I was getting the following error message during the build phase:

make all
Building target: CUDAToolchainProject
ld: unknown option: -oCUDAToolchainProject
I tracked the problem down to not having "whitespaces" in between the following:
  • ${OUTPUT_FLAG}${OUTPUT_PREFIX}${OUTPUT}
  • This is located at --> --> Properties --> C/C++ Build --> Settings --> C++ Linker
  • Under Expert Settings --> Command line pattern
To mitigate the problem ... just add "whitespaces" in between the following:
  • ${COMMAND} ${FLAGS} ${OUTPUT_FLAG} ${OUTPUT_PREFIX} ${OUTPUT} ${INPUTS}
However, I came across another error during the build phase:

Invoking: C++ Linker
g++ -L/usr/local/cuda/lib -o "CUDAToolchainProject" ./src/cu_mandelbrotCUDA_D.o ./src/cu_mandelbrotCUDA_H.o -lcudart
ld: warning: in ./src/cu_mandelbrotCUDA_D.o, file is not of required architecture
ld: warning: in ./src/cu_mandelbrotCUDA_H.o, file is not of required architecture
ld: warning: in /usr/local/cuda/lib/libcudart.dylib, file is not of required architecture
Undefined symbols:
"_main", referenced from:
start in crt1.10.6.o
ld: symbol(s) not found
collect2: ld returned 1 exit status
make: *** [CUDAToolchainProject] Error 1

To mitigate this problem ... I changed the C++ Linker from g++ to nvcc
  • Properties --> C/C++ Build --> Settings --> C++ Linker
  • Command --> change from g++ to nvcc

The build phase completed successfully and an executable was generated!

The next steps are optional (If you want to follow Eclipse's general project structure, follow the next steps

4. Create Source Folders (Trivial)
  • Ctrl + mouse click --> New --> Source Folder
  • Name your folder
--

Resources

1. Bayreuth University Website
  • http://www.ai3.inf.uni-bayreuth.de/software/eclipsecudaqt/updates
2. NVIDIA CUDA forum: thread 160564
  • http://forums.nvidia.com/index.php?showtopic=160564
3. Life Of A Programmer Geek
  • http://lifeofaprogrammergeek.blogspot.com/2008/07/using-eclipse-for-cuda-development.html
4. Trial & Error