I was coding away on an assignment when I ran into a conundrum: copying data onto the device gave me inconsistent results. An array copied onto the device would be accessible during one run, yet inaccessible during another.
Coding CUDA kernels on OS X adds a dilemma: cuda-gdb is not available (yet) on OS X. I have to rely on old-school debugging techniques ... a code walkthrough and print statements! After numerous tests and frustrations ... I figured out I was running into a problem with global memory:
marklagatuz$ /Developer/GPU\ Computing/C/bin/darwin/release/deviceQuery
Device 0: "GeForce 9400M"
Total amount of global memory: 265945088 bytes
The above reads approximately 266MB of global memory. I had 4 arrays of 67MB each being copied onto the device, which adds up to 268MB, more than the device has in total. I was clearly running out of memory. This would explain why a different array would cause problems each time.
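The mismatch between the array sizes and the device's global memory can also be caught at run time, before any copy is attempted. Here is a minimal sketch of that idea, not the code from my assignment: the helper fitsOnDevice and the constants kNumArrays and kBytesPerArray are names I made up for illustration, and 67MB is taken to mean 67,000,000 bytes. It asks the runtime how much global memory is free with cudaMemGetInfo, compares that against what the arrays need, and checks the return code of cudaMalloc instead of assuming it worked.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical sizes matching the numbers in this post: 4 arrays of 67MB each.
static const size_t kNumArrays = 4;
static const size_t kBytesPerArray = 67u * 1000u * 1000u;

// Pure host-side check: do `count` arrays of `bytesEach` bytes fit in `freeBytes`?
// Dividing instead of multiplying avoids overflow for very large sizes.
static bool fitsOnDevice(size_t count, size_t bytesEach, size_t freeBytes) {
    if (bytesEach == 0) return true;
    return count <= freeBytes / bytesEach;
}

int main() {
    size_t freeBytes = 0, totalBytes = 0;

    // Ask the runtime how much global memory is currently free on the device.
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemGetInfo failed: %s\n", cudaGetErrorString(err));
        return EXIT_FAILURE;
    }
    printf("Global memory: %zu bytes free of %zu total\n", freeBytes, totalBytes);

    // Refuse to continue if the arrays cannot all fit.
    if (!fitsOnDevice(kNumArrays, kBytesPerArray, freeBytes)) {
        fprintf(stderr, "Need %zu bytes but only %zu are free\n",
                kNumArrays * kBytesPerArray, freeBytes);
        return EXIT_FAILURE;
    }

    // Even when the budget looks fine, check every allocation's return code.
    float *d_array = NULL;
    err = cudaMalloc((void **)&d_array, kBytesPerArray);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return EXIT_FAILURE;
    }

    cudaFree(d_array);
    return EXIT_SUCCESS;
}
```

With the numbers from the deviceQuery output, fitsOnDevice(4, 67000000, 265945088) comes back false, since only three such arrays fit. The free amount reported at run time can be lower than the total, so the check uses the free figure.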
Lesson learned: check your device's limitations before coding away! Then again ... you should be doing that anyway!