GPGPU programming

Hi,

This summer I'll be getting a new desktop. It's going to be used only for programming. At the moment I have a 15-inch laptop, and it's hard to work on.

The project I'm working on involves AI. I have to simulate a "population" of specimens in an environment; there will be thousands of them.
I'm currently working on the program's design.

Logic tells me that a very good way to make the program more efficient is to take advantage of the GPU.
The problem is that each specimen's algorithm will have a lot of branches and logical conditions, which I think will slow the GPU down. The algorithm has to have many branches because of how a specimen's behaviour is defined.
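To give a rough idea (this is only a made-up sketch in C, not my actual rules), the serial version would be something like:

/* One simulation step over the whole population.
   The real behaviour has many more condition-dependent cases than this. */
struct Specimen { float energy; int state; };

void step_population(struct Specimen *pop, int count)
{
    for (int i = 0; i < count; ++i) {
        if (pop[i].state == 0) {
            pop[i].energy -= 0.1f;   /* idle: burn a little energy */
        } else if (pop[i].energy < 1.0f) {
            pop[i].energy += 0.5f;   /* hungry: look for food      */
        } else {
            pop[i].state = 0;        /* otherwise go back to idle  */
        }
    }
}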

I don't know whether I should focus only on the CPU and leave the GPU behind, or target both.
And if you think the GPU will improve my program, what should I choose: CUDA or OpenCL?
I've heard that CUDA is easier to code in and has better documentation than OpenCL. On the other hand, OpenCL is said to be much more portable.

This decision affects what I'll learn: CUDA, OpenCL, or neither. It will also affect which GPU brand I buy.

Thank you,
Robert
CUDA definitely has advantages: more examples, more libraries, and better documentation. But OpenCL really isn't that bad in that department either; Intel, NVIDIA, and AMD each have plenty of OpenCL resources: guides, examples, etc. When I made this decision a while back, I chose OpenCL because I'm a stickler for portability.

It's kind of strange how OpenCL is still implemented separately by each vendor: NVIDIA, AMD, and Intel. It seems like NVIDIA, even though they are a major player in developing OpenCL, would rather discourage OpenCL and push you towards CUDA. For example, if you go to their website to get the OpenCL implementation, they put you in an infinite loop of redirection. The NVIDIA OpenCL material you need is part of the GPU Computing SDK. You go to the GPU Computing SDK page, and it says it's available on the CUDA downloads page, but it's not. You have to follow the link to legacy versions of the CUDA SDK, then to 4.1, and there is a link to the GPU Computing SDK, which I guess must be the latest version.
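The upside is that the API itself is vendor-neutral: each vendor's driver just shows up as another platform. Here's a minimal sketch (assuming the standard C header <CL/cl.h> and linking with -lOpenCL) that lists whatever platforms and GPU devices happen to be installed:

#include <stdio.h>
#include <CL/cl.h>   /* <OpenCL/cl.h> on macOS */

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint num_platforms = 0;

    /* One platform per installed vendor driver (NVIDIA, AMD, Intel, ...) */
    clGetPlatformIDs(8, platforms, &num_platforms);
    if (num_platforms > 8) num_platforms = 8;

    for (cl_uint p = 0; p < num_platforms; ++p) {
        char name[256] = "", vendor[256] = "";
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME,   sizeof name,   name,   NULL);
        clGetPlatformInfo(platforms[p], CL_PLATFORM_VENDOR, sizeof vendor, vendor, NULL);
        printf("Platform %u: %s (%s)\n", p, name, vendor);

        /* Each vendor's platform exposes its own GPU devices. */
        cl_device_id devices[8];
        cl_uint num_devices = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, 8, devices, &num_devices) != CL_SUCCESS)
            continue;
        if (num_devices > 8) num_devices = 8;

        for (cl_uint d = 0; d < num_devices; ++d) {
            char dev[256] = "";
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof dev, dev, NULL);
            printf("  GPU: %s\n", dev);
        }
    }
    return 0;
}

On a machine with, say, both an Intel and an NVIDIA driver installed you'd see two platforms, and the same binary runs against either.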

I also noticed that universities offering courses on GPGPU usually use CUDA. It also seems more popular in many of the industries that use GPGPU, because portability doesn't matter to them: they build large dedicated systems that just run all day crunching numbers.

I think that if your application is targeting consumers, then it's a good idea to go with OpenCL, so you don't shut out a lot of potential users/customers.
1. So, choosing between CUDA and OpenCL is more a matter of what you want to target.

2. You seem to know OpenCL. From your inside knowledge, does a GPU do a better job than a CPU when it comes to thousands of "specimens" all working concurrently, especially when each specimen's code is highly branched?

I'm trying to find out whether GPU computing is a good fit for this. I don't want to learn a platform and then realize that I don't need it.
1. CUDA is a better choice if you only care about supporting NVIDIA systems, so it's not just a matter of what you want to target. If your program will be used on systems other than your own, then I think OpenCL is a good choice.

Then again, if you're really serious about both performance and portability, you might release both an OpenCL version and a CUDA version, maybe even an AMD Stream version.

2. GPUs are not good at highly branched code. Threads run in groups (warps on NVIDIA, wavefronts on AMD) that execute in lockstep, so when threads in a group take different branches the paths get serialized and you lose parallelism.

They do well, though, when each thread is independent and the threads don't need to communicate with each other.

I'm not an expert, though, and I haven't seen the details of your problem. My experience with GPGPU has been with problems that have little to no branching. Take my advice with a grain of salt and get more opinions.

If each specimen can be processed concurrently without communication, and there are no race conditions, then you might want to find ways to reduce branching and go for it.
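For example, here is a hypothetical OpenCL kernel sketch (the fields and the update rule are invented for illustration, not taken from your project): one work-item per specimen, with a data-dependent branch replaced by select() so the work-items in a warp/wavefront stay in lockstep.

/* Hypothetical per-specimen update: one work-item per specimen. */
__kernel void update_specimens(__global float *energy,
                               __global const float *food,
                               const float threshold,
                               const uint count)
{
    uint i = get_global_id(0);
    if (i >= count)                 /* guard for the last work-group */
        return;

    /* Branchy version (diverges within a warp/wavefront):
         if (energy[i] < threshold) energy[i] += food[i];
         else                       energy[i] -= 0.1f;               */

    /* Branch-free version: compute both outcomes, then pick one with
       select(), so every work-item follows the same instruction stream. */
    float eat  = energy[i] + food[i];
    float rest = energy[i] - 0.1f;
    energy[i] = select(rest, eat, (int)(energy[i] < threshold));
}

The compiler will often predicate small branches like that on its own; the bigger wins usually come from restructuring, e.g. grouping specimens that are in the same state so neighbouring work-items take the same path. Again, that's a general hint, not something specific to your simulation.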

I would ask more questions on Stack Overflow, the NVIDIA Developer Zone, or the AMD developer forums.