From: Oliver Woodford on
Hi all

Does anyone use the Jacket toolbox (by AccelerEyes) for doing MATLAB computations on CUDA (NVIDIA) GPUs? I notice they've recently reduced the cost of their license. Do people think it's worth forking out for?

I've been using my Tesla GPU a fair bit recently for doing faster parallel computations, and since I like MATLAB for fast development I thought it would be good to use the two together. I've been making some CUDA mex files, but wanted the GPU acceleration built into MATLAB, so this toolbox sounds ideal.

I played around with the Jacket toolbox on trial recently, trying to speed up parallel for loops doing dynamic programming using the gfor functionality. So, rather than having one for loop nested in another, I had a for loop nested in a gfor loop. When I finally got it to run, the GPU version was much slower (2 orders of magnitude!) than the CPU version. This is despite the fact that the inner for loop was very simple, and I know that the whole thing would work very well on a GPU if programmed in C++/CUDA.
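
For readers unfamiliar with this loop structure, the following is a minimal sketch of it in plain Python. It is not Jacket or MATLAB code, and the recurrence, the function names solve_one and solve_all, and the sample data are all invented for illustration; the actual problem is not shown in this thread. The point is only that the outer loop runs over independent problems (the part a gfor loop would replace), while the inner loop is a serial recurrence.

```python
# Hypothetical stand-in for the loop structure described above. The
# recurrence is invented for illustration and is not the poster's problem.

def solve_one(values):
    """Inner loop: a serial recurrence, each step needs the previous result."""
    best = []
    prev = 0
    for v in values:
        # Best sum of a contiguous run ending at this element.
        prev = v + max(prev, 0)
        best.append(prev)
    return best


def solve_all(problems):
    """Outer loop: the problems are independent, so this is the loop that a
    parallel construct such as gfor would be expected to take over."""
    return [solve_one(values) for values in problems]


problems = [[1, -2, 3, 4], [-1, -1, 5, -2], [2, 2, -5, 1]]
results = solve_all(problems)
```

Each call to solve_one touches only its own input, so the outer iterations could in principle run in any order or all at once.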

I wondered what other people's impressions were. I know that AccelerEyes are continually updating the toolbox and introducing new functionality, so maybe my particular test case will be improved soon. I'd be interested to learn what people have found is accelerated well using the toolbox.

Regards,
Oliver
From: omegayen on
"Oliver Woodford" <o.j.woodford.98(a)cantab.net> wrote in message <hsc3gk$psb$1(a)fred.mathworks.com>...

I have never tested Jacket, but I reviewed it last year when deciding whether to purchase some high-power GPUs or some CPUs. I went the CPU route because at the time mldivide was not supported in Jacket. I see that it now is; however, it also requires the purchase of JacketDLA. In addition, the current website puts the speedup for mldivide at only around 2 times versus a CPU. So depending on your application it may still be better to go the CPU route, unless you already own some GPUs that will work.

I know that Maple recently added support for matrix multiplication on GPUs. I wonder whether MATLAB will eventually build some of Jacket's functionality into the package itself or keep things separate.
From: John Melonakos on
Hi Oliver,

Thanks for the conversation starter on this topic. I'm one of the guys that started the Jacket project, so I'm going to be careful here, but I thought I might help on a few items:

>
> Does anyone use the Jacket toolbox (by AccelerEyes) for doing MATLAB computations on CUDA (NVIDIA) GPUs? I notice they've recently reduced the cost of their license. Do people think it's worth forking out for?
>

We did chop prices in half recently. Your question is a good thread starter to help us know how others, especially the more avid MATLAB users that browse these forums, feel about the new price point!

> I've been using my Tesla GPU a fair bit recently for doing faster parallel computations, and since I like MATLAB for fast development I thought it would be good to use the two together. I've been making some CUDA mex files, but wanted the GPU acceleration built in to MATLAB, so this toolbox sounds ideal.
>

CUDA programming is tough (see the following blog post for a clue as to the big disparity between naive CUDA programming and expert CUDA programming, wrapped into a median filtering example): http://blog.accelereyes.com/blog/2010/03/04/median-filtering-cuda-tips-and-tricks/

When people need better performance, they can either go lower-level or go higher-level (i.e. vectorize the code) and let a compiler-type tool (e.g. Jacket) optimize it for them. There are pros and cons to both approaches. Going lower-level gives you more control, but it is a time sink, leverages only your own knowledge, is not portable (e.g. what if CUDA is not the way GPU computing is done in the future...), and hurts readability. Going higher-level makes you reliant on the compiler, but avoids those other drawbacks. Most compilers these days will do better than humans can by hand, and that is what we're striving for with Jacket. Of course, we're not yet there for all problems, but we're getting there... more about this below.

> I played around with the Jacket toolbox on trial recently, trying to speed up parallel for loops doing dynamic programming using the gfor functionality. So, rather than having one for loop nested in another, I had a for loop nested in a gfor loop. When I finally got it to run, the GPU version was much slower (2 orders of magnitude!) than the CPU version. This is despite the fact that the inner for loop was very simple, and I know that the whole thing would work very well on a GPU if programmed in C++/CUDA.
>

The GFOR functionality is currently undergoing a major revamp. It was the major feature that we released when we moved from our Beta phase to Jacket 1.0, and we've not really done anything to it since then. But we will have a much improved version available in a few months. Right now, you're probably better off if you can figure out how to vectorize your inner loop. For some helpful vectorization examples, see: http://wiki.accelereyes.com/wiki/index.php/Code_Vectorization_Examples

> I wondered what other people's impressions were. I know that Accelereyes are continually updating the toolbox, introducing new functionality, so maybe my particular test case will be improved soon. I'd be interested to learn what people have found is accelerated well using the toolbox.
>

We recently posted a bunch of examples from people that have found success with Jacket, which may be helpful to you, here: http://www.accelereyes.com/successstories. Of course, Jacket is not yet tuned to everyone's problem (and it sounds like it's not quite ready to tackle yours via the GFOR route), so we encourage people to play with the Trial version to get a feel for Jacket's utility in their own line of work.

Thanks again for the great post! I'm always happy to chat with people more by email, john.melonakos(a)accelereyes.com, if I can answer any other questions offline.

Best,

John
From: Oliver Woodford on
"John Melonakos" wrote:
> The GFOR functionality is currently undergoing a major revamp. It was the major feature that we released when we moved from our Beta phase to Jacket 1.0, and we've not really done anything to it since then. But we will have a majorly improved version of that available in a few months. Right now, you're probably better off if you can possibly figure out how to vectorize your inner loop. For some helpful vectorization examples, see: http://wiki.accelereyes.com/wiki/index.php/Code_Vectorization_Examples

Unfortunately it's not possible to parallelize the inner for loop in my dynamic programming problem, as the output of each iteration is used as the input to the next iteration. However, that's good news on the GFOR functionality front. I'll be sure to check it out when the new version appears.
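
To make the dependency concrete, here is a small sketch in plain Python (not Jacket or MATLAB code) with an invented recurrence; the names solve_per_problem and solve_step_wise and the sample data are hypothetical. It assumes a non-empty batch of independent problems that all have the same length, which may or may not match the real problem. Under that assumption the loops can be interchanged: the steps stay serial, but each step is applied across all problems at once, which is the kind of whole-array operation a data-parallel backend could in principle run on a GPU.

```python
# Illustrative only: the recurrence and data are invented, and the batch is
# assumed to be non-empty with every problem the same length.

def solve_per_problem(problems):
    """Reference version: outer loop over problems, serial inner loop."""
    out = []
    for values in problems:
        prev, best = 0, []
        for v in values:
            prev = v + max(prev, 0)  # step i needs the result of step i - 1
            best.append(prev)
        out.append(best)
    return out


def solve_step_wise(problems):
    """Interchanged loops: serial over steps, independent across problems."""
    n_steps = len(problems[0])
    prev = [0] * len(problems)  # one running state per problem
    columns = []
    for i in range(n_steps):
        # Every problem's update at step i is independent of the others,
        # so this comprehension stands in for one whole-array operation.
        prev = [p[i] + max(s, 0) for p, s in zip(problems, prev)]
        columns.append(prev)
    # Transpose back to one row of results per problem.
    return [list(row) for row in zip(*columns)]


problems = [[1, -2, 3, 4], [-1, -1, 5, -2], [2, 2, -5, 1]]
step_wise = solve_step_wise(problems)
per_problem = solve_per_problem(problems)
```

Both functions compute the same values; only the loop order differs. Whether this restructuring pays off depends on how many independent problems there are compared with the number of serial steps.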

> We recently posted a bunch of examples from people that have found success with Jacket, which may be helpful to you, here: http://www.accelereyes.com/successstories
..

Those look interesting, but I'm after something a bit more generic, to make it easier to work out what will accelerate well. E.g. a list:

- All vector and matrix operations
- For loops with independent iterations
- Image filtering
etc.

but slightly higher level than that, though not quite application specific.

Best,
Oliver