From: francios hoo on
hello,
I am trying to write a program that runs two parts in a loop.
Part A: creates a solution A in each iteration, and each iteration takes 30 minutes.
Part B: has to wait for solution A, and the Part B simulation only needs around 15 seconds per iteration.
So, overall, my idea is to run Part B on the solution A from iteration i while Part A is already working on iteration i+1.

However, if I simply combine Part A and Part B, I would need one day to run it, because (30x60s) x 15s = 27000s (7.5 hours) for every single iteration!!!
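
To make the overlap concrete, here is a minimal sketch of what I have in mind (Python and its concurrent.futures module are only an assumption for illustration; solve_A and simulate_B are placeholder names for my real computations):

from concurrent.futures import ProcessPoolExecutor

def solve_A(i):
    return i * 2      # stand-in for the 30-minute Part A computation

def simulate_B(a):
    return a + 1      # stand-in for the 15-second Part B simulation

def run_pipeline(n_iterations):
    results = []
    with ProcessPoolExecutor(max_workers=1) as pool:
        future = pool.submit(solve_A, 0)              # start A for iteration 0
        for i in range(n_iterations):
            a = future.result()                       # wait for A of iteration i
            if i + 1 < n_iterations:
                future = pool.submit(solve_A, i + 1)  # A for i+1 runs in the background
            results.append(simulate_B(a))             # B for iteration i runs now
    return results

if __name__ == "__main__":
    print(run_pipeline(4))

This way each iteration should effectively cost only the 30 minutes of Part A, because Part B's 15 seconds are hidden behind the next Part A.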

thank you very much
francios
From: Walter Roberson on
francios hoo wrote:
> hello,
> I am trying to write a program that runs two parts in a loop.
> Part A: creates a solution A in each iteration, and each iteration
> takes 30 minutes.
> Part B: has to wait for solution A, and the Part B simulation only
> needs around 15 seconds per iteration.
> So, overall, my idea is to run Part B on the solution A from
> iteration i while Part A is already working on iteration i+1.
>
> However, if I simply combine Part A and Part B, I would need one day to run
> it, because (30x60s) x 15s = 27000s (7.5 hours) for every single iteration!!!

I am not clear as to why you are multiplying by 15 seconds instead of
adding 15 seconds??
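
For comparison, with K iterations (K is just an illustrative variable), the totals work out to roughly:

back to back:  T = K * (30*60 s + 15 s) = 1815*K s
overlapped:    T ~ K * (30*60 s) + 15 s = 1800*K s + 15 s

so the 15 seconds add to each iteration rather than multiplying the 30 minutes, and with the overlap you describe they are almost entirely hidden.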
From: Saurabh Mahapatra on
These are my thoughts: it all depends on how you distribute your tasks, the architecture of your computing stages, and how many processors you are willing to buy (or have access to).

For example, let us explore a two-stage architecture consisting of M processors in the first stage and N processors in the second. I am also assuming that you are using a scheduling algorithm that treats every task equally.

1. Let us say you parallelize the first part of the problem across M processors. For the first pass, you will have to wait 30 minutes before you get an A from each of these M processors.

2. Say you farm these out to N processors that take 15 seconds to give you the result for each A. Please check that 15 s really is the right value. If the time per A is significantly less than what I am assuming, this may be overkill: you can ignore the rest of my argument, simply use 1 processor in the second stage, and concentrate on parallelizing the first stage across as many processors (M) as you can get.

However, let us assume 15 seconds. Since stage 2 is faster than stage 1, the number of processors you should use to maintain the "flow", i.e. maximum utilization, can be calculated as follows:

The N-processor stage has 30 minutes before it gets clogged by the next M inputs. If we parallelize those M inputs across it, we reduce the stage-2 time from 15*M to 15*M/N seconds.

For maximum utilization, these two times should be equal:
30*60=15*M/N

N=15*M/(30*60)=M/120;

For your problem, it seems (with the 15 s per A in the second stage) that you get maximum utilization with M=120 processors feeding N=1. For M<120 and this architecture, your strategy should be to parallelize the first stage and leave the second stage on a single processor.
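
A small sketch of that balance calculation (Python here is only for illustration; the times are the ones assumed above):

import math

t_A = 30 * 60   # seconds per solution A in stage 1
t_B = 15        # seconds per pass of B in stage 2

def stage2_workers(M):
    # N stage-2 processors drain M results in M*t_B/N seconds;
    # pick the smallest N that fits inside one stage-1 period t_A.
    return max(1, math.ceil(M * t_B / t_A))

for M in (12, 60, 120, 240):
    print(M, "stage-1 processors ->", stage2_workers(M), "stage-2 processor(s)")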

Another strategy could be: create all the A's in stage 1 first, then pass B over them afterwards on a single machine, at 15 seconds each. This may not look optimal if you have a large dataset of size Z, since the extra 15*Z seconds can be a noticeable overhead, but you save on buying N extra machines.
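
A rough sketch of that batch variant (again Python, with solve_A and simulate_B as placeholder names for the two computations, and Z and M chosen arbitrarily):

from concurrent.futures import ProcessPoolExecutor

def solve_A(i):
    return i * 2        # stand-in for the 30-minute stage-1 computation

def simulate_B(a):
    return a + 1        # stand-in for the 15-second stage-2 pass

def batch_run(Z, M):
    # Stage 1: compute all Z solutions A across M processors.
    with ProcessPoolExecutor(max_workers=M) as pool:
        solutions = list(pool.map(solve_A, range(Z)))
    # Stage 2: a single machine then passes B over the stored solutions,
    # costing roughly 15*Z seconds on top of stage 1.
    return [simulate_B(a) for a in solutions]

if __name__ == "__main__":
    print(batch_run(Z=8, M=4))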

Thanks,

Saurabh