Mex file to speed up a data remapping problem [Matlab]

Prev: Vehicle Detection
Next: Optimization of a loop

From: Dan on 4 Aug 2010 15:52

Dear all,

I am new to mex file and was wondering if it is possible to have an easy solution to the following data remapping problem. I have one large array A containing 32,000,000 complex numbers (double type) and another same size array of index I that map the data to a smaller size array B of 16,777,216 complex elements. In the matlab, it looks like this:

B = zeros(16777216,1)+j*ones(16777216,1)*10^-20; %initialize B to a complex array
numRepeat = zeros(16777216,1,'uint32'); % store the number of elements that are duplicate at the same index in B
for ii = 1:length(A)
index = I(ii);
B(index) = (B(index)*numRepeat+A(ii))/(numRepeat + 1);
end

Because there is a lot of duplicate elements in A that has the same index in B need to be mapped to the same element, average value of all elements is used for that B element. The problem with this approach is that it is very slow because of the long for loop. I was wondering if it is possible to write a mex file to speed up the loop. However, one thing that causes the problem for me now is that the size of array is very large, if everytime B and I are passed into and then out of the mex file, it will do many numbers of array copies which slows down the whole process. Is there any where to avoid that and somehow treat the arrays as a global variable and only pointers are needed to operate on the elements of the array?
Any input is appreciated.

Best regards,
Dan

From: Walter Roberson on 4 Aug 2010 16:08

Dan wrote:

> I am new to mex file and was wondering if it is possible to have an easy
> solution to the following data remapping problem. I have one large array
> A containing 32,000,000 complex numbers (double type) and another same
> size array of index I that map the data to a smaller size array B of
> 16,777,216 complex elements. In the matlab, it looks like this:
>
> B = zeros(16777216,1)+j*ones(16777216,1)*10^-20; %initialize B to
> a complex array
> numRepeat = zeros(16777216,1,'uint32'); % store the number of
> elements that are duplicate at the same index in B
> for ii = 1:length(A)
> index = I(ii);
> B(index) = (B(index)*numRepeat+A(ii))/(numRepeat + 1);
> end
>
> Because there is a lot of duplicate elements in A that has the same
> index in B need to be mapped to the same element, average value of all
> elements is used for that B element.

I do not follow what your code is doing, but it looks suspicious.
numRepeat is a vector, so multiplying that by B(index) will be a vector,
adding A(ii) will keep it as a vector, and then you use the "/" operator
against what is a vector on the bottom. The numerator of the / operation will
be a column vector and the divisor is a column vector of the same size: the
result of that "/" operation is going to try to be a square matrix with the
same number of rows and columns (each) as the length of that column vector. If
_somehow_ you managed to have enough memory for that, it would then try to
store that entire array into the single array element B(index), and that would
surely fail.

Remember, "/" is the "matrix right divide" operator.

Based upon your text description of what you are trying to do, it seems
plausible to me that perhaps you may wish to use accumarray, possibly

B = accumarray(I, A, @avg);

From: Dan on 4 Aug 2010 16:41

Thank you for the quick response. Sorry I made some mistake in the original posting. Here should be the correct code:
B = zeros(16777216,1)+j*ones(16777216,1)*10^-20; %initialize B to a complex array
numRepeat = zeros(16777216,1,'uint32'); % store the count of elements that are duplicate at the same index in B
for ii = 1:length(A)
index = I(ii);
B(index) = (B(index)*numRepeat(index)+A(ii))/(numRepeat(index) + 1);
numRepeat(index) = numRepeat(index)+1;
end

It seems that the function you mentioned should work. I will try exploring that option. A related question is that in the case of the Index array is not an integer, I would like to perform interpolation. For example, if the 100th element of A, its value is 1000 and its corresponding index I(100) = 56.8. I would like to split the A value into two elements to store in B with a weighing factor, that is:

B(56) = (0.8*numRepeat(56) + A(100))/(numRepeat(56)+1);
B(57) = (0.2*numRepeat(57) + A(100))/(numRepeat(57)+1);

Is there a similar function like accumarray to do just that (remapping and linear interpolation)? If not, do I need to go with mex file?

Thanks a lot for your help.
Dan

From: Walter Roberson on 4 Aug 2010 17:00

Dan wrote:

> It seems that the function you mentioned should work. I will try
> exploring that option. A related question is that in the case of the
> Index array is not an integer, I would like to perform interpolation.
> For example, if the 100th element of A, its value is 1000 and its
> corresponding index I(100) = 56.8. I would like to split the A value
> into two elements to store in B with a weighing factor, that is:
>
> B(56) = (0.8*numRepeat(56) + A(100))/(numRepeat(56)+1);
> B(57) = (0.2*numRepeat(57) + A(100))/(numRepeat(57)+1);
>
> Is there a similar function like accumarray to do just that (remapping
> and linear interpolation)? If not, do I need to go with mex file?

Correction: earlier where I wrote @avg I should have written @mean

mod1 = mod(I, 1);
Bsum = accumarray([I;I+1], [A.*(1-mod1); A.*mod1]);

However, you would now have to figure out what kind of averaging you want to
use. Is it fair to use a full extra count of 1 for an instance that is only
contributing 0.2 of a sample ? If it is fair, then why would it not also be
fair to use a full extra repeat count of 1 for an instance that is
contributing eps() of a simple, or contributing 0 of a sample?

I deliberately omitted the @mean call in the above (letting it default to
@sum) because an index I(K) that happens to be exactly an integer would
contribute a full sample to the location I(K) and would also contribute a 0 to
the location I(K)+1 and I suspect that you do not want that 0 to count towards
the repeat count in that case... but if you don't, then what if the index
value is not _exactly_ an integer, and so on.

If your A values cannot be 0 naturally, then you could define

meanNonZero = @(V) mean(V(V~=0));

and use meanNonZero as the function to supply to accumarray . If you do that,
though, watch out as you could end up with some NaN values.

From: James Tursa on 4 Aug 2010 17:18

"Dan " <danfuppd(a)gmaill.com> wrote in message <i3cgd4$iv8$1(a)fred.mathworks.com>...
>
> However, one thing that causes the problem for me now is that the size of array is very large, if everytime B and I are passed into and then out of the mex file, it will do many numbers of array copies which slows down the whole process. Is there any where to avoid that and somehow treat the arrays as a global variable and only pointers are needed to operate on the elements of the array?

Arguments passed to mex files are passed by reference ... there are no copies made.

James Tursa

| Next | Last
Pages: 1 2 3
Prev: Vehicle Detection
Next: Optimization of a loop