Prev: Vehicle Detection
Next: Optimization of a loop
From: Dan on 4 Aug 2010 15:52 Dear all, I am new to mex file and was wondering if it is possible to have an easy solution to the following data remapping problem. I have one large array A containing 32,000,000 complex numbers (double type) and another same size array of index I that map the data to a smaller size array B of 16,777,216 complex elements. In the matlab, it looks like this: B = zeros(16777216,1)+j*ones(16777216,1)*10^-20; %initialize B to a complex array numRepeat = zeros(16777216,1,'uint32'); % store the number of elements that are duplicate at the same index in B for ii = 1:length(A) index = I(ii); B(index) = (B(index)*numRepeat+A(ii))/(numRepeat + 1); end Because there is a lot of duplicate elements in A that has the same index in B need to be mapped to the same element, average value of all elements is used for that B element. The problem with this approach is that it is very slow because of the long for loop. I was wondering if it is possible to write a mex file to speed up the loop. However, one thing that causes the problem for me now is that the size of array is very large, if everytime B and I are passed into and then out of the mex file, it will do many numbers of array copies which slows down the whole process. Is there any where to avoid that and somehow treat the arrays as a global variable and only pointers are needed to operate on the elements of the array? Any input is appreciated. Best regards, Dan
From: Walter Roberson on 4 Aug 2010 16:08 Dan wrote: > I am new to mex file and was wondering if it is possible to have an easy > solution to the following data remapping problem. I have one large array > A containing 32,000,000 complex numbers (double type) and another same > size array of index I that map the data to a smaller size array B of > 16,777,216 complex elements. In the matlab, it looks like this: > > B = zeros(16777216,1)+j*ones(16777216,1)*10^-20; %initialize B to > a complex array > numRepeat = zeros(16777216,1,'uint32'); % store the number of > elements that are duplicate at the same index in B > for ii = 1:length(A) > index = I(ii); > B(index) = (B(index)*numRepeat+A(ii))/(numRepeat + 1); > end > > Because there is a lot of duplicate elements in A that has the same > index in B need to be mapped to the same element, average value of all > elements is used for that B element. I do not follow what your code is doing, but it looks suspicious. numRepeat is a vector, so multiplying that by B(index) will be a vector, adding A(ii) will keep it as a vector, and then you use the "/" operator against what is a vector on the bottom. The numerator of the / operation will be a column vector and the divisor is a column vector of the same size: the result of that "/" operation is going to try to be a square matrix with the same number of rows and columns (each) as the length of that column vector. If _somehow_ you managed to have enough memory for that, it would then try to store that entire array into the single array element B(index), and that would surely fail. Remember, "/" is the "matrix right divide" operator. Based upon your text description of what you are trying to do, it seems plausible to me that perhaps you may wish to use accumarray, possibly B = accumarray(I, A, @avg);
From: Dan on 4 Aug 2010 16:41 Thank you for the quick response. Sorry I made some mistake in the original posting. Here should be the correct code: B = zeros(16777216,1)+j*ones(16777216,1)*10^-20; %initialize B to a complex array numRepeat = zeros(16777216,1,'uint32'); % store the count of elements that are duplicate at the same index in B for ii = 1:length(A) index = I(ii); B(index) = (B(index)*numRepeat(index)+A(ii))/(numRepeat(index) + 1); numRepeat(index) = numRepeat(index)+1; end It seems that the function you mentioned should work. I will try exploring that option. A related question is that in the case of the Index array is not an integer, I would like to perform interpolation. For example, if the 100th element of A, its value is 1000 and its corresponding index I(100) = 56.8. I would like to split the A value into two elements to store in B with a weighing factor, that is: B(56) = (0.8*numRepeat(56) + A(100))/(numRepeat(56)+1); B(57) = (0.2*numRepeat(57) + A(100))/(numRepeat(57)+1); Is there a similar function like accumarray to do just that (remapping and linear interpolation)? If not, do I need to go with mex file? Thanks a lot for your help. Dan
From: Walter Roberson on 4 Aug 2010 17:00 Dan wrote: > It seems that the function you mentioned should work. I will try > exploring that option. A related question is that in the case of the > Index array is not an integer, I would like to perform interpolation. > For example, if the 100th element of A, its value is 1000 and its > corresponding index I(100) = 56.8. I would like to split the A value > into two elements to store in B with a weighing factor, that is: > > B(56) = (0.8*numRepeat(56) + A(100))/(numRepeat(56)+1); > B(57) = (0.2*numRepeat(57) + A(100))/(numRepeat(57)+1); > > Is there a similar function like accumarray to do just that (remapping > and linear interpolation)? If not, do I need to go with mex file? Correction: earlier where I wrote @avg I should have written @mean mod1 = mod(I, 1); Bsum = accumarray([I;I+1], [A.*(1-mod1); A.*mod1]); However, you would now have to figure out what kind of averaging you want to use. Is it fair to use a full extra count of 1 for an instance that is only contributing 0.2 of a sample ? If it is fair, then why would it not also be fair to use a full extra repeat count of 1 for an instance that is contributing eps() of a simple, or contributing 0 of a sample? I deliberately omitted the @mean call in the above (letting it default to @sum) because an index I(K) that happens to be exactly an integer would contribute a full sample to the location I(K) and would also contribute a 0 to the location I(K)+1 and I suspect that you do not want that 0 to count towards the repeat count in that case... but if you don't, then what if the index value is not _exactly_ an integer, and so on. If your A values cannot be 0 naturally, then you could define meanNonZero = @(V) mean(V(V~=0)); and use meanNonZero as the function to supply to accumarray . If you do that, though, watch out as you could end up with some NaN values.
From: James Tursa on 4 Aug 2010 17:18
"Dan " <danfuppd(a)gmaill.com> wrote in message <i3cgd4$iv8$1(a)fred.mathworks.com>... > > However, one thing that causes the problem for me now is that the size of array is very large, if everytime B and I are passed into and then out of the mex file, it will do many numbers of array copies which slows down the whole process. Is there any where to avoid that and somehow treat the arrays as a global variable and only pointers are needed to operate on the elements of the array? Arguments passed to mex files are passed by reference ... there are no copies made. James Tursa |