From: Jan Simon on
Dear Pasan,
>
> > 1. dt_numbers=cellfun(@(x)datenum(x,31),dt_strings);
> > 2. dt_numbers=datenum(dt_strings,31);

Steven Lord told me that this is not the correct way to call DATENUM:
http://www.mathworks.com/matlabcentral/newsreader/view_thread/285808

Much faster:
dt_numbers=datenum(dt_strings, 'yyyy-mm-dd HH:MM:SS');
This competes with my M-function, but the C-MEX version will be published soon (it is ready to send).
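
A small sketch of the format-string call on a hand-made input (the three sample strings are made up for illustration). As far as I know, a numeric second argument to DATENUM is read as a pivot year rather than as a DATESTR format number, which is why the format is spelled out as a string here.

```matlab
% Sample input, made up for illustration
dt_strings = {'2010-06-30 12:00:00'; ...
              '2010-06-30 18:30:00'; ...
              '2010-07-01 00:00:00'};
fmt = 'yyyy-mm-dd HH:MM:SS';

% One call converts the whole cell array of strings
dt_numbers = datenum(dt_strings, fmt);

% Cross-check against date numbers built from numeric components
expected = [datenum(2010, 6, 30, 12,  0, 0); ...
            datenum(2010, 6, 30, 18, 30, 0); ...
            datenum(2010, 7,  1,  0,  0, 0)];
assert(all(abs(dt_numbers - expected) < 1e-8));
```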

Jan
From: Pasan on

> Steven Lord told me that this is not the correct way to call DATENUM:
> http://www.mathworks.com/matlabcentral/newsreader/view_thread/285808
>
> Much faster:
> dt_numbers=datenum(dt_strings, 'yyyy-mm-dd HH:MM:SS');

Dear Jan,
thanks again for pointing out the incorrect datenum() call. These are my results with the correct format string on MATLAB 2009b 64-bit, Intel C2D P9300.

For a date-time cell array of 602870 elements:

dt_numbers=cellfun(@(x)datenum(x,'yyyy-mm-dd HH:MM:SS'),dt_strings);
-> 365 s

dt_numbers=datenum(dt_strings,'yyyy-mm-dd HH:MM:SS');
-> 4.1 s

The dt_strings cell array has almost sequential data with intermittent gaps. Does this have any effect on the speed-up from calling datenum directly with the dt_strings cell array instead of using cellfun?

Does this mean the cellfun method should be avoided if possible?

I'm new to MATLAB, so I'm confused by cellfun and arrayfun. Is it true that most functions can work on arrays and cell arrays directly, and much faster than when used with arrayfun and cellfun?
From: Jan Simon on
Dear Pasan,

> The dt_strings cell array has almost sequential data with intermittent gaps. Does this have any effect on the speed-up from calling datenum directly with the dt_strings cell array instead of using cellfun?

No. The conversion cannot profit from a pattern in the data.

> Does this mean the cellfun method should be avoided if possible?

CELLFUN has a considerable overhead. It does nearly this:
nC = numel(C);
R = zeros(size(C));      % preallocate the output
for iC = 1:nC
  R(iC) = fcn(C{iC});    % call the function (here named fcn) for C{iC}
end

The referencing of C{iC} takes some time and calling the function needs additional overhead.
When you call DATENUM with the cell as input, the referencing is done inside the compiled DATENUM and no further overhead appears.

> I'm new to MATLAB, so I'm confused by cellfun and arrayfun. Is it true that most functions can work on arrays and cell arrays directly, and much faster than when used with arrayfun and cellfun?

Yes - with some exceptions.
CELLFUN is very fast if the function is given as one of these strings:
isreal, isempty, islogical, length, ndims, prodofsize, isclass, size
Then the job is done inside the compiled CELLFUN. If the function is given as a function handle, CELLFUN has to call an external function for each element.
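
A short sketch of the two calling styles on a made-up cell (the contents of C are only for illustration). It shows that the string form and the function-handle form return the same result, and that 'size' and 'isclass' take one extra argument.

```matlab
C = {[], 'abc', 1:3, {}};                 % sample cell, made up for illustration

e_str = cellfun('isempty', C);            % string form, handled inside CELLFUN
e_fh  = cellfun(@isempty, C);             % function handle, one call per element
assert(isequal(e_str, e_fh));

len   = cellfun('length', C);             % length of each element
ncols = cellfun('size', C, 2);            % 'size' takes the dimension as extra input
ischr = cellfun('isclass', C, 'char');    % 'isclass' takes the class name as extra input
```

Only the speed differs between the first two calls, not the result.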

Most functions, e.g. SIN, +, SUM, MEAN, ..., can operate on arrays directly, which is much faster than asking ARRAYFUN for help.
Some functions operate on cell strings, e.g. DATENUM or STRCMP. Then CELLFUN would waste a lot of time, if it is forced to call the external function for each element.
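
As a rough illustration with made-up data, the direct calls and the ARRAYFUN/CELLFUN detours give the same results. The direct form simply avoids one function call per element.

```matlab
% Numeric array: SIN works on the whole array at once
x  = linspace(0, pi, 5);
y1 = sin(x);                               % direct, vectorized call
y2 = arrayfun(@sin, x);                    % one call per element
assert(max(abs(y1 - y2)) < 1e-12);

% Cell string: STRCMP accepts the cell array directly
names = {'alpha', 'beta', 'alpha'};        % sample names, made up for illustration
m1 = strcmp(names, 'alpha');               % direct call on the cell string
m2 = cellfun(@(s) strcmp(s, 'alpha'), names);
assert(isequal(m1, m2));
```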

However, even the programmers of Matlab use CELLFUN sometimes for really strange tasks:
SAVEPATH of Matlab 2009a, line 224:
mlr_dirs = cellfun(@(x) ismember(1, x), strfind(dirnames, mlroot));
Simplified and faster:
mlr_dirs = strncmp(dirnames, mlroot, length(mlroot));
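
A quick check on made-up paths (mlroot and dirnames below are hypothetical values, not taken from SAVEPATH) that both expressions flag the entries starting with mlroot:

```matlab
mlroot   = '/opt/matlab';                  % hypothetical root folder
dirnames = {'/opt/matlab/toolbox', ...     % starts with mlroot
            '/home/user/work', ...         % does not contain mlroot
            '/data/opt/matlab'};           % contains mlroot, but not at position 1

% Original form: is position 1 among the match positions found by STRFIND?
a = cellfun(@(x) ismember(1, x), strfind(dirnames, mlroot));

% Simplified form: compare only the leading characters
b = strncmp(dirnames, mlroot, length(mlroot));

assert(isequal(a, b));
```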

Welcome to Matlab, Jan
From: Oleg Komarov on
Jan outlined the whole picture pretty well, and I believe you'll get it with time.

In general:
- try to avoid cellfun except when using 'isreal', 'isempty', 'islogical', 'length', 'ndims', 'prodofsize', 'isclass', 'size' (example: cellfun('isempty',cellArray)).

- but don't waste too much time on avoiding cellfun since the case you showed is extreme. Usually cellfun behaves well and is much more compact and readable than other solutions.

Oleg
From: Steven Lord on

"Pasan " <prasan.remove.me(a)wellassa.org> wrote in message
news:i0f39r$b5l$1(a)fred.mathworks.com...
>
>> Steven Lord told me that this is not the correct way to call DATENUM:
>> http://www.mathworks.com/matlabcentral/newsreader/view_thread/285808
>>
>> Much faster:
>> dt_numbers=datenum(dt_strings, 'yyyy-mm-dd HH:MM:SS');
>
> Dear Jan,
> thanks again for pointing out the incorrect datenum() call. These are my
> results with the correct format string on MATLAB 2009b 64-bit, Intel C2D
> P9300.
>
> For a date-time cell array of 602870 elements:
>
> dt_numbers=cellfun(@(x)datenum(x,'yyyy-mm-dd HH:MM:SS'),dt_strings);
> -> 365 s

This calls DATENUM 602,870 times, with a single string each time. This
means you're running into the overhead of calling the function six hundred
thousand times. The function calling overhead may be small, but even a
small overhead can lead to a large delay if incurred that many times.

> dt_numbers=datenum(dt_strings,'yyyy-mm-dd HH:MM:SS');
> -> 4.1 s

Here, you're calling DATENUM _one_ time with a large cell array, so you only
incur the function call overhead once. Thus most of the time is spent
actually computing date numbers.
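
A minimal timing sketch of the two calls, assuming a synthetic input of 2000 hourly time stamps (the size and the generated strings are assumptions, not the original data). The absolute times depend on the machine and the MATLAB release, so only the ratio is of interest.

```matlab
% Build a small synthetic cell array of date strings (hourly steps)
n   = 2000;                                % far smaller than the 602870 above
fmt = 'yyyy-mm-dd HH:MM:SS';
t   = datenum(2010, 1, 1) + (0:n-1)' / 24;
dt_strings = cellstr(datestr(t, fmt));

% One DATENUM call per element via CELLFUN
tic;
a = cellfun(@(x) datenum(x, fmt), dt_strings);
t_cellfun = toc;

% One DATENUM call for the whole cell array
tic;
b = datenum(dt_strings, fmt);
t_direct = toc;

% Both approaches should give the same date numbers
assert(max(abs(a - b)) < 1e-8);
fprintf('cellfun: %.3f s, direct: %.3f s\n', t_cellfun, t_direct);
```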

> The dt_strings cell array has almost sequential data with intermittent
> gaps. Does this have any effect on the speed-up from calling datenum
> directly with the dt_strings cell array instead of using cellfun?

I doubt it.

> Does this mean the cellfun method should be avoided if possible?

In this case, DATENUM can accept a nonscalar cell array of strings, so
there's no real reason to call CELLFUN. But if you had a function that you
couldn't modify (for whatever reason) that only operates on scalar cells,
then a FOR loop or a CELLFUN call may be appropriate.
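
As a sketch of that situation, count_vowels below is a hypothetical stand-in for a function that is only meant to be called with one string at a time. The FOR loop and the CELLFUN call then do the same job.

```matlab
% Hypothetical stand-in for a function meant for a single string only
count_vowels = @(s) sum(ismember(lower(s), 'aeiou'));

words = {'cellfun', 'loop', 'try'};        % sample input, made up for illustration

% FOR loop version
n_loop = zeros(size(words));
for k = 1:numel(words)
    n_loop(k) = count_vowels(words{k});
end

% CELLFUN version, one call of count_vowels per element
n_cellfun = cellfun(count_vowels, words);

assert(isequal(n_loop, n_cellfun));
```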

> I'm new to MATLAB, so I'm confused by cellfun and arrayfun. Is it true
> that most functions can work on arrays and cell arrays directly, and much
> faster than when used with arrayfun and cellfun?

I would say that _many_ of the functions included with MATLAB are vectorized
and can work on arrays -- but I can't speak for any of the functions that
users may have written.

--
Steve Lord
slord(a)mathworks.com
comp.soft-sys.matlab (CSSM) FAQ: http://matlabwiki.mathworks.com/MATLAB_FAQ
To contact Technical Support use the Contact Us link on
http://www.mathworks.com