regexprep [Matlab]

Prev: Tanengrad (Tanenbaum`s Method)
Next: Could not start JVM while installing

From: Oleg Komarov on 23 Dec 2009 07:19

> cellfun(@(x) regexp(x,'(\S+)(\d+)','match'),myCellstr,'un',0);
>
> Branko
Thanks Branko!
Just two thoughts:
cellfun(@(x) regexp(x,'(\S+)(\d+)','match'),myCellstr); % no need to enclose in cell output
This solution is slightly slower on my system.

Oleg

From: Jason Breslau on 23 Dec 2009 09:36

regexp and regexprep both support cell arrays, so you don't need the
call to cellfun:

regexprep(myCellstr,'\([A-z]+\)','')

-=>J
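Here is a minimal sketch of Jason's point, with hypothetical sample data (the cell contents are made up; only REGEXPREP comes from the thread). REGEXPREP applied to a cell array returns a cell array of the same size. The sketch uses [A-Za-z] instead of [A-z], because in ASCII the range A-z also contains the six characters [ \ ] ^ _ and ` that lie between Z and a.

```matlab
% Hypothetical sample data: a 12-character ID followed by a bracketed tag.
myCellstr = {'ABCDEFGHIJKL(abc)'; 'MNOPQRSTUVWX(Def)'; 'YZ0123456789(xyz)'};

% regexprep works element-wise on a cell array and keeps its size,
% so no cellfun wrapper is needed.
cleanCellStr = regexprep(myCellstr, '\([A-Za-z]+\)', '');

disp(cleanCellStr)
```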

From: Jan Simon on 23 Dec 2009 16:15

Dear Oleg!

> regexprep(myCellstr,'\([A-z]+\)','')

REGEXPREP can be slow for large cell strings, so let's try something different:
myStr = [myCellstr{:}];
newCellStr = dataread('string', myStr, '%s', 'delimiter', '()');
cleanCellStr = newCellStr(1:2:end);

For large cell strings, the concatenation [C{:}] can be accelerated with CStr2String:
http://www.mathworks.com/matlabcentral/fileexchange/26077

I'll try it with some test data and post the times soon. Jan
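DATAREAD is undocumented and may not be available in every MATLAB release. As a hedged alternative, the same concatenate-and-split idea can be sketched with STRSPLIT (a swapped-in function, not used in the thread; it exists in newer MATLAB releases and in Octave). Like the DATAREAD version, it assumes that every string contains exactly one non-empty bracketed part; the sample data is hypothetical.

```matlab
% Hypothetical sample data; each entry holds exactly one bracketed part.
myCellstr = {'ABCDEFGHIJKL(abc)'; 'MNOPQRSTUVWX(de)'; 'YZ0123456789(fghi)'};

% Concatenate everything into one long row: 'ABCDEFGHIJKL(abc)MNOP...'
myStr = [myCellstr{:}];

% Split at both '(' and ')'; the pieces alternate between the wanted
% prefix and the bracket contents (a trailing empty piece may follow).
parts = strsplit(myStr, {'(', ')'});

% Keep every other piece, one per input string, as a column like the input.
n = numel(myCellstr);
cleanCellStr = reshape(parts(1:2:2*n-1), [], 1);
```

Indexing with 1:2:2*n-1 instead of 1:2:end keeps the result independent of whether a trailing empty piece is produced after the last ')'.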

From: Jan Simon on 23 Dec 2009 16:40

Dear Oleg!

Nope, that no longer helps in Matlab 2009a, only in Matlab 6.5: REGEXPREP is now twice as fast as the DATAREAD(CAT) method for a {10000 x 1} cell.

Another idea:
Does the initial part always have the same length?
cleanCellStr = dataread('string', sprintf('%.12s#', myCellStr{:}), '%s', 'delimiter', '#');
But forget this: REGEXPREP is 12 times faster...

Better:
cleanCellStr = cellfun(@(x) x(1:12), myCellstr, 'UniformOutput', false);
This is 3.5 times faster than REGEXPREP, but assuming that the initial part always has the same length may be too sloppy. More general:
cleanCellStr = cellfun(@(x) x(1:findstr(x, 'C')), myCellstr, 'UniformOutput', false);
This is at least 35% faster than the REGEXPREP method, but it fails if there is not exactly one opening bracket.

Kind regards, Jan
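The timings above depend on the MATLAB version and the machine. A minimal tic/toc harness along these lines could be used to reproduce such a comparison. The random {10000 x 1} test data (12-letter IDs followed by '(tag)') is a hypothetical stand-in for the original data, and the bracket search here uses FIND and subtracts 1 so that the '(' itself is not kept.

```matlab
% Hypothetical test data: 10000 random 12-letter IDs, each followed by '(tag)'.
n = 10000;
ids = cellstr(char('A' + randi(26, n, 12) - 1));
myCellstr = cellfun(@(s) [s '(tag)'], ids, 'UniformOutput', false);

% 1) Regular expression, applied to the whole cell array at once.
tic;
r1 = regexprep(myCellstr, '\([A-Za-z]+\)', '');
t1 = toc;

% 2) Fixed-length prefix (only valid if every ID has exactly 12 characters).
tic;
r2 = cellfun(@(x) x(1:12), myCellstr, 'UniformOutput', false);
t2 = toc;

% 3) Cut before the first opening bracket (assumes one '(' per string).
tic;
r3 = cellfun(@(x) x(1:find(x == '(', 1) - 1), myCellstr, 'UniformOutput', false);
t3 = toc;

% A single tic/toc run is noisy; repeat the measurement for reliable numbers.
fprintf('regexprep:      %.4f s\n', t1);
fprintf('fixed length:   %.4f s\n', t2);
fprintf('bracket search: %.4f s\n', t3);
```

All three variants should return identical cell arrays on this data, so the comparison measures speed only.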

From: Jan Simon on 23 Dec 2009 19:11

Typo:

Replace 'C' by '(' (and subtract 1, so that the bracket itself is not kept):
cleanCellStr = cellfun(@(x) x(1:findstr(x, 'C')), myCellstr, 'UniformOutput', false);
==>
cleanCellStr = cellfun(@(x) x(1:findstr(x, '(') - 1), myCellstr, 'UniformOutput', false);

Jan
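If some strings might contain no bracket or more than one, one hedged variant (swapped in here, not proposed in the thread) is STRTOK with '(' as the delimiter: it returns everything before the first '(' and the whole string when there is none. The sample data is hypothetical. Note that STRTOK skips leading delimiters, so a string that starts with '(' would not come back empty.

```matlab
% Hypothetical sample data: one, zero, and two bracketed parts.
myCellstr = {'ABCDEFGHIJKL(abc)'; 'NOBRACKETS'; 'XY(one)(two)'};

% strtok returns the text before the first '(' (or the whole string if
% there is no '('). Caveat: leading '(' characters are skipped first.
cleanCellStr = cellfun(@(x) strtok(x, '('), myCellstr, 'UniformOutput', false);
```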
