dynamic regexp [Matlab]


From: Walter Roberson on 25 Mar 2010 17:55

Greg Thom wrote:

> corrected the last post with this
>
> [toks mat] = regexp(str1,'Test-Results:
> ,\w*>\w*,,,(\d+),(??@repmat(''(\d+),([-+]?\d*\.?\d*),'', 1,
> str2num($1)))3,','tokens')

Oh, good point about using str2num there! (though str2double would be more
efficient, as str2num uses eval.)
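
To illustrate the str2double point, here is a minimal sketch. The token text
below is made up for the example and is not taken from the real data.

```matlab
% Convert a captured count token to a number without going through eval.
% The token text '3' is an assumed example.
countTok = '3';
n = str2double(countTok);      % numeric 3

% str2double returns NaN for text it cannot parse as a number, rather
% than trying to evaluate it as MATLAB code the way str2num does.
bad = str2double('3 pairs');
```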

> but can anyone explain why it is returning empty? I know for sure that
> the repmat is executing correctly; what I don't know is what the final
> regexp expression looks like. Is there any way to debug and view the
> actual regexp string that is executing?

For future reference of anyone who might be following this thread: the repmat
part could be debugged by calling instead a user-written function that
reported on its inputs and then did the repmat.
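
A minimal sketch of such a helper, saved as its own function file. The name
dbg_repmat is hypothetical; anything on the MATLAB path would do.

```matlab
function out = dbg_repmat(piece, m, n)
% DBG_REPMAT  Hypothetical debugging stand-in for repmat.
% Prints what it was called with, then returns repmat's result so the
% caller behaves exactly as it did with a plain repmat call.
fprintf('dbg_repmat: piece = %s, m = %d, n = %d\n', piece, m, n);
out = repmat(piece, m, n);
fprintf('dbg_repmat: generated pattern = %s\n', out);
end
```

To use it, replace repmat( with dbg_repmat( inside the (??@...) part of the
pattern. The printed "generated pattern" line is the text that gets spliced
into the regexp at that point.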

As to why the regexp is not working: the key is the posting from the fellow
from MathWorks saying that tokens cannot be captured within dynamic regexp
patterns. But if you put a () around the (??@...) expression, then the overall
text matched by the dynamic expression should be returned, all as one piece.
You'd then have to break it up, but that could be done by cellfun of regexp()
with a pattern of ',' and the parameter 'split'.
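
A rough sketch of that splitting step. The captured pieces below are assumed
sample data (one per input line), since the real strings are not shown in
this thread.

```matlab
% Assumed examples of what the outer () might capture, one per input line.
pieces = {'1,3.5400,2,-0.25,', '1,0.5,'};

% Split each captured piece at the commas.
fields = cellfun(@(s) regexp(s, ',', 'split'), pieces, ...
                 'UniformOutput', false);

% A trailing comma leaves an empty string at the end of each split; drop it.
fields = cellfun(@(c) c(~cellfun(@isempty, c)), fields, ...
                 'UniformOutput', false);
```

Each element of fields is then a cell array of the individual number strings
for one line.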

Okay, let's simplify this whole lot. Since you are going to have to split the
result afterwards anyhow, you don't care how many pairs there are on the line
or what they look like. So...

[toks, mat] = regexp(str1, '^.+?,,\d+,(.+),\d+,$', 'tokens');

That is, skip everything until you find two commas followed by a number (the ?
after the .+ makes it a "lazy quantifier"), skip over the number (which is the
count of pairs), skip the comma after that, then capture everything up to the
end of the line, backing up just far enough to leave out the final comma,
number, comma at the end of the line.

Then take the tokens that result and split them at the commas. Unless you have
a corrupted entry, you will automatically get the correct number of pairs.
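
Put together, that might look like the sketch below. The str1 value is an
assumed sample line guessed from the pattern in the original post, and the
variable names are only for illustration.

```matlab
% Assumed sample line; the real str1 is not shown in the thread.
str1 = 'Test-Results: ,unit>run,,,2,1,3.5400,2,-0.25,3,';

% 'once' returns the tokens of the first match as a flat cell array.
tok = regexp(str1, '^.+?,,\d+,(.+),\d+,$', 'tokens', 'once');

% Split the captured text at the commas.
fields = regexp(tok{1}, ',', 'split');

% Regroup as one row per (point, value) pair and convert to numbers.
pairs = reshape(fields, 2, []).';
pts   = str2double(pairs(:, 1));
vals  = str2double(pairs(:, 2));
```

The reshape will error out if the number of fields is odd, which is a cheap
way to notice a corrupted entry.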

Okay, now I'm going to make it even simpler, *provided* that the value is
certain to have a decimal point somewhere in it:

pairs = regexp(str1, '(?<pt>\d+),(?<val>[-+]?\d*\.\d*)', 'names');

This will return a structure array with fields pt and val, so
pairs(1).pt = '1'
pairs(1).val = '3.5400'
pairs(2).pt = '2'

and so on.
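
As a sketch of how the result could be turned into numbers (str1 here is an
assumed sample line, not the real data):

```matlab
% Assumed sample line; every value contains a decimal point.
str1 = 'Test-Results: ,unit>run,,,2,1,3.5400,2,-0.25,3,';

% One struct array element per (pt, val) match.
pairs = regexp(str1, '(?<pt>\d+),(?<val>[-+]?\d*\.\d*)', 'names');

% Collect each field across the struct array and convert to numeric.
pts  = str2double({pairs.pt});
vals = str2double({pairs.val});
```

In this sample the leading count field is passed over on its own, because the
thing after its comma is an integer pair number and so cannot match the val
part of the pattern.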

However, if the value parameter might look just like an integer, then we
cannot use this automatic splitting on str1 as such. But in that case, you could:

leadinlen = regexp(str1, ',,,\d+', 'end', 'once');
pairs = regexp(str1(leadinlen+1:end), '(?<pt>\d+),(?<val>[^,]+)', 'names');
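
For instance, with an assumed sample line in which the second value looks
like an integer, a sketch would be (the 'once' option is added here as a
precaution so that leadinlen is always a scalar):

```matlab
% Assumed sample line; the second value has no decimal point.
str1 = 'Test-Results: ,unit>run,,,2,1,3.5400,2,7,3,';

% Index of the last character of the lead-in, i.e. the end of the count.
leadinlen = regexp(str1, ',,,\d+', 'end', 'once');

% Everything after the count field.
rest = str1(leadinlen+1:end);

% Each pair is a number, a comma, then anything up to the next comma.
pairs = regexp(rest, '(?<pt>\d+),(?<val>[^,]+)', 'names');
vals  = str2double({pairs.val});
```

The trailing number at the end of the line is not picked up as a pair,
because nothing but a comma follows it.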

I think you'll find this approach much easier than continuing with dynamic
patterns.
