Regex driving me crazy... [Python]

From: J on 7 Apr 2010 17:40

Can someone make me un-crazy?

I have a bit of code that right now, looks like this:

status = getoutput('smartctl -l selftest /dev/sda').splitlines()[6]
status = re.sub(' (?= )(?=([^"]*"[^"]*")*[^"]*$)', ":",status)
print status

Basically, it pulls the first actual line of data from the return you
get when you use smartctl to look at a hard disk's selftest log.

The raw data looks like this:

# 1  Short offline       Completed without error       00%       679         -

Unfortunately, all that whitespace is arbitrary single space
characters. And I am interested in the string that appears in the
third column, which changes as the test runs and then completes. So
in the example, "Completed without error"

The regex I have up there doesn't quite work, as it seems to be
subbing EVERY space (or at least in instances of more than one space)
to a ':' like this:

# 1: Short offline:::::: Completed without error:::::: 00%:::::: 679:::::::: -

Ultimately, what I'm trying to do is either replace any run of more than
one space with a delimiter, then split the result into a list and
get the third item.

OR, if there's a smarter, shorter, or better way of doing it, I'd love to know.

The end result should pull the whole string in the middle of that
output line, and then I can use that to compare to a list of possible
output strings to determine if the test is still running, has
completed successfully, or failed.

Unfortunately, my google-fu fails right now, and my Regex powers were
always rather weak anyway...

So any ideas on what the best way to proceed with this would be?
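
For what it's worth, the runs of colons in the output above appear to come from the first lookahead: ' (?= )' matches one space at a time whenever another space follows, so each run is replaced character by character. The second lookahead only checks that an even number of double quotes follows, which is always true for this line. The sketch below rebuilds the sample line with the spacing implied by the colon counts (an assumption) and compares the per-space substitution with one that treats a whole run as a single match. The names per_space and per_run are made up for illustration, and ':' as a delimiter is only safe if it never occurs in the data itself.

```python
import re

# Sample line rebuilt with the spacing implied by the ':' output above:
# 2 spaces after "# 1", then runs of 7, 7, 7 and 9 spaces (an assumption).
line = ("# 1" + " " * 2 + "Short offline" + " " * 7 +
        "Completed without error" + " " * 7 + "00%" + " " * 7 +
        "679" + " " * 9 + "-")

# ' (?= )' matches a single space that is followed by another space,
# so a run of n spaces becomes n-1 colons plus one leftover space.
per_space = re.sub(" (?= )", ":", line)

# ' {2,}' matches a whole run of two or more spaces as one unit,
# so each run becomes exactly one colon.
per_run = re.sub(" {2,}", ":", line)

# With one delimiter per column gap, the status is the third field.
fields = per_run.split(":")
status = fields[2]
```

Single spaces inside a column ("Short offline") are left alone by both patterns, which is what keeps the multi-word status in one piece.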

From: Grant Edwards on 7 Apr 2010 17:47

On 2010-04-07, J <dreadpiratejeff(a)gmail.com> wrote:

> Can someone make me un-crazy?

Definitely. Regex is driving you crazy, so don't use a regex.

inputString = "# 1  Short offline       Completed without error       00%       679         -"

print ' '.join(inputString.split()[4:-3])

> So any ideas on what the best way to proceed with this would be?

Anytime you have a problem with a regex, the first thing you should
ask yourself: "do I really, _really_ need a regex?"

Hint: the answer is usually "no".
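
To make the slice in the one-liner above less magic, here is a rough sketch of the same idea with the intermediate token list spelled out. It assumes the test description is always two words and that "# 1" splits into two tokens, which holds for the sample line but may not hold for every line smartctl can print. The column meanings in the comments are a reading of the sample, not something taken from smartctl documentation.

```python
line = "# 1  Short offline       Completed without error       00%       679         -"

# With no argument, split() breaks on any run of whitespace,
# so the amount of padding between columns does not matter.
tokens = line.split()

# tokens[:4]  -> '#', '1', 'Short', 'offline'  (test number and description)
# tokens[-3:] -> '00%', '679', '-'             (the three trailing columns)
# Everything in between is the status, however many words it has.
status = " ".join(tokens[4:-3])
```

Because the slice counts from both ends, a longer or shorter status still comes out whole as long as the leading and trailing token counts stay the same.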

--
Grant Edwards               grant.b.edwards        Yow! I'm continually AMAZED
                                  at               at th'breathtaking effects
                              gmail.com            of WIND EROSION!!

From: Patrick Maupin on 7 Apr 2010 20:49

On Apr 7, 4:40 pm, J <dreadpiratej...(a)gmail.com> wrote:
> Can someone make me un-crazy?
>
> I have a bit of code that right now, looks like this:
>
> status = getoutput('smartctl -l selftest /dev/sda').splitlines()[6]
> status = re.sub(' (?= )(?=([^"]*"[^"]*")*[^"]*$)', ":",status)
> print status
>
> Basically, it pulls the first actual line of data from the return you
> get when you use smartctl to look at a hard disk's selftest log.
>
> The raw data looks like this:
>
> # 1  Short offline       Completed without error       00%       679         -
>
> Unfortunately, all that whitespace is arbitrary single space
> characters. And I am interested in the string that appears in the
> third column, which changes as the test runs and then completes. So
> in the example, "Completed without error"
>
> The regex I have up there doesn't quite work, as it seems to be
> subbing EVERY space (or at least in instances of more than one space)
> to a ':' like this:
>
> # 1: Short offline:::::: Completed without error:::::: 00%:::::: 679:::::::: -
>
> Ultimately, what I'm trying to do is either replace any run of more than
> one space with a delimiter, then split the result into a list and
> get the third item.
>
> OR, if there's a smarter, shorter, or better way of doing it, I'd love to know.
>
> The end result should pull the whole string in the middle of that
> output line, and then I can use that to compare to a list of possible
> output strings to determine if the test is still running, has
> completed successfully, or failed.
>
> Unfortunately, my google-fu fails right now, and my Regex powers were
> always rather weak anyway...
>
> So any ideas on what the best way to proceed with this would be?

You mean like this?

>>> import re
>>> re.split(' {2,}', '# 1  Short offline       Completed without error       00%')
['# 1', 'Short offline', 'Completed without error', '00%']
>>>
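
From there, getting to "still running / passed / failed" might look something like the sketch below. The function name classify and the two status sets are hypothetical, and the status strings should be checked against real smartctl output before relying on them. It also assumes columns are always separated by at least two spaces; a status long enough to fill its column would break that assumption.

```python
import re

# Hypothetical status groups; verify the exact strings against real output.
PASSED = {"Completed without error"}
RUNNING = {"Self-test routine in progress"}


def classify(line):
    """Return 'passed', 'running', or 'failed' for one self-test log line."""
    # Split on runs of two or more spaces so multi-word columns stay intact.
    fields = re.split(r" {2,}", line.strip())
    status = fields[2]  # third column holds the status text
    if status in PASSED:
        return "passed"
    if status in RUNNING:
        return "running"
    # Assumption: anything not recognised is treated as a failure.
    return "failed"


sample = "# 1  Short offline       Completed without error       00%       679         -"
result = classify(sample)
```

Treating every unrecognised status as a failure is the cautious choice here; a stricter version could keep an explicit set of failure strings and raise on anything unknown.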

Regards,
Pat

From: Patrick Maupin on 7 Apr 2010 20:50

On Apr 7, 4:47 pm, Grant Edwards <inva...(a)invalid.invalid> wrote:
> On 2010-04-07, J <dreadpiratej...(a)gmail.com> wrote:
>
> > Can someone make me un-crazy?
>
> Definitely. Regex is driving you crazy, so don't use a regex.
>
> inputString = "# 1  Short offline       Completed without error       00%       679         -"
>
> print ' '.join(inputString.split()[4:-3])
>
> > So any ideas on what the best way to proceed with this would be?
>
> Anytime you have a problem with a regex, the first thing you should
> ask yourself: "do I really, _really_ need a regex?"
>
> Hint: the answer is usually "no".
>
> --
> Grant Edwards               grant.b.edwards        Yow! I'm continually AMAZED
>                                   at               at th'breathtaking effects
>                               gmail.com            of WIND EROSION!!

OK, fine. Post a better solution to this problem than:

>>> import re
>>> re.split(' {2,}', '# 1  Short offline       Completed without error       00%')
['# 1', 'Short offline', 'Completed without error', '00%']
>>>

Regards,
Pat

From: Patrick Maupin on 7 Apr 2010 21:03

On Apr 7, 7:49 pm, Patrick Maupin <pmau...(a)gmail.com> wrote:
> On Apr 7, 4:40 pm, J <dreadpiratej...(a)gmail.com> wrote:
>
>
>
> > Can someone make me un-crazy?
>
> > I have a bit of code that right now, looks like this:
>
> > status = getoutput('smartctl -l selftest /dev/sda').splitlines()[6]
> > status = re.sub(' (?= )(?=([^"]*"[^"]*")*[^"]*$)', ":",status)
> > print status
>
> > Basically, it pulls the first actual line of data from the return you
> > get when you use smartctl to look at a hard disk's selftest log.
>
> > The raw data looks like this:
>
> > # 1  Short offline       Completed without error       00%       679         -
>
> > Unfortunately, all that whitespace is arbitrary single space
> > characters. And I am interested in the string that appears in the
> > third column, which changes as the test runs and then completes. So
> > in the example, "Completed without error"
>
> > The regex I have up there doesn't quite work, as it seems to be
> > subbing EVERY space (or at least in instances of more than one space)
> > to a ':' like this:
>
> > # 1: Short offline:::::: Completed without error:::::: 00%:::::: 679:::::::: -
>
> > Ultimately, what I'm trying to do is either replace any run of more than
> > one space with a delimiter, then split the result into a list and
> > get the third item.
>
> > OR, if there's a smarter, shorter, or better way of doing it, I'd love to know.
>
> > The end result should pull the whole string in the middle of that
> > output line, and then I can use that to compare to a list of possible
> > output strings to determine if the test is still running, has
> > completed successfully, or failed.
>
> > Unfortunately, my google-fu fails right now, and my Regex powers were
> > always rather weak anyway...
>
> > So any ideas on what the best way to proceed with this would be?
>
> You mean like this?
>
> >>> import re
> >>> re.split(' {2,}', '# 1  Short offline       Completed without error       00%')
> ['# 1', 'Short offline', 'Completed without error', '00%']
>
>
>
> Regards,
> Pat

BTW, although I find it annoying when people say "don't do that" when
"that" is a perfectly good thing to do, and although I also find it
annoying when people tell you what not to do without telling you what
*to* do, and although I find the regex solution to this problem to be
quite clean, the equivalent non-regex solution is not terrible, so I
will present it as well, for your viewing pleasure:

>>> [x for x in '# 1  Short offline       Completed without error       00%'.split('  ') if x.strip()]
['# 1', 'Short offline', ' Completed without error', ' 00%']
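
The leading spaces in the last two items above seem to come from runs with an odd number of spaces: splitting on a two-space delimiter leaves one space over. A small variation, sketched here with the same shortened sample line, strips each piece as it is kept. The variable names are only for illustration.

```python
line = "# 1  Short offline       Completed without error       00%"

# Split on a two-space delimiter, drop empty or whitespace-only pieces,
# and strip the leftover space that an odd-length run leaves behind.
fields = [piece.strip() for piece in line.split("  ") if piece.strip()]
status = fields[2]
```

With the stripping in place, the result should compare cleanly against a list of expected status strings.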

Regards,
Pat
