Matching strings in a line and printing them [Shell]

From: Lao Ming on 18 May 2010 18:00

On May 17, 8:35 pm, Jon LaBadie <jlaba...(a)aXcXm.org> wrote:
> jplee3 wrote:
> > Hi all,
>
> > I was wondering if there's a way to match more than one string in a
> > line and then print/output the strings to the screen or a file, etc.
>
> > I'm trying to achieve something similar to "grep -o "matching_string"
> > file123" but with more than one string, and where each matching string
> > is printed subsequently in a line.
>
> > So I would want to match "matching_string1" and "matching_string2"
> > from one line and print both those matches on a single line.
> > Effectively, cutting out the text I don't want and printing out only
> > what I want from a line.
>
> > Is this even possible with grep or awk?
>
> sed -n 's/.*\(matching_string1\).*\(matching_string2\).*/\1\2/p' input_file

This is kind of a long-sought-after command for me, but it outputs the
two strings as:

matching_string1matching_string2

which would be difficult to use without a delimiter. I tried to do
this myself but got "garbled" errors.
What would one do to make the output delimited? What if one or both of
the matching_strings were variables? How would that be implemented?

Thanks.

From: Janis Papanagnou on 18 May 2010 19:49

Lao Ming wrote:
> On May 17, 8:35 pm, Jon LaBadie <jlaba...(a)aXcXm.org> wrote:
>> jplee3 wrote:
>>> Hi all,
>>> I was wondering if there's a way to match more than one string in a
>>> line and then print/output the strings to the screen or a file, etc.
>>> I'm trying to achieve something similar to "grep -o "matching_string"
>>> file123" but with more than one string, and where each matching string
>>> is printed subsequently in a line.
>>> So I would want to match "matching_string1" and "matching_string2"
>>> from one line and print both those matches on a single line.
>>> Effectively, cutting out the text I don't want and printing out only
>>> what I want from a line.
>>> Is this even possible with grep or awk?
>> sed -n 's/.*\(matching_string1\).*\(matching_string2\).*/\1\2/p' input_file
>
> This is kind of a long-sought-after command for me, but it outputs the
> two strings as:
>
> matching_string1matching_string2
>
> which would be difficult to use without a delimiter. I tried to do
> this myself but got "garbled" errors.
> What would one do to make the output delimited?

sed -n 's/.*\(matching_string1\).*\(matching_string2\).*/\1 \2/p'

> What if one or both of
> the matching_strings were variables? How would that be implemented?

var1=matching_string1
var2=matching_string2
sed -n "s/.*\($var1\).*\($var2\).*/\1 \2/p"

But keep in mind that Jon's basic solution (and the derived ones that I
posted) require strict ordering of string1 and string2; the proposed way
wouldn't work with lines where string2 comes before string1 in a line.
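
A minimal sketch of the delimiter, the variables, and the ordering limitation together. The values "foo" and "bar" and the two sample lines are made up for illustration; the strings are assumed to contain no regex metacharacters and no "/".

```shell
var1='foo'
var2='bar'

# Double quotes let the shell expand $var1 and $var2 before sed sees the
# script. \( \) are BRE capture groups; the space between \1 and \2 is the
# output delimiter.
in_order=$(printf '%s\n' 'xx foo yy bar zz' |
    sed -n "s/.*\($var1\).*\($var2\).*/\1 \2/p")

# Same script, but the strings appear in the opposite order on the line:
# the pattern does not match, so nothing is printed.
reversed=$(printf '%s\n' 'xx bar yy foo zz' |
    sed -n "s/.*\($var1\).*\($var2\).*/\1 \2/p")

printf 'in order: [%s]\nreversed: [%s]\n' "$in_order" "$reversed"
```

The first line should come out as "foo bar" and the second as an empty result.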

Janis

>
> Thanks.

From: Thomas 'PointedEars' Lahn on 18 May 2010 21:44

Janis Papanagnou wrote:

> Lao Ming wrote:
>> What if one or both of the matching_strings were variables? How would
>> that be implemented?
>
> var1=matching_string1
> var2=matching_string2
> sed -n "s/.*\($var1\).*\($var2\).*/\1 \2/p"
>
> But keep in mind that Jon's basic solution (and the derived ones that I
> posted) require strict ordering of string1 and string2; the proposed way
> wouldn't work with lines where string2 comes before string1 in a line.

But that can be easily remedied:

var1='matching_string1'
var2='matching_string2'

# POSIX-compliant
sed -ne "s/.*\($var1\).*\($var2\).*/\1 \2/p" \
-e "s/.*\($var2\).*\($var1\).*/\1 \2/p" ...

# or
sed -n "s/.*\($var1\).*\($var2\).*/\1 \2/p;
s/.*\($var2\).*\($var1\).*/\1 \2/p" ...

# GNU sed
sed -n "s/.*\(\($var1\).*\($var2\)\|\($var2\).*\($var1\)\).*/\2\4 \3\5/p" \
...
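
One caveat with shell variables in the script: if a value contains regex metacharacters or a "/", it has to be escaped first. A rough sketch of the two-expression form with such values follows; bre_escape is a hypothetical helper name, the sample strings and lines are made up, and values containing newlines are not handled.

```shell
# Hypothetical helper: backslash-escape the characters that are special in a
# basic regular expression, plus the "/" used as the s/// delimiter.
# The helper itself uses "," as its delimiter so "/" needs no escaping there.
bre_escape() {
    printf '%s\n' "$1" | sed 's,[]\/$*.^[],\\&,g'
}

var1=$(bre_escape '1.2.3.4')
var2=$(bre_escape 'a/b')

# Line 1 has var2 before var1, line 2 has var1 before var2, and line 3 only
# looks like var1 if "." is left unescaped, so it should produce no output.
out=$(printf '%s\n' 'x a/b y 1.2.3.4 z' 'x 1.2.3.4 y a/b z' '1x2y3z4 a/b' |
    sed -n -e "s/.*\($var1\).*\($var2\).*/\1 \2/p" \
           -e "s/.*\($var2\).*\($var1\).*/\1 \2/p")

printf '%s\n' "$out"
```

Each pair is printed in the order in which the strings appear on the input line.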

PointedEars

From: Michael Paoli on 20 May 2010 12:50

On May 18, 10:12am, jplee3 <jplee3(a)gmail.com> wrote:
> On May 18, 10:00am, jplee3 <jpl...(a)gmail.com> wrote:
> > On May 17, 9:31pm, Ed Morton <mortons...(a)gmail.com> wrote:
> > > On 5/17/2010 8:05 PM, jplee3 wrote:
> > > > I was wondering if there's a way to match more than one string in a
> > > > line and then print/output the strings to the screen or a file, etc.
> > > Do you mean find a string and print that string or do you mean find a regular
> > > expression and print the string that matches it?
> > > > I'm trying to achieve something similar to "grep -o "matching_string"
> > > > file123" but with more than one string, and where each matching string
> > > > is printed subsequently in a line.
> > > > So I would want to match "matching_string1" and "matching_string2"
> > > > from one line and print both those matches on a single line.
> > > > Effectively, cutting out the text I don't want and printing out only
> > > > what I want from a line.
> > > > Is this even possible with grep or awk?
> > > Yes, but provide some small sample input and the expected output given that
> > > input so we're not guessing too much.
> > Sorry for the confusion. It is in fact a regex string that I would be
> > looking for.
> > For instance:
> > say the original line is: "<ID>=apache_server1_12345678,
> > <ID2>=blahblahblah, <ID3>=FUBAR=apache_server2_12345678|
> > IPADDRESS=192.168.1.1|MSG=hello"
> > there are hundreds of lines like these...
> > I want to extract so that I will get the following result from each
> > line (a list of IPs and hostnames): "192.168.1.1
> > apache_server1_12345678"
> > Thanks guys!
> Sorry, the formatting is screwed up on this one: I would want the list
> to be in this format "192.168.1.1 server1" - where there's a space or
> tab delimiter (not a carriage return/newline). Also, I'd want to
> extract "server1" from "apache_server1_12345678"
> Jon LaBadie's (thanks Jon!) sed command partially worked, but there's
> no delimiter between the server name and IP. Also, I'm not 100% sure
> what the regex would look like. For the IP address, I was using
> "[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}" for "grep -o"
> but not sure if that will differ for what I'm trying to do. I'm not
> sure what the regex for extracting the server hostname would be
> either.

sed -ne 's/^.*<ID>=apache_\([^_][^_]*\).* IPADDRESS=\([.0-9][.0-9]*\).*$/\2 \1/p' << \__EOT__
<ID>=apache_server1_12345678, <ID2>=blahblahblah, <ID3>=FUBAR=apache_server2_12345678| IPADDRESS=192.168.1.1|MSG=hello
__EOT__
192.168.1.1 server1
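
Since the original question also asked about awk, here is a rough equivalent using match() with RSTART and RLENGTH. The function name extract_ip_host is made up, and the field layout is assumed to be the same as in the sample line above; this is a sketch, not a tested solution for the real log files.

```shell
extract_ip_host() {
    awk '
    {
        host = ""; ip = ""
        # Host name: the text between "<ID>=apache_" (12 characters) and
        # the next underscore.
        if (match($0, /<ID>=apache_[^_]+/))
            host = substr($0, RSTART + 12, RLENGTH - 12)
        # IP address: the digits and dots following "IPADDRESS=" (10 characters).
        if (match($0, /IPADDRESS=[0-9.]+/))
            ip = substr($0, RSTART + 10, RLENGTH - 10)
        # Print only lines where both pieces were found, IP first.
        if (host != "" && ip != "")
            print ip, host
    }' "$@"
}

line='<ID>=apache_server1_12345678, <ID2>=blahblahblah, <ID3>=FUBAR=apache_server2_12345678| IPADDRESS=192.168.1.1|MSG=hello'
result=$(printf '%s\n' "$line" | extract_ip_host)
printf '%s\n' "$result"
```

Unlike the sed version, this does not depend on the order of the two fields within a line, because each one is located by its own match() call.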

From: jplee3 on 20 May 2010 17:40

On May 20, 9:50 am, Michael Paoli <michael1...(a)yahoo.com> wrote:
> On May 18, 10:12am, jplee3 <jpl...(a)gmail.com> wrote:
>
> > On May 18, 10:00am, jplee3 <jpl...(a)gmail.com> wrote:
> > > On May 17, 9:31pm, Ed Morton <mortons...(a)gmail.com> wrote:
> > > > On 5/17/2010 8:05 PM, jplee3 wrote:
> > > > > I was wondering if there's a way to match more than one string in a
> > > > > line and then print/output the strings to the screen or a file, etc.
> > > > Do you mean find a string and print that string or do you mean find a regular
> > > > expression and print the string that matches it?
> > > > > I'm trying to achieve something similar to "grep -o "matching_string"
> > > > > file123" but with more than one string, and where each matching string
> > > > > is printed subsequently in a line.
> > > > > So I would want to match "matching_string1" and "matching_string2"
> > > > > from one line and print both those matches on a single line.
> > > > > Effectively, cutting out the text I don't want and printing out only
> > > > > what I want from a line.
> > > > > Is this even possible with grep or awk?
> > > > Yes, but provide some small sample input and the expected output given that
> > > > input so we're not guessing too much.
> > > Sorry for the confusion. It is in fact a regex string that I would be
> > > looking for.
> > > For instance:
> > > say the original line is: "<ID>=apache_server1_12345678,
> > > <ID2>=blahblahblah, <ID3>=FUBAR=apache_server2_12345678|
> > > IPADDRESS=192.168.1.1|MSG=hello"
> > > there are hundreds of lines like these...
> > > I want to extract so that I will get the following result from each
> > > line (a list of IPs and hostnames): "192.168.1.1
> > > apache_server1_12345678"
> > > Thanks guys!
> > Sorry, the formatting is screwed up on this one: I would want the list
> > to be in this format "192.168.1.1 server1" - where there's a space or
> > tab delimiter (not a carriage return/newline). Also, I'd want to
> > extract "server1" from "apache_server1_12345678"
> > Jon LaBadie's (thanks Jon!) sed command partially worked, but there's
> > no delimiter between the server name and IP. Also, I'm not 100% sure
> > what the regex would look like. For the IP address, I was using
> > "[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}" for "grep -o"
> > but not sure if that will differ for what I'm trying to do. I'm not
> > sure what the regex for extracting the server hostname would be
> > either.
>
> sed -ne 's/^.*<ID>=apache_\([^_][^_]*\).* IPADDRESS=\([.0-9][.0-9]*\).*$/\2 \1/p' << \__EOT__
> <ID>=apache_server1_12345678, <ID2>=blahblahblah, <ID3>=FUBAR=apache_server2_12345678| IPADDRESS=192.168.1.1|MSG=hello
> __EOT__
> 192.168.1.1 server1

Thanks all for the input! I've been so busy that I totally forgot to
mention that I figured it out after modifying Jon's solution and
playing around with some regex. I'm able to extract what I need at
this point, and it's very useful for extracting other data too. Thanks
all for the help!
