From: Lao Ming on 18 May 2010 18:00

On May 17, 8:35 pm, Jon LaBadie <jlaba...(a)aXcXm.org> wrote:
> jplee3 wrote:
> > Hi all,
> >
> > I was wondering if there's a way to match more than one string in a
> > line and then print/output the strings to the screen or a file, etc.
> >
> > I'm trying to achieve something similar to "grep -o "matching_string"
> > file123" but with more than one string, and where each matching string
> > is printed subsequently in a line.
> >
> > So I would want to match "matching_string1" and "matching_string2"
> > from one line and print both those matches on a single line.
> > Effectively, cutting out the text I don't want and printing out only
> > what I want from a line.
> >
> > Is this even possible with grep or awk?
>
> sed -n 's/.*\(matching_string1\).*\(matching_string2\).*/\1\2/p' input_file

This is kind of a long-sought-after command for me, but it outputs the
two strings as:

matching_string1matching_string2

which would be difficult to use without a delimiter. I tried to do
this myself but got "garbled" errors. What would one do to make the
output delimited? What if one or both of the matching_strings were
variables? How would that be implemented?

Thanks.
From: Janis Papanagnou on 18 May 2010 19:49

Lao Ming wrote:
> On May 17, 8:35 pm, Jon LaBadie <jlaba...(a)aXcXm.org> wrote:
>> jplee3 wrote:
>> [snip]
>> sed -n 's/.*\(matching_string1\).*\(matching_string2\).*/\1\2/p' input_file
>
> This is kind of a long-sought-after command for me, but it outputs the
> two strings as:
>
> matching_string1matching_string2
>
> which would be difficult to use without a delimiter. I tried to do
> this myself but got "garbled" errors.
> What would one do to make the output delimited?

sed -n 's/.*\(matching_string1\).*\(matching_string2\).*/\1 \2/p'

> What if one or both of
> the matching_strings were variables? How would that be implemented?

var1=matching_string1
var2=matching_string2
sed -n "s/.*\($var1\).*\($var2\).*/\1 \2/p"

But keep in mind that Jon's basic solution (and the derived ones that I
posted) require strict ordering of string1 and string2; the proposed way
wouldn't work with lines where string2 comes before string1 in a line.

Janis
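A minimal sketch, assuming a POSIX sh and sed, that wraps Janis's variable version in a small function (print_pair is a hypothetical name, not from the thread) and runs it on three made-up sample lines, so both the space delimiter and the ordering caveat are visible. One further assumption: the variable values are pasted into the regex as-is, so values containing regex metacharacters or a `/` would need escaping first.

```shell
#!/bin/sh
# print_pair is a hypothetical helper name; the sed script inside it is
# the one from Janis's post. $1 and $2 are the two patterns, input is stdin.
print_pair() {
  var1=$1
  var2=$2
  # Double quotes let the shell substitute $var1 and $var2. The \( \) and
  # \1 \2 sequences reach sed unchanged, because inside double quotes a
  # backslash is only special before $, `, ", \ or a newline.
  sed -n "s/.*\($var1\).*\($var2\).*/\1 \2/p"
}

# Three made-up lines: in order, reversed order, and no match at all.
printf '%s\n' \
  'aaa matching_string1 bbb matching_string2 ccc' \
  'aaa matching_string2 bbb matching_string1 ccc' \
  'no match here' |
  print_pair matching_string1 matching_string2
# Prints only "matching_string1 matching_string2"; the reversed line is
# skipped, which is the ordering limitation Janis describes.
```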
From: Thomas 'PointedEars' Lahn on 18 May 2010 21:44

Janis Papanagnou wrote:
> Lao Ming wrote:
>> What if one or both of the matching_strings were variables? How would
>> that be implemented?
>
> var1=matching_string1
> var2=matching_string2
> sed -n "s/.*\($var1\).*\($var2\).*/\1 \2/p"
>
> But keep in mind that Jon's basic solution (and the derived ones that I
> posted) require strict ordering of string1 and string2; the proposed way
> wouldn't work with lines where string2 comes before string1 in a line.

But that can be easily remedied:

var1='matching_string1'
var2='matching_string2'

# POSIX-compliant
sed -ne "s/.*\($var1\).*\($var2\).*/\1 \2/p" \
    -e "s/.*\($var2\).*\($var1\).*/\1 \2/p" ...

# or
sed -n "s/.*\($var1\).*\($var2\).*/\1 \2/p; s/.*\($var2\).*\($var1\).*/\1 \2/p" ...

# GNU sed
sed -n "s/.*\(\($var1\).*\($var2\)\|\($var2\).*\($var1\)\).*/\2\4 \3\5/p" \
  ...

PointedEars
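A sketch of PointedEars's POSIX two-expression form, again wrapped in a hypothetical helper (print_pair_any_order) and fed made-up sample lines. Each `s` command prints only when it matched, and for plain strings like these a line already rewritten by the first expression no longer has string2 before string1, so it is not printed twice.

```shell
#!/bin/sh
# print_pair_any_order is a hypothetical helper wrapping PointedEars's
# POSIX two-expression form. $1 and $2 are the patterns, input is stdin.
print_pair_any_order() {
  var1=$1
  var2=$2
  # First expression: var1 before var2. Second: var2 before var1.
  # With -n, a line is printed only by a p flag on a successful s command.
  sed -n -e "s/.*\($var1\).*\($var2\).*/\1 \2/p" \
         -e "s/.*\($var2\).*\($var1\).*/\1 \2/p"
}

printf '%s\n' \
  'aaa matching_string1 bbb matching_string2 ccc' \
  'aaa matching_string2 bbb matching_string1 ccc' |
  print_pair_any_order matching_string1 matching_string2
# Expected output (matches appear in the order they occur on each line):
#   matching_string1 matching_string2
#   matching_string2 matching_string1
```

The matches come out in line order. If var1's match should always be printed first, changing the second replacement from `\1 \2` to `\2 \1` would do it; that is a small variation, not part of the original post.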
From: Michael Paoli on 20 May 2010 12:50

On May 18, 10:12 am, jplee3 <jplee3(a)gmail.com> wrote:
> On May 18, 10:00 am, jplee3 <jpl...(a)gmail.com> wrote:
> > On May 17, 9:31 pm, Ed Morton <mortons...(a)gmail.com> wrote:
> > > On 5/17/2010 8:05 PM, jplee3 wrote:
> > > > I was wondering if there's a way to match more than one string in a
> > > > line and then print/output the strings to the screen or a file, etc.
> > > Do you mean find a string and print that string or do you mean find a regular
> > > expression and print the string that matches it?
> > > > I'm trying to achieve something similar to "grep -o "matching_string"
> > > > file123" but with more than one string, and where each matching string
> > > > is printed subsequently in a line.
> > > > So I would want to match "matching_string1" and "matching_string2"
> > > > from one line and print both those matches on a single line.
> > > > Effectively, cutting out the text I don't want and printing out only
> > > > what I want from a line.
> > > > Is this even possible with grep or awk?
> > > Yes, but provide some small sample input and the expected output given that
> > > input so we're not guessing too much.
> > Sorry for the confusion. It is in fact a regex string that I would be
> > looking for.
> > For instance:
> > say the original line is: "<ID>=apache_server1_12345678,
> > <ID2>=blahblahblah, <ID3>=FUBAR=apache_server2_12345678|
> > IPADDRESS=192.168.1.1|MSG=hello"
> > there are hundreds of lines like these...
> > I want to extract so that I will get the following result from each
> > line (a list of IPs and hostnames): "192.168.1.1
> > apache_server1_12345678"
> > Thanks guys!
> Sorry, the formatting is screwed up on this one: I would want the list
> to be in this format "192.168.1.1 server1" - where there's a space or
> tab delimiter (not a carriage return/newline). Also, I'd want to
> extract "server1" from "apache_server1_12345678"
> Jon LaBadie's (thanks Jon!) sed command partially worked, but there's
> no delimiter between the server name and IP. Also, I'm not 100% sure
> what the regex would look like. For the IP address, I was using
> "[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}" for "grep -o"
> but not sure if that will differ for what I'm trying to do. I'm not
> sure what the regex for extracting the server hostname would be
> either.

sed -ne 's/^.*<ID>=apache_\([^_][^_]*\).* IPADDRESS=\([.0-9][.0-9]*\).*$/\2 \1/p' << \__EOT__
<ID>=apache_server1_12345678, <ID2>=blahblahblah, <ID3>=FUBAR=apache_server2_12345678| IPADDRESS=192.168.1.1|MSG=hello
__EOT__
192.168.1.1 server1
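jplee3 asked whether the dotted-quad BRE used with `grep -o` would differ inside sed. Since both tools use basic regular expressions, it should work unchanged inside a `\( \)` group. A sketch, assuming POSIX sed and the record layout of the sample line (extract_ip_host is a hypothetical name, and the second input line is made up). It also drops the space before `IPADDRESS=`, so it matches whether the data has `|IPADDRESS` or `| IPADDRESS`, which the wrapped sample leaves unclear.

```shell
#!/bin/sh
# extract_ip_host is a hypothetical helper; it assumes lines shaped like
#   <ID>=apache_NAME_digits, ... |IPADDRESS=a.b.c.d|...
extract_ip_host() {
  # jplee3's dotted-quad BRE, unchanged (single quotes keep the backslashes).
  ip='[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}'
  # \1 = text between "apache_" and the next "_", \2 = the address.
  # Printed as "IP host"; \$ keeps the shell from touching the anchor.
  sed -n "s/^.*<ID>=apache_\([^_][^_]*\)_.*IPADDRESS=\($ip\).*\$/\2 \1/p"
}

printf '%s\n' \
  '<ID>=apache_server1_12345678, <ID2>=blahblahblah, <ID3>=FUBAR=apache_server2_12345678|IPADDRESS=192.168.1.1|MSG=hello' \
  '<ID>=apache_server1_12345678, <ID3>=FUBAR| IPADDRESS=10.0.0.7|MSG=hi' |
  extract_ip_host
# Expected output:
#   192.168.1.1 server1
#   10.0.0.7 server1
```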
From: jplee3 on 20 May 2010 17:40
On May 20, 9:50 am, Michael Paoli <michael1...(a)yahoo.com> wrote:
> [snip]
> sed -ne 's/^.*<ID>=apache_\([^_][^_]*\).* IPADDRESS=\([.0-9][.0-9]*\).*$/\2 \1/p' << \__EOT__
> <ID>=apache_server1_12345678, <ID2>=blahblahblah, <ID3>=FUBAR=apache_server2_12345678| IPADDRESS=192.168.1.1|MSG=hello
> __EOT__
> 192.168.1.1 server1

Thanks all for the input! I've been so busy that I totally forgot to
mention that I figured it out after modifying Jon's solution and playing
around with some regex. I'm able to extract what I need at this point,
and it's very useful for extracting other data too.

Thanks all for the help!
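jplee3's original question also asked whether awk could do this. For comparison, a rough awk sketch of the same IP/host extraction. It is not from the thread: it swaps the sed substitution for POSIX awk's `match()`, which sets `RSTART` and `RLENGTH`. The helper name extract_ip_host_awk is hypothetical, and the line layout is assumed from jplee3's sample.

```shell
#!/bin/sh
# extract_ip_host_awk is a hypothetical helper; it assumes the same line
# layout as jplee3's sample. match() sets RSTART and RLENGTH (POSIX awk).
extract_ip_host_awk() {
  awk '
    match($0, /<ID>=apache_[^_]+/) {
      # 12 = length of "<ID>=apache_", so this keeps just the host part.
      host = substr($0, RSTART + 12, RLENGTH - 12)
      if (match($0, /IPADDRESS=[0-9.]+/)) {
        # 10 = length of "IPADDRESS=".
        ip = substr($0, RSTART + 10, RLENGTH - 10)
        print ip, host   # OFS (a space by default) separates the fields
      }
    }'
}

printf '%s\n' \
  '<ID>=apache_server1_12345678, <ID2>=blahblahblah, <ID3>=FUBAR=apache_server2_12345678|IPADDRESS=192.168.1.1|MSG=hello' |
  extract_ip_host_awk
# Expected output: 192.168.1.1 server1
```

For the tab delimiter jplee3 also mentioned, `print ip "\t" host` would replace the `print ip, host` line.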