Prev: Can a shell script determine if its output will be redirected?
Next: xgawk script for display xml tags and data
From: Hongyi Zhao on 27 Mar 2010 05:20

Hi all,

I've the following file which includes three fields in each line:

"IP_ADDRESS" "ISP_NAME" "DOMAIN_NAME"
"109.86.226.38" "-" "-"
"117.18.75.235" "SUNNYVISION LIMITED" "SUNNYVISIONDATACENTRE.COM"
"119.11.13.169" "-" "-"
"119.11.42.164" "-" "-"
"121.44.240.31" "INTERNET SERVICE PROVIDER" "ON.NET"
"122.155.3.145" "CAT TELECOM PUBLIC COMPANY LTD" "-"
"140.109.17.180" "T-SINICA.EDU.TW-NET" "-"
"145.100.100.190" "UVA-MASTER-SNE-NET" "-"
"149.9.0.57" "PSI" "BNA.COM"
"149.9.0.58" "PSI" "BNA.COM"
"149.9.0.59" "PSI" "BNA.COM"
"151.15.8.46" "ITALIA ONLINE S.P.A" "15-151.IOL.IT"
"151.16.191.218" "IUNET-BNET" "38-151.NET24.IT"
"151.21.86.208" "FREE INTERNET DIAL-UP SERVICES" "21-151.LIBERO.IT"
"151.23.7.196" "ITALIA ONLINE S.P.A" "PPP-POOL-23-0-10.IOL.IT"
"151.48.43.174" "IUNET-BNET" "48-151.NET24.IT"
"151.53.80.237" "IUNET-BNET" "38-151.NET24.IT"
"151.54.214.97" "IUNET-BNET" "38-151.NET24.IT"

Now, I want to delete some records from this file based on "ISP_NAME"
or "DOMAIN_NAME". I describe the details of my requirements as
follows:

1- If a record's "ISP_NAME" and "DOMAIN_NAME" fields are "-", delete
it from the file.

2- Based on the given IP_ADDRESS, say, 151.48.43.174, delete the
records which have the same "ISP_NAME" or "DOMAIN_NAME" as it has. In
this case, the following records should be deleted from the file:

"151.48.43.174" "IUNET-BNET" "48-151.NET24.IT"
"151.53.80.237" "IUNET-BNET" "38-151.NET24.IT"
"151.54.214.97" "IUNET-BNET" "38-151.NET24.IT"

Any hints on this issue?

BR.
--
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
From: Janis Papanagnou on 27 Mar 2010 05:46

Hongyi Zhao wrote:
> Now, I want to delete some records from this file based on "ISP_NAME"
> or "DOMAIN_NAME". I describe the details of my requirements as
> follows:
>
> 1- If a record's "ISP_NAME" and "DOMAIN_NAME" fields are "-", delete
> it from the file.

  awk '$NF !~ /"-"/ || $(NF-1) !~ /"-"/' your_file

> 2- Based on the given IP_ADDRESS, say, 151.48.43.174, delete the
> records which have the same "ISP_NAME" or "DOMAIN_NAME" as it has. In
> this case, the following records should be deleted from the file:

This requires to operate twice on the data, first to find the respective
name and then to remove all the addresses.

For the first task[*]...

  awk -v ipaddr="151.48.43.174" '$1 ~ ipaddr {print $2}' your_file

For the second task...

  awk -v ispname=... '$2 !~ ispname' your_file

You can combine those commands, e.g. using command substitution where
the variable ispname is set.

Janis

[*] This is, strictly speaking, not correct since the dots match any
character in the first field, but your data seems to allow for that
simplification.
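
Regarding the footnote about the dots: one way to sidestep the regex issue is to compare the first field as a plain string. The following is only a minimal sketch, assuming the first field is always the quoted address without spaces. The function name lookup_isp and the third sample record are made up for illustration and are not part of the original data. Note that with awk's default whitespace splitting, $2 holds only the first word of a multi-word name.

```shell
# Hypothetical sample data: two records from the original post plus one
# invented record that the regex match '$1 ~ ipaddr' would also select.
data=$(mktemp)
cat > "$data" <<'EOF'
"151.16.191.218" "IUNET-BNET" "38-151.NET24.IT"
"151.48.43.174" "IUNET-BNET" "48-151.NET24.IT"
"151x48y43z174" "BOGUS" "EXAMPLE.INVALID"
EOF

# lookup_isp IP FILE: print field 2 of the record whose first field is
# exactly "IP" (surrounding quotes included). This is a string
# comparison, so the dots in the address are taken literally.
lookup_isp() {
    awk -v ipaddr="$1" '$1 == "\"" ipaddr "\"" { print $2 }' "$2"
}

lookup_isp 151.48.43.174 "$data"
```

The printed value still carries its surrounding quotes, because the quotes are part of the whitespace-delimited field.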
From: Hongyi Zhao on 27 Mar 2010 06:49

On Sat, 27 Mar 2010 10:46:32 +0100, Janis Papanagnou
<janis_papanagnou(a)hotmail.com> wrote:

>This requires to operate twice on the data, first to find the respective
>name and then to remove all the addresses.
>
>For the first task[*]...
>
>  awk -v ipaddr="151.48.43.174" '$1 ~ ipaddr {print $2}' your_file
>
>For the second task...
>
>  awk -v ispname=... '$2 !~ ispname' your_file
>
>You can combine those commands, e.g. using command substitution where
>the variable ispname is set.
>
>Janis
>
>[*] This is, strictly speaking, not correct since the dots match any
>character in the first field, but your data seems to allow for that
>simplification.

Good, thanks a lot.

But we must consider the following case:

If the record corresponding to the given IP has the following
characteristic: one of these two fields, i.e. "ISP_NAME" or
"DOMAIN_NAME", has the value "-", then the second task becomes
dangerous, because matching on the value "-" may remove some records
that should not be deleted.

In order to solve this issue, I add the following requirement:

If the "ISP_NAME" of the record corresponding to the given IP has the
value "-", use the "DOMAIN_NAME" as the matching condition for the
deleting operation, and vice versa.

So how should the code be touched up?

BR.
--
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
From: Janis Papanagnou on 27 Mar 2010 07:49

Hongyi Zhao wrote:
> On Sat, 27 Mar 2010 10:46:32 +0100, Janis Papanagnou
> <janis_papanagnou(a)hotmail.com> wrote:
>
>> This requires to operate twice on the data, first to find the respective
>> name and then to remove all the addresses.
>>
>> For the first task[*]...
>>
>>   awk -v ipaddr="151.48.43.174" '$1 ~ ipaddr {print $2}' your_file
>>
>> For the second task...
>>
>>   awk -v ispname=... '$2 !~ ispname' your_file

I just noticed that the ISP_NAME can contain spaces, so the suggested
solution wouldn't work. Sorry. To fix that you can redefine the FS in
awk as " " (i.e. as the three characters quote, space, quote), and
remove the quotes from the search pattern if you're comparing field 2.

>> You can combine those commands, e.g. using command substitution where
>> the variable ispname is set.
>>
>> Janis
>>
>> [*] This is, strictly speaking, not correct since the dots match any
>> character in the first field, but your data seems to allow for that
>> simplification.
>
> Good, thanks a lot.
>
> But we must consider the following case:
>
> If the record corresponding to the given IP has the following
> characteristics: one of these two fields, i.e. "ISP_NAME" or
> "DOMAIN_NAME" has the value: "-", then the second task will be more
> dangerous because based on the value: "-", we may remove some records
> that should not be deleted.

This would just require an additional condition; make sure $2 is not "-".

> In order to solve this issue, I put the following requirement
> additionally:
>
> If the "ISP_NAME" of the record corresponding to the given IP has the
> value: "-", use the "DOMAIN_NAME" as the matching condition to do the
> deleting operations, and vice versa.

In the code

  awk -v ipaddr="151.48.43.174" '$1 ~ ipaddr {print $2}' your_file

you can print conditionally

  print (($2 !~ /"-"/) ? $2 : $3)

but you need both values (or some discriminator). So see below...

> So how should the code be touched up?

Return both values in the "first task"

  awk -v ipaddr= ... '$1 ~ ipaddr {print $2 SEP $3}'

with an appropriately chosen value for SEP, and adjust the condition
for the "second task"

  awk -v isp_and_dom=... '
    BEGIN { split(isp_and_dom,iad,SEP) }
    ($2 !~ /"-"/ && $2 !~ iad[1]) || ($2 ~ /"-"/ && $3 !~ iad[2])
  ' your_file

This requires to consider the space problem above and must be adjusted
accordingly, as mentioned. Note that the code could be further
simplified and it is of course untested.

Janis
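
The two tasks above, with FS redefined as quote-space-quote, might be put together roughly as follows. This is an untested-in-the-thread sketch under several assumptions, not the posters' final code: the function name delete_related is made up, a tab is assumed to be a safe SEP (i.e. no tabs or backslashes in the data), and "same ISP_NAME or DOMAIN_NAME" is read as "either name matches, ignoring names that are just -". It uses string equality rather than the regex match from the post.

```shell
# Hypothetical sample data, a subset of the records in the original post.
data=$(mktemp)
cat > "$data" <<'EOF'
"122.155.3.145" "CAT TELECOM PUBLIC COMPANY LTD" "-"
"140.109.17.180" "T-SINICA.EDU.TW-NET" "-"
"151.15.8.46" "ITALIA ONLINE S.P.A" "15-151.IOL.IT"
"151.48.43.174" "IUNET-BNET" "48-151.NET24.IT"
"151.53.80.237" "IUNET-BNET" "38-151.NET24.IT"
EOF

# delete_related IP FILE: print FILE without the records that share a
# non-"-" ISP_NAME or DOMAIN_NAME with the record for IP.
delete_related() {
    # First task: fetch both names, joined by SEP (a tab, assumed not to
    # occur in the data). With FS set to quote-space-quote, $1 keeps its
    # leading quote and $3 keeps its trailing quote.
    isp_and_dom=$(awk -F'" "' -v ipaddr="$1" -v SEP='\t' '
        $1 == "\"" ipaddr { print $2 SEP $3 }' "$2")

    # Second task: drop a record if one of its names equals the looked-up
    # name in the same column and that name is not "-"; keep the rest.
    awk -F'" "' -v isp_and_dom="$isp_and_dom" -v SEP='\t' '
        BEGIN { split(isp_and_dom, iad, SEP) }
        !(($2 != "-" && $2 == iad[1]) || ($3 != "-\"" && $3 == iad[2]))
    ' "$2"
}

delete_related 122.155.3.145 "$data"
```

With 122.155.3.145 as the given address, only that record is dropped; 140.109.17.180 survives even though its DOMAIN_NAME is also "-". If the address is not found, nothing is removed.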
From: Hongyi Zhao on 27 Mar 2010 22:35

On Sat, 27 Mar 2010 12:49:03 +0100, Janis Papanagnou
<janis_papanagnou(a)hotmail.com> wrote:

>I just noticed that the ISP_NAME can contain spaces, so the suggested
>solution wouldn't work. Sorry. To fix that you can re-define the FS in
>awk as " " (i.e. as three characters quote, space, quote), and remove
>the quotes from the search pattern if you're comparing field 2.

What about just using the quote as the field separator and using $2, $4
and $6 to catch the corresponding three fields?

$ echo '"121.44.240.31" "INTERNET SERVICE PROVIDER" "ON.NET"' | awk -F'"' '{print $2,$4,$6 }'
121.44.240.31 INTERNET SERVICE PROVIDER ON.NET

--
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
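
With the quote as field separator, the first requirement from the start of the thread (drop records whose ISP_NAME and DOMAIN_NAME are both "-") can be written with plain string comparisons on $4 and $6. A minimal sketch, assuming every line has exactly three double-quoted fields and no embedded quotes; the function name drop_unnamed is made up for illustration.

```shell
# Hypothetical sample data: the header plus a few records from the
# original post.
data=$(mktemp)
cat > "$data" <<'EOF'
"IP_ADDRESS" "ISP_NAME" "DOMAIN_NAME"
"109.86.226.38" "-" "-"
"121.44.240.31" "INTERNET SERVICE PROVIDER" "ON.NET"
"122.155.3.145" "CAT TELECOM PUBLIC COMPANY LTD" "-"
EOF

# drop_unnamed FILE: print FILE without the records whose ISP_NAME ($4)
# and DOMAIN_NAME ($6) are both "-". With -F'"' the quotes are consumed
# by the field splitting, so the comparison is against a bare "-".
drop_unnamed() {
    awk -F'"' '!($4 == "-" && $6 == "-")' "$1"
}

drop_unnamed "$data"
```

Records with only one of the two names set to "-" are kept, as is the header line.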