Prev: How to ls -lh on a FreeBSD?
Next: shell idiom to kick off background jobs and wait for completion
From: Hongyi Zhao on 25 Oct 2009 22:16 On Mon, 26 Oct 2009 03:25:57 +1100, Grant <g_r_a_n_t_(a)bugsplatter.id.au> wrote: > - - - > # binary search to suit Ed's script: > > lo = 1; hi = range > while (hi - lo > 1) { > mid = int((lo + hi) / 2) > if (start[mid] < curr) { lo = mid } else { hi = mid } > } > if (curr < start[hi]) --hi > > if (curr > end[hi]) { notFound[$0] } else { print $0,name[idx] } > - - - I replace your above code snip in Ed's script and get the following one: BEGIN{ FS="\t"; OFS="#"; scale=256 } function ip2nr(ip, nr,ipA) { # aaa.bbb.ccc.ddd split(ip,ipA,".") nr = (((((ipA[1] * scale) + ipA[2]) * scale) + ipA[3]) * scale) + ipA[4] return nr } NR==FNR { start[range] = ip2nr($1) end[range] = ip2nr($2) name[range] = $3" "$4 range++ next } { curr = ip2nr($0) lo = 1; hi = range while (hi - lo > 1) { mid = int((lo + hi) / 2) if (start[mid] < curr) { lo = mid } else { hi = mid } } if (curr < start[hi]) --hi if (curr > end[hi]) { notFound[$0] } else { print $0,name[idx] } } notFound[$0] } END { for (ip in notFound) { print ip,"Not Found." | "cat>&2" } } But, run the above script will give me the the following errors: awk: whatever.awk:27: } awk: whatever.awk:27: ^ syntax error Best regards. -- ..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
From: Hongyi Zhao on 25 Oct 2009 22:26 On Sun, 25 Oct 2009 09:18:46 -0500, Ed Morton <mortonspam(a)gmail.com> wrote: >Don't know about multi-threading, but if your file of IP addresses is >huge and your file of ranges and names isn't then this should be a lot >more efficient: In fact, the file of ranges and names in my case is more huge than the file of IP addresses. Best regards. -- ..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
From: Hongyi Zhao on 25 Oct 2009 22:39 On Sun, 25 Oct 2009 09:40:46 -0500, Ed Morton <mortonspam(a)gmail.com> wrote: >> BEGIN{ FS="\t"; OFS="#"; scale=256 } >> function ip2nr(ip, nr,ipA) { >> # aaa.bbb.ccc.ddd >> split(ip,ipA,".") >> nr = (((((ipA[1] * scale) + ipA[2]) * scale) + ipA[3]) * scale) + >> ipA[4] >> return nr >> } >> NR==FNR { >> start[FNR] = ip2nr($1) >> end[FNR] = ip2nr($2) >> name[FNR] = $3" "$4 >> range = FNR > >Change the above 4 lines to: > > start[range] = ip2nr($1) > end[range] = ip2nr($2) > name[range] = $3" "$4 > range++ > >to account for the first line of file2 containing a header rather than data. > >> next >> } >> { >> curr = ip2nr($0) >> for (idx=1; idx<=range; idx++) >> { >> if ((curr >= start[idx]) && (curr <= end[idx])) >> { >> print $0,name[idx] >> next >> } >> } >> notFound[$0] >> } >> END { >> for (ip in notFound) { >> print ip,"Not Found." | "cat>&2" >> } >> } >> >> Run it as: >> >> awk -f whatever.awk file2 file1 I've tested it, in my test, the file2 include 36490 IP non-overlap IP ranges and the file1 include 1627 lines IP addresses. The result shows that the above script will spend more time than your original one. Best regards. -- ..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
From: Grant on 25 Oct 2009 23:01 On Mon, 26 Oct 2009 10:16:22 +0800, Hongyi Zhao <hongyi.zhao(a)gmail.com> wrote: >On Mon, 26 Oct 2009 03:25:57 +1100, Grant ><g_r_a_n_t_(a)bugsplatter.id.au> wrote: > >> - - - >> # binary search to suit Ed's script: >> >> lo = 1; hi = range >> while (hi - lo > 1) { >> mid = int((lo + hi) / 2) >> if (start[mid] < curr) { lo = mid } else { hi = mid } >> } >> if (curr < start[hi]) --hi >> >> if (curr > end[hi]) { notFound[$0] } else { print $0,name[idx] } >> - - - > > >I replace your above code snip in Ed's script and get the following >one: > >BEGIN{ FS="\t"; OFS="#"; scale=256 } >function ip2nr(ip, nr,ipA) { > # aaa.bbb.ccc.ddd > split(ip,ipA,".") > nr = (((((ipA[1] * scale) + ipA[2]) * scale) + ipA[3]) * scale) + >ipA[4] > return nr >} >NR==FNR { > start[range] = ip2nr($1) > end[range] = ip2nr($2) > name[range] = $3" "$4 > range++ > next >} >{ > curr = ip2nr($0) > lo = 1; hi = range > while (hi - lo > 1) { > mid = int((lo + hi) / 2) > if (start[mid] < curr) { lo = mid } else { hi = mid } > } > if (curr < start[hi]) --hi > > if (curr > end[hi]) { notFound[$0] } else { print $0,name[idx] kill this one here? --> } > } > notFound[$0] >} >END { > for (ip in notFound) { > print ip,"Not Found." | "cat>&2" > } >} > >But, run the above script will give me the the following errors: > >awk: whatever.awk:27: } >awk: whatever.awk:27: ^ syntax error Oh come now, you got to play a bit :) And take a bit more care with your indentaing so you can see the blocks. Alright, I rewrote it. BEGIN { # FS = "\t" # this is whitespace, unnecessary? OFS = "#" # is this used? } function ip2nr(ip, k) # dotquad IP to number { # aaa.bbb.ccc.ddd split(ip, k, ".") return ((k[1] * 256 + k[2]) * 256 + k[3]) * 256 + k[4] } # load lookup table NR==FNR { start[++range] = ip2nr($1) # using one-based array end[range] = ip2nr($2) name[range] = $3" "$4 next } # process IPs { addr = ip2nr($0); lo = 1; hi = range # binary search while (hi - lo > 1) { mid = int((lo + hi) / 2) if (start[mid] < addr) { lo = mid } else { hi = mid } } # adjust to closest less than when no match (likely) if (addr < start[hi]) --hi if (addr > end[hi]) { # if addr past end of this block we found undefined IP print $0, "Not Found" | "cat>&2" } else { # we found a match print $0, name[hi] } } # end Untested but ported from working code. Grant. -- http://bugsplatter.id.au
From: Ed Morton on 25 Oct 2009 23:45
Grant wrote: > On Mon, 26 Oct 2009 10:16:22 +0800, Hongyi Zhao <hongyi.zhao(a)gmail.com> wrote: > >> On Mon, 26 Oct 2009 03:25:57 +1100, Grant >> <g_r_a_n_t_(a)bugsplatter.id.au> wrote: >> >>> - - - >>> # binary search to suit Ed's script: >>> >>> lo = 1; hi = range >>> while (hi - lo > 1) { >>> mid = int((lo + hi) / 2) >>> if (start[mid] < curr) { lo = mid } else { hi = mid } >>> } >>> if (curr < start[hi]) --hi >>> >>> if (curr > end[hi]) { notFound[$0] } else { print $0,name[idx] } >>> - - - >> >> I replace your above code snip in Ed's script and get the following >> one: >> >> BEGIN{ FS="\t"; OFS="#"; scale=256 } >> function ip2nr(ip, nr,ipA) { >> # aaa.bbb.ccc.ddd >> split(ip,ipA,".") >> nr = (((((ipA[1] * scale) + ipA[2]) * scale) + ipA[3]) * scale) + >> ipA[4] >> return nr >> } >> NR==FNR { >> start[range] = ip2nr($1) >> end[range] = ip2nr($2) >> name[range] = $3" "$4 >> range++ >> next >> } >> { >> curr = ip2nr($0) >> lo = 1; hi = range >> while (hi - lo > 1) { >> mid = int((lo + hi) / 2) >> if (start[mid] < curr) { lo = mid } else { hi = mid } >> } >> if (curr < start[hi]) --hi >> >> if (curr > end[hi]) { notFound[$0] } else { print $0,name[idx] > > kill this one here? --> } > >> } >> notFound[$0] >> } >> END { >> for (ip in notFound) { >> print ip,"Not Found." | "cat>&2" >> } >> } >> >> But, run the above script will give me the the following errors: >> >> awk: whatever.awk:27: } >> awk: whatever.awk:27: ^ syntax error > > Oh come now, you got to play a bit :) And take a bit more care > with your indentaing so you can see the blocks. > > Alright, I rewrote it. > > BEGIN { > # FS = "\t" # this is whitespace, unnecessary? Yes, it's necessary since the names contain non-tab whitespace. - Note 1. > OFS = "#" # is this used? Yes, OP wants # separating output fields, not a space. - Note 2. > } > > function ip2nr(ip, k) # dotquad IP to number > { > # aaa.bbb.ccc.ddd > split(ip, k, ".") > return ((k[1] * 256 + k[2]) * 256 + k[3]) * 256 + k[4] > } > # load lookup table > NR==FNR { > start[++range] = ip2nr($1) # using one-based array Yes, but it'll store the titles in start[1] which is undesirable. That's why I incremented range after the assignments. > end[range] = ip2nr($2) > name[range] = $3" "$4 See Note 1 above. > next > } > # process IPs > { > addr = ip2nr($0); lo = 1; hi = range > > # binary search > while (hi - lo > 1) { > mid = int((lo + hi) / 2) > if (start[mid] < addr) { > lo = mid > } > else { > hi = mid > } > } > > # adjust to closest less than when no match (likely) > if (addr < start[hi]) > --hi > > if (addr > end[hi]) { > # if addr past end of this block we found undefined IP > print $0, "Not Found" | "cat>&2" > } > else { > # we found a match > print $0, name[hi] See Note 2 above. Ed. > } > } > # end > > Untested but ported from working code. > > Grant. |