From: Grant on 25 Oct 2009 00:57

On Sun, 25 Oct 2009 09:56:52 +0800, Hongyi Zhao <hongyi.zhao(a)gmail.com> wrote:

>On Wed, 21 Oct 2009 02:47:54 -0500, Ed Morton <mortonspam(a)gmail.com> wrote:
>
>>$ cat tst.awk
>>BEGIN{ FS="\t"; OFS="#"; scale=(scale ? scale : 256) }
>>function ip2nr(ip,   nr,ipA) {
>>    # aaa.bbb.ccc.ddd
>>    split(ip,ipA,".")
>>    nr = (((((ipA[1] * scale) + ipA[2]) * scale) + ipA[3]) * scale) + ipA[4]
>>    return nr
>>}
>>NR==FNR { addrs[$0] = ip2nr($0); next }
>>FNR>1 {
>>    start = ip2nr($1)
>>    end = ip2nr($2)
>>    for (ip in addrs) {
>>        if ((addrs[ip] >= start) && (addrs[ip] <= end)) {
>>            print ip,$3" "$4
>>            delete addrs[ip]
>>        }
>>    }
>>}
>
>Another issue is that if file1 is a huge one, say with several thousand
>entries in it, the above process will be time consuming. So is it
>possible to revise the above awk script with multithreading to improve
>the efficiency?

I already posted a more efficient method: read the database file into memory, then binary search each IP to find the matching block and retrieve the database name string.

Grant.
-- 
http://bugsplatter.id.au
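[The ip2nr() function above packs the four dotted-quad octets into one integer, base 256 (with the default scale). A quick one-liner, a sketch using the same formula, shows 1.2.3.4 mapping to 16909060:]

```shell
# Convert a dotted-quad IP to a single integer, base 256,
# with the same arithmetic as ip2nr() above.
echo "1.2.3.4" | awk -F. '{ print (((($1 * 256) + $2) * 256 + $3) * 256) + $4 }'
# prints 16909060
```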
From: Kaz Kylheku on 25 Oct 2009 01:04

On 2009-10-25, Hongyi Zhao <hongyi.zhao(a)gmail.com> wrote:
> Another issue is that if file1 is a huge one, say with several thousand
> entries in it, the above process will be time consuming. So is it
> possible to revise the above awk script with multithreading to improve
> the efficiency?

What kind of machine are you using that a file of a mere several thousand entries causes a performance problem, yet threads somehow help?
From: Grant on 25 Oct 2009 01:38

On Sun, 25 Oct 2009 05:04:43 +0000 (UTC), Kaz Kylheku <kkylheku(a)gmail.com> wrote:

>What kind of machine are you using that a file of a mere several thousand
>entries causes a performance problem, yet threads somehow help?

Dunno about threads being applicable, but his lookup file has 300k records -- no point re-scanning those for each of the thousand input lines, which is what Ed's solution does.

Grant.
-- 
http://bugsplatter.id.au
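[Grant's read-once-then-binary-search idea could be sketched like this. This is an assumption about his approach, not his posted code; the file names ranges.txt and ips.txt are hypothetical, and the range database is assumed to be tab-separated ("start<TAB>end<TAB>name") and sorted by start address:]

```shell
# Sketch: load the range database into memory once, then binary-search
# each input IP for the greatest start address <= IP, and check that the
# IP also falls at or below that range's end address.
awk -F'\t' '
function ip2nr(ip,   ipA) {
    split(ip, ipA, ".")
    return (((ipA[1]*256 + ipA[2])*256 + ipA[3])*256) + ipA[4]
}
NR == FNR {                         # first file: the range database
    n++
    start[n] = ip2nr($1)
    end[n]   = ip2nr($2)
    name[n]  = $3
    next
}
{                                   # second file: one IP per line
    ip = ip2nr($1); lo = 1; hi = n
    while (lo < hi) {               # binary search: greatest start[] <= ip
        mid = int((lo + hi + 1) / 2)
        if (start[mid] <= ip) lo = mid; else hi = mid - 1
    }
    if (start[lo] <= ip && ip <= end[lo]) print $1 "#" name[lo]
}' ranges.txt ips.txt
```

[Each lookup then costs O(log n) comparisons instead of a scan over all 300k records, and the database is read exactly once.]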
From: Hongyi Zhao on 25 Oct 2009 03:34

On Sun, 25 Oct 2009 15:57:47 +1100, Grant <g_r_a_n_t_(a)bugsplatter.id.au> wrote:

>I already posted a more efficient method, read database file into
>memory, then binary search each IP to find matching block and
>retrieve the database name string.

Where did you post it -- on your personal webpage or blog? Any hints on the corresponding URL?

Best regards.
-- 
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.
From: Hongyi Zhao on 25 Oct 2009 03:36

On Sun, 25 Oct 2009 05:04:43 +0000 (UTC), Kaz Kylheku <kkylheku(a)gmail.com> wrote:

>What kind of machine are you using that a file of a mere several thousand
>entries causes a performance problem, yet threads somehow help?

The lookup IP database I use is also a huge one (373374 lines in it). Furthermore, I run this script under a Cygwin box.

Best regards.
-- 
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.