From: Grant on 25 Oct 2009 23:57

On Sun, 25 Oct 2009 22:45:29 -0500, Ed Morton <mortonspam(a)gmail.com> wrote:
....
>> # load lookup table
>> NR==FNR {
>>         start[++range] = ip2nr($1) # using one-based array
>
>Yes, but it'll store the titles in start[1] which is undesirable. That's
>why I incremented range after the assignments.

No, it's a one-based array so we can detect IP addr lower than the
first database IP block start. See the working code (I think, hard to
tell 'cos of the data text locale).

Grant.
-- 
http://bugsplatter.id.au

From: Grant on 26 Oct 2009 02:22

On Mon, 26 Oct 2009 13:35:08 +0800, Hongyi Zhao <hongyi.zhao(a)gmail.com> wrote:

>On Mon, 26 Oct 2009 15:08:37 +1100, Grant
....
>Another issue: it takes 6 or 7 seconds for you, while 37 seconds for
>me, to read 373375 or 373374 records, why?

Because I tested it on the fastest Slackware box on localnet here,
it has a Core2Duo CPU :)

Okay, here we preprocess the data to numeric addresses; on a slow
machine here the data table load went from 39 seconds down to 16 seconds.

grant(a)deltree:/home/common$ cat yyy
#!/usr/bin/gawk -f
#
# script to massage database file to speed loading
#
# run as ./yyy input_file > output_file
#
function ip2nr(ip,   k)
{
        # aaa.bbb.ccc.ddd
        split(ip, k, ".")
        return ((k[1] * 256 + k[2]) * 256 + k[3]) * 256 + k[4]
}

/^#/ { print; next }    # print comment lines (header info)
/^$/ { next }           # skip blanks

{
        # convert dotquad IP to numeric IP
        printf "%s\t%s\t%s\n", ip2nr($1), ip2nr($2), $3" "$4
}
# end

grant(a)deltree:/home/common$ head -5 datafile-nr
# StartIP       EndIP           Country Local
0               16777215        IANA CZ88.NET
16777216        20185087        IANA CZ88.NET
20185088        20250623        ÃÀ¹ú CZ88.NET
20250624        26869759        IANA CZ88.NET

grant(a)deltree:/home/common$ ./xxx2 datafile-nr file1
Read 373375 records in 16 seconds.
28.232.110.16        Ó¢¹ú ½£ÇÅ´óѧ
28.31.0.34           ÃÀ¹ú ÂíÈøÖîÈûÖÝÃ׵¶ûÈû¿Ë˹Ïؽ£ÇÅÊÐÂéÊ¡Àí¹¤Ñ§Ôº
28.6.224.103         ÃÀ¹ú ÂÞ¸ñ˹´óѧ
28.83.194.97         ÃÀ¹ú µÂ¿ËÈø˹´óѧ°Ä˹͡·ÖУ
28.83.194.98         ÃÀ¹ú µÂ¿ËÈø˹´óѧ°Ä˹͡·ÖУ
29.133.8.31          ÈðÊ¿ CZ88.NET
29.21.126.99         ÃÀ¹ú Rochester¿Æ¼¼Ñ§Ôº
129.25.11.27         ÃÀ¹ú Drexel
....
8.251.7.53           ÃÀ¹ú ÂíÈøÖîÈûÖÝÃ׵¶ûÈû¿Ë˹Ïؽ£ÇÅÊÐÂéÊ¡Àí¹¤Ñ§Ôº
95.251.249.86        Èðµä CZ88.NET
5.110.229.102        ¼ÓÄôó Î÷Ãɸ¥À×Ôó´óѧ

grant(a)deltree:/home/common$ cat xxx2
#!/usr/bin/gawk -f
#
# script to process IP addr
#
# run as $0 ip-block-2-name-table IP-addr-file
#
BEGIN {
        FS = "\t"
        format = "%-20s %s\n"
        started = systime()
}

# ip2nr() is called below but was not in the listing as posted;
# this is the same function as in yyy above
function ip2nr(ip,   k)
{
        split(ip, k, ".")
        return ((k[1] * 256 + k[2]) * 256 + k[3]) * 256 + k[4]
}

function nr2ip(nr,   j, k)
{
        for (j = 4; j > 0; j--) {
                k[j] = and(nr, 255); nr /= 256
        }
        return sprintf("%d.%d.%d.%d", k[1], k[2], k[3], k[4])
}

# show data read progress
NR==FNR && NR % 357 == 0 { printf "\rReading %d", NR }

# skip comment or blank lines
/^#|^$/ { next }

# load lookup table
NR==FNR {
        # reading datafile with numeric IP block start + end addr
        start[++range] = $1
        end[range] = $2
        name[range] = $3" "$4
        next
}

# show data records read
NR!=FNR && FNR == 1 {
        printf "\rRead %d records in %d seconds.\n", NR - 1, \
                systime() - started
}

# process IPs from second file
{
        a = ip2nr($0); lo = 1; hi = range
        # binary search
        while (hi - lo > 1) {
                mid = int((lo + hi) / 2)
                if (start[mid] < a) { lo = mid }
                else { hi = mid }
        }
        # adjust to closest less than when no exact match (likely)
        if (a < start[hi]) { --hi }
        # skip if IP undefined
        if (a > end[hi]) { next }
        printf format, nr2ip(a), name[hi]
}
# end

Now add datafile header lines starting with '#' so they're ignored,
then you have self-documenting data files.

Grant.
-- 
http://bugsplatter.id.au

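
Editor's sketch: the lookup in xxx2 converts each dotted-quad address to a single integer, binary-searches the sorted block start addresses for the last block starting at or below it, and then rejects the hit if the address lies past that block's end. The following is a minimal illustration of that idea in Python rather than gawk, so that it can be run standalone. It uses the standard library's bisect module in place of the hand-written search loop, and the three-row table and its block names are made up for the example; they are not taken from the real data file.

```python
import bisect


def ip2nr(ip):
    """Dotted quad 'a.b.c.d' -> integer, same arithmetic as ip2nr() in the awk scripts."""
    a, b, c, d = (int(part) for part in ip.split("."))
    return ((a * 256 + b) * 256 + c) * 256 + d


def nr2ip(nr):
    """Integer -> dotted quad, the inverse of ip2nr()."""
    return ".".join(str((nr >> shift) & 255) for shift in (24, 16, 8, 0))


# Hypothetical lookup table: (start, end, name), sorted by start address.
TABLE = [
    (ip2nr("1.0.0.0"), ip2nr("1.0.0.255"), "block-A"),
    (ip2nr("1.0.2.0"), ip2nr("1.0.3.255"), "block-B"),
    (ip2nr("10.0.0.0"), ip2nr("10.255.255.255"), "block-C"),
]
STARTS = [row[0] for row in TABLE]


def lookup(ip):
    """Return the block name for ip, or None if no block covers it."""
    nr = ip2nr(ip)
    # Index of the last block whose start address is <= nr.
    i = bisect.bisect_right(STARTS, nr) - 1
    if i < 0:
        return None  # address is below the first block start
    _start, end, name = TABLE[i]
    if nr > end:
        return None  # address falls in a gap between blocks
    return name


if __name__ == "__main__":
    for addr in ("1.0.0.7", "1.0.1.5", "0.255.255.255", "10.1.2.3"):
        print(addr, lookup(addr))
```

The two None cases correspond to the two checks in the awk script: an address below the first block start (the reason for the one-based array there) and an address beyond the end of the nearest block.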
From: Hongyi Zhao on 26 Oct 2009 03:41

On Mon, 26 Oct 2009 17:22:58 +1100, Grant <g_r_a_n_t_(a)bugsplatter.id.au> wrote:

>Because I tested it on the fastest Slackware box on localnet here,
>it has a Core2Duo CPU :)

But I use a Core i7 920 CPU with 6Gb memory to do the above test under
cygwin :)

>
>Okay, here we preprocess the data to numeric addresses, on a slow
>machine here data table load went from 39 seconds down to 16 seconds

[snipped]

Wonderful work. Thanks a lot. This is a key step towards the
practical application for huge IP lookup files and IP addresses.

Best regards.
-- 
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.

From: Grant on 26 Oct 2009 03:58

On Mon, 26 Oct 2009 15:41:38 +0800, Hongyi Zhao <hongyi.zhao(a)gmail.com> wrote:

>On Mon, 26 Oct 2009 17:22:58 +1100, Grant
><g_r_a_n_t_(a)bugsplatter.id.au> wrote:
>
>>Because I tested it on the fastest Slackware box on localnet here,
>>it has a Core2Duo CPU :)
>
>But I use a Core i7 920 CPU with 6Gb memory to do the above test under
>cygwin :)

You need a real OS --> linux, unix, *BSD, lots to choose from. I'm
on win7 desktop using PuTTY terminals to the linux boxes here --

>
>>
>>Okay, here we preprocess the data to numeric addresses, on a slow
>>machine here data table load went from 39 seconds down to 16 seconds
>
>[snipped]
>
>Wonderful work. Thanks a lot. This is a key step towards the
>practical application for huge IP lookup files and IP addresses.

Not under cygwin :)

Grant.
-- 
http://bugsplatter.id.au

From: Hongyi Zhao on 26 Oct 2009 05:09

On Mon, 26 Oct 2009 18:58:11 +1100, Grant <g_r_a_n_t_(a)bugsplatter.id.au> wrote:

>Not under cygwin :)

Thanks for your suggestion.

Another issue: in the private mail I sent you, I mentioned that the
IP lookup table used here is extracted from the following binary file:

http://update.cz88.net/soft/qqwry.rar

At the bottom of the following page there is demo code for reading IP
information from that binary file (the page is in Chinese, but the
demo code itself is in English):

http://lumaqq.linuxsir.org/article/qqwry_format_detail.html

So I want to know: is it possible to read the above binary file
directly to look up specific IP addresses?

Thanks in advance.
-- 
..: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.