grep improve performance [Shell]

From: Sven Mascheck on 25 Apr 2010 15:02

Eric wrote:

> fgrep (equivalent to grep -F) is faster

Not necessarily. The f doesn't mean "fast" but "fixed".
Some implementations are even slower because they
use a different algorithm which is not as well optimized.
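
To illustrate the "fixed" part: what -F reliably changes is how the pattern is interpreted, not how fast the search runs. A minimal sketch, assuming a throwaway sample file whose contents are made up for illustration:

```shell
# Hypothetical sample data; the paths are made up for illustration.
tmp=$(mktemp)
printf '%s\n' './src/a.c' './src/abc' > "$tmp"

# As a regular expression, "." matches any character, so both lines match.
regex_hits=$(grep -c 'a.c' "$tmp")

# With -F the pattern is a fixed string, so only the literal "a.c" matches.
fixed_hits=$(grep -F -c 'a.c' "$tmp")

echo "regex: $regex_hits, fixed: $fixed_hits"
rm -f "$tmp"
```

This matters for file names, which usually contain dots. Whether -F is also faster depends on the implementation, as noted above.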

From: Shurik on 25 Apr 2010 15:35

On Apr 25, 4:52 pm, Ed Morton <mortons...(a)gmail.com> wrote:
> On 4/25/2010 8:16 AM, Shurik wrote:
>
> > On Apr 25, 2:50 pm, Eric<e...(a)deptj.eu> wrote:
> >> On 2010-04-25, Shurik<shurikgef...(a)gmail.com> wrote:
>
> >>> Hi
>
> >>> I have a ksh script that runs grep many times (in a loop) on
> >>> the same file (a big file, ~7K lines).
>
> >>> Can I speed up the grep command, for example by loading the file into memory?
>
> >>> The grep command looks like this:
>
> >>> grep "$FILE_NAME" myfile | read A B
>
> >>> Thanks
>
> >> fgrep (equivalent to grep -F) is faster, but you'd still be running it a
> >> lot of times. Probably better to think of what your loop is really
> >> trying to achieve - I wouldn't be too surprised if it was a case for
> >> awk. Perl may also be a reasonable idea (it can keep the file in memory)
> >> but it's out of scope for this newsgroup.
>
> >> If you want any useful help from here I think you need to give us the
> >> whole loop.
>
> >> E.
>
> > The script is:
> > #!/bin/ksh
> > SourceFile=$1
> > TargetFile=$2
> > TargetHost=HP1
> > SourceHost=SUN2
>
> > exec 3<${SourceFile}
> > while read -u3 Line
> > do
> >     OLD_IFS="${IFS}"
> >     IFS="|"
> >     echo "$Line" | read PERM SIZE File
>
> >     grep "|${File}$" ${TargetFile} | read TargetPerm TargetSize XXXX
> >     IFS="${OLD_IFS}"
>
> >     if [ "${TargetSize}" = "" ]
> >     then
> >         echo "${File} MISSING on ${TargetHost}"
> >     elif [ ${SIZE} -ne ${TargetSize} ]
> >     then
> >         echo "${File} SIZE is different on ${TargetHost}"
> >     elif [ "${TargetPerm}" != "${PERM}" ]
> >     then
> >         echo "${File} PERMISSION is different on ${TargetHost}"
> >     fi
>
> > done
>
> > The target and source files contain 7K lines:
>
> > -rw-r--r--|214890729|./ACEXML/apps/svcconf/ACEXML_XML_Svc_Conf_Parser.pc.in
> > -rw-r--r--|1370781355|./ACEXML/apps/svcconf/ACEXML_XML_Svc_Conf_Parser.bor
> > -rw-r--r--|3618598382|./ACEXML/apps/svcconf/ACEXML_XML_Svc_Conf_Parser_Static.vcproj
> > -rw-r--r--|983012176|./ACEXML/apps/svcconf/ACEXML_XML_Svc_Conf_Parser.vcproj
>
> Try this (untested):
>
> awk -v targetHost="$TargetHost" -F'|' '
> NR==FNR { perm[$3]=$1; size[$3]=$2; next }
> $1 != perm[$3] { print $3,"PERMISSION is different on",targetHost }
> $2 != size[$3] { print $3,"SIZE is different on",targetHost }
> { delete perm[$3] }
> END { for (file in perm) print file,"MISSING on",targetHost }
> ' "$SourceFile" "$TargetFile"
>
> Regards,
>
> Ed.

Ed, thanks a lot !!!

Before your help

real 1m10.62s
user 0m57.35s
sys 0m17.33s

Now it takes

real 0m0.10s
user 0m0.09s
sys 0m0.00s
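
For anyone who wants to try the single-pass awk approach on a small input first, here is a minimal sketch. It is a variant of the quoted script, not the script itself: it assumes you want to skip files that exist only on the target (the original ksh loop never looked at those) and to report at most one difference per file, in the same order as the original elif chain. The sample listings are made up; only the perm|size|path layout and the host name HP1 come from the thread.

```shell
# Hypothetical sample listings in the thread's perm|size|path format.
src=$(mktemp); tgt=$(mktemp)
printf '%s\n' '-rw-r--r--|100|./a' '-rw-r--r--|200|./b' \
              '-rwxr-xr-x|300|./c' '-rw-r--r--|400|./d' > "$src"
printf '%s\n' '-rw-r--r--|100|./a' '-rw-r--r--|999|./b' \
              '-rw-r--r--|300|./c' '-rw-r--r--|500|./e' > "$tgt"

report=$(awk -v targetHost="HP1" -F'|' '
# First file (source): remember permission and size per path.
NR==FNR       { perm[$3]=$1; size[$3]=$2; next }
# Target-only path: the original loop ignored these, so skip it.
!($3 in perm) { next }
# Size is checked before permission, as in the original elif chain.
$2 != size[$3]                   { print $3, "SIZE is different on", targetHost }
$2 == size[$3] && $1 != perm[$3] { print $3, "PERMISSION is different on", targetHost }
# Seen on the target, so it is not missing.
{ delete perm[$3] }
# Whatever is left was in the source but never seen on the target.
END { for (file in perm) print file, "MISSING on", targetHost }
' "$src" "$tgt")

printf '%s\n' "$report"
rm -f "$src" "$tgt"
```

The speedup comes from reading each file once and doing the lookups in memory, instead of starting one grep process per source line.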

From: Andrew McDermott on 26 Apr 2010 05:34

Stachu 'Dozzie' K. wrote:

> On 2010-04-25, Shurik <shurikgefter(a)gmail.com> wrote:
>>> > I have a ksh script that runs grep many times (in a loop) on
>>> > the same file (a big file, ~7K lines).
>>>
>>> > Can I speed up the grep command, for example by loading the file into memory?
> [...]
>
>> exec 3<${SourceFile}
>> while read -u3 Line
>> do
>>     OLD_IFS="${IFS}"
>>     IFS="|"
>>     echo "$Line" | read PERM SIZE File
>>
>>     grep "|${File}$" ${TargetFile} | read TargetPerm TargetSize XXXX
>
> For each line from $SourceFile you're running grep just to read
> $TargetPerm and $TargetSize. Just give up and write it in a language
> that has some kind of hashmap: AWK, or possibly Perl.
>

ksh93 does allow associative arrays:

typeset -A TargetPerms TargetSizes
exec 3<"${TargetFile}"
while read -u3 perm size file
do
    # Index each target entry by file name
    TargetPerms[$file]=$perm
    TargetSizes[$file]=$size
done

....

exec 3<"${SourceFile}"
while read -u3 perm size file
do
    # Look up the target entry in memory instead of running grep
    targetPerm=${TargetPerms[$file]}
    targetSize=${TargetSizes[$file]}
    # processing here
done

(IFS variable omitted)

But it occurs to me that if you need to change IFS you are using the wrong
language. If the awk solution works, use it.

Andrew
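
A self-contained version of the associative-array idea might look like the sketch below. It assumes ksh93 or bash 4+ (both accept typeset -A), uses made-up sample data, and sets IFS only for the duration of each read so that nothing has to be saved and restored. Only the MISSING check is shown; the size and permission checks would go in the same place.

```shell
# Sketch for ksh93 or bash 4+; the sample data is hypothetical.
src=$(mktemp); tgt=$(mktemp)
printf '%s\n' '-rw-r--r--|100|./a' '-rw-r--r--|200|./b' > "$src"
printf '%s\n' '-rw-r--r--|100|./a' > "$tgt"

typeset -A TargetPerms TargetSizes

# IFS='|' applies to each read only, so the global IFS is untouched.
while IFS='|' read -r perm size file
do
    TargetPerms[$file]=$perm
    TargetSizes[$file]=$size
done < "$tgt"

result=""
while IFS='|' read -r perm size file
do
    # In-memory lookup replaces the per-line grep.
    targetSize=${TargetSizes[$file]}
    if [ -z "$targetSize" ]
    then
        result="${result}${file} MISSING"
    fi
done < "$src"

echo "$result"
rm -f "$src" "$tgt"
```

Redirecting the file into the loop with "done < file" avoids the explicit exec 3< and read -u3, which is a stylistic choice rather than a requirement.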

From: stan on 27 Apr 2010 19:31

Shurik wrote:
> Hi
>
> I have a ksh script that runs grep many times (in a loop) on
> the same file (a big file, ~7K lines).
>
> Can I speed up the grep command, for example by loading the file into memory?
>
> The grep command looks like this:
>
> grep "$FILE_NAME" myfile | read A B

I don't know what your native language or preferred locale is, but I've
noticed a huge improvement by telling grep I'm dealing with simple
non-Unicode text. If you are satisfied with ASCII, try

LC_ALL=C grep "$FILE_NAME" myfile

When many systems started defaulting to UTF-8 I noticed a big drop in grep
speed, and forcing things back to a simpler time seems to comfort grep :)
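
A quick way to check that the locale override does not change what is matched on your data, sketched with a made-up sample file. Any speed difference depends on the grep build and the input, so this only compares the results.

```shell
# Hypothetical sample file in the thread's perm|size|path format.
tmp=$(mktemp)
printf '%s\n' '-rw-r--r--|100|./a' '-rw-r--r--|200|./b' > "$tmp"

# The LC_ALL=C prefix applies to that single grep process only.
default_hits=$(grep -c '|./b$' "$tmp")
c_hits=$(LC_ALL=C grep -c '|./b$' "$tmp")

echo "default: $default_hits, C locale: $c_hits"
# To compare speed on a real file, prefix each grep with "time".
rm -f "$tmp"
```

If the file names contain non-ASCII characters, compare the two counts on the real file before relying on the C locale.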
