From: Shurik on 25 Apr 2010 15:46

Hi

I have the command below (it runs on HP/Sun/AIX servers):

    cd ${DIRECTORY_TO_CHECK}

    find . -follow -type f ! -name "*.css " ! -name "*.jpg" ! -name "*.hpp" \
        ! -name "*.gif" ! -name "*.c" ! -name "*.cpp" ! -name "*.txt" \
        ! -name "*.log" ! -name "*.java" ! -name "*.h" ! -name "*.html" \
        -perm -o+r -exec ls -la {} \; -exec cksum {} \; |
        awk '{ A= $1;B=$9;getline;C=$1; print A"|"C"|"B}'

It produces a file with the following structure:

    -rw-r--r--|214890729|./ACEXML/appans/svcconf/ACEXML_XML_Svc_Conf_Parser.pc.in
    -rw-r--r--|1370781355|./ACEXML/apps/svcconf/ACEXML_XML_Svc_Conf_Parser.bor
    -rw-r--r--|3618598382|./ACEXML/apps/svcconf/ACEXML_XML_Svc_Conf_Parser_Static.vcproj

The find takes a lot of time, can I change something in order to improve performance?
From: Stephane CHAZELAS on 25 Apr 2010 16:09

2010-04-25, 12:46(-07), Shurik:
[...]
> The find takes a lot of time, can I change something in order to
> improve performance?

What is taking time is executing 2 commands per file.

    mkfifo ls.fifo
    find -L . -type f ! -name "*.css " ! -name "*.jpg" ! -name \
      "*.hpp" ! -name "*.gif" ! -name "*.c" ! -name "*.cpp" ! -name \
      "*.txt" ! -name "*.log" ! -name "*.java" ! -name "*.h" ! -name \
      "*.html" -perm -o+r -exec sh -c '
      ls -lLd "$@" >&3 & cksum "$@" & wait' sh {} + 3> ls.fifo |
      paste ls.fifo - | awk -vOFS='|' '{print $1,$10,$9}'

(Note that with -L (formerly -follow), -type f returns true for symlinks to regular files (unless that file has already been accounted for), and cksum most probably does the checksum of the pointed-to file, so you probably want the -L option to ls (without which it won't work anyway because of the "-> ..." extra fields in ls output). Also, that solution won't work for filenames containing blanks or newline characters.)

--
Stéphane
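An aside on the "2 commands per file" point: the sketch below is not from the thread. It only counts how many helper processes find starts when -exec is terminated with \; versus +. The scratch directory and the file names are made up for the illustration, and mktemp -d is assumed to be available (it is not required by POSIX).

```shell
#!/bin/sh
# Hypothetical scratch directory with 50 small files (illustrative names).
demo_dir=$(mktemp -d)
i=1
while [ "$i" -le 50 ]; do
    echo "data $i" > "$demo_dir/file$i.dat"
    i=$((i + 1))
done

# "\;" form: find starts one helper process for every file it finds.
per_file=$(find "$demo_dir" -type f -exec sh -c 'echo run' sh {} \; |
    wc -l | tr -d ' ')

# "+" form: find hands as many file names as fit to each helper process.
batched=$(find "$demo_dir" -type f -exec sh -c 'echo run' sh {} + |
    wc -l | tr -d ' ')

echo "helper runs, one exec per file: $per_file"
echo "helper runs, batched exec:      $batched"

rm -rf "$demo_dir"
```

With the original command, each of those per-file runs happens twice (once for ls, once for cksum), which is the cost the reply above is pointing at.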
From: Shurik on 26 Apr 2010 08:34

On Apr 25, 11:09 pm, Stephane CHAZELAS <stephane_chaze...(a)yahoo.fr> wrote:
> What is taking time is executing 2 commands per file.
> [...]

Stephane, thanks a lot, but I didn't get any output from your command :(
From: Shurik on 26 Apr 2010 16:25

On Apr 26, 3:34 pm, Shurik <shurikgef...(a)gmail.com> wrote:
> Stephane, thanks a lot, but I didn't get any output from your command :(

I split my find into the following code:

    TEMP_FILE=/tmp/check_3th_$$

    cd ${DIRECTORY_TO_CHECK}

    find . -follow -type f ! -name "*.css " ! -name "*.jpg" ! -name "*.hpp" \
        ! -name "*.gif" ! -name "*.c" ! -name "*.cpp" ! -name "*.txt" \
        ! -name "*.log" ! -name "*.java" ! -name "*.h" ! -name "*.html" \
        -perm -o+r > ${TEMP_FILE}

    cat ${TEMP_FILE} | xargs -n 20 cksum > ${TEMP_FILE}_1

    cat ${TEMP_FILE} | xargs -n 1 ls -la > ${TEMP_FILE}_2
    paste ${TEMP_FILE}_1 ${TEMP_FILE}_2 | awk '{ print $4"|"$1"|"$3}'

    rm -f ${TEMP_FILE}_1 ${TEMP_FILE}_2 ${TEMP_FILE}

Before the split it took 9 minutes to run; after the split it takes 2 minutes. Can I still improve performance?
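One possible next step, offered only as a sketch and not something proposed in the thread: the ls stage still starts one process per file (xargs -n 1). Batching it the way cksum is batched means paste can no longer be trusted to line the two files up, because ls sorts the names it is given. The version below pairs the two outputs by file name in awk instead. It assumes file names without blanks or newlines (the limitation already noted above), an ls -l layout with the name in field 9 (hence LC_ALL=C), and a throwaway tree whose names are invented for the demo.

```shell
#!/bin/sh
# Hypothetical scratch tree; the names are illustrative and contain no blanks.
work=$(mktemp -d)
mkdir "$work/tree"
printf 'hello\n' > "$work/tree/b.dat"
printf 'world\n' > "$work/tree/a.dat"
chmod 644 "$work/tree/a.dat" "$work/tree/b.dat"
cd "$work/tree" || exit 1

# Fixed locale so the ls -l date takes three fields and the name is field 9.
LC_ALL=C
export LC_ALL

find . -type f -perm -o+r > "$work/list"
xargs cksum  < "$work/list" > "$work/sums"    # fields: checksum size name
xargs ls -ld < "$work/list" > "$work/perms"   # field 1: mode, field 9: name

# While reading the first file, remember each checksum under its file name.
# While reading the second file, look the checksum up by name and print.
report=$(awk 'FILENAME == ARGV[1] { sum[$3] = $1; next }
              ($9 in sum)         { print $1 "|" sum[$9] "|" $9 }' \
             "$work/sums" "$work/perms")
printf '%s\n' "$report"

cd / && rm -rf "$work"
```

Keying on the name costs one awk array entry per file, but it removes the dependence on both tools emitting lines in the same order.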
From: Jon LaBadie on 26 Apr 2010 21:43

Shurik wrote:
> [...] Can I still improve performance?

Separating the file finding and name exclusion "may" help, particularly on a multi-cpu system. E.g.

    find . -follow -type f -perm -o+r |
      grep -E -v '(\.css|\.jpg|\.hpp|\.gif ... |\.html)$' > ${TEMP_FILE}

But I suspect that cksum is using the bulk of your time. It might be worthwhile to use 'time' to check how long each statement takes so you know where to try and optimize.
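A small sketch of the grep-based exclusion, run on made-up input so it can be tried without touching a real tree. The extension list is copied from the -name tests earlier in the thread; that this is the list the elided pattern above was meant to cover is an assumption, as are the sample paths.

```shell
#!/bin/sh
# Extensions to exclude, taken from the -name tests earlier in the thread.
pattern='\.(css|jpg|hpp|gif|c|cpp|txt|log|java|h|html)$'

# Hypothetical find output used as sample input.
sample='./src/main.c
./src/main.cpp
./docs/readme.txt
./lib/Parser.vcproj
./lib/Parser.pc.in
./bin/tool.chm'

# -v keeps only the lines that do NOT end in one of the listed extensions.
kept=$(printf '%s\n' "$sample" | grep -E -v "$pattern")
printf '%s\n' "$kept"
```

To follow the timing advice, each stage of the split script can be run under time on its own, for instance the find, the xargs cksum and the xargs ls steps separately, before deciding which one is worth optimizing.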