Prev: FAQ 4.74 How do I print out or copy a recursive data structure?
Next: Where to get the document for a library function in command line (linux)?
From: Jorge on 20 May 2010 15:19 I have a Linux directory full of shell scripts (~100 scripts with average size of ~60 lines) with many of them calling other scripts as depencencies. Not being the author of the scripts and with the immediate need to become familiar with the overall scope of this script directory, I needed to create a cross- reference chart that shows what script depends on what other script. My Perl script (below) works in terms of doing what I need, however, IMO, it appears to run a bit slow. I timed it at ~3.5 sec. Possibly that is it's max performance but something tells me (from other Perl script experiences) this one just isn't up to speed. I would appreciate any ideas or comments. Thanks, Jorge #!/usr/bin/perl -l use strict; use warnings; &scripts_dir('/some/dir/that/holds/scripts'); sub scripts_dir { # set some vars my $dir = shift; my(@paths, $paths, @scripts, $scripts, $string); my($lines, $leaf, $header, $header2); local($_); # check dir can be opened for read unless (opendir(DIR, $dir)) { die "can't open $dir $!\n"; closedir(DIR); return; } # # read dir, skip over system files and bak files # build array of script names # build array of full-path script names # foreach (readdir(DIR)) { next if $_ eq '.' || $_ eq '..' || $_ =~ /\.bak$/; push(@scripts, $_); $paths = $dir."/".$_; push(@paths, $paths); } closedir(DIR); # open ouput file, format and print headers open OUT, ">output" or die "cannot open output for writing $! ... \n"; $header = sprintf("%-25s%-25s%s", "SCRIPT NAME", "FOUND IN", "USAGE"); $header2 = sprintf("%-25s%-25s%s", "===========", "========", "====="); print OUT $header; print OUT $header2; # loop through each script name foreach $scripts(@scripts){ # loop through each script in directory foreach my $paths(@paths){ # get last leaf of script being searched -- # if it matches itself; skip $leaf = get_leaf($paths); if($scripts eq $leaf) { next;} # open each script for searching open F, "$paths" or die "cannot open $paths for reading $! ...\n"; while(my $lines = <F>) { # -l switch in place chomp($lines); # search for matches to the commonly-used command syntax if($lines =~ /\$\($scripts / || $lines =~ /\` $scripts /){ # format to line up with headers $string = sprintf("%-25s%-25s%s", $scripts, $leaf, $lines); # print to file print OUT $string; } } } } # close I/O streams close(F); close(OUT); } #====================== # subroutine get_leaf #====================== sub get_leaf { # get arg(s) my($pathname) = @_; # split on leafs my @split_pathname = split( /[\\\/]/, $pathname); # grab last leaf my $leaf = pop( @split_pathname ); # return bare script name (leaf) return( $leaf ); } __END__
From: Uri Guttman on 20 May 2010 15:47 >>>>> "J" == Jorge <awkster(a)yahoo.com> writes: J> #!/usr/bin/perl -l J> use strict; J> use warnings; good J> &scripts_dir('/some/dir/that/holds/scripts'); don't call subs with & as it isn't needed with () and it is perl4 style. there are other subtle issues that can bite you. J> sub scripts_dir { J> # set some vars J> my $dir = shift; J> my(@paths, $paths, @scripts, $scripts, $string); J> my($lines, $leaf, $header, $header2); don't declare vars before you need them. in many cases you can declare then when first used. J> local($_); why? actually i recommend against using $_ as much as possible (it does have its places) so you can used named vars which are easier to read and make the code better J> # check dir can be opened for read J> unless (opendir(DIR, $dir)) { use a lexical handle unless (opendir(my $dirh, $dir)) { J> die "can't open $dir $!\n"; J> closedir(DIR); why close the handle if it never opened? J> return; you can just exit here instead. J> } better yet, use File::Slurp's read_dir. J> # J> # read dir, skip over system files and bak files J> # build array of script names J> # build array of full-path script names J> # J> foreach (readdir(DIR)) { J> next if $_ eq '.' || $_ eq '..' || $_ =~ /\.bak$/; read_dir skips . and .. for you. J> push(@scripts, $_); J> $paths = $dir."/".$_; J> push(@paths, $paths); J> } J> closedir(DIR); that can all be done so much simpler like this (untested): my @paths = map "$dir/$_", grep !/\.bak$/, read_dir $dir ; J> # open ouput file, format and print headers J> open OUT, ">output" or die "cannot open output for writing $! J> ... ditto for lexical handles J> \n"; J> $header = sprintf("%-25s%-25s%s", "SCRIPT NAME", "FOUND IN", J> "USAGE"); J> $header2 = sprintf("%-25s%-25s%s", "===========", "========", J> "====="); you use the same sprintf format three times. put that into a variable above so you can change it in one place. J> print OUT $header; J> print OUT $header2; no need for two print calls. print OUT $header, $header2 ; J> # loop through each script name J> foreach $scripts(@scripts){ foreach my $scripts (@scripts) { that is how you can declare vars locally. and use more horizontal whitespace. others need to read your code so think about them when you write it. and YOU are an other too! :) i noticed this below but the point is true here. don't use plural names for singular things. $scripts has a single script name J> # loop through each script in directory J> foreach my $paths(@paths){ you used my there. why not in the previous loop? J> # get last leaf of script being searched -- J> # if it matches itself; skip J> $leaf = get_leaf($paths); J> if($scripts eq $leaf) { next;} slightly faster as you don't need a block entry on next. also a better style for simple flow control like this. next if $scripts eq $leaf ; J> # open each script for searching J> open F, "$paths" or die "cannot open $paths for reading don't quote scalar vars as it can lead to subtle bugs. it is not needed here. J> $! ...\n"; J> while(my $lines = <F>) { $lines is a single line so make that singular. using a plural name implies an array or array ref J> # -l switch in place that only affects one liners using -p or -n. no effect on regular code J> chomp($lines); J> # search for matches to the commonly-used command J> syntax J> if($lines =~ /\$\($scripts / || $lines =~ /\` J> $scripts /){ this doesn't make sense. are you checking if the current script refers to itself?? J> # format to line up with headers J> $string = sprintf("%-25s%-25s%s", $scripts, $leaf, J> $lines); J> # print to file J> print OUT $string; since you just print the string, you can use printf directly. J> } J> } J> } J> } J> # close I/O streams J> close(F); J> close(OUT); J> } J> #====================== J> # subroutine get_leaf J> #====================== J> sub get_leaf J> { J> # get arg(s) J> my($pathname) = @_; J> # split on leafs J> my @split_pathname = split( /[\\\/]/, $pathname); gack!! File::Basename does this for you and simpler. J> # grab last leaf J> my $leaf = pop( @split_pathname ); you could just return the popped value directly. hell, you can do that in the split line too: return( (split( /[\\\/]/, $pathname)[-1] ); as for speedups, i can't help much since i don't have your input data. nothing seems oddly slow looking but your core loops are deep and can be done better. slurping in the entire file (File::Slurp) and scanning for those lines in one regex would be noticeably faster. and your core logic is suspect as it doesn't seem to check for calling other scripts from a given one. uri -- Uri Guttman ------ uri(a)stemsystems.com -------- http://www.sysarch.com -- ----- Perl Code Review , Architecture, Development, Training, Support ------ --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
From: J. Gleixner on 20 May 2010 16:02 Jorge wrote: > I have a Linux directory full of shell scripts (~100 scripts with > average > size of ~60 lines) with many of them calling other scripts as > depencencies. > > Not being the author of the scripts and with the immediate need to > become familiar with > the overall scope of this script directory, I needed to create a cross- > reference chart > that shows what script depends on what other script. > > My Perl script (below) works in terms of doing what I need, however, > IMO, it appears to run a bit slow. I timed it at ~3.5 sec. Possibly > that is it's max performance but something tells me (from other Perl > script experiences) this one just isn't up to speed. 3 seconds isn't fast enough??.. just write it, run it, and move on.. sheesh.. > > I would appreciate any ideas or comments. > > Thanks, Jorge > > #!/usr/bin/perl -l > > use strict; > use warnings; > > &scripts_dir('/some/dir/that/holds/scripts'); Remove the '&' there. > > sub scripts_dir { > > # set some vars > my $dir = shift; > my(@paths, $paths, @scripts, $scripts, $string); > my($lines, $leaf, $header, $header2); > local($_); Declare your vars in the smallest possible scope. > > # check dir can be opened for read > unless (opendir(DIR, $dir)) { > die "can't open $dir $!\n"; After die is called, nothing after this point is called.. > closedir(DIR); > return; > } > > # > # read dir, skip over system files and bak files > # build array of script names > # build array of full-path script names > # > foreach (readdir(DIR)) { > next if $_ eq '.' || $_ eq '..' || $_ =~ /\.bak$/; > push(@scripts, $_); > $paths = $dir."/".$_; > push(@paths, $paths); > } > closedir(DIR); > > # open ouput file, format and print headers > open OUT, ">output" or die "cannot open output for writing $! ... > \n"; use 3 argument open. open( my $out, '>', 'output' ) || die "can't open output: $!"; > $header = sprintf("%-25s%-25s%s", "SCRIPT NAME", "FOUND IN", > "USAGE"); > $header2 = sprintf("%-25s%-25s%s", "===========", "========", > "====="); > print OUT $header; > print OUT $header2; Just use printf. printf OUT "%-25s%-25s%s\n", "SCRIPT NAME", "FOUND IN", "USAGE"; or printf $out... > > # loop through each script name > foreach $scripts(@scripts){ for my $script( @scripts ) > > # loop through each script in directory > foreach my $paths(@paths){ $path seems better than $paths > > # get last leaf of script being searched -- > # if it matches itself; skip > $leaf = get_leaf($paths); my $leaf = ( split( /\//, $paths ) )[-1]; > if($scripts eq $leaf) { next;} next if $scripts eq $leaf; > > # open each script for searching > open F, "$paths" or die "cannot open $paths for reading > $! ...\n"; open( my $file, '<', $path ) || die "can't open $path: $!; > > while(my $lines = <F>) { my $line instead of my $lines. > > # -l switch in place > chomp($lines); > > # search for matches to the commonly-used command > syntax > if($lines =~ /\$\($scripts / || $lines =~ /\` > $scripts /){ > > # format to line up with headers > $string = sprintf("%-25s%-25s%s", $scripts, $leaf, > $lines); > > # print to file > print OUT $string; again printf > } > } Should close F or $file here. > } > } > # close I/O streams > close(F); ^^^ should be just after the end of the while.. above. > close(OUT); > } > > #====================== > # subroutine get_leaf > #====================== > sub get_leaf > { > # get arg(s) > my($pathname) = @_; > > # split on leafs > my @split_pathname = split( /[\\\/]/, $pathname); > > # grab last leaf > my $leaf = pop( @split_pathname ); > > # return bare script name (leaf) > return( $leaf ); > } > > __END__ > >
From: Jorge on 20 May 2010 16:35 I wasn't expecting a full critique but I'll take it -- learning is good so thanks for your effort and time. Most of your points are obvious (once read) and some I will have to digest. Thanks again On May 20, 12:47 pm, "Uri Guttman" <u...(a)StemSystems.com> wrote: > >>>>> "J" == Jorge <awks...(a)yahoo.com> writes: > > J> #!/usr/bin/perl -l > > J> use strict; > J> use warnings; > > good > > J> &scripts_dir('/some/dir/that/holds/scripts'); > > don't call subs with & as it isn't needed with () and it is perl4 > style. there are other subtle issues that can bite you. > > J> sub scripts_dir { > > J> # set some vars > J> my $dir = shift; > J> my(@paths, $paths, @scripts, $scripts, $string); > J> my($lines, $leaf, $header, $header2); > > don't declare vars before you need them. in many cases you can declare > then when first used. > > J> local($_); > > why? actually i recommend against using $_ as much as possible (it does > have its places) so you can used named vars which are easier to read and > make the code better > > J> # check dir can be opened for read > J> unless (opendir(DIR, $dir)) { > > use a lexical handle > > unless (opendir(my $dirh, $dir)) { > > J> die "can't open $dir $!\n"; > J> closedir(DIR); > > why close the handle if it never opened? > > J> return; > > you can just exit here instead. > > J> } > > better yet, use File::Slurp's read_dir. > > J> # > J> # read dir, skip over system files and bak files > J> # build array of script names > J> # build array of full-path script names > J> # > J> foreach (readdir(DIR)) { > J> next if $_ eq '.' || $_ eq '..' || $_ =~ /\.bak$/; > > read_dir skips . and .. for you. > > J> push(@scripts, $_); > J> $paths = $dir."/".$_; > J> push(@paths, $paths); > J> } > J> closedir(DIR); > > that can all be done so much simpler like this (untested): > > my @paths = map "$dir/$_", grep !/\.bak$/, read_dir $dir ; > > J> # open ouput file, format and print headers > J> open OUT, ">output" or die "cannot open output for writing $! > J> ... > > ditto for lexical handles > > J> \n"; > J> $header = sprintf("%-25s%-25s%s", "SCRIPT NAME", "FOUND IN", > J> "USAGE"); > J> $header2 = sprintf("%-25s%-25s%s", "===========", "========", > J> "====="); > > you use the same sprintf format three times. put that into a variable > above so you can change it in one place. > > J> print OUT $header; > J> print OUT $header2; > > no need for two print calls. > > print OUT $header, $header2 ; > > J> # loop through each script name > J> foreach $scripts(@scripts){ > > foreach my $scripts (@scripts) { > > that is how you can declare vars locally. and use more horizontal > whitespace. others need to read your code so think about them when you > write it. and YOU are an other too! :) > > i noticed this below but the point is true here. don't use plural names > for singular things. $scripts has a single script name > > J> # loop through each script in directory > J> foreach my $paths(@paths){ > > you used my there. why not in the previous loop? > > J> # get last leaf of script being searched -- > J> # if it matches itself; skip > J> $leaf = get_leaf($paths); > J> if($scripts eq $leaf) { next;} > > slightly faster as you don't need a block entry on next. also a better > style for simple flow control like this. > > next if $scripts eq $leaf ; > > J> # open each script for searching > J> open F, "$paths" or die "cannot open $paths for reading > > don't quote scalar vars as it can lead to subtle bugs. it is not needed here. > > J> $! ...\n"; > > J> while(my $lines = <F>) { > > $lines is a single line so make that singular. using a plural name > implies an array or array ref > > J> # -l switch in place > > that only affects one liners using -p or -n. no effect on regular code > > J> chomp($lines); > > J> # search for matches to the commonly-used command > J> syntax > J> if($lines =~ /\$\($scripts / || $lines =~ /\` > J> $scripts /){ > > this doesn't make sense. are you checking if the current script refers > to itself?? > > J> # format to line up with headers > J> $string = sprintf("%-25s%-25s%s", $scripts, $leaf, > J> $lines); > > J> # print to file > J> print OUT $string; > > since you just print the string, you can use printf directly. > > J> } > J> } > J> } > J> } > J> # close I/O streams > J> close(F); > J> close(OUT); > J> } > > J> #====================== > J> # subroutine get_leaf > J> #====================== > J> sub get_leaf > J> { > J> # get arg(s) > J> my($pathname) = @_; > > J> # split on leafs > J> my @split_pathname = split( /[\\\/]/, $pathname); > > gack!! > > File::Basename does this for you and simpler. > > J> # grab last leaf > J> my $leaf = pop( @split_pathname ); > > you could just return the popped value directly. hell, you can do that > in the split line too: > > return( (split( /[\\\/]/, $pathname)[-1] ); > > as for speedups, i can't help much since i don't have your input > data. nothing seems oddly slow looking but your core loops are deep and > can be done better. slurping in the entire file (File::Slurp) and > scanning for those lines in one regex would be noticeably faster. and > your core logic is suspect as it doesn't seem to check for calling other > scripts from a given one. > > uri > > -- > Uri Guttman ------ u...(a)stemsystems.com -------- http://www.sysarch.com-- > ----- Perl Code Review , Architecture, Development, Training, Support ------ > --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com---------
From: Jim Gibson on 20 May 2010 17:30
In article <6bf31a76-1cd6-48be-8bea-a91b1c6b83b6(a)y21g2000vba.googlegroups.com>, Jorge <awkster(a)yahoo.com> wrote: > I have a Linux directory full of shell scripts (~100 scripts with > average > size of ~60 lines) with many of them calling other scripts as > depencencies. > > Not being the author of the scripts and with the immediate need to > become familiar with > the overall scope of this script directory, I needed to create a cross- > reference chart > that shows what script depends on what other script. > > My Perl script (below) works in terms of doing what I need, however, > IMO, it appears to run a bit slow. I timed it at ~3.5 sec. Possibly > that is it's max performance but something tells me (from other Perl > script experiences) this one just isn't up to speed. > > I would appreciate any ideas or comments. > > Thanks, Jorge > Uri and J. have given you thorough critiques of your posted program. I can only add a few additional suggestions. Concerning speedup, you have implemented an O(N**2) algorithm. You are reading each of N files N-1 times, looking for a different script name each time you read each file. You should read each file only once, and look for all of your script names in each line. You shouldn't exclude the script your are reading from this search, since scripts can call themselves. I would also not look in comments, since it is very common to put script names in comments, and you don't want those. For how to search for many things at once, see the advice in 'perldoc -q many' "How do I efficiently match many regular expressions at once?" You are concatenating the directory name and file name into a path, then taking many steps to strip the directory name from the path to retrieve the file name. Don't bother, since you are only working in one directory and the $dir variable only contains one value. Add $dir to the open statement or a temp variable: my $filename = "$dir/$scripts"; open( my $f, '<', $filename ) ... Then you don't need the @paths array at all. [program snipped] -- Jim Gibson |