From: Andreas Marschke on 22 Feb 2010 10:19

> Probably not what you're looking for, but it seems this does the same
> thing (I see you're using GNU sed)

Interesting... How different is GNU sed from some of the BSD ones, or are
you referring to other Unices like HP-UX or Solaris?

> wget http://www.jargon.net/ -O- 2>/dev/null | sed -n '\:<A
> HREF="/jargonfile/[a-z]/[a-zA-Z0-9]*.html">[a-zA-Z0-9]*</A>: {s/<[^>]*>//g;s/
> */ /gp;}'
>
> However, keep in mind that parsing html with sed/grep and other
> regex-based tools is difficult if you can't count on the input having a
> fixed, known format.

That's quite true. But basically it's only a bit more typing effort for the
good hacker.

From: pk on 22 Feb 2010 10:31

Andreas Marschke wrote:

>> Probably not what you're looking for, but it seems this does the same
>> thing (I see you're using GNU sed)
>
> Interesting... How different is GNU sed from some of the BSD ones, or are
> you referring to other Unices like HP-UX or Solaris?

GNU sed supports a number of extensions, like using \|, \+ or \? in
regexps, and it also supports extended regexps (plus a load of other
features not found in standard sed).

>> However, keep in mind that parsing html with sed/grep and other
>> regex-based tools is difficult if you can't count on the input having a
>> fixed, known format.
> That's quite true. But basically it's only a bit more typing effort for the
> good hacker.

I suppose the truly good hacker uses a parser to minimize both typing and
error likeliness, but that's just my opinion.

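For readers wondering what avoiding those extensions looks like in practice, here is a minimal sketch of portable spellings for \+, \? and \| in basic regexps. The function names and sample patterns are made up for illustration and do not come from the thread; whether a particular non-GNU sed accepts each form is worth checking against its own manual.

```shell
#!/bin/sh
# Sketch only: hypothetical helper names, not taken from the thread.

# Instead of the GNU-only 's/ \+/ /g': write the atom twice and star the
# second copy, which means "one or more spaces" in a basic regexp.
squeeze_spaces() {
    sed 's/  */ /g'
}

# Instead of the GNU-only 's\?': use an interval expression, \{0,1\},
# which means "zero or one" of the preceding atom.
normalize_scheme() {
    sed 's/^https\{0,1\}:/https:/'
}

# Instead of the GNU-only alternation '<B>\|<\/B>': run one substitution
# per alternative.
drop_bold() {
    sed -e 's/<B>//g' -e 's/<\/B>//g'
}
```

Each function reads standard input, so for example `printf 'a   b\n' | squeeze_spaces` collapses the run of spaces into one.
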
From: Janis Papanagnou on 22 Feb 2010 11:19

Andreas Marschke wrote:

> Hi !
>
> I was just wondering whether somebody wants to share his/her best shell
> script snippets here on the list. I'm interested in everything that can do
> something nifty to a system or a website. Pick your favourite shell, whether
> it's bash,sh,dash,ksh,csh,fish or whatever, just have fun hacking and share
> your jewels!
>
> To start it off, here is a simple bash script scraping the daily JARGON off
> the website for the new hackers dictionary:
>
> |+-+-+-+-+-+--+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+|
> #!/bin/bash
>
> wget http://www.jargon.net/ -O- 2>/dev/null | grep '<A HREF="/jargonfile/[a-
> z]/[a-zA-Z0-9]*.html">[a-zA-Z0-9]*</A>' | sed 's:\(<[a-zA-Z0-9]*>\|</[a-zA-
> Z0-9]*>\|<A HREF="/[a-zA-Z0-9]*/[a-z]/[a-zA-Z0-9]*\.html">\|<[a-z]*>\|</[a-
> z]*>\)::g' | sed s/\ \ */\ /g
> |+-+-+-+-+-+--+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+|

Whenever I see a command pipe like this one, with the grep|sed|sed sequence,
I wonder why the programmer does not use just a single tool, preferably
one that results in clearer, more legible, and more easily maintainable code.

Janis

>
> Cheers and happy hacking!
>
> Andreas Marschke.

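Janis does not name a tool here, so the following is only one possible reading of the suggestion: a sketch that folds the grep|sed|sed steps into a single awk program. The function name `extract_entry` is hypothetical, and the pattern simply mirrors the one in the quoted script, so it carries the same assumption that the page keeps that exact link format.

```shell
#!/bin/sh
# Sketch only: extract_entry is a hypothetical name, and the pattern
# assumes the link layout used in the quoted script.

extract_entry() {
    awk '
    # Select only lines containing a jargon-file entry link (the grep step).
    /<A HREF="\/jargonfile\/[a-z]\/[a-zA-Z0-9]*\.html">[a-zA-Z0-9]*<\/A>/ {
        gsub(/<[^>]*>/, "")   # drop anything that looks like a tag (first sed)
        gsub(/[ ]+/, " ")     # squeeze runs of spaces to one (second sed)
        print
    }'
}
```

It reads the page on standard input, so it could stand in for the tail of the original pipeline: `wget http://www.jargon.net/ -O- 2>/dev/null | extract_entry`. Like the original, it is still regex matching rather than real HTML parsing.
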
From: Andreas Marschke on 22 Feb 2010 14:40

> I suppose the truly good hacker uses a parser to minimize both typing and
> error likeliness, but that's just my opinion.

What do you mean by "parser" ?

From: Janis Papanagnou on 22 Feb 2010 14:43

Andreas Marschke wrote:

>> I suppose the truly good hacker uses a parser to minimize both typing and
>> error likeliness, but that's just my opinion.
>
> What do you mean by "parser" ?
>

A tool aware of the specific syntax of the data.

Janis