From: Andreas Marschke on 22 Feb 2010 05:43

Hi!

I was just wondering whether somebody wants to share his/her best shell script snippets here on the list. I'm interested in everything that can do something nifty to a system or a website. Pick your favourite shell, whether it's bash, sh, dash, ksh, csh, fish or whatever; just have fun hacking and share your jewels!

To start it off, here is a simple bash script scraping the daily JARGON off the website for the new hacker's dictionary:

|+-+-+-+-+-+--+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+|
#!/bin/bash

wget http://www.jargon.net/ -O- 2>/dev/null | grep '<A HREF="/jargonfile/[a-z]/[a-zA-Z0-9]*.html">[a-zA-Z0-9]*</A>' | sed 's:\(<[a-zA-Z0-9]*>\|</[a-zA-Z0-9]*>\|<A HREF="/[a-zA-Z0-9]*/[a-z]/[a-zA-Z0-9]*\.html">\|<[a-z]*>\|</[a-z]*>\)::g' | sed 's/  */ /g'
|+-+-+-+-+-+--+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+|

Cheers and happy hacking!

Andreas Marschke.
From: pk on 22 Feb 2010 06:19

Andreas Marschke wrote:

> I was just wondering whether somebody wants to share his/her best shell
> script snippets here on the list. I'm interested in everything that can do
> something nifty to a system or a website. [...]
>
> To start it off, here is a simple bash script scraping the daily JARGON off
> the website for the new hacker's dictionary:
>
> wget http://www.jargon.net/ -O- 2>/dev/null | grep '<A HREF="/jargonfile/[a-z]/[a-zA-Z0-9]*.html">[a-zA-Z0-9]*</A>' | sed 's:\(<[a-zA-Z0-9]*>\|</[a-zA-Z0-9]*>\|<A HREF="/[a-zA-Z0-9]*/[a-z]/[a-zA-Z0-9]*\.html">\|<[a-z]*>\|</[a-z]*>\)::g' | sed 's/  */ /g'

Probably not what you're looking for, but it seems this does the same thing (I see you're using GNU sed):

wget http://www.jargon.net/ -O- 2>/dev/null | sed -n '\:<A HREF="/jargonfile/[a-z]/[a-zA-Z0-9]*.html">[a-zA-Z0-9]*</A>:{s/<[^>]*>//g;s/  */ /gp;}'

However, keep in mind that parsing HTML with sed/grep and other regex-based tools is difficult if you can't count on the input having a fixed, known format.
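[Editorial sketch: for readers who want to follow pk's caveat and avoid regex-only HTML parsing, one HTML-aware alternative, assuming xmllint from libxml2 is installed; the XPath expression is an illustration, not something from the thread:]

wget http://www.jargon.net/ -O- 2>/dev/null \
  | xmllint --html --xpath '//a[contains(@href,"/jargonfile/")]' - 2>/dev/null \
  | sed 's/<[^>]*>//g; s/  */ /g'

The second 2>/dev/null silences the HTML parser's warnings about the page's loose markup; the trailing sed only strips the selected <A> wrappers and squeezes whitespace.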
From: Murat D. Kadirov on 22 Feb 2010 08:16

On Mon, Feb 22, 2010 at 11:19:15AM +0000, pk wrote:

> Probably not what you're looking for, but it seems this does the same thing
> (I see you're using GNU sed):
>
> wget http://www.jargon.net/ -O- 2>/dev/null | sed -n '\:<A HREF="/jargonfile/[a-z]/[a-zA-Z0-9]*.html">[a-zA-Z0-9]*</A>:{s/<[^>]*>//g;s/  */ /gp;}'
>
> However, keep in mind that parsing HTML with sed/grep and other regex-based
> tools is difficult if you can't count on the input having a fixed, known
> format.

It did not work for me (Linux). No output at all.

--
Murat D. Kadirov
PGP fingerprint: 3081 EBFA 5CB9 BD24 4DB6 76EE 1B97 0A0E CEC0 6AA0
From: mop2 on 22 Feb 2010 09:06

On Mon, 22 Feb 2010 07:43:38 -0300, Andreas Marschke <xxtjaxx(a)gmail.com> wrote:

> I was just wondering whether somebody wants to share his/her best shell
> script snippets here on the list. [...]
>
> To start it off, here is a simple bash script scraping the daily JARGON off
> the website for the new hacker's dictionary: [...]

An alternative for one line, monospaced, specific to that site:

echo `wget -qO- http://www.jargon.net/|grep HR|sed 's/<[^>]*>//g'`

For fragments of web pages, I think three generic functions are convenient:

f1 - get the page
f2 - filter the desired fragment
f3 - remove html tags and display as text, monospaced and honoring newlines and, perhaps, bold tags

Or perhaps one function with three parameters.

PS: bash can manage a tcp connection by itself, a convenience on micro systems.
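[Editorial sketch: a minimal take on the three-function split mop2 describes. The names f1/f2/f3 come from the post; the bodies (wget for the fetch, a caller-supplied sed address for the fragment, a tag-stripping sed for display) are assumptions, not mop2's code:]

#!/bin/bash

# f1: get the page (assumes wget; curl -s would do as well)
f1() { wget -qO- "$1"; }

# f2: filter the desired fragment; the caller passes a sed address,
# e.g. '/HR/' for single lines or '/<body>/,/<\/body>/' for a range
f2() { sed -n "${1}p"; }

# f3: remove html tags and squeeze runs of spaces for monospaced display
f3() { sed 's/<[^>]*>//g; s/  */ /g'; }

# example: roughly the same result as the one-liner above
f1 http://www.jargon.net/ | f2 '/HR/' | f3

[And, for the PS, a toy sketch of bash handling the TCP connection itself via its /dev/tcp redirection: plain HTTP/1.0, no error handling.]

exec 3<>/dev/tcp/www.jargon.net/80                              # open the connection on fd 3
printf 'GET / HTTP/1.0\r\nHost: www.jargon.net\r\n\r\n' >&3     # send the request
cat <&3                                                         # read headers and body
exec 3<&-                                                       # close the descriptor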
From: Mick on 22 Feb 2010 10:10
Murat D. Kadirov wrote:

> On Mon, Feb 22, 2010 at 11:19:15AM +0000, pk wrote:
>> Probably not what you're looking for, but it seems this does the same thing
>> (I see you're using GNU sed):
>>
>> wget http://www.jargon.net/ -O- 2>/dev/null | sed -n '\:<A HREF="/jargonfile/[a-z]/[a-zA-Z0-9]*.html">[a-zA-Z0-9]*</A>:{s/<[^>]*>//g;s/  */ /gp;}'
>>
>> However, keep in mind that parsing html with sed/grep and other regex-based
>> tools is difficult if you can't count on the input having a fixed, known
>> format.
>
> It did not work for me (Linux). No output at all.

The command shows up as multiple lines in your newsgroup reader; if you are using cut & paste, the pieces need to be re-concatenated into a single line before running it.
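[Editorial sketch of one way around the paste problem Mick points out, based on pk's command: break the pipeline only after the | and put the sed commands on their own lines, so the shell still parses the pasted text as a single command:]

wget http://www.jargon.net/ -O- 2>/dev/null |
  sed -n '\:<A HREF="/jargonfile/[a-z]/[a-zA-Z0-9]*.html">[a-zA-Z0-9]*</A>:{
    s/<[^>]*>//g
    s/  */ /gp
  }'

A trailing backslash on each line would work just as well; the point is that the line breaks must be ones the shell understands, not ones introduced by the news client's wrapping.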