From: James Harris (es) on 11 Jun 2010 13:05

"Bob Melson" <amia9018(a)mypacks.net> wrote in message
news:D4OdnbLEnJAPUpPRnZ2dnUVZ_sCdnZ2d(a)earthlink.com...
...
> Another thing to consider is that many folks killfile all gmail,
> googlemail and googlegroups addresses because of the huge amount of
> spam originating on them and google's refusal to do anything about it.
> Many of us don't see those original posts, just the rare responses.

Understood. Google's lack of policing, or even of any adequate response
to spam reports, is very bad. The trouble is it's just too useful. The
Usenet service providers I use seem to filter spam - including that from
Google - but keep legitimate posts.

To anyone who didn't see the original query: I'm trying to use wget -r
to back up

  http://sundry.wikispaces.com/

but despite what I try I only ever get the home page. Any ideas why wget
is not recursing to linked pages on the same site?

James
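One common cause of exactly this symptom, worth ruling out first: wget
honours robots.txt during recursive retrieval by default, so a site that
disallows crawlers yields only the page named on the command line. A
minimal sketch of a run that sidesteps that and also captures any
session cookie the site sets (the cookie filename is illustrative):

  # wget obeys robots.txt when recursing unless told otherwise;
  # -e robots=off disables that. The cookie options save the session
  # cookie so a follow-up run can reuse it with --load-cookies.
  wget -r -l inf -e robots=off \
       --keep-session-cookies --save-cookies cookies.txt \
       http://sundry.wikispaces.com/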
From: James Harris (es) on 11 Jun 2010 13:11

"Christian" <cgregoir99(a)yahoo.com> wrote in message
news:hutero$hsp$1(a)writer.imaginet.fr...
>> "James Harris" <james.harris.1(a)googlemail.com> wrote in message
>> news:daed461b-a37a-445a-8c7d-4791875fc4fe(a)t10g2000yqg.googlegroups.com...
>> On 4 June, 23:47, James Harris <james.harri...(a)googlemail.com> wrote:
>
>>> I'm trying to use wget -r to back up
>>>
>>>   http://sundry.wikispaces.com/
>>>
>>> but it fails to back up more than the home page. The same command
>>> works fine elsewhere and I've tried various options for the above web
>>> site to no avail. The site seems to use a session id - if that's
>>> important - but the home page as downloaded clearly has the <a href
>>> links to further pages so I'm not sure why wget fails to follow them.
>>>
>>> Any ideas?
>
>> No response from comp.unix.admin. Trying comp.unix.shell. Maybe
>> someone there has an idea to fix the wget problem...?
>
>> James
>
> Try with a 'standard' user-agent : wget --user-agent="Mozilla/4.0
> (compatible; MSIE 7.0; Windows NT 6.0; GTB6.4; SLCC1; .NET CLR 2.0.50727;
> Media Center PC 5.0; Tablet PC 2.0; .NET CLR 3.5.21022; .NET CLR
> 3.5.30729; .NET CLR 3.0.30729)" ...

Also a good idea. I've just tried with a couple of user-agent strings
but it still doesn't work. I don't think it can be the user-agent id, as
wget loads the specified page successfully and that page looks all
right: it contains embedded <a href=...> links. Unfortunately wget -r
fails to follow them.

James
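Another silent stopper worth checking against the saved copy: wget only
follows links on the same host unless told otherwise, so if the embedded
<a href=...> links are fully qualified with a different hostname,
recursion ends at the home page with no error. A quick way to inspect
the link targets, plus the flags that would widen the net (the second
domain in the list is a guess at what the site might use):

  # List the distinct link targets in the saved home page.
  grep -o 'href="[^"]*"' index.html | sort -u

  # If the links point at another hostname, allow spanning to it.
  wget -r -H --domains=sundry.wikispaces.com,www.wikispaces.com \
       http://sundry.wikispaces.com/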
From: Chris Nehren on 12 Jun 2010 05:46

["Followup-To:" header set to comp.unix.admin.]

On 2010-06-11, Christian scribbled these curious markings:
>> "James Harris" <james.harris.1(a)googlemail.com> wrote in message
>> news:daed461b-a37a-445a-8c7d-4791875fc4fe(a)t10g2000yqg.googlegroups.com...
[cut]
> Try with a 'standard' user-agent : wget --user-agent="Mozilla/4.0
> (compatible; MSIE 7.0; Windows NT 6.0; GTB6.4; SLCC1; .NET CLR 2.0.50727;
> Media Center PC 5.0; Tablet PC 2.0; .NET CLR 3.5.21022; .NET CLR
> 3.5.30729; .NET CLR 3.0.30729)" ...

In addition: have you turned on debugging yet? Have you asked wget to
print the HTTP headers of the requests and responses yet? The server is
giving wget information that it is using to decide not to go any
further. Ask wget for this information and you should be able to discern
why it is behaving the way it is. Otherwise you're just guessing in an
engineering discipline.

--
Thanks and best regards,
Chris Nehren
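The flags Chris is pointing at are standard wget options, so the check
is cheap to run:

  # -d/--debug prints wget's link-extraction and enqueueing decisions;
  # -S/--server-response prints the HTTP headers of every response.
  wget -d -S -r http://sundry.wikispaces.com/ 2>&1 | tee wget.log

The debug output typically names each rejected link and the rule that
rejected it (robots exclusion, different host, depth limit), which pins
the cause down directly instead of by trial and error.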
From: Use-Author-Supplied-Address-Header on 16 Jun 2010 14:26

James Harris <james.harris.1(a)googlemail.com> wrote:
: On 4 June, 23:47, James Harris <james.harri...(a)googlemail.com> wrote:
[cut]
: No response from comp.unix.admin. Trying comp.unix.shell. Maybe
: someone there has an idea to fix the wget problem...?

The best place to deal with this is the wget mailing list. See
http://lists.gnu.org/mailman/listinfo/bug-wget. For an NNTP 'mirror',
see also the newsgroup gmane.comp.web.wget.general on the gmane server
at news.gmane.org.

HTH
Tom.

Ps. The email address in the header is just a spam-trap.
--
Tom Crane, Dept. Physics, Royal Holloway, University of London,
Egham Hill, Egham, Surrey, TW20 0EX, England.
Email: T.Crane at rhul dot ac dot uk  Fax: +44 (0) 1784 472794