From: James Harris on 7 Jun 2010 17:10

On 4 June, 23:47, James Harris <james.harri...(a)googlemail.com> wrote:
> I'm trying to use wget -r to back up
>
> http://sundry.wikispaces.com/
>
> but it fails to back up more than the home page. The same command
> works fine elsewhere and I've tried various options for the above web
> site to no avail. The site seems to use a session id - if that's
> important - but the home page as downloaded clearly has the <a href
> links to further pages so I'm not sure why wget fails to follow them.
>
> Any ideas?

No response from comp.unix.admin. Trying comp.unix.shell. Maybe
someone there has an idea to fix the wget problem...?

James
From: Tony on 8 Jun 2010 17:19

On 07/06/2010 22:10, James Harris wrote:
> On 4 June, 23:47, James Harris<james.harri...(a)googlemail.com> wrote:
>
>> I'm trying to use wget -r to back up
>>
>> http://sundry.wikispaces.com/
>>
>> Any ideas?
>
> No response from comp.unix.admin. Trying comp.unix.shell. Maybe
> someone there has an idea to fix the wget problem...?

Does the site's robots.txt file preclude the links you're trying to
spider? wget plays nice by default.

--
Tony Evans
Saving trees and wasting electrons since 1993
blog -> http://perceptionistruth.com/
books -> http://www.bookthing.co.uk
[ anything below this line wasn't written by me ]
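If robots.txt does turn out to be the blocker, wget can be told to ignore it. A minimal sketch, assuming the target is the wiki above (-e robots=off disables wget's robots.txt handling, so use it only on a site you are entitled to mirror):

    # Inspect the robots rules first
    wget -q -O - http://sundry.wikispaces.com/robots.txt

    # If those rules block the crawl, ignore them for the recursive backup
    wget -r -e robots=off http://sundry.wikispaces.com/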
From: Bob Melson on 8 Jun 2010 19:17

On Tuesday 08 June 2010 15:19, Tony (tony(a)darkstorm.invalid) opined:
> On 07/06/2010 22:10, James Harris wrote:
>> On 4 June, 23:47, James Harris<james.harri...(a)googlemail.com> wrote:
>>
>>> I'm trying to use wget -r to back up
>>>
>>> http://sundry.wikispaces.com/
>>>
>>> Any ideas?
>>
>> No response from comp.unix.admin. Trying comp.unix.shell. Maybe
>> someone there has an idea to fix the wget problem...?
>
> Does the site's robots.txt file preclude the links you're trying to
> spider? wget plays nice by default.

Another thing to consider is that many folks killfile all gmail,
googlemail and googlegroups addresses because of the huge amount of
spam originating on them and google's refusal to do anything about it.
Many of us don't see those original posts, just the rare responses.

--
Robert G. Melson | Rio Grande MicroSolutions | El Paso, Texas
-----
Nothing astonishes men so much as common sense and plain dealing.
                                        Ralph Waldo Emerson
From: Christian on 11 Jun 2010 09:56

"James Harris" <james.harris.1(a)googlemail.com> wrote in message
news:daed461b-a37a-445a-8c7d-4791875fc4fe(a)t10g2000yqg.googlegroups.com...
> On 4 June, 23:47, James Harris <james.harri...(a)googlemail.com> wrote:
>> I'm trying to use wget -r to back up
>>
>> http://sundry.wikispaces.com/
>>
>> but it fails to back up more than the home page. The same command
>> works fine elsewhere and I've tried various options for the above web
>> site to no avail. The site seems to use a session id - if that's
>> important - but the home page as downloaded clearly has the <a href
>> links to further pages so I'm not sure why wget fails to follow them.
>>
>> Any ideas?
>
> No response from comp.unix.admin. Trying comp.unix.shell. Maybe
> someone there has an idea to fix the wget problem...?
>
> James

Try with a 'standard' user-agent:

wget --user-agent="Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0;
GTB6.4; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; Tablet PC 2.0;
.NET CLR 3.5.21022; .NET CLR 3.5.30729; .NET CLR 3.0.30729)" ...

Christian
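A much shorter browser-like value usually works just as well if the server only checks that the client looks like a browser; the exact string below is only an example, combined with -r for the recursive backup:

    wget -r --user-agent="Mozilla/5.0 (X11; Linux x86_64)" \
         http://sundry.wikispaces.com/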
From: James Harris (es) on 11 Jun 2010 12:59

"Tony" <tony(a)darkstorm.invalid> wrote in message
news:humc4g$5p7$1(a)matrix.darkstorm.co.uk...
> On 07/06/2010 22:10, James Harris wrote:
>> On 4 June, 23:47, James Harris<james.harri...(a)googlemail.com> wrote:
>>
>>> I'm trying to use wget -r to back up
>>>
>>> http://sundry.wikispaces.com/
>>>
>>> Any ideas?
>>
>> No response from comp.unix.admin. Trying comp.unix.shell. Maybe
>> someone there has an idea to fix the wget problem...?
>
> Does the site's robots.txt file preclude the links you're trying to
> spider? wget plays nice by default.

Good idea. I've been checking it and it doesn't seem to be the problem.
It has lines such as

User-agent: *
Disallow: /file/rename
Disallow: /file/delete

but these don't disallow the data pages that I want to back up. There
is also a sitemap.xml. To my untutored eye it looks fine too.

James
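With robots.txt apparently clean, and given that the original post mentions a session id, one remaining possibility is that the page links only resolve when the session cookie is attached. wget can carry session cookies across a recursive run; a sketch, assuming the cookie-jar file name cookies.txt is arbitrary and that the site sets its session via an ordinary Set-Cookie header:

    # First request: capture the session cookie into a local jar
    wget --save-cookies cookies.txt --keep-session-cookies \
         -O /dev/null http://sundry.wikispaces.com/

    # Recursive backup reusing that session
    wget -r --load-cookies cookies.txt --keep-session-cookies \
         http://sundry.wikispaces.com/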