From: Ron Hardin on 4 Feb 2010 06:25

Worldnet is terminating its personal web pages.

Apparently wget can save a copy of what I have there on my HD,
but I am snowed by options.

A recursive retrieve looks right, starting from index.html, and
restricted to http://home.att.net/

What's a safe wget command to try that won't flood the server or
download the entire world wide web?

(I can fix links myself with sed scripts once it's on the HD, so
just need a copy of what's there unconverted)

wget is under cygwin on XP Home but ought to work the same as
linux.

Maybe somebody is an expert user and can give the obvious command
line.
--
rhhardin(a)mindspring.com

On the internet, nobody knows you're a jerk.
From: Nuno J. Silva on 4 Feb 2010 14:53

Ron Hardin <rhhardin(a)mindspring.com> writes:

> Worldnet is terminating its personal web pages.
>
> Apparently wget can save a copy of what I have there on my HD,
> but I am snowed by options.

wget can save copies of the files you have there *and* that are
reachable by following links. That is, if you uploaded a file to make
it available but never link to it, wget via http won't guess that it's
there.

> A recursive retrieve looks right, starting from index.html, and
> restricted to http://home.att.net/

I've used wget before, but I can't recall how it decides when other
hosts should be accessed.

> What's a safe wget command to try that won't flood the server or
> download the entire world wide web?

To avoid overloading the server, you should use --wait=seconds (other
units are allowed, too).

> (I can fix links myself with sed scripts once it's on the HD, so
> just need a copy of what's there unconverted)

wget can fix links (--convert-links) and extensions
(--adjust-extension). If it works the way you want, maybe it's worth
letting wget do that.

Maybe you should also look at --page-requisites (to download pictures
and other inlined items), and at --continue (so wget doesn't download
the same file twice).

> wget is under cygwin on XP Home but ought to work the same as
> linux.
>
> Maybe somebody is an expert user and can give the obvious command
> line.

--
Nuno J. Silva
gopher://sdf-eu.org/1/users/njsg
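A rough sketch of how those options fit together on a single page
(untested; "~username" is a placeholder for wherever the pages
actually live, which isn't given in the thread):

  # fetch one page plus its inlined files, 2 seconds between requests
  wget --wait=2 --page-requisites --convert-links --adjust-extension \
       http://home.att.net/~username/index.html

This gets index.html plus its inlined images and stylesheets, pausing
two seconds between requests. --adjust-extension is spelled
--html-extension in older wgets, and --convert-links can be left out
if the sed approach is preferred. The recursive side is covered
further down the thread.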
From: root on 4 Feb 2010 15:21

Nuno J. Silva <nunojsilva(a)invalid.invalid> wrote:
>
>> A recursive retrieve looks right, starting from index.html, and
>> restricted to http://home.att.net/
>
> I've used wget before, but I can't recall how it decides when other
> hosts should be accessed.
>
>> What's a safe wget command to try that won't flood the server or
>> download the entire world wide web?
>

If you direct wget -r to a URL which has a link UP instead of DOWN,
you will end up pulling in much more than you want, to the point of
downloading the whole site.
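One way to fence the crawl in (a sketch, with /~username again standing
in for the real directory):

  # follow only links that stay inside the listed directory
  wget -r -I /~username http://home.att.net/~username/index.html

-I (--include-directories) tells wget to follow only links whose path
falls under the listed directories, so a stray link pointing up or off
to the side won't drag the rest of the site along.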
From: Sidney Lambe on 4 Feb 2010 15:51

On comp.os.linux.misc, Ron Hardin <rhhardin(a)mindspring.com> wrote:

> Worldnet is terminating its personal web pages.
>
> Apparently wget can save a copy of what I have there on my HD,
> but I am snowed by options.
>
> A recursive retrieve looks right, starting from index.html, and
> restricted to http://home.att.net/
>
> What's a safe wget command to try that won't flood the server or
> download the entire world wide web?
>
> (I can fix links myself with sed scripts once it's on the HD, so
> just need a copy of what's there unconverted)
>
> wget is under cygwin on XP Home but ought to work the same as
> linux.
>
> Maybe somebody is an expert user and can give the obvious command
> line.
> --
> rhhardin(a)mindspring.com
>
> On the internet, nobody knows you're a jerk.

Posting the url so that we could look it over ourselves would make a
lot of sense.

And I'd post the question on comp.unix.shell.

Sid
From: Nuno J. Silva on 4 Feb 2010 17:06

root <NoEMail(a)home.org> writes:

> Nuno J. Silva <nunojsilva(a)invalid.invalid> wrote:
>>
>>> A recursive retrieve looks right, starting from index.html, and
>>> restricted to http://home.att.net/
>>
>> I've used wget before, but I can't recall how it decides when other
>> hosts should be accessed.
>>
>>> What's a safe wget command to try that won't flood the server or
>>> download the entire world wide web?
>>
>
> If you direct wget -r to a URL which has a link UP instead of DOWN,
> you will end up pulling in much more than you want, to the point of
> downloading the whole site.

To avoid that (going up on the same host) you can use --no-parent.

--
Nuno J. Silva
gopher://sdf-eu.org/1/users/njsg
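Putting the thread together, a conservative recursive command might
look like this (again with ~username as a stand-in for the real path):

  # recurse, but stay on this host and below the starting directory,
  # waiting 2 seconds between requests
  wget --recursive --no-parent --wait=2 --page-requisites \
       http://home.att.net/~username/index.html

A recursive wget stays on the starting host unless --span-hosts is
added, and --no-parent keeps it from climbing above the starting
directory, so together they keep the retrieval well short of the whole
web. Add --convert-links and --adjust-extension only if letting wget
rewrite the pages is preferred over the sed scripts.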