From: Albert Schlef on 18 Jul 2010 19:53

Hello.

I want to download an HTML page, and I also want to save the images it contains along with it. I was thinking about saving it as an MHT file, which would make my life easier because I wouldn't have to handle the separate files. I've checked both my browsers (Firefox and Opera), but neither has a command-line switch for saving a URL as an MHT file. I also searched the net for a Ruby library, but the one I found seems to work only on Windows (it ships with a DLL), which is no good for me because I'm using Ubuntu.

So, my question is:

Given a URL, how can I save this page as MHT?

(My program is in Ruby, but I don't mind delegating this part to a command-line utility.)

--
Posted via http://www.ruby-forum.com/.
From: Nicholas Orr on 18 Jul 2010 20:20

According to http://en.wikipedia.org/wiki/MHTML, pursuing the MHT file format seems like a lot of effort for not much gain...
From: Colin Bartlett on 18 Jul 2010 21:34

Although another post cites Wikipedia as implying that the MHT file format is a lot of effort for not much gain, I have found it useful to save web pages (including images) to MHT (using all of Opera, Firefox and Internet Explorer), and then extract what I want (including images) from the MHT file.

That said, once a web page is saved (if necessary using plugins), whether as MHT, as a file with its images etc. in a subdirectory, or as a zip archive, it should be fairly easy to take out what you want from whatever the save format is.

So: is the problem saving as MHT from the command line, or saving anything at all (MHT or HTML + images) from the command line?

Can you use Watir, or http://watij.com + JRuby? From a quick look at their websites these may work, but I haven't tried them yet because the initial learning curve looks a bit steep, and because at the moment (on Microsoft Windows) I can use AutoIt with Ruby to programmatically switch from a Ruby DosBox to the browser and send keystrokes to save the page as MHT, plain HTML, or whatever.

It's not exactly elegant, but it does (mostly!) work. If all else fails, can you do something similar on Linux?

If you find a reasonably elegant solution, then I'd be very interested.
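
For what it's worth, an MHT file is just a MIME multipart/related message (RFC 2557), so it can be assembled by hand without a browser. Below is a minimal sketch in plain Ruby, assuming the HTML and its images have already been downloaded by some other means. The names `Part` and `build_mht` are made up for this illustration and do not come from any library, and I have not checked how strictly each browser parses the result, so treat it as a starting point rather than a finished tool.

```ruby
require "securerandom"

# One resource of the saved page (hypothetical helper type).
# location:     the absolute URL the resource was fetched from
# content_type: its MIME type, e.g. "text/html; charset=utf-8" or "image/png"
# body:         the raw bytes of the resource
Part = Struct.new(:location, :content_type, :body)

# Assemble an MHTML document (multipart/related) and return it as a String.
# The first element of `parts` should be the HTML page itself; the rest are
# the images and other resources it refers to.
def build_mht(parts, boundary: "----=_Part_#{SecureRandom.hex(8)}")
  out = +""
  # Top-level headers describing the whole archive.
  out << "MIME-Version: 1.0\r\n"
  out << %(Content-Type: multipart/related; type="text/html"; boundary="#{boundary}"\r\n)
  out << "\r\n"

  parts.each do |part|
    out << "--#{boundary}\r\n"
    out << "Content-Type: #{part.content_type}\r\n"
    out << "Content-Transfer-Encoding: base64\r\n"
    # Content-Location is what lets a reader match <img src="..."> to a part.
    out << "Content-Location: #{part.location}\r\n"
    out << "\r\n"
    # pack("m") produces line-wrapped base64 ending in "\n"; MIME wants CRLF.
    out << [part.body].pack("m").gsub("\n", "\r\n")
  end

  # Closing delimiter marks the end of the archive.
  out << "--#{boundary}--\r\n"
  out
end

# Usage (the URLs and variables here are placeholders):
#   parts = [
#     Part.new("http://example.com/", "text/html; charset=utf-8", html),
#     Part.new("http://example.com/logo.png", "image/png", png_bytes),
#   ]
#   File.binwrite("page.mht", build_mht(parts))
```

Every part is base64-encoded, including the HTML, which keeps the sketch short at the cost of a file that is not human-readable. Going the other way (pulling images back out) is a matter of splitting on the boundary and decoding each body with `unpack1("m")`.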