From: Tuxedo on 14 Jun 2010 19:18

Hi,

A customer just gave me a massive mail file in mbox format which has accrued over several years. The file was rescued from an old drive of a previous but now broken system, and so I would like to restore the mailbox in a mail application on a new system.

The mail file was readable on the previous system in Mozilla Thunderbird, as there it had a corresponding .msf index. However, the .msf file no longer exists and the mbox itself is nearly 3GB. When placing this in a new T-Bird mail folder, the mail application tries but soon fails to generate the index which is necessary to display the messages.

At first I thought the file might be corrupt, so I tried running:

    formail -zds < big_mbox >> fixed_mbox

But soon after formail began munching its way into the big_mbox there was an "Out of memory" error returned by the shell, which I guess is also what the mail client silently ran into.

I guess I need more RAM to process such a big file and that any mail application, formail included, simply needs more memory than the file size, which unfortunately I do not have. In any case, I think the file is probably OK since it worked fine on the previous system.

What methods exist to process and restore this huge file? How about, for example, splitting it into parts, such as 5 or 10 different files, obviously cut at the right points between messages? I guess the individual mbox files could then easily be read in more or less any mail application. Can this be done via the shell, and if so, how?

Are there any particular Unix tools to split such huge message files or create an .msf index without running out of memory in the process?

Many thanks for any ideas and advice.

Tuxedo
From: John Kelly on 14 Jun 2010 19:55

On Tue, 15 Jun 2010 01:18:19 +0200, Tuxedo <tuxedo(a)mailinator.com> wrote:

>customer just gave me a massive mail file in mbox format

>Are there any particular Unix tools to split such huge message files

http://en.wikipedia.org/wiki/Mbox says:

  mbox is a generic term for a family of related file formats used for holding collections of electronic mail messages. All messages in an mbox mailbox are concatenated and stored as plain text in a single file. The beginning of each message is indicated by a line whose first five characters consist of "From" followed by a space (the so-called "From_ line" or "'From ' line") and the return path e-mail address. A blank line is appended to the end of each message.

IOW, it's not hard to identify message boundaries. You can use common text processing tools to split the big file into smaller ones.

-- 
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php
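
The boundary rule quoted above can be checked on the big file before any splitting is attempted. What follows is only a minimal sketch: the function names count_messages and show_boundaries are made up for illustration, and the count is only as good as the assumption that body lines beginning with "From " were escaped (for example as ">From ") when the mail was stored.

```sh
#!/bin/sh
# Count candidate message boundaries: lines that begin with "From ".
# grep reads the file as a stream, so memory use does not grow with file size.
count_messages() {
    # -c prints the number of matching lines instead of the lines themselves
    grep -c '^From ' "$1"
}

# Print the first few boundary lines with their line numbers for a visual check.
# The second argument (how many lines to show) is optional and defaults to 5.
show_boundaries() {
    grep -n '^From ' "$1" | head -n "${2:-5}"
}
```

Running count_messages on the original file and again on the pieces after a split gives a cheap way to confirm that no message went missing.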
From: Chris F.A. Johnson on 14 Jun 2010 20:07

On 2010-06-14, Tuxedo wrote:
> Are there any particular Unix tools to split such huge message files or
> create an .msf index without running out of memory in the process?

Use formail:

    formail -s savemail < "$mbox"

Where savemail is a script containing:

    cat > "$(date +%Y-%m-%d_%H:%M:%S)-$(uuidgen)"

This will put each message in a separate file. Adjust to taste if you want to put more than one message into each file or to use different filenames.

-- 
   Chris F.A. Johnson, author <http://shell.cfajohnson.com/>
   ===================================================================
   Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
   Pro Bash Programming: Scripting the GNU/Linux Shell (2009, Apress)
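
One possible way to "adjust to taste" so that several messages land in each file is sketched below. It is not part of the original answer. It assumes that formail, while splitting, exports a zero-padded message counter in the FILENO environment variable, as the formail manual describes; the function name savemail_batched, the PER_FILE variable, and the part-N.mbox naming are all hypothetical.

```sh
#!/bin/sh
# Hypothetical variant of the savemail script: append each message to a batch
# file instead of creating one file per message.
# Assumption: formail sets FILENO to a zero-padded message number while splitting.
savemail_batched() {
    per_file=${PER_FILE:-1000}   # messages per output file (illustrative default)
    n=${FILENO:-0}
    # Strip leading zeros so the shell does not read the number as octal.
    while [ "${#n}" -gt 1 ] && [ "${n#0}" != "$n" ]; do
        n=${n#0}
    done
    batch=$(( $n / $per_file ))
    # Append the message arriving on stdin to the file for its batch.
    cat >> "part-$batch.mbox"
}
```

Saved as the savemail script with a final line that calls savemail_batched, it would be invoked the same way as the one-file-per-message version. Appending is only safe because formail by default waits for each program to finish before starting the next.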
From: Maxwell Lol on 14 Jun 2010 21:17

John Kelly <jak(a)isp2dial.com> writes:

> On Tue, 15 Jun 2010 01:18:19 +0200, Tuxedo <tuxedo(a)mailinator.com>
> wrote:
>
>>customer just gave me a massive mail file in mbox format
>
>>Are there any particular Unix tools to split such huge message files
>
> IOW, it's not hard to identify message boundaries. You can use common text
> processing tools to split the big file into smaller ones.

You can even use perl and use something like

    @mail = split(/\nFrom /,$mboxfile);

That assumes your mail system uses the "put a '>' before 'From' in all email" option.
From: John Kelly on 14 Jun 2010 22:49

On Mon, 14 Jun 2010 21:17:26 -0400, Maxwell Lol <nospam(a)com.invalid> wrote:

>John Kelly <jak(a)isp2dial.com> writes:
>
>> On Tue, 15 Jun 2010 01:18:19 +0200, Tuxedo <tuxedo(a)mailinator.com>
>> wrote:
>>>customer just gave me a massive mail file in mbox format
>>>Are there any particular Unix tools to split such huge message files
>> IOW, it's not hard to identify message boundaries. You can use common text
>> processing tools to split the big file into smaller ones.
>
>You can even use perl and use something like
>
>    @mail = split(/\nFrom /,$mboxfile);

That will read it into memory all at once, which may cause thrashing with his 3GB file.

In his scenario, better to read and write one line at a time, and open a new output file every so many messages.

It's easy to shoot yourself in the foot with Perl.

-- 
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php
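
The line-at-a-time approach described above can be sketched with awk, which processes its input one line at a time and does not need to hold the whole file in memory. This is an illustration under stated assumptions, not a tested recovery tool: the function name split_mbox, the default of 10000 messages per part, and the part-NNNN.mbox naming are invented here, and the sketch relies on body lines that start with "From " having been escaped when the mail was stored.

```sh
#!/bin/sh
# split_mbox FILE [MESSAGES_PER_PART] [PREFIX]
# Streams FILE line by line and starts a new output file every N messages.
split_mbox() {
    awk -v per="${2:-10000}" -v prefix="${3:-part}" '
        # A line starting with "From " is treated as the start of a new message.
        /^From / {
            if (count % per == 0) {
                if (out != "") close(out)   # release the previous file handle
                out = sprintf("%s-%04d.mbox", prefix, count / per)
            }
            count++
        }
        # Lines before the first boundary line (if any) are skipped.
        out != "" { print > out }
    ' "$1"
}
```

For example, split_mbox big_mbox 20000 would write part-0000.mbox, part-0001.mbox and so on into the current directory, each holding up to 20000 messages. Only one output file is open at a time, so the number of parts is not limited by open file handles.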