From: Loki Harfagr on 15 Jun 2010 05:10

Tue, 15 Jun 2010 09:57:01 +0200, Tuxedo did cat :

> Chris F.A. Johnson wrote:
>
> [...]
>
>> Use formail:
>>
>> formail -s savemail < "$mbox"
>>
>> Where savemail is a script containing:
>>
>> cat > $(date +%Y-%m-%d_%H:%M:%S)-$(uuidgen)
>>
>> This will put each message in a separate file. Adjust to taste if
>> you want to put more than one message into each file or to use
>> different filenames.
>
> Thanks for this procedure, it works fine on a not-too-large mbox.
> However, it fails with the huge file in that the system runs out of
> memory, as I guess cat or formail tries to read in the full file to
> process. But it's a good example of how to split an mbox into individual
> files. I will probably use this idea for something else.
>
> Many thanks,
> Tuxedo.

maybe try this variant, just hoping it would be less greedy and
won't eat all the memory:

$ export FILENO=000000 ; formail -n32 +1ds procmail -p 'DEFAULT=/tmp/_mb_$FILENO' /dev/null<yourMbox

From: Chris Nehren on 15 Jun 2010 06:43

On 2010-06-15, Tuxedo scribbled these curious markings:
> Chris Nehren wrote:
>
>> Use a module, like Mail::Box or
>> Email::Folder::Mbox, something that's been tested and in production use
>> at large ESPs for decades.
>
> How can I use these Perl modules to split the mbox? Will they not also
> attempt to read the entire file in one go and run out of memory...

Borrowing from the Email::Folder docs:

#!/usr/bin/perl
use strict;
use warnings;
use Email::Folder;

my $folder = Email::Folder->new("some_file");
while (my $message = $folder->next_message) {
    print $message->header('Subject'), "\n";
}

Or thereabouts. No, it will not read the entire file all at once, unless
you call ->messages on the Email::Folder object. For more information on
what you can do with the $message object, see Email::Simple's docs.

Mail::Box is not covered here because, while it is the Swiss Army chainsaw
of mail modules, it is also more complex and has a steeper learning curve.

--
Thanks and best regards,
Chris Nehren

From: Ben Bacarisse on 15 Jun 2010 07:06

Tuxedo <tuxedo(a)mailinator.com> writes:
<snip>
> Thanks for any further tips.

Another plan might be to use the "reformail" tool. I've used it in
similar situations, though nothing on quite the same scale. In
particular, the -s option runs a program for each mail in the mbox file;
the message is provided on stdin and an environment variable provides
access to a counter so you can simply number the messages.

It is often part of the "maildrop" package, though I think it was
originally part of the courier mail system.

--
Ben.

From: Ben Bacarisse on 15 Jun 2010 07:12

Ben Bacarisse <ben.usenet(a)bsb.me.uk> writes:

> Tuxedo <tuxedo(a)mailinator.com> writes:
> <snip>
>> Thanks for any further tips.
>
> Another plan might be to use the "reformail" tool.

I see that formail has been suggested already. I am not sure if
reformail is another implementation (in which case it may be worth
trying) or just a renaming of formail (in which case it might also have
trouble with the mbox size). Maybe someone who knows both can comment.

<snip>
--
Ben.

From: John Kelly on 15 Jun 2010 07:33

On Tue, 15 Jun 2010 09:39:16 +0200, Tuxedo <tuxedo(a)mailinator.com>
wrote:

>John Kelly wrote:
>> IOW, it's not hard to identify message boundaries. You can use common text
>> processing tools to split the big file into smaller ones.
>
>Thanks for the tip but I'm not sure what processing tools can be used to
>split the file into smaller ones? At least no editor that I know will open
>the file. It's simply too big.

I was not talking about text editors, where you read the whole file into
memory all at once. Tools like grep, sed, and awk read one line at a
time. Or you could write a simple while loop in bash to read a file one
line at a time.

# IFS= and -r keep leading whitespace and backslashes intact
while IFS= read -r; do
    # each line is in $REPLY
    # do something with it
    :
done < mybigfile

If you don't have enough knowledge of these tools to devise a solution,
Chris's idea of Email::Folder may work for you.

--
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php
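
For illustration, the line-at-a-time approach described above might look
something like the following awk-based sketch. It is not taken from the
thread: the function name split_mbox and the msg_NNNNNN output names are
made up for this example, and it assumes the usual mbox convention that a
message starts at a line beginning with "From " that is either the first
line of the file or follows a blank line. Only the current line is held
in memory, so the size of the mbox should not matter.

```shell
#!/bin/sh
# split_mbox MBOX OUTDIR  (hypothetical helper, not from the thread)
# Streams MBOX one line at a time and writes each message to
# OUTDIR/msg_000001, OUTDIR/msg_000002, and so on.
# Assumption: a new message begins at a "From " line that is the first
# line of the file or is preceded by a blank line.
split_mbox() {
    mbox=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    awk -v dir="$outdir" '
        BEGIN { prev_blank = 1 }          # treat start of file as a boundary
        /^From / && prev_blank {
            if (out != "") close(out)     # keep only one output file open
            out = sprintf("%s/msg_%06d", dir, ++n)
        }
        out != "" { print > out }         # copy the line to the current message
        { prev_blank = ($0 == "") }       # remember whether this line was blank
    ' "$mbox"
}
```

Usage would be along the lines of: split_mbox mybigfile /tmp/split
Closing each output file before opening the next keeps the number of open
file descriptors constant, which matters when the mbox holds many
thousands of messages.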