From: John Kelly on 15 Jun 2010 17:07

On Tue, 15 Jun 2010 22:34:43 +0200, Tuxedo <tuxedo(a)mailinator.com> wrote:

>I even tested placing the resulting 100-byte file in a Mozilla mail
>directory. The mail folder (or file) shows up but is empty, not even the
>start of a single message. I also tested with versions longer than 100 bytes.
>
>I guess I have been doomed with a corrupt mbox file! But how can such a large
>2.8GB file contain nothing readable? It should be a direct, full copy of the
>mbox, not a file truncated at a 2GB limit by ftp or some other file transfer.
>I copied the file from the original Windows drive via USB flash media
>directly onto a Linux system, where I ran the dd command.
>
>Thanks for any advice or theories on how this possibly corrupt mbox may be
>reinvigorated and viewed.

100 bytes is not enough to see the big picture. Try more, 1,000 or 10,000
bytes, or whatever it takes until you see some data that looks like mail
messages. Then use dd's skip operand to skip everything before that point
when copying.

-- 
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php
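A minimal sketch of the skip approach, assuming GNU grep (for -a, -b and -m) and GNU dd (for iflag=skip_bytes and status=none). The name strip_leading_garbage is a hypothetical helper. It treats the first line beginning with "From " as the start of the real mail data, which can be wrong if the leading junk happens to contain such a line.

```shell
# strip_leading_garbage IN OUT  (hypothetical helper)
# Copy IN to OUT, starting at the first line that begins with "From ".
# Assumes GNU grep (-a, -b, -m) and GNU dd (iflag=skip_bytes, status=none).
strip_leading_garbage() {
  in=$1
  out=$2
  # -a: treat binary data as text; -b: prefix the line's 0-based byte offset;
  # -m1: stop after the first match. Output looks like "14:From - Sun Dec ...".
  offset=$(grep -a -b -m1 '^From ' "$in" | cut -d: -f1)
  if [ -z "$offset" ]; then
    echo "no line starting with 'From ' found in $in" >&2
    return 1
  fi
  # skip_bytes makes skip= count bytes, so a large block size can still be used.
  dd if="$in" of="$out" bs=1M skip="$offset" iflag=skip_bytes status=none
}
```

Usage would be something like strip_leading_garbage big_mbox fixed_mbox. To eyeball the first 10,000 bytes beforehand, dd if=big_mbox bs=10000 count=1 | od -c | less is one option.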
From: John Kelly on 15 Jun 2010 17:26

On Tue, 15 Jun 2010 22:34:43 +0200, Tuxedo <tuxedo(a)mailinator.com> wrote:

>I guess I have been doomed with a corrupt mbox file! But how can such a large
>2.8GB file contain nothing readable? It should be a direct, full copy of the
>mbox, not a file truncated at a 2GB limit by ftp or some other file transfer.
>I copied the file from the original Windows drive via USB flash media
>directly onto a Linux system, where I ran the dd command.

Are you sure the original Windows file is in mbox format? Even if it is,
there are opportunities for extra garbage to be added when copying from
one system to another. If you can find mbox messages somewhere in the
file, you can use dd to strip off the leading garbage.

But maybe it's not really mbox format, and there is extra garbage between
the messages. Or worse, it is some kind of compressed format where you
can't really see what you have just by looking at the data.

Tinkering with the data using dd can help you answer those questions.
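A rough sketch of such checks, assuming GNU grep and the widely available head -c option. mbox_report is a hypothetical helper, and the only compressed format it recognizes is gzip. A genuine mbox should show many separator lines, with the first one at or near offset 0.

```shell
# mbox_report FILE  (hypothetical helper)
# Rough "is this really mbox?" checks: gzip magic bytes, the number of lines
# that look like mbox separators, and the byte offsets of the first few.
mbox_report() {
  f=$1
  # gzip data starts with the two bytes 1f 8b
  magic=$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')
  if [ "$magic" = "1f8b" ]; then
    echo "starts with gzip magic bytes (compressed?)"
  fi
  # -a: treat binary data as text; -c: count matching lines
  echo "separator lines: $(grep -a -c '^From ' "$f")"
  # 0-based byte offsets of the first five separator lines
  grep -a -b -m 5 '^From ' "$f" | cut -d: -f1
}
```

If the offsets are large or irregular, the strip-and-skip approach above applies; if the gzip line appears, the file would need decompressing first.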
From: Janis Papanagnou on 15 Jun 2010 19:59

Tuxedo wrote:
> Hi,
>
> A customer just gave me a massive mail file in mbox format which has
> accrued over several years. The file was rescued from an old drive of a
> previous but now broken system, and so I would like to restore the mailbox
> in a mail application on a new system.
>
> The mail file was readable on the previous system in Mozilla Thunderbird,
> as there it had a corresponding .msf index. However, the .msf file no
> longer exists and the mbox itself is nearly 3GB. When placing this in a new
> T-Bird mail folder, the mail application tries but soon fails to generate
> the index which is necessary to display the messages.
>
> At first I thought the file might be corrupt, so I tried running:
> formail -zds < big_mbox >> fixed_mbox
>
> But soon after formail began munching its way into big_mbox, the shell
> returned an "Out of memory" error, which I guess is also what happened,
> silently, in the mail client.
>
> I guess I need more RAM to process such a big file, and that any mail
> application, formail included, simply needs more than the file size, which
> unfortunately I do not have. In any case, I think the file is probably OK
> since it worked fine on the previous system.
>
> What methods exist to process and restore this huge file? How about, for
> example, splitting it into parts, such as 5 or 10 different files,
> obviously cut at the right points between messages? I guess the individual
> mbox files could then easily be read in more or less any mail application.
> Can this be done via the shell, and if so, how?
>
> Are there any particular Unix tools to split such huge message files or
> create an .msf index without running out of memory in the process?
>
> Many thanks for any ideas and advice.

I haven't read the whole lengthy thread, so this may already have been
suggested. Say you want the mails sorted by month and year, as given in
the From line (e.g. "From - Sun Dec 27 21:08:44 2009"), so that all mails
from Dec 2009 in file mbox are stored in a file mbox_2009-Dec...

  awk '/^From / { f = "mbox_"$NF"-"$4 } { print > f }' mbox

(If the number of created files exceeds the number of open file
descriptors allowed, please tell us; then the code needs some adjustments.)

Janis
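One possible form of the adjustment Janis mentions: close the previous output file whenever the month changes, so that only one file is open at a time. This is a sketch, not Janis's code. split_by_month is a hypothetical name, it appends with >> (so old mbox_* files should be removed before rerunning), and it ignores any lines before the first "From " line.

```shell
# split_by_month MBOX  (hypothetical helper)
# Write each message to mbox_<year>-<Mon>, e.g. mbox_2009-Dec, keeping at most
# one output file open. Assumes separators like "From - Sun Dec 27 21:08:44 2009".
split_by_month() {
  awk '
    /^From / {
      newf = "mbox_" $NF "-" $4
      # Release the descriptor of the previous month before switching.
      if (f != "" && f != newf) close(f)
      f = newf
    }
    # Append, because a month can be reopened after close(); lines before
    # the first separator are skipped instead of going to an empty file name.
    f != "" { print >> f }
  ' "$1"
}
```

The trade-off is a reopen whenever consecutive messages belong to different months, which costs a little time but never runs out of descriptors.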
From: Janis Papanagnou on 15 Jun 2010 20:07

[ Sorry for the followup to my own post. ]

Janis Papanagnou wrote:
>
> I haven't read the whole lengthy thread, so this may already have been
> suggested. Say you want the mails sorted by month and year, as given in
> the From line (e.g. "From - Sun Dec 27 21:08:44 2009"), so that all mails
> from Dec 2009 in file mbox are stored in a file mbox_2009-Dec...
>
> awk '/^From / { f = "mbox_"$NF"-"$4 } { print > f }' mbox

To avoid matching a message body line that starts with "From [...]", you
can define the pattern more accurately; instead of /^From / specify (for
example)...

  /^From - [A-Z][a-z][a-z] [A-Z][a-z][a-z] .* [0-9][0-9][0-9][0-9]$/ {...}

or perhaps just

  NF==7 && /^From / {...}

>
> (If the number of created files exceeds the number of open file
> descriptors allowed, please tell us; then the code needs some adjustments.)
>
> Janis
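Janis's two stricter patterns, written out as complete commands, might look like this. The function names split_strict and split_nf7 are hypothetical. The f != "" guard skips lines before the first matching separator instead of redirecting them to an empty file name, and it is an addition of this sketch.

```shell
# Variant 1: the full regular expression for Thunderbird-style separators
# such as "From - Sun Dec 27 21:08:44 2009".
split_strict() {
  awk '
    /^From - [A-Z][a-z][a-z] [A-Z][a-z][a-z] .* [0-9][0-9][0-9][0-9]$/ { f = "mbox_" $NF "-" $4 }
    f != "" { print > f }
  ' "$1"
}

# Variant 2: a line starting with "From " that has exactly 7 fields.
split_nf7() {
  awk '
    NF == 7 && /^From / { f = "mbox_" $NF "-" $4 }
    f != "" { print > f }
  ' "$1"
}
```

Either one leaves a body line such as "From the desk of Bob" in the current month's file rather than starting a new file.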
From: Tuxedo on 16 Jun 2010 02:07
Janis Papanagnou wrote:
[...]
> >
> > awk '/^From / { f = "mbox_"$NF"-"$4 } { print > f }' mbox
>
> To avoid matching a message body line that starts with "From [...]", you
> can define the pattern more accurately; instead of /^From / specify (for
> example)...
>
> /^From - [A-Z][a-z][a-z] [A-Z][a-z][a-z] .* [0-9][0-9][0-9][0-9]$/ {...}
>
> or perhaps just
>
> NF==7 && /^From / {...}
>
> >
> > (If the number of created files exceeds the number of open file
> > descriptors allowed, please tell us; then the code needs some adjustments.)
> >
> > Janis
>

Thanks for this awk tip! But you are right, the first one catches message
body text that simply begins a line with "From":

  awk '/^From / { f = "mbox_"$NF"-"$4 } { print > f }' mbox

The other versions, however, give me errors. I presume I am replicating
them the wrong way:

  awk '/^From - [A-Z][a-z][a-z] [A-Z][a-z][a-z] .* [0-9][0-9][0-9][0-9]$/ { f = "mbox_"$NF"-"$4 } { print > f }' mbox

The error for the above is "redirection has null string value".

  awk 'NF==7 && /^From { f = "mbox_"$NF"-"$4 } { print > f }' sent-mail

The error here is "unterminated regexp".

Perhaps you can correct the above or type out your last two examples in
full?

Thanks,
Tuxedo
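The two errors have different causes. "unterminated regexp" comes from /^From missing its closing slash; the intended command is awk 'NF==7 && /^From / { f = "mbox_"$NF"-"$4 } { print > f }' sent-mail. "redirection has null string value" means print > f ran while f was still empty, i.e. the stricter regexp did not match the file's first line. Since the file came from a Windows drive, one possible (unconfirmed) cause is DOS CRLF line endings: a trailing carriage return stops $ from matching right after the year. Another possibility is a separator of a different form, such as "From user@host ...". The sketch below assumes the CRLF case. It strips a trailing \r from every line (so the output files get Unix line endings) and skips lines until the first separator. split_crlf is a hypothetical name.

```shell
# split_crlf MBOX  (hypothetical helper)
split_crlf() {
  awk '
    # Assumption: DOS (CRLF) line endings. Drop the trailing CR so that the
    # regexp anchor matches after the year; output files end up LF-only.
    { sub(/\r$/, "") }
    /^From - [A-Z][a-z][a-z] [A-Z][a-z][a-z] .* [0-9][0-9][0-9][0-9]$/ {
      f = "mbox_" $NF "-" $4
    }
    # Count and skip lines seen before the first separator instead of
    # redirecting them to an empty file name.
    f == "" { skipped++; next }
    { print > f }
    END {
      if (skipped)
        printf "%d line(s) before the first separator were skipped\n", skipped > "/dev/stderr"
    }
  ' "$1"
}
```

If this still creates no files, head -n 1 mbox | od -c shows exactly what the first separator line looks like, including any \r.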