From: Chris Nehren on 16 Jun 2010 04:39

On 2010-06-16, Tuxedo scribbled these curious markings:
> I thought the mbox format was meant to begin with "From" on the first line
> of the file. At least that's how mboxes look on my Linux box. But who knows
> what could have been inserted by some Windows application.

Silly mortal, assuming software adheres to standards. Have you watched
that video yet? :)

--
Thanks and best regards,
Chris Nehren
Unless noted, all content I post is CC-BY-SA.

From: John Kelly on 16 Jun 2010 07:22

On Wed, 16 Jun 2010 02:07:09 +0200, Janis Papanagnou
<janis_papanagnou(a)hotmail.com> wrote:

>To avoid matching a message body line starting with "From [...]" you can
>define the pattern more accurately; instead of /^From / specify (for
>example)...
>
>  /^From - [A-Z][a-z][a-z] [A-Z][a-z][a-z] .* [0-9][0-9][0-9][0-9]$/ {...}
>
>or perhaps just
>
>  NF==7 && /^From / {...}

I wonder how mail programs cope with that. The extra test is good, but
not foolproof. No test can be foolproof unless "^From " in the body is
escaped (mangled) when stored.

--
Web mail, POP3, and SMTP
http://www.beewyz.com/freeaccounts.php

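The two ideas in this post, a stricter separator test and escaping "From " in message bodies, can be sketched roughly as follows. This is only an illustration: the separator shape is copied from the awk pattern quoted above and assumes Thunderbird-style "From - <date>" lines, the function names are made up for the example, and the escaping follows the mboxrd convention (prefix ">" to any body line matching ^>*From ), which may not be what a given mail program actually does.

```python
import re

# Assumption: separators look like "From - Wed Jun 16 07:22:00 2010",
# mirroring the awk pattern quoted in the post. Other mbox writers differ.
SEPARATOR = re.compile(r"^From - [A-Z][a-z]{2} [A-Z][a-z]{2} .* [0-9]{4}$")


def is_separator(line: str) -> bool:
    """Return True if the line looks like a message separator."""
    return SEPARATOR.match(line.rstrip("\r\n")) is not None


def escape_body_line(line: str) -> str:
    """mboxrd-style quoting: add one '>' to body lines matching ^>*From ."""
    if re.match(r"^>*From ", line):
        return ">" + line
    return line


def unescape_body_line(line: str) -> str:
    """Reverse escape_body_line: strip one '>' from lines matching ^>+From ."""
    if re.match(r"^>+From ", line):
        return line[1:]
    return line
```

Even the stricter pattern can be fooled by a body line that happens to have the same shape, which is the point made above: only escaping at storage time removes the ambiguity.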
From: John Kelly on 16 Jun 2010 07:45

Tuxedo <tuxedo(a)mailinator.com> wrote:

>Yes I think the file must be in some compressed format.

Low level tools like dd help focus on the real problem.

>A customer just gave me a massive mail file in mbox format which has
>accrued over several years. The file was rescued from an old drive of a
>previous but now broken system, and so I would like to restore the mailbox
>in a mail application on a new system.
>
>The mail file was readable on the previous system in Mozilla Thunderbird,
>as there it had a corresponding .msf index. However, the .msf file no
>longer exists and the mbox itself is nearly 3GB.

What? You said you rescued an old drive. So if the .msf file no longer
exists, how can you know it had a .msf file?

I'm beginning to wonder if this thread is a practical joke.

From: Tuxedo on 16 Jun 2010 07:46

Chris Nehren wrote:

[...]
> Silly mortal, assuming software adheres to standards. Have you watched
> that video yet? :)

I watched about half of "Email Hates the Living" on Google, entertaining
stuff! I will watch the rest.

In the case of Thunderbird's mbox format, the mbox files normally begin
with 'From'; at least they do in the other working T-Bird mail files from
the same system that the 2.8GB mail file comes from.

It appears that T-Bird uses some compression format when an mbox file
hits a certain size:
https://wiki.mozilla.org/Talk:Thunderbird:2.0_Product_Planning#Auto_compress_folders_after_relative_changes_in_size

If I only knew which, I could try to uncompress it.

Tuxedo

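One low-effort way to test the compression guess is to look at the first few bytes of the file and compare them with well-known magic numbers. The sketch below is an assumption-laden illustration, not a statement about what Thunderbird writes: it only recognises gzip, bzip2, xz and zip headers plus a plain "From " start, and the function names are invented for the example.

```python
def classify_header(head: bytes) -> str:
    """Guess a file type from its leading bytes (a small, incomplete table)."""
    if head.startswith(b"\x1f\x8b"):
        return "gzip"
    if head.startswith(b"BZh"):
        return "bzip2"
    if head.startswith(b"\xfd7zXZ\x00"):
        return "xz"
    if head.startswith(b"PK\x03\x04"):
        return "zip"
    if head.startswith(b"From "):
        return "mbox"  # looks like an ordinary, uncompressed mbox
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first 8 bytes, so this is cheap even on a 3GB file."""
    with open(path, "rb") as f:
        return classify_header(f.read(8))
```

An "unknown" result says nothing conclusive; it only means the file starts with neither "From " nor one of the few headers checked here.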
From: Maxwell Lol on 16 Jun 2010 08:12

John Kelly <jak(a)isp2dial.com> writes:

> On Mon, 14 Jun 2010 21:17:26 -0400, Maxwell Lol <nospam(a)com.invalid>
> wrote:
>>You can even use perl and use something like
>>
>>  @mail = split(/\nFrom /,$mboxfile);
>
> That will read it into memory all at once, which may cause thrashing
> with his 3GB file. In his scenario, better to read and write one line
> at a time, and open a new output file every so many messages.

Sure. I just wanted to mention this technique, because it's useful at
times.

> It's easy to shoot yourself in the foot with Perl.

Of course. Dealing with 3GB files can be a concern. However, if you only
have to do it once, sometimes it's better to let the computer do the
work, even if it's not the most elegant solution.

There are times when I know it will take (say) 30 seconds longer for a
command to complete, but it's easier to do that than to write a better
script (which will take longer than 30 seconds).

Mental triage, so to speak.
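The line-at-a-time alternative suggested in the quoted reply could look something like this sketch. It is written in Python rather than Perl purely for illustration; the function names, the output naming scheme (prefix.0000, prefix.0001, ...) and the default of 1000 messages per file are all assumptions. It also uses the naive "starts with From " test, so an unescaped "From " line in a message body would be counted as a new message.

```python
from typing import Iterable, Iterator, Tuple


def assign_chunks(lines: Iterable[bytes], per_file: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (chunk_index, line), moving to a new chunk every per_file messages."""
    if per_file < 1:
        raise ValueError("per_file must be at least 1")
    seen = 0  # number of separator lines seen so far
    for line in lines:
        if line.startswith(b"From "):
            seen += 1
        # Lines before the first separator, if any, go to chunk 0.
        yield max(seen - 1, 0) // per_file, line


def split_mbox(src_path: str, out_prefix: str, per_file: int = 1000) -> int:
    """Stream src_path into numbered output files; return how many were written."""
    current = -1
    out = None
    try:
        with open(src_path, "rb") as src:
            # Iterating a file object reads one line at a time,
            # so memory use stays small regardless of file size.
            for index, line in assign_chunks(src, per_file):
                if index != current:
                    if out is not None:
                        out.close()
                    out = open(f"{out_prefix}.{index:04d}", "wb")
                    current = index
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return current + 1
```

A call such as split_mbox("Inbox", "Inbox.part", 5000) (file names hypothetical) would hold only one line in memory at a time, at the cost of more code than the one-line split.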