From: lurch on
On Wed, 16 Dec 2009 01:06:47 -0800 (PST), Vlad_Inhaler
<andrew.williams(a)t-online.de> wrote:

> Having to pretty much re-
>install after /usr gets corrupted changes your priorities.


Whatever it is that you are doing, it is causing a system-level failure
that in turn makes hard drive write operations fail.

THAT is what you need to get a handle on.

Your file volumes should not be getting corrupted, regardless of what
their names are. I would also check the drive itself for physical flaws.
From: lurch on
On Wed, 16 Dec 2009 23:04:39 +0000 (UTC), Paul J Gans <gansno(a)panix.com>
wrote:

>Rob <nomail(a)example.com> wrote:
>>Paul J Gans <gansno(a)panix.com> wrote:
>>> Rob <nomail(a)example.com> wrote:
>>>>Vlad_Inhaler <andrew.williams(a)t-online.de> wrote:
>>>>> I abandoned ReiserFS years ago when I hit a (SuSE) level which needed
>>>>> around 20-30 seconds to mount each ReiserFS partition - and I had
>>>>> around 6 of them, so I can't really check this but . . .
>>>
>>>>Of course this time is dwarfed by the time it will take to run
>>>>fsck.ext3 on your 6 disks formatted in ext3, when you boot the system
>>>>and it decides that too much time has gone between checks.
>>>
>>> Off the topic but still: the refresh time for file system checks
>>> can be set manually. I'd set them to numbers that are relatively
>>> prime to each other so that you'd never have two of them being
>>> checked at the same boot.
>
>>My system is normally booted only once or twice a year. When it is,
>>I have to wait about two hours before all the fscks are finished.
>
>>Stupidly, it does not run them in parallel. I can understand why one
>>would serialize the checks of different partitions that are on the
>>same drive, but serializing the checks on the drives is ridiculous.
>
>I agree. I think that the entire strategy here needs to be
>rethought. And I think that has been done to an extent in 11.2.
>I've not used it enough to be sure.
>
>I do wish that there was some documentation on all of this. Microsoft
>refugees don't have to read it, but it would be good to have it around
>for the other folks. Saying "read the code" isn't good enough.


Has anyone tried to see if parallel operation is possible from the
command prompt console? If it is, it would seem that fsck needs some code
revamping.

Also, there is no reason why it could not run in parallel on multiple
volumes on the same physical drive either, since each task is a single
operation. The only reason not to do it would be head traverse times:
with the heads jumping back and forth between volumes, it would take
longer than a one-volume-at-a-time mode would.

Nobody cares about head traverse when other disk-intensive apps run. Why
worry about it here? Because it taxes the drive more, potentially
reducing its lifespan.
From: Vlad_Inhaler on
On Dec 17, 2:05 am, lurch <lu...(a)yourangcousinitslibrary.org> wrote:
> On Wed, 16 Dec 2009 01:06:47 -0800 (PST), Vlad_Inhaler
>
> <andrew.willi...(a)t-online.de> wrote:
> > Having to pretty much re-
> >install after /usr gets corrupted changes your priorities.
>
>  Whatever it is that you do that causes a system level failure mode that
> causes a hard drive write operation failure.
>
>   THAT is what you need to get a handle on.
>
>   Your file volumes should not be getting corrupted, regardless of what
> their names are.  I would also check the drive itself for physical flaws.

1 x power outage
n x auto-reboot while halfway through booting (Samsung laptop; a mobo
replacement fixed it)
n x system freeze in conjunction with NFS usage (temperature sensors
showing the 28C - 35C range; a 'tail -f' active at the time of the hang
shows *nothing*). The corruption happened when major upheavals (SW
updates, directory copying) were in progress at the time of the freeze.

I have a fix for the last one as well: stop using Linux. The machine
is dual-boot and it has not hung once under XP.
Maybe I should use Samba instead of NFS.
From: Rob on
lurch <lurch(a)yourangcousinitslibrary.org> wrote:
> Has anyone tried to see if parallel operation is possible from the
> command prompt console? If it is, it would seem that fsck needs some code
> revamping.

The problem is not in fsck; it is in the startup file that fires
off the fscks for the different drives.

A complication is that fsck might ask for input. When multiple instances
run in parallel, it may be unclear which instance your input should go to.

That can be solved of course.
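
To make the scheduling idea from this subthread concrete, here is a rough
Python sketch: partitions on the same physical drive are checked one after
another, while different drives are handled in parallel. This is not how
any distribution's startup script actually works. The drive/partition
names and the check() callback are hypothetical stand-ins, and no real
fsck is invoked. The input problem is sidestepped by assuming each check
runs non-interactively.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_drive(partitions):
    """Turn (drive, partition) pairs into {drive: [partitions in order]}."""
    groups = defaultdict(list)
    for drive, part in partitions:
        groups[drive].append(part)
    return dict(groups)


def check_all(partitions, check):
    """Call check(partition) serially per drive, with drives in parallel.

    `check` is a caller-supplied function (hypothetical here); it is
    assumed to need no keyboard input. Returns {partition: result}.
    """
    groups = group_by_drive(partitions)
    if not groups:
        return {}

    def check_drive(parts):
        # One worker per drive: its partitions are checked in sequence,
        # so the heads of that drive are not fought over.
        return [(p, check(p)) for p in parts]

    results = {}
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        for pairs in pool.map(check_drive, groups.values()):
            results.update(pairs)
    return results


# Hypothetical layout: two partitions on sda, one each on sdb and sdc.
layout = [("sda", "sda1"), ("sda", "sda2"), ("sdb", "sdb1"), ("sdc", "sdc1")]


def fake_check(partition):
    # Placeholder for a real, non-interactive filesystem check.
    return 0


results = check_all(layout, fake_check)
```

With one worker per drive, total wall time is roughly that of the slowest
drive rather than the sum over all drives, which is the complaint raised
earlier in the thread.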
From: Stephen Horne on
On Wed, 16 Dec 2009 17:03:00 -0800, lurch
<lurch(a)yourangcousinitslibrary.org> wrote:

>On Wed, 16 Dec 2009 03:12:34 +0000 (UTC), Paul J Gans <gansno(a)panix.com>
>wrote:
>
>> I'd set them to numbers that are relatively
>>prime to each other
>
> 'relatively'? 'prime to each other'?
>
> Makes no sense. Zero, in fact. And it wouldn't work either.

"relatively prime" = "coprime"

http://en.wikipedia.org/wiki/Coprime

It wouldn't prevent checks from ever occurring together - in fact it
guarantees they *all* occur together once per product of the
individual periods - but it does keep the frequency of checks
occurring together small.
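
The arithmetic can be checked with a small Python sketch. The periods
20, 21 and 23 are made-up, pairwise coprime values, and the model
(filesystem i is checked on every boot that is a multiple of its period,
all counters starting together) is a simplifying assumption, not a
description of what any particular fsck setup does.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def boots_with_checks(periods, boots):
    """Map boot number -> indexes of the filesystems checked at that boot."""
    schedule = {}
    for boot in range(1, boots + 1):
        due = [i for i, p in enumerate(periods) if boot % p == 0]
        if due:
            schedule[boot] = due
    return schedule


periods = [20, 21, 23]  # hypothetical, pairwise coprime check periods

# For pairwise coprime periods the LCM is simply the product:
# 20 * 21 * 23 = 9660 boots until all three checks coincide.
all_together = reduce(lcm, periods)

schedule = boots_with_checks(periods, all_together)

# Boots on which at least two filesystems are checked at once.
pair_collisions = [b for b, due in schedule.items() if len(due) >= 2]
```

Under these assumptions any two checks first coincide at boot 420
(20 * 21), and all three only at boot 9660, so collisions are rare but
not impossible.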

Better would be for each to have the same period, but with a different
start point, so that the individual checks are spaced out within that
period. Just don't ask me how.
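
One way the same-period, different-start-point idea could look, sketched
in Python. The period of 30 boots and the six filesystems are
illustrative numbers only. On ext2/ext3 the corresponding knobs are
probably tune2fs -c (maximum mount count) and -C (current mount count),
but treat that mapping as an assumption to verify against the man page.

```python
def staggered_offsets(n_filesystems, period):
    """Spread the starting counters evenly across one period.

    Assumes period >= n_filesystems, so that all offsets are distinct.
    """
    return [(i * period) // n_filesystems for i in range(n_filesystems)]


def due_at_boot(boot, period, offsets):
    """Indexes of the filesystems whose counter reaches the period at this boot."""
    return [i for i, off in enumerate(offsets) if (boot + off) % period == 0]


period = 30                              # hypothetical: check every 30 boots
offsets = staggered_offsets(6, period)   # [0, 5, 10, 15, 20, 25]

# Largest number of filesystems checked on any single boot over ten periods.
worst = max(len(due_at_boot(b, period, offsets))
            for b in range(1, 10 * period + 1))
```

Because every counter advances by one per boot, the gaps between the
offsets never change, so in this model no two checks ever land on the
same boot.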