From: Lukas Kolbe on 1 Jun 2010 08:00

I reported this as a bug to bugzilla.kernel.org as #16081, but since we're running Debian I thought asking around here for help and discussion would be advisable.

The current situation is: We have a Supermicro XDWN7+ board (Intel 5400, Xeon 5420 CPU, 8GB RAM) with 24 1TB SATA disks attached to an Adaptec 52445 controller and a Tandberg tape library attached to an LSI SAS1068E SAS controller. The system runs Lenny with a recompiled (no options changed) Linux 2.6.32-12. We use bacula 5.0.2 (backported to Lenny) as our backup software, and so far it works quite well.

The only problem is: After writing around 10TiB of data to the disks, the machine crashes. This has happened twice, and after the second crash both filesystems containing the backup disk pool (9TiB LVM volumes with ext4 filesystems) were completely garbled. One filesystem now looks like this:

shepherd:~# ls -la /mnt/lost+found/ | head -n 20
total 216936
drwx------ 250 root root 69632 2010-05-31 13:10 .
drwxr-xr-x 3 root root 4096 2010-05-31 13:10 ..
c----wxr-- 1 774037444 162299347 237, 210 1957-02-23 13:50 #1000
brwx-----T 1 1954511736 3121970260 249, 121 1922-08-12 15:08 #10021
b-w---xrwt 1 543753214 3130053982 234, 213 2012-06-01 07:58 #10027
c--S--sr-T 1 3871079531 3443641576 2, 232 2036-01-31 13:12 #10036
-r-S-w-r-T 1 2298731406 344458386 32768 2035-05-22 08:46 #10046
brw---Srw- 1 2052225653 4012639896 218, 196 1912-06-23 18:14 #10067
prwS-wSr-x 1 2235883341 1302567651 0 1927-10-10 00:51 #10086
s-wS--x-wt 1 2286828425 2999490124 0 1949-08-22 22:50 #10109
crw--wSrwt 1 3083778288 3882824206 148, 212 2003-07-28 08:32 #10126
s-wS--sr-x 1 874900871 80451928 0 1977-11-28 01:52 #10130
s--sr-x--- 1 1903432768 1059722 0 2013-07-05 00:55 #10131
c-w-r-Sr-T 1 3259732952 2590389953 9, 22 2012-06-19 14:56 #10147
pr-x-w--wt 1 1627318825 1016384218 0 1956-12-27 06:01 #10160
srw-r-SrwT 1 2603486838 3240878817 0 1954-11-16 08:43 #10177
srw---srwt 1 458009213 951782573 0 2023-12-03 18:43 #10184
brwxr--rwx 1 2423698452 2252742920 44, 231 1956-07-25 07:28 #10197
brwS-wS-w- 1 3480615060 1244965598 44, 189 2006-10-21 17:03 #1020

The other one is not mountable anymore:

[88397.252831] EXT4-fs (dm-1): ext4_check_descriptors: Checksum for group 1 failed (49189!=48621)
[88397.252856] EXT4-fs (dm-1): group descriptors corrupted!

One thing to note is that with Supermicro's current BIOS 1.2b for this board, the machine crashes with an MCE after a fair amount of network and disk I/O (around 2-5TiB, I believe). This does not happen with their BIOS version 1.1b, which is installed at the moment.

I'm at a loss here, as I really don't know what's causing these crashes and also don't know how I can debug this any further. Does anybody have any hints for me?

memtest86 runs fine for hours, by the way, and the machine doesn't have heat problems (at least the IPMI console doesn't report any, and the fans are all fine).
More info on the system (lspci/lsscsi) can be found at https://bugzilla.kernel.org/show_bug.cgi?id=16081

Thanks,
Lukas