From: unruh on
On 2010-04-28, Rahul <nospam(a)nospam.invalid> wrote:
> unruh <unruh(a)wormhole.physics.ubc.ca> wrote in
> news:slrnhtgmis.biq.unruh(a)wormhole.physics.ubc.ca:
>
> Thanks unruh. My bad. I should explain more.
>>
>> Sorry, you talk about "this disk" and then you talk about raid, which
>> implies more than one physical disk. You talk about "raid-bios" but I
>> do not know what that means.
>
> There are 3 physical disks in the machine. The server has been running
> for a long time and when it was running I used to see only one disk from
> within Linux.

OK.

>
> The bootup process is confusing. There is a std. machine BIOS. But there
> is also a "Adaptec SCSI BIOS". Plus a "LSI MegaRAID Controller Utility".

? Where is this "utility"? Is it on the disk?
>
> Does the last one imply this is a "software RAID?" I am not sure.

Not sure. If the Adaptec card is a raid card, it is probably "hardware
raid"; if it is just an ordinary scsi card, then it is probably some
sort of software raid.

>
> I did go into the MegaRAID Utility and it shows 3 physical disks and one
> logical disk using a RAID5.
>
>>Is this the abortion known as software
>> raid contained on many motherboards, which is, as far as I can tell, a
>> useless piece of junk.
>
> I am not sure. How does one tell. I totally agree with you: I'd never do
> software RAID myself. But this machine is pretty old and not designed by
> me unfortunately.

No, the abortion is that motherboard manufacturers claim that they
support raid, but it turns out to be a software raid which requires
drivers etc. Far better off just using the software raid within Linux.
It seems to work well (I have a couple of disks with raid0, and the
throughput really is about twice that of a single disk.) I needed it so
that the disks could keep up with gigabit ethernet data input.


>
>> You will probably need at least one partition
>> on one of your disks that is NOT raid. and use that to contain the
>> /boot directory, so your bios can find the absolute sectors on that
>> disk and download the second stage grub loader.
>
> The plus point is that this was a perfectly running machine till just a
> few days ago. So I guess whatever setup it had was bootable. It's just
> that something's broken down or was accidentally changed and I cannot
> figure out what!
>
>> Remember that on bootup, the ONLY thing available is the disk access
>> software in the bios. It has access to no drivers or any other
>> software. It can only load absolute disk sectors into memory to run
>> them. If the boot loader is not at some absolute disk address on one
>> disk ( and if you use striped raid, it is not) the bios disk reader
>> cannot do anything.
>
> What you say does make sense. But the machine was booting up so far so I
> am thinking there has to be a way to fix it. Just not sure how.
>

Yeah. I am afraid it will have to be one of the other experts here, not
me. Sorry.

>
From: unruh on
On 2010-04-28, Robert Heller <heller(a)deepsoft.com> wrote:
> At Wed, 28 Apr 2010 18:43:21 +0000 (UTC) Rahul <nospam(a)nospam.invalid> wrote:
>
>>
>> Robert Heller <heller(a)deepsoft.com> wrote in
>> news:2O2dnbs2uokx40XWnZ2dnUVZ_tudnZ2d(a)posted.localnet:
>>
>> Thanks Robert!
>>
>>
>> > *Exactly* what sort of disks are on this internal MegaRAID controller?
>>
>> SCSI disks for sure. I opened the box.
>>
>> > Do you know *exactly* what make/model of MegaRAID controller it is?
>> > Its start up screen *should* tell you something. If nothing else,
>> > crack open the case and have a look at the drives & controller board.
>>
>> Sure. I'll find that out and post. I have a strong suspicion this has to do
>> with the drivers. Somehow my rescue disk doesn't have the drivers needed to
>> access the MegaRAID.
>
> That seems odd. The CentOS 5.4 kernel (2.6.18-164.15.1.el5) includes
> megaraid.ko and megaraid/megaraid_mbox.ko in its SCSI drivers, which
> should support this -- I looked at the modules.alias file and
> /usr/share/hwdata/pci.ids -- a large number of controllers seem to be
> covered. What kernel / distro rescue disk are you using? Maybe you
> need a better rescue disk...

The problem "is" that grub seems to be dying before the kernel is
loaded, and perhaps before anything is loaded. Thus the modules, even if
they exist, never get a chance to be used (I assume). It is not clear to
me exactly when in the boot process grub dies.


>
From: Rahul on
TomT <TomT(a)UnrealBox.invalid> wrote in
news:to4ht5de837u69h83go5fhkbb5rnukv9o1(a)4ax.com:

> GRUB is obviously installed. You may want to reinstall it but first,
> just for laughs, after the cursor, type "find /boot/grub/stage1"
> (without the quotes of course) and see if it returns anything. If it
> does we'll go from there. If not, you're no worse off.
>

If I do:

grub>find /grub/stage1

It does work. I used the root and setup commands pointing to the location
returned. Still no luck.

--
Rahul
From: TomT on
Rahul <nospam(a)nospam.invalid> wrote:

>TomT <TomT(a)UnrealBox.invalid> wrote in
>news:to4ht5de837u69h83go5fhkbb5rnukv9o1(a)4ax.com:
>
>> GRUB is obviously installed. You may want to reinstall it but first,
>> just for laughs, after the cursor, type "find /boot/grub/stage1"
>> (without the quotes of course) and see if it returns anything. If it
>> does we'll go from there. If not, you're no worse off.
>>
>
>If I do:
>
>grub>find /grub/stage1
>
>It does work. I used the root and setup commands pointing to the location
>returned. Still no luck.

Then I can only suggest you reinstall GRUB.
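
A minimal sketch of what a reinstall could look like with GRUB legacy,
assuming the grub binary of the installed system is reachable from the
rescue shell. The helper name grub_setup_script is made up for
illustration, and (hd0,0) is only an example value; use whatever
"find /grub/stage1" actually returned.

```shell
# Hypothetical helper: given the partition that "find /grub/stage1"
# reported, e.g. "(hd0,0)", print the GRUB legacy commands that
# reinstall stage1 into the MBR of the disk holding that partition.
grub_setup_script() {
    part=$1                 # e.g. (hd0,0)
    disk="${part%%,*})"     # drop ",0)" and close the paren: (hd0)
    printf 'root %s\nsetup %s\nquit\n' "$part" "$disk"
}

# Example (not run here): feed the commands to the GRUB legacy shell.
# grub_setup_script "(hd0,0)" | grub --batch
```

Printing the commands first makes it easy to read them over before
anything is written to the MBR.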

I think Robert Heller is on to something when he says your devices may
have been remapped. I'd be looking for tools like gparted or cfdisk to
tell me what my disks and partitions are named. It's hard to work if
you don't know what you're working with.
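
Even without gparted or cfdisk, the kernel's own view of the disks is
in /proc/partitions once the rescue disk has booted. A small sketch
(the function name list_partitions is hypothetical) that prints the
device names the kernel currently sees:

```shell
# Print the device names (sda, sda1, ...) from a file in the
# /proc/partitions format. Defaults to the live /proc/partitions.
list_partitions() {
    # Line 1 is the column header and line 2 is blank; data rows have
    # four fields: major, minor, #blocks, name.
    awk 'NR > 2 && NF == 4 { print $4 }' "${1:-/proc/partitions}"
}

# Example (not run here):
# list_partitions
```

If the MegaRAID logical disk does not appear in that list at all, the
rescue kernel has not loaded a driver for it.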

It seems like your RAID problem messed GRUB up as both J G Miller and
Robert Heller have said.

Good luck.

TomT
From: Robert Heller on
At Wed, 28 Apr 2010 17:55:59 -0600 TomT <TomT(a)UnrealBox.invalid> wrote:

>
> Rahul <nospam(a)nospam.invalid> wrote:
>
> >TomT <TomT(a)UnrealBox.invalid> wrote in
> >news:to4ht5de837u69h83go5fhkbb5rnukv9o1(a)4ax.com:
> >
> >> GRUB is obviously installed. You may want to reinstall it but first,
> >> just for laughs, after the cursor, type "find /boot/grub/stage1"
> >> (without the quotes of course) and see if it returns anything. If it
> >> does we'll go from there. If not, you're no worse off.
> >>
> >
> >If I do:
> >
> >grub>find /grub/stage1
> >
> >It does work. I used the root and setup commands pointing to the location
> >returned. Still no luck.
>
> Then I can only suggest you reinstall GRUB.
>
> I think Robert Heller is on to something when he says your devices may
> have been remapped. I'd be looking for tools like gparted or cfdisk to
> tell me what my disks and partitions are named. It's hard to work if
> you don't know what you're working with.
>
> It seems like your RAID problem messed GRUB up as both J G Miller and
> Robert Heller have said.

Right. From what little Rahul has told us, I'm guessing that his
machine has two disk controllers:

1) Some flavor of [PCI] Adaptec SCSI controller (probably some sort of
2940-flavor card) with some sort of self-contained SCSI-connected
RAID unit (eg a small-scale box like a DEC HZ70)
Since he mentioned the Adaptec SCSI *first*, I *suspect* that the
external RAID box originally showed up as /dev/sda
2) Some sort of [LSI] MegaRAID PCI SCSI RAID controller with three
internal SCSI disks, that contain some flavor of RHEL or CentOS, with
two partitions, originally /dev/sdb1 (mounted as /boot) and /dev/sdb2,
which is a LVM volume group. For some idiot reason, the O/S was
installed on the *second* logical disk, or else the system somehow
became the second logical disk (maybe the Adaptec SCSI controller +
external RAID box was added later).

The BIOS was configured to boot off of /dev/sdb (originally or at some
point). Then the external RAID lost a disk (which was replaced and the
RAID array rebuilt off-line somehow). Now with /dev/sda (the external
RAID array) not connected the BIOS is trying to boot from what used to
be /dev/sdb, but is now /dev/sda. Grub at this point is still using the
BIOS parameters for /dev/sdb, though, or at least the piece of grub
residing in the MBR of the MegaRAID logical disk (which used to be
/dev/sdb, but is now /dev/sda) is.

I suspect that the /etc/fstab on the root file system wants to mount
/dev/sdb1 as /boot when it really ought to be mounting /dev/sda1 as
/boot. Rahul needs to do the following:

1) fire up his CentOS 5.4 rescue disk
2) use e2label to give /dev/sda1 a label (e2label /dev/sda1 boot)
3) change /dev/sdb1 to LABEL=boot in the /etc/fstab file.
4) make sure /boot is properly mounted
5) chroot /mnt/sysimage
6) grub-install /dev/sda
7) exit
8) exit
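
Step 3 is the easiest one to get wrong by hand, so here is a rough
sketch of it as a shell function. The function name fstab_use_label is
made up for illustration, and the device and label values are the ones
guessed at above, not confirmed from the actual machine. It is worth
trying on a copy of the fstab first.

```shell
# Hypothetical helper: in FILE, replace a leading DEVICE field with
# LABEL=<label>, leaving every other line untouched.
fstab_use_label() {
    file=$1 dev=$2 label=$3
    tmp=$(mktemp) || return 1
    # Match the device only at the start of a line and only when it is
    # followed by whitespace, so /dev/sdb1 does not also match /dev/sdb10.
    sed "s|^${dev}\([[:space:]]\)|LABEL=${label}\1|" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example from the rescue shell (not run here), assuming the installed
# system is mounted at /mnt/sysimage and /boot really is /dev/sda1:
# e2label /dev/sda1 boot
# fstab_use_label /mnt/sysimage/etc/fstab /dev/sdb1 boot
```

Mounting by label means the entry keeps working whichever of /dev/sda
or /dev/sdb the MegaRAID disk ends up as.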

When the system has rebooted properly, shut it down (shutdown -h now)

Open the case and swap the positions of the two PCI cards (the MegaRAID
and the Adaptec card).

Fire up the machine to make sure things are working properly.

Shut the machine down and re-connect the external RAID system and start
the machine up. The external box should now show up as /dev/sdb.

I'm guessing whoever originally set this box up was some flavor of idiot.
Yes, you don't *have* to install Linux on the *first* logical disk, but
PC BIOS/Boot loaders work better if you do, especially when you have
disks that can possibly be removed from the system easily. Especially
if they are SCSI disks, since SCSI disks have a 'fluid' device naming...

>
> Good luck.
>
> TomT
>

--
Robert Heller -- Get the Deepwoods Software FireFox Toolbar!
Deepwoods Software -- Linux Installation and Administration
http://www.deepsoft.com/ -- Web Hosting, with CGI and Database
heller(a)deepsoft.com -- Contract Programming: C/C++, Tcl/Tk