Date: Tue, 28 Nov 2006 17:39:29 +0100
From: Palle Girgensohn <girgen@FreeBSD.org>
To: gayn.winters@bristolsystems.com, hardware@freebsd.org
Subject: RE: no file system after replacing bad RAID drive
Message-ID: <E03FB902373D6F28723EDF19@rambutan.pingpong.net>
In-Reply-To: <04dc01c71308$ecc9e930$6501a8c0@workdog>
References: <04dc01c71308$ecc9e930$6501a8c0@workdog>

--On Tuesday, November 28, 2006 08.18.59 -0800 Gayn Winters
<gayn.winters@bristolsystems.com> wrote:

>> -----Original Message-----
>> From: Palle Girgensohn [mailto:girgen@FreeBSD.org]
>> Sent: Tuesday, November 28, 2006 7:38 AM
>> To: gayn.winters@bristolsystems.com; hardware@freebsd.org
>> Subject: RE: no file system after replacing bad RAID drive
>>
>> --On Tuesday, November 28, 2006 07.16.47 -0800 Gayn Winters
>> <gayn.winters@bristolsystems.com> wrote:
>>
>> >> -----Original Message-----
>> >>
>> >> Hi!
>> >>
>> >> We just got Dell to replace a bad drive in a RAID5 cluster. After
>> >> reboot, megarc says they're all online, but FreeBSD cannot find a
>> >> file system on the logical drive. It worked before replacing (in
>> >> degraded mode).
>> >>
>> >> It is a Dell 2850 with a Perc4/I (really an LSILogic MegaRAID)
>> >> controller running FreeBSD 6.0
>> >>
>> >> Any ideas how to get the file system back? bsdlabel finds no
>> >> label, and since the OS does not find the label during startup,
>> >> there are no devices. I tried some megarc and camcontrol commands,
>> >> but nothing seems to help.
>> >>
>> >> Any ideas appreciated. Thanks,
>> >> Palle
>> >
>> > I have not been able to get my PERC4 to rebuild while online. Have
>> > you tried stopping the boot (control M as I recall) to get into the
>> > PERC4 firmware and rebuilding the RAID from there? My rebuild
>> > (3x150GB RAID5) takes about 14 hours - totally offline.
>>
>> Well, before replacing the disk, the file system worked OK (only
>> degraded, no redundancy). The Dell guy replaced the disk while the
>> system was shut off. Dell thinks that may have something to do with
>> it, but it sounds strange to me. When the disk was inserted, it seemed
>> like it was rebuilding, since all disks in the cluster were flashing
>> vividly.
>>
>> I tried a rebuild with the system on line, by first setting the new
>> disk off line and then
>>
>> # megarc -physoff -a0 pd'['0:3']'
>> # megarc -doRbld -a0 -RbldArray'[0:3]' -ShowProg
>>
>> it took a couple of hours, maybe four, but not 14. Perhaps it is not
>> OK, I dunno.
>>
>> My problem is that it cannot find a bsd label on the disk, and hence
>> no devices are created:
>>
>> $ ls -l /dev/amrd1*
>> crw-r----- 1 root operator 0, 66 Nov 28 10:09 /dev/amrd1
>> $
>>
>> fdisk looks OK:
>>
>> # fdisk amrd1
>> ******* Working on device /dev/amrd1 *******
>> parameters extracted from in-core disklabel are:
>> cylinders=35669 heads=255 sectors/track=63 (16065 blks/cyl)
>>
>> Figures below won't work with BIOS for partitions not in cyl 1
>> parameters to be used for BIOS calculations are:
>> cylinders=35669 heads=255 sectors/track=63 (16065 blks/cyl)
>>
>> fdisk: invalid fdisk partition table found
>> Media sector size is 512
>> Warning: BIOS sector numbering starts with sector 1
>> Information from DOS bootblock is:
>> The data for partition 1 is:
>> sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
>> start 63, size 573022422 (279796 Meg), flag 80 (active)
>> beg: cyl 0/ head 1/ sector 1;
>> end: cyl 852/ head 254/ sector 63
>> The data for partition 2 is:
>> <UNUSED>
>> The data for partition 3 is:
>> <UNUSED>
>> The data for partition 4 is:
>> <UNUSED>
>>
>> but I cannot look for a bsdlabel, since there is no slice? There used
>> to be a /dev/amrd1s1d that was one single slice for the entire disk.
>> It should be possible to reproduce it, but how, and what happens if I
>> do this?
>>
>> If I enter sysinstall, it is all empty, no slice and naturally no
>> label, so I need to add a partition slice in the fdisk submenu, and
>> then I can create a bsd label. Problem is, I want the data back... :-/
>>
>> /Palle
>
> Hi Palle,
>
> You might be missing my key point. What you are trying to do is to use
> some OS tool.
> In my experience with Dell's PERC4 cards (which are not exactly LSI
> Logic cards - in fact LSI Logic will not support these cards and
> advises to only use Dell firmware with them) you have to get into the
> PERC4 firmware setup to do a rebuild. Watch carefully what is
> displayed on the screen when you reboot your system. You need to get
> into the PERC4 firmware BEFORE the opportunity to get into the BIOS
> setup (because the last thing the BIOS does is call INT 19h to run the
> OS.) Watch the screen. I think the key combination to get into the
> PERC4 firmware is Control M, but it may be Alt M. I can't remember for
> sure and I don't feel like rebooting the server of mine that uses the
> PERC4. It says on the screen as the system boots. As usual you have to
> be pretty quick pressing the keys to get into this setup.
>
> Once you are into the PERC4 firmware, you will get a nice display of
> your physical drives. Most likely, it will still say "degraded", and
> it will indicate which physical drive is causing the problem. (This
> better be the drive you replaced, or you replaced the wrong drive!)
> You can rebuild the new drive from here. This rebuild is before the
> BIOS does its setup thing. Don't let the system try to boot any OS -
> not even from a recovery disk, because then you have missed the PERC4
> firmware setup!
>
> In my experience, the time it takes to do a rebuild increases as your
> (or my) drives fill up. This makes sense because the redundancy
> calculation is trivial if all the blocks on the other n-1 drives are
> zeroes. My drives are getting rather full. During the rebuild, you
> will get a percent complete display.
>
> Good luck!
>
> -gayn
>
> Bristol Systems Inc.
> 714/532-6776
> www.bristolsystems.com

OK, I guess I'll have to try this. The megarc (check ports,
sysutils/megarc) is a utility from LSI, but as you say, it does run
from the OS.
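
The parity remark in the quoted reply can be made concrete with a small sketch. It assumes the textbook RAID 5 scheme, where the parity block of a stripe is the byte-wise XOR of its data blocks; the block contents and the helper name xor_blocks are made up for illustration, and nothing here reflects how the PERC4 firmware actually schedules a rebuild.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on an assumed three-disk RAID 5 set: two data blocks plus parity.
d0 = b"FreeBSD!"
d1 = b"amrd1s1d"
parity = xor_blocks([d0, d1])

# Degraded mode: the disk holding d1 is missing, but d1 can still be
# recomputed from the surviving data block and the parity block.
recovered_d1 = xor_blocks([d0, parity])
assert recovered_d1 == d1

# Parity computed over all-zero blocks is itself all zeros.
zeros = bytes(8)
assert xor_blocks([zeros, zeros]) == zeros
```

The same XOR serves both a degraded read and a rebuild: any one missing block of a stripe equals the XOR of the remaining blocks.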

The odd thing is, even with the logical drive in a degraded state, it
should really work, since I have two out of three disks in a RAID 5
cluster. It worked fine before I replaced the bad drive. Now, it
doesn't work. Odd. :(

/Palle

Thanks for your help
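
As a sanity check on the fdisk output quoted earlier, the arithmetic below uses only the numbers printed there. It is a sketch, not a recovery procedure: it shows that the reported slice starts after the first track and runs exactly to the end of the reported geometry. Given the "invalid fdisk partition table found" line, those figures may be a default computed by fdisk rather than data read from the disk, so the match alone does not prove the on-disk table is intact.

```python
SECTOR_BYTES = 512

# Geometry and slice figures copied from the quoted fdisk output.
cylinders, heads, sectors_per_track = 35669, 255, 63
slice_start, slice_size = 63, 573022422

blocks_per_cylinder = heads * sectors_per_track
total_sectors = cylinders * blocks_per_cylinder
slice_mib = slice_size * SECTOR_BYTES // (1024 * 1024)

print(blocks_per_cylinder)                        # 16065
print(total_sectors - slice_start == slice_size)  # True
print(slice_mib)                                  # 279796
```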