Date:      Fri, 22 May 2009 16:03:38 -0500
From:      "Larry Rosenman" <ler@lerctr.org>
To:        "'Joe Karthauser'" <joe@freebsd.org>, "'Alexander Motin'" <mav@freebsd.org>
Cc:        freebsd-stable@freebsd.org, 'Kip Macy' <kmacy@freebsd.org>
Subject:   RE: Sudden wierd SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))
Message-ID:  <00d901c9db20$cd1baff0$67530fd0$@org>
In-Reply-To: <4A170EDE.8000906@freebsd.org>
References:  <3c1674c90905201459k19776d53n309b2abeab0f8d0a@mail.gmail.com>		<200905202209.n4KM9Bcg094853@lava.sentex.ca>		<3c1674c90905201541n65f997e6jaa20d93bf566fb98@mail.gmail.com>		<68BDAD74-021A-4169-B003-21A2BCF2AD5C@transsys.com>		<4A156AD7.8000003@icyb.net.ua>	<4A159482.9080903@freebsd.org>		<3c1674c90905211128n45814519o903ee2b6eb3cf195@mail.gmail.com>		<4A16E37C.2030005@freebsd.org>	<3c1674c90905221139i6f335062k74d641b7c91c188c@mail.gmail.com>	<4A16FF93.2020504@freebsd.org> <4A170EDE.8000906@freebsd.org>

I saw really strange stuff with one bad SATA cable on my 6-drive ZFS array.
It would work most of the time, but the scrub would either cough up CRC
errors or hang.

I wound up replacing the disk *AND* the cable, and it's been fine since. 

This is on a SuperMicro chassis with Intel chips.

YMMV
-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 512-248-2683                E-Mail: ler@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893

-----Original Message-----
From: owner-freebsd-stable@freebsd.org
[mailto:owner-freebsd-stable@freebsd.org] On Behalf Of Joe Karthauser
Sent: Friday, May 22, 2009 3:45 PM
To: Alexander Motin
Cc: freebsd-stable@freebsd.org; Kip Macy
Subject: Re: Sudden wierd SATA problem on RELENG_7 (Re: ZFS hanging at
kernel boot now, but didn't before... (Re: ZFS MFC heads up))

This appears to have gone away now. I unplugged the bay that was causing 
the trouble, and the system booted just fine on the remaining 4 drives. 
Then I plugged the bay back in (live) and did an atacontrol 
detach/attach on that bus (I wonder why I always have to do that). The 
drive was seen, and ZFS resilvered itself. I'm doing a ZFS scrub now to 
make sure that everything is good, and I'll do a reboot and see if it's 
all ok after that.
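
For anyone repeating this, the sequence described above can be sketched as a
small script. This is only a sketch under assumptions: the channel name
`ata4` and the pool name `tank` are placeholders (not taken from this
machine), and the `RUN` variable is a hypothetical dry-run switch that
defaults to printing the commands rather than executing them.

```shell
#!/bin/sh
# Sketch only: re-probe one ATA channel, then check and scrub the pool.
# CHANNEL and POOL are placeholders, not values from this thread's hardware.
CHANNEL="${CHANNEL:-ata4}"
POOL="${POOL:-tank}"
# Dry run by default: each command is echoed instead of executed.
# Run with RUN set to an empty string to execute the commands for real.
RUN="${RUN-echo}"

reprobe_and_scrub() {
    $RUN atacontrol detach "$CHANNEL"   # drop the channel
    $RUN sleep 1                        # give the bay a moment to settle
    $RUN atacontrol attach "$CHANNEL"   # re-probe devices on the channel
    $RUN zpool status "$POOL"           # see whether a resilver has started
    $RUN zpool scrub "$POOL"            # verify all data once it is back
}

reprobe_and_scrub
```

`atacontrol list` shows the channel names on a given box; starting the script
as `RUN= sh script.sh` runs the commands instead of printing them.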

Strange, so it looks like a cable might have got a little loose or 
something. I wonder why that would have hung the kernel probe though.

Joe

on 22/05/2009 20:40 Joe Karthauser said the following:
> Hi Alexander,
>
> I'd love it if you were able to provide some insight into this problem.
>
> I'm going to try switching sata cables around next to see if the problem
> goes away if I disconnect some combination of bays.
>
> Thanks,
> Joe
>
> on 22/05/2009 19:39 Kip Macy said the following:
>> Motin is your best bet in tracking down ATA problems.
>>
>> Cheers,
>> Kip
>>
>>
>> On Fri, May 22, 2009 at 10:40 AM, Joe Karthauser<joe@freebsd.org> wrote:
>>> Hi Kip,
>>>
>>> I seriously don't understand what has happened. If I boot kernel.old
>>> I still get the same problem. Very confusing. :(.
>>>
>>> Joe
>>>
>>> on 21/05/2009 19:28 Kip Macy said the following:
>>>> I have no idea what is happening. I think our best bet is having
>>>> someone with insight into ATA provide us with help in adding
>>>> diagnostics.
>>>>
>>>> Sorry for the trouble. Perhaps you can just roll back to 7.2 for now.
>>>>
>>>> Cheers,
>>>> Kip
>>>>
>>>>
>>>> On Thu, May 21, 2009 at 10:50 AM, Joe Karthauser<joe@freebsd.org> wrote:
>>>>> Hmm, I've had a bit of a miserable afternoon trying to fight my
>>>>> RELENG_7 server, which now doesn't boot. :(.
>>>>>
>>>>> So, it's a RAIDZ2 pool with a ufs/gmirror root partition split over 5
>>>>> disks (gmirror on a 500MB partition on each of the five disks, and
>>>>> raidz2 over the rest of each drive).
>>>>>
>>>>> What I did was to update the userland, and then reboot. I didn't
>>>>> upgrade the kernel (but I've subsequently done that and have the
>>>>> same problem).
>>>>>
>>>>> What happens is that the kernel hangs booting just after displaying a
>>>>> LABEL message or ZFS pool/spool message. I _can_ get it to boot if I
>>>>> boot single user with acpi switched off. When I do that I can manually
>>>>> start zfs, and mount all the partitions. However, one of the disks is
>>>>> missing.... more on that next.
>>>>>
>>>>> The machine is running a Gigabyte motherboard (domestic gamer P35
>>>>> board, similar to this
>>>>>
>>>>> http://www.gigabyte.com.tw/Products/Motherboard/Products_Overview.aspx?ProductID=2533,
>>>>>
>>>>> although it might be a DS4 variant). I've got 5 of the 6 sata ports
>>>>> wired to a 5 unit SATA hot swap bay (5 drives vertically mounted into
>>>>> 3 5-1/4" bays kind of thing).
>>>>>
>>>>> Now, because of the gmirror I can boot the system on any disk, or
>>>>> combination of plugged in disks. I should be able to succeed with the
>>>>> kernel probe up to the attempt to mount the root filesystem
>>>>> irrespective of any zfs pool, etc. And, indeed, this has been working
>>>>> fine for about two years.
>>>>>
>>>>> But, now it hangs in the same place no matter what disk I boot on
>>>>> (I've tried every bay).
>>>>>
>>>>> But, without ACPI enabled it does appear to boot ok... what's going on
>>>>> here? Is it possible that the machine has developed a hardware fault?
>>>>>
>>>>> Ok, finally, if I boot with ACPI disabled then one of the disks is
>>>>> missing. If I unplug it I get a disconnect message from the ata
>>>>> device, and a reconnect and reinit attempt when I plug it back in, but
>>>>> no device appears on the bus. Usually I can do a 'atacontrol detach
>>>>> sata4; sleep 1; atacontrol attach sata4' and the device reappears.
>>>>> This happens on the other buses, but not on the last one. It's not the
>>>>> disk, because if I swap it into another bay, it comes up and appears
>>>>> on the bus. On the other hand it doesn't appear to be that controller
>>>>> or slot in the drive bay because if I unplug all the other disks the
>>>>> system will boot that disk and get as far as the hang.... hmm.
>>>>>
>>>>> Is this a consequence of disabling the ACPI?
>>>>>
>>>>> Does anyone have a clue what might be going on?
>>>>>
>>>>> Joe

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"



