From owner-freebsd-stable@FreeBSD.ORG Fri May 22 21:22:58 2009
From: "Larry Rosenman"
To: "'Joe Karthauser'", "'Alexander Motin'"
In-Reply-To: <4A170EDE.8000906@freebsd.org>
Date: Fri, 22 May 2009 16:03:38 -0500
Message-ID: <00d901c9db20$cd1baff0$67530fd0$@org>
Cc: freebsd-stable@freebsd.org, 'Kip Macy'
Subject: RE: Sudden wierd SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))
List-Id: Production branch of FreeBSD source code

I saw really strange stuff with one bad SATA cable on my 6 drive ZFS array. It
would work most of the time, but the scrub would either cough up CRC errors or
hang.

I wound up replacing the disk *AND* the cable, and it's been fine since.

This is on a SuperMicro chassis with Intel chips.

YMMV

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: ler@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893

-----Original Message-----
From: owner-freebsd-stable@freebsd.org [mailto:owner-freebsd-stable@freebsd.org] On Behalf Of Joe Karthauser
Sent: Friday, May 22, 2009 3:45 PM
To: Alexander Motin
Cc: freebsd-stable@freebsd.org; Kip Macy
Subject: Re: Sudden wierd SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))

This appears to have gone away now. I unplugged the bay that was causing the
trouble, and the system booted just fine on the remaining 4 drives.
Then I plugged the bay back in (live) and did an atacontrol detach/attach on
that bus (I wonder why I always have to do that). The drive was seen, and ZFS
resilvered itself.

I'm doing a ZFS scrub now to make sure that everything is good, and I'll do a
reboot and see if it's all ok after that.

Strange, so it looks like a cable might have got a little loose or something.
I wonder why that would have hung the kernel probe though.

Joe

on 22/05/2009 20:40 Joe Karthauser said the following:
> Hi Alexander,
>
> I'd love it if you were able to provide some insight into this problem.
>
> I'm going to try switching sata cables around next to see if the problem
> goes away if I disconnect some combination of bays.
>
> Thanks,
> Joe
>
> on 22/05/2009 19:39 Kip Macy said the following:
>> Motin is your best bet in tracking down ATA problems.
>>
>> Cheers,
>> Kip
>>
>>
>> On Fri, May 22, 2009 at 10:40 AM, Joe Karthauser wrote:
>>> Hi Kip,
>>>
>>> I seriously don't understand what has happened. If I boot kernel.old
>>> I still get the same problem. Very confusing. :(.
>>>
>>> Joe
>>>
>>> on 21/05/2009 19:28 Kip Macy said the following:
>>>> I have no idea what is happening. I think our best bet is having
>>>> someone with insight into ATA provide us with help in adding
>>>> diagnostics.
>>>>
>>>> Sorry for the trouble. Perhaps you can just roll back to 7.2 for now.
>>>>
>>>> Cheers,
>>>> Kip
>>>>
>>>>
>>>> On Thu, May 21, 2009 at 10:50 AM, Joe Karthauser wrote:
>>>>> Hmm, I've had a bit of a miserable afternoon trying to fight my
>>>>> RELENG_7 server, which now doesn't boot. :(.
>>>>>
>>>>> So, it's a ZRAID2 pool with a ufs/gmirror root partition split over 5
>>>>> disks (gmirror on 500MB partition on each of five disks, and zraid2
>>>>> over the rest of each drive).
>>>>>
>>>>> What I did was to update the userland, and then reboot. I didn't
>>>>> upgrade the kernel (but I've subsequently done that and have the
>>>>> same problem).
>>>>>
>>>>> What happens is that the kernel hangs booting just after displaying a
>>>>> LABEL message or ZFS pool/spool message. I _can_ get it to boot if I
>>>>> boot single user with acpi switched off. When I do that I can manually
>>>>> start zfs, and mount all the partitions. However, one of the disks is
>>>>> missing.... more on that next.
>>>>>
>>>>> The machine is running a gigabyte motherboard (domestic gamer P35
>>>>> board, similar to this
>>>>>
>>>>> http://www.gigabyte.com.tw/Products/Motherboard/Products_Overview.aspx?ProductID=2533,
>>>>>
>>>>> although it might be a DS4 variant). I've got 5 of the 6 sata ports
>>>>> wired to a 5 unit SATA hot swap bay (5 drives vertically mounted into
>>>>> 3 5-1/4" bays kind of thing).
>>>>>
>>>>> Now, because of the gmirror I can boot the system on any disk, or
>>>>> combination of plugged in disks. I should be able to succeed with the
>>>>> kernel probe up to the attempt to mount the root filesystem
>>>>> irrespective of any zfs pool, etc. And, indeed, this has been working
>>>>> fine for about two years.
>>>>>
>>>>> But, now it hangs in the same place no matter what disk I boot on
>>>>> (I've tried every bay).
>>>>>
>>>>> But, without ACPI enabled it does appear to boot ok... what's going on
>>>>> here? Is it possible that the machine has developed a hardware fault?
>>>>>
>>>>> Ok, finally, if I boot with ACPI disabled then one of the disks is
>>>>> missing. If I unplug it I get a disconnect message from the ata
>>>>> device, and a reconnect and reinit attempt when I plug it back in, but
>>>>> no device appears on the bus. Usually I can do an 'atacontrol detach
>>>>> sata4; sleep 1; atacontrol attach sata4' and the device reappears.
>>>>> This happens on the other buses, but not on the last one. It's not the
>>>>> disk, because if I swap it into another bay, it comes up and appears
>>>>> on the bus.
>>>>> On the other hand it doesn't appear to be that controller or slot in
>>>>> the drive bay because if I unplug all the other disks the system will
>>>>> boot that disk and get as far as the hang.... hmm.
>>>>>
>>>>> Is this a consequence of disabling the ACPI?
>>>>>
>>>>> Does anyone have a clue what might be going on?
>>>>>
>>>>> Joe
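
For anyone wanting to script the detach/attach workaround Joe mentions ('atacontrol detach sata4; sleep 1; atacontrol attach sata4'), here is a rough Python sketch. The channel name "sata4" is copied verbatim from his message and may not match what your system calls the channel (`atacontrol list` shows what is attached). The function names and the dry-run default are my own invention for illustration, not part of any FreeBSD tool.

```python
import subprocess
import time


def reattach_commands(channel):
    """Build the two atacontrol invocations quoted in the thread, in order.

    `channel` is passed through untouched; "sata4" is only the name used
    in the original message, so treat it as an assumption.
    """
    return [
        ["atacontrol", "detach", channel],
        ["atacontrol", "attach", channel],
    ]


def reattach(channel, pause=1.0, dry_run=True):
    """Detach a channel, wait `pause` seconds, then attach it again.

    With dry_run=True (the default) the commands are only printed, so the
    sketch can be tried on a machine that has no atacontrol at all.
    """
    detach, attach = reattach_commands(channel)
    for step, cmd in (("detach", detach), ("attach", attach)):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if atacontrol fails
            subprocess.run(cmd, check=True)
            if step == "detach":
                time.sleep(pause)  # mirrors the 'sleep 1' in the thread
    return [detach, attach]
```

Calling `reattach("sata4")` only prints the two commands; pass `dry_run=False` (as root, on a box that really has atacontrol) to run them. Keeping the command list separate from the runner makes it easy to check what would be executed before touching a live bus.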