From: 韓家標 Bill Hacker
Date: Tue, 20 Nov 2007 13:11:36 +0000
To: freebsd-current@freebsd.org
Subject: Re: SATA controller testing update
Message-ID: <4742DD08.4060306@conducive.net>
In-Reply-To: <4742A68A.406@fusiongol.com>

Nathan Butcher wrote:
> Hi,
>
> Just posting to say that Soren's upcoming patch to fix the Promise SATA
> controller issue seems to work fine for me. I imported and ran 4 drives
> of ZFS on the controller - and no more checksum issues. Tried running
> bonnie++ and had no problems whatsoever. So far so good. I'll keep my
> ZFS pool on the card for a while in case anything pops up.
>
> What I have noticed though, is that occasionally my root mounted system
> drive (which is on my JMB363 controller running AHCI, as /dev/ad6)
> gets randomly dismounted with the latest BETA3. It has happened twice
> now.
> This issue hasn't happened on any other drive on my system (despite
> there being 11 other drives on other controllers).
>
> I have no idea how to reproduce this issue, and since it takes out my
> main system drive, I can't get any debugging info. All I see first is
> that the drive gets dismounted before the screen fills up and scrolls
> over with messages about missing nodes.
>

I may have a way to reproduce at least a vaguely similar fault that
we've just started looking at, but on ICH9, not Promise:

GigaByte GA G33-DS3R, Core-2 Quad, 2 GB DDR-800, 2 x Toshiba 160 GB 2.5"
SATA on ICH9 as GMIRROR RAID1 'split' gm0 taking in the entire device
(ad0 and ad2).

With gm0 in good shape, a cp of a Qemu .img file

 - from /dev/mirror/gm0s3d ufs /pub
 - to /dev/mirror/gm0s3e ufs /bak/backups

is rock-solid.

But an inadvertent 'mv' (which crosses a mount point, so it cannot be a
plain rename and mv has to copy and then remove) doesn't throw an error
message. Instead, it unaccountably causes GEOM to shed /dev/ad2
'instantly' from gm0.

Several hoops must be jumped through to get it back, as /dev/ad2
thereafter reports as 'not attached'. In addition to the usual GMIRROR
commands to clean house and set up for a rebuild, I've had to set
sysctl kern.geom.debugflags=16, then do a newfs of the whole ad2 device,
wiping out disklabel et al, then do what gmirror needs, re-insert, and
let it rebuild. Which it does just fine.

I'm about to set up a 'better instrumented' test box to get more
specifics, so nothing further here yet. I report it only because there
is SO little logging that my first impression is that the trigger
incident is below the GEOM layer.

This is with 7-BETA1 i386 of 20 October. Testing is to be started with
7-BETA3 of last night, but I've got to buy a couple of similar drives
first.
Both the initially reporting MB and the test board have ICH9 *and*
JMB363, so will try to reproduce on each controller with 7-BETA3 and 8-
before looking at patches.

More info as I get it.

Bill
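
For reference, the recovery sequence described above could be sketched roughly as below. This is a dry run that only prints the commands and executes nothing. Several parts are assumptions rather than the exact commands used: 'gmirror forget' is a guess at the "usual GMIRROR commands to clean house", the final reset of kern.geom.debugflags to 0 is not in the report, and the helper name recovery_steps is purely illustrative. The mirror and disk names default to gm0 and ad2 as in the report.

```shell
#!/bin/sh
# Dry run only: prints a possible recovery sequence, runs nothing.
# recovery_steps is a hypothetical helper, not part of FreeBSD.

recovery_steps() {
    mirror="${1:-gm0}"   # mirror name from the report
    disk="${2:-ad2}"     # component that was shed from the mirror
    # Steps, in order:
    #   1. inspect the mirror
    #   2. drop the disconnected component (assumed "clean house" step)
    #   3. allow writes to the in-use provider (as in the report)
    #   4. newfs the whole device, wiping the old label (as in the report)
    #   5. re-insert the disk so gmirror rebuilds it
    #   6. restore the default debugflags (assumption, not in the report)
    #   7. watch the rebuild
    cat <<EOF
gmirror status ${mirror}
gmirror forget ${mirror}
sysctl kern.geom.debugflags=16
newfs /dev/${disk}
gmirror insert ${mirror} /dev/${disk}
sysctl kern.geom.debugflags=0
gmirror status ${mirror}
EOF
}

recovery_steps "$@"
```

Run without arguments it prints the plan for gm0 and ad2. Since step 4 destroys whatever is on the named disk, the printed commands should be reviewed by hand before any of them are run for real.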