From: Scott Long
Date: Thu, 10 May 2007 17:14:01 -0600
To: David Wolfskill, stable@freebsd.org
Subject: Re: 6.2-R on Dell Poweredge 2950 with Dell PERC 5/i [mfi(4)]
Message-ID: <4643A739.3080601@samsco.org>
In-Reply-To: <20070510200211.GM64542@bunrab.catwhisker.org>

David Wolfskill wrote:
> From a quick look in the lists, I get the impression that the Dell PERC
> 5/i may be a bit problematic. Since I hadn't any plans on using that
> hardware, though, I've paid more attention to other things.
>

I'm not sure that impression is entirely accurate. The biggest problem
with MFI machines is online RAID management. The storage driver itself
matured very quickly and has been very reliable.

> Well, now a colleague is trying to run 6.2-R on one of these 2950s; dmesg
> says the controller is:
>
> mfi0: mem 0xd80f0000-0xd80fffff,0xfc4e0000-0xfc4fffff irq 78 at device 14.0 on pci2
> mfi0: 817 (224963336s/0x0020/0) - Shutdown command received from host
> mfi0: 818 (4278190080s/0x0020/0) - PCI 0x041028 0x0415 0x041028 0x041f03: Firmware initialization started (PCI ID 0015/1028/1f03/1028)
> mfi0: 819 (4278190080s/0x0020/0) - Type 18: Firmware version 1.00.02-0157
> mfi0: 820 (4278190096s/0x0008/0) - Battery Present
> mfi0: 821 (4278190124s/0x0004/0) - PD 08(e1/s255) event: Enclosure (SES) discovered on PD 08(e1/s255)
> mfi0: 822 (4278190124s/0x0002/0) - PD 08(e1/s255) event: Inserted: PD 08(e1/s255)
> mfi0: 823 (4278190124s/0x0002/0) - Type 29: Inserted: PD 08(e1/s255) Info: enclPd=08, scsiType=d, portMap=00, sasAddr=500180b04413ce00,0000000000000000
> mfi0: 824 (4278190124s/0x0002/0) - PD 00(e1/s0) event: Inserted: PD 00(e1/s0)
> mfi0: 825 (4278190124s/0x0002/0) - Type 29: Inserted: PD 00(e1/s0) Info: enclPd=08, scsiType=0, portMap=01, sasAddr=50010b900046038e,0000000000000000
> mfi0: 826 (4278190124s/0x0002/0) - PD 01(e1/s1) event: Inserted: PD 01(e1/s1)
> mfi0: 827 (4278190124s/0x0002/0) - Type 29: Inserted: PD 01(e1/s1) Info: enclPd=08, scsiType=0, portMap=02, sasAddr=50010b9000460376,0000000000000000
> mfi0: 828 (4278190124s/0x0002/0) - PD 02(e1/s2) event: Inserted: PD 02(e1/s2)
> mfi0: 829 (4278190124s/0x0002/0) - Type 29: Inserted: PD 02(e1/s2) Info: enclPd=08, scsiType=0, portMap=04, sasAddr=50010b900046035a,0000000000000000
> mfi0: 830 (4278190124s/0x0002/0) - PD 03(e1/s3) event: Inserted: PD 03(e1/s3)
> mfi0: 831 (4278190124s/0x0002/0) - Type 29: Inserted: PD 03(e1/s3) Info: enclPd=08, scsiType=0, portMap=08, sasAddr=50010b90004603be,0000000000000000
> mfi0: 832 (4278190124s/0x0002/0) - PD 04(e1/s4) event: Inserted: PD 04(e1/s4)
> mfi0: 833 (4278190124s/0x0002/0) - Type 29: Inserted: PD 04(e1/s4) Info: enclPd=08, scsiType=0, portMap=10, sasAddr=50010b900045f6d6,0000000000000000
> mfi0: 834 (4278190124s/0x0002/0) - PD 05(e1/s5) event: Inserted: PD 05(e1/s5)
> mfi0: 835 (4278190124s/0x0002/0) - Type 29: Inserted: PD 05(e1/s5) Info: enclPd=08, scsiType=0, portMap=20, sasAddr=50010b9000460246,0000000000000000
> mfi0: 836 (224964238s/0x0020/0) - Adapter ticks 224964238 elapsed 45s: Time established as 02/16/07 18:03:58; (45 seconds since power on)
>
> and the disks look like:
>
> mfid0: on mfi0
> mfid0: 418176MB (856424448 sectors) RAID volume '' is optimal
>

Looks A-OK to me.

>
> The intended production workload involves creation and deletion of
> a large number of files rather rapidly.
>
> I recalled that for the first year or two with Soft Updates, there
> were problems with that kind of workload: there was enough hysteresis
> in making freed blocks actually available for subsequent allocation
> that processes trying to write new blocks on such file systems would
> often fail with ENOSPC. Un-mounting and re-mounting the file system
> would clean things up, but that doesn't tend to be a viable approach
> for keeping a long-running application happy. :-}
>

Running "sysctl vfs.ffs.doasyncfree=0" might help. Running the syncer
more frequently might also help, but I don't recall the sysctl node for
that.
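Whether freed blocks really are slow to reappear can be checked directly. Below is a minimal sketch, assuming only POSIX statvfs(2) as exposed by Python's os.statvfs. It writes and fsyncs a probe file, deletes it, and times how long the file system's reported free space takes to grow by at least half the file's size. The helper names and the probe file name are hypothetical. On a Soft Updates file system, a noticeable delay would be consistent with the lag described above, and the probe could be rerun after changing vfs.ffs.doasyncfree for comparison.

```python
import os
import time


def free_bytes(directory):
    """Bytes available to unprivileged users on the file system holding directory."""
    st = os.statvfs(directory)
    return st.f_bavail * st.f_frsize


def measure_free_lag(directory, size=8 << 20, timeout=60.0, poll=0.25):
    """Return seconds until a deleted file's space reappears, or None on timeout.

    Hypothetical diagnostic helper: writes `size` bytes, deletes the file, and
    polls statvfs until free space grows by at least size // 2 (the margin
    tolerates block rounding and unrelated activity on the file system).
    """
    path = os.path.join(directory, "free-lag-probe.bin")  # hypothetical name
    with open(path, "wb") as f:
        f.write(os.urandom(size))  # random data, so compression cannot shrink it
        f.flush()
        os.fsync(f.fileno())  # force the blocks to actually be allocated
    before = free_bytes(directory)
    start = time.monotonic()
    os.unlink(path)
    while True:
        elapsed = time.monotonic() - start
        if free_bytes(directory) >= before + size // 2:
            return elapsed
        if elapsed >= timeout:
            return None
        time.sleep(poll)
```

For example, measure_free_lag("/data") (path hypothetical) returns the delay in seconds, or None if the space did not come back within a minute. A lenient threshold is used because other processes writing to the same file system can shift the free count.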
> I reminded my colleague of this, since she also reported that an
> un-mount/re-mount sequence caused a lot of free space to show up
> on the file system in question. She responded that she had been
> aware of this and had been turning off Soft Updates on the file
> systems for the application in question, but she had forgotten that
> Soft Updates was on by default when she set up this (test) system.
>
> She then turned off Soft Updates and started the test workload again.
> And instead of failing with ENOSPC after 3 days, it only took 2.

Very strange. Is there any chance that it was due to files that were
deleted but are still held open by applications?

>
> Hmmm... well; that wasn't exactly what I had expected.
>
> Any hints, here? The machine is running the i386 arch, with a pair of
> dual-core 2.33GHz Xeons.
>
> I have a recent dmesg.boot, but I'd rather keep list messages fairly
> short.
>
> We have a local private mirror of the FreeBSD CVS repository, so we have
> some flexibility in what we can do for testing, but the objective is to
> put the box in production -- and I'd rather not run CURRENT as part of a
> customer-visible production workload. :-} [My laptop is a different
> matter, of course....]
>

This sounds purely like a filesystem issue, not an MFI driver issue.

Scott
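Scott's question about deleted files still held open points at a common cause of "missing" free space. unlink(2) removes only the name, and the file's blocks stay allocated until the last open descriptor is closed. As a result, df(1) can report much less free space than du(1) would suggest. Below is a minimal sketch of that behaviour in Python. The function and file names are hypothetical.

```python
import os
import tempfile


def unlinked_but_open_demo(size=1 << 20):
    """Create a file, unlink it while open, and report what the descriptor still sees."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "held-open.bin")  # hypothetical file name
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        data = b"x" * size
        written = 0
        while written < size:  # os.write may write fewer bytes than asked
            written += os.write(fd, data[written:])
        os.unlink(path)  # removes the directory entry only
        st = os.fstat(fd)  # the inode is still alive behind the descriptor
        return {
            "listed": os.path.exists(path),  # the name is gone
            "nlink": st.st_nlink,  # no names left pointing at the inode
            "size": st.st_size,  # but its blocks are still allocated
            "last_byte": os.pread(fd, 1, size - 1),  # and the data is readable
        }
    finally:
        os.close(fd)  # only now can the file system release the blocks
        os.rmdir(directory)
```

On a typical Unix file system, the result should show the name gone and a link count of 0, while the size still reports the full megabyte. On FreeBSD, fstat(1) with -f restricts its listing to files open on the file system containing a given path. That is one way to look for long-running processes that are still holding such files.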