Date: Tue, 8 Feb 2011 13:13:10 -0700
From: "Kenneth D. Merry"
To: Joachim Tingvold
Cc: freebsd-scsi@freebsd.org, Alexander Motin
Subject: Re: mps0-troubles
Message-ID: <20110208201310.GA97635@nargothrond.kdm.org>
References: <20110114001758.GA12793@nargothrond.kdm.org> <07392102-4584-4690-9188-5202728CC7CA@tingvold.com> <20110120155746.GA22515@nargothrond.kdm.org> <070C12D5-A54F-4A48-A151-EBA16EF32A13@tingvold.com> <20110203221056.GA25389@nargothrond.kdm.org> <20110204180011.GA38067@nargothrond.kdm.org>

On Tue, Feb 08, 2011 at 02:35:35 +0100, Joachim Tingvold wrote:
> On Fri, Feb 04, 2011, at 19:00:11PM GMT+01:00, Kenneth D. Merry wrote:
> >Perhaps it could depend on memory fragmentation somewhat.  Over time you
> >may see the low water mark go down a bit.
> >
> >The good news is that it doesn't look like we have a leak.
>

This particular error is interesting:

mps0: (0:40:0) terminated ioc 804b scsi 0 state c xfer 0
mps0: (0:40:0) terminated ioc 804b scsi 0 state c xfer 0

It means that the chip terminated the command for some reason.  I have been
talking to LSI about it.  I'm working on getting an analyzer trace when it
happens, so I can send that to LSI.

What kind of expander do you have in your system?  How many expanders do
you have?  How many drives do you have?

Can you send 'camcontrol devlist -v' output?

> [jocke@filserver ~]$ sysctl hw.mps.0
> hw.mps.0.debug_level: 0
> hw.mps.0.allow_multiple_tm_cmds: 0
> hw.mps.0.io_cmds_active: 1
> hw.mps.0.io_cmds_highwater: 959
> hw.mps.0.chain_free: 2048
> hw.mps.0.chain_free_lowwater: 1721
> hw.mps.0.chain_alloc_fail: 0
>
> This time I did a recursive copy of a folder with no large files at
> all (it contained only small documents), from 'storage' to 'storage'.
>
> However, it recovered, so the copy just continued where it left off --
> which is a change from previous crashes.

Yes, it looks like we're not running into the out of chain problem.

The timeouts could be due to all sorts of problems.  The IOC terminated
errors I'm still not sure about.  I need to get a trace and send that along
with a diagnostic ring buffer dump from the card to LSI to get some answers
about what is going on.

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG
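
The "not out of chain" reading of the sysctl output above can be checked mechanically. This is only a rough sketch in Python, not part of the driver or any FreeBSD tool. The helper names parse_sysctl and chain_frames_exhausted are made up for illustration. It also assumes that chain_alloc_fail counts failed chain frame allocations and that chain_free_lowwater is the lowest number of free chain frames seen so far.

```python
def parse_sysctl(text):
    """Parse 'name: value' lines (optionally '> '-quoted) into a dict of ints."""
    values = {}
    for line in text.splitlines():
        line = line.lstrip("> ").strip()   # drop mail quote markers
        if ":" not in line:
            continue                       # e.g. the shell prompt line
        name, _, raw = line.rpartition(":")
        try:
            values[name.strip()] = int(raw.strip())
        except ValueError:
            continue                       # skip non-numeric values
    return values


def chain_frames_exhausted(values, unit=0):
    """Assumed rule: exhausted if an allocation ever failed or the low-water mark hit 0."""
    prefix = f"hw.mps.{unit}."
    return (values[prefix + "chain_alloc_fail"] > 0
            or values[prefix + "chain_free_lowwater"] == 0)


# The output quoted in this message.
sample = """\
> [jocke@filserver ~]$ sysctl hw.mps.0
> hw.mps.0.debug_level: 0
> hw.mps.0.allow_multiple_tm_cmds: 0
> hw.mps.0.io_cmds_active: 1
> hw.mps.0.io_cmds_highwater: 959
> hw.mps.0.chain_free: 2048
> hw.mps.0.chain_free_lowwater: 1721
> hw.mps.0.chain_alloc_fail: 0
"""

values = parse_sysctl(sample)
print(chain_frames_exhausted(values))
```

With the numbers above, no allocation has failed and at least 1721 of 2048 chain frames stayed free, so the check reports no exhaustion. It says nothing about the timeouts or the IOC terminated errors.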