Date: Tue, 9 Jul 2013 11:46:24 -0400 From: Outback Dingo <outbackdingo@gmail.com> To: Jeremy Chadwick <jdc@koitsu.org> Cc: freebsd-stable@freebsd.org Subject: Re: Stable/9 from today mpssas_scsiio timeouts Message-ID: <CAKYr3zyqTdsXU-RrDQsXK-rNvsjxjOFTMqdSDs07D548C7DX2w@mail.gmail.com> In-Reply-To: <20130709153058.GA8769@icarus.home.lan> References: <CAKYr3zwHFiJzO-F3BJ1RrRWJkKWmqPJDvEmMigH0Fg8Epy7Znw@mail.gmail.com> <20130709123900.GA5828@icarus.home.lan> <CAKYr3zxCkk0Cqv%2B-K-G0RQFBiCaTNZdDUzPiyJEP-_LR1i2rQQ@mail.gmail.com> <CAKYr3zxLcS4pGgA=o_xVqDKFUzHGuhDcjYaCu42Kn3rkPvXfxA@mail.gmail.com> <20130709144614.GA7538@icarus.home.lan> <CAKYr3zzTGKaW=AfEMY5vdRMKBsbsQsBVVHCufhtF-FV-pOyMCA@mail.gmail.com> <20130709153058.GA8769@icarus.home.lan>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, Jul 9, 2013 at 11:30 AM, Jeremy Chadwick <jdc@koitsu.org> wrote: > On Tue, Jul 09, 2013 at 11:20:45AM -0400, Outback Dingo wrote: > > On Tue, Jul 9, 2013 at 10:46 AM, Jeremy Chadwick <jdc@koitsu.org> wrote: > > > > > On Tue, Jul 09, 2013 at 09:47:01AM -0400, Outback Dingo wrote: > > > > On Tue, Jul 9, 2013 at 9:44 AM, Outback Dingo < > outbackdingo@gmail.com > > > >wrote: > > > > > On Tue, Jul 9, 2013 at 8:39 AM, Jeremy Chadwick <jdc@koitsu.org> > > > wrote: > > > > > > > > > >> On Tue, Jul 09, 2013 at 05:32:39AM -0400, Outback Dingo wrote: > > > > >> > as of stable today im seeing alot of new mps time outs > > > > >> > > > > > >> > 9.1-STABLE FreeBSD 9.1-STABLE #0 r253035M: Mon Jul 8 16:34:28 > UTC > > > 2013 > > > > >> > root@:/usr/obj/nas/usr/src/sys/ > > > > >> > > > > > >> > mps1@pci0:130:0:0: class=0x010700 card=0x30201000 > > > chip=0x00721000 > > > > >> > rev=0x03 hdr=0x00 > > > > >> > vendor = 'LSI Logic / Symbios Logic' > > > > >> > device = 'SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]' > > > > >> > class = mass storage > > > > >> > subclass = SAS > > > > >> > > > > > >> > > > > > >> > mps0: mpssas_scsiio_timeout checking sc 0xffffff8002145000 cm > > > > >> > 0xffffff80021a6b78 > > > > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 > > > SMID > > > > >> 983 > > > > >> > command timeout cm 0xffffff80021a6b78 ccb 0xfffffe002bb5f800 > > > > >> > mps0: mpssas_alloc_tm freezing simq > > > > >> > mps0: timedout cm 0xffffff80021a6b78 allocated tm > 0xffffff80021587b0 > > > > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 > > > SMID > > > > >> 983 > > > > >> > completed timedout cm 0xffffff80021a6b78 ccb 0xfffffe002bb5f800 > > > during > > > > >> > recovery ioc 8048 scsi 0 state c xfer 0 > > > > >> > (noperiph:mps0:0:40:0): SMID 6 abort TaskMID 983 status 0x4a > code > > > 0x0 > > > > >> count > > > > >> > 1 > > > > >> > (noperiph:mps0:0:40:0): SMID 6 finished recovery after aborting > > > TaskMID > > > > >> 983 > > > > >> > mps0: mpssas_free_tm releasing simq > > > > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 > > > > >> > (probe40:mps0:0:40:0): CAM status: Command timeout > > > > >> > (probe40:mps0:0:40:0): Retrying command > > > > >> > mps1: mpssas_scsiio_timeout checking sc 0xffffff8002384000 cm > > > > >> > 0xffffff80023e5b78 > > > > >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 length > 36 > > > SMID > > > > >> 983 > > > > >> > command timeout cm 0xffffff80023e5b78 ccb 0xfffffe002be14800 > > > > >> > mps1: mpssas_alloc_tm freezing simq > > > > >> > mps1: timedout cm 0xffffff80023e5b78 allocated tm > 0xffffff80023977b0 > > > > >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 length > 36 > > > SMID > > > > >> 983 > > > > >> > completed timedout cm 0xffffff80023e5b78 ccb 0xfffffe002be14800 > > > during > > > > >> > recovery ioc 8048 scsi 0 state c xfer 0 > > > > >> > (noperiph:mps1:0:37:0): SMID 6 abort TaskMID 983 status 0x4a > code > > > 0x0 > > > > >> count > > > > >> > 1 > > > > >> > (noperiph:mps1:0:37:0): SMID 6 finished recovery after aborting > > > TaskMID > > > > >> 983 > > > > >> > mps1: mpssas_free_tm releasing simq > > > > >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 > > > > >> > (probe292:mps1:0:37:0): CAM status: Command timeout > > > > >> > (probe292:mps1:0:37:0): Retrying command > > > > >> > > > > >> 1. What revision were you running before (i.e. what were you on > prior > > > to > > > > >> the upgrade)? > > > > >> > > > > > > > > > > > > > > > Sorry I was on 252595 from July 3 > > > > > > And does rolling back to r252595 resolve the problem for you? > > > > > > Because the only commit I see between r253035 and r252595 that might > > > account for some kind of behavioural change, unless I missed one while > > > skimming the commit history, is the following: > > > > > > r252730 -- http://www.freshbsd.org/commit/freebsd/r252730 > > > > > > If at all possible, please try updating to r253037 or newer to see > > > if that has some effect/improvement. Why I mention that commit: > > > > > > r253037 -- http://www.freshbsd.org/commit/freebsd/r253037 > > > > > > Because the only mps(4) changes done in recent days are: > > > > > > http://svnweb.freebsd.org/base/stable/9/sys/dev/mps/mps_sas.c?view=log > > > > > > r253037 > > > r251899 > > > r251874 > > > > > > > i can say this its between July 4, and 253048, im rolling back to 252723 > to > > validate a good known working state > > Looking at your dmesg, it looks like the "errors" might be for SAS ports > which don't have any actual devices (disks) attached to them, yet parts > of the kernel (not sure which layer) are still trying to submit INQUIRY > commands to those ports as if they did have disks attached. > > It looks like you see this behaviour on boot up, and then later during > normal operation at some point (a LUN scan or rescan or "bus taste" > might cause this to happen; for example I know that "zpool import" in > effect can sometimes cause this behaviour -- on one of my systems "zpool > import" would cause the servers' floppy drive to spin up/chunk briefly). > > I'm hoping Steven or mav@ might be able to confirm/deny my theory here. > I see it even trying to write to the pool via NFS or FTP, which even times out on large files now, it was all working, and there are 2 controllers setup in an HA configuration, but they did work fine before, so ill roll back and try an earlier kernel then walk forward till i hit the problem. my only issue was i moved forward to get the newer ixgbe driver and others just commited to stable then to find that SAS was now quirky, welcome to stable. Either way the overall performance on this box has been in question, just havent been able to confirm its the enclosure, the nic card, or the zpool which is degraded, but 40MB/s via NFS on a 10GBe nic isnt good. so tweaking and testing seems to be mute until the box is at least stable again. I do appreciate the insight, and will do whatevers needed to hammer down the issue so it can be resolved. > > -- > | Jeremy Chadwick jdc@koitsu.org | > | UNIX Systems Administrator http://jdc.koitsu.org/ | > | Making life hard for others since 1977. PGP 4BD6C0CB | > >
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAKYr3zyqTdsXU-RrDQsXK-rNvsjxjOFTMqdSDs07D548C7DX2w>