Date: Tue, 9 Jul 2013 08:56:41 -0700 From: Jeremy Chadwick <jdc@koitsu.org> To: Outback Dingo <outbackdingo@gmail.com> Cc: freebsd-stable@freebsd.org Subject: Re: Stable/9 from today mpssas_scsiio timeouts Message-ID: <20130709155641.GA9350@icarus.home.lan> In-Reply-To: <CAKYr3zyqTdsXU-RrDQsXK-rNvsjxjOFTMqdSDs07D548C7DX2w@mail.gmail.com> References: <CAKYr3zwHFiJzO-F3BJ1RrRWJkKWmqPJDvEmMigH0Fg8Epy7Znw@mail.gmail.com> <20130709123900.GA5828@icarus.home.lan> <CAKYr3zxCkk0Cqv%2B-K-G0RQFBiCaTNZdDUzPiyJEP-_LR1i2rQQ@mail.gmail.com> <CAKYr3zxLcS4pGgA=o_xVqDKFUzHGuhDcjYaCu42Kn3rkPvXfxA@mail.gmail.com> <20130709144614.GA7538@icarus.home.lan> <CAKYr3zzTGKaW=AfEMY5vdRMKBsbsQsBVVHCufhtF-FV-pOyMCA@mail.gmail.com> <20130709153058.GA8769@icarus.home.lan> <CAKYr3zyqTdsXU-RrDQsXK-rNvsjxjOFTMqdSDs07D548C7DX2w@mail.gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, Jul 09, 2013 at 11:46:24AM -0400, Outback Dingo wrote: > On Tue, Jul 9, 2013 at 11:30 AM, Jeremy Chadwick <jdc@koitsu.org> wrote: > > > On Tue, Jul 09, 2013 at 11:20:45AM -0400, Outback Dingo wrote: > > > On Tue, Jul 9, 2013 at 10:46 AM, Jeremy Chadwick <jdc@koitsu.org> wrote: > > > > > > > On Tue, Jul 09, 2013 at 09:47:01AM -0400, Outback Dingo wrote: > > > > > On Tue, Jul 9, 2013 at 9:44 AM, Outback Dingo < > > outbackdingo@gmail.com > > > > >wrote: > > > > > > On Tue, Jul 9, 2013 at 8:39 AM, Jeremy Chadwick <jdc@koitsu.org> > > > > wrote: > > > > > > > > > > > >> On Tue, Jul 09, 2013 at 05:32:39AM -0400, Outback Dingo wrote: > > > > > >> > as of stable today im seeing alot of new mps time outs > > > > > >> > > > > > > >> > 9.1-STABLE FreeBSD 9.1-STABLE #0 r253035M: Mon Jul 8 16:34:28 > > UTC > > > > 2013 > > > > > >> > root@:/usr/obj/nas/usr/src/sys/ > > > > > >> > > > > > > >> > mps1@pci0:130:0:0: class=0x010700 card=0x30201000 > > > > chip=0x00721000 > > > > > >> > rev=0x03 hdr=0x00 > > > > > >> > vendor = 'LSI Logic / Symbios Logic' > > > > > >> > device = 'SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]' > > > > > >> > class = mass storage > > > > > >> > subclass = SAS > > > > > >> > > > > > > >> > > > > > > >> > mps0: mpssas_scsiio_timeout checking sc 0xffffff8002145000 cm > > > > > >> > 0xffffff80021a6b78 > > > > > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 > > > > SMID > > > > > >> 983 > > > > > >> > command timeout cm 0xffffff80021a6b78 ccb 0xfffffe002bb5f800 > > > > > >> > mps0: mpssas_alloc_tm freezing simq > > > > > >> > mps0: timedout cm 0xffffff80021a6b78 allocated tm > > 0xffffff80021587b0 > > > > > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 > > > > SMID > > > > > >> 983 > > > > > >> > completed timedout cm 0xffffff80021a6b78 ccb 0xfffffe002bb5f800 > > > > during > > > > > >> > recovery ioc 8048 scsi 0 state c xfer 0 > > > > > >> > (noperiph:mps0:0:40:0): SMID 6 abort TaskMID 983 status 0x4a > > code > > > > 0x0 > > > > > >> count > > > > > >> > 1 > > > > > >> > (noperiph:mps0:0:40:0): SMID 6 finished recovery after aborting > > > > TaskMID > > > > > >> 983 > > > > > >> > mps0: mpssas_free_tm releasing simq > > > > > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 > > > > > >> > (probe40:mps0:0:40:0): CAM status: Command timeout > > > > > >> > (probe40:mps0:0:40:0): Retrying command > > > > > >> > mps1: mpssas_scsiio_timeout checking sc 0xffffff8002384000 cm > > > > > >> > 0xffffff80023e5b78 > > > > > >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 length > > 36 > > > > SMID > > > > > >> 983 > > > > > >> > command timeout cm 0xffffff80023e5b78 ccb 0xfffffe002be14800 > > > > > >> > mps1: mpssas_alloc_tm freezing simq > > > > > >> > mps1: timedout cm 0xffffff80023e5b78 allocated tm > > 0xffffff80023977b0 > > > > > >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 length > > 36 > > > > SMID > > > > > >> 983 > > > > > >> > completed timedout cm 0xffffff80023e5b78 ccb 0xfffffe002be14800 > > > > during > > > > > >> > recovery ioc 8048 scsi 0 state c xfer 0 > > > > > >> > (noperiph:mps1:0:37:0): SMID 6 abort TaskMID 983 status 0x4a > > code > > > > 0x0 > > > > > >> count > > > > > >> > 1 > > > > > >> > (noperiph:mps1:0:37:0): SMID 6 finished recovery after aborting > > > > TaskMID > > > > > >> 983 > > > > > >> > mps1: mpssas_free_tm releasing simq > > > > > >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 > > > > > >> > (probe292:mps1:0:37:0): CAM status: Command timeout > > > > > >> > (probe292:mps1:0:37:0): Retrying command > > > > > >> > > > > > >> 1. What revision were you running before (i.e. what were you on > > prior > > > > to > > > > > >> the upgrade)? > > > > > >> > > > > > > > > > > > > > > > > > > Sorry I was on 252595 from July 3 > > > > > > > > And does rolling back to r252595 resolve the problem for you? > > > > > > > > Because the only commit I see between r253035 and r252595 that might > > > > account for some kind of behavioural change, unless I missed one while > > > > skimming the commit history, is the following: > > > > > > > > r252730 -- http://www.freshbsd.org/commit/freebsd/r252730 > > > > > > > > If at all possible, please try updating to r253037 or newer to see > > > > if that has some effect/improvement. Why I mention that commit: > > > > > > > > r253037 -- http://www.freshbsd.org/commit/freebsd/r253037 > > > > > > > > Because the only mps(4) changes done in recent days are: > > > > > > > > http://svnweb.freebsd.org/base/stable/9/sys/dev/mps/mps_sas.c?view=log > > > > > > > > r253037 > > > > r251899 > > > > r251874 > > > > > > > > > > i can say this its between July 4, and 253048, im rolling back to 252723 > > to > > > validate a good known working state > > > > Looking at your dmesg, it looks like the "errors" might be for SAS ports > > which don't have any actual devices (disks) attached to them, yet parts > > of the kernel (not sure which layer) are still trying to submit INQUIRY > > commands to those ports as if they did have disks attached. > > > > It looks like you see this behaviour on boot up, and then later during > > normal operation at some point (a LUN scan or rescan or "bus taste" > > might cause this to happen; for example I know that "zpool import" in > > effect can sometimes cause this behaviour -- on one of my systems "zpool > > import" would cause the servers' floppy drive to spin up/chunk briefly). > > > > I'm hoping Steven or mav@ might be able to confirm/deny my theory here. > > > > I see it even trying to write to the pool via NFS or FTP, which even times > out on large files > now, it was all working, and there are 2 controllers setup in an HA > configuration, but they did > work fine before, so ill roll back and try an earlier kernel then walk > forward till i hit the problem. > my only issue was i moved forward to get the newer ixgbe driver and others > just commited to stable > then to find that SAS was now quirky, welcome to stable. Either way the > overall performance > on this box has been in question, just havent been able to confirm its the > enclosure, the nic card, > or the zpool which is degraded, but 40MB/s via NFS on a 10GBe nic isnt > good. so tweaking and > testing seems to be mute until the box is at least stable again. I do > appreciate the insight, and will > do whatevers needed to hammer down the issue so it can be resolved. Again, I would strongly suggest trying r253037 or newer first. -- | Jeremy Chadwick jdc@koitsu.org | | UNIX Systems Administrator http://jdc.koitsu.org/ | | Making life hard for others since 1977. PGP 4BD6C0CB |
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20130709155641.GA9350>