Date: Tue, 9 Jul 2013 11:20:45 -0400 From: Outback Dingo <outbackdingo@gmail.com> To: Jeremy Chadwick <jdc@koitsu.org> Cc: freebsd-stable@freebsd.org Subject: Re: Stable/9 from today mpssas_scsiio timeouts Message-ID: <CAKYr3zzTGKaW=AfEMY5vdRMKBsbsQsBVVHCufhtF-FV-pOyMCA@mail.gmail.com> In-Reply-To: <20130709144614.GA7538@icarus.home.lan> References: <CAKYr3zwHFiJzO-F3BJ1RrRWJkKWmqPJDvEmMigH0Fg8Epy7Znw@mail.gmail.com> <20130709123900.GA5828@icarus.home.lan> <CAKYr3zxCkk0Cqv%2B-K-G0RQFBiCaTNZdDUzPiyJEP-_LR1i2rQQ@mail.gmail.com> <CAKYr3zxLcS4pGgA=o_xVqDKFUzHGuhDcjYaCu42Kn3rkPvXfxA@mail.gmail.com> <20130709144614.GA7538@icarus.home.lan>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, Jul 9, 2013 at 10:46 AM, Jeremy Chadwick <jdc@koitsu.org> wrote: > On Tue, Jul 09, 2013 at 09:47:01AM -0400, Outback Dingo wrote: > > On Tue, Jul 9, 2013 at 9:44 AM, Outback Dingo <outbackdingo@gmail.com > >wrote: > > > On Tue, Jul 9, 2013 at 8:39 AM, Jeremy Chadwick <jdc@koitsu.org> > wrote: > > > > > >> On Tue, Jul 09, 2013 at 05:32:39AM -0400, Outback Dingo wrote: > > >> > as of stable today im seeing alot of new mps time outs > > >> > > > >> > 9.1-STABLE FreeBSD 9.1-STABLE #0 r253035M: Mon Jul 8 16:34:28 UTC > 2013 > > >> > root@:/usr/obj/nas/usr/src/sys/ > > >> > > > >> > mps1@pci0:130:0:0: class=0x010700 card=0x30201000 > chip=0x00721000 > > >> > rev=0x03 hdr=0x00 > > >> > vendor = 'LSI Logic / Symbios Logic' > > >> > device = 'SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]' > > >> > class = mass storage > > >> > subclass = SAS > > >> > > > >> > > > >> > mps0: mpssas_scsiio_timeout checking sc 0xffffff8002145000 cm > > >> > 0xffffff80021a6b78 > > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 > SMID > > >> 983 > > >> > command timeout cm 0xffffff80021a6b78 ccb 0xfffffe002bb5f800 > > >> > mps0: mpssas_alloc_tm freezing simq > > >> > mps0: timedout cm 0xffffff80021a6b78 allocated tm 0xffffff80021587b0 > > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 > SMID > > >> 983 > > >> > completed timedout cm 0xffffff80021a6b78 ccb 0xfffffe002bb5f800 > during > > >> > recovery ioc 8048 scsi 0 state c xfer 0 > > >> > (noperiph:mps0:0:40:0): SMID 6 abort TaskMID 983 status 0x4a code > 0x0 > > >> count > > >> > 1 > > >> > (noperiph:mps0:0:40:0): SMID 6 finished recovery after aborting > TaskMID > > >> 983 > > >> > mps0: mpssas_free_tm releasing simq > > >> > (probe40:mps0:0:40:0): INQUIRY. CDB: 12 00 00 00 24 00 > > >> > (probe40:mps0:0:40:0): CAM status: Command timeout > > >> > (probe40:mps0:0:40:0): Retrying command > > >> > mps1: mpssas_scsiio_timeout checking sc 0xffffff8002384000 cm > > >> > 0xffffff80023e5b78 > > >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 > SMID > > >> 983 > > >> > command timeout cm 0xffffff80023e5b78 ccb 0xfffffe002be14800 > > >> > mps1: mpssas_alloc_tm freezing simq > > >> > mps1: timedout cm 0xffffff80023e5b78 allocated tm 0xffffff80023977b0 > > >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 > SMID > > >> 983 > > >> > completed timedout cm 0xffffff80023e5b78 ccb 0xfffffe002be14800 > during > > >> > recovery ioc 8048 scsi 0 state c xfer 0 > > >> > (noperiph:mps1:0:37:0): SMID 6 abort TaskMID 983 status 0x4a code > 0x0 > > >> count > > >> > 1 > > >> > (noperiph:mps1:0:37:0): SMID 6 finished recovery after aborting > TaskMID > > >> 983 > > >> > mps1: mpssas_free_tm releasing simq > > >> > (probe292:mps1:0:37:0): INQUIRY. CDB: 12 00 00 00 24 00 > > >> > (probe292:mps1:0:37:0): CAM status: Command timeout > > >> > (probe292:mps1:0:37:0): Retrying command > > >> > > >> 1. What revision were you running before (i.e. what were you on prior > to > > >> the upgrade)? > > >> > > > > > > > > > Sorry I was on 252595 from July 3 > > And does rolling back to r252595 resolve the problem for you? > > Because the only commit I see between r253035 and r252595 that might > account for some kind of behavioural change, unless I missed one while > skimming the commit history, is the following: > > r252730 -- http://www.freshbsd.org/commit/freebsd/r252730 > > If at all possible, please try updating to r253037 or newer to see > if that has some effect/improvement. Why I mention that commit: > > r253037 -- http://www.freshbsd.org/commit/freebsd/r253037 > > Because the only mps(4) changes done in recent days are: > > http://svnweb.freebsd.org/base/stable/9/sys/dev/mps/mps_sas.c?view=log > > r253037 > r251899 > r251874 > i can say this its between July 4, and 253048, im rolling back to 252723 to validate a good known working state > > Else I'd say what you're experiencing is legitimate/unrelated to kernel > changes. I can only speculate. > > The messages to me indicate that some part of the kernel is submitting a > SCSI INQUIRY request to the underlying device(s) which results in a CAM > timeout, i.e. the disk attached to the controller did not respond > promptly (while the controller seemed to be alive/well). > > If these disks (which we do not know the type of -- no dmesg provided, > etc.) are SSDs then TRIM behaviour is possibly causing the drive to take > too long to perform its TRIM cleanup, or, the drives themselves are > doing some kind of garbage collection which is taking quite a long time. > > Steven et all may have a different (and almost certainly more accurate) > analysis. > > It would really help if you could provide "dmesg" from the machine, as > well as any details about your setup (if ZFS, "zpool status", etc.), in > addition to (if SSDs) "sysctl -a | grep -i trim". All this matters. > > -- > | Jeremy Chadwick jdc@koitsu.org | > | UNIX Systems Administrator http://jdc.koitsu.org/ | > | Making life hard for others since 1977. PGP 4BD6C0CB | > >
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAKYr3zzTGKaW=AfEMY5vdRMKBsbsQsBVVHCufhtF-FV-pOyMCA>