Date: Mon, 13 Jul 2015 11:01:48 +0200
From: Yamagi Burmeister <lists@yamagi.org>
To: freebsd-scsi@freebsd.org
Subject: Re: Device timeouts(?) with LSI SAS3008 on mpr(4)
Message-ID: <20150713110148.1a27b973881b64ce2f9e98e0@yamagi.org>
In-Reply-To: <20150707132416.71b44c90f7f4cd6014a304b2@yamagi.org>
References: <20150707132416.71b44c90f7f4cd6014a304b2@yamagi.org>
Hello,

after some fiddling and testing I managed to track this down. TRIM is
the culprit:

- With vfs.zfs.trim.enabled set to 1, timeouts occur, regardless of
  cabling and of whether the SSDs sit behind a backplane or are
  connected directly. It doesn't matter whether Intel DC S3500 or S3700
  SSDs are connected, but on the other hand both share the same
  controller. I don't have enough onboard S-ATA ports to test the whole
  setup without the 9300-8i HBA, but a short test with 6 SSDs (maybe
  too short and without enough load) didn't show any timeouts.

- With vfs.zfs.trim.enabled set to 0 I haven't seen a single timeout
  for ~56 hours.

Regards,
Yamagi

On Tue, 7 Jul 2015 13:24:16 +0200
Yamagi Burmeister <lists@yamagi.org> wrote:

> Hello,
> I've got 3 new Supermicro servers based upon the X10DRi-LN4+ platform.
> Each server is equipped with 2 LSI SAS9300-8i-SQL SAS adapters. Each
> adapter serves 8 Intel DC S3700 SSDs. Operating system is 10.1-STABLE
> as of r283938 on 2 servers and r285196 on the last one.
>
> The controllers identify themselves as:
>
> ----
>
> mpr0: <Avago Technologies (LSI) SAS3008> port 0x6000-0x60ff mem
> 0xc7240000-0xc724ffff,0xc7200000-0xc723ffff irq 32 at device 0.0 on
> pci2 mpr0: IOCFacts : MsgVersion: 0x205
> HeaderVersion: 0x2300
> IOCNumber: 0
> IOCExceptions: 0x0
> MaxChainDepth: 128
> NumberOfPorts: 1
> RequestCredit: 10240
> ProductID: 0x2221
> IOCRequestFrameSize: 32
> MaxInitiators: 32
> MaxTargets: 1024
> MaxSasExpanders: 42
> MaxEnclosures: 43
> HighPriorityCredit: 128
> MaxReplyDescriptorPostQueueDepth: 65504
> ReplyFrameSize: 32
> MaxVolumes: 0
> MaxDevHandle: 1106
> MaxPersistentEntries: 128
> mpr0: Firmware: 08.00.00.00, Driver: 09.255.01.00-fbsd
> mpr0: IOCCapabilities:
> 7a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
>
> ----
>
> 08.00.00.00 is the latest available firmware.
>
>
> Since day one 'dmesg' is cluttered with CAM errors:
>
> ----
>
> mpr1: Sending reset from mprsas_send_abort for target ID 5
> (da11:mpr1:0:5:0): WRITE(10). CDB: 2a 00 4c 15 1f 88 00 00 08
> 00 length 4096 SMID 554 terminated ioc 804b scsi 0 state c xfer 0
> (da11:mpr1:0:5:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00
> 01 00 00 00 00 00 00 40 06 00 length 512 SMID 506 ter(da11:mpr1:0:5:0):
> READ(10). CDB: 28 00 4c 2b 95 c0 00 00 10 00 minated ioc 804b scsi 0
> state c xfer 0 (da11:mpr1:0:5:0): CAM status: Command timeout mpr1:
> (da11:Unfreezing devq for target ID 5 mpr1:0:5:0): Retrying command
> (da11:mpr1:0:5:0): READ(10). CDB: 28 00 4c 2b 95 c0 00 00 10 00
> (da11:mpr1:0:5:0): CAM status: SCSI Status Error (da11:mpr1:0:5:0):
> SCSI status: Check Condition (da11:mpr1:0:5:0): SCSI sense: UNIT
> ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da11:mpr1:0:5:0): Retrying command (per sense data) (da11:mpr1:0:5:0):
> READ(10). CDB: 28 00 4c 22 b5 b8 00 00 18 00 (da11:mpr1:0:5:0): CAM
> status: SCSI Status Error (da11:mpr1:0:5:0): SCSI status: Check
> Condition (da11:mpr1:0:5:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power
> on, reset, or bus device reset occurred) (da11:mpr1:0:5:0): Retrying
> command (per sense data) (noperiph:mpr1:0:4294967295:0): SMID 2
> Aborting command 0xfffffe0001601a30
>
> mpr1: Sending reset from mprsas_send_abort for target ID 2
> (da8:mpr1:0:2:0): WRITE(10). CDB: 2a 00 59 81 ae 18 00 00 30 00
> length 24576 SMID 898 terminated ioc 804b scsi 0 state c xfer 0
> (da8:mpr1:0:2:0): READ(10). CDB: 28 00 59 77 cc e0 00 00 18 00 length
> 12288 SMID 604 terminated ioc 804b scsi 0 state c xfer 0 mpr1:
> Unfreezing devq for target ID 2 (da8:mpr1:0:2:0): ATA COMMAND PASS
> THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00
> (da8:mpr1:0:2:0): CAM status: Command timeout (da8:mpr1:0:2:0):
> Retrying command (da8:mpr1:0:2:0): WRITE(10). CDB: 2a 00 59 81 ae 18 00
> 00 30 00 (da8:mpr1:0:2:0): CAM status: SCSI Status Error
> (da8:mpr1:0:2:0): SCSI status: Check Condition (da8:mpr1:0:2:0): SCSI
> sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset
> occurred) (da8:mpr1:0:2:0): Retrying command (per sense data)
> (da8:mpr1:0:2:0): READ(10). CDB: 28 00 59 41 3d 08 00 00 10 00
> (da8:mpr1:0:2:0): CAM status: SCSI Status Error (da8:mpr1:0:2:0): SCSI
> status: Check Condition (da8:mpr1:0:2:0): SCSI sense: UNIT ATTENTION
> asc:29,0 (Power on, reset, or bus device reset occurred)
> (da8:mpr1:0:2:0): Retrying command (per sense data)
> (noperiph:mpr1:0:4294967295:0): SMID 3 Aborting command
> 0xfffffe000160b660
>
> ----
>
> ZFS doesn't like this and sees read errors or even write errors. In
> extreme cases the device is marked as FAULTED:
>
> ----
>
>   pool: examplepool
>  state: DEGRADED
> status: One or more devices are faulted in response to persistent
> errors. Sufficient replicas exist for the pool to continue functioning
> in a degraded state.
> action: Replace the faulted device, or use 'zpool clear' to mark the
> device repaired.
>   scan: none requested
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         examplepool  DEGRADED     0     0     0
>           raidz1-0   ONLINE       0     0     0
>             da3p1    ONLINE       0     0     0
>             da4p1    ONLINE       0     0     0
>             da5p1    ONLINE       0     0     0
>         logs
>           da1p1      FAULTED      3     0     0  too many errors
>         cache
>           da1p2      FAULTED      3     0     0  too many errors
>         spares
>           da2p1      AVAIL
>
> errors: No known data errors
>
> ----
>
> The problems arise on all 3 machines and all SSDs nearly daily. So I
> highly suspect a software issue. Does anyone have an idea what's going
> on and what I can do to solve these problems? More information can be
> provided if necessary.
>
> Regards,
> Yamagi
>
> --
> Homepage: www.yamagi.org
> XMPP: yamagi@yamagi.org
> GnuPG/GPG: 0xEFBCCBCB
> _______________________________________________
> freebsd-scsi@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"

--
Homepage: www.yamagi.org
XMPP: yamagi@yamagi.org
GnuPG/GPG: 0xEFBCCBCB
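
A statement like "not a single timeout for ~56 hours" can be checked against the kernel log by counting the "CAM status: Command timeout" lines per device. The following is only a minimal sketch: the function name count_timeouts and the regular expression are illustrative assumptions, not part of any FreeBSD tool. It expects raw, unwrapped 'dmesg' text and will miss entries that the kernel interleaved with other messages, as happens in the excerpts quoted above.

```python
import re
from collections import Counter

# Assumed line shape, taken from the quoted log:
#   (da11:mpr1:0:5:0): CAM status: Command timeout
TIMEOUT_RE = re.compile(
    r"\((da\d+):mpr\d+:\d+:\d+:\d+\): CAM status: Command timeout"
)


def count_timeouts(log_text):
    """Return a Counter mapping a device name (e.g. 'da11') to the
    number of command timeouts found for it in log_text."""
    return Counter(TIMEOUT_RE.findall(log_text))


# Small demonstration on three lines in the format shown in the thread.
sample = (
    "(da11:mpr1:0:5:0): CAM status: Command timeout\n"
    "(da8:mpr1:0:2:0): CAM status: Command timeout\n"
    "(da8:mpr1:0:2:0): Retrying command\n"
)
for device, hits in sorted(count_timeouts(sample).items()):
    print(device, hits)
```

Feeding it the log once with vfs.zfs.trim.enabled set to 1 and once with it set to 0 would give comparable per-device numbers for both configurations.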