Date:      Tue, 10 Apr 2012 15:26:10 -0700
From:      John Hickey <jjh@deterlab.net>
To:        freebsd-scsi@freebsd.org
Subject:   Re: Write Timeouts with MPS
Message-ID:  <A82E913C-05F6-4770-A8BF-1193780ACE76@deterlab.net>
In-Reply-To: <4F848B93.10402@brockmann-consult.de>
References:  <20120410015210.GI9589@deterlab.net> <4F848B93.10402@brockmann-consult.de>

I have 19 drives in my array, so changing them isn't that easy. ;-)
They are Seagate Constellation ES 2TB SAS drives (SEAGATE ST2000NM0001
0001) and according to LSI documents my whole setup should be supported.
The drives at least aren't being marked as failed.  I believe a change
was made a while back to make FreeBSD less sensitive to these sorts of
timeouts.  I have had a panic or two on the system, but haven't tracked
down the exact cause yet.

John

On Apr 10, 2012, at 12:35 PM, Peter Maloney wrote:

> I found this only happens with specific disks / disk firmware... but
> nobody seems to listen to me about it. They all seem to blame the
> driver. (I blame both, but changing disks is a simple fix.)
>
> And looking around, most reports are with various Seagates (including
> one that can cause this type of error with smartctl -a with a SAS
> Seagate, but cannot reproduce with the binary LSI driver) or Samsung
> Spinpoints. The only other disk I know of that does this is a Crucial
> SSD with old firmware. One guy said he can do a camcontrol rescan to
> get it back; I tried that and get either panics, hangs, or nothing.
>
> What HBA are you using? With my LSI 9211-8i HBAs, the new 3TB Seagate
> greens don't seem to have this problem. I have no idea if different
> disks behave differently with different controllers. I asked Seagate
> about it and they reply with marketing nonsense about buying
> enterprise disks instead, and say I should buy disks that are on the
> specific compatibility list for the HBA.
>
> I found that with the few disks that I have that fail randomly (and
> others), I can reproduce the issue (not exact same symptoms though) by
> hot-pulling the disk while writing something, putting it back, waiting
> a few seconds (<10; not long enough for the SCSI controller to rescan),
> then pulling and replacing it again. The old 2TB Seagate greens fail
> this test, but the 3TB ones pass. All 2 and 3 TB Hitachis I tried pass
> this test, as well as 3TB WD greens. (All enterprise disks I tried pass
> this test except the Toshiba 2TB ones.)
>
> If I put a "failed" disk back in, it does not work. If I put it in a
> different slot, same. But if I put any other disk in, it works fine. So
> it is the disk, but it is also FreeBSD not being able to reset/rescan
> it. But it is simple enough to blame both, and since you can't get rid
> of the driver, get different disks (eg. swap them with some different
> same sized ones in a different machine).
>
> Here is my forum thread about it, including disk product ids for ones
> I tested, and a huge list of things that don't fix it.
> http://forums.freebsd.org/showthread.php?t=28252
>
> Peter
>
>
> On 10.04.2012 03:52, John Hickey wrote:
>> I've seen people having this problem before, but I don't think anyone
>> has figured it out.  I am running:
>>
>> FreeBSD zfs 10.0-CURRENT FreeBSD 10.0-CURRENT #5: Sat Apr  7 18:05:57 PDT 2012     root@zfs:/usr/obj/usr/src/sys/GENERIC  amd64
>>
>> I have the latest LSI IT firmware 13 loaded:
>>
>> mps1: <LSI SAS2008> port 0xc000-0xc0ff mem 0xfe93c000-0xfe93ffff,0xfe940000-0xfe97ffff irq 16 at device 0.0 on pci5
>> mps1: Firmware: 13.00.01.00, Driver: 13.00.00.00-fbsd
>> mps1: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
>>
>> All disks are on a SuperMicro SAS II backplane:
>>
>> root@zfs:/usr/ports/sysutils/dmidecode# camcontrol devlist
>> <SEAGATE ST3300657SS 0008>         at scbus0 target 0 lun 0 (da0,pass0)
>> <SEAGATE ST3300657SS 0008>         at scbus0 target 1 lun 0 (da1,pass1)
>> <SEAGATE ST2000NM0001 0001>        at scbus1 target 8 lun 0 (da2,pass2)
>> .... x16 more of the same
>> <SEAGATE ST2000NM0001 0001>        at scbus1 target 46 lun 0 (da20,pass20)
>> <LSI CORP SAS2X36 0717>            at scbus1 target 47 lun 0 (ses0,pass21)
>>
>> Essentially when putting the ZFS filesystem under load, I am getting
>> these sorts of errors:
>>
>> (da13:mps1:0:21:0): WRITE(10). CDB: 2a 0 19 29 32 f2 0 1 0 0 length 131072 SMID 213 terminated ioc 804b scsi 0 state c xfer 0
>> (da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 3d fa ae 0 1 0 0 length 131072 SMID 386 terminated ioc 804b scsi 0 state c xfer 0
>> (da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 a 24 ee 0 1 0 0 length 131072 SMID 542 terminated ioc 804b scsi 0 state c xfer 0
>> (da14:mps1:0:22:0): WRITE(10). CDB: 2a 0 19 2a c6 b1 0 1 0 0 length 131072 SMID 214 terminated ioc 804b scsi 0 state c xfer 0
>> (da16:mps1:0:25:0): WRITE(10). CDB: 2a 0 19 2b 83 aa 0 1 0 0 length 131072 SMID 879 terminated ioc 804b scsi 0 state c xfer 0
>> (da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 40 d f9 0 1 0 0 length 131072 SMID 474 terminated ioc 804b scsi 0 state c xfer 0
>> (da9:mps1:0:15:0): WRITE(10). CDB: 2a 0 18 c 3 31 0 1 0 0 length 131072 SMID 578 terminated ioc 804b scsi 0 state c xfer 0
>> (da4:mps1:0:10:0): WRITE(10). CDB: 2a 0 19 41 6f ff 0 1 0 0 length 131072 SMID 703 terminated ioc 804b scsi 0 state c xfer 0
>> (da12:mps1:0:19:0): WRITE(10). CDB: 2a 0 18 c e5 2e 0 1 0 0 length 131072 SMID 684 terminated ioc 804b scsi 0 state c xfer 0
>> (da3:mps1:0:9:0): WRITE(10). CDB: 2a 0 19 41 b1 4b 0 1 0 0 length 131072 SMID 212 terminated ioc 804b scsi 0 state c xfer 0
>> (da9:mps1:0:15:0): WRITE(10). CDB: 2a 0 18 d 1e 5c 0 1 0 0 length 131072 SMID 63 terminated ioc 804b scsi 0 state c xfer 0
>> (da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 d 56 1c 0 1 0 0 length 131072 SMID 412 terminated ioc 804b scsi 0 state c xfer 0
>> (da4:mps1:0:10:0): WRITE(10). CDB: 2a 0 19 42 2c f1 0 1 0 0 length 131072 SMID 1019 terminated ioc 804b scsi 0 state c xfer 0
>> (da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 d 6d 22 0 1 0 0 length 131072 SMID 175 terminated ioc 804b scsi 0 state c xfer 0
>> (da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 42 62 bc 0 1 0 0 length 131072 SMID 458 terminated ioc 804b scsi 0 state c xfer 0
>> (da10:mps1:0:16:0): WRITE(10). CDB: 2a 0 18 f 4b d2 0 1 0 0 length 131072 SMID 986 terminated ioc 804b scsi 0 state c xfer 0
>> (da3:mps1:0:9:0): WRITE(10). CDB: 2a 0 19 43 f4 50 0 1 0 0 length 131072 SMID 809 terminated ioc 804b scsi 0 state c xfer 0
>> (da2:mps1:0:8:0): WRITE(10). CDB: 2a 0 19 45 4 18 0 1 0 0 length 131072 SMID 998 terminated ioc 804b scsi 0 state c xfer 0
>> (da13:mps1:0:21:0): WRITE(10). CDB: 2a 0 19 30 e4 73 0 1 0 0 length 131072 SMID 489 terminated ioc 804b scsi 0 state c xfer 0
>> (da12:mps1:0:19:0): WRITE(10). CDB: 2a 0 18 10 8d 19 0 1 0 0 length 131072 SMID 275 terminated ioc 804b scsi 0 state c xfer 0
>> (da14:mps1:0:22:0): WRITE(10). CDB: 2a 0 19 32 e7 0 0 1 0 0 length 131072 SMID 666 terminated ioc 804b scsi 0 state c xfer 0
>> (da8:mps1:0:14:0): WRITE(10). CDB: 2a 0 18 13 2b 68 0 1 0 0 length 131072 SMID 463 terminated ioc 804b scsi 0 state c xfer 0
>> _______________________________________________
>> freebsd-scsi@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
>> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"
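
For anyone trying to read the mps error lines quoted above, here is a
rough Python sketch (not part of the original message) that pulls one
line apart. It decodes the WRITE(10) CDB into a starting LBA and a
block count, and splits the "ioc" value into a status code and a flag
bit. The function names decode_line and count_by_device are made up for
illustration. The 512-byte block size is an assumption, although it is
consistent with the "length 131072" field in the log. The three
IOCSTATUS constants are quoted from memory of the LSI MPI 2.0 header
and should be checked against mpi2.h before being relied on.

```python
import re
from collections import Counter

WRITE_10_OPCODE = 0x2A  # SCSI WRITE(10) operation code

# ASSUMPTION: these values are recalled from the LSI MPI 2.0 header
# (mpi2.h); verify them against the driver source before relying on them.
IOCSTATUS_FLAG_LOG_INFO_AVAILABLE = 0x8000
IOCSTATUS_MASK = 0x7FFF
IOCSTATUS_SCSI_IOC_TERMINATED = 0x004B

# The kernel prints CDB bytes in hex without leading zeros, so each
# token is one or two hex digits.
LINE_RE = re.compile(
    r"\((?P<dev>da\d+):mps\d+:\d+:(?P<target>\d+):\d+\): WRITE\(10\)\. "
    r"CDB: (?P<cdb>(?:[0-9a-f]{1,2} ){10})length (?P<length>\d+) "
    r"SMID (?P<smid>\d+) terminated ioc (?P<ioc>[0-9a-f]+)"
)


def decode_line(line, block_size=512):
    """Decode one mps WRITE(10) error line; return None if it does not match.

    block_size is an assumption (512-byte logical blocks).
    """
    m = LINE_RE.search(line)
    if m is None:
        return None
    cdb = bytes(int(tok, 16) for tok in m["cdb"].split())
    if cdb[0] != WRITE_10_OPCODE:
        return None
    ioc = int(m["ioc"], 16)
    # WRITE(10) layout: byte 0 opcode, byte 1 flags, bytes 2-5 LBA,
    # byte 6 group number, bytes 7-8 transfer length, byte 9 control.
    blocks = int.from_bytes(cdb[7:9], "big")
    return {
        "device": m["dev"],
        "target": int(m["target"]),
        "lba": int.from_bytes(cdb[2:6], "big"),
        "blocks": blocks,
        "cdb_bytes": blocks * block_size,
        "reported_bytes": int(m["length"]),
        "smid": int(m["smid"]),
        "ioc_status": ioc & IOCSTATUS_MASK,
        "log_info_available": bool(ioc & IOCSTATUS_FLAG_LOG_INFO_AVAILABLE),
    }


def count_by_device(lines):
    """Count matching error lines per da device."""
    decoded = (decode_line(line) for line in lines)
    return Counter(d["device"] for d in decoded if d is not None)


SAMPLE = (
    "(da13:mps1:0:21:0): WRITE(10). CDB: 2a 0 19 29 32 f2 0 1 0 0 length "
    "131072 SMID 213 terminated ioc 804b scsi 0 state c xfer 0"
)

if __name__ == "__main__":
    print(decode_line(SAMPLE))
```

For the first line in the log this gives a transfer of 256 blocks
(0x0100), which at 512 bytes per block is the 131072 bytes the driver
reports. If the constants above are right, "ioc 804b" would be the
"SCSI IOC terminated" status (0x004b) with the log-info bit set.
Feeding a whole dmesg capture to count_by_device shows whether the
errors are spread across all drives or concentrated on a few.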