Date: Tue, 17 Apr 2012 20:41:47 -0700
From: John Hickey <jjh@deterlab.net>
To: "Desai, Kashyap" <Kashyap.Desai@lsi.com>
Cc: "freebsd-scsi@freebsd.org" <freebsd-scsi@freebsd.org>, "Mankani, Krishnaraddi" <Krishnaraddi.Mankani@lsi.com>, "Kenneth D. Merry" <ken@freebsd.org>, "Reddy, Sreekanth" <Sreekanth.Reddy@lsi.com>
Subject: Re: Write Timeouts with MPS
Message-ID: <47976F0C-7786-4B2D-B898-6CE5A9A8EE96@deterlab.net>
In-Reply-To: <54373403-939F-4FC5-9A2E-40B2304EB518@deterlab.net>
References: <20120410015210.GI9589@deterlab.net> <4F848B93.10402@brockmann-consult.de> <A82E913C-05F6-4770-A8BF-1193780ACE76@deterlab.net> <4F85180D.5060104@brockmann-consult.de> <20120411073532.GC13315@deterlab.net> <B2FD678A64EAAD45B089B123FDFC3ED72B96EF94F7@inbmail01.lsi.com> <54373403-939F-4FC5-9A2E-40B2304EB518@deterlab.net>
I have updated all the drives with the firmware provided by Seagate. Performance is up and I don't see any timeouts when doing a zpool scrub. I'm going to give the system more of a workout, but so far I think the drive firmware did the trick.

Seatools for windows is a pain. It will let you select a firmware file anywhere on your system, but silently fail if you don't put the firmware update in its program directory. It also seems to have a hard display limit of ~13 drives. Has anyone had success with using camcontrol fwdownload with Seagate .LOD firmware files?

John

On Apr 12, 2012, at 1:16 PM, John Hickey wrote:

> I have a firmware update in hand for the drives. I am going to update my drives and see if I can still reproduce this.
>
> John
>
> On Apr 12, 2012, at 5:26 AM, Desai, Kashyap wrote:
>
>> We never see this issue on our test machines.
>> Adding Sreekanth and he will plan to reproduce this issue locally to have further analysis on issue.
>>
>> Please help Sreekanth to reproduce it locally.
>>
>>
>> ~ Kashyap
>>
>>> -----Original Message-----
>>> From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd-scsi@freebsd.org] On Behalf Of John Hickey
>>> Sent: Wednesday, April 11, 2012 1:06 PM
>>> To: freebsd-scsi@freebsd.org
>>> Subject: Re: Write Timeouts with MPS
>>>
>>> I pretty much did this and filed a ticket with Seagate this afternoon.
>>> They told me the latest firmware is 0006 (I am at 0001) and wanted
>>> the serial numbers of the other drives in the array (probably to
>>> confirm firmware compatibility). I suspect I'll have the update in
>>> hand tomorrow and see how that works. Running FreeBSD didn't seem to
>>> be an issue to them aside from concern about reading the serial numbers
>>> without seatools.
>>> Only issue with that was that I initially gave them
>>> the whole inquiry serial string, but only the first 8 (X) characters of
>>> inquiry are the serial number:
>>>
>>> $ sudo camcontrol inquiry da3
>>> pass3: <SEAGATE ST2000NM0001 0001> Fixed Direct Access SCSI-6 device
>>> pass3: Serial Number XXXXXXXX0000YYYYYYYY
>>> pass3: 600.000MB/s transfers, Command Queueing Enabled
>>>
>>> John
>>>
>>> On Wed, Apr 11, 2012 at 07:35:09AM +0200, Peter Maloney wrote:
>>>> Well, when I emailed some Seagate people, they just told me to use
>>>> supported ones. So I suggest you email them about it, telling them it is
>>>> on the compatibility list, and asking for an explanation and fix (eg.
>>>> firmware bug fix). You could also say it is fairly common on seagate
>>>> (and Samsung) disks, and very uncommon with other brands.
>>>>
>>>> Peter
>>>>
>>>> On 11.04.2012 00:26, John Hickey wrote:
>>>>> I have 19 drives in my array, so changing them isn't that easy. ;-) They are Seagate Constellation ES 2TB SAS drives (SEAGATE ST2000NM0001 0001) and according to LSI documents my whole setup should be supported. The drives at least aren't being marked as failed. I believe a change was made a while back to make FreeBSD less sensitive to these sorts of timeouts. I have had a panic or two on the system, but haven't tracked down the exact cause yet.
>>>>>
>>>>> John
>>>>>
>>>>> On Apr 10, 2012, at 12:35 PM, Peter Maloney wrote:
>>>>>
>>>>>> I found this only happens with specific disks / disk firmware... but
>>>>>> nobody seems to listen to me about it. They all seem to blame the
>>>>>> driver. (I blame both, but changing disks is a simple fix.)
>>>>>>
>>>>>> And looking around, most reports are with various Seagates (including
>>>>>> one that can cause this type of error with smartctl -a with a SAS
>>>>>> Seagate, but cannot reproduce with the binary LSI driver) or Samsung
>>>>>> Spinpoints.
>>>>>> The only other disk I know of that does this is a Crucial
>>>>>> SSD with old firmware. One guy said he can do a camcontrol rescan to get
>>>>>> it back; I tried that and get either panics, hangs, or nothing.
>>>>>>
>>>>>> What HBA are you using? With my LSI 9211-8i HBAs, the new 3TB Seagate
>>>>>> greens don't seem to have this problem. I have no idea if different
>>>>>> disks behave differently with different controllers. I asked Seagate
>>>>>> about it and they reply with marketing nonsense about buying enterprise
>>>>>> disks instead, and say I should buy disks that are on the specific
>>>>>> compatibility list for the HBA.
>>>>>>
>>>>>> I found that with the few disks that I have that fail randomly (and
>>>>>> others), I can reproduce the issue (not exact same symptoms though) by
>>>>>> hot pulling the disk while writing something, putting it back, wait a
>>>>>> few seconds (<10; less than enough for the SCSI controller to rescan)
>>>>>> pull and replace again. The old 2TB seagate greens fail this test, but
>>>>>> the 3TB ones pass. All 2 and 3 TB Hitachis I tried pass this test, as
>>>>>> well as 3TB WD greens. (all enterprise disks I tried pass this test
>>>>>> except the Toshiba 2TB ones I tried)
>>>>>>
>>>>>> If I put a "failed" disk back in, it does not work. If I put it in a
>>>>>> different slot, same. But if I put any other disk in, it works fine. So
>>>>>> it is the disk, but it is also FreeBSD not being able to reset/rescan
>>>>>> it. But it is simple enough to blame both, and since you can't get rid
>>>>>> of the driver, get different disks (eg. swap them with some different
>>>>>> same sized ones in a different machine).
>>>>>>
>>>>>> Here is my forum thread about it, including disk product ids for ones I
>>>>>> tested, and a huge list of things that don't fix it.
>>>>>> http://forums.freebsd.org/showthread.php?t=28252
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>>
>>>>>> On 10.04.2012 03:52, John Hickey wrote:
>>>>>>> I've seen people having this problem before, but I don't think anyone
>>>>>>> has figured it out. I am running:
>>>>>>>
>>>>>>> FreeBSD zfs 10.0-CURRENT FreeBSD 10.0-CURRENT #5: Sat Apr 7 18:05:57 PDT 2012 root@zfs:/usr/obj/usr/src/sys/GENERIC amd64
>>>>>>>
>>>>>>> I have the latest LSI IT firmware 13 loaded:
>>>>>>>
>>>>>>> mps1: <LSI SAS2008> port 0xc000-0xc0ff mem 0xfe93c000-0xfe93ffff,0xfe940000-0xfe97ffff irq 16 at device 0.0 on pci5
>>>>>>> mps1: Firmware: 13.00.01.00, Driver: 13.00.00.00-fbsd
>>>>>>> mps1: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
>>>>>>>
>>>>>>> All disks are on a SuperMicro SAS II backplane:
>>>>>>>
>>>>>>> root@zfs:/usr/ports/sysutils/dmidecode# camcontrol devlist
>>>>>>> <SEAGATE ST3300657SS 0008>   at scbus0 target 0 lun 0 (da0,pass0)
>>>>>>> <SEAGATE ST3300657SS 0008>   at scbus0 target 1 lun 0 (da1,pass1)
>>>>>>> <SEAGATE ST2000NM0001 0001>  at scbus1 target 8 lun 0 (da2,pass2)
>>>>>>> .... x16 more of the same
>>>>>>> <SEAGATE ST2000NM0001 0001>  at scbus1 target 46 lun 0 (da20,pass20)
>>>>>>> <LSI CORP SAS2X36 0717>      at scbus1 target 47 lun 0 (ses0,pass21)
>>>>>>>
>>>>>>> Essentially when putting the ZFS filesystem under load, I am getting
>>>>>>> these sorts of errors:
>>>>>>>
>>>>>>> (da13:mps1:0:21:0): WRITE(10). CDB: 2a 0 19 29 32 f2 0 1 0 0 length 131072 SMID 213 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 3d fa ae 0 1 0 0 length 131072 SMID 386 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 a 24 ee 0 1 0 0 length 131072 SMID 542 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da14:mps1:0:22:0): WRITE(10). CDB: 2a 0 19 2a c6 b1 0 1 0 0 length 131072 SMID 214 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da16:mps1:0:25:0): WRITE(10). CDB: 2a 0 19 2b 83 aa 0 1 0 0 length 131072 SMID 879 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 40 d f9 0 1 0 0 length 131072 SMID 474 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da9:mps1:0:15:0): WRITE(10). CDB: 2a 0 18 c 3 31 0 1 0 0 length 131072 SMID 578 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da4:mps1:0:10:0): WRITE(10). CDB: 2a 0 19 41 6f ff 0 1 0 0 length 131072 SMID 703 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da12:mps1:0:19:0): WRITE(10). CDB: 2a 0 18 c e5 2e 0 1 0 0 length 131072 SMID 684 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da3:mps1:0:9:0): WRITE(10). CDB: 2a 0 19 41 b1 4b 0 1 0 0 length 131072 SMID 212 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da9:mps1:0:15:0): WRITE(10). CDB: 2a 0 18 d 1e 5c 0 1 0 0 length 131072 SMID 63 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 d 56 1c 0 1 0 0 length 131072 SMID 412 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da4:mps1:0:10:0): WRITE(10). CDB: 2a 0 19 42 2c f1 0 1 0 0 length 131072 SMID 1019 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 d 6d 22 0 1 0 0 length 131072 SMID 175 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 42 62 bc 0 1 0 0 length 131072 SMID 458 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da10:mps1:0:16:0): WRITE(10). CDB: 2a 0 18 f 4b d2 0 1 0 0 length 131072 SMID 986 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da3:mps1:0:9:0): WRITE(10). CDB: 2a 0 19 43 f4 50 0 1 0 0 length 131072 SMID 809 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da2:mps1:0:8:0): WRITE(10). CDB: 2a 0 19 45 4 18 0 1 0 0 length 131072 SMID 998 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da13:mps1:0:21:0): WRITE(10). CDB: 2a 0 19 30 e4 73 0 1 0 0 length 131072 SMID 489 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da12:mps1:0:19:0): WRITE(10). CDB: 2a 0 18 10 8d 19 0 1 0 0 length 131072 SMID 275 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da14:mps1:0:22:0): WRITE(10). CDB: 2a 0 19 32 e7 0 0 1 0 0 length 131072 SMID 666 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> (da8:mps1:0:14:0): WRITE(10). CDB: 2a 0 18 13 2b 68 0 1 0 0 length 131072 SMID 463 terminated ioc 804b scsi 0 state c xfer 0
>>>>>>> _______________________________________________
>>>>>>> freebsd-scsi@freebsd.org mailing list
>>>>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
>>>>>>> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"
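
For anyone comparing before/after firmware runs, here is a minimal Python sketch for summarising the WRITE(10) errors quoted in this thread. It is only an illustration: the regular expression is written against the unwrapped console form of the lines shown above (with mail quoting and line wraps removed) and may not match other mps(4) message variants, and the names LINE_RE, decode_write10 and tally are hypothetical. The CDB decoding follows the standard WRITE(10) layout (bytes 2-5 are the big-endian LBA, bytes 7-8 the transfer length in blocks).

```python
import re
from collections import Counter

# Pattern for one error line as quoted in this thread (assumed format), e.g.
# (da13:mps1:0:21:0): WRITE(10). CDB: 2a 0 19 29 32 f2 0 1 0 0 length 131072 SMID 213 terminated ioc 804b ...
LINE_RE = re.compile(
    r"\((?P<dev>da\d+):mps\d+:\d+:\d+:\d+\): WRITE\(10\)\. "
    r"CDB: (?P<cdb>(?:[0-9a-f]{1,2} ){10})length (?P<length>\d+) "
    r"SMID (?P<smid>\d+) terminated ioc (?P<ioc>[0-9a-f]+)"
)


def decode_write10(cdb_text):
    """Return (lba, blocks) from a WRITE(10) CDB printed as hex bytes."""
    cdb = [int(b, 16) for b in cdb_text.split()]
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(bytes(cdb[2:6]), "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks


def tally(log_text):
    """Count terminated WRITE(10) commands per da device."""
    counts = Counter()
    for m in LINE_RE.finditer(log_text):
        counts[m.group("dev")] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "(da13:mps1:0:21:0): WRITE(10). CDB: 2a 0 19 29 32 f2 0 1 0 0 "
        "length 131072 SMID 213 terminated ioc 804b scsi 0 state c xfer 0\n"
        "(da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 3d fa ae 0 1 0 0 "
        "length 131072 SMID 386 terminated ioc 804b scsi 0 state c xfer 0\n"
    )
    for m in LINE_RE.finditer(sample):
        lba, blocks = decode_write10(m.group("cdb"))
        print(m.group("dev"), hex(lba), blocks)
    print(tally(sample))
```

Every CDB in the log carries a transfer length of 0x0100 = 256 blocks, which at 512 bytes per block (an assumption, though consistent with the logged "length 131072") is exactly 128 KiB per write. Feeding a saved copy of the console log to tally() gives a per-disk count, which can help tell a single misbehaving drive from a problem spread across the whole array.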