From owner-freebsd-scsi@FreeBSD.ORG Thu Apr 12 12:30:07 2012
From: "Desai, Kashyap"
To: John Hickey, "freebsd-scsi@freebsd.org"
Cc: "Reddy, Sreekanth", "Mankani, Krishnaraddi", "Kenneth D. Merry"
Date: Thu, 12 Apr 2012 17:56:31 +0530
Subject: RE: Write Timeouts with MPS
References: <20120410015210.GI9589@deterlab.net> <4F848B93.10402@brockmann-consult.de> <4F85180D.5060104@brockmann-consult.de> <20120411073532.GC13315@deterlab.net>
In-Reply-To: <20120411073532.GC13315@deterlab.net>

We have never seen this issue on our test machines. I am adding Sreekanth,
who plans to reproduce the issue locally for further analysis. Please help
Sreekanth reproduce it locally.

~ Kashyap

> -----Original Message-----
> From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd-scsi@freebsd.org] On Behalf Of John Hickey
> Sent: Wednesday, April 11, 2012 1:06 PM
> To: freebsd-scsi@freebsd.org
> Subject: Re: Write Timeouts with MPS
>
> I pretty much did this and filed a ticket with Seagate this afternoon.
> They told me the latest firmware is 0006 (I am at 0001) and wanted
> the serial numbers of the other drives in the array (probably to
> confirm firmware compatibility). I suspect I'll have the update in
> hand tomorrow and will see how that works. Running FreeBSD didn't seem
> to be an issue to them aside from concern about reading the serial
> numbers without SeaTools. The only issue with that was that I initially
> gave them the whole inquiry serial string, but only the first 8 (X)
> characters of it are the serial number:
>
> $ sudo camcontrol inquiry da3
> pass3: Fixed Direct Access SCSI-6 device
> pass3: Serial Number XXXXXXXX0000YYYYYYYY
> pass3: 600.000MB/s transfers, Command Queueing Enabled
>
> John
>
> On Wed, Apr 11, 2012 at 07:35:09AM +0200, Peter Maloney wrote:
> > Well, when I emailed some Seagate people, they just told me to use
> > supported disks. So I suggest you email them about it, telling them
> > the drive is on the compatibility list, and asking for an explanation
> > and a fix (e.g. a firmware bug fix). You could also say it is fairly
> > common on Seagate (and Samsung) disks, and very uncommon with other
> > brands.
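John's point that only the first 8 characters of the inquiry serial string are the drive serial is easy to get wrong by hand across 19 drives. Below is a minimal Python sketch, not a tested tool: it assumes the camcontrol inquiry output format quoted above (a "passN: Serial Number ..." line) and assumes, as reported in this thread for the ST2000NM0001 only, that the serial Seagate wants is the first 8 characters. Other models may differ. The helper names seagate_serial and inquiry are hypothetical.

```python
import re
import subprocess

# Matches the "Serial Number" line printed by `camcontrol inquiry`,
# e.g. "pass3: Serial Number XXXXXXXX0000YYYYYYYY".
SERIAL_RE = re.compile(r"^\S+: Serial Number (\S+)\s*$", re.MULTILINE)


def seagate_serial(inquiry_output: str, length: int = 8) -> str:
    """Return the drive serial from camcontrol inquiry output.

    Assumption (from this thread, ST2000NM0001 only): the serial Seagate
    asks for is the first `length` characters of the inquiry serial string.
    """
    match = SERIAL_RE.search(inquiry_output)
    if match is None:
        raise ValueError("no 'Serial Number' line found")
    return match.group(1)[:length]


def inquiry(device: str) -> str:
    """Run `camcontrol inquiry <device>`; needs FreeBSD and root privileges."""
    result = subprocess.run(["camcontrol", "inquiry", device],
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    sample = ("pass3: Fixed Direct Access SCSI-6 device\n"
              "pass3: Serial Number XXXXXXXX0000YYYYYYYY\n"
              "pass3: 600.000MB/s transfers, Command Queueing Enabled\n")
    print(seagate_serial(sample))
```

On a FreeBSD host one could loop over the daN devices listed by camcontrol devlist and call seagate_serial(inquiry(dev)) for each, running as root as the thread does with sudo.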
> >
> > Peter
> >
> > On 11.04.2012 00:26, John Hickey wrote:
> > > I have 19 drives in my array, so changing them isn't that easy. ;-)
> > > They are Seagate Constellation ES 2TB SAS drives
> > > (SEAGATE ST2000NM0001 0001), and according to LSI documents my whole
> > > setup should be supported. At least the drives aren't being marked as
> > > failed. I believe a change was made a while back to make FreeBSD less
> > > sensitive to these sorts of timeouts. I have had a panic or two on the
> > > system, but haven't tracked down the exact cause yet.
> > >
> > > John
> > >
> > > On Apr 10, 2012, at 12:35 PM, Peter Maloney wrote:
> > >
> > >> I found this only happens with specific disks / disk firmware...
> > >> but nobody seems to listen to me about it. They all seem to blame
> > >> the driver. (I blame both, but changing disks is a simple fix.)
> > >>
> > >> And looking around, most reports are with various Seagates
> > >> (including a SAS Seagate where smartctl -a can trigger this type of
> > >> error, which cannot be reproduced with the binary LSI driver) or
> > >> Samsung Spinpoints. The only other disk I know of that does this is
> > >> a Crucial SSD with old firmware. One guy said he can do a camcontrol
> > >> rescan to get it back; I tried that and got either panics, hangs, or
> > >> nothing.
> > >>
> > >> What HBA are you using? With my LSI 9211-8i HBAs, the new 3TB Seagate
> > >> greens don't seem to have this problem. I have no idea if different
> > >> disks behave differently with different controllers. I asked Seagate
> > >> about it, and they replied with marketing nonsense about buying
> > >> enterprise disks instead, and said I should buy disks that are on the
> > >> specific compatibility list for the HBA.
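With 19 drives, checking which ones still report firmware 0001 rather than Seagate's current 0006 is a mechanical task. A minimal Python sketch, assuming identification strings in the vendor/product/revision form John quotes (SEAGATE ST2000NM0001 0001), with the vendor as the first token and the revision as the last, and assuming revisions for this model compare correctly as zero-padded decimal numbers (not true of every vendor's revision strings). The names parse_inquiry_id and needs_update are hypothetical.

```python
from typing import NamedTuple


class InquiryId(NamedTuple):
    vendor: str
    product: str
    revision: str


def parse_inquiry_id(ident: str) -> InquiryId:
    """Split 'VENDOR PRODUCT REV', e.g. 'SEAGATE ST2000NM0001 0001'.

    Assumes the vendor is the first token and the revision the last;
    anything in between is treated as the product name. Surrounding
    angle brackets, as printed by camcontrol, are stripped.
    """
    tokens = ident.strip().strip("<>").split()
    if len(tokens) < 3:
        raise ValueError(f"unexpected identification string: {ident!r}")
    return InquiryId(tokens[0], " ".join(tokens[1:-1]), tokens[-1])


def needs_update(ident: str, product: str, latest: str) -> bool:
    """True if the drive is `product` and its revision is below `latest`.

    Assumption: revisions for this product are zero-padded decimal
    strings, so comparing them as integers orders them correctly.
    """
    inq = parse_inquiry_id(ident)
    return inq.product == product and int(inq.revision) < int(latest)


if __name__ == "__main__":
    for d in ["SEAGATE ST2000NM0001 0001", "SEAGATE ST2000NM0001 0006"]:
        verdict = "update" if needs_update(d, "ST2000NM0001", "0006") else "ok"
        print(d, "->", verdict)
```

The product check runs first so that revision strings from other vendors, which may not be numeric, are never passed to int().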
> > >>
> > >> I found that with the few disks that I have that fail randomly (and
> > >> others), I can reproduce the issue (though not with exactly the same
> > >> symptoms) by hot-pulling the disk while writing something, putting it
> > >> back, waiting a few seconds (<10; not long enough for the SCSI
> > >> controller to rescan), then pulling and replacing it again. The old
> > >> 2TB Seagate greens fail this test, but the 3TB ones pass. All 2 and
> > >> 3 TB Hitachis I tried pass this test, as well as 3TB WD greens. (All
> > >> enterprise disks I tried pass this test except the Toshiba 2TB ones.)
> > >>
> > >> If I put a "failed" disk back in, it does not work. If I put it in a
> > >> different slot, same result. But if I put any other disk in, it works
> > >> fine. So it is the disk, but it is also FreeBSD not being able to
> > >> reset/rescan it. It is simple enough to blame both, and since you
> > >> can't get rid of the driver, get different disks (e.g. swap them with
> > >> some different same-sized ones in a different machine).
> > >>
> > >> Here is my forum thread about it, including disk product IDs for the
> > >> ones I tested, and a huge list of things that don't fix it:
> > >> http://forums.freebsd.org/showthread.php?t=28252
> > >>
> > >> Peter
> > >>
> > >>
> > >> On 10.04.2012 03:52, John Hickey wrote:
> > >>> I've seen people having this problem before, but I don't think anyone
> > >>> has figured it out.
> > >>> I am running:
> > >>>
> > >>> FreeBSD zfs 10.0-CURRENT FreeBSD 10.0-CURRENT #5: Sat Apr 7 18:05:57 PDT 2012 root@zfs:/usr/obj/usr/src/sys/GENERIC amd64
> > >>>
> > >>> I have the latest LSI IT firmware 13 loaded:
> > >>>
> > >>> mps1: port 0xc000-0xc0ff mem 0xfe93c000-0xfe93ffff,0xfe940000-0xfe97ffff irq 16 at device 0.0 on pci5
> > >>> mps1: Firmware: 13.00.01.00, Driver: 13.00.00.00-fbsd
> > >>> mps1: IOCCapabilities: 1285c c>
> > >>>
> > >>> All disks are on a SuperMicro SAS II backplane:
> > >>>
> > >>> root@zfs:/usr/ports/sysutils/dmidecode# camcontrol devlist
> > >>> at scbus0 target 0 lun 0 (da0,pass0)
> > >>> at scbus0 target 1 lun 0 (da1,pass1)
> > >>> at scbus1 target 8 lun 0 (da2,pass2)
> > >>> .... x16 more of the same
> > >>> at scbus1 target 46 lun 0 (da20,pass20)
> > >>> at scbus1 target 47 lun 0 (ses0,pass21)
> > >>>
> > >>> Essentially when putting the ZFS filesystem under load, I am getting
> > >>> these sorts of errors:
> > >>>
> > >>> (da13:mps1:0:21:0): WRITE(10). CDB: 2a 0 19 29 32 f2 0 1 0 0 length 131072 SMID 213 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 3d fa ae 0 1 0 0 length 131072 SMID 386 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 a 24 ee 0 1 0 0 length 131072 SMID 542 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da14:mps1:0:22:0): WRITE(10). CDB: 2a 0 19 2a c6 b1 0 1 0 0 length 131072 SMID 214 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da16:mps1:0:25:0): WRITE(10). CDB: 2a 0 19 2b 83 aa 0 1 0 0 length 131072 SMID 879 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 40 d f9 0 1 0 0 length 131072 SMID 474 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da9:mps1:0:15:0): WRITE(10). CDB: 2a 0 18 c 3 31 0 1 0 0 length 131072 SMID 578 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da4:mps1:0:10:0): WRITE(10). CDB: 2a 0 19 41 6f ff 0 1 0 0 length 131072 SMID 703 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da12:mps1:0:19:0): WRITE(10). CDB: 2a 0 18 c e5 2e 0 1 0 0 length 131072 SMID 684 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da3:mps1:0:9:0): WRITE(10). CDB: 2a 0 19 41 b1 4b 0 1 0 0 length 131072 SMID 212 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da9:mps1:0:15:0): WRITE(10). CDB: 2a 0 18 d 1e 5c 0 1 0 0 length 131072 SMID 63 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 d 56 1c 0 1 0 0 length 131072 SMID 412 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da4:mps1:0:10:0): WRITE(10). CDB: 2a 0 19 42 2c f1 0 1 0 0 length 131072 SMID 1019 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 d 6d 22 0 1 0 0 length 131072 SMID 175 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 42 62 bc 0 1 0 0 length 131072 SMID 458 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da10:mps1:0:16:0): WRITE(10). CDB: 2a 0 18 f 4b d2 0 1 0 0 length 131072 SMID 986 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da3:mps1:0:9:0): WRITE(10). CDB: 2a 0 19 43 f4 50 0 1 0 0 length 131072 SMID 809 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da2:mps1:0:8:0): WRITE(10). CDB: 2a 0 19 45 4 18 0 1 0 0 length 131072 SMID 998 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da13:mps1:0:21:0): WRITE(10). CDB: 2a 0 19 30 e4 73 0 1 0 0 length 131072 SMID 489 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da12:mps1:0:19:0): WRITE(10). CDB: 2a 0 18 10 8d 19 0 1 0 0 length 131072 SMID 275 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da14:mps1:0:22:0): WRITE(10). CDB: 2a 0 19 32 e7 0 0 1 0 0 length 131072 SMID 666 terminated ioc 804b scsi 0 state c xfer 0
> > >>> (da8:mps1:0:14:0): WRITE(10). CDB: 2a 0 18 13 2b 68 0 1 0 0 length 131072 SMID 463 terminated ioc 804b scsi 0 state c xfer 0
> > >>> _______________________________________________
> > >>> freebsd-scsi@freebsd.org mailing list
> > >>> http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> > >>> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"
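Each error line in John's original report packs a CAM path, the raw WRITE(10) CDB, and the controller's status into one line. Decoding them makes two things concrete: every failed command is a 256-block write (transfer length 0x0100 in CDB bytes 7-8, and 256 x 512 = 131072 matches the logged length, implying 512-byte logical blocks), and the failures are spread across many drives. The Python sketch below assumes exactly the log format shown above and uses the standard SCSI WRITE(10) layout (opcode 0x2A, big-endian LBA in bytes 2-5, transfer length in bytes 7-8). Reading "ioc 804b" as MPI2 IOCStatus 0x004B (SCSI_IOC_TERMINATED) plus the 0x8000 log-info-available flag is our interpretation of LSI's mpi2.h and should be checked against that header. The names parse, summarize, and WriteError are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple

# One error line: CAM path (periph:sim:bus:target:lun), the ten WRITE(10)
# CDB bytes in hex (each followed by a space), the transfer length, and the
# IOC status reported by the controller.
LINE_RE = re.compile(
    r"\((?P<periph>\w+):(?P<sim>\w+):(?P<bus>\d+):(?P<target>\d+):(?P<lun>\d+)\): "
    r"WRITE\(10\)\. CDB: (?P<cdb>(?:[0-9a-f]{1,2} ){10})"
    r"length (?P<length>\d+) .*?ioc (?P<ioc>[0-9a-f]+)"
)

# Assumed values from LSI's mpi2.h; verify against the header itself.
IOCSTATUS_MASK = 0x7FFF
IOCSTATUS_FLAG_LOG_INFO_AVAILABLE = 0x8000
IOCSTATUS_NAMES = {
    0x0048: "SCSI_TASK_TERMINATED",
    0x004B: "SCSI_IOC_TERMINATED",
    0x004C: "SCSI_EXT_TERMINATED",
}


class WriteError(NamedTuple):
    device: str      # e.g. "da13"
    target: int      # SCSI target ID on the controller
    lba: int         # starting logical block address
    blocks: int      # transfer length in logical blocks
    length: int      # transfer length in bytes, as logged
    ioc_status: str  # decoded IOC status name (or raw hex if unknown)
    log_info: bool   # whether the log-info-available flag was set


def parse(line: str) -> WriteError:
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    cdb = [int(b, 16) for b in m["cdb"].split()]
    if cdb[0] != 0x2A:  # WRITE(10) operation code
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(bytes(cdb[2:6]), "big")     # CDB bytes 2-5
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")  # CDB bytes 7-8
    ioc = int(m["ioc"], 16)
    status = ioc & IOCSTATUS_MASK
    return WriteError(
        device=m["periph"],
        target=int(m["target"]),
        lba=lba,
        blocks=blocks,
        length=int(m["length"]),
        ioc_status=IOCSTATUS_NAMES.get(status, f"0x{status:04x}"),
        log_info=bool(ioc & IOCSTATUS_FLAG_LOG_INFO_AVAILABLE),
    )


def summarize(lines: Iterable[str]) -> Tuple[List[WriteError], Counter]:
    """Parse every WRITE(10) error line and count errors per device."""
    errors = [parse(line) for line in lines if "WRITE(10)" in line]
    return errors, Counter(e.device for e in errors)


if __name__ == "__main__":
    log = [
        "(da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 a 24 ee 0 1 0 0 length 131072 SMID 542 terminated ioc 804b scsi 0 state c xfer 0",
        "(da2:mps1:0:8:0): WRITE(10). CDB: 2a 0 19 45 4 18 0 1 0 0 length 131072 SMID 998 terminated ioc 804b scsi 0 state c xfer 0",
    ]
    errors, per_device = summarize(log)
    for e in errors:
        # length // blocks gives the implied logical block size.
        print(f"{e.device} target {e.target}: LBA {e.lba:#x}, {e.blocks} blocks, "
              f"{e.length // e.blocks} B/block, {e.ioc_status}")
    print(per_device.most_common())
```

Applied to the full excerpt above, the per-device count gives 22 errors on 12 different drives, with no single drive accounting for more than three, so the timeouts are not confined to one disk.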