From: Peter Maloney
Date: Tue, 10 Apr 2012 21:35:47 +0200
To: freebsd-scsi@freebsd.org
Subject: Re: Write Timeouts with MPS
Message-ID: <4F848B93.10402@brockmann-consult.de>
In-Reply-To: <20120410015210.GI9589@deterlab.net>

I found that this only happens with specific disks / disk firmware, but
nobody seems to listen to me about it. They all seem to blame the driver.
(I blame both, but changing disks is a simple fix.) Looking around, most
reports involve various Seagates or Samsung Spinpoints (including one
report where running smartctl -a against a SAS Seagate triggers this type
of error, but which cannot be reproduced with the binary LSI driver).
The only other disk I know of that does this is a Crucial SSD with old
firmware. One guy said he can do a camcontrol rescan to get the disk
back; I tried that and got either panics, hangs, or nothing.

What HBA are you using? With my LSI 9211-8i HBAs, the new 3TB Seagate
greens don't seem to have this problem. I have no idea if different
disks behave differently with different controllers.

I asked Seagate about it and they replied with marketing nonsense about
buying enterprise disks instead, and said I should buy disks that are on
the specific compatibility list for the HBA.

I found that with the few disks I have that fail randomly (and others),
I can reproduce the issue (though not with exactly the same symptoms) by
hot pulling the disk while writing something, putting it back, waiting a
few seconds (under 10, not long enough for the SCSI controller to
rescan), then pulling and replacing it again. The old 2TB Seagate greens
fail this test, but the 3TB ones pass. All 2 and 3 TB Hitachis I tried
pass this test, as well as 3TB WD greens. (All enterprise disks I tried
pass this test except the Toshiba 2TB ones.)

If I put a "failed" disk back in, it does not work. If I put it in a
different slot, same result. But if I put any other disk in, it works
fine. So it is the disk, but it is also FreeBSD not being able to
reset/rescan it. It is simple enough to blame both, and since you can't
get rid of the driver, get different disks (e.g. swap them with some
different same-sized ones from a different machine).

Here is my forum thread about it, including disk product ids for the
ones I tested, and a huge list of things that don't fix it.
http://forums.freebsd.org/showthread.php?t=28252

Peter

On 10.04.2012 03:52, John Hickey wrote:
> I've seen people having this problem before, but I don't think anyone
> has figured it out. I am running:
>
> FreeBSD zfs 10.0-CURRENT FreeBSD 10.0-CURRENT #5: Sat Apr 7 18:05:57 PDT 2012 root@zfs:/usr/obj/usr/src/sys/GENERIC amd64
>
> I have the latest LSI IT firmware 13 loaded:
>
> mps1: port 0xc000-0xc0ff mem 0xfe93c000-0xfe93ffff,0xfe940000-0xfe97ffff irq 16 at device 0.0 on pci5
> mps1: Firmware: 13.00.01.00, Driver: 13.00.00.00-fbsd
> mps1: IOCCapabilities: 1285c
>
> All disks are on a SuperMicro SAS II backplane:
>
> root@zfs:/usr/ports/sysutils/dmidecode# camcontrol devlist
> at scbus0 target 0 lun 0 (da0,pass0)
> at scbus0 target 1 lun 0 (da1,pass1)
> at scbus1 target 8 lun 0 (da2,pass2)
> .... x16 more of the same
> at scbus1 target 46 lun 0 (da20,pass20)
> at scbus1 target 47 lun 0 (ses0,pass21)
>
> Essentially when putting the ZFS filesystem under load, I am getting
> these sorts of errors:
>
> (da13:mps1:0:21:0): WRITE(10). CDB: 2a 0 19 29 32 f2 0 1 0 0 length 131072 SMID 213 terminated ioc 804b scsi 0 state c xfer 0
> (da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 3d fa ae 0 1 0 0 length 131072 SMID 386 terminated ioc 804b scsi 0 state c xfer 0
> (da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 a 24 ee 0 1 0 0 length 131072 SMID 542 terminated ioc 804b scsi 0 state c xfer 0
> (da14:mps1:0:22:0): WRITE(10). CDB: 2a 0 19 2a c6 b1 0 1 0 0 length 131072 SMID 214 terminated ioc 804b scsi 0 state c xfer 0
> (da16:mps1:0:25:0): WRITE(10). CDB: 2a 0 19 2b 83 aa 0 1 0 0 length 131072 SMID 879 terminated ioc 804b scsi 0 state c xfer 0
> (da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 40 d f9 0 1 0 0 length 131072 SMID 474 terminated ioc 804b scsi 0 state c xfer 0
> (da9:mps1:0:15:0): WRITE(10). CDB: 2a 0 18 c 3 31 0 1 0 0 length 131072 SMID 578 terminated ioc 804b scsi 0 state c xfer 0
> (da4:mps1:0:10:0): WRITE(10). CDB: 2a 0 19 41 6f ff 0 1 0 0 length 131072 SMID 703 terminated ioc 804b scsi 0 state c xfer 0
> (da12:mps1:0:19:0): WRITE(10). CDB: 2a 0 18 c e5 2e 0 1 0 0 length 131072 SMID 684 terminated ioc 804b scsi 0 state c xfer 0
> (da3:mps1:0:9:0): WRITE(10). CDB: 2a 0 19 41 b1 4b 0 1 0 0 length 131072 SMID 212 terminated ioc 804b scsi 0 state c xfer 0
> (da9:mps1:0:15:0): WRITE(10). CDB: 2a 0 18 d 1e 5c 0 1 0 0 length 131072 SMID 63 terminated ioc 804b scsi 0 state c xfer 0
> (da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 d 56 1c 0 1 0 0 length 131072 SMID 412 terminated ioc 804b scsi 0 state c xfer 0
> (da4:mps1:0:10:0): WRITE(10). CDB: 2a 0 19 42 2c f1 0 1 0 0 length 131072 SMID 1019 terminated ioc 804b scsi 0 state c xfer 0
> (da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 d 6d 22 0 1 0 0 length 131072 SMID 175 terminated ioc 804b scsi 0 state c xfer 0
> (da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 42 62 bc 0 1 0 0 length 131072 SMID 458 terminated ioc 804b scsi 0 state c xfer 0
> (da10:mps1:0:16:0): WRITE(10). CDB: 2a 0 18 f 4b d2 0 1 0 0 length 131072 SMID 986 terminated ioc 804b scsi 0 state c xfer 0
> (da3:mps1:0:9:0): WRITE(10). CDB: 2a 0 19 43 f4 50 0 1 0 0 length 131072 SMID 809 terminated ioc 804b scsi 0 state c xfer 0
> (da2:mps1:0:8:0): WRITE(10). CDB: 2a 0 19 45 4 18 0 1 0 0 length 131072 SMID 998 terminated ioc 804b scsi 0 state c xfer 0
> (da13:mps1:0:21:0): WRITE(10). CDB: 2a 0 19 30 e4 73 0 1 0 0 length 131072 SMID 489 terminated ioc 804b scsi 0 state c xfer 0
> (da12:mps1:0:19:0): WRITE(10). CDB: 2a 0 18 10 8d 19 0 1 0 0 length 131072 SMID 275 terminated ioc 804b scsi 0 state c xfer 0
> (da14:mps1:0:22:0): WRITE(10). CDB: 2a 0 19 32 e7 0 0 1 0 0 length 131072 SMID 666 terminated ioc 804b scsi 0 state c xfer 0
> (da8:mps1:0:14:0): WRITE(10). CDB: 2a 0 18 13 2b 68 0 1 0 0 length 131072 SMID 463 terminated ioc 804b scsi 0 state c xfer 0
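
For anyone reading the quoted log, here is a minimal sketch of how the
WRITE(10) lines could be decoded and tallied per disk, which may help
show whether the errors cluster on particular drives. It is not part of
the original thread. The regular expression is an assumption based only
on the sample lines quoted above, the helper names (decode_write10,
summarize) are hypothetical, and the check against "length" assumes
512-byte sectors.

```python
import re
from collections import Counter

# Pattern for the mps console lines quoted above. This is an assumption
# derived from those samples only, not from the driver source.
LINE_RE = re.compile(
    r"\((?P<dev>da\d+):mps\d+:\d+:\d+:\d+\): WRITE\(10\)\. "
    r"CDB: (?P<cdb>(?:[0-9a-f]{1,2} ){9}[0-9a-f]{1,2}) "
    r"length (?P<length>\d+)"
)


def decode_write10(cdb_text):
    """Return (lba, blocks) from a WRITE(10) CDB printed as hex bytes."""
    cdb = [int(b, 16) for b in cdb_text.split()]
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(bytes(cdb[2:6]), "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks


def summarize(log_text):
    """Count matching WRITE(10) error lines per device name (da0, da1, ...)."""
    counts = Counter()
    for m in LINE_RE.finditer(log_text):
        counts[m.group("dev")] += 1
    return counts


sample = (
    "(da13:mps1:0:21:0): WRITE(10). CDB: 2a 0 19 29 32 f2 0 1 0 0 "
    "length 131072 SMID 213 terminated ioc 804b scsi 0 state c xfer 0"
)
m = LINE_RE.search(sample)
lba, blocks = decode_write10(m.group("cdb"))
# Assuming 512-byte sectors, blocks * 512 should equal the logged length.
print(m.group("dev"), hex(lba), blocks, blocks * 512 == int(m.group("length")))
# prints: da13 0x192932f2 256 True
```

Feeding the whole console log to summarize() gives a per-disk error
count that can be set against the disk models in camcontrol devlist.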