From owner-freebsd-scsi@FreeBSD.ORG Mon Apr 9 11:07:21 2012
Date: Mon, 9 Apr 2012 11:07:20 GMT
Message-Id: <201204091107.q39B7KAu039725@freefall.freebsd.org>
From: FreeBSD bugmaster
To: freebsd-scsi@FreeBSD.org
Subject: Current problem reports assigned to freebsd-scsi@FreeBSD.org

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o kern/165982  scsi       [mpt] mpt instability, drive resets, and losses on Fre
o kern/165740  scsi       [cam] SCSI code must drain callbacks before free
o kern/163713  scsi       [aic7xxx] [patch] Add Adaptec29329LPE to aic79xx_pci.c
o kern/162256  scsi       [mpt] QUEUE FULL EVENT and 'mpt_cam_event: 0x0'
o kern/161809  scsi       [cam] [patch] set kern.cam.boot_delay via build option
o kern/159412  scsi       [ciss] 7.3 RELEASE: ciss0 ADAPTER HEARTBEAT FAILED err
o kern/157770  scsi       [iscsi] [panic] iscsi_initiator panic
o kern/154432  scsi       [xpt] run_interrupt_driven_hooks: still waiting after
o kern/153514  scsi       [cam] [panic] CAM related panic
o kern/153361  scsi       [ciss] Smart Array 5300 boot/detect drive problem
o kern/152250  scsi       [ciss] [patch] Kernel panic when hw.ciss.expose_hidden
o kern/151564  scsi       [ciss] ciss(4) should increase CISS_MAX_LOGICAL to 10
o docs/151336  scsi       Missing documentation of scsi_ and ata_ functions in c
s kern/149927  scsi       [cam] hard drive not stopped before removing power dur
o kern/148083  scsi       [aac] Strange device reporting
o kern/147704  scsi       [mpt] sys/dev/mpt: new chip revision, partially unsupp
o kern/146287  scsi       [ciss] ciss(4) cannot see more than one SmartArray con
o kern/145768  scsi       [mpt] can't perform I/O on SAS based SAN disk in freeb
o kern/144648  scsi       [aac] Strange values of speed and bus width in dmesg
o kern/144301  scsi       [ciss] [hang] HP proliant server locks when using ciss
o kern/142351  scsi       [mpt] LSILogic driver performance problems
o kern/134488  scsi       [mpt] MPT SCSI driver probes max. 8 LUNs per device
o kern/132250  scsi       [ciss] ciss driver does not support more then 15 drive
o kern/132206  scsi       [mpt] system panics on boot when mirroring and 2nd dri
o kern/130621  scsi       [mpt] tranfer rate is inscrutable slow when use lsi213
o kern/129602  scsi       [ahd] ahd(4) gets confused and wedges SCSI bus
o kern/128452  scsi       [sa] [panic] Accessing SCSI tape drive randomly crashe
o kern/128245  scsi       [scsi] "inquiry data fails comparison at DV1 step" [re
o kern/127927  scsi       [isp] isp(4) target driver crashes kernel when set up
o kern/127717  scsi       [ata] [patch] [request] - support write cache toggling
o kern/123674  scsi       [ahc] ahc driver dumping
o kern/123520  scsi       [ahd] unable to boot from net while using ahd
o sparc/121676 scsi       [iscsi] iscontrol do not connect iscsi-target on sparc
o kern/120487  scsi       [sg] scsi_sg incompatible with scanners
o kern/120247  scsi       [mpt] FreeBSD 6.3 and LSI Logic 1030 = only 3.300MB/s
o kern/114597  scsi       [sym] System hangs at SCSI bus reset with dual HBAs
o kern/110847  scsi       [ahd] Tyan U320 onboard problem with more than 3 disks
o kern/99954   scsi       [ahc] reading from DVD failes on 6.x [regression]
o kern/92798   scsi       [ahc] SCSI problem with timeouts
o kern/90282   scsi       [sym] SCSI bus resets cause loss of ch device
o kern/76178   scsi       [ahd] Problem with ahd and large SCSI Raid system
o kern/74627   scsi       [ahc] [hang] Adaptec 2940U2W Can't boot 5.3
s kern/61165   scsi       [panic] kernel page fault after calling cam_send_ccb
o kern/60641   scsi       [sym] Sporadic SCSI bus resets with 53C810 under load
o kern/60598   scsi       wire down of scsi devices conflicts with config
s kern/57398   scsi       [mly] Current fails to install on mly(4) based RAID di
o kern/52638   scsi       [panic] SCSI U320 on SMP server won't run faster than
o kern/44587   scsi       dev/dpt/dpt.h is missing defines required for DPT_HAND
o kern/39388   scsi       ncr/sym drivers fail with 53c810 and more than 256MB m
o kern/35234   scsi       World access to /dev/pass? (for scanner) requires acce

50 problems total.
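
The note at the top of the listing gives a URL pattern for viewing a single PR. As a rough sketch only, the entries above could be turned into links like this. It assumes that "(number)" means the digits after the slash in the Tracker column, and the names PR_URL, ENTRY, and pr_links are made up for this illustration; they are not part of any FreeBSD tool.

```python
import re

# URL pattern quoted in the listing's note; "(number)" is assumed to be
# the numeric part of the Tracker column (e.g. 165982 in kern/165982).
PR_URL = "http://www.freebsd.org/cgi/query-pr.cgi?pr={number}"

# One listing entry: state flag, category/number, responsible party, description.
ENTRY = re.compile(
    r"^(?P<state>[a-z])\s+(?P<category>[a-z0-9]+)/(?P<number>\d+)"
    r"\s+(?P<resp>\S+)\s+(?P<desc>.*)$"
)


def pr_links(listing):
    """Return (number, url) pairs for every line that looks like a PR entry."""
    links = []
    for line in listing.splitlines():
        match = ENTRY.match(line.strip())
        if match:  # header, separator, and blank lines are skipped
            number = match.group("number")
            links.append((number, PR_URL.format(number=number)))
    return links


sample = """\
S Tracker      Resp.      Description
o kern/165982  scsi       [mpt] mpt instability, drive resets, and losses on Fre
s kern/149927  scsi       [cam] hard drive not stopped before removing power dur
o docs/151336  scsi       Missing documentation of scsi_ and ata_ functions in c
"""

for number, url in pr_links(sample):
    print(number, url)
```

The header row is ignored because "Tracker" contains no slash followed by digits, so only real entries produce a link.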
From owner-freebsd-scsi@FreeBSD.ORG Tue Apr 10 02:00:40 2012
Date: Mon, 9 Apr 2012 18:52:10 -0700
From: John Hickey
To: freebsd-scsi@freebsd.org
Message-ID: <20120410015210.GI9589@deterlab.net>
Subject: Write Timeouts with MPS

I've seen people having this problem before, but I don't think anyone
has figured it out. I am running:

FreeBSD zfs 10.0-CURRENT FreeBSD 10.0-CURRENT #5: Sat Apr 7 18:05:57 PDT 2012 root@zfs:/usr/obj/usr/src/sys/GENERIC amd64

I have the latest LSI IT firmware 13 loaded:

mps1: port 0xc000-0xc0ff mem 0xfe93c000-0xfe93ffff,0xfe940000-0xfe97ffff irq 16 at device 0.0 on pci5
mps1: Firmware: 13.00.01.00, Driver: 13.00.00.00-fbsd
mps1: IOCCapabilities: 1285c

All disks are on a SuperMicro SAS II backplane:

root@zfs:/usr/ports/sysutils/dmidecode# camcontrol devlist
at scbus0 target 0 lun 0 (da0,pass0)
at scbus0 target 1 lun 0 (da1,pass1)
at scbus1 target 8 lun 0 (da2,pass2)
.... x16 more of the same
at scbus1 target 46 lun 0 (da20,pass20)
at scbus1 target 47 lun 0 (ses0,pass21)

Essentially when putting the ZFS filesystem under load, I am getting
these sorts of errors:

(da13:mps1:0:21:0): WRITE(10). CDB: 2a 0 19 29 32 f2 0 1 0 0 length 131072 SMID 213 terminated ioc 804b scsi 0 state c xfer 0
(da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 3d fa ae 0 1 0 0 length 131072 SMID 386 terminated ioc 804b scsi 0 state c xfer 0
(da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 a 24 ee 0 1 0 0 length 131072 SMID 542 terminated ioc 804b scsi 0 state c xfer 0
(da14:mps1:0:22:0): WRITE(10). CDB: 2a 0 19 2a c6 b1 0 1 0 0 length 131072 SMID 214 terminated ioc 804b scsi 0 state c xfer 0
(da16:mps1:0:25:0): WRITE(10). CDB: 2a 0 19 2b 83 aa 0 1 0 0 length 131072 SMID 879 terminated ioc 804b scsi 0 state c xfer 0
(da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 40 d f9 0 1 0 0 length 131072 SMID 474 terminated ioc 804b scsi 0 state c xfer 0
(da9:mps1:0:15:0): WRITE(10). CDB: 2a 0 18 c 3 31 0 1 0 0 length 131072 SMID 578 terminated ioc 804b scsi 0 state c xfer 0
(da4:mps1:0:10:0): WRITE(10). CDB: 2a 0 19 41 6f ff 0 1 0 0 length 131072 SMID 703 terminated ioc 804b scsi 0 state c xfer 0
(da12:mps1:0:19:0): WRITE(10). CDB: 2a 0 18 c e5 2e 0 1 0 0 length 131072 SMID 684 terminated ioc 804b scsi 0 state c xfer 0
(da3:mps1:0:9:0): WRITE(10). CDB: 2a 0 19 41 b1 4b 0 1 0 0 length 131072 SMID 212 terminated ioc 804b scsi 0 state c xfer 0
(da9:mps1:0:15:0): WRITE(10). CDB: 2a 0 18 d 1e 5c 0 1 0 0 length 131072 SMID 63 terminated ioc 804b scsi 0 state c xfer 0
(da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 d 56 1c 0 1 0 0 length 131072 SMID 412 terminated ioc 804b scsi 0 state c xfer 0
(da4:mps1:0:10:0): WRITE(10). CDB: 2a 0 19 42 2c f1 0 1 0 0 length 131072 SMID 1019 terminated ioc 804b scsi 0 state c xfer 0
(da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 d 6d 22 0 1 0 0 length 131072 SMID 175 terminated ioc 804b scsi 0 state c xfer 0
(da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 42 62 bc 0 1 0 0 length 131072 SMID 458 terminated ioc 804b scsi 0 state c xfer 0
(da10:mps1:0:16:0): WRITE(10). CDB: 2a 0 18 f 4b d2 0 1 0 0 length 131072 SMID 986 terminated ioc 804b scsi 0 state c xfer 0
(da3:mps1:0:9:0): WRITE(10). CDB: 2a 0 19 43 f4 50 0 1 0 0 length 131072 SMID 809 terminated ioc 804b scsi 0 state c xfer 0
(da2:mps1:0:8:0): WRITE(10). CDB: 2a 0 19 45 4 18 0 1 0 0 length 131072 SMID 998 terminated ioc 804b scsi 0 state c xfer 0
(da13:mps1:0:21:0): WRITE(10). CDB: 2a 0 19 30 e4 73 0 1 0 0 length 131072 SMID 489 terminated ioc 804b scsi 0 state c xfer 0
(da12:mps1:0:19:0): WRITE(10). CDB: 2a 0 18 10 8d 19 0 1 0 0 length 131072 SMID 275 terminated ioc 804b scsi 0 state c xfer 0
(da14:mps1:0:22:0): WRITE(10). CDB: 2a 0 19 32 e7 0 0 1 0 0 length 131072 SMID 666 terminated ioc 804b scsi 0 state c xfer 0
(da8:mps1:0:14:0): WRITE(10). CDB: 2a 0 18 13 2b 68 0 1 0 0 length 131072 SMID 463 terminated ioc 804b scsi 0 state c xfer 0

From owner-freebsd-scsi@FreeBSD.ORG Tue Apr 10 19:36:00 2012
Message-ID: <4F848B93.10402@brockmann-consult.de>
Date: Tue, 10 Apr 2012 21:35:47 +0200
From: Peter Maloney
To: freebsd-scsi@freebsd.org
References: <20120410015210.GI9589@deterlab.net>
In-Reply-To:
<20120410015210.GI9589@deterlab.net>
Subject: Re: Write Timeouts with MPS

I found this only happens with specific disks / disk firmware... but
nobody seems to listen to me about it. They all seem to blame the
driver. (I blame both, but changing disks is a simple fix.)

And looking around, most reports are with various Seagates (including
one that can cause this type of error with smartctl -a with a SAS
Seagate, but cannot reproduce with the binary LSI driver) or Samsung
Spinpoints. The only other disk I know of that does this is a Crucial
SSD with old firmware. One guy said he can do a camcontrol rescan to get
it back; I tried that and get either panics, hangs, or nothing.

What HBA are you using? With my LSI 9211-8i HBAs, the new 3TB Seagate
greens don't seem to have this problem. I have no idea if different
disks behave differently with different controllers. I asked Seagate
about it and they replied with marketing nonsense about buying enterprise
disks instead, and said I should buy disks that are on the specific
compatibility list for the HBA.

I found that with the few disks that I have that fail randomly (and
others), I can reproduce the issue (not the exact same symptoms though) by
hot pulling the disk while writing something, putting it back, waiting a
few seconds (<10; less than enough for the SCSI controller to rescan),
then pulling and replacing it again. The old 2TB Seagate greens fail this
test, but the 3TB ones pass. All 2 and 3 TB Hitachis I tried pass this
test, as well as 3TB WD greens. (All enterprise disks I tried pass this
test except the Toshiba 2TB ones I tried.)

If I put a "failed" disk back in, it does not work. If I put it in a
different slot, same. But if I put any other disk in, it works fine. So
it is the disk, but it is also FreeBSD not being able to reset/rescan
it. But it is simple enough to blame both, and since you can't get rid
of the driver, get different disks (e.g. swap them with some different
same sized ones in a different machine).

Here is my forum thread about it, including disk product ids for ones I
tested, and a huge list of things that don't fix it.
http://forums.freebsd.org/showthread.php?t=28252

Peter

From owner-freebsd-scsi@FreeBSD.ORG Tue Apr 10 21:13:48 2012
Date: Tue, 10 Apr 2012 17:12:39 -0400 (EDT)
From: Kyle Creyts
To: Peter Maloney
Message-ID: <780649479.3371324.1334092359691.JavaMail.root@merit-mailstore01>
In-Reply-To: <4F848B93.10402@brockmann-consult.de>
Cc: freebsd-scsi@freebsd.org
Subject: Re: Write Timeouts with MPS

I am using Hitachi HDS72404 Revision: A250 Deskstar drives, on LSI SAS 9201-16e

_______________________________________________
freebsd-scsi@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"

From owner-freebsd-scsi@FreeBSD.ORG Tue Apr 10 22:26:11 2012
From: John Hickey
In-Reply-To: <4F848B93.10402@brockmann-consult.de>
Date: Tue, 10 Apr 2012 15:26:10 -0700
References: <20120410015210.GI9589@deterlab.net> <4F848B93.10402@brockmann-consult.de>
To: freebsd-scsi@freebsd.org
Subject: Re: Write Timeouts with MPS

I have 19 drives in my array, so changing them isn't that easy.
;-) They are Seagate Constellation ES 2TB SAS drives (SEAGATE ST2000NM0001
0001) and according to LSI documents my whole setup should be supported.
The drives at least aren't being marked as failed. I believe a change
was made a while back to make FreeBSD less sensitive to these sorts of
timeouts. I have had a panic or two on the system, but haven't tracked
down the exact cause yet.

John

From owner-freebsd-scsi@FreeBSD.ORG Wed Apr 11 05:35:18 2012
Message-ID: <4F85180D.5060104@brockmann-consult.de>
Date: Wed, 11 Apr
2012 07:35:09 +0200
From: Peter Maloney
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20120327 Thunderbird/11.0.1
MIME-Version: 1.0
To: freebsd-scsi@freebsd.org
References: <20120410015210.GI9589@deterlab.net> <4F848B93.10402@brockmann-consult.de>
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Provags-ID: V02:K0:QgAdAJIAoF1tv2ZPROhURZqXEhgPFgZQFUaoLp5Fb7c ge9G0Dyv54jws0JeSQbAPIqQ1Jl/htBokm5BxomYSflKDGasKX gT5+cIKDL9UG1A524gkxKRQiSdCoG4M4o7Vka2ypLkPGREUGHX a68NfbN/OHCUrKyXCv0h4gYENRwTD/V06JzuK/0ux81Tw+EPCi LufZwU6Q8pWIwoNLrHUu07J819Hp5EdDZiZoH9Y/56G58Odanh MumLl4FsueYK3HuH5R5Rnfv1jB9TCZDMhYqTMq/ON6iILzfTZj vwVLcDall7FckGwBR+L8pb+cDBQJViDlF9Lyjlk68l1z3G5sEa /0QHbuZict4ERfdpmrcrvdTHMPrl+Sg4R4O02u24f8dhvxEco0 EnVWtdWtVni+Q==
Subject: Re: Write Timeouts with MPS
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 11 Apr 2012 05:35:18 -0000

Well, when I emailed some Seagate people, they just told me to use
supported ones. So I suggest you email them about it, telling them it is
on the compatibility list, and asking for an explanation and fix (eg.
firmware bug fix). You could also say it is fairly common on seagate
(and Samsung) disks, and very uncommon with other brands.

Peter

On 11.04.2012 00:26, John Hickey wrote:
> I have 19 drives in my array, so changing them isn't that easy. ;-) They are Seagate Constellation ES 2TB SAS drives (SEAGATE ST2000NM0001 0001) and according to LSI documents my whole setup should be supported. The drives at least aren't being marked as failed. I believe a change was made a while back to make FreeBSD less sensitive to these sorts of timeouts. I have had a panic or two on the system, but haven't tracked down the exact cause yet.
>
> John

From owner-freebsd-scsi@FreeBSD.ORG Wed Apr 11 07:35:38 2012
Return-Path:
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4E63B10657A2 for ; Wed, 11 Apr 2012 07:35:38 +0000 (UTC) (envelope-from jjh@deterlab.net)
Received: from tardis.deterlab.net (tardis.deterlab.net [206.117.25.63]) by mx1.freebsd.org (Postfix) with ESMTP id 361318FC12 for ; Wed, 11 Apr 2012 07:35:38 +0000 (UTC)
Received: by tardis.deterlab.net (Postfix, from userid 1000) id 4DB843C21EB; Wed, 11 Apr 2012 00:35:32 -0700 (PDT)
Date: Wed, 11 Apr 2012 00:35:32 -0700
From: John Hickey
To: freebsd-scsi@freebsd.org
Message-ID: <20120411073532.GC13315@deterlab.net>
References: <20120410015210.GI9589@deterlab.net> <4F848B93.10402@brockmann-consult.de> <4F85180D.5060104@brockmann-consult.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4F85180D.5060104@brockmann-consult.de>
User-Agent: Mutt/1.5.20 (2009-06-14)
Subject: Re: Write Timeouts with MPS
X-BeenThere: 
freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 11 Apr 2012 07:35:38 -0000

I pretty much did this and filed a ticket with Seagate this afternoon.
They told me the latest firmware is 0006 (I am at 0001) and wanted
the serial numbers of the other drives in the array (probably to
confirm firmware compatibility). I suspect I'll have the update in
hand tomorrow and see how that works. Running FreeBSD didn't seem to
be an issue to them aside from concern about reading the serial numbers
without seatools. Only issue with that was that I initially gave them
the whole inquiry serial string, but only the first 8 (X) characters of
inquiry are the serial number:

$ sudo camcontrol inquiry da3
pass3: Fixed Direct Access SCSI-6 device
pass3: Serial Number XXXXXXXX0000YYYYYYYY
pass3: 600.000MB/s transfers, Command Queueing Enabled

John

On Wed, Apr 11, 2012 at 07:35:09AM +0200, Peter Maloney wrote:
> Well, when I emailed some Seagate people, they just told me to use
> supported ones. So I suggest you email them about it, telling them it is
> on the compatibility list, and asking for an explanation and fix (eg.
> firmware bug fix). You could also say it is fairly common on seagate
> (and Samsung) disks, and very uncommon with other brands.
>
> Peter

From owner-freebsd-scsi@FreeBSD.ORG Thu Apr 12 12:30:07 2012
Return-Path:
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 7750B1065674; Thu, 12 Apr 2012 12:30:07 +0000 (UTC) (envelope-from Kashyap.Desai@lsi.com)
Received: from na3sys009aog113.obsmtp.com (na3sys009aog113.obsmtp.com [74.125.149.209]) by mx1.freebsd.org (Postfix) with ESMTP id A24428FC19; Thu, 12 Apr 2012 12:30:06 +0000 (UTC)
Received: from paledge01.lsi.com ([192.19.193.42]) (using TLSv1) by 
na3sys009aob113.postini.com ([74.125.148.12]) with SMTP ID DSNKT4bKzRSae3lHHdpT7pAUKBWz4qqF/SBb@postini.com; Thu, 12 Apr 2012 05:30:06 PDT
Received: from PALHUB01.lsi.com (128.94.213.114) by PALEDGE01.lsi.com (192.19.193.42) with Microsoft SMTP Server (TLS) id 8.3.213.0; Thu, 12 Apr 2012 08:31:35 -0400
Received: from inbexch01.lsi.com (135.36.98.37) by PALHUB01.lsi.com (128.94.213.114) with Microsoft SMTP Server (TLS) id 8.3.213.0; Thu, 12 Apr 2012 08:26:35 -0400
Received: from inbmail01.lsi.com ([135.36.98.64]) by inbexch01.lsi.com ([135.36.98.37]) with mapi; Thu, 12 Apr 2012 17:56:32 +0530
From: "Desai, Kashyap"
To: John Hickey , "freebsd-scsi@freebsd.org"
Date: Thu, 12 Apr 2012 17:56:31 +0530
Thread-Topic: Write Timeouts with MPS
Thread-Index: Ac0XtfWomvZYwb29Qcyuc9142dcY0wA8UCQg
Message-ID:
References: <20120410015210.GI9589@deterlab.net> <4F848B93.10402@brockmann-consult.de> <4F85180D.5060104@brockmann-consult.de> <20120411073532.GC13315@deterlab.net>
In-Reply-To: <20120411073532.GC13315@deterlab.net>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
acceptlanguage: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Cc: "Reddy, Sreekanth" , "Mankani, Krishnaraddi" , "Kenneth D. Merry"
Subject: RE: Write Timeouts with MPS
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 12 Apr 2012 12:30:07 -0000

We never see this issue on our test machines.
Adding Sreekanth and he will plan to reproduce this issue locally to have further analysis on issue.

Please help Sreekanth to reproduce it locally.
~ Kashyap

> -----Original Message-----
> From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd-scsi@freebsd.org] On Behalf Of John Hickey
> Sent: Wednesday, April 11, 2012 1:06 PM
> To: freebsd-scsi@freebsd.org
> Subject: Re: Write Timeouts with MPS
>
> I pretty much did this and filed a ticket with Seagate this afternoon.
> They told me the latest firmware is 0006 (I am at 0001) and wanted
> the serial numbers of the other drives in the array (probably to
> confirm firmware compatibility). I suspect I'll have the update in
> hand tomorrow and see how that works. Running FreeBSD didn't seem to
> be an issue to them aside from concern about reading the serial numbers
> without seatools. Only issue with that was that I initially gave them
> the whole inquiry serial string, but only the first 8 (X) characters of
> inquiry are the serial number:
>
> $ sudo camcontrol inquiry da3
> pass3: Fixed Direct Access SCSI-6 device
> pass3: Serial Number XXXXXXXX0000YYYYYYYY
> pass3: 600.000MB/s transfers, Command Queueing Enabled
>
> John
>
> _______________________________________________
> freebsd-scsi@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"

From owner-freebsd-scsi@FreeBSD.ORG Thu Apr 12 20:16:44 2012
Return-Path:
Delivered-To: freebsd-scsi@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 8CF52106564A; Thu, 12 Apr 2012 20:16:44 +0000 (UTC) (envelope-from jjh@deterlab.net)
Received: from tardis.deterlab.net (tardis.deterlab.net [206.117.25.63]) by mx1.freebsd.org (Postfix) with ESMTP id 6B90F8FC20; Thu, 12 Apr 2012 20:16:44 +0000 (UTC)
Received: from [192.168.1.128] (pod.isi.edu [128.9.168.186]) by tardis.deterlab.net (Postfix) with ESMTPSA id 6F6723C0610; Thu, 12 Apr 2012 13:16:38 -0700 (PDT)
Mime-Version: 1.0 (Apple Message framework v1257)
Content-Type: text/plain; charset=us-ascii
From: John Hickey
In-Reply-To:
Date: Thu, 12 Apr 2012 13:16:38 -0700
Content-Transfer-Encoding: quoted-printable
Message-Id: <54373403-939F-4FC5-9A2E-40B2304EB518@deterlab.net>
References: <20120410015210.GI9589@deterlab.net> <4F848B93.10402@brockmann-consult.de> <4F85180D.5060104@brockmann-consult.de> <20120411073532.GC13315@deterlab.net>
To: "Desai, Kashyap"
X-Mailer: Apple Mail (2.1257)
Cc: "freebsd-scsi@freebsd.org" , "Mankani, Krishnaraddi" , "Kenneth D. Merry" , "Reddy, Sreekanth"
Subject: Re: Write Timeouts with MPS
X-BeenThere: freebsd-scsi@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: SCSI subsystem
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 12 Apr 2012 20:16:44 -0000

I have a firmware update in hand for the drives. I am going to update my drives and see if I can still reproduce this.

John

On Apr 12, 2012, at 5:26 AM, Desai, Kashyap wrote:

> We never see this issue on our test machines.
> Adding Sreekanth and he will plan to reproduce this issue locally to have further analysis on issue.
>
> Please help Sreekanth to reproduce it locally.
>
>
> ~ Kashyap
>
>> -----Original Message-----
>> From: owner-freebsd-scsi@freebsd.org [mailto:owner-freebsd-scsi@freebsd.org] On Behalf Of John Hickey
>> Sent: Wednesday, April 11, 2012 1:06 PM
>> To: freebsd-scsi@freebsd.org
>> Subject: Re: Write Timeouts with MPS
>>
>> I pretty much did this and filed a ticket with Seagate this afternoon.
>> They told me the latest firmware is 0006 (I am at 0001) and wanted
>> the serial numbers of the other drives in the array (probably to
>> confirm firmware compatibility). I suspect I'll have the update in
>> hand tomorrow and see how that works. Running FreeBSD didn't seem to
>> be an issue to them aside from concern about reading the serial numbers
>> without seatools. Only issue with that was that I initially gave them
>> the whole inquiry serial string, but only the first 8 (X) characters of
>> inquiry are the serial number:
>>
>> $ sudo camcontrol inquiry da3
>> pass3: Fixed Direct Access SCSI-6 device
>> pass3: Serial Number XXXXXXXX0000YYYYYYYY
>> pass3: 600.000MB/s transfers, Command Queueing Enabled
>>
>> John
>>
>> On Wed, Apr 11, 2012 at 07:35:09AM +0200, Peter Maloney wrote:
>>> Well, when I emailed some Seagate people, they just told me to use
>>> supported ones. So I suggest you email them about it, telling them it is
>>> on the compatibility list, and asking for an explanation and fix (eg.
>>> firmware bug fix). You could also say it is fairly common on seagate
>>> (and Samsung) disks, and very uncommon with other brands.
>>>
>>> Peter
>>>
>>> On 11.04.2012 00:26, John Hickey wrote:
>>>> I have 19 drives in my array, so changing them isn't that easy. ;-) They are Seagate Constellation ES 2TB SAS drives (SEAGATE ST2000NM0001 0001) and according to LSI documents my whole setup should be supported. The drives at least aren't being marked as failed.
>>>> I believe a change was made a while back to make FreeBSD less sensitive to these sorts of timeouts. I have had a panic or two on the system, but haven't tracked down the exact cause yet.
>>>>
>>>> John
>>>>
>>>> On Apr 10, 2012, at 12:35 PM, Peter Maloney wrote:
>>>>
>>>>> I found this only happens with specific disks / disk firmware... but
>>>>> nobody seems to listen to me about it. They all seem to blame the
>>>>> driver. (I blame both, but changing disks is a simple fix.)
>>>>>
>>>>> And looking around, most reports are with various Seagates (including
>>>>> one that can cause this type of error with smartctl -a with a SAS
>>>>> Seagate, but cannot reproduce with the binary LSI driver) or Samsung
>>>>> Spinpoints. The only other disk I know of that does this is a Crucial
>>>>> SSD with old firmware. One guy said he can do a camcontrol rescan to get
>>>>> it back; I tried that and get either panics, hangs, or nothing.
>>>>>
>>>>> What HBA are you using? With my LSI 9211-8i HBAs, the new 3TB Seagate
>>>>> greens don't seem to have this problem. I have no idea if different
>>>>> disks behave differently with different controllers. I asked Seagate
>>>>> about it and they reply with marketing nonsense about buying enterprise
>>>>> disks instead, and say I should buy disks that are on the specific
>>>>> compatibility list for the HBA.
>>>>>
>>>>> I found that with the few disks that I have that fail randomly (and
>>>>> others), I can reproduce the issue (not exact same symptoms though) by
>>>>> hot pulling the disk while writing something, putting it back, wait a
>>>>> few seconds (<10; less than enough for the SCSI controller to rescan)
>>>>> pull and replace again. The old 2TB seagate greens fail this test, but
>>>>> the 3TB ones pass. All 2 and 3 TB Hitachis I tried pass this test, as
>>>>> well as 3TB WD greens.
>>>>> (all enterprise disks I tried pass this test
>>>>> except the Toshiba 2TB ones I tried)
>>>>>
>>>>> If I put a "failed" disk back in, it does not work. If I put it in a
>>>>> different slot, same. But if I put any other disk in, it works fine. So
>>>>> it is the disk, but it is also FreeBSD not being able to reset/rescan
>>>>> it. But it is simple enough to blame both, and since you can't get rid
>>>>> of the driver, get different disks (eg. swap them with some different
>>>>> same sized ones in a different machine).
>>>>>
>>>>> Here is my forum thread about it, including disk product ids for ones I
>>>>> tested, and a huge list of things that don't fix it.
>>>>> http://forums.freebsd.org/showthread.php?t=28252
>>>>>
>>>>> Peter
>>>>>
>>>>>
>>>>> On 10.04.2012 03:52, John Hickey wrote:
>>>>>> I've seen people having this problem before, but I don't think anyone
>>>>>> has figured it out. I am running:
>>>>>>
>>>>>> FreeBSD zfs 10.0-CURRENT FreeBSD 10.0-CURRENT #5: Sat Apr 7 18:05:57 PDT 2012 root@zfs:/usr/obj/usr/src/sys/GENERIC amd64
>>>>>>
>>>>>> I have the latest LSI IT firmware 13 loaded:
>>>>>>
>>>>>> mps1: port 0xc000-0xc0ff mem 0xfe93c000-0xfe93ffff,0xfe940000-0xfe97ffff irq 16 at device 0.0 on pci5
>>>>>> mps1: Firmware: 13.00.01.00, Driver: 13.00.00.00-fbsd
>>>>>> mps1: IOCCapabilities: 1285c> c>
>>>>>>
>>>>>> All disks are on a SuperMicro SAS II backplane:
>>>>>>
>>>>>> root@zfs:/usr/ports/sysutils/dmidecode# camcontrol devlist
>>>>>> at scbus0 target 0 lun 0 (da0,pass0)
>>>>>> at scbus0 target 1 lun 0 (da1,pass1)
>>>>>> at scbus1 target 8 lun 0 (da2,pass2)
>>>>>> .... x16 more of the same
>>>>>> at scbus1 target 46 lun 0 (da20,pass20)
>>>>>> at scbus1 target 47 lun 0 (ses0,pass21)
>>>>>>
>>>>>> Essentially when putting the ZFS filesystem under load, I am getting
>>>>>> these sorts of errors:
>>>>>>
>>>>>> (da13:mps1:0:21:0): WRITE(10). CDB: 2a 0 19 29 32 f2 0 1 0 0 length 131072 SMID 213 terminated ioc 804b scsi 0 state c xfer 0
>>>>>> (da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 3d fa ae 0 1 0 0 length 131072 SMID 386 terminated ioc 804b scsi 0 state c xfer 0
>>>>>> (da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 a 24 ee 0 1 0 0 length 131072 SMID 542 terminated ioc 804b scsi 0 state c xfer 0
>>>>>> (da14:mps1:0:22:0): WRITE(10). CDB: 2a 0 19 2a c6 b1 0 1 0 0 length 131072 SMID 214 terminated ioc 804b scsi 0 state c xfer 0
>>>>>> (da16:mps1:0:25:0): WRITE(10). CDB: 2a 0 19 2b 83 aa 0 1 0 0 length 131072 SMID 879 terminated ioc 804b scsi 0 state c xfer 0
>>>>>> (da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 40 d f9 0 1 0 0 length 131072 SMID 474 terminated ioc 804b scsi 0 state c xfer 0
>>>>>> (da9:mps1:0:15:0): WRITE(10). CDB: 2a 0 18 c 3 31 0 1 0 0 length 131072 SMID 578 terminated ioc 804b scsi 0 state c xfer 0
>>>>>> (da4:mps1:0:10:0): WRITE(10). CDB: 2a 0 19 41 6f ff 0 1 0 0 length 131072 SMID 703 terminated ioc 804b scsi 0 state c xfer 0
>>>>>> (da12:mps1:0:19:0): WRITE(10). CDB: 2a 0 18 c e5 2e 0 1 0 0 length 131072 SMID 684 terminated ioc 804b scsi 0 state c xfer 0
>>>>>> (da3:mps1:0:9:0): WRITE(10). CDB: 2a 0 19 41 b1 4b 0 1 0 0 length 131072 SMID 212 terminated ioc 804b scsi 0 state c xfer 0
>>>>>> (da9:mps1:0:15:0): WRITE(10). CDB: 2a 0 18 d 1e 5c 0 1 0 0 length 131072 SMID 63 terminated ioc 804b scsi 0 state c xfer 0
>>>>>> (da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 d 56 1c 0 1 0 0 length 131072 SMID 412 terminated ioc 804b scsi 0 state c xfer 0
>>>>>> (da4:mps1:0:10:0): WRITE(10). CDB: 2a 0 19 42 2c f1 0 1 0 0 length 131072 SMID 1019 terminated ioc 804b scsi 0 state c xfer 0
>>>>>> (da11:mps1:0:18:0): WRITE(10). CDB: 2a 0 18 d 6d 22 0 1 0 0 length 131072 SMID 175 terminated ioc 804b scsi 0 state c xfer 0
>>>>>> (da7:mps1:0:13:0): WRITE(10). CDB: 2a 0 19 42 62 bc 0 1 0 0 length 131072 SMID 458 terminated ioc 804b scsi 0 state c xfer 0
>>>>>> (da10:mps1:0:16:0): WRITE(10). CDB: 2a 0 18 f 4b d2 0 1 0 0 length 131072 SMID 986 terminated ioc 804b scsi 0 state c xfer 0
>>>>>> (da3:mps1:0:9:0): WRITE(10). CDB: 2a 0 19 43 f4 50 0 1 0 0 length 131072 SMID 809 terminated ioc 804b scsi 0 state c xfer 0
>>>>>> (da2:mps1:0:8:0): WRITE(10). CDB: 2a 0 19 45 4 18 0 1 0 0 length 131072 SMID 998 terminated ioc 804b scsi 0 state c xfer 0
>>>>>> (da13:mps1:0:21:0): WRITE(10). CDB: 2a 0 19 30 e4 73 0 1 0 0 length 131072 SMID 489 terminated ioc 804b scsi 0 state c xfer 0
>>>>>> (da12:mps1:0:19:0): WRITE(10). CDB: 2a 0 18 10 8d 19 0 1 0 0 length 131072 SMID 275 terminated ioc 804b scsi 0 state c xfer 0
>>>>>> (da14:mps1:0:22:0): WRITE(10). CDB: 2a 0 19 32 e7 0 0 1 0 0 length 131072 SMID 666 terminated ioc 804b scsi 0 state c xfer 0
>>>>>> (da8:mps1:0:14:0): WRITE(10). CDB: 2a 0 18 13 2b 68 0 1 0 0 length 131072 SMID 463 terminated ioc 804b scsi 0 state c xfer 0
>> _______________________________________________
>> freebsd-scsi@freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
>> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"
>
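
For anyone reading the quoted mps error lines, here is a minimal sketch of how one of them can be decoded. It assumes the standard WRITE(10) CDB layout (bytes 2-5 are the big-endian LBA, bytes 7-8 the big-endian transfer length in blocks) and 512-byte logical blocks. The function name decode_mps_line and the regular expression are hypothetical and not part of any FreeBSD tool. The IOCStatus constants are quoted from memory of the MPI 2.0 headers and should be checked against the driver source before relying on them.

```python
import re

# Assumed values, recalled from the MPI 2.0 header (mpi2.h); verify locally.
IOCSTATUS_FLAG_LOG_INFO_AVAILABLE = 0x8000
IOCSTATUS_MASK = 0x7FFF
IOCSTATUS_SCSI_IOC_TERMINATED = 0x004B

# Hypothetical pattern matching the log lines quoted in this thread.
LINE_RE = re.compile(
    r"\((?P<dev>\w+):mps\d+:\d+:(?P<target>\d+):\d+\): WRITE\(10\)\. "
    r"CDB: (?P<cdb>(?:[0-9a-f]{1,2} ){9}[0-9a-f]{1,2}) "
    r"length (?P<length>\d+) SMID (?P<smid>\d+) terminated "
    r"ioc (?P<ioc>[0-9a-f]+) scsi (?P<scsi>[0-9a-f]+) "
    r"state (?P<state>[0-9a-f]+) xfer (?P<xfer>\d+)"
)


def decode_mps_line(line, block_size=512):
    """Decode one mps WRITE(10) error line; return None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    # The driver prints each CDB byte as unpadded hex, e.g. "a" for 0x0a.
    cdb = [int(b, 16) for b in m["cdb"].split()]
    lba = int.from_bytes(bytes(cdb[2:6]), "big")     # bytes 2-5: LBA
    blocks = int.from_bytes(bytes(cdb[7:9]), "big")  # bytes 7-8: block count
    ioc = int(m["ioc"], 16)
    return {
        "device": m["dev"],
        "target": int(m["target"]),
        "smid": int(m["smid"]),
        "lba": lba,
        "blocks": blocks,
        "bytes": blocks * block_size,
        "logged_length": int(m["length"]),
        "ioc_status": ioc & IOCSTATUS_MASK,
        "log_info_available": bool(ioc & IOCSTATUS_FLAG_LOG_INFO_AVAILABLE),
    }


if __name__ == "__main__":
    sample = ("(da13:mps1:0:21:0): WRITE(10). CDB: 2a 0 19 29 32 f2 0 1 0 0 "
              "length 131072 SMID 213 terminated ioc 804b scsi 0 state c xfer 0")
    for key, value in decode_mps_line(sample).items():
        print(key, value)
```

With 512-byte blocks, the transfer length decoded from the CDB (0x0100 blocks) agrees with the "length 131072" field in the log, which is a quick sanity check that the line was parsed correctly.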