From owner-freebsd-fs@FreeBSD.ORG Thu Jan 26 11:27:10 2012
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 401E0106566C for ; Thu, 26 Jan 2012 11:27:10 +0000 (UTC) (envelope-from freebsd@pki2.com)
Received: from btw.pki2.com (btw.pki2.com [IPv6:2001:470:a:6fd::2]) by mx1.freebsd.org (Postfix) with ESMTP id EF3AE8FC12 for ; Thu, 26 Jan 2012 11:27:09 +0000 (UTC)
Received: from [127.0.0.1] (localhost [127.0.0.1]) by btw.pki2.com (8.14.5/8.14.5) with ESMTP id q0QBQxbi035582; Thu, 26 Jan 2012 03:27:00 -0800 (PST) (envelope-from freebsd@pki2.com)
From: Dennis Glatting
To: Peter Maloney
In-Reply-To: <4F211FC7.3080709@brockmann-consult.de>
References: <4F192ADA.5020903@brockmann-consult.de> <1327069331.29444.4.camel@btw.pki2.com> <4F197F8D.7010404@brockmann-consult.de> <4F211FC7.3080709@brockmann-consult.de>
Content-Type: text/plain; charset="ISO-8859-1"
Date: Thu, 26 Jan 2012 03:26:59 -0800
Message-ID: <1327577219.19717.13.camel@btw.pki2.com>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port
Content-Transfer-Encoding: 7bit
X-yoursite-MailScanner-Information: Dennis Glatting
X-yoursite-MailScanner-ID: q0QBQxbi035582
X-yoursite-MailScanner: Found to be clean
X-MailScanner-From: freebsd@pki2.com
Cc: freebsd-fs@freebsd.org
Subject: Re: sanity check: is 9211-8i, on 8.3, with IT firmware still "the one"
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 26 Jan 2012 11:27:10 -0000

On Thu, 2012-01-26 at 10:41 +0100, Peter Maloney wrote:
> On 01/20/2012 03:51 PM, Peter Maloney wrote:
> > On 01/20/2012 03:22 PM, Dennis Glatting wrote:
> >> I am having a problem with Seagate ST1000DL002 disks but I haven't yet
> >> determined whether it is the disks themselves (they -- two of them, new
> >> -- fail under a MB controller too).
> > I happen to have some ST2000DL003 disks on hand (same as yours, but 2TB
> > instead of 1, and I don't know what firmware)... I could try my hot pull
> > test with them to see what happens.
> Update: I tested it, and it fails much like the Crucial SSD with old
> firmware, except:
>
> -with the SSD, I could still use smartctl to see the disk afterwards,
> but not with the Seagate Green. (I didn't verify this with the SSD, but
> I think it has a /dev/da# device, but the Seagate does not)
> -the Seagate Green never comes back at all, whereas the SSD is
> reported as coming back but has an error "daasync: Unable to attach to
> new device due to status 0x6" which makes the disk unusable until reboot
>
> So in the distant future, I will test the newest firmware (currently using
> firmware CC45 I think, and yours is CC32), then send some email to
> Seagate about it. And in the near future, I will not be using those
> disks in ZFS.
>

Awesomeness dude. Thanks for the data.

> Your disk is firmware CC32 I would assume:
>
> da12: Fixed Direct Access SCSI-6 device
>
>
>
> Seagate Green
>
> (insert device)
> Jan 26 09:52:28 bcnas1bak kernel: mpssas_get_sas_address_for_sata_disk: got SATA identify successfully for handle = 0x21 with try_count = 1
> Jan 26 09:52:28 bcnas1bak kernel: SAS Address for SATA device = 1f605d2f7e735344
> Jan 26 09:52:28 bcnas1bak kernel: mpssas_get_sas_address_for_sata_disk: got SATA identify successfully for handle = 0x21 with try_count = 1
> Jan 26 09:52:28 bcnas1bak kernel: da20 at mpslsi0 bus 0 scbus0 target 55 lun 0
> Jan 26 09:52:28 bcnas1bak kernel: da20: Fixed Direct Access SCSI-6 device
> Jan 26 09:52:28 bcnas1bak kernel: da20: 600.000MB/s transfers
> Jan 26 09:52:28 bcnas1bak kernel: da20: Command Queueing enabled
> Jan 26 09:52:28 bcnas1bak kernel: da20: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
> (insert another device, make a mirror vdev)
> (pull device while writing to it)
> Jan 26 09:53:53 bcnas1bak kernel: mpslsi0: mpssas_alloc_tm freezing simq
> Jan 26 09:53:53 bcnas1bak kernel: mpslsi0: mpssas_lost_target targetid 55
> Jan 26 09:53:53 bcnas1bak kernel: (da20:mpslsi0:0:55:0): lost device
> Jan 26 09:53:54 bcnas1bak kernel: (da20:mpslsi0:0:55:0): WRITE(10). CDB: 2a 0 0 d 9c 8b 0 1 0 0 length 131072 SMID 232 terminated ioc 804b scsi 0 state c xfer 0
> Jan 26 09:53:54 bcnas1bak kernel: (da20:mpslsi0:0:55:0): WRITE(10). CDB: 2a 0 0 d a2 8b 0 1 0 0 length 131072 SMID 856 terminated ioc 804b scsi 0 state c xfer 0
> Jan 26 09:53:54 bcnas1bak kernel: (da20:mpslsi0:0:55:0): WRITE(10). CDB: 2a 0 0 d 9b 8b 0 1 0 0 length 131072 SMID 813 terminated ioc 804b scsi 0 state c xfer 0
> Jan 26 09:53:54 bcnas1bak kernel: (da20:mpslsi0:0:55:0): WRITE(10). CDB: 2a 0 0 d a1 8b 0 1 0 0 length 131072 SMID 626 terminated ioc 804b scsi 0 state c xfer 0
> Jan 26 09:53:54 bcnas1bak kernel: (da20:mpslsi0:0:55:0): WRITE(10). CDB: 2a 0 0 d a0 8b 0 1 0 0 length 131072 SMID 141 terminated ioc 804b scsi 0 state c xfer 0
> Jan 26 09:53:54 bcnas1bak kernel: (da20:mpslsi0:0:55:0): WRITE(10). CDB: 2a 0 0 d 9f 8b 0 1 0 0 length 131072 SMID 250 terminated ioc 804b scsi 0 state c xfer 0
> Jan 26 09:53:54 bcnas1bak kernel: (da20:mpslsi0:0:55:0): WRITE(10). CDB: 2a 0 0 d 9e 8b 0 1 0 0 length 131072 SMID 734 terminated ioc 804b scsi 0 state c xfer 0
> Jan 26 09:53:54 bcnas1bak kernel: (da20:mpslsi0:0:55:0): WRITE(10). CDB: 2a 0 0 d 9d 8b 0 1 0 0 length 131072 SMID 531 terminated ioc 804b scsi 0 state c xfer 0
> Jan 26 09:53:54 bcnas1bak kernel: (da20:mpslsi0:0:55:0): WRITE(10). CDB: 2a 0 0 d a3 8b 0 1 0 0 length 131072 SMID 260 terminated ioc 804b scsi 0 state c xfer 0
> Jan 26 09:53:54 bcnas1bak kernel: (da20:mpslsi0:0:55:0): WRITE(10). CDB: 2a 0 0 d 9a 8b 0 1 0 0 length 131072 SMID 503 terminated ioc 804b scsi 0 state c xfer 0
> Jan 26 09:53:54 bcnas1bak kernel: mpslsi0: IOCStatus = 0x4b while resetting device 0x21
> Jan 26 09:53:54 bcnas1bak kernel: mpslsi0: mpssas_free_tm releasing simq
> Jan 26 09:53:54 bcnas1bak kernel: (da20:mpslsi0:0:55:0): Synchronize cache failed, status == 0xa, scsi status == 0x0
> Jan 26 09:53:54 bcnas1bak kernel: (da20:mpslsi0:0:55:0): removing device entry
> (put device back in)
> (no further logs)
>
> And then I tried:
> camcontrol reset 0:55:0 (or 0:0:55? forget where the 0 goes) and it said
> there was no device.
> camcontrol reset 0:54:0 (this is the other disk of the same type that I
> had in the same test mirror vdev), and the kernel panicked, and this
> appeared in /var/log/messages:
> Jan 26 09:57:14 bcnas1bak kernel: mpslsi0: mpssas_action XPT_RESET_DEV
>
>
>
> Crucial SSD with firmware 0001
>
> (pull device while writing to it)
> Jan 19 14:37:16 bcnas1bak kernel: (da20:mpslsi0:0:46:0): CAM status: SCSI Status Error
> Jan 19 14:37:16 bcnas1bak kernel: (da20:mpslsi0:0:46:0): SCSI status: Check Condition
> Jan 19 14:37:16 bcnas1bak kernel: (da20:mpslsi0:0:46:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> Jan 19 14:37:16 bcnas1bak kernel: (da20:mpslsi0:0:46:0): READ(10). CDB: 28 0 0 ce f2 19 0 0 ff 0 length 130560 SMID 292 terminated ioc 804b scsi 0 state 0 xfer 0
> Jan 19 14:37:16 bcnas1bak kernel: (da20:mpslsi0:0:46:0): READ(10). CDB: 28 0 0 ce f4 d3 0 0 9e 0 length 80896 SMID 426 terminated ioc 804b scsi 0 state 0 xfer 0
> Jan 19 14:37:16 bcnas1bak kernel: (da20:mpslsi0:0:46:0): READ(10). CDB: 28 0 0 ce f5 71 0 0 cf 0 length 105984 SMID 978 terminated ioc 804b scsi 0 state 0 xfer 0
> Jan 19 14:37:16 bcnas1bak kernel: (da20:mpslsi0:0:46:0): READ(10). CDB: 28 0 0 ce f6 40 0 0 b2 0 length 91136 SMID 695 terminated ioc 804b scsi 0 state 0 xfer 0
> Jan 19 14:37:16 bcnas1bak kernel: (da20:mpslsi0:0:46:0): READ(10). CDB: 28 0 0 ce f6 f2 0 0 9f 0 length 81408 SMID 792 terminated ioc 804b scsi 0 state 0 xfer 0
> Jan 19 14:37:16 bcnas1bak kernel: (da20:mpslsi0:0:46:0): READ(10). CDB: 28 0 0 ce f3 df 0 0 f4 0 length 124928 SMID 615 terminated ioc 804b scsi 0 state 0 xfer 0
> Jan 19 14:37:16 bcnas1bak kernel: (da20:mpslsi0:0:46:0): READ(10). CDB: 28 0 0 ce f3 18 0 0 c7 0 length 101888 SMID 645 terminated ioc 804b scsi 0 state 0 xfer 0
> Jan 19 14:37:16 bcnas1bak kernel: (da20:mpslsi0:0:46:0): READ(10). CDB: 28 0 0 c2 83 ec 0 0 8 0 length 4096 SMID 163 terminated ioc 804b scsi 0 state 0 xfer 0
> Jan 19 14:37:16 bcnas1bak kernel: (da20:mpslsi0:0:46:0): READ(10). CDB: 28 0 0 ce f8 61 0 0 b3 0 length 91648 SMID 222 terminated ioc 804b scsi 0 state 0 xfer 0
> Jan 19 14:37:16 bcnas1bak kernel: (da20:mpslsi0:0:46:0): READ(10). CDB: 28 0 0 ce f9 14 0 0 ed 0 length 121344 SMID 651 terminated ioc 804b scsi 0 state 0 xfer 0
> Jan 19 14:37:16 bcnas1bak kernel: (da20:mpslsi0:0:46:0): READ(10). CDB: 28 0 0 ce f1 91 0 0 1c 0
> Jan 19 14:37:16 bcnas1bak kernel: (da20:mpslsi0:0:46:0): CAM status: SCSI Status Error
> Jan 19 14:37:16 bcnas1bak kernel: (da20:mpslsi0:0:46:0): SCSI status: Check Condition
> Jan 19 14:37:16 bcnas1bak kernel: (da20:mpslsi0:0:46:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> Jan 19 14:40:05 bcnas1bak kernel: (da20:mpslsi0:0:46:0): lost device
> Jan 19 14:40:05 bcnas1bak kernel: mpslsi0: Reset aborted 21 commands
> Jan 19 14:40:05 bcnas1bak kernel: mpslsi0: clearing target 46 handle 0x0024
> Jan 19 14:40:05 bcnas1bak kernel: mpslsi0: mpssas_remove_complete on handle 0x0024, IOCStatus= 0x0
> Jan 19 14:40:05 bcnas1bak kernel: mpslsi0: mpssas_free_tm releasing simq
> Jan 19 14:40:05 bcnas1bak kernel: (da20:mpslsi0:0:46:0): Synchronize cache failed, status == 0x39, scsi status == 0x0
> Jan 19 14:40:05 bcnas1bak kernel: (da20:mpslsi0:0:46:0): removing device entry
> (put device back in)
> Jan 19 14:41:32 bcnas1bak kernel: mpssas_get_sas_address_for_sata_disk: got SATA identify successfully for handle = 0x24 with try_count = 1
> Jan 19 14:41:32 bcnas1bak kernel: SAS Address for SATA device = d828161ba16c7889
> Jan 19 14:41:33 bcnas1bak kernel: mpssas_get_sas_address_for_sata_disk: got SATA identify successfully for handle = 0x24 with try_count = 1
> Jan 19 14:41:33 bcnas1bak kernel: da20 at mpslsi0 bus 0 scbus0 target 46 lun 0
> Jan 19 14:41:33 bcnas1bak kernel: da20: Fixed Direct Access SCSI-6 device
> Jan 19 14:41:33 bcnas1bak kernel: da20: 600.000MB/s transfers
> Jan 19 14:41:33 bcnas1bak kernel: da20: Command Queueing enabled
> Jan 19 14:41:33 bcnas1bak kernel: da20: 244198MB (500118192 512 byte sectors: 255H 63S/T 31130C)
> Jan 19 14:41:42 bcnas1bak kernel: pid 19175 (gpart), uid 0: exited on signal 11 (core dumped)
> Jan 19 14:42:30 bcnas1bak kernel: mpssas_get_sas_address_for_sata_disk: got SATA identify successfully for handle = 0x18 with try_count = 1
> Jan 19 14:42:30 bcnas1bak kernel: SAS Address for SATA device = d828161ba16c748a
> Jan 19 14:42:30 bcnas1bak kernel: mpssas_get_sas_address_for_sata_disk: got SATA identify successfully for handle = 0x18 with try_count = 1
> Jan 19 14:42:31 bcnas1bak kernel: cam_periph_alloc: attempt to re-allocate valid device da10 rejected
> *Jan 19 14:42:31 bcnas1bak kernel: daasync: Unable to attach to new device due to status 0x6*
> (no further logs)
>
> > What sort of failure is happening?
> >
> > Do you use a ZIL on a device other than an ST1000DL002?
> >
> > Please send output of
> > smartctl -i
> >
> > (particularly interested in firmware version)
> >
> >
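
For anyone reading the quoted log excerpts: the "CDB:" bytes on the READ(10)/WRITE(10) lines can be decoded to see which blocks were in flight when the disk was pulled. Below is a minimal sketch, not something from the thread. It assumes the standard 10-byte READ(10)/WRITE(10) layout (opcode, flags, 4-byte big-endian LBA, group, 2-byte big-endian block count, control) and the 512-byte sectors reported in the da20 probe lines; the function name decode_cdb10 is made up for illustration.

```python
import struct

# Opcodes seen in the quoted logs; other opcodes are reported as unknown.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_cdb10(cdb_text, block_size=512):
    """Decode a 10-byte CDB printed as space-separated hex bytes.

    block_size=512 is an assumption taken from the "512 byte sectors"
    text in the probe messages.
    """
    raw = bytes(int(tok, 16) for tok in cdb_text.split())
    if len(raw) != 10:
        raise ValueError("expected 10 CDB bytes, got %d" % len(raw))
    # opcode, flags, LBA (4 bytes), group, transfer length (2 bytes), control
    opcode, _flags, lba, _group, blocks, _control = struct.unpack(">BBIBHB", raw)
    return {
        "op": OPCODES.get(opcode, "unknown(0x%02x)" % opcode),
        "lba": lba,
        "blocks": blocks,
        "bytes": blocks * block_size,
    }


if __name__ == "__main__":
    # First WRITE(10) from the Seagate log and first READ(10) from the SSD log.
    for cdb in ("2a 0 0 d 9c 8b 0 1 0 0", "28 0 0 ce f2 19 0 0 ff 0"):
        print(cdb, "->", decode_cdb10(cdb))
```

The decoded byte counts can be compared against the "length" field the driver prints on the same log line as a quick sanity check that a line was not mangled by mail wrapping.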