From: "O. Hartmann"
To: Steven Hartland
Cc: FreeBSD Current, d@delphij.net
Subject: Re: CURRENT r250636: ZFS pool destroyed while scrubbing in action and shutdown
Date: Thu, 16 May 2013 21:38:02 +0200
Message-ID: <1368733082.4643.36.camel@thor.walstatt.dyndns.org>
References: <1368638448.1549.5.camel@thor.walstatt.dyndns.org> <5193C844.2050404@delphij.net>
List-Id: Discussions about the use of FreeBSD-current

On Thu, 2013-05-16 at 19:42 +0100, Steven Hartland wrote:
> ----- Original Message -----
> From: "Xin Li"
>
> > On 05/15/13 10:20, O. Hartmann wrote:
> >> Several machines running FreeBSD 10.0-CURRENT #0 r250636: Tue May
> >> 14 21:13:19 CEST 2013 amd64 were scrubbing the pools over the past
> >> two days. Since that takes a while, I was sure I could shut down the
> >> boxes and scrubbing would restart automatically on the next boot.
> >>
> >> Not this time! On ALL(!) systems (three) the pools remain
> >> destroyed/corrupted, showing this message(s) (as a representative, I
> >> will present only one):
>
> Can you confirm the HW you're running there?
>
> If you're using CAM-backed disks, can you let me know what you're seeing for
> 1. sysctl kern.cam.da | grep delete_method
> 2. sysctl vfs.zfs.trim
>
> The reason I ask is I'm investigating an issue with ZFS TRIM, reported
> by Ajit Jain, and tests that have just completed potentially indicate
> an issue with either CAM or LSI's firmware when processing Write Same
> requests.
> Such requests may be used by ZFS TRIM, depending on the
> underlying HW.
>
> Regards
> Steve

Hello Steven.

Below is, hopefully, the requested information; if you need more, please
ask.

The scenario on all boxes was the same: scrubbing hadn't finished the day
before, so the boxes were shut down overnight. In the morning I started
them up for a couple of minutes before I left for work and shut them down
again, and at that point the "crash" happened.

At work (no access at the moment), the third box is an LGA2011 system
based on the X79 chipset (ASUS P9X79 WS). One pool, a ZFS JBOD, finished
scrubbing before I shut down and rebooted the box but, as on the
above-mentioned Core2Duo box, there is a single-disk 3 TB ZFS BACKUP pool,
and it showed the same symptoms.

Since no activity took place on those pools during the short period of
uptime, there seems to be no harm done to the pools so far.

Hardware:

Box a)
(This box uses the single-disk pool.)

Disk in question: scbus6 target 0 lun 0 (pass3,ada3)

root@thor:/usr/src # camcontrol devlist
at scbus3 target 0 lun 0 (pass0,ada0)
at scbus4 target 0 lun 0 (pass1,ada1)
at scbus5 target 0 lun 0 (pass2,ada2)
at scbus6 target 0 lun 0 (pass3,ada3)
at scbus7 target 0 lun 0 (pass4,ada4)
at scbus8 target 0 lun 0 (pass5,cd0)
at scbus9 target 0 lun 0 (pass6,ses0)
at scbus11 target 0 lun 0 (da0,pass7)

Core2Duo, SATA chipset is ICH10:

root@thor:/usr/src # dmesg | grep ahci
ahci0: at channel -1 on atapci0
ahci0: AHCI v1.00 with 2 3Gbps ports, Port Multiplier supported
ahcich0: at channel 0 on ahci0
ahcich1: at channel 1 on ahci0
ahci1: port 0xac00-0xac07,0xa880-0xa883,0xa800-0xa807,0xa480-0xa483,0xa400-0xa41f mem 0xfbffe800-0xfbffefff irq 19 at device 31.2 on pci0
ahci1: AHCI v1.20 with 6 3Gbps ports, Port Multiplier supported
ahcich2: at channel 0 on ahci1
ahcich3: at channel 1 on ahci1
ahcich4: at channel 2 on ahci1
ahcich5: at channel 3 on ahci1
ahcich6: at channel 4 on ahci1
ahcich7: at channel 5 on ahci1
ahciem0: on ahci1
ses0 at ahciem0 bus 0 scbus9 target 0 lun 0
ada0 at ahcich2 bus 0 scbus3 target 0 lun 0
ada1 at ahcich3 bus 0 scbus4 target 0 lun 0
ada2 at ahcich4 bus 0 scbus5 target 0 lun 0
ada3 at ahcich5 bus 0 scbus6 target 0 lun 0
ada4 at ahcich6 bus 0 scbus7 target 0 lun 0
cd0 at ahcich7 bus 0 scbus8 target 0 lun 0

root@thor:/usr/src # sysctl kern.cam.da | grep delete_method
kern.cam.da.0.delete_method: NONE

root@thor:/usr/src # sysctl vfs.zfs.trim
vfs.zfs.trim.enabled: 1
vfs.zfs.trim.txg_delay: 32
vfs.zfs.trim.timeout: 30
vfs.zfs.trim.max_interval: 1

////////////////////////////////////////////

Box b)
(Box with the RAIDZ-1 pool as reported in the initial message. This
pool/machine has a log disk (Samsung SSD 830, 64GB) for ZFS, which
obviously doesn't matter, since the other boxes don't have such a thing.)

Disks in question: scbus4 + scbus5 + scbus6

root@gate [src] camcontrol devlist
at scbus0 target 0 lun 0 (ada0,pass0)
at scbus2 target 0 lun 0 (ada1,pass1)
at scbus4 target 0 lun 0 (ada2,pass2)
at scbus5 target 0 lun 0 (ada3,pass3)
at scbus6 target 0 lun 0 (ada4,pass4)
at scbus7 target 0 lun 0 (pass5,cd0)
at scbus8 target 0 lun 0 (pass6,ses0)

CPU i3-3220, chipset Intel Z77 SATA

root@gate [src] dmesg | grep ahci
ahci0: port 0xc050-0xc057,0xc040-0xc043,0xc030-0xc037,0xc020-0xc023,0xc000-0xc01f mem 0xf7c00000-0xf7c001ff irq 19 at device 0.0 on pci4
ahci0: AHCI v1.20 with 2 6Gbps ports, Port Multiplier supported
ahcich0: at channel 0 on ahci0
ahcich1: at channel 1 on ahci0
ahci1: port 0xf0b0-0xf0b7,0xf0a0-0xf0a3,0xf090-0xf097,0xf080-0xf083,0xf060-0xf07f mem 0xf7f16000-0xf7f167ff irq 19 at device 31.2 on pci0
ahci1: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
ahcich2: at channel 0 on ahci1
ahcich3: at channel 1 on ahci1
ahcich4: at channel 2 on ahci1
ahcich5: at channel 3 on ahci1
ahcich6: at channel 4 on ahci1
ahcich7: at channel 5 on ahci1
ahciem0: on ahci1
ses0 at ahciem0 bus 0 scbus8 target 0 lun 0
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada1 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2 at ahcich4 bus 0 scbus4 target 0 lun 0
cd0 at ahcich7 bus 0 scbus7 target 0 lun 0
ada3 at ahcich5 bus 0 scbus5 target 0 lun 0
ada4 at ahcich6 bus 0 scbus6 target 0 lun 0

root@gate [src] sysctl kern.cam.da | grep delete_method
root@gate [src]

root@gate [src] sysctl vfs.zfs.trim
vfs.zfs.trim.enabled: 1
vfs.zfs.trim.txg_delay: 32
vfs.zfs.trim.timeout: 30
vfs.zfs.trim.max_interval: 1
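
For anyone else asked to report the same two items, the sysctl queries used in this message can be condensed into a single summary. The following is only a minimal sketch: summarize_trim is a hypothetical helper name, not part of FreeBSD, and it assumes the plain "name: value" output format of sysctl(8) shown above. It reads that output on stdin and also says so when no kern.cam.da.N.delete_method entry appears at all, as was the case on box b.

```shell
#!/bin/sh
# Sketch only: summarize_trim is a hypothetical helper, not a FreeBSD tool.
# It reads "name: value" lines, as printed by sysctl(8), on stdin and
# prints the CAM delete method of each da device plus the ZFS TRIM switch.
summarize_trim() {
    awk -F': ' '
        # One line per da device, e.g. kern.cam.da.0.delete_method: NONE
        /^kern\.cam\.da\.[0-9]+\.delete_method:/ {
            print $1 " = " $2
            found = 1
        }
        # Global ZFS TRIM enable flag
        /^vfs\.zfs\.trim\.enabled:/ {
            print "zfs_trim_enabled = " $2
        }
        # Make an empty grep result explicit instead of silent
        END {
            if (!found)
                print "delete_method: no kern.cam.da.N.delete_method entries seen"
        }
    '
}

# Assumed usage on the affected machine:
#   { sysctl kern.cam.da; sysctl vfs.zfs.trim; } | summarize_trim
```

Fed the box a output from this message, the filter would report the NONE delete method for da0 together with the enabled TRIM flag. The explicit "no entries seen" line is a design choice so that an empty result is not mistaken for a failed command.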