Date: Sun, 27 Oct 2013 13:40:39 +0100
From: "O. Hartmann" <ohartman@zedat.fu-berlin.de>
To: FreeBSD CURRENT <freebsd-current@freebsd.org>
Subject: ZFS buggy in CURRENT? Stuck in [zio->io_cv] forever!

I have set up a RAIDZ pool comprising four 3 TB HDDs. To maintain 4k block
alignment, I followed the instructions given on several sites, and I will
sketch them here for the protocol. The operating system is 11.0-CURRENT
and 10.0-BETA2.

Create a GPT scheme on each drive, add one partition covering the whole
disk, and put a 4k NOP overlay on top of it:

  gpart add -t freebsd-zfs -b 1M -l disk0[0-3] ada[3-6]
  gnop create -S 4096 gpt/disk0[0-3]

Because I added a disk to an existing RAIDZ, I first exported the former
ZFS pool, then deleted the partition on each disk and destroyed the GPT
scheme. The former pool had a ZIL and a CACHE residing on the same SSD, in
separate partitions; I did not destroy the partitions on that SSD. To get
4k alignment there as well, I created NOP overlays on the existing
gpt/log00 and gpt/cache00, too:

  gnop create -S 4096 gpt/log00
  gnop create -S 4096 gpt/cache00

Then I created a new pool via

  zpool create POOL raidz gpt/disk0[0-3].nop log gpt/log00.nop cache gpt/cache00.nop

Since the newly created pool did not show any signs of illness or
corruption, I received into it a snapshot that had earlier been taken and
sent to another storage array. After ~10 hours of receiving the backup, I
exported that pool together with the backup pool, destroyed the
corresponding .nop device entries via

  gnop destroy gpt/disk0[0-3].nop

(and the same for cache and log), and tried to check via

  zpool import

whether my pool (as well as the backup pool) would show up.
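Before I get to the problem, for the record: spelled out for the first
disk, the preparation was something like the following (the other three
drives are analogous; the bracket notation above is only my shorthand, not
literal shell syntax, and the exact "gpart create" invocation is from
memory):

  # create a fresh GPT scheme on the raw disk
  gpart create -s gpt ada3
  # one whole-disk partition, 1M-aligned, with a GPT label
  gpart add -t freebsd-zfs -b 1M -l disk00 ada3
  # 4k NOP overlay so that zpool create picks ashift=12
  gnop create -S 4096 gpt/disk00

Afterwards one can verify that the pool really came up with ashift=12 via
something like "zdb -C POOL | grep ashift".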
And here the nasty mess starts! The "zpool import" command issued on the
console has now been stuck for hours and cannot be interrupted via Ctrl-C.
No pool shows up! Hitting Ctrl-T shows a state like

  cmd: zpool 4317 [zio->io_cv]: 7345.34r 0.00 [...]

Looking with

  systat -vm 1

at the throughput of the CAM devices, I realise that two of the four
RAIDZ member drives show activity, with 7000 - 8000 tps and ~30 MB/s
bandwidth, while the other two show zero! And the pool is still inactive;
the console is stuck. Well, this made my day!

At this point I am trying to understand what is going wrong and to recall
what I did differently the last time, when the same procedure with three
disks on the same hardware worked for me. Now, after a 10-hour copy orgy
and in need of the working array, I am starting to believe that ZFS on
FreeBSD is still peppered with too many development-stage flaws that make
it risky. Colleagues of mine who work with ZFS on SOLARIS, and whom I
consulted, have never seen the stuck behaviour I am seeing at this moment.

I do not want to repeat the whole procedure. There must be a way to import
the pool. Even the backup pool, which works and was untouched by all of
this, should be importable, but it is not: while this wretched "zpool
import" command is still blocking its console, not willing to die even
with "killall -9 zpool", I cannot import the backup pool via "zpool import
BACKUP00" either. That console gets stuck immediately, and for eternity,
without any notice. Hitting Ctrl-T there says something like

  load: 3.59 cmd: zpool 46199 [spa_namespace_lock] 839.18r 0.00u 0.00s 0% 3036k

which means I cannot even import the backup facility, and that really is
no fun.
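In case anyone wants to help dig into this, here is, as a minimal sketch,
what I can collect from a second terminal while the import hangs. PID 4317
is the stuck zpool process from the Ctrl-T output above, and the device
and label names are the ones from my setup:

  # kernel stacks of the stuck process, to see where in the zio code it sleeps
  procstat -kk 4317
  # ZFS labels as seen on one of the member partitions
  zdb -l /dev/gpt/disk00
  # partition layout and labels of one of the members
  gpart show -l ada3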
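And once I get the machine back, these are the import variants I would
still try before repeating the 10-hour copy. Whether any of them gets past
the hang I cannot say; the options themselves are the documented zpool(8)
ones, and the pool name is mine:

  # scan only the labelled GPT providers instead of all of /dev
  zpool import -d /dev/gpt
  # import read-only, so nothing gets written to a possibly damaged pool
  zpool import -o readonly=on POOL
  # import without mounting any datasets
  zpool import -N POOL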