Message-ID: <54372173.1010100@ijs.si>
Date: Fri, 10 Oct 2014 01:59:47 +0200
From: Mark Martinec
Organization: J. Stefan Institute
To: freebsd-stable@freebsd.org
Subject: zfs pool import hangs on [tx->tx_sync_done_cv]

In short, after upgrading to 10.1-BETA3 or -RC1 I ended up with several
zfs pools that can no longer be imported. The zpool import command
(with no arguments) does show all available pools, but trying to import
one just hangs and the command cannot be aborted, although the rest of
the system is still alive and fine:

  # zpool import -f

Typing ^T just shows an idle process, waiting on [tx->tx_sync_done_cv]:

  load: 0.20  cmd: zpool 939 [tx->tx_sync_done_cv] 5723.65r 0.01u 0.02s 0% 8220k
  load: 0.16  cmd: zpool 939 [tx->tx_sync_done_cv] 5735.73r 0.01u 0.02s 0% 8220k
  load: 0.14  cmd: zpool 939 [tx->tx_sync_done_cv] 5741.83r 0.01u 0.02s 0% 8220k
  load: 0.13  cmd: zpool 939 [tx->tx_sync_done_cv] 5749.16r 0.01u 0.02s 0% 8220k

ps shows (on a system re-booted to a LiveCD running FreeBSD-10.1-RC1):

  PID    TID COMM   TDNAME  CPU PRI STATE WCHAN
  939 100632 zpool  -         5 122 sleep tx->tx_s

  UID PID PPID CPU PRI NI    VSZ  RSS MWCHAN   STAT TT     TIME COMMAND
    0 939  801   0  22  0 107732 8236 tx->tx_s D+   v0  0:00.04 zpool import -f -o cachefile=/tmp/zpool.cache -R /tmp/sys0boot sys0boot

  NWCHAN
  fffff8007b0f2a20

  # procstat -kk 939
  PID    TID COMM   TDNAME  KSTACK
  939 100632 zpool  -       mi_switch+0xe1 sleepq_wait+0x3a _cv_wait+0x16d txg_wait_synced+0x85 spa_load+0x1cd1 spa_load_best+0x6f spa_import+0x1ff zfs_ioc_pool_import+0x137 zfsdev_ioctl+0x6f0 devfs_ioctl_f+0x114 kern_ioctl+0x255 sys_ioctl+0x13c amd64_syscall+0x351 Xfast_syscall+0xfb

Background story: the
system where this happened was being kept at a fairly recent 10-STABLE.
The last upgrade was very close to the BETA3 release. There are a couple
of zfs pools there: one on a mirrored pair of SSDs mostly holding the OS,
one on a mirrored pair of large spindles, and three more small ones
(4 GiB each), mostly for boot redundancy or testing - these small ones
are on old smallish disks. These disks are different, and attached to
different SATA controllers (LSI and onboard Intel). Pools were mostly
kept up-to-date with the most recent zpool feature set throughout their
lifetime (some starting their life with 9.0, some with 10.0).

About two weeks ago, after a reboot to the 10-STABLE of the day, the small
pools became unavailable, but the two regular large pools were still
normal. At first I didn't pay much attention to that, as these pools
were on oldish disks and nonessential for normal operation, and I blamed
potentially crappy hardware.

Today I needed to do a reboot (for an unrelated reason), and the machine
was no longer able to mount the boot pool. The first instinct was - disks
are malfunctioning - but ...

Booting it to a FreeBSD-10.1-RC1 LiveCD was successful. A smartmon disk
test shows no problems. dd is able to read whole partitions of each
problematic pool. And most importantly, running 'zdb -e -cc' on each
(non-imported) pool churned along normally and steadily, produced a
stats report at the end, and reported no errors.

As a final proof that the disks are fine, I sacrificed one of the broken
4 GiB GPT partitions holding one of the problematic pools, and did a
fresh 10.1-RC1 install on it from a distribution ISO DVD. The
installation went fine, and the system boots and runs fine from the
newly installed OS. Trying to import one of the remaining old pools
hangs the import command as before.

As one more check, I copied (with dd) one of the broken 4 GiB partitions
to a file on another system (running 10.1-BETA3, which did not suffer
from this problem), made a memory disk out of this file, then ran
zpool import on this pool - and it hangs there too! So hardware was
not the problem - either these partitions are truly broken (even though
zdb -cc says they are fine), or the new OS is somehow no longer able to
import them.

Please advise.

I have a copy of the 4 GiB partition available as a 400 MB compressed
file, if somebody would be willing to play with it.

I also have a ktrace of the 'zpool import' command. Its last actions
before it hangs are:

  939 zpool    RET   madvise 0
  939 zpool    CALL  madvise(0x80604e000,0x1000,MADV_FREE)
  939 zpool    RET   madvise 0
  939 zpool    CALL  close(0x6)
  939 zpool    RET   close 0
  939 zpool    CALL  ioctl(0x3,0xc0185a05,0x7fffffffbf00)
  939 zpool    RET   ioctl -1 errno 2 No such file or directory
  939 zpool    CALL  madvise(0x802c71000,0x10000,MADV_FREE)
  939 zpool    RET   madvise 0
  939 zpool    CALL  madvise(0x802ca5000,0x1000,MADV_FREE)
  939 zpool    RET   madvise 0
  939 zpool    CALL  ioctl(0x3,0xc0185a06,0x7fffffffbf60)
  939 zpool    RET   ioctl 0
  939 zpool    CALL  ioctl(0x3,0xc0185a06,0x7fffffffbf60)
  939 zpool    RET   ioctl 0
  939 zpool    CALL  stat(0x802c380e0,0x7fffffffbc58)
  939 zpool    NAMI  "/tmp"
  939 zpool    STRU  struct stat {dev=273, ino=2, mode=041777, nlink=8, uid=0, gid=0, rdev=96, atime=1412866648, stime=1412871393, ctime=1412871393, birthtime=1412866648, size=512, blksize=32768, blocks=8, flags=0x0 }
  939 zpool    RET   stat 0
  939 zpool    CALL  ioctl(0x3,0xc0185a02,0x7fffffffbc60)

  Mark