From: Mark Martinec
Organization: J. Stefan Institute
Date: Sun, 12 Oct 2014 22:40:32 +0200
To: freebsd-stable@freebsd.org, freebsd-fs@freebsd.org
Subject: Re: zfs pool import hangs on [tx->tx_sync_done_cv]
Message-ID: <543AE740.7000808@ijs.si>
In-Reply-To: <543731F3.8090701@ijs.si>

I have made available an image copy (dd) of a 4 GiB partition of mine
(compressed down to a 384 MiB file), holding one of the pools (a small
BSD root) that can no longer be imported on 10.1-RC1 or 10.1-BETA3,
as described in my first posting (below):

  http://www.ijs.si/usr/mark/bsd/

I would appreciate it if someone could confirm that such a pool (one of
several I have with this symptom) causes 'zpool import' to hang on 10.1
or 10-STABLE:

  - download, xz -d sys1boot.img.xz
  # mdconfig -f sys1boot.img
  # zpool import sys1boot

... and advise on a solution (the steps are spelled out a bit more
fully below).

Considering that 'zdb -e -cc' is happy and there was no other prior
trouble with these pools, apart from an upgrade to 10.1-BETA3/-RC1 from
a 10-STABLE as of ca. late September, I believe these pools are still
healthy, just non-importable.
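Spelled out a bit more fully, this is roughly the sequence I mean (a
sketch only; the md unit number and the altroot under /tmp are just
examples, and 'mdconfig -a -t vnode' is the long form of the
'mdconfig -f' shorthand above):

  (after downloading sys1boot.img.xz from the URL above)
  # xz -d sys1boot.img.xz                    (leaves the raw 4 GiB sys1boot.img)
  # mdconfig -a -t vnode -f sys1boot.img     (prints the md unit, e.g. md0)
  # zpool import                             (the pool is listed as available)
  # zpool import -R /tmp/sys1boot sys1boot   (this is the step that hangs)

  (cleanup, if the import ever completes)
  # zpool export sys1boot
  # mdconfig -d -u 0

The consistency check that still passes on the not-imported pool is:

  # zdb -e -cc sys1boot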
I'm reluctant to upgrade any other system from 10.0 to 10.1 without
finding out what went wrong here.

  Mark


On 10/10/2014 03:02, Steven Hartland wrote:
> Sorry to be a pain but could you attach the full output or link it
> somewhere as mail has messed up the formatting :(

Now at
  http://www.ijs.si/usr/mark/bsd/procstat-kka-2074.txt


On 2014-10-10 Mark Martinec wrote:
> In short, after upgrading to 10.1-BETA3 or -RC1 I ended up with several
> zfs pools that can no longer be imported. The zpool import command
> (with no arguments) does show all available pools, but trying to
> import one just hangs and the command cannot be aborted, although
> the rest of the system is still alive and fine:
>
> # zpool import -f
>
> Typing ^T just shows an idle process, waiting on [tx->tx_sync_done_cv]:
>
> load: 0.20 cmd: zpool 939 [tx->tx_sync_done_cv] 5723.65r 0.01u 0.02s 0% 8220k
> load: 0.16 cmd: zpool 939 [tx->tx_sync_done_cv] 5735.73r 0.01u 0.02s 0% 8220k
> load: 0.14 cmd: zpool 939 [tx->tx_sync_done_cv] 5741.83r 0.01u 0.02s 0% 8220k
> load: 0.13 cmd: zpool 939 [tx->tx_sync_done_cv] 5749.16r 0.01u 0.02s 0% 8220k
>
> ps shows (on a system re-booted to a LiveCD running FreeBSD-10.1-RC1):
>
> PID    TID COMM   TDNAME  CPU PRI STATE  WCHAN
> 939 100632 zpool  -         5 122 sleep  tx->tx_s
>
> UID PID PPID CPU PRI NI    VSZ  RSS MWCHAN   STAT TT   TIME COMMAND
>   0 939  801   0  22  0 107732 8236 tx->tx_s D+   v0 0:00.04
>     zpool import -f -o cachefile=/tmp/zpool.cache -R /tmp/sys0boot sys0boot
>
> NWCHAN
> fffff8007b0f2a20
>
> # procstat -kk 939
>
> PID    TID COMM   TDNAME  KSTACK
> 939 100632 zpool  -       mi_switch+0xe1 sleepq_wait+0x3a _cv_wait+0x16d
>   txg_wait_synced+0x85 spa_load+0x1cd1 spa_load_best+0x6f spa_import+0x1ff
>   zfs_ioc_pool_import+0x137 zfsdev_ioctl+0x6f0 devfs_ioctl_f+0x114
>   kern_ioctl+0x255 sys_ioctl+0x13c amd64_syscall+0x351 Xfast_syscall+0xfb
>
>
> Background story: the system where this happened was being kept at a
> fairly recent 10-STABLE. The last upgrade was very close to the BETA3
> release. There are a couple of zfs pools there: one on a mirrored pair
> of SSDs mostly holding the OS, one on a mirrored pair of large spindles,
> and three more small ones (4 GiB each), mostly for boot redundancy or
> testing - these small ones are on old smallish disks. These disks are
> of different types, and attached to different SATA controllers (LSI and
> onboard Intel). The pools were mostly kept up to date with the most
> recent zpool feature set throughout their lifetime (some started their
> life with 9.0, some with 10.0).
>
> About two weeks ago, after a reboot to a 10-STABLE of the day, the
> small pools became unavailable, but the two regular large pools were
> still normal. At first I didn't pay much attention to it, as these
> pools are on oldish disks and nonessential for normal operation, so I
> blamed potentially flaky hardware.
>
> Today I needed to do a reboot (for an unrelated reason), and the
> machine was no longer able to mount the boot pool.
> The first instinct was that the disks were malfunctioning - but ...
>
> Booting it to a FreeBSD-10.1-RC1 LiveCD was successful. A smartmontools
> disk test shows no problems. dd is able to read whole partitions of
> each problematic pool. And most importantly, running a 'zdb -e -cc' on
> each (non-imported) pool was churning along normally and steadily,
> producing a stats report at the end and reporting no errors.
>
> As a final proof that the disks are fine I sacrificed one of the broken
> 4 GiB GPT partitions with one of the problematic pools, and did a
> fresh 10.1-RC1 install on it from a distribution ISO DVD.
> The installation went fine and the system does boot and run fine from
> the newly installed OS. Trying to import one of the remaining old
> pools hangs the import command as before.
>
> As further proof, I copied (with dd) one of the broken 4 GiB
> partitions to a file on another system (running 10.1-BETA3, which did
> not suffer from this problem), made a memory disk out of this file,
> then ran 'zpool import' on this pool - and it hangs there too! So
> hardware was not the problem - either these partitions are truly
> broken (even though zdb -cc says they are fine), or the new OS is
> somehow no longer able to import them.
>
> Please advise.
>
> I have a copy of the 4 GiB partition on a 400 MB compressed file
> available, if somebody would be willing to play with it.
>
> I also have a ktrace of the 'zpool import' command. Its last actions
> before it hangs are:
>
>   939 zpool  RET   madvise 0
>   939 zpool  CALL  madvise(0x80604e000,0x1000,MADV_FREE)
>   939 zpool  RET   madvise 0
>   939 zpool  CALL  close(0x6)
>   939 zpool  RET   close 0
>   939 zpool  CALL  ioctl(0x3,0xc0185a05,0x7fffffffbf00)
>   939 zpool  RET   ioctl -1 errno 2 No such file or directory
>   939 zpool  CALL  madvise(0x802c71000,0x10000,MADV_FREE)
>   939 zpool  RET   madvise 0
>   939 zpool  CALL  madvise(0x802ca5000,0x1000,MADV_FREE)
>   939 zpool  RET   madvise 0
>   939 zpool  CALL  ioctl(0x3,0xc0185a06,0x7fffffffbf60)
>   939 zpool  RET   ioctl 0
>   939 zpool  CALL  ioctl(0x3,0xc0185a06,0x7fffffffbf60)
>   939 zpool  RET   ioctl 0
>   939 zpool  CALL  stat(0x802c380e0,0x7fffffffbc58)
>   939 zpool  NAMI  "/tmp"
>   939 zpool  STRU  struct stat {dev=273, ino=2, mode=041777, nlink=8, uid=0, gid=0, rdev=96, atime=1412866648, stime=1412871393, ctime=1412871393, birthtime=1412866648, size=512, blksize=32768, blocks=8, flags=0x0 }
>   939 zpool  RET   stat 0
>   939 zpool  CALL  ioctl(0x3,0xc0185a02,0x7fffffffbc60)
>
>
> Mark
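
P.S. For anyone trying to reproduce this: the memory-disk test and the
ktrace in the quoted message above were done along these lines (a sketch
only - the GPT label, file names, md unit and the 40-line tail are
examples, not the exact invocations used):

  (on the second machine: copy the partition into a file and attach it)
  # dd if=/dev/gpt/sys0boot of=/var/tmp/sys0boot.img bs=1m
  # mdconfig -a -t vnode -f /var/tmp/sys0boot.img

  (run the import under ktrace; the import hangs, but the trace file is
   written as the process runs and can be decoded from another shell)
  # ktrace -i -f /var/tmp/zpool.ktrace \
      zpool import -f -o cachefile=/tmp/zpool.cache -R /tmp/sys0boot sys0boot
  # kdump -f /var/tmp/zpool.ktrace | tail -n 40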