From owner-freebsd-fs@FreeBSD.ORG Sun Nov 16 05:23:10 2014
Message-Id: <201411160522.sAG5MwG7009367@maildrop31.somerville.occnc.com>
To: Steven Hartland
Reply-To: curtis@ipv6.occnc.com
From: Curtis Villamizar
Subject: Re: zpool create on md hangs
In-reply-to: Your message of "Mon, 10 Nov 2014 09:27:26 +0000." <546084FE.80300@multiplay.co.uk>
Date: Sun, 16 Nov 2014 00:22:58 -0500
Cc: freebsd-fs@freebsd.org, curtis@ipv6.occnc.com

In message <546084FE.80300@multiplay.co.uk> Steven Hartland writes:

> On 10/11/2014 06:48, Andreas Nilsson wrote:
> > On Mon, Nov 10, 2014 at 7:37 AM, Curtis Villamizar wrote:
> >
> >> The following shell program produces a hang. It's reproducible
> >> (it hangs every time).
> >>
> >> #!/bin/sh
> >>
> >> set -e
> >> set -x
> >>
> >> truncate -s `expr 10 \* 1024 \* 1024 \* 1024` /image-file
> >> md_unit=`mdconfig -a -n -t vnode -f /image-file`
> >> echo "md device is /dev/md$md_unit"
> >> zpool create test md$md_unit
> >>
> >> The zpool command hangs. Kill or kill -9 has no effect. All
> >> filesystems are unaffected, but any other zpool or zfs command will
> >> hang and be unkillable. A reboot is needed.
> >>
> >> This is running on:
> >>
> >> FreeBSD 10.0-STABLE (GENERIC) #0 r270645: Wed Aug 27 00:54:29 EDT 2014
> >>
> >> When I get a chance, I will try again with a 10.1 RC3 kernel I
> >> recently built. If this still doesn't work, I'll build an 11 kernel,
> >> since the code differs from 10.1, not having the svm code merged in.
> >> I'm asking before poking around further in case anyone has insights
> >> into why this might happen.
> >>
> >> BTW- The reason to create a zfs filesystem on a vnode-type md is to
> >> create an image that can run under bhyve using a zfs root fs. This
> >> works quite nicely for combinations of geom types (gmirror, gstripe,
> >> gjournal, gcache), but zpool hangs when trying this with zfs.
> >>
> >> Curtis
> >>
> >> ps- please keep me on the Cc as I'm not subscribed to freebsd-fs.
> >>
> > Freezes here on 10.1-RC2-p1 (amd64) as well.
> > ^T says:
> > load: 0.21 cmd: zpool 74063 [zio->io_cv] 8.84r 0.00u 0.00s 0% 3368k
>
> I suspect you're just seeing the delay as it trims the file, and it
> will complete in time.
>
> Try setting vfs.zfs.vdev.trim_on_init=0 before running the create and
> see if it completes quickly after that.
>
> I tested this on HEAD and confirmed it was the case there.
>
> Regards
> Steve

Steve,

Thanks for the hint.

I'm doing some testing, so I run this quite a bit, but it's automated.
For a while I was continuing to just let it take 4-10 minutes. The
symptoms while the trim is happening are that any zpool or zfs command
hangs and doesn't respond to a kill, or even a kill -9. I've also had
a few cases where a "shutdown -r now" flushed buffers but wouldn't get
to the reboot and had to be powered off, plus I had one apparent hang
of the entire disk subsystem. I'm currently using FreeBSD
10.1-PRERELEASE #0 r274470.

All of these symptoms go away with vfs.zfs.vdev.trim_on_init=0, so I
put it in my sysctl.conf files. Maybe it should be the default, given
the severity of the behavior with vfs.zfs.vdev.trim_on_init=1 (too
late for 10.1). Comments in the code call it an "optimization".

Does anyone know exactly what the trim does? Anything useful or
necessary?

Curtis

[fyi- unrelated]

I'm performance testing disk configurations with a vm under a compile
load, using make -j maxjobs with various values of maxjobs. So far,
under bhyve it takes about 50% longer than native. With native,
combinations of stripe, mirror, journal, cache, and zfs vs ufs make
little difference, about 5%. Within a vm, a disk stripe runs faster
than a mirror (as expected). I'm still early in testing, but I'm
trying the mirror or stripe on the host vs in the vm, and other
permutations.

I've been running mirrored disks since about 1994 (with the original
vinum, then gvinum, then geom mirror, then zfs mirror), but I've never
taken the time to check performance. I see a 15:1 difference among
the CPUs I have (an old single-core Intel no longer used, vs a 4-core
Atom, vs a 4-core i3), but so far only a 50% penalty for big compiles
in a vm vs the same processor native. Above -j 4 there is a small
performance gain with zfs (which is generally slightly slower) but
none for the others.

I did a fair amount of testing for native disk. I've only started
testing vm disk permutations, but in doing this testing I'm learning a
lot about bhyve and geom and zfs quirks.
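
[ps- workaround recap]

For anyone hitting the same hang, this is the sequence I'm using now.
It's only a sketch based on Steve's suggestion above; the image path,
pool name, and size are just the ones from my test script:

#!/bin/sh

# Disable TRIM-on-init so "zpool create" on a vnode-backed md returns
# promptly instead of spending minutes trimming the backing file
# (during which zpool/zfs commands hang unkillably).
sysctl vfs.zfs.vdev.trim_on_init=0

truncate -s `expr 10 \* 1024 \* 1024 \* 1024` /image-file
md_unit=`mdconfig -a -n -t vnode -f /image-file`
zpool create test md$md_unit

# To make this persistent across reboots, add the line
#   vfs.zfs.vdev.trim_on_init=0
# to /etc/sysctl.conf.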