From: Steven Hartland <killing@multiplay.co.uk>
To: Marie Helene Kvello-Aune, Larry Rosenman
Cc: Freebsd fs <freebsd-fs@freebsd.org>
Subject: Re: 16.0E ExpandSize? -- New Server
Date: Tue, 31 Jan 2017 22:37:28 +0000
On 31/01/2017 22:02, Marie Helene Kvello-Aune wrote:
> On Tue, Jan 31, 2017 at 10:49 PM Larry Rosenman wrote:
>
> > revert the other patch and apply this one?
> >
> > On 01/31/2017 3:47 pm, Steven Hartland wrote:
> >
> > > Hmm, it looks like there's also a bug in the way vdev_min_asize is
> > > calculated for raidz, as it can (and here has) resulted in a child
> > > min_asize which won't provide enough space for the parent, due to
> > > the use of unrounded integer division.
> > >
> > > 1981411579221 * 6 = 11888469475326 < 11888469475328
> > >
> > > You should have vdev_min_asize: 1981411579222 for your children.
> > >
> > > Updated patch attached. The calculation still isn't 100% reversible,
> > > so it may need more work; however, it does now ensure that the
> > > children will provide enough capacity for the parent's min_asize
> > > even if all of them are shrunk to their individual min_asize, which
> > > I believe previously may not have been the case.
> > >
> > > This isn't related to the incorrect EXPANDSZ output, but it would be
> > > good if you could confirm it doesn't cause any issues for your pool,
> > > given its state.
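For anyone following the arithmetic above, here is a small standalone C
program (an illustration only, not the attached patch) showing why plain
floor division of the parent's min_asize across the six children comes up
short, while rounding up does not:

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
            /* The parent vdev_min_asize and child count quoted above. */
            uint64_t parent_min_asize = 11888469475328ULL;
            uint64_t children = 6;

            /* Floor: 1981411579221 * 6 = 11888469475326, two bytes short. */
            uint64_t floored = parent_min_asize / children;

            /* Ceiling: 1981411579222 * 6 = 11888469475332, covers the parent. */
            uint64_t rounded = (parent_min_asize + children - 1) / children;

            printf("floor:   %ju per child, %ju total\n",
                (uintmax_t)floored, (uintmax_t)(floored * children));
            printf("ceiling: %ju per child, %ju total\n",
                (uintmax_t)rounded, (uintmax_t)(rounded * children));
            return (0);
    }

The ceiling value matches the 1981411579222 per-child figure suggested above.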
> > On 31/01/2017 21:00, Larry Rosenman wrote:
> >
> > borg-new /home/ler $ sudo ./vdev-stats.d
> > Password:
> > vdev_path: n/a, vdev_max_asize: 0, vdev_asize: 0, vdev_min_asize: 0
> > vdev_path: n/a, vdev_max_asize: 11947471798272, vdev_asize: 11947478089728, vdev_min_asize: 11888469475328
> > vdev_path: /dev/mfid4p4, vdev_max_asize: 1991245299712, vdev_asize: 1991245299712, vdev_min_asize: 1981411579221
> > vdev_path: /dev/mfid0p4, vdev_max_asize: 1991246348288, vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
> > vdev_path: /dev/mfid1p4, vdev_max_asize: 1991246348288, vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
> > vdev_path: /dev/mfid3p4, vdev_max_asize: 1991247921152, vdev_asize: 1991247921152, vdev_min_asize: 1981411579221
> > vdev_path: /dev/mfid2p4, vdev_max_asize: 1991246348288, vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
> > vdev_path: /dev/mfid5p4, vdev_max_asize: 1991246348288, vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
> > ^C
> >
> > borg-new /home/ler $
> >
> > borg-new /home/ler $ sudo zpool list -v
> > Password:
> > NAME         SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
> > zroot       10.8T  94.3G  10.7T     16.0E     0%     0%  1.00x  ONLINE  -
> >   raidz1    10.8T  94.3G  10.7T     16.0E     0%     0%
> >     mfid4p4     -      -      -         -      -      -
> >     mfid0p4     -      -      -         -      -      -
> >     mfid1p4     -      -      -         -      -      -
> >     mfid3p4     -      -      -         -      -      -
> >     mfid2p4     -      -      -         -      -      -
> >     mfid5p4     -      -      -         -      -      -
> > borg-new /home/ler $
> >
> > On 01/31/2017 2:37 pm, Steven Hartland wrote:
> >
> > In that case, based on your zpool history, I suspect that the original
> > mfid4p4 was the same size as mfid0p4 (1991246348288), but it has been
> > replaced with a drive that is slightly smaller (1991245299712).
> >
> > This smaller size results in a max_asize of 1991245299712 * 6 instead
> > of the original 1991246348288 * 6.
> >
> > Now, given the way min_asize (the value used to check whether a
> > device's size is acceptable) is rounded to the nearest metaslab, I
> > believe that replace would be allowed:
> > https://github.com/freebsd/freebsd/blob/master/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c#L4947
> >
> > Now the problem is that on open the calculated asize is only updated
> > if it's expanding:
> > https://github.com/freebsd/freebsd/blob/master/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c#L1424
> >
> > The updated dtrace file outputs vdev_min_asize, which should confirm
> > my suspicion about why the replace was allowed.
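To make that concrete, here is a tiny standalone C program (it uses only
the numbers from the dtrace output above; it is not the actual spa.c check)
showing why the slightly smaller replacement disk was accepted: its asize
still clears the per-child min_asize, even though it is smaller than the
leaves the pool was built with:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
            /* Values reported by vdev-stats.d above. */
            uint64_t child_min_asize = 1981411579221ULL; /* per-child vdev_min_asize */
            uint64_t orig_leaf_asize = 1991246348288ULL; /* size of the other leaves */
            uint64_t new_leaf_asize  = 1991245299712ULL; /* the replacement mfid4p4  */

            bool smaller = new_leaf_asize < orig_leaf_asize;
            bool allowed = new_leaf_asize >= child_min_asize;

            /* Prints "yes" twice: smaller than the originals, yet accepted. */
            printf("smaller than original leaves: %s\n", smaller ? "yes" : "no");
            printf("replace size check passes:    %s\n", allowed ? "yes" : "no");
            return (0);
    }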
> > On 31/01/2017 19:05, Larry Rosenman wrote:
> >
> > I've replaced some disks due to failure, and some of the partition
> > sizes are different.
> >
> > autoexpand is off:
> >
> > borg-new /home/ler $ zpool get all zroot
> > NAME   PROPERTY                       VALUE                 SOURCE
> > zroot  size                           10.8T                 -
> > zroot  capacity                       0%                    -
> > zroot  altroot                        -                     default
> > zroot  health                         ONLINE                -
> > zroot  guid                           11945658884309024932  default
> > zroot  version                        -                     default
> > zroot  bootfs                         zroot/ROOT/default    local
> > zroot  delegation                     on                    default
> > zroot  autoreplace                    off                   default
> > zroot  cachefile                      -                     default
> > zroot  failmode                       wait                  default
> > zroot  listsnapshots                  off                   default
> > zroot  autoexpand                     off                   default
> > zroot  dedupditto                     0                     default
> > zroot  dedupratio                     1.00x                 -
> > zroot  free                           10.7T                 -
> > zroot  allocated                      94.3G                 -
> > zroot  readonly                       off                   -
> > zroot  comment                        -                     default
> > zroot  expandsize                     16.0E                 -
> > zroot  freeing                        0                     default
> > zroot  fragmentation                  0%                    -
> > zroot  leaked                         0                     default
> > zroot  feature@async_destroy          enabled               local
> > zroot  feature@empty_bpobj            active                local
> > zroot  feature@lz4_compress           active                local
> > zroot  feature@multi_vdev_crash_dump  enabled               local
> > zroot  feature@spacemap_histogram     active                local
> > zroot  feature@enabled_txg            active                local
> > zroot  feature@hole_birth             active                local
> > zroot  feature@extensible_dataset     enabled               local
> > zroot  feature@embedded_data          active                local
> > zroot  feature@bookmarks              enabled               local
> > zroot  feature@filesystem_limits      enabled               local
> > zroot  feature@large_blocks           enabled               local
> > zroot  feature@sha512                 enabled               local
> > zroot  feature@skein                  enabled               local
> > borg-new /home/ler $
> >
> > borg-new /home/ler $ gpart show
> > =>          40  3905945520  mfid0  GPT  (1.8T)
> >             40        1600      1  efi  (800K)
> >           1640        1024      2  freebsd-boot  (512K)
> >           2664        1432         - free -  (716K)
> >           4096    16777216      3  freebsd-swap  (8.0G)
> >       16781312  3889162240      4  freebsd-zfs  (1.8T)
> >     3905943552        2008         - free -  (1.0M)
> >
> > =>          40  3905945520  mfid1  GPT  (1.8T)
> >             40        1600      1  efi  (800K)
> >           1640        1024      2  freebsd-boot  (512K)
> >           2664        1432         - free -  (716K)
> >           4096    16777216      3  freebsd-swap  (8.0G)
> >       16781312  3889162240      4  freebsd-zfs  (1.8T)
> >     3905943552        2008         - free -  (1.0M)
> >
> > =>          40  3905945520  mfid2  GPT  (1.8T)
> >             40        1600      1  efi  (800K)
> >           1640        1024      2  freebsd-boot  (512K)
> >           2664        1432         - free -  (716K)
> >           4096    16777216      3  freebsd-swap  (8.0G)
> >       16781312  3889162240      4  freebsd-zfs  (1.8T)
> >     3905943552        2008         - free -  (1.0M)
> >
> > =>          40  3905945520  mfid3  GPT  (1.8T)
> >             40        1600      1  efi  (800K)
> >           1640        1024      2  freebsd-boot  (512K)
> >           2664    16777216      3  freebsd-swap  (8.0G)
> >       16779880  3889165680      4  freebsd-zfs  (1.8T)
> >
> > =>          40  3905945520  mfid5  GPT  (1.8T)
> >             40        1600      1  efi  (800K)
> >           1640        1024      2  freebsd-boot  (512K)
> >           2664        1432         - free -  (716K)
> >           4096    16777216      3  freebsd-swap  (8.0G)
> >       16781312  3889162240      4  freebsd-zfs  (1.8T)
> >     3905943552        2008         - free -  (1.0M)
> >
> > =>          40  3905945520  mfid4  GPT  (1.8T)
> >             40        1600      1  efi  (800K)
> >           1640        1024      2  freebsd-boot  (512K)
> >           2664        1432         - free -  (716K)
> >           4096    16777216      3  freebsd-swap  (8.0G)
> >       16781312  3889160192      4  freebsd-zfs  (1.8T)
> >     3905941504        4056         - free -  (2.0M)
> >
> > borg-new /home/ler $
> >
> > This system was built last week, and I **CAN** rebuild it if necessary,
> > but I didn't do anything strange (so I thought :) ).
> >
> > On 01/31/2017 12:30 pm, Steven Hartland wrote:
> >
> > Your issue is that the reported vdev_asize is larger than vdev_max_asize:
> > vdev_max_asize: 11947471798272
> > vdev_asize:     11947478089728
> >
> > max_asize is smaller than asize by 6291456.
> >
> > For raidz1 both values should be the smallest disk's size * disks, so:
> > 1991245299712 * 6 = 11947471798272
> >
> > So your max_asize looks right, but asize looks too big.
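That mismatch is also where the 16.0E comes from. Here is a small
standalone C program (an illustration only; ALIGN_DOWN stands in for the
kernel's P2ALIGN, and the metaslab shift of 33 is an assumption) showing
that when asize exceeds max_asize, the unsigned subtraction in the
expand-size calculation quoted just below wraps around to a value just
under 2^64 bytes, which zpool reports as 16.0E:

    #include <stdint.h>
    #include <stdio.h>

    /* Round x down to a multiple of align (a power of two), like P2ALIGN. */
    #define ALIGN_DOWN(x, align)    ((x) & ~((align) - 1))

    int
    main(void)
    {
            uint64_t max_asize = 11947471798272ULL; /* from the dtrace output */
            uint64_t asize     = 11947478089728ULL; /* 6291456 bytes larger   */
            uint64_t ms_shift  = 33;                /* assumed metaslab shift */

            /* max_asize - asize is negative, so as uint64_t it wraps around. */
            uint64_t esize = ALIGN_DOWN(max_asize - asize, 1ULL << ms_shift);

            printf("esize = %ju bytes (~%.1f EiB)\n",
                (uintmax_t)esize, (double)esize / (1ULL << 60));
            return (0);
    }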
> > Expand Size is calculated by:
> >
> > if (vd->vdev_aux == NULL && tvd != NULL && vd->vdev_max_asize != 0) {
> >         vs->vs_esize = P2ALIGN(vd->vdev_max_asize - vd->vdev_asize,
> >             1ULL << tvd->vdev_ms_shift);
> > }
> >
> > So the question is: why is asize too big?
> >
> > Given that you seem to have some varying disk sizes, do you have
> > autoexpand turned on?
> >
> > On 31/01/2017 17:39, Larry Rosenman wrote:
> >
> > vdev_path: n/a, vdev_max_asize: 11947471798272, vdev_asize: 11947478089728
>
> --
> Larry Rosenman                  http://people.freebsd.org/~ler
> Phone: +1 214-642-9640          E-Mail: ler@FreeBSD.org
> US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281
>
> I have the same observation on my home file server. I've not tried the
> patches (I will try them once I get time next week), but the output of
> the dtrace script while doing 'zpool list -v' shows:
>
> # ./dtrace.sh
> vdev_path: n/a, vdev_max_asize: 0, vdev_asize: 0
> vdev_path: n/a, vdev_max_asize: 23907502915584, vdev_asize: 23907504488448
> vdev_path: /dev/gpt/Bay1.eli, vdev_max_asize: 3984583819264, vdev_asize: 3984583819264
> vdev_path: /dev/gpt/Bay2.eli, vdev_max_asize: 3984583819264, vdev_asize: 3984583819264
> vdev_path: /dev/gpt/Bay3.eli, vdev_max_asize: 3984583819264, vdev_asize: 3984583819264
> vdev_path: /dev/gpt/Bay4.eli, vdev_max_asize: 3984583819264, vdev_asize: 3984583819264
> vdev_path: /dev/gpt/Bay5.eli, vdev_max_asize: 3984583819264, vdev_asize: 3984583819264
> vdev_path: /dev/gpt/Bay6.eli, vdev_max_asize: 3984583819264, vdev_asize: 3984583819264
>
> The second line has the same discrepancy as above. This pool was created
> without geli encryption first; then, while the pool was still empty, each
> drive was offlined and replaced with its .eli counterpart. IIRC geli
> leaves some metadata on the disk, shrinking the available space ever so
> slightly, which seems to fit the cause proposed earlier in this thread.
>
> MH

Yes indeed it does.

    Regards
    Steve
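As a final sanity check, the numbers in that last output line up with the
geli explanation. A trivial standalone C calculation (the per-leaf figure
is just the observed difference divided across the six drives, not a claim
about geli's exact metadata size):

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
            uint64_t asize     = 23907504488448ULL; /* raidz asize, from the plain disks    */
            uint64_t max_asize = 23907502915584ULL; /* raidz max_asize, from the .eli disks */
            uint64_t leaves    = 6;

            /* 1572864 bytes in total, i.e. 262144 bytes (256 KiB) per leaf. */
            printf("difference: %ju bytes total, %ju bytes per leaf\n",
                (uintmax_t)(asize - max_asize),
                (uintmax_t)((asize - max_asize) / leaves));
            return (0);
    }

So each .eli leaf contributes slightly less usable space than the plain
provider it replaced, while the raidz-level asize, which is never shrunk on
open, still reflects the original, larger providers.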