From owner-freebsd-fs@freebsd.org Tue Jan 31 23:22:25 2017
From: Steven Hartland <killing@multiplay.co.uk>
To: Larry Rosenman
Cc: Freebsd fs <freebsd-fs@freebsd.org>
Subject: Re: 16.0E ExpandSize? -- New Server
Date: Tue, 31 Jan 2017 23:22:22 +0000
Message-ID: <96534515-4fcb-774e-a599-8d48aec930cd@multiplay.co.uk>
Yep

On 31/01/2017 21:49, Larry Rosenman wrote:
> revert the other patch and apply this one?
>
> On 01/31/2017 3:47 pm, Steven Hartland wrote:
>> Hmm, looks like there's also a bug in the way vdev_min_asize is
>> calculated for raidz, as it can (and here has) resulted in a child
>> min_asize which won't provide enough space for the parent, due to
>> the use of unrounded integer division:
>>
>> 1981411579221 * 6 = 11888469475326 < 11888469475328
>>
>> You should have vdev_min_asize: 1981411579222 for your children.
>>
>> Updated patch attached; the calculation still isn't 100% reversible
>> so it may need more work, but it does now ensure that the children
>> will provide enough capacity for min_asize even if all of them are
>> shrunk to their individual min_asize, which I believe previously may
>> not have been the case.
>>
>> This isn't related to the incorrect EXPANDSZ output, but it would be
>> good if you could confirm it doesn't cause any issues for your pool,
>> given its state.
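
(For illustration: a minimal standalone C sketch of the truncating
division described above, using this pool's numbers. The rounded-up
form is an assumed fix for the sketch, not necessarily what the
attached patch does.)

    #include <inttypes.h>
    #include <stdio.h>

    int
    main(void)
    {
            /* raidz top-level min_asize and child count from this pool */
            uint64_t parent_min = 11888469475328ULL;
            uint64_t children = 6;

            /* truncating division, as in the reported bug */
            uint64_t child_min = parent_min / children;

            /* rounding up guarantees the children cover the parent */
            uint64_t child_min_up = (parent_min + children - 1) / children;

            printf("truncated:  %" PRIu64 " * 6 = %" PRIu64 "\n",
                child_min, child_min * children);
            printf("rounded up: %" PRIu64 " * 6 = %" PRIu64 "\n",
                child_min_up, child_min_up * children);
            return (0);
    }

(The truncated result, 1981411579221 * 6 = 11888469475326, falls 2
bytes short of the parent's 11888469475328; the rounded-up per-child
value of 1981411579222 covers it.)
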
>> On 31/01/2017 21:00, Larry Rosenman wrote:
>>> borg-new /home/ler $ sudo ./vdev-stats.d
>>> Password:
>>> vdev_path: n/a, vdev_max_asize: 0, vdev_asize: 0, vdev_min_asize: 0
>>> vdev_path: n/a, vdev_max_asize: 11947471798272, vdev_asize: 11947478089728, vdev_min_asize: 11888469475328
>>> vdev_path: /dev/mfid4p4, vdev_max_asize: 1991245299712, vdev_asize: 1991245299712, vdev_min_asize: 1981411579221
>>> vdev_path: /dev/mfid0p4, vdev_max_asize: 1991246348288, vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
>>> vdev_path: /dev/mfid1p4, vdev_max_asize: 1991246348288, vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
>>> vdev_path: /dev/mfid3p4, vdev_max_asize: 1991247921152, vdev_asize: 1991247921152, vdev_min_asize: 1981411579221
>>> vdev_path: /dev/mfid2p4, vdev_max_asize: 1991246348288, vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
>>> vdev_path: /dev/mfid5p4, vdev_max_asize: 1991246348288, vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
>>> ^C
>>> borg-new /home/ler $
>>>
>>> borg-new /home/ler $ sudo zpool list -v
>>> Password:
>>> NAME         SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
>>> zroot       10.8T  94.3G  10.7T     16.0E     0%     0%  1.00x  ONLINE  -
>>>   raidz1    10.8T  94.3G  10.7T     16.0E     0%     0%
>>>     mfid4p4     -      -      -         -      -      -
>>>     mfid0p4     -      -      -         -      -      -
>>>     mfid1p4     -      -      -         -      -      -
>>>     mfid3p4     -      -      -         -      -      -
>>>     mfid2p4     -      -      -         -      -      -
>>>     mfid5p4     -      -      -         -      -      -
>>> borg-new /home/ler $
>>>
>>> On 01/31/2017 2:37 pm, Steven Hartland wrote:
>>>> In that case, based on your zpool history, I suspect that the
>>>> original mfid4p4 was the same size as mfid0p4 (1991246348288),
>>>> but it's been replaced with a drive which is slightly smaller
>>>> (1991245299712).
>>>>
>>>> This smaller size results in a max_asize of 1991245299712 * 6
>>>> instead of the original 1991246348288 * 6.
>>>>
>>>> Now, given the way min_asize (the value used to check whether a
>>>> device's size is acceptable) is rounded to the nearest metaslab,
>>>> I believe that replace would be allowed:
>>>> https://github.com/freebsd/freebsd/blob/master/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c#L4947
>>>>
>>>> The problem is that on open the calculated asize is only updated
>>>> if it's expanding:
>>>> https://github.com/freebsd/freebsd/blob/master/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c#L1424
>>>>
>>>> The updated dtrace file outputs vdev_min_asize, which should
>>>> confirm my suspicion about why the replace was allowed.
>>>>
>>>> On 31/01/2017 19:05, Larry Rosenman wrote:
>>>>> I've replaced some disks due to failure, and some of the
>>>>> partition sizes are different.
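
(That replacement pattern is exactly what Steven's explanation above
predicts: on reopen, asize is never shrunk while max_asize tracks the
now-smaller device set. A hypothetical simplified sketch, not the
actual vdev_open() code, using this pool's sizes:)

    #include <inttypes.h>
    #include <stdio.h>

    struct vdev {
            uint64_t asize;
            uint64_t max_asize;
    };

    /* simplified reopen: max_asize always tracks the current disks,
     * but asize is only ever allowed to grow */
    static void
    reopen(struct vdev *vd, uint64_t asize, uint64_t max_asize)
    {
            vd->max_asize = max_asize;
            if (asize > vd->asize)
                    vd->asize = asize;
    }

    int
    main(void)
    {
            /* six equal children: 1991246348288 * 6 */
            struct vdev tvd = { 11947478089728ULL, 11947478089728ULL };

            /* one child replaced by a disk 1048576 bytes smaller,
             * so the raidz is now limited to 1991245299712 * 6 */
            reopen(&tvd, 11947471798272ULL, 11947471798272ULL);

            /* asize stays stale and larger than max_asize, matching
             * the dtrace output above */
            printf("asize:     %" PRIu64 "\n", tvd.asize);
            printf("max_asize: %" PRIu64 "\n", tvd.max_asize);
            return (0);
    }
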
>>>>> autoexpand is off:
>>>>>
>>>>> borg-new /home/ler $ zpool get all zroot
>>>>> NAME   PROPERTY                       VALUE                 SOURCE
>>>>> zroot  size                           10.8T                 -
>>>>> zroot  capacity                       0%                    -
>>>>> zroot  altroot                        -                     default
>>>>> zroot  health                         ONLINE                -
>>>>> zroot  guid                           11945658884309024932  default
>>>>> zroot  version                        -                     default
>>>>> zroot  bootfs                         zroot/ROOT/default    local
>>>>> zroot  delegation                     on                    default
>>>>> zroot  autoreplace                    off                   default
>>>>> zroot  cachefile                      -                     default
>>>>> zroot  failmode                       wait                  default
>>>>> zroot  listsnapshots                  off                   default
>>>>> zroot  autoexpand                     off                   default
>>>>> zroot  dedupditto                     0                     default
>>>>> zroot  dedupratio                     1.00x                 -
>>>>> zroot  free                           10.7T                 -
>>>>> zroot  allocated                      94.3G                 -
>>>>> zroot  readonly                       off                   -
>>>>> zroot  comment                        -                     default
>>>>> zroot  expandsize                     16.0E                 -
>>>>> zroot  freeing                        0                     default
>>>>> zroot  fragmentation                  0%                    -
>>>>> zroot  leaked                         0                     default
>>>>> zroot  feature@async_destroy          enabled               local
>>>>> zroot  feature@empty_bpobj            active                local
>>>>> zroot  feature@lz4_compress           active                local
>>>>> zroot  feature@multi_vdev_crash_dump  enabled               local
>>>>> zroot  feature@spacemap_histogram     active                local
>>>>> zroot  feature@enabled_txg            active                local
>>>>> zroot  feature@hole_birth             active                local
>>>>> zroot  feature@extensible_dataset     enabled               local
>>>>> zroot  feature@embedded_data          active                local
>>>>> zroot  feature@bookmarks              enabled               local
>>>>> zroot  feature@filesystem_limits      enabled               local
>>>>> zroot  feature@large_blocks           enabled               local
>>>>> zroot  feature@sha512                 enabled               local
>>>>> zroot  feature@skein                  enabled               local
>>>>> borg-new /home/ler $
>>>>>
>>>>> borg-new /home/ler $ gpart show
>>>>> =>        40  3905945520  mfid0  GPT  (1.8T)
>>>>>           40        1600      1  efi  (800K)
>>>>>         1640        1024      2  freebsd-boot  (512K)
>>>>>         2664        1432         - free -  (716K)
>>>>>         4096    16777216      3  freebsd-swap  (8.0G)
>>>>>     16781312  3889162240      4  freebsd-zfs  (1.8T)
>>>>>   3905943552        2008         - free -  (1.0M)
>>>>>
>>>>> =>        40  3905945520  mfid1  GPT  (1.8T)
>>>>>           40        1600      1  efi  (800K)
>>>>>         1640        1024      2  freebsd-boot  (512K)
>>>>>         2664        1432         - free -  (716K)
>>>>>         4096    16777216      3  freebsd-swap  (8.0G)
>>>>>     16781312  3889162240      4  freebsd-zfs  (1.8T)
>>>>>   3905943552        2008         - free -  (1.0M)
>>>>>
>>>>> =>        40  3905945520  mfid2  GPT  (1.8T)
>>>>>           40        1600      1  efi  (800K)
>>>>>         1640        1024      2  freebsd-boot  (512K)
>>>>>         2664        1432         - free -  (716K)
>>>>>         4096    16777216      3  freebsd-swap  (8.0G)
>>>>>     16781312  3889162240      4  freebsd-zfs  (1.8T)
>>>>>   3905943552        2008         - free -  (1.0M)
>>>>>
>>>>> =>        40  3905945520  mfid3  GPT  (1.8T)
>>>>>           40        1600      1  efi  (800K)
>>>>>         1640        1024      2  freebsd-boot  (512K)
>>>>>         2664    16777216      3  freebsd-swap  (8.0G)
>>>>>     16779880  3889165680      4  freebsd-zfs  (1.8T)
>>>>>
>>>>> =>        40  3905945520  mfid5  GPT  (1.8T)
>>>>>           40        1600      1  efi  (800K)
>>>>>         1640        1024      2  freebsd-boot  (512K)
>>>>>         2664        1432         - free -  (716K)
>>>>>         4096    16777216      3  freebsd-swap  (8.0G)
>>>>>     16781312  3889162240      4  freebsd-zfs  (1.8T)
>>>>>   3905943552        2008         - free -  (1.0M)
>>>>>
>>>>> =>        40  3905945520  mfid4  GPT  (1.8T)
>>>>>           40        1600      1  efi  (800K)
>>>>>         1640        1024      2  freebsd-boot  (512K)
>>>>>         2664        1432         - free -  (716K)
>>>>>         4096    16777216      3  freebsd-swap  (8.0G)
>>>>>     16781312  3889160192      4  freebsd-zfs  (1.8T)
>>>>>   3905941504        4056         - free -  (2.0M)
>>>>>
>>>>> borg-new /home/ler $
>>>>>
>>>>> This system was built last week, and I **CAN** rebuild it if
>>>>> necessary, but I didn't do anything strange (so I thought :) ).
>>>>>
>>>>> On 01/31/2017 12:30 pm, Steven Hartland wrote:
>>>>>> Your issue is that the reported vdev_asize is larger than
>>>>>> vdev_max_asize:
>>>>>> vdev_max_asize: 11947471798272
>>>>>> vdev_asize:     11947478089728
>>>>>>
>>>>>> max_asize is smaller than asize by 6291456.
>>>>>>
>>>>>> For raidz1 each size should be the smallest disk's size * disks,
>>>>>> so:
>>>>>> 1991245299712 * 6 = 11947471798272
>>>>>>
>>>>>> So your max_asize looks right, but asize looks too big.
>>>>>>
>>>>>> Expand Size is calculated by:
>>>>>>
>>>>>>     if (vd->vdev_aux == NULL && tvd != NULL &&
>>>>>>         vd->vdev_max_asize != 0) {
>>>>>>             vs->vs_esize = P2ALIGN(vd->vdev_max_asize -
>>>>>>                 vd->vdev_asize, 1ULL << tvd->vdev_ms_shift);
>>>>>>     }
>>>>>>
>>>>>> So the question is: why is asize too big?
>>>>>>
>>>>>> Given you seem to have some random disk sizes, do you have
>>>>>> autoexpand turned on?
>>>>>>
>>>>>> On 31/01/2017 17:39, Larry Rosenman wrote:
>>>>>>> vdev_path: n/a, vdev_max_asize: 11947471798272,
>>>>>>> vdev_asize: 11947478089728
>
> --
> Larry Rosenman                  http://people.freebsd.org/~ler
> Phone: +1 214-642-9640          E-Mail: ler@FreeBSD.org
> US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281
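
(For completeness: the 16.0E value follows directly from the snippet
Steven quotes. With max_asize smaller than asize, the unsigned
subtraction wraps, and P2ALIGN of the wrapped value is just under 2^64
bytes, which zpool prints as 16.0E. A standalone sketch with this
pool's numbers; the metaslab shift value is assumed for illustration:)

    #include <inttypes.h>
    #include <stdio.h>

    /* same alignment macro the quoted snippet uses */
    #define P2ALIGN(x, align)       ((x) & -(align))

    int
    main(void)
    {
            uint64_t max_asize = 11947471798272ULL; /* from dtrace output */
            uint64_t asize     = 11947478089728ULL; /* 6291456 larger */
            uint64_t ms_shift  = 34;                /* assumed value */

            /* wraps to 2^64 - 6291456 before alignment */
            uint64_t esize = P2ALIGN(max_asize - asize, 1ULL << ms_shift);

            printf("vs_esize: %" PRIu64 " (~%.1f EiB)\n",
                esize, (double)esize / (1ULL << 60));
            return (0);
    }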