Date: Sun, 5 Feb 2017 23:16:27 +0000
From: Steven Hartland <killing@multiplay.co.uk>
To: Larry Rosenman <ler@FreeBSD.org>, Freebsd fs <freebsd-fs@freebsd.org>
Subject: Re: 16.0E ExpandSize? -- New Server
Message-ID: <3691c96a-97c4-6bd1-80b5-02cd929cd219@multiplay.co.uk>
In-Reply-To: <20170205035438.6gc2ybg6otidzpaz@borg.lerctr.org>
References: <22e1bfc5840d972cf93643733682cda1@FreeBSD.org>
 <f2600a53-0dc1-9f41-1405-ed22d96d30cf@multiplay.co.uk>
 <8a710dc75c129f58b0372eeaeca575b5@FreeBSD.org>
 <aef02eb0-0888-6fea-a4b8-4033ca56f4a3@multiplay.co.uk>
 <d3181bd00c827fb99fbcebe6fe097ef8@FreeBSD.org>
 <a3d78923-5046-11c8-daea-713eacf47bd2@multiplay.co.uk>
 <ffc24b7bfacd265d637b633566bbaa51@FreeBSD.org>
 <96534515-4fcb-774e-a599-8d48aec930cd@multiplay.co.uk>
 <a98b3a3da1665c8eac6160633a0bc778@FreeBSD.org>
 <8387d38f-3185-8c07-396b-602c708002a6@multiplay.co.uk>
 <20170205035438.6gc2ybg6otidzpaz@borg.lerctr.org>
It's still actually waiting on review; the merged icon at the bottom was
for an unrelated PR which referenced mine, as they saw the same random
test failure as I did.

On 05/02/2017 03:54, Larry Rosenman wrote:
> I saw it was accepted upstream. Can it be committed to FreeBSD?
>
> On Wed, Feb 01, 2017 at 02:43:51AM +0000, Steven Hartland wrote:
>> Thanks, I've put a PR in upstream to get some eyes on the fix:
>> https://github.com/openzfs/openzfs/pull/296
>>
>> If no objections are raised to the approach I've used I'll commit the
>> fix to HEAD too.
>>
>> On 01/02/2017 02:31, Larry Rosenman wrote:
>>> no grief that I can see:
>>>
>>> borg-new /home/ler $ sudo zdb
>>> Password:
>>> zroot:
>>>     version: 5000
>>>     name: 'zroot'
>>>     state: 0
>>>     txg: 96143
>>>     pool_guid: 11945658884309024932
>>>     hostid: 3619181042
>>>     hostname: ''
>>>     com.delphix:has_per_vdev_zaps
>>>     vdev_children: 1
>>>     vdev_tree:
>>>         type: 'root'
>>>         id: 0
>>>         guid: 11945658884309024932
>>>         create_txg: 4
>>>         children[0]:
>>>             type: 'raidz'
>>>             id: 0
>>>             guid: 7596925654112466913
>>>             nparity: 1
>>>             metaslab_array: 42
>>>             metaslab_shift: 36
>>>             ashift: 12
>>>             asize: 11947471798272
>>>             is_log: 0
>>>             create_txg: 4
>>>             com.delphix:vdev_zap_top: 35
>>>             children[0]:
>>>                 type: 'disk'
>>>                 id: 0
>>>                 guid: 1443238581175429852
>>>                 path: '/dev/mfid4p4'
>>>                 whole_disk: 1
>>>                 DTL: 137
>>>                 create_txg: 4
>>>                 com.delphix:vdev_zap_leaf: 131
>>>             children[1]:
>>>                 type: 'disk'
>>>                 id: 1
>>>                 guid: 1865792721003775978
>>>                 path: '/dev/mfid0p4'
>>>                 whole_disk: 1
>>>                 DTL: 133
>>>                 create_txg: 4
>>>                 com.delphix:vdev_zap_leaf: 37
>>>             children[2]:
>>>                 type: 'disk'
>>>                 id: 2
>>>                 guid: 12541720522827927342
>>>                 path: '/dev/mfid1p4'
>>>                 whole_disk: 1
>>>                 DTL: 132
>>>                 create_txg: 4
>>>                 com.delphix:vdev_zap_leaf: 38
>>>             children[3]:
>>>                 type: 'disk'
>>>                 id: 3
>>>                 guid: 13053934791777776444
>>>                 path: '/dev/mfid3p4'
>>>                 whole_disk: 1
>>>                 DTL: 136
>>>                 create_txg: 4
>>>                 com.delphix:vdev_zap_leaf: 135
>>>             children[4]:
>>>                 type: 'disk'
>>>                 id: 4
>>>                 guid: 4432707573898874857
>>>                 path: '/dev/mfid2p4'
>>>                 whole_disk: 1
>>>                 DTL: 130
>>>                 create_txg: 4
>>>                 com.delphix:vdev_zap_leaf: 40
>>>             children[5]:
>>>                 type: 'disk'
>>>                 id: 5
>>>                 guid: 5106293125005422556
>>>                 path: '/dev/mfid5p4'
>>>                 whole_disk: 1
>>>                 DTL: 129
>>>                 create_txg: 4
>>>                 com.delphix:vdev_zap_leaf: 41
>>>     features_for_read:
>>>         com.delphix:hole_birth
>>>         com.delphix:embedded_data
>>>
>>> borg-new /home/ler $ sudo zpool list -v
>>> NAME         SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
>>> zroot       10.8T  94.3G  10.7T         -     0%     0%  1.00x  ONLINE  -
>>>   raidz1    10.8T  94.3G  10.7T         -     0%     0%
>>>     mfid4p4     -      -      -         -      -      -
>>>     mfid0p4     -      -      -         -      -      -
>>>     mfid1p4     -      -      -         -      -      -
>>>     mfid3p4     -      -      -         -      -      -
>>>     mfid2p4     -      -      -         -      -      -
>>>     mfid5p4     -      -      -         -      -      -
>>>
>>> borg-new /home/ler $ sudo zpool get all
>>> NAME   PROPERTY                       VALUE                 SOURCE
>>> zroot  size                           10.8T                 -
>>> zroot  capacity                       0%                    -
>>> zroot  altroot                        -                     default
>>> zroot  health                         ONLINE                -
>>> zroot  guid                           11945658884309024932  default
>>> zroot  version                        -                     default
>>> zroot  bootfs                         zroot/ROOT/default    local
>>> zroot  delegation                     on                    default
>>> zroot  autoreplace                    off                   default
>>> zroot  cachefile                      -                     default
>>> zroot  failmode                       wait                  default
>>> zroot  listsnapshots                  off                   default
>>> zroot  autoexpand                     off                   default
>>> zroot  dedupditto                     0                     default
>>> zroot  dedupratio                     1.00x                 -
>>> zroot  free                           10.7T                 -
>>> zroot  allocated                      94.3G                 -
>>> zroot  readonly                       off                   -
>>> zroot  comment                        -                     default
>>> zroot  expandsize                     -                     -
>>> zroot  freeing                        0                     default
>>> zroot  fragmentation                  0%                    -
>>> zroot  leaked                         0                     default
>>> zroot  feature@async_destroy          enabled               local
>>> zroot  feature@empty_bpobj            active                local
>>> zroot  feature@lz4_compress           active                local
>>> zroot  feature@multi_vdev_crash_dump  enabled               local
>>> zroot  feature@spacemap_histogram     active                local
>>> zroot  feature@enabled_txg            active                local
>>> zroot  feature@hole_birth             active                local
>>> zroot  feature@extensible_dataset     enabled               local
>>> zroot  feature@embedded_data          active                local
>>> zroot  feature@bookmarks              enabled               local
>>> zroot  feature@filesystem_limits      enabled               local
>>> zroot  feature@large_blocks           enabled               local
>>> zroot  feature@sha512                 enabled               local
>>> zroot  feature@skein                  enabled               local
>>> borg-new /home/ler $
>>>
>>> On 01/31/2017 5:22 pm, Steven Hartland wrote:
>>>> Yep
>>>>
>>>> On 31/01/2017 21:49, Larry Rosenman wrote:
>>>>> revert the other patch and apply this one?
>>>>>
>>>>> On 01/31/2017 3:47 pm, Steven Hartland wrote:
>>>>>
>>>>> Hmm, looks like there's also a bug in the way vdev_min_asize is
>>>>> calculated for raidz, as it can (and has) resulted in a child
>>>>> min_asize which won't provide enough space for the parent, due
>>>>> to the use of unrounded integer division:
>>>>>
>>>>> 1981411579221 * 6 = 11888469475326 < 11888469475328
>>>>>
>>>>> You should have vdev_min_asize: 1981411579222 for your children.
>>>>>
>>>>> Updated patch attached; however, the calculation still isn't 100%
>>>>> reversible so it may need work. It does now ensure that the
>>>>> children will provide enough capacity for min_asize even if all
>>>>> of them are shrunk to their individual min_asize, which I
>>>>> believe previously may not have been the case.
>>>>>
>>>>> This isn't related to the incorrect EXPANDSZ output, but it would
>>>>> be good if you could confirm it doesn't cause any issues for
>>>>> your pool given its state.
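[The truncation described above is easy to reproduce standalone. The
following is an illustrative sketch only, not the attached patch; the
numbers are taken from the thread, and flooring versus ceiling division
is the whole story: the floored per-child value summed over six children
falls two bytes short of the parent's min_asize.]

    #include <stdio.h>
    #include <stdint.h>

    int
    main(void)
    {
            uint64_t pmin = 11888469475328ULL;      /* parent vdev_min_asize */
            uint64_t kids = 6;                      /* raidz1 child count */

            /* Truncating (unrounded) division, as in the buggy path. */
            uint64_t lo = pmin / kids;              /* 1981411579221 */
            /* Rounding up guarantees the children cover the parent. */
            uint64_t hi = (pmin + kids - 1) / kids; /* 1981411579222 */

            printf("floor: %ju * 6 = %ju (short by %ju)\n",
                (uintmax_t)lo, (uintmax_t)(lo * kids),
                (uintmax_t)(pmin - lo * kids));
            printf("ceil:  %ju * 6 = %ju (>= parent)\n",
                (uintmax_t)hi, (uintmax_t)(hi * kids));
            return (0);
    }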
>>>>>
>>>>> On 31/01/2017 21:00, Larry Rosenman wrote:
>>>>>
>>>>> borg-new /home/ler $ sudo ./vdev-stats.d
>>>>> Password:
>>>>> vdev_path: n/a, vdev_max_asize: 0, vdev_asize: 0,
>>>>>     vdev_min_asize: 0
>>>>> vdev_path: n/a, vdev_max_asize: 11947471798272, vdev_asize:
>>>>>     11947478089728, vdev_min_asize: 11888469475328
>>>>> vdev_path: /dev/mfid4p4, vdev_max_asize: 1991245299712,
>>>>>     vdev_asize: 1991245299712, vdev_min_asize: 1981411579221
>>>>> vdev_path: /dev/mfid0p4, vdev_max_asize: 1991246348288,
>>>>>     vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
>>>>> vdev_path: /dev/mfid1p4, vdev_max_asize: 1991246348288,
>>>>>     vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
>>>>> vdev_path: /dev/mfid3p4, vdev_max_asize: 1991247921152,
>>>>>     vdev_asize: 1991247921152, vdev_min_asize: 1981411579221
>>>>> vdev_path: /dev/mfid2p4, vdev_max_asize: 1991246348288,
>>>>>     vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
>>>>> vdev_path: /dev/mfid5p4, vdev_max_asize: 1991246348288,
>>>>>     vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
>>>>> ^C
>>>>> borg-new /home/ler $
>>>>>
>>>>> borg-new /home/ler $ sudo zpool list -v
>>>>> Password:
>>>>> NAME         SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
>>>>> zroot       10.8T  94.3G  10.7T     16.0E     0%     0%  1.00x  ONLINE  -
>>>>>   raidz1    10.8T  94.3G  10.7T     16.0E     0%     0%
>>>>>     mfid4p4     -      -      -         -      -      -
>>>>>     mfid0p4     -      -      -         -      -      -
>>>>>     mfid1p4     -      -      -         -      -      -
>>>>>     mfid3p4     -      -      -         -      -      -
>>>>>     mfid2p4     -      -      -         -      -      -
>>>>>     mfid5p4     -      -      -         -      -      -
>>>>> borg-new /home/ler $
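[The 16.0E shown above is the signature of an unsigned underflow:
vdev_asize exceeds vdev_max_asize by 6291456 bytes, so the subtraction
wraps to just under 2^64. A minimal sketch reproducing the figure,
using the values from the dtrace output and the metaslab_shift of 36
from the zdb output earlier; P2ALIGN is as defined in sys/sysmacros.h:]

    #include <stdio.h>
    #include <stdint.h>

    /* From sys/sysmacros.h: round x down to a power-of-two alignment. */
    #define P2ALIGN(x, align)   ((x) & -(align))

    int
    main(void)
    {
            uint64_t max_asize = 11947471798272ULL; /* top-level vdev_max_asize */
            uint64_t asize     = 11947478089728ULL; /* 6291456 bytes larger */
            uint64_t ms_shift  = 36;                /* metaslab_shift */

            /* Unsigned subtraction wraps to ~2^64 before alignment. */
            uint64_t esize = P2ALIGN(max_asize - asize, 1ULL << ms_shift);

            /* Prints 18446744004990074880, which zpool renders as 16.0E. */
            printf("vs_esize = %ju bytes (~16.0 EiB)\n", (uintmax_t)esize);
            return (0);
    }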
>>>>>
>>>>> On 01/31/2017 2:37 pm, Steven Hartland wrote:
>>>>>
>>>>> In that case, based on your zpool history, I suspect that the
>>>>> original mfid4p4 was the same size as mfid0p4 (1991246348288),
>>>>> but it's been replaced with a drive which is slightly smaller
>>>>> (1991245299712).
>>>>>
>>>>> This smaller size results in a max_asize of 1991245299712 * 6
>>>>> instead of the original 1991246348288 * 6.
>>>>>
>>>>> Now, given the way min_asize (the value used to check if the
>>>>> device size is acceptable) is rounded to the nearest metaslab,
>>>>> I believe that replace would be allowed:
>>>>> https://github.com/freebsd/freebsd/blob/master/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c#L4947
>>>>>
>>>>> Now the problem is that on open the calculated asize is only
>>>>> updated if it's expanding:
>>>>> https://github.com/freebsd/freebsd/blob/master/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c#L1424
>>>>>
>>>>> The updated dtrace file outputs vdev_min_asize, which should
>>>>> confirm my suspicion about why the replace was allowed.
>>>>>
>>>>> On 31/01/2017 19:05, Larry Rosenman wrote:
>>>>>
>>>>> I've replaced some disks due to failure, and some of the
>>>>> partition sizes are different.
>>>>>
>>>>> autoexpand is off:
>>>>>
>>>>> borg-new /home/ler $ zpool get all zroot
>>>>> NAME   PROPERTY                       VALUE                 SOURCE
>>>>> zroot  size                           10.8T                 -
>>>>> zroot  capacity                       0%                    -
>>>>> zroot  altroot                        -                     default
>>>>> zroot  health                         ONLINE                -
>>>>> zroot  guid                           11945658884309024932  default
>>>>> zroot  version                        -                     default
>>>>> zroot  bootfs                         zroot/ROOT/default    local
>>>>> zroot  delegation                     on                    default
>>>>> zroot  autoreplace                    off                   default
>>>>> zroot  cachefile                      -                     default
>>>>> zroot  failmode                       wait                  default
>>>>> zroot  listsnapshots                  off                   default
>>>>> zroot  autoexpand                     off                   default
>>>>> zroot  dedupditto                     0                     default
>>>>> zroot  dedupratio                     1.00x                 -
>>>>> zroot  free                           10.7T                 -
>>>>> zroot  allocated                      94.3G                 -
>>>>> zroot  readonly                       off                   -
>>>>> zroot  comment                        -                     default
>>>>> zroot  expandsize                     16.0E                 -
>>>>> zroot  freeing                        0                     default
>>>>> zroot  fragmentation                  0%                    -
>>>>> zroot  leaked                         0                     default
>>>>> zroot  feature@async_destroy          enabled               local
>>>>> zroot  feature@empty_bpobj            active                local
>>>>> zroot  feature@lz4_compress           active                local
>>>>> zroot  feature@multi_vdev_crash_dump  enabled               local
>>>>> zroot  feature@spacemap_histogram     active                local
>>>>> zroot  feature@enabled_txg            active                local
>>>>> zroot  feature@hole_birth             active                local
>>>>> zroot  feature@extensible_dataset     enabled               local
>>>>> zroot  feature@embedded_data          active                local
>>>>> zroot  feature@bookmarks              enabled               local
>>>>> zroot  feature@filesystem_limits      enabled               local
>>>>> zroot  feature@large_blocks           enabled               local
>>>>> zroot  feature@sha512                 enabled               local
>>>>> zroot  feature@skein                  enabled               local
>>>>> borg-new /home/ler $
>>>>>
>>>>> borg-new /home/ler $ gpart show
>>>>> =>        40  3905945520  mfid0  GPT  (1.8T)
>>>>>           40        1600      1  efi  (800K)
>>>>>         1640        1024      2  freebsd-boot  (512K)
>>>>>         2664        1432         - free -  (716K)
>>>>>         4096    16777216      3  freebsd-swap  (8.0G)
>>>>>     16781312  3889162240      4  freebsd-zfs  (1.8T)
>>>>>   3905943552        2008         - free -  (1.0M)
>>>>>
>>>>> =>        40  3905945520  mfid1  GPT  (1.8T)
>>>>>           40        1600      1  efi  (800K)
>>>>>         1640        1024      2  freebsd-boot  (512K)
>>>>>         2664        1432         - free -  (716K)
>>>>>         4096    16777216      3  freebsd-swap  (8.0G)
>>>>>     16781312  3889162240      4  freebsd-zfs  (1.8T)
>>>>>   3905943552        2008         - free -  (1.0M)
>>>>>
>>>>> =>        40  3905945520  mfid2  GPT  (1.8T)
>>>>>           40        1600      1  efi  (800K)
>>>>>         1640        1024      2  freebsd-boot  (512K)
>>>>>         2664        1432         - free -  (716K)
>>>>>         4096    16777216      3  freebsd-swap  (8.0G)
>>>>>     16781312  3889162240      4  freebsd-zfs  (1.8T)
>>>>>   3905943552        2008         - free -  (1.0M)
>>>>>
>>>>> =>        40  3905945520  mfid3  GPT  (1.8T)
>>>>>           40        1600      1  efi  (800K)
>>>>>         1640        1024      2  freebsd-boot  (512K)
>>>>>         2664    16777216      3  freebsd-swap  (8.0G)
>>>>>     16779880  3889165680      4  freebsd-zfs  (1.8T)
>>>>>
>>>>> =>        40  3905945520  mfid5  GPT  (1.8T)
>>>>>           40        1600      1  efi  (800K)
>>>>>         1640        1024      2  freebsd-boot  (512K)
>>>>>         2664        1432         - free -  (716K)
>>>>>         4096    16777216      3  freebsd-swap  (8.0G)
>>>>>     16781312  3889162240      4  freebsd-zfs  (1.8T)
>>>>>   3905943552        2008         - free -  (1.0M)
>>>>>
>>>>> =>        40  3905945520  mfid4  GPT  (1.8T)
>>>>>           40        1600      1  efi  (800K)
>>>>>         1640        1024      2  freebsd-boot  (512K)
>>>>>         2664        1432         - free -  (716K)
>>>>>         4096    16777216      3  freebsd-swap  (8.0G)
>>>>>     16781312  3889160192      4  freebsd-zfs  (1.8T)
>>>>>   3905941504        4056         - free -  (2.0M)
>>>>>
>>>>> borg-new /home/ler $
>>>>>
>>>>> this system was built last week, and I **CAN** rebuild it if
>>>>> necessary, but I didn't do anything strange (so I thought :) )
>>>>>
>>>>> On 01/31/2017 12:30 pm, Steven Hartland wrote:
>>>>>
>>>>> Your issue is that the reported vdev_max_asize < vdev_asize:
>>>>> vdev_max_asize: 11947471798272
>>>>> vdev_asize:     11947478089728
>>>>>
>>>>> max_asize is smaller than asize by 6291456.
>>>>>
>>>>> For raidz1, Xsize should be the smallest disk Xsize * disks, so:
>>>>> 1991245299712 * 6 = 11947471798272
>>>>>
>>>>> So your max_asize looks right, but asize looks too big.
>>>>>
>>>>> Expand Size is calculated by:
>>>>>
>>>>> if (vd->vdev_aux == NULL && tvd != NULL &&
>>>>>     vd->vdev_max_asize != 0) {
>>>>>         vs->vs_esize = P2ALIGN(vd->vdev_max_asize - vd->vdev_asize,
>>>>>             1ULL << tvd->vdev_ms_shift);
>>>>> }
>>>>>
>>>>> So the question is: why is asize too big?
>>>>>
>>>>> Given you seem to have some random disk sizes, do you have
>>>>> autoexpand turned on?
>>>>>
>>>>> On 31/01/2017 17:39, Larry Rosenman wrote:
>>>>>
>>>>> vdev_path: n/a, vdev_max_asize: 11947471798272, vdev_asize:
>>>>> 11947478089728
>>>>>
>>>>> --
>>>>> Larry Rosenman                  http://people.freebsd.org/~ler
>>>>> Phone: +1 214-642-9640                 E-Mail: ler@FreeBSD.org
>>>>> US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281
>>>
>>> --
>>> Larry Rosenman                  http://people.freebsd.org/~ler
>>> Phone: +1 214-642-9640                 E-Mail: ler@FreeBSD.org
>>> US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281
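[The vs_esize computation quoted above has no guard for vdev_asize
having drifted above vdev_max_asize, which is exactly the state this
pool is in after the smaller replacement disk. A purely hypothetical
clamp is sketched below for illustration; it is not the fix from the
upstream PR linked earlier, which takes its own approach:]

    /*
     * Hypothetical guard, illustration only: report expand space
     * only when max_asize is actually ahead of asize, so the
     * unsigned subtraction can never wrap.
     */
    if (vd->vdev_aux == NULL && tvd != NULL &&
        vd->vdev_max_asize >= vd->vdev_asize) {
            vs->vs_esize = P2ALIGN(vd->vdev_max_asize - vd->vdev_asize,
                1ULL << tvd->vdev_ms_shift);
    } else {
            vs->vs_esize = 0;
    }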