Date:      Sat, 4 Feb 2017 21:54:39 -0600
From:      Larry Rosenman <ler@lerctr.org>
To:        Steven Hartland <killing@multiplay.co.uk>
Cc:        Larry Rosenman <ler@FreeBSD.org>, Freebsd fs <freebsd-fs@freebsd.org>
Subject:   Re: 16.0E ExpandSize? -- New Server
Message-ID:  <20170205035438.6gc2ybg6otidzpaz@borg.lerctr.org>
In-Reply-To: <8387d38f-3185-8c07-396b-602c708002a6@multiplay.co.uk>
References:  <22e1bfc5840d972cf93643733682cda1@FreeBSD.org> <f2600a53-0dc1-9f41-1405-ed22d96d30cf@multiplay.co.uk> <8a710dc75c129f58b0372eeaeca575b5@FreeBSD.org> <aef02eb0-0888-6fea-a4b8-4033ca56f4a3@multiplay.co.uk> <d3181bd00c827fb99fbcebe6fe097ef8@FreeBSD.org> <a3d78923-5046-11c8-daea-713eacf47bd2@multiplay.co.uk> <ffc24b7bfacd265d637b633566bbaa51@FreeBSD.org> <96534515-4fcb-774e-a599-8d48aec930cd@multiplay.co.uk> <a98b3a3da1665c8eac6160633a0bc778@FreeBSD.org> <8387d38f-3185-8c07-396b-602c708002a6@multiplay.co.uk>

I saw it was accepted upstream. Can it be committed to FreeBSD?


On Wed, Feb 01, 2017 at 02:43:51AM +0000, Steven Hartland wrote:
> Thanks, I've put a PR in upstream to get some eyes on the fix.
> https://github.com/openzfs/openzfs/pull/296
> 
> If no objections are raised to the approach I've used I'll commit the fix to
> HEAD too.
> 
> On 01/02/2017 02:31, Larry Rosenman wrote:
> > 
> > no grief that I can see:
> > 
> > borg-new /home/ler $ sudo zdb
> > Password:
> > zroot:
> > version: 5000
> > name: 'zroot'
> > state: 0
> > txg: 96143
> > pool_guid: 11945658884309024932
> > hostid: 3619181042
> > hostname: ''
> > com.delphix:has_per_vdev_zaps
> > vdev_children: 1
> > vdev_tree:
> > type: 'root'
> > id: 0
> > guid: 11945658884309024932
> > create_txg: 4
> > children[0]:
> > type: 'raidz'
> > id: 0
> > guid: 7596925654112466913
> > nparity: 1
> > metaslab_array: 42
> > metaslab_shift: 36
> > ashift: 12
> > asize: 11947471798272
> > is_log: 0
> > create_txg: 4
> > com.delphix:vdev_zap_top: 35
> > children[0]:
> > type: 'disk'
> > id: 0
> > guid: 1443238581175429852
> > path: '/dev/mfid4p4'
> > whole_disk: 1
> > DTL: 137
> > create_txg: 4
> > com.delphix:vdev_zap_leaf: 131
> > children[1]:
> > type: 'disk'
> > id: 1
> > guid: 1865792721003775978
> > path: '/dev/mfid0p4'
> > whole_disk: 1
> > DTL: 133
> > create_txg: 4
> > com.delphix:vdev_zap_leaf: 37
> > children[2]:
> > type: 'disk'
> > id: 2
> > guid: 12541720522827927342
> > path: '/dev/mfid1p4'
> > whole_disk: 1
> > DTL: 132
> > create_txg: 4
> > com.delphix:vdev_zap_leaf: 38
> > children[3]:
> > type: 'disk'
> > id: 3
> > guid: 13053934791777776444
> > path: '/dev/mfid3p4'
> > whole_disk: 1
> > DTL: 136
> > create_txg: 4
> > com.delphix:vdev_zap_leaf: 135
> > children[4]:
> > type: 'disk'
> > id: 4
> > guid: 4432707573898874857
> > path: '/dev/mfid2p4'
> > whole_disk: 1
> > DTL: 130
> > create_txg: 4
> > com.delphix:vdev_zap_leaf: 40
> > children[5]:
> > type: 'disk'
> > id: 5
> > guid: 5106293125005422556
> > path: '/dev/mfid5p4'
> > whole_disk: 1
> > DTL: 129
> > create_txg: 4
> > com.delphix:vdev_zap_leaf: 41
> > features_for_read:
> > com.delphix:hole_birth
> > com.delphix:embedded_data
> > borg-new /home/ler $ sudo zpool list -v
> > NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
> > zroot 10.8T 94.3G 10.7T - 0% 0% 1.00x ONLINE -
> > raidz1 10.8T 94.3G 10.7T - 0% 0%
> > mfid4p4 - - - - - -
> > mfid0p4 - - - - - -
> > mfid1p4 - - - - - -
> > mfid3p4 - - - - - -
> > mfid2p4 - - - - - -
> > mfid5p4 - - - - - -
> > borg-new /home/ler $ sudo zpool get all
> > NAME PROPERTY VALUE SOURCE
> > zroot size 10.8T -
> > zroot capacity 0% -
> > zroot altroot - default
> > zroot health ONLINE -
> > zroot guid 11945658884309024932 default
> > zroot version - default
> > zroot bootfs zroot/ROOT/default local
> > zroot delegation on default
> > zroot autoreplace off default
> > zroot cachefile - default
> > zroot failmode wait default
> > zroot listsnapshots off default
> > zroot autoexpand off default
> > zroot dedupditto 0 default
> > zroot dedupratio 1.00x -
> > zroot free 10.7T -
> > zroot allocated 94.3G -
> > zroot readonly off -
> > zroot comment - default
> > zroot expandsize - -
> > zroot freeing 0 default
> > zroot fragmentation 0% -
> > zroot leaked 0 default
> > zroot feature@async_destroy enabled local
> > zroot feature@empty_bpobj active local
> > zroot feature@lz4_compress active local
> > zroot feature@multi_vdev_crash_dump enabled local
> > zroot feature@spacemap_histogram active local
> > zroot feature@enabled_txg active local
> > zroot feature@hole_birth active local
> > zroot feature@extensible_dataset enabled local
> > zroot feature@embedded_data active local
> > zroot feature@bookmarks enabled local
> > zroot feature@filesystem_limits enabled local
> > zroot feature@large_blocks enabled local
> > zroot feature@sha512 enabled local
> > zroot feature@skein enabled local
> > borg-new /home/ler $
> > 
> > 
> > 
> > On 01/31/2017 5:22 pm, Steven Hartland wrote:
> > 
> > > Yep
> > > 
> > > On 31/01/2017 21:49, Larry Rosenman wrote:
> > > > 
> > > > revert the other patch and apply this one?
> > > > 
> > > > 
> > > > 
> > > > On 01/31/2017 3:47 pm, Steven Hartland wrote:
> > > > 
> > > >     Hmm, looks like there's also a bug in the way vdev_min_asize is
> > > >     calculated for raidz as it can and has resulted in child
> > > >     min_asize which won't provide enough space for the parent due
> > > >     to the use of unrounded integer division.
> > > > 
> > > >     1981411579221 * 6 = 11888469475326 < 11888469475328
> > > > 
> > > >     You should have vdev_min_asize: 1981411579222 for your children.
> > > > 
> > > >     Updated patch attached, however calculation still isn't 100%
> > > >     reversible so may need work, however it does now ensure that the
> > > >     children will provide enough capacity for min_asize even if all
> > > >     of them are shrunk to their individual min_asize, which I
> > > >     believe previously may not have been the case.
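
To illustrate the rounding problem described above, here is a minimal Python sketch comparing truncating division with round-up division, using the figures quoted in this thread. It is not the ZFS code or the attached patch; the helper names are hypothetical and chosen only for this example.

```python
# Illustrative only: these helpers are hypothetical, not ZFS functions.

def child_min_asize_floor(parent_min_asize, children):
    # Truncating integer division: the children may fall short of the parent.
    return parent_min_asize // children

def child_min_asize_ceil(parent_min_asize, children):
    # Round up so that children * result >= parent_min_asize always holds.
    return (parent_min_asize + children - 1) // children

parent_min = 11888469475328   # raidz vdev_min_asize from the dtrace output
children = 6

floor_val = child_min_asize_floor(parent_min, children)
ceil_val = child_min_asize_ceil(parent_min, children)

# Print each per-child value and whether six of them cover the parent.
print(floor_val, floor_val * children >= parent_min)
print(ceil_val, ceil_val * children >= parent_min)
```

With truncation the six children together come up two bytes short of the parent's minimum; rounding up closes that gap at the cost of a few bytes of slack.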
> > > > 
> > > >     This isn't related to the incorrect EXPANDSZ output, but would
> > > >     be good if you could confirm it doesn't cause any issues for
> > > >     your pool given its state.
> > > > 
> > > >     On 31/01/2017 21:00, Larry Rosenman wrote:
> > > > 
> > > >         borg-new /home/ler $ sudo ./vdev-stats.d
> > > >         Password:
> > > >         vdev_path: n/a, vdev_max_asize: 0, vdev_asize: 0,
> > > >         vdev_min_asize: 0
> > > >         vdev_path: n/a, vdev_max_asize: 11947471798272, vdev_asize:
> > > >         11947478089728, vdev_min_asize: 11888469475328
> > > >         vdev_path: /dev/mfid4p4, vdev_max_asize: 1991245299712,
> > > >         vdev_asize: 1991245299712, vdev_min_asize: 1981411579221
> > > >         vdev_path: /dev/mfid0p4, vdev_max_asize: 1991246348288,
> > > >         vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
> > > >         vdev_path: /dev/mfid1p4, vdev_max_asize: 1991246348288,
> > > >         vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
> > > >         vdev_path: /dev/mfid3p4, vdev_max_asize: 1991247921152,
> > > >         vdev_asize: 1991247921152, vdev_min_asize: 1981411579221
> > > >         vdev_path: /dev/mfid2p4, vdev_max_asize: 1991246348288,
> > > >         vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
> > > >         vdev_path: /dev/mfid5p4, vdev_max_asize: 1991246348288,
> > > >         vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
> > > >         ^C
> > > > 
> > > >         borg-new /home/ler $
> > > > 
> > > > 
> > > >         borg-new /home/ler $ sudo zpool list -v
> > > >         Password:
> > > >         NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
> > > >         zroot 10.8T 94.3G 10.7T 16.0E 0% 0% 1.00x ONLINE -
> > > >         raidz1 10.8T 94.3G 10.7T 16.0E 0% 0%
> > > >         mfid4p4 - - - - - -
> > > >         mfid0p4 - - - - - -
> > > >         mfid1p4 - - - - - -
> > > >         mfid3p4 - - - - - -
> > > >         mfid2p4 - - - - - -
> > > >         mfid5p4 - - - - - -
> > > >         borg-new /home/ler $
> > > > 
> > > > 
> > > >         On 01/31/2017 2:37 pm, Steven Hartland wrote:
> > > > 
> > > >             In that case based on your zpool history I suspect that
> > > >             the original mfid4p4 was the same size as mfid0p4
> > > >             (1991246348288) but it's been replaced with a drive which
> > > >             is (1991245299712), slightly smaller.
> > > > 
> > > >             This smaller size results in a max_asize of
> > > >             1991245299712 * 6 instead of the original 1991246348288 * 6.
> > > > 
> > > >             Now given the way min_asize (the value used to check if
> > > >             the device size is acceptable) is rounded to the
> > > >             nearest metaslab I believe that replace would be allowed.
> > > >             https://github.com/freebsd/freebsd/blob/master/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c#L4947
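
A simplified model of the size check discussed above may help show why the slightly smaller drive was accepted. This is only a sketch under the assumption that a replacement is accepted when it is at least as large as the minimum size required of the device it replaces; `replace_allowed` is a hypothetical name, not a function in spa.c. The numbers are taken from the dtrace output elsewhere in this thread.

```python
# Simplified model of the size check; not the real spa.c code.

def replace_allowed(new_dev_asize, old_dev_min_asize):
    # Assumption: the new device only has to meet the old device's minimum size.
    return new_dev_asize >= old_dev_min_asize

min_asize = 1981411579221      # per-child vdev_min_asize from the dtrace output
original = 1991246348288       # size of mfid0p4 (suspected size of the original mfid4p4)
replacement = 1991245299712    # size of the current mfid4p4

print(replace_allowed(replacement, min_asize))
print(original - replacement)  # how much smaller the replacement is, in bytes
```

The replacement is 1 MiB smaller than its siblings yet still far above the minimum, so a check of this shape would let it through.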
> > > > 
> > > >             Now the problem is that on open the calculated asize is
> > > >             only updated if it's expanding:
> > > >             https://github.com/freebsd/freebsd/blob/master/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c#L1424
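
The effect described above can be reproduced with a toy model. The sketch below assumes, as the message suggests, that max_asize always tracks what the children can currently provide while asize is only ever raised. The `Vdev` class and its `reopen` method are hypothetical simplifications, not the actual vdev.c logic.

```python
# Toy model of the behaviour described above; a simplification, not vdev.c.

class Vdev:
    def __init__(self, asize):
        self.asize = asize
        self.max_asize = asize

    def reopen(self, computed_asize):
        # Assumption: max_asize always follows the freshly computed size...
        self.max_asize = computed_asize
        # ...but asize is only updated when the vdev grows, never when it shrinks.
        if computed_asize > self.asize:
            self.asize = computed_asize

raidz = Vdev(1991246348288 * 6)   # all six children at the original size
raidz.reopen(1991245299712 * 6)   # smallest child is now 1 MiB smaller

print(raidz.asize, raidz.max_asize, raidz.max_asize - raidz.asize)
```

After the reopen, max_asize ends up below asize, which is the inconsistent state reported for this pool.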
> > > > 
> > > >             The updated dtrace file outputs vdev_min_asize which
> > > >             should confirm my suspicion about why the replace was
> > > >             allowed.
> > > > 
> > > >             On 31/01/2017 19:05, Larry Rosenman wrote:
> > > > 
> > > >                 I've replaced some disks due to failure, and some of
> > > >                 the partition sizes are different.
> > > > 
> > > > 
> > > >                 autoexpand is off:
> > > > 
> > > >                 borg-new /home/ler $ zpool get all zroot
> > > >                 NAME PROPERTY VALUE SOURCE
> > > >                 zroot size 10.8T -
> > > >                 zroot capacity 0% -
> > > >                 zroot altroot - default
> > > >                 zroot health ONLINE -
> > > >                 zroot guid 11945658884309024932 default
> > > >                 zroot version - default
> > > >                 zroot bootfs zroot/ROOT/default local
> > > >                 zroot delegation on default
> > > >                 zroot autoreplace off default
> > > >                 zroot cachefile - default
> > > >                 zroot failmode wait default
> > > >                 zroot listsnapshots off default
> > > >                 zroot autoexpand off default
> > > >                 zroot dedupditto 0 default
> > > >                 zroot dedupratio 1.00x -
> > > >                 zroot free 10.7T -
> > > >                 zroot allocated 94.3G -
> > > >                 zroot readonly off -
> > > >                 zroot comment - default
> > > >                 zroot expandsize 16.0E -
> > > >                 zroot freeing 0 default
> > > >                 zroot fragmentation 0% -
> > > >                 zroot leaked 0 default
> > > >                 zroot feature@async_destroy enabled local
> > > >                 zroot feature@empty_bpobj active local
> > > >                 zroot feature@lz4_compress active local
> > > >                 zroot feature@multi_vdev_crash_dump enabled local
> > > >                 zroot feature@spacemap_histogram active local
> > > >                 zroot feature@enabled_txg active local
> > > >                 zroot feature@hole_birth active local
> > > >                 zroot feature@extensible_dataset enabled local
> > > >                 zroot feature@embedded_data active local
> > > >                 zroot feature@bookmarks enabled local
> > > >                 zroot feature@filesystem_limits enabled local
> > > >                 zroot feature@large_blocks enabled local
> > > >                 zroot feature@sha512 enabled local
> > > >                 zroot feature@skein enabled local
> > > >                 borg-new /home/ler $
> > > > 
> > > > 
> > > >                 borg-new /home/ler $ gpart show
> > > >                 => 40 3905945520 mfid0 GPT (1.8T)
> > > >                 40 1600 1 efi (800K)
> > > >                 1640 1024 2 freebsd-boot (512K)
> > > >                 2664 1432 - free - (716K)
> > > >                 4096 16777216 3 freebsd-swap (8.0G)
> > > >                 16781312 3889162240 4 freebsd-zfs (1.8T)
> > > >                 3905943552 2008 - free - (1.0M)
> > > > 
> > > >                 => 40 3905945520 mfid1 GPT (1.8T)
> > > >                 40 1600 1 efi (800K)
> > > >                 1640 1024 2 freebsd-boot (512K)
> > > >                 2664 1432 - free - (716K)
> > > >                 4096 16777216 3 freebsd-swap (8.0G)
> > > >                 16781312 3889162240 4 freebsd-zfs (1.8T)
> > > >                 3905943552 2008 - free - (1.0M)
> > > > 
> > > >                 => 40 3905945520 mfid2 GPT (1.8T)
> > > >                 40 1600 1 efi (800K)
> > > >                 1640 1024 2 freebsd-boot (512K)
> > > >                 2664 1432 - free - (716K)
> > > >                 4096 16777216 3 freebsd-swap (8.0G)
> > > >                 16781312 3889162240 4 freebsd-zfs (1.8T)
> > > >                 3905943552 2008 - free - (1.0M)
> > > > 
> > > >                 => 40 3905945520 mfid3 GPT (1.8T)
> > > >                 40 1600 1 efi (800K)
> > > >                 1640 1024 2 freebsd-boot (512K)
> > > >                 2664 16777216 3 freebsd-swap (8.0G)
> > > >                 16779880 3889165680 4 freebsd-zfs (1.8T)
> > > > 
> > > >                 => 40 3905945520 mfid5 GPT (1.8T)
> > > >                 40 1600 1 efi (800K)
> > > >                 1640 1024 2 freebsd-boot (512K)
> > > >                 2664 1432 - free - (716K)
> > > >                 4096 16777216 3 freebsd-swap (8.0G)
> > > >                 16781312 3889162240 4 freebsd-zfs (1.8T)
> > > >                 3905943552 2008 - free - (1.0M)
> > > > 
> > > >                 => 40 3905945520 mfid4 GPT (1.8T)
> > > >                 40 1600 1 efi (800K)
> > > >                 1640 1024 2 freebsd-boot (512K)
> > > >                 2664 1432 - free - (716K)
> > > >                 4096 16777216 3 freebsd-swap (8.0G)
> > > >                 16781312 3889160192 4 freebsd-zfs (1.8T)
> > > >                 3905941504 4056 - free - (2.0M)
> > > > 
> > > >                 borg-new /home/ler $
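
As a cross-check, the partition sizes in the gpart output above can be related to the vdev_asize values reported by the dtrace script elsewhere in this thread. The sketch below rests on several assumptions that are not stated in the thread: 512-byte sectors, a 256 KiB vdev label, 4 MiB reserved at the start of each device and two labels reserved at the end, with the device size first aligned down to a label boundary. `usable_asize` is a hypothetical helper.

```python
SECTOR = 512                 # assumed sector size
LABEL = 256 * 1024           # assumed size of one vdev label
FRONT = 4 * 1024 * 1024      # assumed reserved space at the start of the device
BACK = 2 * LABEL             # assumed reserved space at the end of the device

def usable_asize(sectors):
    psize = sectors * SECTOR
    aligned = psize - (psize % LABEL)   # align down to a label boundary
    return aligned - FRONT - BACK

# freebsd-zfs partition sizes (in sectors) from the gpart output
for name, sectors in [("mfid0p4", 3889162240),
                      ("mfid3p4", 3889165680),
                      ("mfid4p4", 3889160192)]:
    print(name, usable_asize(sectors))
```

Under these assumptions the results line up with the per-disk vdev_asize figures in the dtrace output, including the 1 MiB shortfall on mfid4p4.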
> > > > 
> > > > 
> > > >                 this system was built last week, and I **CAN**
> > > >                 rebuild it if necessary, but I didn't do anything
> > > >                 strange (so I thought :) )
> > > > 
> > > > 
> > > > 
> > > > 
> > > >                 On 01/31/2017 12:30 pm, Steven Hartland wrote:
> > > > 
> > > >                     Your issue is the reported vdev_max_asize <
> > > >                     vdev_asize:
> > > >                     vdev_max_asize: 11947471798272
> > > >                     vdev_asize:     11947478089728
> > > > 
> > > >                     max asize is smaller than asize by 6291456
> > > > 
> > > >                     For raidz1 Xsize should be the smallest disk
> > > >                     Xsize * disks so:
> > > >                     1991245299712 * 6 = 11947471798272
> > > > 
> > > >                     So your max asize looks right but asize looks
> > > >                     too big
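
The arithmetic above can be checked with a short sketch. It assumes the rule stated in the message (raidz size is the smallest child size times the number of children) and uses the per-child vdev_asize values from the dtrace output elsewhere in this thread.

```python
# Per-child vdev_asize values as reported by the dtrace script in this thread.
children = {
    "mfid4p4": 1991245299712,
    "mfid0p4": 1991246348288,
    "mfid1p4": 1991246348288,
    "mfid3p4": 1991247921152,
    "mfid2p4": 1991246348288,
    "mfid5p4": 1991246348288,
}
reported_asize = 11947478089728   # raidz vdev_asize from the same output

# Smallest child times number of children, per the rule stated above.
expected = min(children.values()) * len(children)

print(expected)
print(reported_asize - expected)   # excess of the stored asize, in bytes
```

The expected value matches the reported vdev_max_asize, and the stored asize exceeds it by exactly six times the 1 MiB difference between mfid4p4 and its original siblings.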
> > > > 
> > > >                     Expand Size is calculated by:
> > > >                     if (vd->vdev_aux == NULL && tvd != NULL &&
> > > >                         vd->vdev_max_asize != 0) {
> > > >                             vs->vs_esize = P2ALIGN(
> > > >                                 vd->vdev_max_asize - vd->vdev_asize,
> > > >                                 1ULL << tvd->vdev_ms_shift);
> > > >                     }
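
A small sketch shows how the calculation above can yield 16.0E when max_asize is smaller than asize. It assumes the size fields are unsigned 64-bit integers, so the subtraction wraps around, and it uses the metaslab_shift of 36 from the zdb output in this thread. The Python helpers are hypothetical stand-ins that emulate the C arithmetic; they are not the ZFS code.

```python
MASK64 = (1 << 64) - 1

def p2align(x, align):
    # Round x down to a multiple of align (a power of two), in 64-bit arithmetic.
    return x & (-align & MASK64)

def expand_size(max_asize, asize, ms_shift):
    # Emulate uint64_t wraparound on the subtraction (an assumption, see above).
    diff = (max_asize - asize) & MASK64
    return p2align(diff, 1 << ms_shift)

esize = expand_size(11947471798272, 11947478089728, 36)
print(esize)
print(round(esize / 2**60, 1))   # size in EiB, rounded to one decimal place
```

The difference of -6291456 wraps to just under 2^64 and, after alignment, remains within a rounding error of 16 EiB, which is consistent with the EXPANDSZ value shown by zpool list.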
> > > > 
> > > >                     So the question is why is asize too big?
> > > > 
> > > >                     Given you seem to have some random disk sizes do
> > > >                     you have auto expand turned on?
> > > > 
> > > >                     On 31/01/2017 17:39, Larry Rosenman wrote:
> > > > 
> > > >                         vdev_path: n/a, vdev_max_asize:
> > > >                         11947471798272, vdev_asize: 11947478089728
> > > > 
> > > > 
> > > >                 --                 Larry Rosenman
> > > > http://people.freebsd.org/~ler
> > > >                 <http://people.freebsd.org/%7Eler>;
> > > >                 Phone: +1 214-642-9640                 E-Mail:
> > > >                 ler@FreeBSD.org <mailto:ler@FreeBSD.org>
> > > >                 US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281
> > > > 
> > > > 
> > > >         --         Larry Rosenman http://people.freebsd.org/~ler
> > > >         <http://people.freebsd.org/%7Eler>;
> > > >         Phone: +1 214-642-9640                 E-Mail:
> > > >         ler@FreeBSD.org <mailto:ler@FreeBSD.org>
> > > >         US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281
> > > > 
> > > > 
> > > > -- 
> > > > Larry Rosenman http://people.freebsd.org/~ler
> > > > <http://people.freebsd.org/%7Eler>;
> > > > Phone: +1 214-642-9640                 E-Mail: ler@FreeBSD.org
> > > > <mailto:ler@FreeBSD.org>
> > > > US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281
> > 
> > 
> > -- 
> > Larry Rosenman http://people.freebsd.org/~ler
> > <http://people.freebsd.org/%7Eler>;
> > Phone: +1 214-642-9640                 E-Mail: ler@FreeBSD.org
> > <mailto:ler@FreeBSD.org>
> > US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281
> 

-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 214-642-9640                 E-Mail: ler@lerctr.org
US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281


