Date:      Sun, 5 Feb 2017 23:16:27 +0000
From:      Steven Hartland <killing@multiplay.co.uk>
To:        Larry Rosenman <ler@FreeBSD.org>, Freebsd fs <freebsd-fs@freebsd.org>
Subject:   Re: 16.0E ExpandSize? -- New Server
Message-ID:  <3691c96a-97c4-6bd1-80b5-02cd929cd219@multiplay.co.uk>
In-Reply-To: <20170205035438.6gc2ybg6otidzpaz@borg.lerctr.org>
References:  <22e1bfc5840d972cf93643733682cda1@FreeBSD.org> <f2600a53-0dc1-9f41-1405-ed22d96d30cf@multiplay.co.uk> <8a710dc75c129f58b0372eeaeca575b5@FreeBSD.org> <aef02eb0-0888-6fea-a4b8-4033ca56f4a3@multiplay.co.uk> <d3181bd00c827fb99fbcebe6fe097ef8@FreeBSD.org> <a3d78923-5046-11c8-daea-713eacf47bd2@multiplay.co.uk> <ffc24b7bfacd265d637b633566bbaa51@FreeBSD.org> <96534515-4fcb-774e-a599-8d48aec930cd@multiplay.co.uk> <a98b3a3da1665c8eac6160633a0bc778@FreeBSD.org> <8387d38f-3185-8c07-396b-602c708002a6@multiplay.co.uk> <20170205035438.6gc2ybg6otidzpaz@borg.lerctr.org>

It's actually still waiting on review; the merged icon at the bottom was 
for an unrelated PR which referenced mine, as they saw the same random 
test failure as I did.

On 05/02/2017 03:54, Larry Rosenman wrote:
> I saw it was accepted upstream. Can it be committed to FreeBSD?
>
>
> On Wed, Feb 01, 2017 at 02:43:51AM +0000, Steven Hartland wrote:
>> Thanks, I've put a PR in upstream to get some eyes on the fix.
>> https://github.com/openzfs/openzfs/pull/296
>>
>> If no objections are raised to the approach I've used I'll commit the fix to
>> HEAD too.
>>
>> On 01/02/2017 02:31, Larry Rosenman wrote:
>>> no grief that I can see:
>>>
>>> borg-new /home/ler $ sudo zdb
>>> Password:
>>> zroot:
>>> version: 5000
>>> name: 'zroot'
>>> state: 0
>>> txg: 96143
>>> pool_guid: 11945658884309024932
>>> hostid: 3619181042
>>> hostname: ''
>>> com.delphix:has_per_vdev_zaps
>>> vdev_children: 1
>>> vdev_tree:
>>> type: 'root'
>>> id: 0
>>> guid: 11945658884309024932
>>> create_txg: 4
>>> children[0]:
>>> type: 'raidz'
>>> id: 0
>>> guid: 7596925654112466913
>>> nparity: 1
>>> metaslab_array: 42
>>> metaslab_shift: 36
>>> ashift: 12
>>> asize: 11947471798272
>>> is_log: 0
>>> create_txg: 4
>>> com.delphix:vdev_zap_top: 35
>>> children[0]:
>>> type: 'disk'
>>> id: 0
>>> guid: 1443238581175429852
>>> path: '/dev/mfid4p4'
>>> whole_disk: 1
>>> DTL: 137
>>> create_txg: 4
>>> com.delphix:vdev_zap_leaf: 131
>>> children[1]:
>>> type: 'disk'
>>> id: 1
>>> guid: 1865792721003775978
>>> path: '/dev/mfid0p4'
>>> whole_disk: 1
>>> DTL: 133
>>> create_txg: 4
>>> com.delphix:vdev_zap_leaf: 37
>>> children[2]:
>>> type: 'disk'
>>> id: 2
>>> guid: 12541720522827927342
>>> path: '/dev/mfid1p4'
>>> whole_disk: 1
>>> DTL: 132
>>> create_txg: 4
>>> com.delphix:vdev_zap_leaf: 38
>>> children[3]:
>>> type: 'disk'
>>> id: 3
>>> guid: 13053934791777776444
>>> path: '/dev/mfid3p4'
>>> whole_disk: 1
>>> DTL: 136
>>> create_txg: 4
>>> com.delphix:vdev_zap_leaf: 135
>>> children[4]:
>>> type: 'disk'
>>> id: 4
>>> guid: 4432707573898874857
>>> path: '/dev/mfid2p4'
>>> whole_disk: 1
>>> DTL: 130
>>> create_txg: 4
>>> com.delphix:vdev_zap_leaf: 40
>>> children[5]:
>>> type: 'disk'
>>> id: 5
>>> guid: 5106293125005422556
>>> path: '/dev/mfid5p4'
>>> whole_disk: 1
>>> DTL: 129
>>> create_txg: 4
>>> com.delphix:vdev_zap_leaf: 41
>>> features_for_read:
>>> com.delphix:hole_birth
>>> com.delphix:embedded_data
>>> borg-new /home/ler $ sudo zpool list -v
>>> NAME           SIZE  ALLOC   FREE  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
>>> zroot         10.8T  94.3G  10.7T         -    0%   0%  1.00x  ONLINE  -
>>>   raidz1      10.8T  94.3G  10.7T         -    0%   0%
>>>     mfid4p4       -      -      -         -     -    -
>>>     mfid0p4       -      -      -         -     -    -
>>>     mfid1p4       -      -      -         -     -    -
>>>     mfid3p4       -      -      -         -     -    -
>>>     mfid2p4       -      -      -         -     -    -
>>>     mfid5p4       -      -      -         -     -    -
>>> borg-new /home/ler $ sudo zpool get all
>>> NAME PROPERTY VALUE SOURCE
>>> zroot size 10.8T -
>>> zroot capacity 0% -
>>> zroot altroot - default
>>> zroot health ONLINE -
>>> zroot guid 11945658884309024932 default
>>> zroot version - default
>>> zroot bootfs zroot/ROOT/default local
>>> zroot delegation on default
>>> zroot autoreplace off default
>>> zroot cachefile - default
>>> zroot failmode wait default
>>> zroot listsnapshots off default
>>> zroot autoexpand off default
>>> zroot dedupditto 0 default
>>> zroot dedupratio 1.00x -
>>> zroot free 10.7T -
>>> zroot allocated 94.3G -
>>> zroot readonly off -
>>> zroot comment - default
>>> zroot expandsize - -
>>> zroot freeing 0 default
>>> zroot fragmentation 0% -
>>> zroot leaked 0 default
>>> zroot feature@async_destroy enabled local
>>> zroot feature@empty_bpobj active local
>>> zroot feature@lz4_compress active local
>>> zroot feature@multi_vdev_crash_dump enabled local
>>> zroot feature@spacemap_histogram active local
>>> zroot feature@enabled_txg active local
>>> zroot feature@hole_birth active local
>>> zroot feature@extensible_dataset enabled local
>>> zroot feature@embedded_data active local
>>> zroot feature@bookmarks enabled local
>>> zroot feature@filesystem_limits enabled local
>>> zroot feature@large_blocks enabled local
>>> zroot feature@sha512 enabled local
>>> zroot feature@skein enabled local
>>> borg-new /home/ler $
>>>
>>>
>>>
>>> On 01/31/2017 5:22 pm, Steven Hartland wrote:
>>>
>>>> Yep
>>>>
>>>> On 31/01/2017 21:49, Larry Rosenman wrote:
>>>>> revert the other patch and apply this one?
>>>>>
>>>>>
>>>>>
>>>>> On 01/31/2017 3:47 pm, Steven Hartland wrote:
>>>>>
>>>>>      Hmm, looks like there's also a bug in the way vdev_min_asize is
>>>>>      calculated for raidz: it can result, and here has resulted, in a
>>>>>      child min_asize which won't provide enough space for the parent,
>>>>>      due to the use of unrounded (truncating) integer division.
>>>>>
>>>>>      1981411579221 * 6 = 11888469475326 < 11888469475328
>>>>>
>>>>>      You should have vdev_min_asize: 1981411579222 for your children.
>>>>>
>>>>>      Updated patch attached. The calculation still isn't 100%
>>>>>      reversible, so it may need more work, but it does now ensure that
>>>>>      the children will provide enough capacity for min_asize even if
>>>>>      all of them are shrunk to their individual min_asize, which I
>>>>>      believe previously may not have been the case.
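
To illustrate the rounding issue described above, here is a minimal sketch in C. It is not the attached patch: the function names are hypothetical, and the round-up form is only the usual ceiling-division idiom, assumed here as one way to guarantee that the children's shares cover the parent's min_asize.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Truncating split: each child's share is rounded down, so the shares
 * multiplied back up can fall short of the parent's min_asize.
 * (Hypothetical helper name, for illustration only.)
 */
uint64_t
child_min_asize_trunc(uint64_t parent_min_asize, uint64_t children)
{
	assert(children != 0);
	return (parent_min_asize / children);
}

/*
 * Round-up split (ceiling division): children * share is always
 * greater than or equal to parent_min_asize.
 * (Hypothetical helper name, for illustration only.)
 */
uint64_t
child_min_asize_roundup(uint64_t parent_min_asize, uint64_t children)
{
	assert(children != 0);
	return ((parent_min_asize + children - 1) / children);
}
```

With the values from this pool (parent min_asize 11888469475328, 6 children), the truncating form gives 1981411579221 and the round-up form gives 1981411579222. The round-up result multiplied by 6 is 11888469475332, slightly more than the parent value, which is one way to read the remark that the calculation is not fully reversible.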
>>>>>
>>>>>      This isn't related to the incorrect EXPANDSZ output, but it would
>>>>>      be good if you could confirm it doesn't cause any issues for
>>>>>      your pool given its state.
>>>>>
>>>>>      On 31/01/2017 21:00, Larry Rosenman wrote:
>>>>>
>>>>>          borg-new /home/ler $ sudo ./vdev-stats.d
>>>>>          Password:
>>>>>          vdev_path: n/a, vdev_max_asize: 0, vdev_asize: 0,
>>>>>          vdev_min_asize: 0
>>>>>          vdev_path: n/a, vdev_max_asize: 11947471798272, vdev_asize:
>>>>>          11947478089728, vdev_min_asize: 11888469475328
>>>>>          vdev_path: /dev/mfid4p4, vdev_max_asize: 1991245299712,
>>>>>          vdev_asize: 1991245299712, vdev_min_asize: 1981411579221
>>>>>          vdev_path: /dev/mfid0p4, vdev_max_asize: 1991246348288,
>>>>>          vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
>>>>>          vdev_path: /dev/mfid1p4, vdev_max_asize: 1991246348288,
>>>>>          vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
>>>>>          vdev_path: /dev/mfid3p4, vdev_max_asize: 1991247921152,
>>>>>          vdev_asize: 1991247921152, vdev_min_asize: 1981411579221
>>>>>          vdev_path: /dev/mfid2p4, vdev_max_asize: 1991246348288,
>>>>>          vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
>>>>>          vdev_path: /dev/mfid5p4, vdev_max_asize: 1991246348288,
>>>>>          vdev_asize: 1991246348288, vdev_min_asize: 1981411579221
>>>>>          ^C
>>>>>
>>>>>          borg-new /home/ler $
>>>>>
>>>>>
>>>>>          borg-new /home/ler $ sudo zpool list -v
>>>>>          Password:
>>>>>          NAME           SIZE  ALLOC   FREE  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
>>>>>          zroot         10.8T  94.3G  10.7T     16.0E    0%   0%  1.00x  ONLINE  -
>>>>>            raidz1      10.8T  94.3G  10.7T     16.0E    0%   0%
>>>>>              mfid4p4       -      -      -         -     -    -
>>>>>              mfid0p4       -      -      -         -     -    -
>>>>>              mfid1p4       -      -      -         -     -    -
>>>>>              mfid3p4       -      -      -         -     -    -
>>>>>              mfid2p4       -      -      -         -     -    -
>>>>>              mfid5p4       -      -      -         -     -    -
>>>>>          borg-new /home/ler $
>>>>>
>>>>>
>>>>>          On 01/31/2017 2:37 pm, Steven Hartland wrote:
>>>>>
>>>>>              In that case, based on your zpool history, I suspect that
>>>>>              the original mfid4p4 was the same size as mfid0p4
>>>>>              (1991246348288) but it's been replaced with a drive which
>>>>>              is slightly smaller (1991245299712).
>>>>>
>>>>>              This smaller size results in a max_asize of
>>>>>              1991245299712 * 6 instead of the original 1991246348288 * 6.
>>>>>
>>>>>              Now, given the way min_asize (the value used to check if
>>>>>              the device size is acceptable) is rounded to the
>>>>>              nearest metaslab, I believe that replace would be allowed.
>>>>>              https://github.com/freebsd/freebsd/blob/master/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c#L4947
>>>>>
>>>>>              Now the problem is that on open the calculated asize is
>>>>>              only updated if it's expanding:
>>>>>              https://github.com/freebsd/freebsd/blob/master/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c#L1424
>>>>>
>>>>>              The updated dtrace file outputs vdev_min_asize which
>>>>>              should confirm my suspicion about why the replace was
>>>>>              allowed.
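
The effect described above can be sketched in C as follows. This is a simplified illustration, not the vdev.c code: the function names are hypothetical, the grow-only rule leaves out the other conditions the real open path checks, and the "before" sizes assume (as suspected above) that the original mfid4p4 matched mfid0p4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * raidz size as described in the thread: the smallest child size
 * multiplied by the number of children.
 * (Hypothetical helper name, for illustration only.)
 */
uint64_t
raidz_size(const uint64_t *child_size, size_t children)
{
	assert(children != 0);
	uint64_t smallest = child_size[0];
	for (size_t i = 1; i < children; i++) {
		if (child_size[i] < smallest)
			smallest = child_size[i];
	}
	return (smallest * children);
}

/*
 * Grow-only update (simplified assumption): the stored asize is
 * replaced only when the newly computed value is larger.
 */
uint64_t
update_asize_grow_only(uint64_t stored_asize, uint64_t computed_asize)
{
	return (computed_asize > stored_asize ? computed_asize : stored_asize);
}
```

With six children whose smallest size is 1991246348288, raidz_size() gives 11947478089728. After one child shrinks to 1991245299712 it gives 11947471798272, but a grow-only update keeps the stored asize at 11947478089728, which matches the vdev_asize and vdev_max_asize values in the dtrace output.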
>>>>>
>>>>>              On 31/01/2017 19:05, Larry Rosenman wrote:
>>>>>
>>>>>                  I've replaced some disks due to failure, and some of
>>>>>                  the partition sizes are different.
>>>>>
>>>>>
>>>>>                  autoexpand is off:
>>>>>
>>>>>                  borg-new /home/ler $ zpool get all zroot
>>>>>                  NAME PROPERTY VALUE SOURCE
>>>>>                  zroot size 10.8T -
>>>>>                  zroot capacity 0% -
>>>>>                  zroot altroot - default
>>>>>                  zroot health ONLINE -
>>>>>                  zroot guid 11945658884309024932 default
>>>>>                  zroot version - default
>>>>>                  zroot bootfs zroot/ROOT/default local
>>>>>                  zroot delegation on default
>>>>>                  zroot autoreplace off default
>>>>>                  zroot cachefile - default
>>>>>                  zroot failmode wait default
>>>>>                  zroot listsnapshots off default
>>>>>                  zroot autoexpand off default
>>>>>                  zroot dedupditto 0 default
>>>>>                  zroot dedupratio 1.00x -
>>>>>                  zroot free 10.7T -
>>>>>                  zroot allocated 94.3G -
>>>>>                  zroot readonly off -
>>>>>                  zroot comment - default
>>>>>                  zroot expandsize 16.0E -
>>>>>                  zroot freeing 0 default
>>>>>                  zroot fragmentation 0% -
>>>>>                  zroot leaked 0 default
>>>>>                  zroot feature@async_destroy enabled local
>>>>>                  zroot feature@empty_bpobj active local
>>>>>                  zroot feature@lz4_compress active local
>>>>>                  zroot feature@multi_vdev_crash_dump enabled local
>>>>>                  zroot feature@spacemap_histogram active local
>>>>>                  zroot feature@enabled_txg active local
>>>>>                  zroot feature@hole_birth active local
>>>>>                  zroot feature@extensible_dataset enabled local
>>>>>                  zroot feature@embedded_data active local
>>>>>                  zroot feature@bookmarks enabled local
>>>>>                  zroot feature@filesystem_limits enabled local
>>>>>                  zroot feature@large_blocks enabled local
>>>>>                  zroot feature@sha512 enabled local
>>>>>                  zroot feature@skein enabled local
>>>>>                  borg-new /home/ler $
>>>>>
>>>>>
>>>>>                  borg-new /home/ler $ gpart show
>>>>>                  => 40 3905945520 mfid0 GPT (1.8T)
>>>>>                  40 1600 1 efi (800K)
>>>>>                  1640 1024 2 freebsd-boot (512K)
>>>>>                  2664 1432 - free - (716K)
>>>>>                  4096 16777216 3 freebsd-swap (8.0G)
>>>>>                  16781312 3889162240 4 freebsd-zfs (1.8T)
>>>>>                  3905943552 2008 - free - (1.0M)
>>>>>
>>>>>                  => 40 3905945520 mfid1 GPT (1.8T)
>>>>>                  40 1600 1 efi (800K)
>>>>>                  1640 1024 2 freebsd-boot (512K)
>>>>>                  2664 1432 - free - (716K)
>>>>>                  4096 16777216 3 freebsd-swap (8.0G)
>>>>>                  16781312 3889162240 4 freebsd-zfs (1.8T)
>>>>>                  3905943552 2008 - free - (1.0M)
>>>>>
>>>>>                  => 40 3905945520 mfid2 GPT (1.8T)
>>>>>                  40 1600 1 efi (800K)
>>>>>                  1640 1024 2 freebsd-boot (512K)
>>>>>                  2664 1432 - free - (716K)
>>>>>                  4096 16777216 3 freebsd-swap (8.0G)
>>>>>                  16781312 3889162240 4 freebsd-zfs (1.8T)
>>>>>                  3905943552 2008 - free - (1.0M)
>>>>>
>>>>>                  => 40 3905945520 mfid3 GPT (1.8T)
>>>>>                  40 1600 1 efi (800K)
>>>>>                  1640 1024 2 freebsd-boot (512K)
>>>>>                  2664 16777216 3 freebsd-swap (8.0G)
>>>>>                  16779880 3889165680 4 freebsd-zfs (1.8T)
>>>>>
>>>>>                  => 40 3905945520 mfid5 GPT (1.8T)
>>>>>                  40 1600 1 efi (800K)
>>>>>                  1640 1024 2 freebsd-boot (512K)
>>>>>                  2664 1432 - free - (716K)
>>>>>                  4096 16777216 3 freebsd-swap (8.0G)
>>>>>                  16781312 3889162240 4 freebsd-zfs (1.8T)
>>>>>                  3905943552 2008 - free - (1.0M)
>>>>>
>>>>>                  => 40 3905945520 mfid4 GPT (1.8T)
>>>>>                  40 1600 1 efi (800K)
>>>>>                  1640 1024 2 freebsd-boot (512K)
>>>>>                  2664 1432 - free - (716K)
>>>>>                  4096 16777216 3 freebsd-swap (8.0G)
>>>>>                  16781312 3889160192 4 freebsd-zfs (1.8T)
>>>>>                  3905941504 4056 - free - (2.0M)
>>>>>
>>>>>                  borg-new /home/ler $
>>>>>
>>>>>
>>>>>                  this system was built last week, and I **CAN**
>>>>>                  rebuild it if necessary, but I didn't do anything
>>>>>                  strange (so I thought :) )
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>                  On 01/31/2017 12:30 pm, Steven Hartland wrote:
>>>>>
>>>>>                      Your issue is the reported vdev_max_asize < vdev_asize:
>>>>>                      vdev_max_asize: 11947471798272
>>>>>                      vdev_asize:     11947478089728
>>>>>
>>>>>                      max asize is smaller than asize by 6291456
>>>>>
>>>>>                      For raidz1 Xsize should be the smallest disk
>>>>>                      Xsize * disks so:
>>>>>                      1991245299712 * 6 = 11947471798272
>>>>>
>>>>>                      So your max asize looks right but asize looks
>>>>>                      too big
>>>>>
>>>>>                      Expand Size is calculated by:
>>>>>                      if (vd->vdev_aux == NULL && tvd != NULL &&
>>>>>                          vd->vdev_max_asize != 0) {
>>>>>                          vs->vs_esize = P2ALIGN(
>>>>>                              vd->vdev_max_asize - vd->vdev_asize,
>>>>>                              1ULL << tvd->vdev_ms_shift);
>>>>>                      }
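
As a worked example of why the calculation above can print 16.0E, here is a minimal standalone sketch in C. The helper names are hypothetical, the power-of-two round-down is the common idiom rather than a copy of the kernel macro, and the guarded variant is only one possible defence, not the fix under review. The metaslab shift of 36 is taken from the zdb output elsewhere in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round x down to a multiple of align, where align is a power of two.
 * (Standalone stand-in for a P2ALIGN-style macro.)
 */
uint64_t
p2align_down(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x & ~(align - 1));
}

/*
 * Expand size computed with unsigned arithmetic, as in the quoted
 * snippet: if max_asize < asize the subtraction wraps around.
 * (Hypothetical helper name, for illustration only.)
 */
uint64_t
expand_size(uint64_t max_asize, uint64_t asize, unsigned ms_shift)
{
	return (p2align_down(max_asize - asize, UINT64_C(1) << ms_shift));
}

/*
 * Guarded variant (an assumption, not the actual patch): report no
 * expandable space when max_asize does not exceed asize.
 */
uint64_t
expand_size_guarded(uint64_t max_asize, uint64_t asize, unsigned ms_shift)
{
	if (max_asize <= asize)
		return (0);
	return (expand_size(max_asize, asize, ms_shift));
}
```

With max_asize 11947471798272 and asize 11947478089728, the unsigned difference wraps to 2^64 - 6291456, and rounding down to a 2^36 boundary gives 2^64 - 2^36, which is just under 16 EiB and is displayed as 16.0E.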
>>>>>
>>>>>                      So the question is why is asize too big?
>>>>>
>>>>>                      Given you seem to have some random disk sizes,
>>>>>                      do you have autoexpand turned on?
>>>>>
>>>>>                      On 31/01/2017 17:39, Larry Rosenman wrote:
>>>>>
>>>>>                          vdev_path: n/a, vdev_max_asize:
>>>>>                          11947471798272, vdev_asize: 11947478089728
>>>>>
>>>>>
>>>>>                  --
>>>>>                  Larry Rosenman http://people.freebsd.org/~ler
>>>>>                  Phone: +1 214-642-9640  E-Mail: ler@FreeBSD.org
>>>>>                  US Mail: 17716 Limpia Crk, Round Rock, TX 78664-7281
>>>>>
>>>>>
>>>>>
>>>>>
>>>



