Date:      Sat, 17 Jan 2015 01:07:50 +0200
From:      Mihai Vintila <unixro@gmail.com>
To:        freebsd-stable@freebsd.org
Subject:   Re: Poor performance on Intel P3600 NVME driver
Message-ID:  <54B999C6.2090909@gmail.com>
In-Reply-To: <20150116221344.GA72201@pit.databus.com>
References:  <54B7F769.40605@gmail.com> <20150115175927.GA19071@zxy.spb.ru> <54B8C7E9.3030602@gmail.com> <CAKAYmM+EpvOYfFnF1i02aoTC2vfVT+q=6XvCPBnYg-mRQAVD1A@mail.gmail.com> <CAORpBLUYzHw+RRccRpkcxYrnjnckRMLyAc86J9UPpirRSeWxqQ@mail.gmail.com> <20150116221344.GA72201@pit.databus.com>

I've rerun the test with atime=off. The drive reports 512-byte physical
sectors, but I created the pool on a 4k gnop device anyway. Results are
similar to those with atime enabled:
         Processor cache line size set to 32 bytes.
         File stride size set to 17 * record size.
                                                    random   random     bkwd   record   stride
          KB  reclen    write  rewrite     read   reread     read    write     read  rewrite     read   fwrite frewrite    fread  freread
     1048576       4    74427        0   101744        0    93529    47925
     1048576       8    39072        0    64693        0    61104    25452
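
For completeness, the 4k alignment was forced with gnop before creating the
pool. The commands below are only a sketch of that procedure; the device name
(nvd0) and pool name (tank) are placeholders, not the exact ones from this box:

# Confirm what the drive reports (device name is an assumption):
diskinfo -v /dev/nvd0
# Force 4k sectors via a gnop shim, build the pool on it, then
# re-import without the shim:
gnop create -S 4096 /dev/nvd0
zpool create tank /dev/nvd0.nop
zpool export tank
gnop destroy /dev/nvd0.nop
zpool import tank
# Dataset properties used for the test:
zfs set recordsize=4k tank
zfs set compression=lz4 tank
zfs set atime=off tank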

I also tried increasing vfs.zfs.vdev.aggregation_limit and ended up with a
crash (screenshot attached).
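
For reference, the change was of this general form; the value shown is only an
illustration, not the one that triggered the crash, and it assumes the sysctl
is writable at runtime on this release:

# Runtime change (illustrative value only):
sysctl vfs.zfs.vdev.aggregation_limit=1048576
# Or set persistently in /boot/loader.conf:
vfs.zfs.vdev.aggregation_limit="1048576"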

I'm attaching zfs tunables:
sysctl -a|grep vfs.zfs
vfs.zfs.arc_max: 34359738368
vfs.zfs.arc_min: 4294967296
vfs.zfs.arc_average_blocksize: 8192
vfs.zfs.arc_meta_used: 5732232
vfs.zfs.arc_meta_limit: 8589934592
vfs.zfs.l2arc_write_max: 8388608
vfs.zfs.l2arc_write_boost: 8388608
vfs.zfs.l2arc_headroom: 2
vfs.zfs.l2arc_feed_secs: 1
vfs.zfs.l2arc_feed_min_ms: 200
vfs.zfs.l2arc_noprefetch: 1
vfs.zfs.l2arc_feed_again: 1
vfs.zfs.l2arc_norw: 1
vfs.zfs.anon_size: 32768
vfs.zfs.anon_metadata_lsize: 0
vfs.zfs.anon_data_lsize: 0
vfs.zfs.mru_size: 17841664
vfs.zfs.mru_metadata_lsize: 858624
vfs.zfs.mru_data_lsize: 13968384
vfs.zfs.mru_ghost_size: 0
vfs.zfs.mru_ghost_metadata_lsize: 0
vfs.zfs.mru_ghost_data_lsize: 0
vfs.zfs.mfu_size: 4574208
vfs.zfs.mfu_metadata_lsize: 465408
vfs.zfs.mfu_data_lsize: 4051456
vfs.zfs.mfu_ghost_size: 0
vfs.zfs.mfu_ghost_metadata_lsize: 0
vfs.zfs.mfu_ghost_data_lsize: 0
vfs.zfs.l2c_only_size: 0
vfs.zfs.dedup.prefetch: 1
vfs.zfs.nopwrite_enabled: 1
vfs.zfs.mdcomp_disable: 0
vfs.zfs.dirty_data_max: 4294967296
vfs.zfs.dirty_data_max_max: 4294967296
vfs.zfs.dirty_data_max_percent: 10
vfs.zfs.dirty_data_sync: 67108864
vfs.zfs.delay_min_dirty_percent: 60
vfs.zfs.delay_scale: 500000
vfs.zfs.prefetch_disable: 1
vfs.zfs.zfetch.max_streams: 8
vfs.zfs.zfetch.min_sec_reap: 2
vfs.zfs.zfetch.block_cap: 256
vfs.zfs.zfetch.array_rd_sz: 1048576
vfs.zfs.top_maxinflight: 32
vfs.zfs.resilver_delay: 2
vfs.zfs.scrub_delay: 4
vfs.zfs.scan_idle: 50
vfs.zfs.scan_min_time_ms: 1000
vfs.zfs.free_min_time_ms: 1000
vfs.zfs.resilver_min_time_ms: 3000
vfs.zfs.no_scrub_io: 0
vfs.zfs.no_scrub_prefetch: 0
vfs.zfs.metaslab.gang_bang: 131073
vfs.zfs.metaslab.fragmentation_threshold: 70
vfs.zfs.metaslab.debug_load: 0
vfs.zfs.metaslab.debug_unload: 0
vfs.zfs.metaslab.df_alloc_threshold: 131072
vfs.zfs.metaslab.df_free_pct: 4
vfs.zfs.metaslab.min_alloc_size: 10485760
vfs.zfs.metaslab.load_pct: 50
vfs.zfs.metaslab.unload_delay: 8
vfs.zfs.metaslab.preload_limit: 3
vfs.zfs.metaslab.preload_enabled: 1
vfs.zfs.metaslab.fragmentation_factor_enabled: 1
vfs.zfs.metaslab.lba_weighting_enabled: 1
vfs.zfs.metaslab.bias_enabled: 1
vfs.zfs.condense_pct: 200
vfs.zfs.mg_noalloc_threshold: 0
vfs.zfs.mg_fragmentation_threshold: 85
vfs.zfs.check_hostid: 1
vfs.zfs.spa_load_verify_maxinflight: 10000
vfs.zfs.spa_load_verify_metadata: 1
vfs.zfs.spa_load_verify_data: 1
vfs.zfs.recover: 0
vfs.zfs.deadman_synctime_ms: 1000000
vfs.zfs.deadman_checktime_ms: 5000
vfs.zfs.deadman_enabled: 1
vfs.zfs.spa_asize_inflation: 24
vfs.zfs.txg.timeout: 5
vfs.zfs.vdev.cache.max: 16384
vfs.zfs.vdev.cache.size: 0
vfs.zfs.vdev.cache.bshift: 16
vfs.zfs.vdev.trim_on_init: 0
vfs.zfs.vdev.mirror.rotating_inc: 0
vfs.zfs.vdev.mirror.rotating_seek_inc: 5
vfs.zfs.vdev.mirror.rotating_seek_offset: 1048576
vfs.zfs.vdev.mirror.non_rotating_inc: 0
vfs.zfs.vdev.mirror.non_rotating_seek_inc: 1
vfs.zfs.vdev.max_active: 1000
vfs.zfs.vdev.sync_read_min_active: 32
vfs.zfs.vdev.sync_read_max_active: 32
vfs.zfs.vdev.sync_write_min_active: 32
vfs.zfs.vdev.sync_write_max_active: 32
vfs.zfs.vdev.async_read_min_active: 32
vfs.zfs.vdev.async_read_max_active: 32
vfs.zfs.vdev.async_write_min_active: 32
vfs.zfs.vdev.async_write_max_active: 32
vfs.zfs.vdev.scrub_min_active: 1
vfs.zfs.vdev.scrub_max_active: 2
vfs.zfs.vdev.trim_min_active: 1
vfs.zfs.vdev.trim_max_active: 64
vfs.zfs.vdev.aggregation_limit: 131072
vfs.zfs.vdev.read_gap_limit: 32768
vfs.zfs.vdev.write_gap_limit: 4096
vfs.zfs.vdev.bio_flush_disable: 0
vfs.zfs.vdev.bio_delete_disable: 0
vfs.zfs.vdev.trim_max_bytes: 2147483648
vfs.zfs.vdev.trim_max_pending: 64
vfs.zfs.max_auto_ashift: 13
vfs.zfs.min_auto_ashift: 9
vfs.zfs.zil_replay_disable: 0
vfs.zfs.cache_flush_disable: 0
vfs.zfs.zio.use_uma: 1
vfs.zfs.zio.exclude_metadata: 0
vfs.zfs.sync_pass_deferred_free: 2
vfs.zfs.sync_pass_dont_compress: 5
vfs.zfs.sync_pass_rewrite: 2
vfs.zfs.snapshot_list_prefetch: 0
vfs.zfs.super_owner: 0
vfs.zfs.debug: 0
vfs.zfs.version.ioctl: 4
vfs.zfs.version.acl: 1
vfs.zfs.version.spa: 5000
vfs.zfs.version.zpl: 5
vfs.zfs.vol.mode: 1
vfs.zfs.trim.enabled: 0
vfs.zfs.trim.txg_delay: 32
vfs.zfs.trim.timeout: 30
vfs.zfs.trim.max_interval: 1

And the nvme sysctls:
dev.nvme.%parent:
dev.nvme.0.%desc: Generic NVMe Device
dev.nvme.0.%driver: nvme
dev.nvme.0.%location: slot=0 function=0 handle=\_SB_.PCI0.BR3A.D08A
dev.nvme.0.%pnpinfo: vendor=0x8086 device=0x0953 subvendor=0x8086 subdevice=0x370a class=0x010802
dev.nvme.0.%parent: pci4
dev.nvme.0.int_coal_time: 0
dev.nvme.0.int_coal_threshold: 0
dev.nvme.0.timeout_period: 30
dev.nvme.0.num_cmds: 811857
dev.nvme.0.num_intr_handler_calls: 485242
dev.nvme.0.reset_stats: 0
dev.nvme.0.adminq.num_entries: 128
dev.nvme.0.adminq.num_trackers: 16
dev.nvme.0.adminq.sq_head: 12
dev.nvme.0.adminq.sq_tail: 12
dev.nvme.0.adminq.cq_head: 8
dev.nvme.0.adminq.num_cmds: 12
dev.nvme.0.adminq.num_intr_handler_calls: 7
dev.nvme.0.adminq.dump_debug: 0
dev.nvme.0.ioq0.num_entries: 256
dev.nvme.0.ioq0.num_trackers: 128
dev.nvme.0.ioq0.sq_head: 69
dev.nvme.0.ioq0.sq_tail: 69
dev.nvme.0.ioq0.cq_head: 69
dev.nvme.0.ioq0.num_cmds: 811845
dev.nvme.0.ioq0.num_intr_handler_calls: 485235
dev.nvme.0.ioq0.dump_debug: 0
dev.nvme.1.%desc: Generic NVMe Device
dev.nvme.1.%driver: nvme
dev.nvme.1.%location: slot=0 function=0 handle=\_SB_.PCI0.BR3B.H000
dev.nvme.1.%pnpinfo: vendor=0x8086 device=0x0953 subvendor=0x8086 subdevice=0x370a class=0x010802
dev.nvme.1.%parent: pci5
dev.nvme.1.int_coal_time: 0
dev.nvme.1.int_coal_threshold: 0
dev.nvme.1.timeout_period: 30
dev.nvme.1.num_cmds: 167
dev.nvme.1.num_intr_handler_calls: 163
dev.nvme.1.reset_stats: 0
dev.nvme.1.adminq.num_entries: 128
dev.nvme.1.adminq.num_trackers: 16
dev.nvme.1.adminq.sq_head: 12
dev.nvme.1.adminq.sq_tail: 12
dev.nvme.1.adminq.cq_head: 8
dev.nvme.1.adminq.num_cmds: 12
dev.nvme.1.adminq.num_intr_handler_calls: 8
dev.nvme.1.adminq.dump_debug: 0
dev.nvme.1.ioq0.num_entries: 256
dev.nvme.1.ioq0.num_trackers: 128
dev.nvme.1.ioq0.sq_head: 155
dev.nvme.1.ioq0.sq_tail: 155
dev.nvme.1.ioq0.cq_head: 155
dev.nvme.1.ioq0.num_cmds: 155
dev.nvme.1.ioq0.num_intr_handler_calls: 155
dev.nvme.1.ioq0.dump_debug: 0

Best regards,
Vintila Mihai Alexandru

On 1/17/2015 12:13 AM, Barney Wolff wrote:
> I suspect Linux defaults to noatime - at least it does on my rpi.  I
> believe the FreeBSD default is the other way.  That may explain some
> of the difference.
>
> Also, did you use gnop to force the zpool to start on a 4k boundary?
> If not, and the zpool happens to be offset, that's another big hit.
> Same for ufs, especially if the disk has logical sectors of 512 but
> physical of 4096.  One can complain that FreeBSD should prevent, or
> at least warn about, this sort of foot-shooting.
>
> On Fri, Jan 16, 2015 at 10:21:07PM +0200, Mihai-Alexandru Vintila wrote:
>> @Barney Wolff it's a new pool with only changes recordsize=4k and
>> compression=lz4 . On linux test is on ext4 with default values. Penalty is
>> pretty high. Also there is a read penalty for read as well between ufs and
>> zfs. Even on nvmecontrol perftest you can see the read penalty it's not
>> normal to have same result for both write and read
