Date:        Sat, 17 Jan 2015 01:07:50 +0200
From:        Mihai Vintila <unixro@gmail.com>
To:          freebsd-stable@freebsd.org
Subject:     Re: Poor performance on Intel P3600 NVME driver
Message-ID:  <54B999C6.2090909@gmail.com>
In-Reply-To: <20150116221344.GA72201@pit.databus.com>
References:  <54B7F769.40605@gmail.com>
             <20150115175927.GA19071@zxy.spb.ru>
             <54B8C7E9.3030602@gmail.com>
             <CAKAYmM%2BEpvOYfFnF1i02aoTC2vfVT%2Bq=6XvCPBnYg-mRQAVD1A@mail.gmail.com>
             <CAORpBLUYzHw%2BRRccRpkcxYrnjnckRMLyAc86J9UPpirRSeWxqQ@mail.gmail.com>
             <20150116221344.GA72201@pit.databus.com>
I've re-run the test with atime=off. The drive has 512-byte physical sectors, but I created the pool through a 4k gnop device anyway. The results are similar to the run with atime enabled:

Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.

                                                        random   random     bkwd   record   stride
         KB  reclen    write  rewrite     read   reread    read    write     read  rewrite     read   fwrite frewrite    fread  freread
    1048576       4    74427        0   101744        0   93529    47925
    1048576       8    39072        0    64693        0   61104    25452

I've also tried increasing vfs.zfs.vdev.aggregation_limit and ended up with a crash (screenshot attached).
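For reference, the 4k gnop setup was along these lines. This is a sketch of the usual recipe rather than an exact transcript; the device name (nvd0) and pool name (tank) are placeholders, and the dataset options match the recordsize=4k / compression=lz4 settings used for the test:

  gnop create -S 4096 /dev/nvd0      # expose nvd0 as nvd0.nop with 4k sectors
  zpool create tank /dev/nvd0.nop    # pool picks up ashift=12 from the .nop device
  zpool export tank
  gnop destroy /dev/nvd0.nop         # the recorded ashift survives removal of the shim
  zpool import tank
  zfs set atime=off tank
  zfs set recordsize=4k tank
  zfs set compression=lz4 tank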
I'm attaching the ZFS tunables:

sysctl -a|grep vfs.zfs
vfs.zfs.arc_max: 34359738368
vfs.zfs.arc_min: 4294967296
vfs.zfs.arc_average_blocksize: 8192
vfs.zfs.arc_meta_used: 5732232
vfs.zfs.arc_meta_limit: 8589934592
vfs.zfs.l2arc_write_max: 8388608
vfs.zfs.l2arc_write_boost: 8388608
vfs.zfs.l2arc_headroom: 2
vfs.zfs.l2arc_feed_secs: 1
vfs.zfs.l2arc_feed_min_ms: 200
vfs.zfs.l2arc_noprefetch: 1
vfs.zfs.l2arc_feed_again: 1
vfs.zfs.l2arc_norw: 1
vfs.zfs.anon_size: 32768
vfs.zfs.anon_metadata_lsize: 0
vfs.zfs.anon_data_lsize: 0
vfs.zfs.mru_size: 17841664
vfs.zfs.mru_metadata_lsize: 858624
vfs.zfs.mru_data_lsize: 13968384
vfs.zfs.mru_ghost_size: 0
vfs.zfs.mru_ghost_metadata_lsize: 0
vfs.zfs.mru_ghost_data_lsize: 0
vfs.zfs.mfu_size: 4574208
vfs.zfs.mfu_metadata_lsize: 465408
vfs.zfs.mfu_data_lsize: 4051456
vfs.zfs.mfu_ghost_size: 0
vfs.zfs.mfu_ghost_metadata_lsize: 0
vfs.zfs.mfu_ghost_data_lsize: 0
vfs.zfs.l2c_only_size: 0
vfs.zfs.dedup.prefetch: 1
vfs.zfs.nopwrite_enabled: 1
vfs.zfs.mdcomp_disable: 0
vfs.zfs.dirty_data_max: 4294967296
vfs.zfs.dirty_data_max_max: 4294967296
vfs.zfs.dirty_data_max_percent: 10
vfs.zfs.dirty_data_sync: 67108864
vfs.zfs.delay_min_dirty_percent: 60
vfs.zfs.delay_scale: 500000
vfs.zfs.prefetch_disable: 1
vfs.zfs.zfetch.max_streams: 8
vfs.zfs.zfetch.min_sec_reap: 2
vfs.zfs.zfetch.block_cap: 256
vfs.zfs.zfetch.array_rd_sz: 1048576
vfs.zfs.top_maxinflight: 32
vfs.zfs.resilver_delay: 2
vfs.zfs.scrub_delay: 4
vfs.zfs.scan_idle: 50
vfs.zfs.scan_min_time_ms: 1000
vfs.zfs.free_min_time_ms: 1000
vfs.zfs.resilver_min_time_ms: 3000
vfs.zfs.no_scrub_io: 0
vfs.zfs.no_scrub_prefetch: 0
vfs.zfs.metaslab.gang_bang: 131073
vfs.zfs.metaslab.fragmentation_threshold: 70
vfs.zfs.metaslab.debug_load: 0
vfs.zfs.metaslab.debug_unload: 0
vfs.zfs.metaslab.df_alloc_threshold: 131072
vfs.zfs.metaslab.df_free_pct: 4
vfs.zfs.metaslab.min_alloc_size: 10485760
vfs.zfs.metaslab.load_pct: 50
vfs.zfs.metaslab.unload_delay: 8
vfs.zfs.metaslab.preload_limit: 3
vfs.zfs.metaslab.preload_enabled: 1
vfs.zfs.metaslab.fragmentation_factor_enabled: 1
vfs.zfs.metaslab.lba_weighting_enabled: 1
vfs.zfs.metaslab.bias_enabled: 1
vfs.zfs.condense_pct: 200
vfs.zfs.mg_noalloc_threshold: 0
vfs.zfs.mg_fragmentation_threshold: 85
vfs.zfs.check_hostid: 1
vfs.zfs.spa_load_verify_maxinflight: 10000
vfs.zfs.spa_load_verify_metadata: 1
vfs.zfs.spa_load_verify_data: 1
vfs.zfs.recover: 0
vfs.zfs.deadman_synctime_ms: 1000000
vfs.zfs.deadman_checktime_ms: 5000
vfs.zfs.deadman_enabled: 1
vfs.zfs.spa_asize_inflation: 24
vfs.zfs.txg.timeout: 5
vfs.zfs.vdev.cache.max: 16384
vfs.zfs.vdev.cache.size: 0
vfs.zfs.vdev.cache.bshift: 16
vfs.zfs.vdev.trim_on_init: 0
vfs.zfs.vdev.mirror.rotating_inc: 0
vfs.zfs.vdev.mirror.rotating_seek_inc: 5
vfs.zfs.vdev.mirror.rotating_seek_offset: 1048576
vfs.zfs.vdev.mirror.non_rotating_inc: 0
vfs.zfs.vdev.mirror.non_rotating_seek_inc: 1
vfs.zfs.vdev.max_active: 1000
vfs.zfs.vdev.sync_read_min_active: 32
vfs.zfs.vdev.sync_read_max_active: 32
vfs.zfs.vdev.sync_write_min_active: 32
vfs.zfs.vdev.sync_write_max_active: 32
vfs.zfs.vdev.async_read_min_active: 32
vfs.zfs.vdev.async_read_max_active: 32
vfs.zfs.vdev.async_write_min_active: 32
vfs.zfs.vdev.async_write_max_active: 32
vfs.zfs.vdev.scrub_min_active: 1
vfs.zfs.vdev.scrub_max_active: 2
vfs.zfs.vdev.trim_min_active: 1
vfs.zfs.vdev.trim_max_active: 64
vfs.zfs.vdev.aggregation_limit: 131072
vfs.zfs.vdev.read_gap_limit: 32768
vfs.zfs.vdev.write_gap_limit: 4096
vfs.zfs.vdev.bio_flush_disable: 0
vfs.zfs.vdev.bio_delete_disable: 0
vfs.zfs.vdev.trim_max_bytes: 2147483648
vfs.zfs.vdev.trim_max_pending: 64
vfs.zfs.max_auto_ashift: 13
vfs.zfs.min_auto_ashift: 9
vfs.zfs.zil_replay_disable: 0
vfs.zfs.cache_flush_disable: 0
vfs.zfs.zio.use_uma: 1
vfs.zfs.zio.exclude_metadata: 0
vfs.zfs.sync_pass_deferred_free: 2
vfs.zfs.sync_pass_dont_compress: 5
vfs.zfs.sync_pass_rewrite: 2
vfs.zfs.snapshot_list_prefetch: 0
vfs.zfs.super_owner: 0
vfs.zfs.debug: 0
vfs.zfs.version.ioctl: 4
vfs.zfs.version.acl: 1
vfs.zfs.version.spa: 5000
vfs.zfs.version.zpl: 5
vfs.zfs.vol.mode: 1
vfs.zfs.trim.enabled: 0
vfs.zfs.trim.txg_delay: 32
vfs.zfs.trim.timeout: 30
vfs.zfs.trim.max_interval: 1

And the nvme sysctls:

dev.nvme.%parent:
dev.nvme.0.%desc: Generic NVMe Device
dev.nvme.0.%driver: nvme
dev.nvme.0.%location: slot=0 function=0 handle=\_SB_.PCI0.BR3A.D08A
dev.nvme.0.%pnpinfo: vendor=0x8086 device=0x0953 subvendor=0x8086 subdevice=0x370a class=0x010802
dev.nvme.0.%parent: pci4
dev.nvme.0.int_coal_time: 0
dev.nvme.0.int_coal_threshold: 0
dev.nvme.0.timeout_period: 30
dev.nvme.0.num_cmds: 811857
dev.nvme.0.num_intr_handler_calls: 485242
dev.nvme.0.reset_stats: 0
dev.nvme.0.adminq.num_entries: 128
dev.nvme.0.adminq.num_trackers: 16
dev.nvme.0.adminq.sq_head: 12
dev.nvme.0.adminq.sq_tail: 12
dev.nvme.0.adminq.cq_head: 8
dev.nvme.0.adminq.num_cmds: 12
dev.nvme.0.adminq.num_intr_handler_calls: 7
dev.nvme.0.adminq.dump_debug: 0
dev.nvme.0.ioq0.num_entries: 256
dev.nvme.0.ioq0.num_trackers: 128
dev.nvme.0.ioq0.sq_head: 69
dev.nvme.0.ioq0.sq_tail: 69
dev.nvme.0.ioq0.cq_head: 69
dev.nvme.0.ioq0.num_cmds: 811845
dev.nvme.0.ioq0.num_intr_handler_calls: 485235
dev.nvme.0.ioq0.dump_debug: 0
dev.nvme.1.%desc: Generic NVMe Device
dev.nvme.1.%driver: nvme
dev.nvme.1.%location: slot=0 function=0 handle=\_SB_.PCI0.BR3B.H000
dev.nvme.1.%pnpinfo: vendor=0x8086 device=0x0953 subvendor=0x8086 subdevice=0x370a class=0x010802
dev.nvme.1.%parent: pci5
dev.nvme.1.int_coal_time: 0
dev.nvme.1.int_coal_threshold: 0
dev.nvme.1.timeout_period: 30
dev.nvme.1.num_cmds: 167
dev.nvme.1.num_intr_handler_calls: 163
dev.nvme.1.reset_stats: 0
dev.nvme.1.adminq.num_entries: 128
dev.nvme.1.adminq.num_trackers: 16
dev.nvme.1.adminq.sq_head: 12
dev.nvme.1.adminq.sq_tail: 12
dev.nvme.1.adminq.cq_head: 8
dev.nvme.1.adminq.num_cmds: 12
dev.nvme.1.adminq.num_intr_handler_calls: 8
dev.nvme.1.adminq.dump_debug: 0
dev.nvme.1.ioq0.num_entries: 256
dev.nvme.1.ioq0.num_trackers: 128
dev.nvme.1.ioq0.sq_head: 155
dev.nvme.1.ioq0.sq_tail: 155
dev.nvme.1.ioq0.cq_head: 155
dev.nvme.1.ioq0.num_cmds: 155
dev.nvme.1.ioq0.num_intr_handler_calls: 155
dev.nvme.1.ioq0.dump_debug: 0
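For a raw-device baseline outside any filesystem, the nvme driver's built-in performance test can be run through nvmecontrol (this is the perftest referred to at the end of the quoted message below); the thread count, block size, duration and namespace name here are only an example:

  nvmecontrol perftest -n 32 -o read  -s 4096 -t 30 nvme0ns1
  nvmecontrol perftest -n 32 -o write -s 4096 -t 30 nvme0ns1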
Best regards,
Vintila Mihai Alexandru

On 1/17/2015 12:13 AM, Barney Wolff wrote:
> I suspect Linux defaults to noatime - at least it does on my rpi. I
> believe the FreeBSD default is the other way. That may explain some
> of the difference.
>
> Also, did you use gnop to force the zpool to start on a 4k boundary?
> If not, and the zpool happens to be offset, that's another big hit.
> Same for ufs, especially if the disk has logical sectors of 512 but
> physical of 4096. One can complain that FreeBSD should prevent, or
> at least warn about, this sort of foot-shooting.
>
> On Fri, Jan 16, 2015 at 10:21:07PM +0200, Mihai-Alexandru Vintila wrote:
>> @Barney Wolff it's a new pool with only changes recordsize=4k and
>> compression=lz4. On Linux the test is on ext4 with default values. The
>> penalty is pretty high. There is also a read penalty between UFS and
>> ZFS. Even in nvmecontrol perftest you can see the read penalty; it's
>> not normal to get the same result for both write and read.
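On the alignment question above: the pool's ashift and any partition offsets can be checked along these lines. The pool name (tank) and device (nvd0) are placeholders; with the gnop step the ashift should already be 12, and raising vfs.zfs.min_auto_ashift (currently 9 in the dump above) before pool creation is an alternative to gnop:

  zdb -C tank | grep ashift           # expect ashift: 12 for a 4k-aligned pool
  gpart show nvd0                     # only if partitions are used; offsets should be 4k multiples
  sysctl vfs.zfs.min_auto_ashift=12   # make a future 'zpool create' use ashift >= 12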