From owner-freebsd-stable@FreeBSD.ORG Mon Jan 19 16:22:19 2015
Date: Mon, 19 Jan 2015 09:22:17 -0700
From: Jim Harris
To: Oliver Pinter
Cc: Mihai-Alexandru Vintila, "freebsd-stable@freebsd.org"
Subject: Re: Poor performance on Intel P3600 NVME driver
List-Id: Production branch of FreeBSD source code

On Sat, Jan 17, 2015 at 6:29 AM, Oliver Pinter <oliver.pinter@hardenedbsd.org> wrote:

> Added Jim to thread, as he is the nvme driver's author.
>

Thanks Oliver.

Hi Mihai-Alexandru,

Can you start by sending me the following?

pciconf -lc nvme0
pciconf -lc nvme1
nvmecontrol identify nvme0
nvmecontrol identify nvme0ns1
nvmecontrol identify nvme1
nvmecontrol identify nvme1ns1
nvmecontrol logpage -p 1 nvme0
nvmecontrol logpage -p 1 nvme1
nvmecontrol logpage -p 2 nvme0
nvmecontrol logpage -p 2 nvme1

I see mention of a FW update, but it wasn't clear whether you have run
nvmecontrol perftest after the FW update. If not, could you run those same
nvmecontrol perftest runs again?

Thanks,

-Jim
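For anyone reproducing this, the requested output can be gathered in one pass
with a small /bin/sh loop. This is only a sketch: the loop and the
nvme-report.txt output file are illustrative, the individual commands are the
ones listed above, and the perftest line shows one possible invocation rather
than the exact parameters used earlier in the thread.

  # collect the controller and namespace details requested above
  for d in nvme0 nvme1; do
      echo "=== $d ==="
      pciconf -lc $d
      nvmecontrol identify $d
      nvmecontrol identify ${d}ns1
      nvmecontrol logpage -p 1 $d
      nvmecontrol logpage -p 2 $d
  done > nvme-report.txt 2>&1

  # example perftest rerun after the firmware update: 32 threads,
  # 4 kB reads for 30 seconds against the first namespace
  nvmecontrol perftest -n 32 -o read -s 4096 -t 30 nvme0ns1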
> On Sat, Jan 17, 2015 at 10:26 AM, Mihai-Alexandru Vintila wrote:
> > Trim is already disabled, as you can see in the previous mail
> >
> > Best regards,
> > Mihai Vintila
> >
> >> On 17 Jan 2015, at 01:24, Steven Hartland wrote:
> >>
> >> Any difference if you disable trim?
> >>
> >>> On 16/01/2015 23:07, Mihai Vintila wrote:
> >>> I've remade the test with atime=off. The drive has 512b physical
> >>> sectors, but I've created the pool with a 4k gnop anyway. Results are
> >>> similar to the run with atime on.
> >>>
> >>> Processor cache line size set to 32 bytes.
> >>> File stride size set to 17 * record size.
> >>>                                                       random   random     bkwd   record   stride
> >>>       KB  reclen    write  rewrite     read   reread     read    write     read  rewrite     read   fwrite frewrite    fread  freread
> >>>  1048576       4    74427        0   101744        0    93529    47925
> >>>  1048576       8    39072        0    64693        0    61104    25452
> >>>
> >>> I've also tried to increase vfs.zfs.vdev.aggregation_limit and ended
> >>> up with a crash (screenshot attached).
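For context on the "4k gnop" step mentioned above (and on Barney's alignment
question further down in the thread), the usual FreeBSD recipe looks roughly
like the sketch below. The device name nvd0 and the pool name tank are
placeholders for whatever the P3600 namespaces actually appear as; on newer
systems, setting the vfs.zfs.min_auto_ashift sysctl (visible in the tunable
dump below) to 12 before pool creation achieves the same result without gnop.

  # overlay the 512b device with a 4k-sector gnop node
  gnop create -S 4096 /dev/nvd0
  # create the pool on the .nop device so ZFS selects ashift=12
  zpool create tank /dev/nvd0.nop
  # export, remove the overlay, and re-import; the pool keeps ashift=12
  zpool export tank
  gnop destroy /dev/nvd0.nop
  zpool import tank
  # verify the result
  zdb -C tank | grep ashift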
> >>> I'm attaching zfs tunables:
> >>> sysctl -a|grep vfs.zfs
> >>> vfs.zfs.arc_max: 34359738368
> >>> vfs.zfs.arc_min: 4294967296
> >>> vfs.zfs.arc_average_blocksize: 8192
> >>> vfs.zfs.arc_meta_used: 5732232
> >>> vfs.zfs.arc_meta_limit: 8589934592
> >>> vfs.zfs.l2arc_write_max: 8388608
> >>> vfs.zfs.l2arc_write_boost: 8388608
> >>> vfs.zfs.l2arc_headroom: 2
> >>> vfs.zfs.l2arc_feed_secs: 1
> >>> vfs.zfs.l2arc_feed_min_ms: 200
> >>> vfs.zfs.l2arc_noprefetch: 1
> >>> vfs.zfs.l2arc_feed_again: 1
> >>> vfs.zfs.l2arc_norw: 1
> >>> vfs.zfs.anon_size: 32768
> >>> vfs.zfs.anon_metadata_lsize: 0
> >>> vfs.zfs.anon_data_lsize: 0
> >>> vfs.zfs.mru_size: 17841664
> >>> vfs.zfs.mru_metadata_lsize: 858624
> >>> vfs.zfs.mru_data_lsize: 13968384
> >>> vfs.zfs.mru_ghost_size: 0
> >>> vfs.zfs.mru_ghost_metadata_lsize: 0
> >>> vfs.zfs.mru_ghost_data_lsize: 0
> >>> vfs.zfs.mfu_size: 4574208
> >>> vfs.zfs.mfu_metadata_lsize: 465408
> >>> vfs.zfs.mfu_data_lsize: 4051456
> >>> vfs.zfs.mfu_ghost_size: 0
> >>> vfs.zfs.mfu_ghost_metadata_lsize: 0
> >>> vfs.zfs.mfu_ghost_data_lsize: 0
> >>> vfs.zfs.l2c_only_size: 0
> >>> vfs.zfs.dedup.prefetch: 1
> >>> vfs.zfs.nopwrite_enabled: 1
> >>> vfs.zfs.mdcomp_disable: 0
> >>> vfs.zfs.dirty_data_max: 4294967296
> >>> vfs.zfs.dirty_data_max_max: 4294967296
> >>> vfs.zfs.dirty_data_max_percent: 10
> >>> vfs.zfs.dirty_data_sync: 67108864
> >>> vfs.zfs.delay_min_dirty_percent: 60
> >>> vfs.zfs.delay_scale: 500000
> >>> vfs.zfs.prefetch_disable: 1
> >>> vfs.zfs.zfetch.max_streams: 8
> >>> vfs.zfs.zfetch.min_sec_reap: 2
> >>> vfs.zfs.zfetch.block_cap: 256
> >>> vfs.zfs.zfetch.array_rd_sz: 1048576
> >>> vfs.zfs.top_maxinflight: 32
> >>> vfs.zfs.resilver_delay: 2
> >>> vfs.zfs.scrub_delay: 4
> >>> vfs.zfs.scan_idle: 50
> >>> vfs.zfs.scan_min_time_ms: 1000
> >>> vfs.zfs.free_min_time_ms: 1000
> >>> vfs.zfs.resilver_min_time_ms: 3000
> >>> vfs.zfs.no_scrub_io: 0
> >>> vfs.zfs.no_scrub_prefetch: 0
> >>> vfs.zfs.metaslab.gang_bang: 131073
> >>> vfs.zfs.metaslab.fragmentation_threshold: 70
> >>> vfs.zfs.metaslab.debug_load: 0
> >>> vfs.zfs.metaslab.debug_unload: 0
> >>> vfs.zfs.metaslab.df_alloc_threshold: 131072
> >>> vfs.zfs.metaslab.df_free_pct: 4
> >>> vfs.zfs.metaslab.min_alloc_size: 10485760
> >>> vfs.zfs.metaslab.load_pct: 50
> >>> vfs.zfs.metaslab.unload_delay: 8
> >>> vfs.zfs.metaslab.preload_limit: 3
> >>> vfs.zfs.metaslab.preload_enabled: 1
> >>> vfs.zfs.metaslab.fragmentation_factor_enabled: 1
> >>> vfs.zfs.metaslab.lba_weighting_enabled: 1
> >>> vfs.zfs.metaslab.bias_enabled: 1
> >>> vfs.zfs.condense_pct: 200
> >>> vfs.zfs.mg_noalloc_threshold: 0
> >>> vfs.zfs.mg_fragmentation_threshold: 85
> >>> vfs.zfs.check_hostid: 1
> >>> vfs.zfs.spa_load_verify_maxinflight: 10000
> >>> vfs.zfs.spa_load_verify_metadata: 1
> >>> vfs.zfs.spa_load_verify_data: 1
> >>> vfs.zfs.recover: 0
> >>> vfs.zfs.deadman_synctime_ms: 1000000
> >>> vfs.zfs.deadman_checktime_ms: 5000
> >>> vfs.zfs.deadman_enabled: 1
> >>> vfs.zfs.spa_asize_inflation: 24
> >>> vfs.zfs.txg.timeout: 5
> >>> vfs.zfs.vdev.cache.max: 16384
> >>> vfs.zfs.vdev.cache.size: 0
> >>> vfs.zfs.vdev.cache.bshift: 16
> >>> vfs.zfs.vdev.trim_on_init: 0
> >>> vfs.zfs.vdev.mirror.rotating_inc: 0
> >>> vfs.zfs.vdev.mirror.rotating_seek_inc: 5
> >>> vfs.zfs.vdev.mirror.rotating_seek_offset: 1048576
> >>> vfs.zfs.vdev.mirror.non_rotating_inc: 0
> >>> vfs.zfs.vdev.mirror.non_rotating_seek_inc: 1
> >>> vfs.zfs.vdev.max_active: 1000
> >>> vfs.zfs.vdev.sync_read_min_active: 32
> >>> vfs.zfs.vdev.sync_read_max_active: 32
> >>> vfs.zfs.vdev.sync_write_min_active: 32
> >>> vfs.zfs.vdev.sync_write_max_active: 32
> >>> vfs.zfs.vdev.async_read_min_active: 32
> >>> vfs.zfs.vdev.async_read_max_active: 32
> >>> vfs.zfs.vdev.async_write_min_active: 32
> >>> vfs.zfs.vdev.async_write_max_active: 32
> >>> vfs.zfs.vdev.scrub_min_active: 1
> >>> vfs.zfs.vdev.scrub_max_active: 2
> >>> vfs.zfs.vdev.trim_min_active: 1
> >>> vfs.zfs.vdev.trim_max_active: 64
> >>> vfs.zfs.vdev.aggregation_limit: 131072
> >>> vfs.zfs.vdev.read_gap_limit: 32768
> >>> vfs.zfs.vdev.write_gap_limit: 4096
> >>> vfs.zfs.vdev.bio_flush_disable: 0
> >>> vfs.zfs.vdev.bio_delete_disable: 0
> >>> vfs.zfs.vdev.trim_max_bytes: 2147483648
> >>> vfs.zfs.vdev.trim_max_pending: 64
> >>> vfs.zfs.max_auto_ashift: 13
> >>> vfs.zfs.min_auto_ashift: 9
> >>> vfs.zfs.zil_replay_disable: 0
> >>> vfs.zfs.cache_flush_disable: 0
> >>> vfs.zfs.zio.use_uma: 1
> >>> vfs.zfs.zio.exclude_metadata: 0
> >>> vfs.zfs.sync_pass_deferred_free: 2
> >>> vfs.zfs.sync_pass_dont_compress: 5
> >>> vfs.zfs.sync_pass_rewrite: 2
> >>> vfs.zfs.snapshot_list_prefetch: 0
> >>> vfs.zfs.super_owner: 0
> >>> vfs.zfs.debug: 0
> >>> vfs.zfs.version.ioctl: 4
> >>> vfs.zfs.version.acl: 1
> >>> vfs.zfs.version.spa: 5000
> >>> vfs.zfs.version.zpl: 5
> >>> vfs.zfs.vol.mode: 1
> >>> vfs.zfs.trim.enabled: 0
> >>> vfs.zfs.trim.txg_delay: 32
> >>> vfs.zfs.trim.timeout: 30
> >>> vfs.zfs.trim.max_interval: 1
> >>>
> >>> And nvm:
> >>> dev.nvme.%parent:
> >>> dev.nvme.0.%desc: Generic NVMe Device
> >>> dev.nvme.0.%driver: nvme
> >>> dev.nvme.0.%location: slot=0 function=0 handle=\_SB_.PCI0.BR3A.D08A
> >>> dev.nvme.0.%pnpinfo: vendor=0x8086 device=0x0953 subvendor=0x8086 subdevice=0x370a class=0x010802
> >>> dev.nvme.0.%parent: pci4
> >>> dev.nvme.0.int_coal_time: 0
> >>> dev.nvme.0.int_coal_threshold: 0
> >>> dev.nvme.0.timeout_period: 30
> >>> dev.nvme.0.num_cmds: 811857
> >>> dev.nvme.0.num_intr_handler_calls: 485242
> >>> dev.nvme.0.reset_stats: 0
> >>> dev.nvme.0.adminq.num_entries: 128
> >>> dev.nvme.0.adminq.num_trackers: 16
> >>> dev.nvme.0.adminq.sq_head: 12
> >>> dev.nvme.0.adminq.sq_tail: 12
> >>> dev.nvme.0.adminq.cq_head: 8
> >>> dev.nvme.0.adminq.num_cmds: 12
> >>> dev.nvme.0.adminq.num_intr_handler_calls: 7
> >>> dev.nvme.0.adminq.dump_debug: 0
> >>> dev.nvme.0.ioq0.num_entries: 256
> >>> dev.nvme.0.ioq0.num_trackers: 128
> >>> dev.nvme.0.ioq0.sq_head: 69
> >>> dev.nvme.0.ioq0.sq_tail: 69
> >>> dev.nvme.0.ioq0.cq_head: 69
> >>> dev.nvme.0.ioq0.num_cmds: 811845
> >>> dev.nvme.0.ioq0.num_intr_handler_calls: 485235
> >>> dev.nvme.0.ioq0.dump_debug: 0
> >>> dev.nvme.1.%desc: Generic NVMe Device
> >>> dev.nvme.1.%driver: nvme
> >>> dev.nvme.1.%location: slot=0 function=0 handle=\_SB_.PCI0.BR3B.H000
> >>> dev.nvme.1.%pnpinfo: vendor=0x8086 device=0x0953 subvendor=0x8086 subdevice=0x370a class=0x010802
> >>> dev.nvme.1.%parent: pci5
> >>> dev.nvme.1.int_coal_time: 0
> >>> dev.nvme.1.int_coal_threshold: 0
> >>> dev.nvme.1.timeout_period: 30
> >>> dev.nvme.1.num_cmds: 167
> >>> dev.nvme.1.num_intr_handler_calls: 163
> >>> dev.nvme.1.reset_stats: 0
> >>> dev.nvme.1.adminq.num_entries: 128
> >>> dev.nvme.1.adminq.num_trackers: 16
> >>> dev.nvme.1.adminq.sq_head: 12
> >>> dev.nvme.1.adminq.sq_tail: 12
> >>> dev.nvme.1.adminq.cq_head: 8
> >>> dev.nvme.1.adminq.num_cmds: 12
> >>> dev.nvme.1.adminq.num_intr_handler_calls: 8
> >>> dev.nvme.1.adminq.dump_debug: 0
> >>> dev.nvme.1.ioq0.num_entries: 256
> >>> dev.nvme.1.ioq0.num_trackers: 128
> >>> dev.nvme.1.ioq0.sq_head: 155
> >>> dev.nvme.1.ioq0.sq_tail: 155
> >>> dev.nvme.1.ioq0.cq_head: 155
> >>> dev.nvme.1.ioq0.num_cmds: 155
> >>> dev.nvme.1.ioq0.num_intr_handler_calls: 155
> >>> dev.nvme.1.ioq0.dump_debug: 0
> >>>
> >>> Best regards,
> >>> Vintila Mihai Alexandru
> >>>
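As a reference point, the dataset-level settings referenced in this thread
(the atime=off retest above, plus the recordsize=4k and compression=lz4
changes Mihai describes in his reply quoted at the bottom) would be applied
roughly as in this sketch; tank/bench is a placeholder dataset name, not one
taken from the thread.

  # dataset properties described in the thread (names are placeholders)
  zfs set atime=off tank/bench
  zfs set recordsize=4k tank/bench
  zfs set compression=lz4 tank/bench
  # confirm what the benchmark actually ran against
  zfs get atime,recordsize,compression tank/bench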
> >>>> On 1/17/2015 12:13 AM, Barney Wolff wrote:
> >>>> I suspect Linux defaults to noatime - at least it does on my rpi. I
> >>>> believe the FreeBSD default is the other way. That may explain some
> >>>> of the difference.
> >>>>
> >>>> Also, did you use gnop to force the zpool to start on a 4k boundary?
> >>>> If not, and the zpool happens to be offset, that's another big hit.
> >>>> Same for ufs, especially if the disk has logical sectors of 512 but
> >>>> physical of 4096. One can complain that FreeBSD should prevent, or
> >>>> at least warn about, this sort of foot-shooting.
> >>>>
> >>>>> On Fri, Jan 16, 2015 at 10:21:07PM +0200, Mihai-Alexandru Vintila wrote:
> >>>>> @Barney Wolff it's a new pool with only two changes: recordsize=4k
> >>>>> and compression=lz4. On Linux the test is on ext4 with default
> >>>>> values. The penalty is pretty high. There is also a read penalty
> >>>>> between ufs and zfs. Even in nvmecontrol perftest you can see the
> >>>>> read penalty; it's not normal to have the same result for both
> >>>>> write and read.
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>