From owner-freebsd-fs@freebsd.org Tue Dec 6 16:20:09 2016 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 59FF3C6AE31 for ; Tue, 6 Dec 2016 16:20:09 +0000 (UTC) (envelope-from lexa@lexa.ru) Received: from mx3.lexa.ru (mx3.lexa.ru [198.27.68.102]) by mx1.freebsd.org (Postfix) with ESMTP id 3CAA7E4E for ; Tue, 6 Dec 2016 16:20:08 +0000 (UTC) (envelope-from lexa@lexa.ru) Received: by mx3.lexa.ru (Postfix, from userid 66) id 5B11B224A5D; Tue, 6 Dec 2016 11:20:08 -0500 (EST) Received: from [193.124.130.130] (unknown [193.124.130.130]) by home-gw.lexa.ru (Postfix) with ESMTP id 6077FB0583 for ; Tue, 6 Dec 2016 19:17:40 +0300 (MSK) Subject: Re: 11-STABLE vs 11.0-RELENG test To: freebsd-fs@freebsd.org References: <477264f9-9263-6df8-4566-2911e84f2ad8@lexa.ru> From: Alex Tutubalin Message-ID: Date: Tue, 6 Dec 2016 19:17:40 +0300 User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1 MIME-Version: 1.0 In-Reply-To: <477264f9-9263-6df8-4566-2911e84f2ad8@lexa.ru> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 Dec 2016 16:20:09 -0000 Followup, same box as in 1st message, but with different HBA and 5 HDD: Summary: 1) read speed depends on 'on-disk' data, not by code used while read. 2) FreeBSD 11.0-releng (svn up today) creates 'fast' big files (420+ Mb/s read speed) 3) FreeBSD 11-STABLE creates 'slow' big files. (~200 Mb/s) 4) zfs send slow-dataset | zfs recv (under 11.0-releng) creates 'intermediate' files (320Mb/s) 5) file copy slow-file ... (under 11.0-releng) creates fast copy (400+Mb/s) So, ZFS write code in 11-STABLE looks broken. Here detailed log of my tests: Box: Intel i5-2400, 16Gb of RAM, Adaptec 5805, 5x4Tb drives (HGST NAS drives) in JBOD, merged in RAIDZ2, enough free space: $ zpool list NAME SIZE ALLOC FREE EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT zdata 18,1T 5,11T 13,0T - 9% 28% 1.00x ONLINE - tunables in loader.conf: vm.kmem_size="15G" vfs.zfs.arc_max="14G" Boot 11.0-releng: $ uname -a FreeBSD iscsi.lexa.ru 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #7 r309605: Tue Dec 6 12:21:10 MSK 2016 lexa@iscsi.lexa.ru:/usr/obj/usr/src/sys/GENERIC amd64 create separate dataset for test: # zfs create -o recordsize=1m -o mountpoint=/test-11releng zdata/11releng write 100gb file: # dd if=/dev/zero of=/test-11releng/100g-created-on-11releng bs=1m count=100k 102400+0 records in 102400+0 records out 107374182400 bytes transferred in 257.686751 secs (416684917 bytes/sec) reboot (still 11.0-releng) and read test file again: Cold read: $ dd if=/test-11releng/100g-created-on-11releng of=/dev/null bs=1m 102400+0 records in 102400+0 records out 107374182400 bytes transferred in 250.526067 secs (428594850 bytes/sec) read again to see how cached metadata helps (if any): $ dd if=/test-11releng/100g-created-on-11releng of=/dev/null bs=1m 102400+0 records in 102400+0 records out 107374182400 bytes transferred in 243.992662 secs (440071358 bytes/sec) Not much difference between cold and cached metadata, OK. Now boot 11.0-STABLE on same box: $ uname -a FreeBSD iscsi.lexa.ru 11.0-STABLE FreeBSD 11.0-STABLE #0 r309616: Tue Dec 6 16:15:48 MSK 2016 lexa@iscsi.lexa.ru:/usr/obj/usr/src11/sys/GENERIC amd64 Cold read of same data file: $ dd if=/test-11releng/100g-created-on-11releng of=/dev/null bs=1m 102400+0 records in 102400+0 records out 107374182400 bytes transferred in 250.640950 secs (428398402 bytes/sec) Still OK: data created on 11.0-releng reads at full speed under 11-stable Create 11-stable own dataset: # zfs create -o recordsize=1m -o mountpoint=/test-11stable zdata/11stable And write similar 100g file: # dd if=/dev/zero of=/test-11stable/100g-created-on-11stable bs=1m count=100k 102400+0 records in 102400+0 records out 107374182400 bytes transferred in 246.409501 secs (435755041 bytes/sec) Reboot (again into 11-stable) and read back created file $ dd if=/test-11stable/100g-created-on-11stable of=/dev/null bs=1m 102400+0 records in 102400+0 records out 107374182400 bytes transferred in 524.488506 secs (204721707 bytes/sec) Hmm, about 1/2 of expected speed. Let's read fist test file, created on 11-releng: $ dd if=/test-11releng/100g-created-on-11releng of=/dev/null bs=1m 102400+0 records in 102400+0 records out 107374182400 bytes transferred in 250.591102 secs (428483620 bytes/sec) Speed is still here. So, problem is related to on-disk data written by 11-stable. Let's try to recover 'bad data' into good one (assuming we're downgraded to 11-releng and want to get speed back) Boot into 11-releng: $ uname -a FreeBSD iscsi.lexa.ru 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #7 r309605: Tue Dec 6 12:21:10 MSK 2016 lexa@iscsi.lexa.ru:/usr/obj/usr/src/sys/GENERIC amd64 Try zfs send|recv: # zfs snapshot zdata/11stable@1 # zfs send zdata/11stable@1 | zfs recv zdata/11stable-sendrecv-on11releng # slow because same dataset used # zfs destroy zdata/11stable-sendrecv-on11releng@1 # zfs destroy zdata/11stable@1 # zfs set mountpoint=/test-11stable-sendrecv-on11releng zdata/11stable-sendrecv-on11releng Reboot and test read speed: $ dd if=/test-11stable-sendrecv-on11releng/100g-created-on-11stable of=/dev/null bs=1m 102400+0 records in 102400+0 records out 107374182400 bytes transferred in 336.193935 secs (319381676 bytes/sec) INTERMEDIATE speed :( Better than 'bad datafile', worse than 'good datafile' Try file copy to another dataset: # zfs create -o recordsize=1m -o mountpoint=/test-11stable-copy-on11releng zdata/11stable-copy-on11releng # cd /test-11stable # tar cf - . | (cd /test-11stable-copy-on11releng/; tar xvf - ) Reboot again and test speed another time: $ dd if=/test-11stable-copy-on11releng/100g-created-on-11stable of=/dev/null bs=1m 102400+0 records in 102400+0 records out 107374182400 bytes transferred in 261.484845 secs (410632526 bytes/sec) OK, per-file copy solves the problem (as expected). Finally, let's check that all testfiles still reads at the expected speed 'good' one, created on 11.0-releng: ]$ dd if=/test-11releng/100g-created-on-11releng of=/dev/null bs=1m 102400+0 records in 102400+0 records out 107374182400 bytes transferred in 249.435476 secs (430468770 bytes/sec) 'bad' one, created on 11-stable: $ dd if=/test-11stable/100g-created-on-11stable of=/dev/null bs=1m 102400+0 records in 102400+0 records out 107374182400 bytes transferred in 525.558477 secs (204304920 bytes/sec) and 'intermediate', created by zfs send bad-dataset | zfs recv: $ dd if=/test-11stable-sendrecv-on11releng/100g-created-on-11stable of=/dev/null bs=1m 102400+0 records in 102400+0 records out 107374182400 bytes transferred in 336.570705 secs (319024148 bytes/sec) Boot into 11-stable again and test read speed too: $ uname -a FreeBSD iscsi.lexa.ru 11.0-STABLE FreeBSD 11.0-STABLE #0 r309616: Tue Dec 6 16:15:48 MSK 2016 lexa@iscsi.lexa.ru:/usr/obj/usr/src11/sys/GENERIC amd64 Good: ]$ dd if=/test-11releng/100g-created-on-11releng of=/dev/null bs=1m 102400+0 records in 102400+0 records out 107374182400 bytes transferred in 256.322824 secs (418902151 bytes/sec) Bad: $ dd if=/test-11stable/100g-created-on-11stable of=/dev/null bs=1m 102400+0 records in 102400+0 records out 107374182400 bytes transferred in 529.918422 secs (202623985 bytes/sec) Copied by tar (from bad to some intermediate state): $ dd if=/test-11stable-copy-on11releng/100g-created-on-11stable of=/dev/null bs=1m 102400+0 records in 102400+0 records out 107374182400 bytes transferred in 260.234944 secs (412604783 bytes/sec) Copied by zfs send | zfs recv: $ dd if=/test-11stable-sendrecv-on11releng/100g-created-on-11stable of=/dev/null bs=1m 102400+0 records in 102400+0 records out 107374182400 bytes transferred in 336.240919 secs (319337047 bytes/sec) (virtually) same speed as under 11.0-releng, so read speed is limited by on-disk structures. -- Alex Tutubalin Web: http://blog.lexa.ru mailto:lexa@lexa.ru