Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 6 Dec 2016 19:17:40 +0300
From:      Alex Tutubalin <lexa@lexa.ru>
To:        freebsd-fs@freebsd.org
Subject:   Re: 11-STABLE vs 11.0-RELENG test
Message-ID:  <a10b10ed-9200-26f3-5ca4-976a13922dee@lexa.ru>
In-Reply-To: <477264f9-9263-6df8-4566-2911e84f2ad8@lexa.ru>
References:  <477264f9-9263-6df8-4566-2911e84f2ad8@lexa.ru>

next in thread | previous in thread | raw e-mail | index | archive | help
Followup, same box as in 1st message, but with different HBA and 5 HDD:

Summary:
  1) read speed depends on 'on-disk' data, not by code used while read.
  2) FreeBSD 11.0-releng (svn up today) creates 'fast' big files (420+ 
Mb/s read speed)
  3)  FreeBSD 11-STABLE creates 'slow' big files. (~200  Mb/s)
  4) zfs send slow-dataset | zfs recv   (under 11.0-releng) creates 
'intermediate' files (320Mb/s)
  5) file copy slow-file ... (under 11.0-releng) creates fast copy 
(400+Mb/s)

So, ZFS write code in 11-STABLE looks broken.

Here detailed log of my tests:

Box: Intel i5-2400, 16Gb of RAM, Adaptec 5805, 5x4Tb drives (HGST NAS 
drives) in JBOD, merged in RAIDZ2, enough free space:
$ zpool list
NAME      SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH ALTROOT
zdata    18,1T  5,11T  13,0T         -     9%    28%  1.00x  ONLINE -

tunables in loader.conf:
vm.kmem_size="15G"
vfs.zfs.arc_max="14G"

Boot 11.0-releng:

$ uname -a
FreeBSD iscsi.lexa.ru 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #7 
r309605: Tue Dec  6 12:21:10 MSK 2016 
lexa@iscsi.lexa.ru:/usr/obj/usr/src/sys/GENERIC  amd64

create separate dataset for test:

# zfs create -o recordsize=1m  -o mountpoint=/test-11releng zdata/11releng

write 100gb file:

# dd if=/dev/zero of=/test-11releng/100g-created-on-11releng bs=1m 
count=100k
102400+0 records in
102400+0 records out
107374182400 bytes transferred in 257.686751 secs (416684917 bytes/sec)

reboot (still 11.0-releng) and read test file again:

Cold read:
$ dd if=/test-11releng/100g-created-on-11releng of=/dev/null bs=1m
102400+0 records in
102400+0 records out
107374182400 bytes transferred in 250.526067 secs (428594850 bytes/sec)

read again to see how cached metadata helps (if any):
$ dd if=/test-11releng/100g-created-on-11releng of=/dev/null bs=1m
102400+0 records in
102400+0 records out
107374182400 bytes transferred in 243.992662 secs (440071358 bytes/sec)

Not much difference between cold and cached metadata, OK.

Now boot 11.0-STABLE on same box:
$ uname -a
FreeBSD iscsi.lexa.ru 11.0-STABLE FreeBSD 11.0-STABLE #0 r309616: Tue 
Dec  6 16:15:48 MSK 2016 
lexa@iscsi.lexa.ru:/usr/obj/usr/src11/sys/GENERIC  amd64

Cold read of same data file:
$ dd if=/test-11releng/100g-created-on-11releng of=/dev/null bs=1m
102400+0 records in
102400+0 records out
107374182400 bytes transferred in 250.640950 secs (428398402 bytes/sec)

Still OK: data created on 11.0-releng reads at full speed under 11-stable

Create 11-stable own dataset:
# zfs create -o recordsize=1m  -o mountpoint=/test-11stable zdata/11stable

And write similar 100g file:
# dd if=/dev/zero of=/test-11stable/100g-created-on-11stable bs=1m 
count=100k
102400+0 records in
102400+0 records out
107374182400 bytes transferred in 246.409501 secs (435755041 bytes/sec)


Reboot (again into 11-stable) and read back created file

$ dd if=/test-11stable/100g-created-on-11stable of=/dev/null bs=1m
102400+0 records in
102400+0 records out
107374182400 bytes transferred in 524.488506 secs (204721707 bytes/sec)

Hmm, about 1/2 of expected speed.

Let's read fist test file, created on 11-releng:
$ dd if=/test-11releng/100g-created-on-11releng of=/dev/null bs=1m
102400+0 records in
102400+0 records out
107374182400 bytes transferred in 250.591102 secs (428483620 bytes/sec)

Speed is still here.

So, problem is related to on-disk data written by 11-stable.

Let's try to recover 'bad data' into good one (assuming we're downgraded 
to 11-releng and want to get speed back)

Boot into 11-releng:

$ uname -a
FreeBSD iscsi.lexa.ru 11.0-RELEASE-p3 FreeBSD 11.0-RELEASE-p3 #7 
r309605: Tue Dec  6 12:21:10 MSK 2016 
lexa@iscsi.lexa.ru:/usr/obj/usr/src/sys/GENERIC  amd64

Try zfs send|recv:

# zfs snapshot zdata/11stable@1
# zfs send zdata/11stable@1 | zfs recv 
zdata/11stable-sendrecv-on11releng  # slow because same dataset used
# zfs destroy zdata/11stable-sendrecv-on11releng@1
# zfs destroy zdata/11stable@1
# zfs set mountpoint=/test-11stable-sendrecv-on11releng 
zdata/11stable-sendrecv-on11releng

Reboot and test read speed:

$ dd if=/test-11stable-sendrecv-on11releng/100g-created-on-11stable 
of=/dev/null bs=1m
102400+0 records in
102400+0 records out
107374182400 bytes transferred in 336.193935 secs (319381676 bytes/sec)

INTERMEDIATE speed :(  Better than 'bad datafile', worse than 'good 
datafile'

Try file copy to another dataset:
# zfs create -o recordsize=1m -o 
mountpoint=/test-11stable-copy-on11releng zdata/11stable-copy-on11releng
# cd /test-11stable
# tar cf - . | (cd /test-11stable-copy-on11releng/; tar xvf - )

Reboot again and test speed another time:

$ dd if=/test-11stable-copy-on11releng/100g-created-on-11stable 
of=/dev/null bs=1m
102400+0 records in
102400+0 records out
107374182400 bytes transferred in 261.484845 secs (410632526 bytes/sec)

OK, per-file copy solves the problem (as expected).

Finally, let's check that all testfiles still reads at the expected speed

'good' one, created on 11.0-releng:
]$ dd if=/test-11releng/100g-created-on-11releng of=/dev/null bs=1m
102400+0 records in
102400+0 records out
107374182400 bytes transferred in 249.435476 secs (430468770 bytes/sec)

'bad' one, created on 11-stable:
$ dd if=/test-11stable/100g-created-on-11stable of=/dev/null bs=1m
102400+0 records in
102400+0 records out
107374182400 bytes transferred in 525.558477 secs (204304920 bytes/sec)

and 'intermediate', created by zfs send bad-dataset | zfs recv:

$ dd if=/test-11stable-sendrecv-on11releng/100g-created-on-11stable 
of=/dev/null bs=1m
102400+0 records in
102400+0 records out
107374182400 bytes transferred in 336.570705 secs (319024148 bytes/sec)

Boot into 11-stable again and test read speed too:
$ uname -a
FreeBSD iscsi.lexa.ru 11.0-STABLE FreeBSD 11.0-STABLE #0 r309616: Tue 
Dec  6 16:15:48 MSK 2016 
lexa@iscsi.lexa.ru:/usr/obj/usr/src11/sys/GENERIC  amd64

Good:
]$ dd if=/test-11releng/100g-created-on-11releng of=/dev/null bs=1m
102400+0 records in
102400+0 records out
107374182400 bytes transferred in 256.322824 secs (418902151 bytes/sec)

Bad:
$ dd if=/test-11stable/100g-created-on-11stable of=/dev/null bs=1m
102400+0 records in
102400+0 records out
107374182400 bytes transferred in 529.918422 secs (202623985 bytes/sec)

Copied by tar (from bad to some intermediate state):
$ dd if=/test-11stable-copy-on11releng/100g-created-on-11stable 
of=/dev/null bs=1m
102400+0 records in
102400+0 records out
107374182400 bytes transferred in 260.234944 secs (412604783 bytes/sec)

Copied by zfs send | zfs recv:
$ dd if=/test-11stable-sendrecv-on11releng/100g-created-on-11stable 
of=/dev/null bs=1m
102400+0 records in
102400+0 records out
107374182400 bytes transferred in 336.240919 secs (319337047 bytes/sec)

(virtually) same speed as under 11.0-releng, so read speed is limited by 
on-disk structures.
-- 

Alex Tutubalin
Web: http://blog.lexa.ru
mailto:lexa@lexa.ru




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?a10b10ed-9200-26f3-5ca4-976a13922dee>