Date:      Mon, 2 Apr 2018 00:18:56 +0200
From:      Stefan Esser <se@freebsd.org>
To:        "M. Warner Losh" <imp@freebsd.org>
Cc:        FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: Extremely low disk throughput under high compute load
Message-ID:  <1d188cb0-ebc8-075f-ed51-57641ede1fd6@freebsd.org>
In-Reply-To: <CANCZdfoieekesqKa5RmOp=z2vycsVqnVss7ROnO87YTV-qBUzA@mail.gmail.com>
References:  <dc8d0285-1916-6581-2b2d-e8320ec3d894@freebsd.org> <CANCZdfoieekesqKa5RmOp=z2vycsVqnVss7ROnO87YTV-qBUzA@mail.gmail.com>

On 01.04.18 at 18:33, Warner Losh wrote:
> 
> 
> On Sun, Apr 1, 2018 at 9:18 AM, Stefan Esser <se@freebsd.org> wrote:
> 
>     My i7-2600K based system with 24 GB RAM was in the midst of a buildworld -j8
>     (starting from a clean state) which caused a load average of 12 for more than
>     1 hour, when I decided to move a directory structure holding some 10 GB to its
>     own ZFS file system. File sizes varied, but were mostly in the range of 500 KB.
> 
>     I had just thrown away /usr/obj, but /usr/src was cached in ARC and thus there
>     was nearly no disk activity caused by the buildworld.
> 
>     The copying proceeded at a rate of at most 10 MB/s, but most of the time less
>     than 100 KB/s were transferred. The "cp" process had a PRIO of 20 and thus a
>     much better priority than the compute-bound compiler processes, but it got
>     just 0.2% to 0.5% of one CPU core. Apparently, the copy process was scheduled
>     at such a low rate that it only managed to issue a few controller writes per
>     second.
> 
>     The system is healthy and does not show any problems or anomalies under
>     normal use (e.g., file copies are fast, without the high compute load).
> 
>     This was with SCHED_ULE on a -CURRENT without WITNESS or malloc debugging.
> 
>     Is this a regression in -CURRENT?
> 
> Does 'sync' push a lot of I/O to the disk?

Each sync takes 0.7 to 1.5 seconds to complete, but since reading is so
slow, not much is written.
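
(A sketch of how such timings can be taken; not the exact commands used
here:

  # time each explicit sync(8); -h prints human-readable elapsed time
  while :; do /usr/bin/time -h sync; sleep 5; done

with "gstat -I 1s" running in a second terminal, matching the dT of about
one second in the samples below.)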

Normal gstat output for the three drives that the RAIDZ1 pool consists of:

dT: 1.002s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      2      2     84   39.1      0      0    0.0    7.8  ada0
    0      4      4     92   66.6      0      0    0.0   26.6  ada1
    0      6      6    259   66.9      0      0    0.0   36.2  ada3
dT: 1.058s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      1      1     60   70.6      0      0    0.0    6.7  ada0
    0      3      3     68   71.3      0      0    0.0   20.2  ada1
    0      6      6    242   65.5      0      0    0.0   28.8  ada3
dT: 1.002s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      5      5    192   44.8      0      0    0.0   22.4  ada0
    0      6      6    160   61.9      0      0    0.0   26.5  ada1
    0      6      6    172   43.7      0      0    0.0   26.2  ada3

This includes the copy process and the reads caused by "make -j 8 world"
(but I assume that all the source files are already cached in ARC).
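
Whether the sources are really cached can be checked against the cumulative
ARC hit counters (a quick sh sketch; the counters are totals since boot, so
sample twice to derive a current rate):

  hits=$(sysctl -n kstat.zfs.misc.arcstats.hits)
  misses=$(sysctl -n kstat.zfs.misc.arcstats.misses)
  echo "ARC hit ratio: $((hits * 100 / (hits + misses)))%"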

During sync:

dT: 1.002s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1    101      9    132   14.6     90   1760    5.6   59.7  ada0
    2    110     16    267   15.0     92   1756    6.0   50.7  ada1
    2     82     13    291   17.8     67   1653    7.4   34.3  ada3

ZFS is configured to flush dirty buffers after 5 seconds, so not many dirty
buffers accumulate in RAM at any time.
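
That interval is the ZFS transaction group timeout, which can be inspected
and tuned at runtime via sysctl (shown with the value configured here):

  $ sysctl vfs.zfs.txg.timeout
  vfs.zfs.txg.timeout: 5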

> Is the effective throughput of CP tiny or large? It's tiny, if I read right,
> and the I/O is slow (as opposed to it all buffering in memory and being slow
> to drain down), right?

Yes, reading is very slow, with fewer than 10 read operations scheduled
per second.
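
Per-process I/O rates can also be watched directly with top(1) in I/O mode,
which makes a starved process like this cp easy to spot:

  # show per-process I/O counters, sorted by total transfers
  top -m io -o total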

Top output taken at the same time as above gstat samples:

last pid: 24306;  load averages: 12.07, 11.51,  8.13   up 2+05:41:57  00:10:22
132 processes: 10 running, 122 sleeping
CPU: 98.2% user,  0.0% nice,  1.7% system,  0.1% interrupt,  0.0% idle
Mem: 1069M Active, 1411M Inact, 269M Laundry, 20G Wired, 1076M Free
ARC: 16G Total, 1234M MFU, 14G MRU, 83M Anon, 201M Header, 786M Other
     14G Compressed, 30G Uncompressed, 2.09:1 Ratio
Swap: 24G Total, 533M Used, 23G Free, 2% Inuse

  PID USERNAME       THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
24284 root             1  92    0   228M   199M CPU6    6   0:11 101.34% c++
24287 root             1  91    0   269M   241M CPU3    3   0:10 101.32% c++
24266 root             1  97    0   303M   276M CPU0    0   0:17 101.13% c++
24297 root             1  85    0   213M   184M CPU1    1   0:06  98.40% c++
24281 root             1  93    0   245M   217M CPU7    7   0:12  96.76% c++
24300 root             1  76    0   114M 89268K RUN     2   0:02  83.22% c++
24303 root             1  75    0   105M 79908K CPU4    4   0:01  59.94% c++
24302 root             1  52    0 74940K 47264K wait    4   0:00   0.35% c++
24299 root             1  52    0 74960K 47268K wait    2   0:00   0.33% c++
20954 root             1  20    0 15528K  4900K zio->i  3   0:02   0.11% cp
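
The "zio->i" STATE of the cp process is a truncated ZFS I/O wait channel;
where it blocks in the kernel can be confirmed with procstat (PID taken
from the listing above):

  # print the kernel stack of the blocked process
  procstat -kk 20954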

ARC is limited to 18 GB to leave 6 GB of RAM for use by the kernel and user
programs.

vfs.zfs.arc_meta_limit: 4500000000
vfs.zfs.arc_free_target: 42339
vfs.zfs.compressed_arc_enabled: 1
vfs.zfs.arc_grow_retry: 60
vfs.zfs.arc_shrink_shift: 7
vfs.zfs.arc_average_blocksize: 8192
vfs.zfs.arc_no_grow_shift: 5
vfs.zfs.arc_min: 2250000000
vfs.zfs.arc_max: 18000000000
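
These limits are applied at boot; a minimal sketch of the matching
/boot/loader.conf lines (values as reported above):

  vfs.zfs.arc_max="18000000000"
  vfs.zfs.arc_min="2250000000"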

Regards, Stefan


