From owner-freebsd-current@freebsd.org Sun Apr  1 22:19:11 2018
From: Stefan Esser <se@freebsd.org>
To: "M. Warner Losh"
Cc: FreeBSD Current <freebsd-current@freebsd.org>
Subject: Re: Extremely low disk throughput under high compute load
Date: Mon, 2 Apr 2018 00:18:56 +0200
Message-ID: <1d188cb0-ebc8-075f-ed51-57641ede1fd6@freebsd.org>
On 01.04.2018 18:33, Warner Losh wrote:
> On Sun, Apr 1, 2018 at 9:18 AM, Stefan Esser wrote:
>
>     My i7-2600K based system with 24 GB RAM was in the midst of a
>     buildworld -j8 (starting from a clean state) which caused a load
>     average of 12 for more than 1 hour, when I decided to move a
>     directory structure holding some 10 GB to its own ZFS file system.
>     File sizes varied, but were mostly in the range of 500 KB.
>
>     I had just thrown away /usr/obj, but /usr/src was cached in ARC and
>     thus there was nearly no disk activity caused by the buildworld.
>
>     The copying proceeded at a rate of at most 10 MB/s, but most of the
>     time less than 100 KB/s were transferred. The "cp" process had a
>     PRIO of 20 and thus a much better priority than the compute-bound
>     compiler processes, but it got just 0.2% to 0.5% of 1 CPU core.
>     Apparently, the copy process was scheduled at such a low rate that
>     it only managed to issue a few controller writes per second.
>
>     The system is healthy and does not show any problems or anomalies
>     under normal use (e.g., file copies are fast without the high
>     compute load).
>
>     This was with SCHED_ULE on a -CURRENT without WITNESS or malloc
>     debugging.
>
>     Is this a regression in -CURRENT?
>
> Does 'sync' push a lot of I/O to the disk?

Each sync takes 0.7 to 1.5 seconds to complete, but since reading is so
slow, not much is written. Normal gstat output for the 3 drives the RAIDZ1
consists of:

dT: 1.002s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      2      2     84   39.1      0      0    0.0    7.8  ada0
    0      4      4     92   66.6      0      0    0.0   26.6  ada1
    0      6      6    259   66.9      0      0    0.0   36.2  ada3

dT: 1.058s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      1      1     60   70.6      0      0    0.0    6.7  ada0
    0      3      3     68   71.3      0      0    0.0   20.2  ada1
    0      6      6    242   65.5      0      0    0.0   28.8  ada3

dT: 1.002s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0      5      5    192   44.8      0      0    0.0   22.4  ada0
    0      6      6    160   61.9      0      0    0.0   26.5  ada1
    0      6      6    172   43.7      0      0    0.0   26.2  ada3

This includes the copy process and the reads caused by "make -j 8 world"
(but I assume that all the source files are already cached in ARC).

During sync:

dT: 1.002s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1    101      9    132   14.6     90   1760    5.6   59.7  ada0
    2    110     16    267   15.0     92   1756    6.0   50.7  ada1
    2     82     13    291   17.8     67   1653    7.4   34.3  ada3

ZFS is configured to flush dirty buffers after 5 seconds, so there are not
many dirty buffers in RAM at any time, anyway.

> Is the effective throughput of CP tiny or large? It's tiny, if I read
> right, and the I/O is slow (as opposed to it all buffering in memory and
> being slow to drain down), right?

Yes, reading is very slow, with fewer than 10 read operations scheduled per
second.
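For reference, the gstat samples above were collected roughly along the
following lines. This is only a sketch: the source and destination paths
are placeholders, not the actual directories involved, and the 5 second
flush interval mentioned above is the vfs.zfs.txg.timeout setting.

# confirm the dirty-data flush interval (vfs.zfs.txg.timeout, 5 s default)
sysctl vfs.zfs.txg.timeout

# start the copy in the background; both paths are placeholders
cp -a /pool/olddir/. /pool/newfs/ &

# watch the three RAIDZ1 members in 1-second intervals while buildworld
# keeps all cores busy
gstat -I 1s -f '^ada[013]$'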
Top output taken at the same time as the gstat samples above:

last pid: 24306;  load averages: 12.07, 11.51,  8.13   up 2+05:41:57  00:10:22
132 processes: 10 running, 122 sleeping
CPU: 98.2% user,  0.0% nice,  1.7% system,  0.1% interrupt,  0.0% idle
Mem: 1069M Active, 1411M Inact, 269M Laundry, 20G Wired, 1076M Free
ARC: 16G Total, 1234M MFU, 14G MRU, 83M Anon, 201M Header, 786M Other
     14G Compressed, 30G Uncompressed, 2.09:1 Ratio
Swap: 24G Total, 533M Used, 23G Free, 2% Inuse

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
24284 root          1  92    0   228M   199M CPU6    6   0:11 101.34% c++
24287 root          1  91    0   269M   241M CPU3    3   0:10 101.32% c++
24266 root          1  97    0   303M   276M CPU0    0   0:17 101.13% c++
24297 root          1  85    0   213M   184M CPU1    1   0:06  98.40% c++
24281 root          1  93    0   245M   217M CPU7    7   0:12  96.76% c++
24300 root          1  76    0   114M 89268K RUN     2   0:02  83.22% c++
24303 root          1  75    0   105M 79908K CPU4    4   0:01  59.94% c++
24302 root          1  52    0 74940K 47264K wait    4   0:00   0.35% c++
24299 root          1  52    0 74960K 47268K wait    2   0:00   0.33% c++
20954 root          1  20    0 15528K  4900K zio->i  3   0:02   0.11% cp

ARC is limited to 18 GB to leave 6 GB of RAM for use by the kernel and user
programs.

vfs.zfs.arc_meta_limit: 4500000000
vfs.zfs.arc_free_target: 42339
vfs.zfs.compressed_arc_enabled: 1
vfs.zfs.arc_grow_retry: 60
vfs.zfs.arc_shrink_shift: 7
vfs.zfs.arc_average_blocksize: 8192
vfs.zfs.arc_no_grow_shift: 5
vfs.zfs.arc_min: 2250000000
vfs.zfs.arc_max: 18000000000

Regards, STefan
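P.S.: In case someone wants to replicate the setup: the ARC cap is a loader
tunable. A minimal sketch follows; only vfs.zfs.arc_max is set explicitly,
and as far as I remember the other values listed above are stock defaults
or derived from arc_max.

# /boot/loader.conf -- cap the ARC at 18 GB on this 24 GB machine
vfs.zfs.arc_max="18000000000"

# verify the effective limit after the next boot
sysctl vfs.zfs.arc_max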