From: Stefan Blachmann <sblachmann@gmail.com>
Date: Tue, 14 Dec 2021 04:06:50 +0100
To: Alan Somers
Cc: Johannes Totz, FreeBSD Hackers
Subject: Weak disk I/O performance on daX compared to adaX, was: Re: dd performance [was Re: Call for Foundation-supported Project Ideas]
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers
I am wondering what could be the cause of the weak disk I/O
performance on FreeBSD when drives are handled by the daX driver
instead of the adaX driver.

Explanation: the HP Z420 has 6 SATA ports. SATA drives connected to
ports 1 through 4 show up as daX devices on FreeBSD; drives connected
to ports 5 and 6 appear as adaX devices.

> On 12/2/21, Alan Somers wrote:
>> That is your problem then.  The default value for dd is 512B.  If it
>> took 3 days to erase a 2 TB HDD, that means you were writing 15,000
>> IOPs.  Frankly, I'm impressed that the SATA bus could handle that

This shows that with the ada driver the disk I/O performance is
acceptable.  However, after 14 days dd is still working on the same
type of drive on connector 4 (da3).

So my questions:
- Why does FreeBSD use the da driver instead of the ada driver for
  drives on SATA ports 1-4?  (One way to check which controller each
  disk is attached to is sketched below.)
- And why is the da driver so slow?  (For example, on an HP Z800
  running FreeBSD, 15k SAS drives seem as slow as normal consumer
  drives, while on Linux disk I/O is just snappy.)
- Is there a way to configure FreeBSD to use the ada driver instead of
  the da driver, so that FreeBSD remains an alternative to Linux when
  disk speed matters?
- Or is it impossible to use the ada driver on SATA connectors 1-4 for
  some HP Z420 hardware-related reason?
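One way to narrow down the first question (a minimal sketch, assuming
a stock FreeBSD 13 install; the grep patterns are only guesses at the
relevant strings) is to ask which controller each disk attached
through:

  # each disk together with the bus/driver it attached through
  camcontrol devlist -v
  # attach messages name the parent controller, e.g. ahcichN vs. a SAS/SCU port
  dmesg | grep -E '^(ada|da)[0-9]'
  # storage functions of the chipset that the kernel found drivers for
  pciconf -lv | grep -B4 -E -i 'sata|sas|scsi'

If ports 1-4 turn out to hang off the chipset's SAS/SCU function
(served by a SCSI-type driver such as isci(4)) rather than the AHCI
function, that alone would explain the daX naming; whether it also
explains the speed difference is a separate question.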
Cheers,
Stefan

On 12/2/21, Stefan Blachmann wrote:
> Ah, the buffer cache!  Didn't think of that.
> Top shows the weighted CPU load is about 4%, so your guess that it
> was the SATA scheduler might be correct.
> Will try this on Linux in the next days using conv=direct with a
> pair of identical HDDs.
> Already curious for the results.
>
> On 12/2/21, Alan Somers wrote:
>> That is your problem then.  The default value for dd is 512B.  If
>> it took 3 days to erase a 2 TB HDD, that means you were writing
>> 15,000 IOPs.  Frankly, I'm impressed that the SATA bus could handle
>> that many.  By using such a small block size, you were doing an
>> excellent job of exercising the SATA bus and the HDD's host
>> interface, but its servo and write head were mostly just idle.
>>
>> The reason why Linux is different is that, unlike FreeBSD, it has a
>> buffer cache.  Even though dd was writing with 512B blocks, those
>> writes probably got combined by the buffer cache before going to
>> SATA.  However, if you use the conv=direct option with dd, then
>> they probably won't be combined.  I haven't tested this; it's just
>> a guess.  You can probably verify using iostat.
>>
>> When you were trying to erase two HDDs concurrently but only one
>> was getting all of the IOPs and CPU time, was your CPU saturated?
>> I'm guessing not.  On my machine, with a similar HDD, dd only
>> consumes 10% of the CPU when I write zeros with a 512B block size.
>> I need to use a 16k block size or larger to get the IOPs under
>> 10,000.  So I'm guessing that in your case the CPU scheduler was
>> working just fine, but the SATA bus was saturated, and the SATA
>> scheduler was the source of the unfairness.
>> -Alan
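For reference, a minimal sketch of the kind of run and check Alan
describes above (assuming /dev/da3 is the disk being erased; the
block size is just an example):

  # write zeros with a large block size so the drive, not the bus, is the bottleneck
  dd if=/dev/zero of=/dev/da3 bs=1m status=progress
  # in a second terminal, watch per-device throughput and transactions per second
  iostat -x -w 1 da3 ada0

FreeBSD's dd accepts bs=1m and status=progress, and iostat -x -w 1
prints extended per-device statistics once a second, which should
show whether the slow run is transaction-bound rather than
bandwidth-bound.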
>>
>> On Thu, Dec 2, 2021 at 10:37 AM Stefan Blachmann wrote:
>>>
>>> I intentionally used dd without the bs parameter, as I care less
>>> about "maximum speed" than about clearing the drives completely
>>> while also doing a lot of I/O transactions.
>>> The latter because drives that are becoming unreliable tend to
>>> throw occasional errors, and the more I/O transactions one does,
>>> the better the chance of spotting such drives.
>>>
>>> The system is an HP Z420; the mainboard/chipset/controller specs
>>> can be found on the web.
>>> The drives in question here are (quite old) 2 TB WD Black
>>> enterprise-grade 3.5" SATA drives.  Their SMART data is good, not
>>> hinting at any problems.
>>>
>>> On Linux, erasing them both concurrently finished at almost the
>>> same time.
>>> Thus I do not really understand why on FreeBSD this is so much
>>> different.
>>>
>>> On 12/2/21, Alan Somers wrote:
>>> > This is very surprising to me.  I never see dd take significant
>>> > CPU consumption until the speed gets up into the GB/s range.
>>> > What are you using for the bs= option?  If you set that too low,
>>> > or use the default, it will needlessly consume extra CPU and
>>> > IOPs.  I usually set it to 1m for this kind of usage.  And what
>>> > kind of HDDs are these, connected to what kind of controller?
>>> >
>>> > On Thu, Dec 2, 2021 at 9:54 AM Stefan Blachmann wrote:
>>> >>
>>> >> Regarding the suggestions to either improve or replace the ULE
>>> >> scheduler, I would like to share another observation.
>>> >>
>>> >> Usually when I need to zero out HDDs using dd, I use a live
>>> >> Linux.  This time I did that on FreeBSD (13).
>>> >> My observations:
>>> >> - On the same hardware, the data transfer rate is a small
>>> >>   fraction (about a quarter) of what is achieved by Linux.
>>> >> - The first dd process, which erases the first HDD, gets almost
>>> >>   all CPU and I/O time.  The second process, which does the
>>> >>   second HDD, is getting starved.  It really starts only after
>>> >>   the first one has finished.
>>> >>
>>> >> To me it was *very* surprising to find out that, while erasing
>>> >> two similar HDDs concurrently takes about one day on Linux, on
>>> >> FreeBSD the first HDD was finished after three days, and only
>>> >> after that did the remaining second dd process get the same CPU
>>> >> time, making it proceed fast instead of creepingly slowly.
>>> >>
>>> >> So I guess this might be a scheduler issue.
>>> >> I certainly will do some tests using the old scheduler when I
>>> >> get the time.
>>> >> And, I ask myself:
>>> >> Could it be a good idea to sponsor porting the Dragonfly
>>> >> scheduler to FreeBSD?
>>> >>
>>> >> On 12/2/21, Johannes Totz wrote:
>>> >> > On 29/11/2021 03:17, Ed Maste wrote:
>>> >> >> On Sun, 28 Nov 2021 at 19:37, Steve Kargl wrote:
>>> >> >>>
>>> >> >>> It's certainly not the latest and greatest,
>>> >> >>> CPU: Intel(R) Core(TM)2 Duo CPU T7250 @ 2.00GHz (1995.04-MHz
>>> >> >>> K8-class CPU)
>>> >> >>
>>> >> >> If you're content to use a compiler from a package you can
>>> >> >> save a lot of time by building with `CROSS_TOOLCHAIN=llvm13`
>>> >> >> and `WITHOUT_TOOLCHAIN=yes`.  Or, instead of
>>> >> >> WITHOUT_TOOLCHAIN, perhaps `WITHOUT_CLANG=yes`,
>>> >> >> `WITHOUT_LLD=yes` and `WITHOUT_LLDB=yes`.
>>> >> >
>>> >> > (re-send to list, sorry)
>>> >> > Can we disconnect the compiler optimisation flag for base and
>>> >> > clang?  I don't need the compiler to be built with -O2, but I
>>> >> > want the resulting base system to have optimisations enabled.
>>> >> > Right now, it looks like both get -O2 and a lot of time is
>>> >> > spent on optimising the compiler (for no good reason).
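For anyone trying Ed's suggestion, a minimal sketch of what it could
look like (assuming the devel/llvm13 package is installed; the exact
knobs should be checked against src.conf(5) and build(7) for the
branch being built):

  # /etc/src.conf -- skip building the in-tree compiler, linker and debugger
  WITHOUT_CLANG=yes
  WITHOUT_LLD=yes
  WITHOUT_LLDB=yes

  # then build world with the external toolchain from the llvm13 package
  make buildworld CROSS_TOOLCHAIN=llvm13

This sidesteps the question of optimisation flags for the compiler
build entirely, because the bundled compiler is simply not rebuilt;
the base system itself is still built with its usual optimisation
settings.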