From: Warner Losh <wlosh@bsdimp.com>
Date: Mon, 13 Dec 2021 22:58:35 -0700
Subject: Re: Weak disk I/O performance on daX compared to adaX, was: Re: dd
 performance [was Re: Call for Foundation-supported Project Ideas]
To: Stefan Blachmann
Cc: Alan Somers, Johannes Totz, FreeBSD Hackers
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

Oh. isci. This silicon has known limitations. First, the reported speed is
3Gbps, which is half the speed of most SATA drives. This is a big problem
for SSDs. Second, there's a limit on the aggregate bandwidth of all the
channels that's less than 4*3Gbps. However, this limit isn't documented:
you have to measure it.
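Something like this rough sketch is one way to measure it (untested; the
daX names are the isci-attached disks from your dmesg below, adjust as
needed):

  # Read all four isci-attached disks at once, bypassing any filesystem.
  for d in da0 da1 da2 da3; do
      dd if=/dev/$d of=/dev/null bs=1m count=4096 &
  done
  # Watch per-device and combined throughput: 30 one-second samples.
  iostat -x -w 1 -c 30 da0 da1 da2 da3
  wait

If the per-disk rate drops as you add readers, you've found the shared
limit.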
Third, there may be a bug in isci causing excess CPU usage. I've had
issues in the past with this driver, but I don't use it much any more
since the machines that had it have aged out, and I've never done much
work on the isci driver itself, so I'm not sure how much more help I can
be....

Finally, if you have a free PCIe slot, you might be happier with an old
LSI mps or mrsas card. The card will be about $30-40 and another $15-20
for the cable.

I know this is an unsatisfying answer...

Warner

On Mon, Dec 13, 2021 at 9:42 PM Stefan Blachmann wrote:

> Hi Warner, thank you for the reply!
>
> > Because it isn't an ahci controller, but something else. dmesg will
> > tell you, or camcontrol devlist. It might be crappy hardware, poorly
> > configured, or maybe you've discovered a driver bug that's easy to fix.
>
> There is a RAID BIOS on the HP Z420 that apparently cannot be
> deactivated. According to what the RAID BIOS screen displays, there is
> no RAID configured, and the drives connected to the system seem to be
> "passed through" to the OS.
>
> The camcontrol devlist output (see the two WD 2003FYYS):
>
> at scbus0 target 0 lun 0 (pass0,da0)
> at scbus0 target 1 lun 0 (pass1,da1)
> at scbus0 target 2 lun 0 (pass2,da2)
> at scbus0 target 3 lun 0 (pass3,da3)
> at scbus1 target 0 lun 0 (cd0,pass4)
> at scbus2 target 0 lun 0 (pass5,ada0)
> at scbus3 target 0 lun 0 (pass6,ses0)
> at scbus4 target 0 lun 0 (da4,pass7)
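> (The bus-to-controller mapping can also be listed directly; a sketch,
> assuming camcontrol's -v flag behaves as its man page describes:
>
>   camcontrol devlist -v
>
> which should print each scbusX together with the adapter it sits on,
> such as isci0 or ahcichX.)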
> The dmesg (stuff not obviously disk-related snipped) tells about a SATA
> controller and a SAS controller in SATA mode:
>
> isci0: port 0xd000-0xd0ff mem 0xe2800000-0xe2803fff,0xe2400000-0xe27fffff
> irq 16 at device 0.0 on pci4
>
> ahci0: port 0xe0d0-0xe0d7,0xe0c0-0xe0c3,0xe0b0-0xe0b7,0xe0a0-0xe0a3,
> 0xe020-0xe03f mem 0xef72c000-0xef72c7ff irq 19 at device 31.2 on pci0
> ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
> ahcich0: at channel 0 on ahci0
> ahcich1: at channel 1 on ahci0
> ahciem0: on ahci0
>
> ses0 at ahciem0 bus 0 scbus3 target 0 lun 0
> ses0: SEMB S-E-S 2.00 device
> ses0: SEMB SES Device
> ada0 at ahcich1 bus 0 scbus2 target 0 lun 0
> ada0: ATA8-ACS SATA 2.x device
> ada0: Serial Number WD-WMAY04325670
> ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada0: Command Queueing enabled
> ada0: 1907729MB (3907029168 512 byte sectors)
> ses0: (none) in 'Slot 00', SATA Slot: scbus1 target 0
> ses0: ada0 in 'Slot 01', SATA Slot: scbus2 target 0
> da0 at isci0 bus 0 scbus0 target 0 lun 0
> da0: Fixed Direct Access SPC-3 SCSI device
> da0: Serial Number 14260C786E76
> da0: 300.000MB/s transfers
> da0: Command Queueing enabled
> da0: 244198MB (500118192 512 byte sectors)
> da1 at isci0 bus 0 scbus0 target 1 lun 0
> da1: Fixed Direct Access SPC-3 SCSI device
> da1: Serial Number Z7747X6AS
> da1: 300.000MB/s transfers
> da1: Command Queueing enabled
> da1: 2861588MB (5860533168 512 byte sectors)
> da2 at isci0 bus 0 scbus0 target 2 lun 0
> da2: Fixed Direct Access SPC-3 SCSI device
> da2: Serial Number Z7747SSAS
> da2: 300.000MB/s transfers
> da2: Command Queueing enabled
> da2: 2861588MB (5860533168 512 byte sectors)
> cd0 at ahcich0 bus 0 scbus1 target 0 lun 0
> cd0: Removable CD-ROM SCSI device
> cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)
> cd0: Attempt to query device size failed: NOT READY, Medium not present -
> tray closed
> da3 at isci0 bus 0 scbus0 target 3 lun 0
> da3: Fixed Direct Access SPC-3 SCSI device
> da3: Serial Number WD-WMAY04325722
> da3: 300.000MB/s transfers
> da3: Command Queueing enabled
> da3: 1907729MB (3907029168 512 byte sectors)
>
> GEOM_ELI: Device da0p2.eli created.
> GEOM_ELI: Encryption: AES-XTS 128
> GEOM_ELI: Crypto: accelerated software
>
> The system is running only dd; apart from a few sc consoles idling at a
> command prompt, nothing else.
>
> top -m io shows dd varying between 99.8 and 100%.
> top without options shows a TIME of over 800 minutes for dd and a WCPU
> varying between 3.5 and 4.2%.
> For the drive on ada0, dd's total TIME was ~150 minutes with a WCPU of
> ~1%.
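> For comparison, the raw per-device rates can be watched side by side
> while dd runs (a sketch along the same lines, not yet tried here):
>
>   # isci-attached disk vs. ahci-attached disk, one-second samples
>   iostat -x -w 1 da3 ada0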
> On 12/14/21, Warner Losh wrote:
> > On Mon, Dec 13, 2021 at 8:08 PM Stefan Blachmann wrote:
> >
> >> I am wondering what could be the cause for the weak disk I/O
> >> performance on FreeBSD when using daX drivers instead of adaX drivers.
> >>
> >> Explanation:
> >> The HP Z420 has 6 SATA ports.
> >> SATA drives that get connected to ports #1 to #4 are shown as daX
> >> drives on FreeBSD.
> >> Drives connected to ports 5 and 6 appear as adaX drives.
> >>
> >> > On 12/2/21, Alan Somers wrote:
> >> >> That is your problem then. The default block size for dd is 512B.
> >> >> If it took 3 days to erase a 2 TB HDD, that means you were writing
> >> >> 15,000 IOPs. Frankly, I'm impressed that the SATA bus could handle
> >> >> that
> >>
> >> This shows that on the ada driver, the disk I/O performance is
> >> acceptable.
> >> However, after 14 days dd is still working on the same type of drive
> >> on connector 4 (da3).
> >>
> >> So my questions:
> >> - Why does FreeBSD use the da driver instead of the ada driver for
> >> drives on SATA ports 1-4?
> >
> > Because it isn't an ahci controller, but something else. dmesg will
> > tell you, or camcontrol devlist. It might be crappy hardware, poorly
> > configured, or maybe you've discovered a driver bug that's easy to fix.
> >
> >> - And, why is the da driver so slow? (For example, on an HP Z800 used
> >> with FreeBSD, 15k SAS drives seem as slow as normal consumer drives,
> >> while on Linux disk I/O is just snappy.)
> >
> > It isn't. It's more likely the controller they are attached to that's
> > slow. At work we get line rate out of daX and adaX all the time. They
> > are just protocol translators that hand requests off to the host
> > adapter (what's called the SIM).
> >
> >> - Is there a way to configure FreeBSD to use the ada driver instead of
> >> the da driver, so using FreeBSD is still an alternative to Linux if
> >> disk speed matters?
> >
> > Unlikely.
> >
> >> - Or is it impossible to use the ada drivers on SATA connectors 1-4
> >> for maybe some HP Z420 hardware related reasons?
> >
> > What does camcontrol devlist tell you? Chances are it's the SIM that's
> > to blame for the poor performance (we run all kinds of crazy I/O
> > through ada and da, and if anything da is a smidge faster). The key
> > question is why things are so seemingly slow.
> >
> > Warner
> >
> >> Cheers,
> >> Stefan
> >>
> >> On 12/2/21, Stefan Blachmann wrote:
> >> > Ah, the buffer cache! Didn't think of that.
> >> > Top shows the weighted cpu load is about 4%, so your guess that it
> >> > was the SATA scheduler might be correct.
> >> > Will try this on Linux in the next days using oflag=direct with a
> >> > pair of identical HDDs.
> >> > Already curious about the results.
> >> >
> >> > On 12/2/21, Alan Somers wrote:
> >> >> That is your problem then. The default block size for dd is 512B.
> >> >> If it took 3 days to erase a 2 TB HDD, that means you were writing
> >> >> 15,000 IOPs. Frankly, I'm impressed that the SATA bus could handle
> >> >> that many. By using such a small block size, you were doing an
> >> >> excellent job of exercising the SATA bus and the HDD's host
> >> >> interface, but its servo and write head were mostly just idle.
> >> >>
> >> >> The reason why Linux is different is that, unlike FreeBSD, it has a
> >> >> buffer cache. Even though dd was writing with 512B blocks, those
> >> >> writes probably got combined by the buffer cache before going to
> >> >> SATA. However, if you use the oflag=direct option with dd, then
> >> >> they probably won't be combined. I haven't tested this; it's just
> >> >> a guess. You can probably verify using iostat.
> >> >>
> >> >> When you were trying to erase two HDDs concurrently but only one
> >> >> was getting all of the IOPs and CPU time, was your CPU saturated?
> >> >> I'm guessing not. On my machine, with a similar HDD, dd only
> >> >> consumes 10% of the CPU when I write zeros with a 512B block size.
> >> >> I need to use a 16k block size or larger to get the IOPs under
> >> >> 10,000. So I'm guessing that in your case the CPU scheduler was
> >> >> working just fine, but the SATA bus was saturated, and the SATA
> >> >> scheduler was the source of the unfairness.
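> >> >> Concretely, something like this is what I'd use for a wipe (a
> >> >> sketch; substitute the actual device node):
> >> >>
> >> >>   # FreeBSD: 1 MiB blocks keep the drive streaming rather than
> >> >>   # issuing ~15,000 tiny writes per second
> >> >>   dd if=/dev/zero of=/dev/da3 bs=1m
> >> >>
> >> >>   # Linux (GNU dd): same idea, optionally bypassing the buffer cache
> >> >>   dd if=/dev/zero of=/dev/sdX bs=1M oflag=direct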
> >> >> -Alan
> >> >>
> >> >> On Thu, Dec 2, 2021 at 10:37 AM Stefan Blachmann wrote:
> >> >>>
> >> >>> I intentionally used dd without the bs parameter, as I care less
> >> >>> about "maximum speed" than about clearing the drives completely
> >> >>> while also doing a lot of I/O transactions.
> >> >>> The latter because drives that are becoming unreliable tend to
> >> >>> occasionally throw errors, and the more I/O transactions one does,
> >> >>> the better the chance of spotting this kind of drive.
> >> >>>
> >> >>> The system is an HP Z420; the mainboard/chipset/controller specs
> >> >>> can be found on the web.
> >> >>> The drives in question are (quite old) 2TB WD Black enterprise
> >> >>> grade 3.5" SATA drives. Their SMART data is good, not hinting at
> >> >>> any problems.
> >> >>>
> >> >>> On Linux, erasing them both concurrently finished at almost the
> >> >>> same time.
> >> >>> Thus I do not really understand why on FreeBSD this is so much
> >> >>> different.
> >> >>>
> >> >>> On 12/2/21, Alan Somers wrote:
> >> >>> > This is very surprising to me. I never see dd take significant
> >> >>> > CPU consumption until the speed gets up into the GB/s range.
> >> >>> > What are you using for the bs= option? If you set that too low,
> >> >>> > or use the default, it will needlessly consume extra CPU and
> >> >>> > IOPs. I usually set it to 1m for this kind of usage. And what
> >> >>> > kind of HDDs are these, connected to what kind of controller?
> >> >>> >
> >> >>> > On Thu, Dec 2, 2021 at 9:54 AM Stefan Blachmann
> >> >>> > <sblachmann@gmail.com> wrote:
> >> >>> >>
> >> >>> >> Regarding the suggestions to either improve or replace the ULE
> >> >>> >> scheduler, I would like to share another observation.
> >> >>> >>
> >> >>> >> Usually when I need to zero out HDDs using dd, I use a live
> >> >>> >> Linux. This time I did that on FreeBSD (13).
> >> >>> >> My observations:
> >> >>> >> - On the same hardware, the data transfer rate is a small
> >> >>> >> fraction (about 1/4th) of what is achieved by Linux.
> >> >>> >> - The first dd process, which erases the first HDD, gets almost
> >> >>> >> all CPU and I/O time. The second process, which does the second
> >> >>> >> HDD, is getting starved; it really starts only after the first
> >> >>> >> one has finished.
> >> >>> >>
> >> >>> >> To me it was *very* surprising to find out that, while erasing
> >> >>> >> two similar HDDs concurrently takes about one day on Linux, on
> >> >>> >> FreeBSD the first HDD was finished after three days, and only
> >> >>> >> after that did the remaining second dd process get the same CPU
> >> >>> >> time, making it proceed fast instead of creepingly slowly.
> >> >>> >>
> >> >>> >> So I guess this might be a scheduler issue.
> >> >>> >> I will certainly do some tests using the old scheduler when I
> >> >>> >> get time.
> >> >>> >> And, I ask myself:
> >> >>> >> Could it be a good idea to sponsor porting the DragonFly
> >> >>> >> scheduler to FreeBSD?
> >> >>> >>
> >> >>> >> On 12/2/21, Johannes Totz wrote:
> >> >>> >> > On 29/11/2021 03:17, Ed Maste wrote:
> >> >>> >> >> On Sun, 28 Nov 2021 at 19:37, Steve Kargl wrote:
> >> >>> >> >>>
> >> >>> >> >>> It's certainly not the latest and greatest,
> >> >>> >> >>> CPU: Intel(R) Core(TM)2 Duo CPU T7250 @ 2.00GHz
> >> >>> >> >>> (1995.04-MHz K8-class CPU)
> >> >>> >> >>
> >> >>> >> >> If you're content to use a compiler from a package you can
> >> >>> >> >> save a lot of time by building with `CROSS_TOOLCHAIN=llvm13`
> >> >>> >> >> and `WITHOUT_TOOLCHAIN=yes`. Or, instead of
> >> >>> >> >> WITHOUT_TOOLCHAIN, perhaps `WITHOUT_CLANG=yes`,
> >> >>> >> >> `WITHOUT_LLD=yes` and `WITHOUT_LLDB=yes`.
> >> >>> >> >
> >> >>> >> > (re-send to list, sorry)
> >> >>> >> > Can we disconnect the compiler optimisation flag for base and
> >> >>> >> > clang? I don't need the compiler to be built with -O2, but I
> >> >>> >> > want the resulting base system to have optimisations enabled.
> >> >>> >> > Right now, it looks like both get -O2 and a lot of time is
> >> >>> >> > spent on optimising the compiler (for no good reason).