From owner-freebsd-stable@freebsd.org Mon Mar 15 21:58:00 2021 Return-Path: Delivered-To: freebsd-stable@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id 638615698DC for ; Mon, 15 Mar 2021 21:58:00 +0000 (UTC) (envelope-from kob6558@gmail.com) Received: from mail-oi1-x231.google.com (mail-oi1-x231.google.com [IPv6:2607:f8b0:4864:20::231]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4DzqzX27P8z4WT8; Mon, 15 Mar 2021 21:57:59 +0000 (UTC) (envelope-from kob6558@gmail.com) Received: by mail-oi1-x231.google.com with SMTP id d16so26751271oic.0; Mon, 15 Mar 2021 14:57:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=FxN5eXlmu0JdVUsn0x5OJ0pOkVxu7o9h28jAEPTMdsA=; b=R5VbbBvqebz0JaQqrxWLibMpVAnXC2Ngt8plwQLSqU/pfUQzcPJ5Eh+GcNHxrCgzLx 2FEZh1XnAyPwXluYp4whGG2fex4OzTfN3nywFqGtJZXXDyzGQRGiSazs3B5zTGyxmDOR XfvsdCbcMezgURmPNGlj1r5ZIip6/CdDxOPMaotHpbWJUj/LEt4WHT9DTlge67EqyvCS lrjqYtUelLgZjqe4+be8kgPhxNpKqkChBlAq67aV5j6j4d0RgBtgRMRoZ8VJsbEyHAnj kXEdqGa/u9H7psGw73jU4TLsS4dhhiKBTQDBb3aZdV7bJVXnX7OeoT/4pfMCoYcY5DiY 52Ew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=FxN5eXlmu0JdVUsn0x5OJ0pOkVxu7o9h28jAEPTMdsA=; b=GjrIXzwpsGZozIiO6+5heqmTvPAw5/0UXmyHj1578J0Pw09Kckkxd/GSWEzJcta8vB zcD+vzcomlCeddaJw5IxIqfVuqVhghayEXFIuPqcH/RMzInedHUyIAQ/eFk1vXTds3ye w2UCurhJ4njmmTuw1qXgQ87QlpH+IsupsnpZmOYlOAQG4E/KL2nEDBdXduYjLnb4T+00 FmxwTEOiF6r83KsdlI2B8OZKWFMlXHFAWIi7Rz+mX4u1xQcj7aHDCd1Xn9/QjtQcyt18 IHfMO0Ghfb9myEKVe4AflnlULGqdY9axFvf/QgC7SEgTqFZOygNVnudFDwv6aa33JHtr xF3A== X-Gm-Message-State: AOAM531jH3ZqMgxLC5X0jBjZuuacNXoHHyvpkDUMQTxuTmqUEM7N3BVR ed4yy+iKRpe4tn2kdTlvOlH5kaFWB354jF7KvYY= X-Google-Smtp-Source: ABdhPJwfVFceZ7LLOwbeGm+KvFez4quCYbizBPZ8K8k4QYlWvGr49ytM/6IXJBSu6878clKA+t1RjkhOJTPvGXBQkgQ= X-Received: by 2002:aca:ac04:: with SMTP id v4mr921987oie.57.1615845479014; Mon, 15 Mar 2021 14:57:59 -0700 (PDT) MIME-Version: 1.0 References: <12705C29-53EA-4484-8291-C409AF4B3DE5.ref@yahoo.com> <12705C29-53EA-4484-8291-C409AF4B3DE5@yahoo.com> In-Reply-To: From: Kevin Oberman Date: Mon, 15 Mar 2021 14:57:42 -0700 Message-ID: Subject: Re: Filesystem operations slower in 13.0 than 12.2 To: Mark Millard Cc: Warner Losh , Konstantin Belousov , FreeBSD-STABLE Mailing List X-Rspamd-Queue-Id: 4DzqzX27P8z4WT8 X-Spamd-Bar: ---- Authentication-Results: mx1.freebsd.org; none X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[] Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.34 X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Mar 2021 21:58:00 -0000 Responses in-line. On Sun, Mar 14, 2021 at 3:09 PM Mark Millard wrote: > > > On 2021-Mar-14, at 11:09, Kevin Oberman wrote: > > > . . . > > > > Seems to only occur on large r/w operations from/to the same disk. "sp > > big-file /other/file/on/same/disk" or tar/untar operations on large > files. > > Hit this today updating firefox. > > > > I/O starts at >40MB/s. Dropped to about 1.5MB/s. If I tried doing other > > things while it was running slowly, the disk would appear to lock up. > E.g. > > pwd(1) seemed to completely lock up the system, but I could still ping it > > and, after about 30 seconds, things came back to life. It was also not > > instantaneous. Disc activity dropped to <1MB/s for a few seconds before > > everything froze. > > > > During the untar of firefox, I saw; this several times. I also looked at > my > > console where I found these errors during : > > swap_pager: indefinite wait buffer: bufobj: 0, blkno: 55043, size: 8192 > > swap_pager: indefinite wait buffer: bufobj: 0, blkno: 51572, size: 4096 > > Does anyone know: > Are those messages normal "reading is taking a rather long > time" notices or is their presence more useful information > in some way about the type of problem or context for the > problem? > > As for the tests: > Are these messages always present when near a time frame > when the problem occurs? Never present in a near time > frame to a period when the problem does not occur? > In a large number of test, these errors have not repeated. They baffle me for another reason. This system has 20G or RAM. Tyically, all swap space is unused. ATM I see 16384M free out of 16384. Not sure that I have ever seen it used, though it might have been while building rust. I have not built rust for a month. > > It appears that the messages are associated with reading > the disk(s), not directly with writing them, where the > reads take more than "hz * 20" time units to complete. > (I'm looking at main (14) code.) What might contribute > to the time taken for the pending read(s)? > The reference to hz * 20 woke up a few sleeping memory cells. I forgot that I cleaned up my loader.conf. It was largely a copy of the one on my decade-old T520. I commented out "kern.hz=100". I don't recall the details, but I think it was actually from an even older system, my T42 from before I retired. In any case, restoring this setting has greatly improved the situation. I now have really bad disk I/O performance on large disk to disk activity (untarring the firefox distro) instead of terrible performance and the system freezes have vanished, though I do see pauses in response to clicks or text entry, but the display remains active and the pauses are short... 1 to 15 seconds, I'd guess. No, I have no idea what this indicates. I'm still not seeing the performance I was seeing back in February when 40 MB/s for extended intervals was common and I once untarred firefox.tar.gz2 in under a minute and performance seldom dropped below 1.4 MB/s. > > /* > * swap_pager_getpages() - bring pages in from swap > * > * Attempt to page in the pages in array "ma" of length "count". The > * caller may optionally specify that additional pages preceding and > * succeeding the specified range be paged in. The number of such > pages > * is returned in the "rbehind" and "rahead" parameters, and they will > * be in the inactive queue upon return. > * > * The pages in "ma" must be busied and will remain busied upon > return. > */ > static int > swap_pager_getpages_locked(vm_object_t object, vm_page_t *ma, int count, > int *rbehind, int *rahead) > { > . . . > /* > * Wait for the pages we want to complete. VPO_SWAPINPROG is > always > * cleared on completion. If an I/O error occurs, SWAPBLK_NONE > * is set in the metadata for each page in the request. > */ > VM_OBJECT_WLOCK(object); > /* This could be implemented more efficiently with aflags */ > while ((ma[0]->oflags & VPO_SWAPINPROG) != 0) { > ma[0]->oflags |= VPO_SWAPSLEEP; > VM_CNT_INC(v_intrans); > if (VM_OBJECT_SLEEP(object, &object->handle, PSWP, > "swread", hz * 20)) { > printf( > "swap_pager: indefinite wait buffer: bufobj: %p, blkno: %jd, size: %ld\n", > bp->b_bufobj, (intmax_t)bp->b_blkno, > bp->b_bcount); > } > } > VM_OBJECT_WUNLOCK(object); > . . . > > where: > > #define VM_OBJECT_SLEEP(object, wchan, pri, wmesg, timo) \ > rw_sleep((wchan), &(object)->lock, (pri), (wmesg), (timo)) > > and: > > #define rw_sleep(chan, rw, pri, wmesg, timo) \ > _sleep((chan), &(rw)->lock_object, (pri), (wmesg), \ > tick_sbt * (timo), 0, C_HARDCLOCK) > > (I do not claim to be able to interpret the implications > of the code that leads to the messages. But seeing some > of the code might prompt a thought by someone that knows > the code's context and operation.) > > > . . . > > Backing off to Mar. 4 was not an improvement. My untar did seem better > for a couple of minutes, but then the display froze again for 30 seconds > and disk performance dropped to <1M. > > You were able to see the disk performance drop while > the display was frozen? > No, but it refreshed the display when it un-froze and the refreshed display showed a one-minute history that showed that data was still transferring during the pause. > > It might not be the best for monitoring but I'll ask > this in terms of top output: Does Inact, Laundry, > Wired, Free, or other such show anything fairly unique > for around the problematical time frame(s)? > These all look pretty normal. Free memory stays at 13G throughout hte operatioin. > > > then things got really bad and behaved in a manner that was baffling to > me. The screen froze again, but stayed frozen after half a minute. I > clicked on a couple of buttons in Firefox to no effect and then hit ctrl-q > to quit. After the long pause, I pressed the power button to try to force a > shutdown. Suddenly, it started unwinding everything I had done during the > freeze. My browser did the updates from my mouse clicks including quitting. > It then switched to a different workspace from ctrl-alt-right and did a > clean shutdown. ???? > > > > Do I also have a graphics issue? Examining log files show no indication > that anything was happening. SMART shows no errors and reasonable values > for everything. No indication of a HW problem. The system performs well > unless I do something that tries a bulk disk data move. Building world > takes about 75 minutes. I just have a very hard time building big ports. > > Almost like things were stuck-sleeping and then the > sleep(s) finished? > Exactly! > > === > Mark Millard > marklmi at yahoo.com > ( dsl-only.net went > away in early 2018-Mar) > -- Kevin Oberman, Part time kid herder and retired Network Engineer E-mail: rkoberman@gmail.com PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683