From owner-freebsd-stable@freebsd.org  Mon Mar 15 21:58:00 2021
Return-Path: <owner-freebsd-stable@freebsd.org>
Delivered-To: freebsd-stable@mailman.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.nyi.freebsd.org (Postfix) with ESMTP id 638615698DC
 for <freebsd-stable@mailman.nyi.freebsd.org>;
 Mon, 15 Mar 2021 21:58:00 +0000 (UTC)
 (envelope-from kob6558@gmail.com)
Received: from mail-oi1-x231.google.com (mail-oi1-x231.google.com
 [IPv6:2607:f8b0:4864:20::231])
 (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256
 client-signature RSA-PSS (2048 bits) client-digest SHA256)
 (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 4DzqzX27P8z4WT8;
 Mon, 15 Mar 2021 21:57:59 +0000 (UTC)
 (envelope-from kob6558@gmail.com)
Received: by mail-oi1-x231.google.com with SMTP id d16so26751271oic.0;
 Mon, 15 Mar 2021 14:57:59 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=mime-version:references:in-reply-to:from:date:message-id:subject:to
 :cc; bh=FxN5eXlmu0JdVUsn0x5OJ0pOkVxu7o9h28jAEPTMdsA=;
 b=R5VbbBvqebz0JaQqrxWLibMpVAnXC2Ngt8plwQLSqU/pfUQzcPJ5Eh+GcNHxrCgzLx
 2FEZh1XnAyPwXluYp4whGG2fex4OzTfN3nywFqGtJZXXDyzGQRGiSazs3B5zTGyxmDOR
 XfvsdCbcMezgURmPNGlj1r5ZIip6/CdDxOPMaotHpbWJUj/LEt4WHT9DTlge67EqyvCS
 lrjqYtUelLgZjqe4+be8kgPhxNpKqkChBlAq67aV5j6j4d0RgBtgRMRoZ8VJsbEyHAnj
 kXEdqGa/u9H7psGw73jU4TLsS4dhhiKBTQDBb3aZdV7bJVXnX7OeoT/4pfMCoYcY5DiY
 52Ew==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=FxN5eXlmu0JdVUsn0x5OJ0pOkVxu7o9h28jAEPTMdsA=;
 b=GjrIXzwpsGZozIiO6+5heqmTvPAw5/0UXmyHj1578J0Pw09Kckkxd/GSWEzJcta8vB
 zcD+vzcomlCeddaJw5IxIqfVuqVhghayEXFIuPqcH/RMzInedHUyIAQ/eFk1vXTds3ye
 w2UCurhJ4njmmTuw1qXgQ87QlpH+IsupsnpZmOYlOAQG4E/KL2nEDBdXduYjLnb4T+00
 FmxwTEOiF6r83KsdlI2B8OZKWFMlXHFAWIi7Rz+mX4u1xQcj7aHDCd1Xn9/QjtQcyt18
 IHfMO0Ghfb9myEKVe4AflnlULGqdY9axFvf/QgC7SEgTqFZOygNVnudFDwv6aa33JHtr
 xF3A==
X-Gm-Message-State: AOAM531jH3ZqMgxLC5X0jBjZuuacNXoHHyvpkDUMQTxuTmqUEM7N3BVR
 ed4yy+iKRpe4tn2kdTlvOlH5kaFWB354jF7KvYY=
X-Google-Smtp-Source: ABdhPJwfVFceZ7LLOwbeGm+KvFez4quCYbizBPZ8K8k4QYlWvGr49ytM/6IXJBSu6878clKA+t1RjkhOJTPvGXBQkgQ=
X-Received: by 2002:aca:ac04:: with SMTP id v4mr921987oie.57.1615845479014;
 Mon, 15 Mar 2021 14:57:59 -0700 (PDT)
MIME-Version: 1.0
References: <12705C29-53EA-4484-8291-C409AF4B3DE5.ref@yahoo.com>
 <12705C29-53EA-4484-8291-C409AF4B3DE5@yahoo.com>
 <CAN6yY1tT+joi=eyqmSPYS3apSy3-6WVM13z+ifEzCzqqHY6oLA@mail.gmail.com>
 <CANCZdfpkXNUcDyLHXufM3qAVbaBV7RW8Oh6bHCQzv3+rafHssg@mail.gmail.com>
 <CAN6yY1sd-7CzGczu_HK2Q8WbUoDOfGSS2+b-Sr08EBMuke=3Bw@mail.gmail.com>
 <CANCZdfqYb8VdNbk1tdXQPpJGEaMCmGsuFcj7BR5Q3FiOd7Osag@mail.gmail.com>
 <CAN6yY1uc4CEv66h=rqaOne0saPEBPFaOkWWq9s5qMNu4SMzA8A@mail.gmail.com>
 <CAN6yY1uu=sAKFSYHW1gcKFxE3yD7-1JgL0N5j1iQsAViQ80g1w@mail.gmail.com>
 <F94A457A-D179-4BA7-A8A6-82609777EBFA@yahoo.com>
In-Reply-To: <F94A457A-D179-4BA7-A8A6-82609777EBFA@yahoo.com>
From: Kevin Oberman <rkoberman@gmail.com>
Date: Mon, 15 Mar 2021 14:57:42 -0700
Message-ID: <CAN6yY1tJXGCUK+COoDO_9vGVcseogD+CrsdpMH9r=hvDVD3uiQ@mail.gmail.com>
Subject: Re: Filesystem operations slower in 13.0 than 12.2
To: Mark Millard <marklmi@yahoo.com>
Cc: Warner Losh <imp@bsdimp.com>, Konstantin Belousov <kib@freebsd.org>, 
 FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org>
X-Rspamd-Queue-Id: 4DzqzX27P8z4WT8
X-Spamd-Bar: ----
Authentication-Results: mx1.freebsd.org;
	none
X-Spamd-Result: default: False [-4.00 / 15.00];
	 REPLY(-4.00)[]
Content-Type: text/plain; charset="UTF-8"
X-Content-Filtered-By: Mailman/MimeDel 2.1.34
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-stable>, 
 <mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable/>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
 <mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Mar 2021 21:58:00 -0000

Responses in-line.

On Sun, Mar 14, 2021 at 3:09 PM Mark Millard <marklmi@yahoo.com> wrote:

>
>
> On 2021-Mar-14, at 11:09, Kevin Oberman <rkoberman at gmail.com> wrote:
>
> > . . .
> >
> > Seems to only occur on large r/w operations from/to the same disk. "sp
> > big-file /other/file/on/same/disk" or tar/untar operations on large
> files.
> > Hit this today updating firefox.
> >
> > I/O starts at >40MB/s. Dropped to about 1.5MB/s. If I tried doing other
> > things while it was running slowly, the disk would appear to lock up.
> E.g.
> > pwd(1) seemed to completely lock up the system, but I could still ping it
> > and, after about 30 seconds, things came back to life. It was also not
> > instantaneous. Disc activity dropped to <1MB/s for a few seconds before
> > everything froze.
> >
> > During the untar of firefox, I saw; this several times. I also looked at
> my
> > console where I found these errors during :
> > swap_pager: indefinite wait buffer: bufobj: 0, blkno: 55043, size: 8192
> > swap_pager: indefinite wait buffer: bufobj: 0, blkno: 51572, size: 4096
>
> Does anyone know:
> Are those messages normal "reading is taking a rather long
> time" notices or is their presence more useful information
> in some way about the type of problem or context for the
> problem?
>
> As for the tests:
> Are these messages always present when near a time frame
> when the problem occurs? Never present in a near time
> frame to a period when the problem does not occur?
>
In a large number of test, these errors have not repeated. They baffle me
for another reason. This system has 20G or RAM. Tyically, all swap space is
unused. ATM I see 16384M free out of 16384. Not sure that I have ever seen
it used, though it might have been while building rust. I have not built
rust for a month.

>
> It appears that the messages are associated with reading
> the disk(s), not directly with writing them, where the
> reads take more than "hz * 20" time units to complete.
> (I'm looking at main (14) code.) What might contribute
> to the time taken for the pending read(s)?
>
The reference to hz * 20 woke up a few sleeping memory cells. I forgot that
I cleaned up my loader.conf. It was largely a copy of the one on my
decade-old T520. I commented out "kern.hz=100". I don't recall the details,
but I think it was actually from an even older system, my T42 from before I
retired.

In any case, restoring this setting has greatly improved the situation. I
now have really bad disk I/O performance on large disk to disk activity
(untarring the firefox distro) instead of terrible performance and the
system freezes have vanished, though I do see pauses in response to clicks
or text entry, but the display remains active and the pauses are short... 1
to 15 seconds, I'd guess. No, I have no idea what this indicates.

I'm still not seeing the performance I was seeing back in February when 40
MB/s for extended intervals was common and I once untarred firefox.tar.gz2
in under a minute and performance seldom dropped below 1.4 MB/s.


>
> /*
>  * swap_pager_getpages() - bring pages in from swap
>  *
>  *      Attempt to page in the pages in array "ma" of length "count".  The
>  *      caller may optionally specify that additional pages preceding and
>  *      succeeding the specified range be paged in.  The number of such
> pages
>  *      is returned in the "rbehind" and "rahead" parameters, and they will
>  *      be in the inactive queue upon return.
>  *
>  *      The pages in "ma" must be busied and will remain busied upon
> return.
>  */
> static int
> swap_pager_getpages_locked(vm_object_t object, vm_page_t *ma, int count,
>     int *rbehind, int *rahead)
> {
> . . .
>         /*
>          * Wait for the pages we want to complete.  VPO_SWAPINPROG is
> always
>          * cleared on completion.  If an I/O error occurs, SWAPBLK_NONE
>          * is set in the metadata for each page in the request.
>          */
>         VM_OBJECT_WLOCK(object);
>         /* This could be implemented more efficiently with aflags */
>         while ((ma[0]->oflags & VPO_SWAPINPROG) != 0) {
>                 ma[0]->oflags |= VPO_SWAPSLEEP;
>                 VM_CNT_INC(v_intrans);
>                 if (VM_OBJECT_SLEEP(object, &object->handle, PSWP,
>                     "swread", hz * 20)) {
>                         printf(
> "swap_pager: indefinite wait buffer: bufobj: %p, blkno: %jd, size: %ld\n",
>                             bp->b_bufobj, (intmax_t)bp->b_blkno,
> bp->b_bcount);
>                 }
>         }
>         VM_OBJECT_WUNLOCK(object);
> . . .
>
> where:
>
> #define VM_OBJECT_SLEEP(object, wchan, pri, wmesg, timo)                \
>         rw_sleep((wchan), &(object)->lock, (pri), (wmesg), (timo))
>
> and:
>
> #define rw_sleep(chan, rw, pri, wmesg, timo)                            \
>         _sleep((chan), &(rw)->lock_object, (pri), (wmesg),              \
>             tick_sbt * (timo), 0, C_HARDCLOCK)
>
> (I do not claim to be able to interpret the implications
> of the code that leads to the messages. But seeing some
> of the code might prompt a thought by someone that knows
> the code's context and operation.)
>
> > . . .
> > Backing off to Mar. 4 was not an improvement. My untar did seem better
> for a couple of minutes, but then the display froze again for 30 seconds
> and disk performance dropped to <1M.
>
> You were able to see the disk performance drop while
> the display was frozen?
>

No, but it refreshed the display when it un-froze and the refreshed display
showed a one-minute history that showed that data was still transferring
during the pause.

>
> It might not be the best for monitoring but I'll ask
> this in terms of top output: Does Inact, Laundry,
> Wired, Free, or other such show anything fairly unique
> for around the problematical time frame(s)?
>

These all look pretty normal. Free memory stays at 13G throughout hte
operatioin.

>
> > then things got really bad and behaved in a manner that was baffling to
> me. The screen froze again, but stayed frozen after half a minute. I
> clicked on a couple of buttons in Firefox to no effect and then hit ctrl-q
> to quit. After the long pause, I pressed the power button to try to force a
> shutdown. Suddenly, it started unwinding everything I had done during the
> freeze. My browser did the updates from my mouse clicks including quitting.
> It then switched to a different workspace from ctrl-alt-right and did a
> clean shutdown. ????
> >
> > Do I also have a graphics issue? Examining log files show no indication
> that anything was happening. SMART shows no errors and reasonable values
> for everything. No indication of a HW problem. The system performs well
> unless I do something that tries a bulk disk data move. Building world
> takes about 75 minutes. I just have a very hard time building big ports.
>
> Almost like things were stuck-sleeping and then the
> sleep(s) finished?
>
Exactly!


>
> ===
> Mark Millard
> marklmi at yahoo.com
> ( dsl-only.net went
> away in early 2018-Mar)
>
--
Kevin Oberman, Part time kid herder and retired Network Engineer
E-mail: rkoberman@gmail.com
PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683