Date: Mon, 15 Mar 2021 14:57:42 -0700 From: Kevin Oberman <rkoberman@gmail.com> To: Mark Millard <marklmi@yahoo.com> Cc: Warner Losh <imp@bsdimp.com>, Konstantin Belousov <kib@freebsd.org>, FreeBSD-STABLE Mailing List <freebsd-stable@freebsd.org> Subject: Re: Filesystem operations slower in 13.0 than 12.2 Message-ID: <CAN6yY1tJXGCUK%2BCOoDO_9vGVcseogD%2BCrsdpMH9r=hvDVD3uiQ@mail.gmail.com> In-Reply-To: <F94A457A-D179-4BA7-A8A6-82609777EBFA@yahoo.com> References: <12705C29-53EA-4484-8291-C409AF4B3DE5.ref@yahoo.com> <12705C29-53EA-4484-8291-C409AF4B3DE5@yahoo.com> <CAN6yY1tT%2Bjoi=eyqmSPYS3apSy3-6WVM13z%2BifEzCzqqHY6oLA@mail.gmail.com> <CANCZdfpkXNUcDyLHXufM3qAVbaBV7RW8Oh6bHCQzv3%2BrafHssg@mail.gmail.com> <CAN6yY1sd-7CzGczu_HK2Q8WbUoDOfGSS2%2Bb-Sr08EBMuke=3Bw@mail.gmail.com> <CANCZdfqYb8VdNbk1tdXQPpJGEaMCmGsuFcj7BR5Q3FiOd7Osag@mail.gmail.com> <CAN6yY1uc4CEv66h=rqaOne0saPEBPFaOkWWq9s5qMNu4SMzA8A@mail.gmail.com> <CAN6yY1uu=sAKFSYHW1gcKFxE3yD7-1JgL0N5j1iQsAViQ80g1w@mail.gmail.com> <F94A457A-D179-4BA7-A8A6-82609777EBFA@yahoo.com>
next in thread | previous in thread | raw e-mail | index | archive | help
Responses in-line. On Sun, Mar 14, 2021 at 3:09 PM Mark Millard <marklmi@yahoo.com> wrote: > > > On 2021-Mar-14, at 11:09, Kevin Oberman <rkoberman at gmail.com> wrote: > > > . . . > > > > Seems to only occur on large r/w operations from/to the same disk. "sp > > big-file /other/file/on/same/disk" or tar/untar operations on large > files. > > Hit this today updating firefox. > > > > I/O starts at >40MB/s. Dropped to about 1.5MB/s. If I tried doing other > > things while it was running slowly, the disk would appear to lock up. > E.g. > > pwd(1) seemed to completely lock up the system, but I could still ping it > > and, after about 30 seconds, things came back to life. It was also not > > instantaneous. Disc activity dropped to <1MB/s for a few seconds before > > everything froze. > > > > During the untar of firefox, I saw; this several times. I also looked at > my > > console where I found these errors during : > > swap_pager: indefinite wait buffer: bufobj: 0, blkno: 55043, size: 8192 > > swap_pager: indefinite wait buffer: bufobj: 0, blkno: 51572, size: 4096 > > Does anyone know: > Are those messages normal "reading is taking a rather long > time" notices or is their presence more useful information > in some way about the type of problem or context for the > problem? > > As for the tests: > Are these messages always present when near a time frame > when the problem occurs? Never present in a near time > frame to a period when the problem does not occur? > In a large number of test, these errors have not repeated. They baffle me for another reason. This system has 20G or RAM. Tyically, all swap space is unused. ATM I see 16384M free out of 16384. Not sure that I have ever seen it used, though it might have been while building rust. I have not built rust for a month. > > It appears that the messages are associated with reading > the disk(s), not directly with writing them, where the > reads take more than "hz * 20" time units to complete. > (I'm looking at main (14) code.) What might contribute > to the time taken for the pending read(s)? > The reference to hz * 20 woke up a few sleeping memory cells. I forgot that I cleaned up my loader.conf. It was largely a copy of the one on my decade-old T520. I commented out "kern.hz=100". I don't recall the details, but I think it was actually from an even older system, my T42 from before I retired. In any case, restoring this setting has greatly improved the situation. I now have really bad disk I/O performance on large disk to disk activity (untarring the firefox distro) instead of terrible performance and the system freezes have vanished, though I do see pauses in response to clicks or text entry, but the display remains active and the pauses are short... 1 to 15 seconds, I'd guess. No, I have no idea what this indicates. I'm still not seeing the performance I was seeing back in February when 40 MB/s for extended intervals was common and I once untarred firefox.tar.gz2 in under a minute and performance seldom dropped below 1.4 MB/s. > > /* > * swap_pager_getpages() - bring pages in from swap > * > * Attempt to page in the pages in array "ma" of length "count". The > * caller may optionally specify that additional pages preceding and > * succeeding the specified range be paged in. The number of such > pages > * is returned in the "rbehind" and "rahead" parameters, and they will > * be in the inactive queue upon return. > * > * The pages in "ma" must be busied and will remain busied upon > return. > */ > static int > swap_pager_getpages_locked(vm_object_t object, vm_page_t *ma, int count, > int *rbehind, int *rahead) > { > . . . > /* > * Wait for the pages we want to complete. VPO_SWAPINPROG is > always > * cleared on completion. If an I/O error occurs, SWAPBLK_NONE > * is set in the metadata for each page in the request. > */ > VM_OBJECT_WLOCK(object); > /* This could be implemented more efficiently with aflags */ > while ((ma[0]->oflags & VPO_SWAPINPROG) != 0) { > ma[0]->oflags |= VPO_SWAPSLEEP; > VM_CNT_INC(v_intrans); > if (VM_OBJECT_SLEEP(object, &object->handle, PSWP, > "swread", hz * 20)) { > printf( > "swap_pager: indefinite wait buffer: bufobj: %p, blkno: %jd, size: %ld\n", > bp->b_bufobj, (intmax_t)bp->b_blkno, > bp->b_bcount); > } > } > VM_OBJECT_WUNLOCK(object); > . . . > > where: > > #define VM_OBJECT_SLEEP(object, wchan, pri, wmesg, timo) \ > rw_sleep((wchan), &(object)->lock, (pri), (wmesg), (timo)) > > and: > > #define rw_sleep(chan, rw, pri, wmesg, timo) \ > _sleep((chan), &(rw)->lock_object, (pri), (wmesg), \ > tick_sbt * (timo), 0, C_HARDCLOCK) > > (I do not claim to be able to interpret the implications > of the code that leads to the messages. But seeing some > of the code might prompt a thought by someone that knows > the code's context and operation.) > > > . . . > > Backing off to Mar. 4 was not an improvement. My untar did seem better > for a couple of minutes, but then the display froze again for 30 seconds > and disk performance dropped to <1M. > > You were able to see the disk performance drop while > the display was frozen? > No, but it refreshed the display when it un-froze and the refreshed display showed a one-minute history that showed that data was still transferring during the pause. > > It might not be the best for monitoring but I'll ask > this in terms of top output: Does Inact, Laundry, > Wired, Free, or other such show anything fairly unique > for around the problematical time frame(s)? > These all look pretty normal. Free memory stays at 13G throughout hte operatioin. > > > then things got really bad and behaved in a manner that was baffling to > me. The screen froze again, but stayed frozen after half a minute. I > clicked on a couple of buttons in Firefox to no effect and then hit ctrl-q > to quit. After the long pause, I pressed the power button to try to force a > shutdown. Suddenly, it started unwinding everything I had done during the > freeze. My browser did the updates from my mouse clicks including quitting. > It then switched to a different workspace from ctrl-alt-right and did a > clean shutdown. ???? > > > > Do I also have a graphics issue? Examining log files show no indication > that anything was happening. SMART shows no errors and reasonable values > for everything. No indication of a HW problem. The system performs well > unless I do something that tries a bulk disk data move. Building world > takes about 75 minutes. I just have a very hard time building big ports. > > Almost like things were stuck-sleeping and then the > sleep(s) finished? > Exactly! > > === > Mark Millard > marklmi at yahoo.com > ( dsl-only.net went > away in early 2018-Mar) > -- Kevin Oberman, Part time kid herder and retired Network Engineer E-mail: rkoberman@gmail.com PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAN6yY1tJXGCUK%2BCOoDO_9vGVcseogD%2BCrsdpMH9r=hvDVD3uiQ>