From owner-freebsd-stable@freebsd.org Mon Mar 15 22:54:59 2021
Subject: Re: Filesystem operations slower in 13.0 than 12.2
From: Mark Millard <marklmi@yahoo.com>
Date: Mon, 15 Mar 2021 15:54:51 -0700
To: Kevin Oberman
Cc: Warner Losh, Konstantin Belousov, FreeBSD-STABLE Mailing List

On 2021-Mar-15, at 14:57, Kevin Oberman wrote:

> Responses in-line.
>
> On Sun, Mar 14, 2021 at 3:09 PM Mark Millard wrote:
>
>> On 2021-Mar-14, at 11:09, Kevin Oberman wrote:
>>
>> > . . .
>> >
>> > Seems to only occur on large r/w operations from/to the same disk:
>> > "cp big-file /other/file/on/same/disk" or tar/untar operations on
>> > large files. Hit this today updating firefox.
>> >
>> > I/O starts at >40MB/s. Dropped to about 1.5MB/s. If I tried doing
>> > other things while it was running slowly, the disk would appear to
>> > lock up. E.g. pwd(1) seemed to completely lock up the system, but I
>> > could still ping it and, after about 30 seconds, things came back to
>> > life. It was also not instantaneous. Disc activity dropped to <1MB/s
>> > for a few seconds before everything froze.
>> >
>> > During the untar of firefox, I saw this several times. I also looked
>> > at my console, where I found these errors:
>> > swap_pager: indefinite wait buffer: bufobj: 0, blkno: 55043, size: 8192
>> > swap_pager: indefinite wait buffer: bufobj: 0, blkno: 51572, size: 4096
>>
>> Does anyone know:
>> Are those messages normal "reading is taking a rather long
>> time" notices, or is their presence more useful information
>> in some way about the type of problem or context for the
>> problem?
>>
> As for the tests:
> Are these messages always present when near a time frame
> when the problem occurs? Never present in a near time
> frame to a period when the problem does not occur?
>
> In a large number of tests, these errors have not repeated. They baffle
> me for another reason. This system has 20G of RAM. Typically, all swap
> space is unused. ATM I see 16384M free out of 16384. Not sure that I
> have ever seen it used, though it might have been while building rust.
> I have not built rust for a month.
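
[An aside that may help if the messages show up again: the swap figures
and a related VM counter are easy to watch from a shell. A minimal
check, assuming a stock FreeBSD userland:

    # Per-device swap usage, human-readable sizes:
    swapinfo -h

    # Page faults that had to sleep on a page already being paged in;
    # the VM_CNT_INC(v_intrans) in the getpages code quoted further
    # below bumps this counter (the vm_fault path bumps it as well):
    sysctl vm.stats.vm.v_intrans

If vm.stats.vm.v_intrans climbs during one of the slow periods, such
waits are happening even when no 20-second message gets printed.]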

> It appears that the messages are associated with reading
> the disk(s), not directly with writing them, where the
> reads take more than "hz * 20" time units to complete.
> (I'm looking at main (14) code.) What might contribute
> to the time taken for the pending read(s)?
>
> The reference to hz * 20 woke up a few sleeping memory cells. I forgot
> that I cleaned up my loader.conf. It was largely a copy of the one on
> my decade-old T520. I commented out "kern.hz=100". I don't recall the
> details, but I think it was actually from an even older system, my T42
> from before I retired.
>
> In any case, restoring this setting has greatly improved the situation.
> I now have really bad disk I/O performance on large disk-to-disk
> activity (untarring the firefox distro) instead of terrible
> performance, and the system freezes have vanished, though I do see
> pauses in response to clicks or text entry; the display remains active
> and the pauses are short... 1 to 15 seconds, I'd guess. No, I have no
> idea what this indicates.

Interesting.

> I'm still not seeing the performance I was seeing back in February,
> when 40 MB/s for extended intervals was common, I once untarred
> firefox.tar.gz2 in under a minute, and performance seldom dropped
> below 1.4 MB/s.
>
>>> /*
>>>  * swap_pager_getpages() - bring pages in from swap
>>>  *
>>>  *      Attempt to page in the pages in array "ma" of length "count". The
>>>  *      caller may optionally specify that additional pages preceding and
>>>  *      succeeding the specified range be paged in. The number of such
>>>  *      pages is returned in the "rbehind" and "rahead" parameters, and
>>>  *      they will be in the inactive queue upon return.
>>>  *
>>>  *      The pages in "ma" must be busied and will remain busied upon return.
>>>  */
>>> static int
>>> swap_pager_getpages_locked(vm_object_t object, vm_page_t *ma, int count,
>>>     int *rbehind, int *rahead)
>>> {
>>> . . .
>>>         /*
>>>          * Wait for the pages we want to complete. VPO_SWAPINPROG is
>>>          * always cleared on completion. If an I/O error occurs,
>>>          * SWAPBLK_NONE is set in the metadata for each page in the
>>>          * request.
>>>          */
>>>         VM_OBJECT_WLOCK(object);
>>>         /* This could be implemented more efficiently with aflags */
>>>         while ((ma[0]->oflags & VPO_SWAPINPROG) != 0) {
>>>                 ma[0]->oflags |= VPO_SWAPSLEEP;
>>>                 VM_CNT_INC(v_intrans);
>>>                 if (VM_OBJECT_SLEEP(object, &object->handle, PSWP,
>>>                     "swread", hz * 20)) {
>>>                         printf(
>>> "swap_pager: indefinite wait buffer: bufobj: %p, blkno: %jd, size: %ld\n",
>>>                             bp->b_bufobj, (intmax_t)bp->b_blkno,
>>>                             bp->b_bcount);
>>>                 }
>>>         }
>>>         VM_OBJECT_WUNLOCK(object);
>>> . . .
>>>
>>> where:
>>>
>>> #define VM_OBJECT_SLEEP(object, wchan, pri, wmesg, timo)                \
>>>         rw_sleep((wchan), &(object)->lock, (pri), (wmesg), (timo))
>>>
>>> and:
>>>
>>> #define rw_sleep(chan, rw, pri, wmesg, timo)                            \
>>>         _sleep((chan), &(rw)->lock_object, (pri), (wmesg),              \
>>>             tick_sbt * (timo), 0, C_HARDCLOCK)
>>>
>>> (I do not claim to be able to interpret the implications
>>> of the code that leads to the messages. But seeing some
>>> of the code might prompt a thought by someone that knows
>>> the code's context and operation.)
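
[A present-day note on those macros, in case it bears on the kern.hz
angle: if I am reading the kernel sources correctly, tick_sbt is
initialized as SBT_1S / hz (one tick expressed as an sbintime_t), so
the sleep length that rw_sleep ends up requesting works out to be
independent of hz:

    timo            = hz * 20                    (in ticks)
    tick_sbt        = SBT_1S / hz                (sbintime_t per tick)
    tick_sbt * timo = (SBT_1S / hz) * (hz * 20)
                    = 20 seconds, approximately
                      (ignoring integer-division truncation)

If that reading is right, kern.hz=100 vs. the default 1000 should not
change how long the wait is before the "indefinite wait buffer" message
prints, only the granularity of the tick itself.]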

>>>
>>> > . . .
>>> > Backing off to Mar. 4 was not an improvement. My untar did seem
>>> > better for a couple of minutes, but then the display froze again
>>> > for 30 seconds and disk performance dropped to <1M.
>>
>> You were able to see the disk performance drop while
>> the display was frozen?
>>
>
> No, but it refreshed the display when it un-froze, and the refreshed
> display showed a one-minute history indicating that data was still
> transferring during the pause.
>
>> It might not be the best for monitoring, but I'll ask
>> this in terms of top output: Does Inact, Laundry,
>> Wired, Free, or other such show anything fairly unique
>> around the problematical time frame(s)?
>
> These all look pretty normal. Free memory stays at 13G throughout the
> operation.
>
>>> > then things got really bad and behaved in a manner that was
>>> > baffling to me. The screen froze again, but stayed frozen after
>>> > half a minute. I clicked on a couple of buttons in Firefox to no
>>> > effect and then hit ctrl-q to quit. After the long pause, I pressed
>>> > the power button to try to force a shutdown. Suddenly, it started
>>> > unwinding everything I had done during the freeze. My browser did
>>> > the updates from my mouse clicks, including quitting. It then
>>> > switched to a different workspace from ctrl-alt-right and did a
>>> > clean shutdown. ????
>>> >
>>> > Do I also have a graphics issue? Examining log files shows no
>>> > indication that anything was happening. SMART shows no errors and
>>> > reasonable values for everything. No indication of a HW problem.
>>> > The system performs well unless I do something that tries a bulk
>>> > disk data move. Building world takes about 75 minutes. I just have
>>> > a very hard time building big ports.
>>
>> Almost like things were stuck-sleeping and then the
>> sleep(s) finished?
>
> Exactly!

Historically, multi-socket (and possibly multi-core) PowerMacs have not
had the times used for controlling sleeping well matched across
FreeBSD's cpus in any FreeBSD build without extra patches. They suffer
threads staying stuck-sleeping for unintended periods. This leads to
fans running wild and to obvious timeout problems during shutdown,
though there are more consequences than the obvious ones.

That is where I got the idea for the question: your symptoms have some
similarity to operational problems that I've seen when not using my
patches, which provide a workaround by matching the times better in my
contexts.

(I'm told this type of issue is not limited to PowerMacs, but PowerMacs
are the only PowerPC-ish machines I've had access to. Doing the most
accurate time matching gets into platform-specific operations; there is
no general solution with that accuracy. Similarly, only
platform-specific code might scale well to lots of sockets/cores, even
without trying to match the times as closely. My workaround is generic
to the range of PowerMacs that I've had access to but is not as
accurate about matching the times.)

For your context: how many sockets? Cores per socket? Any other
information that might be relevant to matching times across
sockets/cores? I suppose that the board matters, not just the
processor(s) in the sockets. But what all would be appropriate
information? I do not know.

I'm not sure if the kern.hz=100 results fit with this idea or not.
(Such was never involved in my PowerMac experiments.) It is only
somewhat suggestive evidence as stands. But time mismatches across
sockets/cores might be a direction for investigation? (Not that I've a
great idea for how to investigate such.)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)