From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 261690] NFSv4 mount on Linux client hangs during complex access patterns (gcc bootstrapping on client)
Date: Tue, 08 Feb 2022 05:42:18 +0000
X-Bugzilla-Reason: AssignedTo
X-Bugzilla-Type: changed
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 13.0-RELEASE
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Who: kumba@gentoo.org
X-Bugzilla-Status: New
X-Bugzilla-Assigned-To: bugs@FreeBSD.org
X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/
List-Archive: https://lists.freebsd.org/archives/freebsd-bugs

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261690

--- Comment #2 from Joshua Kinard ---
This is a curiously apt find. I've got two old MIPS systems, same machine
type (SGI Octanes), running a Linux 5.4 LTS kernel w/ custom patches and a
Gentoo userland, which mount the Gentoo "portage" tree (similar to Ports) and
several other common folders over NFS4.2. The NFS server is my NAS running
FreeBSD 13.0-RELEASE-p7 and the exports are on a ZFS file system.

The FreeBSD kernel on the NAS box is a custom configuration and has these
patches from Phab applied on top:
- https://reviews.freebsd.org/D18985
- https://reviews.freebsd.org/D29088
- https://reviews.freebsd.org/D29315
- https://reviews.freebsd.org/D29772
- https://reviews.freebsd.org/D29838
- https://reviews.freebsd.org/D30318
- https://reviews.freebsd.org/D32724

As well as these patches listed from Bugzilla PRs (some cherry-picked before
their bugs were fixed and closed):
- Bug #254560
- Bug #254590
- Bug #256280
- Bug #260293
- Bug #260375
- Bug #260884

When either of those two MIPS machines builds gcc-11.2.1 snapshots, or even
sometimes recent glibc, there's been a reasonable probability that it will
crash with an "Oops" (a type of Linux kernel panic). The cause of the oops is
tangential to the issue as far as I can tell, because what really happens is
cc1plus will hit an invalid memory access attempt while compiling, which
triggers the Linux/mips page faulting code on the first CPU, and while that
goes on, the second CPU tries running an interrupt handler to update process
ticks, which causes the kernel to attempt to dereference a NULL and thus oops
the machine.

The machine isn't totally dead, though, which is pretty weird, because an oops
usually kills Linux. The machine will still respond to pings (intermittently)
and SSH sessions will remain connected, just not respond to commands.

For the last few weeks, I have been scratching my head at the oops data, and
nothing makes sense about it. This bug report, however, does. Or, at least,
it's the best find I've come across so far. Many of the characteristics
described in the original report match my setup (FreeBSD 13 on the NFS server,
Linux 5.4.x on the client, NFS4.x mounts, compiling gcc/cc1plus, dead-ending
in the Linux kernel scheduler path, etc).

I am currently trying to port the MIPS kernel for these machines up to the next
Linux LTS release (5.10) to see if that changes anything.

I tried running the Perl script on the MIPS machine, and on the first run, it
triggered a page fault and threw a SIGSEGV due to memory exhaustion (2GB RAM
in each machine). But in multiple subsequent runs, the Perl script finished (I
think), when it stopped at 226 threads before claiming it was out of memory.
I could not get the machine to oops in the same way gcc/cc1plus does.

Thing is, I've been running this kind of a setup for well over a year. The two
MIPS machines have been on a 5.4 kernel for at least the last six months, and
up until about three weeks ago, all seemed fine. Which kinda suggests the
fault may really be on the Linux side of things.

I'll also add that unlike the original report, both MIPS machines run the
actual gcc compile in a folder on the local disk via a bind mount. During the
compile, there shouldn't be a whole lot of NFS chatter because the way Portage
works, all of the needed package data gets saved to the build directory on the
local disk. But I can't rule out that something is still slightly wacky with
periodic NFS commands between the client and the server causing an issue while
the machine is under stress compiling gcc.

I will have to go back through recent 5.4 stable releases and look for any
recent commits to the Linux NFS4.x client code to see if that could explain
things. But I figured I'd describe my scenario here as well in case it offers
any clues.

-- 
You are receiving this mail because:
You are the assignee for the bug.