From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 219399] System panics after several hours of 14-threads-compilation orgies using poudriere on AMD Ryzen...
Date: Thu, 03 Aug 2017 04:46:20 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 11.1-RELEASE
X-Bugzilla-Who: truckman@FreeBSD.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219399

--- Comment #202 from Don Lewis ---
(In reply to Nils Beyer from comment #200)

I believe so. It's pretty unlikely that the problem is caused by undefined
opcodes, and we are not seeing any evidence (SIGILL) of valid instructions
being trapped as invalid because they experience page faults mid-fetch.

BTW, using either my original workaround patch, or the committed version if the
sv_maxuser adjustment is commented out, it is possible to use a user process to
mmap() the top page of user memory, load some code up there, and execute it for
testing purposes. I've done some experiments with that and it is possible to
quickly hang the machine or cause it to reboot. The interesting thing is that
I haven't observed any ill effects as long as no instructions are executed
above 0x7fffffffff40. That's sort of in the area mentioned in the DragonFly
fix, but even they saw issues at addresses lower than that and a decreasing
rate as the address was lowered.
Our signal trampoline code was much closer to the bottom of the page at
0x7ffffffff000, so at this point I don't know why we were having problems.
The only thing that I can think of is that the signal trampoline code uses
some unusual instructions like syscall and hlt, which are unlike the more
vanilla instructions that I was using in my experiments.
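
To put the two addresses quoted in this comment in relation to each other, here
is a minimal sketch in C of the address arithmetic. It is not the test program
used in the experiments. It assumes a 4 KiB base page and that the lower
canonical half of the amd64 address space ends at 0x800000000000; the helper
names (page_base, page_offset, bytes_to_va_end, above_safe_limit) are
hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: amd64 base page size of 4 KiB. */
#define PAGE_SIZE_BYTES 4096ULL
/* Assumption: first address past the lower canonical half (48-bit VA). */
#define USER_VA_END     0x800000000000ULL
/* From the comment: base of the top page of user memory. */
#define TOP_PAGE_BASE   0x7ffffffff000ULL
/* From the comment: no ill effects seen when nothing executes above this. */
#define SAFE_LIMIT      0x7fffffffff40ULL

/* Base address of the page that contains addr. */
static uint64_t page_base(uint64_t addr)
{
    return addr & ~(PAGE_SIZE_BYTES - 1);
}

/* Byte offset of addr within its page. */
static uint64_t page_offset(uint64_t addr)
{
    return addr & (PAGE_SIZE_BYTES - 1);
}

/* Number of bytes between addr and the assumed end of user address space. */
static uint64_t bytes_to_va_end(uint64_t addr)
{
    assert(addr < USER_VA_END);
    return USER_VA_END - addr;
}

/* Nonzero if addr lies above the limit reported in the comment. */
static int above_safe_limit(uint64_t addr)
{
    return addr > SAFE_LIMIT;
}
```

Under those assumptions, page_base(SAFE_LIMIT) is 0x7ffffffff000,
page_offset(SAFE_LIMIT) is 0xf40, and bytes_to_va_end(SAFE_LIMIT) is 192, so
the region where trouble was observed is the last 192 bytes of the top page.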