From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 221029] AMD Ryzen: strange compilation failures using poudriere or plain buildkernel/buildworld
Date: Mon, 21 Aug 2017 20:25:07 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: kern
X-Bugzilla-Version: 11.1-RELEASE
X-Bugzilla-Severity: Affects Some People
X-Bugzilla-Who: truckman@FreeBSD.org
X-Bugzilla-Status: New
X-Bugzilla-Assigned-To: freebsd-bugs@FreeBSD.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221029

--- Comment #87 from Don Lewis ---
I set affinity back to its default value of 1 and got another clean 1700-port poudriere run. It's curious that the only issues I've had when steal_idle=0 and balance=0 happened when I set affinity=1000. This is the opposite of what I would expect.

I would expect migrations controlled by the steal_idle and balance knobs to have similar issues. In either case, the thread that gets migrated is one that was preempted by an interrupt. Before it was resumed, the scheduler noticed that the thread had exhausted its run time quantum, moved it to the back of the run queue for that CPU, and resumed the thread at the front of the run queue instead. The only difference between steal_idle and balance is the event that actually causes the thread to migrate. When these threads restart, they basically just execute the kernel code to restore their state before dropping back into user mode where they were preempted. For some reason, threads that have exhausted their time quantum seem to resume properly on the same CPU that they were previously running on, but sometimes go wonky if they resume on some other CPU.

The migrations controlled by the affinity knob are different. In those cases, the thread has voluntarily put itself to sleep, either because it blocked in a syscall, or perhaps because it trapped on a page fault and then went to sleep in the kernel while the missing page was brought in. When these threads get a wakeup event, they then execute the remaining part of the syscall or the page fault handler before returning to user mode. It doesn't seem to matter what CPU these threads restart on.

As a test, I set balance=1 and reduced balance_interval from its default of 127 to 10 so that balance events would happen a lot more frequently, to try to make up for steal_idle being disabled. I had three port build failures.
The first was a guile segfault when building finance/gnucash. The second was a unit test failure in editors/openoffice-devel. The third was a build runaway in devel/doxygen.

The steal_idle code in sched_ule is topology-aware, so it looks like it should be easy to hack the code to only allow migrations between SMT threads sharing the same core, or cores in the same CCX.

-- 
You are receiving this mail because:
You are the assignee for the bug.
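
To make the proposed restriction concrete, here is a rough user-space sketch of the policy only; it is not sched_ule code. The topology numbers (2 SMT threads per core, 4 cores per CCX, 16 logical CPUs, SMT siblings numbered adjacently) are assumptions for illustration, and the names Scope, Topology, migration_allowed and steal_candidates are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    CORE = "core"  # only SMT siblings on the same physical core
    CCX = "ccx"    # any logical CPU in the same core complex
    ANY = "any"    # no restriction


@dataclass(frozen=True)
class Topology:
    # Assumed layout, not taken from the bug report.
    threads_per_core: int = 2
    cores_per_ccx: int = 4

    def core_of(self, cpu: int) -> int:
        # Assumes SMT siblings have adjacent logical CPU ids.
        return cpu // self.threads_per_core

    def ccx_of(self, cpu: int) -> int:
        return self.core_of(cpu) // self.cores_per_ccx


def migration_allowed(topo: Topology, src: int, dst: int, scope: Scope) -> bool:
    """Return True if a thread may move from CPU src to CPU dst."""
    if scope is Scope.ANY:
        return True
    if scope is Scope.CCX:
        return topo.ccx_of(src) == topo.ccx_of(dst)
    return topo.core_of(src) == topo.core_of(dst)


def steal_candidates(topo: Topology, idle_cpu: int, ncpus: int, scope: Scope) -> list:
    """List the CPUs an idle CPU would be allowed to steal work from."""
    return [
        cpu
        for cpu in range(ncpus)
        if cpu != idle_cpu and migration_allowed(topo, cpu, idle_cpu, scope)
    ]


if __name__ == "__main__":
    topo = Topology()
    print(steal_candidates(topo, 0, 16, Scope.CORE))
    print(steal_candidates(topo, 9, 16, Scope.CCX))
```

With the CORE scope an idle CPU may only pull work from its SMT sibling, and with the CCX scope from the other logical CPUs in its own core complex. Comparing failure rates under each scope would show how far a thread can migrate before the problem appears.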