From: Mark Millard
Subject: Re: svn commit: r319722 - in head: sys/cam/ctl sys/dev/iscsi sys/kern sys/netgraph sys/netgraph/bluetooth/socket sys/netinet sys/ofed/drivers/infiniband/core sys/ofed/drivers/infiniband/ulp/sdp sys/rpc...
Date: Wed, 14 Jun 2017 16:08:02 -0700
To: andreast@FreeBSD.org, svn-src-head@freebsd.org
List-Id: SVN commit messages for the src tree for head/-current

Andreas Tobler andreast at FreeBSD.org wrote on
Wed Jun 14 08:00:03 UTC 2017:

> Hi Gleb,
>
> with this revision I get either a kernel panic or a hang. This happens
> on powerpc (32-bit). The powerpc64 looks stable.
>
> Here you can see the backtrace in case of the panic:
>
> https://people.freebsd.org/~andreast/r319722_ppc32_1.jpg
>
>
> In the source code I see a comment with XXXGL...
> Is this powerpc specific or do you think that there are some issues in
> the uipc_socket.c code?

I'm not so sure that the specific change in question will turn out
to be the cause. Below is why I say that: I saw similar problems back
in the likes of -r317820 and before. (I'd frozen at -r317820 for
weeks. That is why I make no claims about later revisions.)

TARGET=powerpc TARGET_ARCH=powerpc context. . . (Not observed
anywhere else. Also only being used on an old PowerMac G5 so-called
"Quad Core".)

I've spent weeks trying to get evidence about crashes that include
jumps to non-code (and so illegal instructions and such). The crashes
would happen whether the machine was busy or sitting idle. They
usually took hours to happen but could happen within minutes of
booting.

This goes back to -r317820, where I finally froze the status for a
while to focus on attempted problem isolation, or at least on
gathering evidence. It goes back farther as well, but most of my
effort was on -r317820.

I found that the results were very memory-layout dependent.
Inserting:

void HACKISH_EXTRA_CODE(void) {}

into any one of a variety of source files would change the resultant
behavior. (There were no calls to the routine, but it was externally
accessible and so not eliminated by the toolchain.)
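
For anyone wanting to picture the layout-shifting insertion, here is a minimal sketch. It is not the actual kernel change: real_work() and layout_delta() are hypothetical names added only for illustration, and the sketch assumes a toolchain that emits functions in source order and does not garbage-collect unreferenced sections.

```c
#include <stdint.h>

/*
 * Never called, but it has external linkage, so the compiler
 * typically has to emit it into the object file. Its presence
 * moves the code that follows it.
 */
void HACKISH_EXTRA_CODE(void) {}

/* Hypothetical stand-in for pre-existing code in the same file. */
int real_work(int x)
{
	return x + 1;
}

/*
 * Hypothetical helper: distance in bytes between the two routines.
 * Function-pointer-to-integer conversion is implementation-defined;
 * it is used here only to make the layout visible.
 */
long layout_delta(void)
{
	uintptr_t pad = (uintptr_t)&HACKISH_EXTRA_CODE;
	uintptr_t work = (uintptr_t)&real_work;

	return (long)(work - pad);
}
```

Comparing layout_delta() (or a linker map) between builds with and without the extra routine shows how much the surrounding code moved. The exact shift depends on the compiler, alignment, and link order.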

Adding any code to detect an observed failure earlier also changed
the type of failure seen, so the added detection code was not
directly effective.

In some cases the result was that I was not able to identify a
problem as happening even after waiting well over 24 hours. (Longest
time to an observed failure: 11 hours. A couple took around 8. The
rest took under 7.) But something still might have been trashed, just
with less obvious consequences.

In other cases other addressing errors occurred, or other
out-of-bounds accesses occurred, or locks would spin too long,
or . . . You probably get the idea.

All my effort basically only seemed to show one thing: occasionally
something stomps on register values. It almost has to be some
interrupt activity that does not restore context correctly. But I
never found anything that I could identify as evidence of a prior
interrupt that might have happened.

I was completely unable to come up with any useful identification
of what specific code was doing the trashing.

I recently gave up and am starting to work on taking the machines
that I have access to past -r317820. That will eventually include
TARGET_ARCH=powerpc .

Note: I eventually modified the kernel to prevent execution of most
kernel pages that come from loading the file but that have no code in
them. This was PowerMac G5 specific, but it at least prevented
executing most potential garbage and should catch jumps out of code
areas more reliably and sooner. (Not that it got me the answer I was
looking for.)

===
Mark Millard
markmi at dsl-only.net