From owner-svn-src-all@freebsd.org Mon Feb 6 18:38:29 2017 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 57AE3CD3543; Mon, 6 Feb 2017 18:38:29 +0000 (UTC) (envelope-from lidl@pix.net) Received: from hydra.pix.net (hydra.pix.net [IPv6:2001:470:e254::4]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mail.pix.net", Issuer "Pix.Com Technologies LLC CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 17339A63; Mon, 6 Feb 2017 18:38:29 +0000 (UTC) (envelope-from lidl@pix.net) Received: from torb.pix.net (torb.pix.net [IPv6:2001:470:e254:10:1042:6a31:1deb:9f8a]) (authenticated bits=0) by hydra.pix.net (8.16.0.19/8.15.2) with ESMTPA id v16IcLRR098302; Mon, 6 Feb 2017 13:38:21 -0500 (EST) (envelope-from lidl@pix.net) Subject: Re: svn commit: r313037 - in head/sys: amd64/include kern mips/include net powerpc/include sparc64/include To: Jason Harmening , Svatopluk Kraus References: <201702010332.v113WnYf041362@repo.freebsd.org> <20170203231238.0675c289@kan> <8523aaa5-6c30-9f9f-40f0-fdf82cdf1669@pix.net> <6bf86e46-9714-c7e9-8d47-845761e2de24@FreeBSD.org> <8a2f7f7d-14c3-8e75-e060-fc41213ce389@FreeBSD.org> Cc: Andreas Tobler , Alexander Kabaev , "Jason A. Harmening" , src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org, Ed Maste , Justin Hibbits From: Kurt Lidl Message-ID: <5fcc9f37-3c0e-2685-2779-75b2a7566599@pix.net> Date: Mon, 6 Feb 2017 13:38:21 -0500 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.7.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-Mailman-Approved-At: Mon, 06 Feb 2017 18:56:30 +0000 X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Feb 2017 18:38:29 -0000 On 2/5/17 1:59 PM, Jason Harmening wrote: > Actually attaching the patch this time (**** gmail client) > > On Sun, Feb 5, 2017 at 10:58 AM, Jason Harmening > > wrote: > > Hmm, it's a good idea to consider the possibility of a barrier > issue. It wouldn't be the first time we've had such a problem on a > weakly-ordered architecture. That said, I don't see a problem in > this case. smp_rendezvous_cpus() takes a spinlock and then issues > atomic_store_rel_int() to ensure the rendezvous params are visible > to other cpus. The latter corresponds to lwsync on powerpc, which > AFAIK should be sufficient to ensure visibility of prior stores. > > For now I'm going with the simpler explanation that I made a bad > assumption in the powerpc get_pcpu() and there is some context in > which the read of sprg0 doesn't return a consistent pointer value. > Unfortunately I don't see where that might be right now. > > On the mips side, Kurt/Alexander can you test the attached patch? > It contains a simple fix to ensure get_pcpu() returns the consistent > per-cpu pointer. I applied this patch on top of r313347 (which I had verified that a kernel built from that revisions to boot from successfully). The kernel from r313347+(this patch) least gets to multi-user on my ERL. So, that's a big improvement. I'll start a native buildworld/buildkernel on the ERL, and that ought to give it a reasonable workout. -Kurt > > On Sat, Feb 4, 2017 at 1:34 PM, Svatopluk Kraus > wrote: > > Probably not related. But when I took short look to the patch to see > what could go wrong, I walked into the following comment in > _rm_wlock(): "Assumes rm->rm_writecpus update is visible on > other CPUs > before rm_cleanIPI is called." There is no explicit barrier to > ensure > it. However, there might be some barriers inside of > smp_rendezvous_cpus(). I have no idea what could happened if this > assumption is not met. Note that rm_cleanIPI() is affected by the > patch. > > > > On Sat, Feb 4, 2017 at 9:39 PM, Jason Harmening > > > wrote: > > Can you post an example of such panic? Only 2 MI pieces were > changed, > > netisr and rmlock. I haven't seen problems on my own > amd64/i386/arm testing > > of this, so a backtrace might help to narrow down the cause. > > > > On Sat, Feb 4, 2017 at 12:22 PM, Andreas Tobler > > > > wrote: > >> > >> On 04.02.17 20:54, Jason Harmening wrote: > >>> > >>> I suspect this broke rmlocks for mips because the rmlock > implementation > >>> takes the address of the per-CPU pc_rm_queue when building > tracker > >>> lists. That address may be later accessed from another CPU > and will > >>> then translate to the wrong physical region if the address > was taken > >>> relative to the globally-constant pcpup VA used on mips. > >>> > >>> Regardless, for mips get_pcpup() should be implemented as > >>> pcpu_find(curcpu) since returning an address that may mean > something > >>> different depending on the CPU seems like a big POLA > violation if > >>> nothing else. > >>> > >>> I'm more concerned about the report of powerpc breakage. > For powerpc we > >>> simply take each pcpu pointer from the pc_allcpu list (which > is the same > >>> value stored in the cpuid_to_pcpu array) and pass it through > the ap_pcpu > >>> global to each AP's startup code, which then stores it in > sprg0. It > >>> should be globally unique and won't have the > variable-translation issues > >>> seen on mips. Andreas, are you certain this change was > responsible the > >>> breakage you saw, and was it the same sort of hang observed > on mips? > >> > >> > >> I'm really sure. 313036 booted fine, allowed me to execute heavy > >> compilation jobs, np. 313037 on the other side gave me > various patterns of > >> panics. During startup, but I also succeeded to get into > multiuser and then > >> the panic happend during port building. > >> > >> I have no deeper inside where pcpu data is used. Justin > mentioned netisr? > >> > >> Andreas > >> > > > > >