From: Jason Harmening
Date: Sun, 5 Feb 2017 10:58:13 -0800
Subject: Re: svn commit: r313037 - in head/sys: amd64/include kern mips/include net powerpc/include sparc64/include
To: Svatopluk Kraus
Cc: Andreas Tobler, Kurt Lidl, Alexander Kabaev, "Jason A. Harmening", src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org, Ed Maste, Justin Hibbits
References: <201702010332.v113WnYf041362@repo.freebsd.org> <20170203231238.0675c289@kan> <8523aaa5-6c30-9f9f-40f0-fdf82cdf1669@pix.net> <6bf86e46-9714-c7e9-8d47-845761e2de24@FreeBSD.org> <8a2f7f7d-14c3-8e75-e060-fc41213ce389@FreeBSD.org>
List-Id: SVN commit messages for the src tree for head/-current

Hmm, it's a good idea to consider the possibility of a barrier issue. It
wouldn't be the first time we've had such a problem on a weakly-ordered
architecture. That said, I don't see a problem in this case.
smp_rendezvous_cpus() takes a spinlock and then issues
atomic_store_rel_int() to ensure the rendezvous params are visible to
other cpus. The latter corresponds to lwsync on powerpc, which AFAIK
should be sufficient to ensure visibility of prior stores.

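As an aside, the release-store pattern described above can be sketched in userland C11. This is only an illustrative analogue, assuming C11 memory_order_release and memory_order_acquire as rough stand-ins for the kernel's atomic_store_rel_int() and a matching acquire load on the reading side. The names struct rendezvous_params, generation, and publish_and_observe() are hypothetical and do not reflect the actual smp_rendezvous_cpus() implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the rendezvous parameters (not the kernel layout). */
struct rendezvous_params {
    int arg;
    int ncpus;
};

static struct rendezvous_params params; /* plain, non-atomic data */
static atomic_int generation;           /* publication flag */

/* Consumer: wait for the flag with acquire semantics, then read the data. */
static void *consumer(void *out)
{
    while (atomic_load_explicit(&generation, memory_order_acquire) == 0)
        ; /* spin until the producer publishes */
    *(int *)out = params.arg + params.ncpus;
    return NULL;
}

/* Producer: fill in params, publish with a release store, report what the
 * consumer observed. */
int publish_and_observe(int arg, int ncpus)
{
    pthread_t td;
    int seen = -1;
    int rc;

    atomic_store_explicit(&generation, 0, memory_order_relaxed);
    rc = pthread_create(&td, NULL, consumer, &seen);
    assert(rc == 0);

    /* Plain stores to the parameters... */
    params.arg = arg;
    params.ncpus = ncpus;
    /* ...ordered before the flag by the release store. */
    atomic_store_explicit(&generation, 1, memory_order_release);

    rc = pthread_join(td, NULL);
    assert(rc == 0);
    (void)rc;
    return seen;
}
```

With this sketch, publish_and_observe(40, 2) returns 42: the consumer can only leave its spin loop after observing the release store, and the acquire load orders its subsequent reads of params after that point.
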
For now I'm going with the simpler explanation that I made a bad
assumption in the powerpc get_pcpu() and there is some context in which
the read of sprg0 doesn't return a consistent pointer value.
Unfortunately I don't see where that might be right now.

On the mips side, Kurt/Alexander, can you test the attached patch? It
contains a simple fix to ensure get_pcpu() returns the consistent
per-cpu pointer.

On Sat, Feb 4, 2017 at 1:34 PM, Svatopluk Kraus wrote:
> Probably not related. But when I took a short look at the patch to see
> what could go wrong, I walked into the following comment in
> _rm_wlock(): "Assumes rm->rm_writecpus update is visible on other CPUs
> before rm_cleanIPI is called." There is no explicit barrier to ensure
> it. However, there might be some barriers inside of
> smp_rendezvous_cpus(). I have no idea what could happen if this
> assumption is not met. Note that rm_cleanIPI() is affected by the
> patch.
>
> On Sat, Feb 4, 2017 at 9:39 PM, Jason Harmening wrote:
> > Can you post an example of such a panic? Only 2 MI pieces were changed,
> > netisr and rmlock. I haven't seen problems in my own amd64/i386/arm
> > testing of this, so a backtrace might help to narrow down the cause.
> >
> > On Sat, Feb 4, 2017 at 12:22 PM, Andreas Tobler wrote:
> >>
> >> On 04.02.17 20:54, Jason Harmening wrote:
> >>>
> >>> I suspect this broke rmlocks for mips because the rmlock implementation
> >>> takes the address of the per-CPU pc_rm_queue when building tracker
> >>> lists. That address may be later accessed from another CPU and will
> >>> then translate to the wrong physical region if the address was taken
> >>> relative to the globally-constant pcpup VA used on mips.
> >>>
> >>> Regardless, for mips get_pcpup() should be implemented as
> >>> pcpu_find(curcpu) since returning an address that may mean something
> >>> different depending on the CPU seems like a big POLA violation if
> >>> nothing else.
> >>>
> >>> I'm more concerned about the report of powerpc breakage. For powerpc we
> >>> simply take each pcpu pointer from the pc_allcpu list (which is the same
> >>> value stored in the cpuid_to_pcpu array) and pass it through the ap_pcpu
> >>> global to each AP's startup code, which then stores it in sprg0. It
> >>> should be globally unique and won't have the variable-translation issues
> >>> seen on mips. Andreas, are you certain this change was responsible for
> >>> the breakage you saw, and was it the same sort of hang observed on mips?
> >>
> >> I'm really sure. 313036 booted fine, allowed me to execute heavy
> >> compilation jobs, np. 313037 on the other hand gave me various patterns of
> >> panics. Some occurred during startup, but I also managed to get into
> >> multiuser, and then the panic happened during port building.
> >>
> >> I have no deeper insight into where pcpu data is used. Justin mentioned
> >> netisr?
> >>
> >> Andreas
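
The mips aliasing problem described in the quoted thread can be modeled in userland roughly as follows. This is a toy sketch under stated assumptions, not kernel code: struct vaddr, translate(), get_pcpu_alias(), get_pcpu_global(), and owner_seen_from() are hypothetical names used only to model "one VA that translates differently on each CPU" versus a globally unique pointer of the kind pcpu_find(curcpu) would return. The struct pcpu here is a simplified stand-in with only a CPU id.

```c
#include <assert.h>

#define NCPU 2

/* Simplified stand-in for the per-CPU structure. */
struct pcpu {
    int pc_cpuid;
};

static struct pcpu cpuid_to_pcpu[NCPU];

/* Modeled virtual address: either the per-CPU alias window, whose meaning
 * depends on the CPU that dereferences it, or a globally unique address. */
struct vaddr {
    int is_alias; /* 1: same VA on every CPU, translated per CPU */
    int cpu;      /* owning CPU, meaningful only when is_alias == 0 */
};

/* Alias-style accessor: every CPU gets the same address value. */
static struct vaddr get_pcpu_alias(void)
{
    return (struct vaddr){ 1, 0 };
}

/* pcpu_find()-style accessor: the address names one specific CPU's data. */
static struct vaddr get_pcpu_global(int curcpu)
{
    return (struct vaddr){ 0, curcpu };
}

/* Resolve a modeled address on the CPU that performs the access. */
static struct pcpu *translate(struct vaddr va, int accessing_cpu)
{
    int cpu = va.is_alias ? accessing_cpu : va.cpu;

    assert(cpu >= 0 && cpu < NCPU);
    return &cpuid_to_pcpu[cpu];
}

/* Take a pcpu address on taker_cpu, dereference it on reader_cpu, and
 * report whose per-CPU data the reader actually reached. */
int owner_seen_from(int taker_cpu, int reader_cpu, int use_alias)
{
    struct vaddr va;

    for (int i = 0; i < NCPU; i++)
        cpuid_to_pcpu[i].pc_cpuid = i;
    va = use_alias ? get_pcpu_alias() : get_pcpu_global(taker_cpu);
    return translate(va, reader_cpu)->pc_cpuid;
}
```

In this model, owner_seen_from(0, 1, 1) returns 1: an address taken on CPU 0 through the alias resolves to CPU 1's data when CPU 1 dereferences it, which mirrors the tracker-list concern raised for rmlocks. With the globally unique form, owner_seen_from(0, 1, 0) returns 0 as intended.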