Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 6 Feb 2017 13:38:21 -0500
From:      Kurt Lidl <lidl@pix.net>
To:        Jason Harmening <jason.harmening@gmail.com>, Svatopluk Kraus <onwahe@gmail.com>
Cc:        Andreas Tobler <andreast@freebsd.org>, Alexander Kabaev <kabaev@gmail.com>, "Jason A. Harmening" <jah@freebsd.org>, src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org, Ed Maste <emaste@freebsd.org>, Justin Hibbits <jrh29@alumni.cwru.edu>
Subject:   Re: svn commit: r313037 - in head/sys: amd64/include kern mips/include net powerpc/include sparc64/include
Message-ID:  <5fcc9f37-3c0e-2685-2779-75b2a7566599@pix.net>
In-Reply-To: <CAM=8qanyz08tAHvo7q2Dxt=Q6jWKqD53iF82wtFtN86fMXctQA@mail.gmail.com>
References:  <201702010332.v113WnYf041362@repo.freebsd.org> <20170203231238.0675c289@kan> <CAM=8qa=h3=sfro02hCQyzqkDnLO7TnQJ8ugCoa5=MfNE_OCgZg@mail.gmail.com> <8523aaa5-6c30-9f9f-40f0-fdf82cdf1669@pix.net> <CAM=8qa=YB_kBsFRS%2Ba7kR%2BZTkyjmRnwY7r9vfZ48SyyZ2BZddg@mail.gmail.com> <6bf86e46-9714-c7e9-8d47-845761e2de24@FreeBSD.org> <CAM=8qa=dEfeV%2BBLr7g7-9YiSP9uGyxzkzP1NEEz4xZ6TDUJ2HA@mail.gmail.com> <8a2f7f7d-14c3-8e75-e060-fc41213ce389@FreeBSD.org> <CAM=8qa=2rpHi8Jdbv23KN70M1CVx=gGO54fvTk38uYP9iEqa2g@mail.gmail.com> <CAFHCsPXdRya9xVgoi-DfuzutYZ-LdpntFy9t7GbBY1KvQZxdpA@mail.gmail.com> <CAM=8qakqOjn0x%2BFwLGG8qrAbksQ_VR4547_vRV%2Bqso%2Bu=Cdf1A@mail.gmail.com> <CAM=8qanyz08tAHvo7q2Dxt=Q6jWKqD53iF82wtFtN86fMXctQA@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On 2/5/17 1:59 PM, Jason Harmening wrote:
> Actually attaching the patch this time (**** gmail client)
>
> On Sun, Feb 5, 2017 at 10:58 AM, Jason Harmening
> <jason.harmening@gmail.com <mailto:jason.harmening@gmail.com>> wrote:
>
>     Hmm, it's a good idea to consider the possibility of a barrier
>     issue.  It wouldn't be the first time we've had such a problem on a
>     weakly-ordered architecture. That said, I don't see a problem in
>     this case.  smp_rendezvous_cpus() takes a spinlock and then issues
>     atomic_store_rel_int()  to ensure the rendezvous params are visible
>     to other cpus.  The latter corresponds to lwsync on powerpc, which
>     AFAIK should be sufficient to ensure visibility of prior stores.
>
>     For now I'm going with the simpler explanation that I made a bad
>     assumption  in the powerpc get_pcpu() and there is some context in
>     which the read of sprg0 doesn't return a consistent pointer value.
>     Unfortunately I don't see where that might be right now.
>
>     On the mips side, Kurt/Alexander can you test the attached patch?
>     It contains a simple fix to ensure get_pcpu() returns the consistent
>     per-cpu pointer.

I applied this patch on top of r313347 (which I had verified that a
kernel built from that revisions to boot from successfully).
The kernel from r313347+(this patch) least gets to multi-user on my ERL.

So, that's a big improvement.

I'll start a native buildworld/buildkernel on the ERL, and that ought
to give it a reasonable workout.

-Kurt

>
>     On Sat, Feb 4, 2017 at 1:34 PM, Svatopluk Kraus <onwahe@gmail.com
>     <mailto:onwahe@gmail.com>> wrote:
>
>         Probably not related. But when I took short look to the patch to see
>         what could go wrong, I walked into the following comment in
>         _rm_wlock(): "Assumes rm->rm_writecpus update is visible on
>         other CPUs
>         before rm_cleanIPI is called." There is no explicit barrier to
>         ensure
>         it. However, there might be some barriers inside of
>         smp_rendezvous_cpus(). I have no idea what could happened if this
>         assumption is not met. Note that rm_cleanIPI() is affected by the
>         patch.
>
>
>
>         On Sat, Feb 4, 2017 at 9:39 PM, Jason Harmening
>         <jason.harmening@gmail.com <mailto:jason.harmening@gmail.com>>
>         wrote:
>         > Can you post an example of such panic?  Only 2 MI pieces were
>         changed,
>         > netisr and rmlock.  I haven't seen problems on my own
>         amd64/i386/arm testing
>         > of this, so a backtrace might help to narrow down the cause.
>         >
>         > On Sat, Feb 4, 2017 at 12:22 PM, Andreas Tobler
>         <andreast@freebsd.org <mailto:andreast@freebsd.org>>
>         > wrote:
>         >>
>         >> On 04.02.17 20:54, Jason Harmening wrote:
>         >>>
>         >>> I suspect this broke rmlocks for mips because the rmlock
>         implementation
>         >>> takes the address of the per-CPU pc_rm_queue when building
>         tracker
>         >>> lists.  That address may be later accessed from another CPU
>         and will
>         >>> then translate to the wrong physical region if the address
>         was taken
>         >>> relative to the globally-constant pcpup VA used on mips.
>         >>>
>         >>> Regardless, for mips get_pcpup() should be implemented as
>         >>> pcpu_find(curcpu) since returning an address that may mean
>         something
>         >>> different depending on the CPU seems like a big POLA
>         violation if
>         >>> nothing else.
>         >>>
>         >>> I'm more concerned about the report of powerpc breakage.
>         For powerpc we
>         >>> simply take each pcpu pointer from the pc_allcpu list (which
>         is the same
>         >>> value stored in the cpuid_to_pcpu array) and pass it through
>         the ap_pcpu
>         >>> global to each AP's startup code, which then stores it in
>         sprg0.  It
>         >>> should be globally unique and won't have the
>         variable-translation issues
>         >>> seen on mips.   Andreas, are you certain this change was
>         responsible the
>         >>> breakage you saw, and was it the same sort of hang observed
>         on mips?
>         >>
>         >>
>         >> I'm really sure. 313036 booted fine, allowed me to execute heavy
>         >> compilation jobs, np. 313037 on the other side gave me
>         various patterns of
>         >> panics. During startup, but I also succeeded to get into
>         multiuser and then
>         >> the panic happend during port building.
>         >>
>         >> I have no deeper inside where pcpu data is used. Justin
>         mentioned netisr?
>         >>
>         >> Andreas
>         >>
>         >
>
>
>




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?5fcc9f37-3c0e-2685-2779-75b2a7566599>