Date:      Sun, 13 Nov 2016 23:31:08 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Warner Losh <wlosh@bsdimp.com>
Cc:        Adrian Chadd <adrian.chadd@gmail.com>, Juli Mallett <jmallett@freebsd.org>, Warner Losh <imp@bsdimp.com>, "freebsd-mips@freebsd.org" <freebsd-mips@freebsd.org>
Subject:   Re: svn commit: r307626 - head/sys/ufs/ffs
Message-ID:  <20161113213108.GP54029@kib.kiev.ua>
In-Reply-To: <50FE3B7E-8FA4-47DC-BC45-EE75B9FAFC0F@bsdimp.com>
References:  <20161113071911.GF54029@kib.kiev.ua> <CANCZdfpC6smeNSPKzpbX8aAnF8CZ%2BSEFQmQ74jqvWUVXrttM%2BQ@mail.gmail.com> <20161113075557.GH54029@kib.kiev.ua> <71C512CD-0FB6-40D8-B46C-30467A245693@bsdimp.com> <CACVs6=_zmjJhMzmyFGJGHK1RAguQ_fZUcd94ZEmVEnTXBiOSdQ@mail.gmail.com> <20161113161548.GK54029@kib.kiev.ua> <CAJ-VmomG6e8WVyyuqAkC20fwZ5wX2hnwSE7T4r%2BTSDF%2BOzLCNQ@mail.gmail.com> <CAJ-Vmo=C-pqY4o-r2RXD-m0OxWeDaLpm_zUroeMpwZvLLuuZdg@mail.gmail.com> <20161113190344.GM54029@kib.kiev.ua> <50FE3B7E-8FA4-47DC-BC45-EE75B9FAFC0F@bsdimp.com>

On Sun, Nov 13, 2016 at 01:55:54PM -0700, Warner Losh wrote:
> 
> > On Nov 13, 2016, at 12:03 PM, Konstantin Belousov <kostikbel@gmail.com> wrote:
> > 
> > On Sun, Nov 13, 2016 at 10:50:19AM -0800, Adrian Chadd wrote:
> >> Ok, so after talking with others, my questions are:
> >> 
> >> * I thought our VM was not supposed to be doing double mapping like
> >> this. Warner's comment on IRC was:
> >> 
> >> ===
> >> 13:39 < bsdimp> adrian the VM isn't supposed to do it at all.
> >> 13:39 < bsdimp> adrian that is, double map on purpose.
> >> 13:40 < bsdimp> though there's some exceptions to the rule
> >> 13:40 < bsdimp> since kernel mappings go away in userland, and
> >> userland doesn't execute while you're in kernel mode, you can do the
> >> flushing game in busdma to prevent most issues.
> >> 13:41 < bsdimp> which is what we do. Generally, though, our VM doesn't
> >> do it in-kernel.
> >> ===
> >> 
> >> * is this still the case? or are there places in the VM where we are doing this?
> >> * can we introduce a machdep/pmap capability check to see if aliasing
> >> is allowed and if so, turn this feature on?
> > This was never the case, and never will be.  VM establishes mappings as
> > requested by userspace and the kernel, and mapping the same page n times
> > is always legitimate.  It is the pmap's duty to handle that.
> 
> User space N-way mapping is rare. Can you show an actual example of when this is commonly done? Or of it being done in init(8)?
> 
User space N-way mapping is very common, e.g. with shared libraries.
Writable n-way mappings are used ubiquitously: by X applications in the
form of SysV shared memory segments, by database systems (in particular
PostgreSQL), by the Qt C++ framework, and so on.

If you use vi on some file, nvi mmaps its content. If any other
application reads the file while it is open in vi, you get the page
double-mapped by userspace and by KVA, with all the consistency
problems that implies.
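To make the point concrete, here is a small Python sketch (an illustration, not code from the thread) that maps the same file page twice and also reads it back with pread(2). It assumes a POSIX host, where Python's `mmap.mmap(fd, length)` creates a shared, writable mapping by default. On a cache-coherent machine with a unified page cache, all three views of the page agree; keeping them consistent is exactly the pmap's job.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def demo_n_way_mapping():
    """Map one file page twice (MAP_SHARED) and read it via read(2) as well."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * PAGE)          # one page of file data
        a = mmap.mmap(fd, PAGE)             # first shared, writable mapping
        b = mmap.mmap(fd, PAGE)             # second mapping of the same page
        try:
            a[0:5] = b"hello"               # store through mapping A
            via_b = b[0:5]                  # load through mapping B
            via_read = os.pread(fd, 5, 0)   # kernel copies out through its own mapping
        finally:
            a.close()
            b.close()
        return via_b, via_read
    finally:
        os.close(fd)
        os.unlink(path)


print(demo_n_way_mapping())  # on a coherent host: (b'hello', b'hello')
```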

It is rather pointless to argue about the necessity and the ubiquity of
this feature.  Even without the new UFS pager, the MIPS port apparently
corrupts user data at random.


> > If you cared to read my previous mail, I already explained that besides
> > the userspace asking for n-mapping, there is at least the buffer cache,
> > which also maps the same page into KVA, ever since the unified buffer
> > cache/page cache was introduced.  The same is true for n-mapping of
> > shared anonymous pages.
>
> KVA mapping is different, since that mapping goes away. And generally,
> the KVA mappings aren't active while user land mappings are active.
> Here "active" means can change the cache. When the kernel is
> executing on these single core 32-bit mips boxes, user space isn't
> creating traffic to the pages. Also, KVA mappings tend to be for pages
> that we don't actually touch in the kernel once the I/O is complete.
> This is true for data pages in files. It may not be true for in-kernel
> meta-data pages, and that might have changed with your commit (I
> haven't looked in enough detail to know).
>

This is not how 'mappings' work, at least in FreeBSD.  KVA mappings are not
switched away when the CPU enters user mode, and the lines of a virtually
indexed cache that belong to KVA indexes are not flushed on a switch to
user mode.  So if a page is double-mapped and the colors of the mappings
differ, it does not matter whether the aliases are kernel or user VAs; the
problem arises either way.
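The color rule can be sketched in a few lines of Python. The page size and cache way size below are assumptions chosen for illustration, not the parameters of any particular MIPS core. Note that the function looks only at VA bits and never asks whether an address belongs to the kernel or to a process, which is the point made above.

```python
PAGE_SHIFT = 12                     # assumed 4 KiB pages
WAY_SIZE = 16 * 1024                # assumed bytes per cache way (hypothetical)
NCOLORS = WAY_SIZE >> PAGE_SHIFT    # pages per way = number of colors (4 here)


def cache_color(va: int) -> int:
    """Color: the VA bits above the page offset that still take part in indexing."""
    return (va >> PAGE_SHIFT) & (NCOLORS - 1)


def aliases_conflict(va1: int, va2: int) -> bool:
    """Two mappings of one physical page can hold divergent lines iff colors differ."""
    return cache_color(va1) != cache_color(va2)


uva, kva = 0x00401000, 0xc0002000   # hypothetical user and kernel addresses
print(cache_color(uva), cache_color(kva), aliases_conflict(uva, kva))  # 1 2 True
```

Picking a KVA with the same color as the UVA (for example 0xc0005000 here) removes the conflict, which is why the choice of KVA matters on such caches.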

> But that still begs the question: why is the fault address 0 in
> the original crash report. If we're doing multiple mappings, and
> that's creating a cache coherency problem because there's two
> copies of dirty cache data, how does that end up with us doing a NULL
> dereference? That's the part that I'm having trouble understanding
> and I worry that arguing over the exact parameters of the VM mapping
> issues might be distracting us from a credible theory, with a specific
> sequence of events that gets us here.
>
I can't answer exactly why 0 appears there, but a plausible explanation is
that the kernel faults in the pages through a KVA mapping whose color
differs from that of the user mapping.

Consider what happens: the bootstrap mapped init's .text, but the pages are
not resident. On the first touch, a TLB refill exception is reported, and
since the MIPS software page tables in pmap are not filled, vm_fault() is
called, which eventually calls VOP_GETPAGES(). There, the UFS file buffer is
read with bread(9). During bread(9), on MIPS, the buffer is mapped into KVA
and the actual I/O occurs, which validates the pages.

On one hand, the pages constituting the buffer are mapped into KVA; on
the other hand, they are mapped into UVA, possibly with a different color.
bread(9) operated on the KVA, which validated the VIPT cache lines for
the KVA color. The cache lines for the UVA color are not filled, and their
content is probably zeroes. After the return from vm_fault() to the TLB
refill path, the unfilled UVA line is executed.  Most likely that line is
zeroed, and either some function pointer is loaded as zero, or the MIPS CPU
executes something that causes a jump to 0.
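The hazard can be illustrated with a toy model of a write-back, virtually indexed, physically tagged cache. This is a simplified sketch, not a model of any specific MIPS core or of FreeBSD code: it assumes two colors, tracks one word per physical page, and models the completed I/O as a store through the KVA alias (the real sequence involves DMA and the I-cache). The user fetch through the other color misses and fills from RAM, which still holds zeroes. The addresses are hypothetical; INSN is an arbitrary non-zero word (it happens to encode `addiu sp, sp, -32`).

```python
PAGE_SHIFT = 12
NCOLORS = 2  # assumed: each cache way spans two pages


class ToyVIPTCache:
    """One line per color, one word per physical page (toy write-back cache)."""

    def __init__(self, memory):
        self.memory = memory   # physical page number -> word in RAM
        self.lines = {}        # color -> [ptag, data, dirty]

    def _color(self, va):
        return (va >> PAGE_SHIFT) % NCOLORS

    def _lookup(self, va, ppn):
        c = self._color(va)
        line = self.lines.get(c)
        if line is None or line[0] != ppn:          # miss on this color
            if line is not None and line[2]:        # write back a dirty victim
                self.memory[line[0]] = line[1]
            line = [ppn, self.memory[ppn], False]   # fill from RAM
            self.lines[c] = line
        return line

    def load(self, va, ppn):
        return self._lookup(va, ppn)[1]

    def store(self, va, ppn, value):
        line = self._lookup(va, ppn)
        line[1], line[2] = value, True              # RAM is not updated yet


PPN = 5
UVA = 0x00400000     # color 0: hypothetical user address of init's .text
KVA = 0xc0001000     # color 1: hypothetical buffer-cache KVA
INSN = 0x27bdffe0    # stand-in for an instruction word

ram = {PPN: 0}                     # freshly allocated, zeroed page
cache = ToyVIPTCache(ram)
cache.store(KVA, PPN, INSN)        # "I/O" fills the page through the KVA alias
print(hex(cache.load(UVA, PPN)))   # user fetch through the other color: 0x0
```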

An additional problem for the MIPS pmap is that VFS buffers map pages as
unmanaged and non-executable, which does not allow the pmap to even attempt
the dcache/icache consistency trick, but this is secondary.

BTW, when I developed PCID support for amd64 (PCID is similar to the ASID
feature of the MIPS TLB), inconsistent cache invalidations also often
resulted in attempts to execute or access address 0, for what I believe
are similar reasons.

> It looks like we use the direct map in kernel for addresses < 256MB.
> KSEG0 covers a lot of sins.
KSEG0 cannot be used for buffer mappings, since buffers require consecutive
mappings of several non-contiguous pages.  A typical modern UFS buffer is 32k,
meaning that b_data spans 8 pages.

Also, KSEG0 accesses are cached, with all the consequences of the page's
physical address (which determines the KSEG0 color) and its other virtual
mappings having different colors.
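Both KSEG0 points reduce to simple arithmetic, sketched below in Python. KSEG0 is a fixed window (VA = 0x80000000 + PA over the low 512 MiB of physical memory on MIPS32), so eight consecutive virtual pages are only available when the eight physical pages are themselves consecutive. Because the window base is aligned far beyond any cache way, a KSEG0 address always has the physical color. The page size, color count and physical addresses are assumptions for illustration.

```python
KSEG0_BASE = 0x80000000
KSEG0_LIMIT = 0x20000000   # MIPS32 KSEG0 window: the low 512 MiB of physical memory
PAGE_SIZE = 4096           # assumed 4 KiB pages
BUF_SIZE = 32 * 1024       # the typical UFS buffer size quoted above
NCOLORS = 4                # assumed number of cache colors, for illustration only


def kseg0_va(pa):
    """KSEG0 is a fixed, unmapped window: VA = base + PA."""
    if not 0 <= pa < KSEG0_LIMIT:
        raise ValueError("physical address not reachable through KSEG0")
    return KSEG0_BASE + pa


def color(addr):
    return (addr // PAGE_SIZE) % NCOLORS


def kseg0_can_back(pas):
    """b_data needs consecutive VAs; KSEG0 yields them only for consecutive PAs."""
    vas = [kseg0_va(pa) for pa in pas]
    return all(b - a == PAGE_SIZE for a, b in zip(vas, vas[1:]))


pages_per_buf = BUF_SIZE // PAGE_SIZE    # 8
contiguous = [0x00100000 + i * PAGE_SIZE for i in range(pages_per_buf)]
scattered = [0x00123000, 0x00456000, 0x00789000, 0x00abc000,
             0x01000000, 0x01002000, 0x00010000, 0x00020000]
print(pages_per_buf, kseg0_can_back(contiguous), kseg0_can_back(scattered))  # 8 True False
# A KSEG0 alias inherits the physical color, which a user mapping need not share:
print(color(kseg0_va(0x00123000)) == color(0x00123000))  # True
```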

>
> NetBSD has code to cope with congruent mappings that cause problems.
> These look to be only on Loongson and MIPS3 processors with mips32 and
> newer not affected by it. MIPS3 has VCE exceptions for Instructions
> and Data and NetBSD has code there to cope. It also maps the pages
> uncached until the multiple mappings go away. The other code it has
> to cope is effectively disabled when not on these processors. It is
> a twisty maze of #ifdefs, though, so maybe I'm missing something.
> It does disable the socket page loaning code as well when there's
> issues.
>
> So, we're back to the basic question: why does this change cause the
> observed behavior, exactly?
I explained above how the new UFS pager works.  The old BMAP pager used
transient mappings for the duration of the I/O, established by pmap_qenter()
and dissolved by pmap_qremove() before VOP_GETPAGES() returned to vm_fault().
From my reading of the code, pmap_qremove() always flushed the dcache.
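The difference in lifecycle can be sketched with a toy write-back cache in Python. This is an illustration under stated assumptions, not FreeBSD code: it assumes two cache colors, one word per physical page, and (per the reading above) that removing the transient mapping writes back and invalidates the dcache lines for that KVA. `transient_kva` is a hypothetical helper standing in for the pmap_qenter()/pmap_qremove() pair and does not reflect their real signatures. Because the data reaches RAM before the fault returns, the differently colored user alias then reads the correct word.

```python
from contextlib import contextmanager

PAGE_SHIFT = 12
NCOLORS = 2                 # assumed number of cache colors

ram = {5: 0}                # physical page 5, zero-filled before the I/O
dcache = {}                 # color -> (ppn, data); cached lines treated as dirty


def color(va):
    return (va >> PAGE_SHIFT) % NCOLORS


def store(va, ppn, value):
    victim = dcache.get(color(va))
    if victim is not None and victim[0] != ppn:
        ram[victim[0]] = victim[1]        # write back the evicted dirty line
    dcache[color(va)] = (ppn, value)      # write-back: RAM not updated yet


def load(va, ppn):
    line = dcache.get(color(va))
    if line is not None and line[0] == ppn:
        return line[1]                    # hit on this color
    return ram[ppn]                       # miss: read RAM (no fill, for brevity)


def flush_alias(va, ppn):
    """Write back and invalidate this alias's line if it holds ppn (toy flush)."""
    line = dcache.get(color(va))
    if line is not None and line[0] == ppn:
        ram[ppn] = line[1]
        del dcache[color(va)]


@contextmanager
def transient_kva(kva, ppn):
    """Hypothetical stand-in for pmap_qenter()/pmap_qremove(): KVA lives only for the I/O."""
    try:
        yield kva
    finally:
        flush_alias(kva, ppn)             # the flush assumed to happen on removal


KVA, UVA, PPN, INSN = 0xc0001000, 0x00400000, 5, 0x27bdffe0
with transient_kva(KVA, PPN) as va:
    store(va, PPN, INSN)                  # the I/O result lands via the transient KVA
print(hex(load(UVA, PPN)))                # 0x27bdffe0: the user alias sees the data
```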

>
> >> Adding a pmap capability and turning it on for, say,
> >> i386/amd64/arm64 would allow for this new feature as well as the
> >> previous behaviour on older platforms.
> >>
> >> I don't think I have the time to fix mips pmap to support this new
> >> feature, so if you want to turn it on for all features, we should
> >> really fix/test pmap on said platforms first.
> > This is not a new feature, this is the old bug on that platform.
>
> This sort of argument is pointless: the code does something new, and
> the current code doesn't support it. It's that simple. Trying
> to label new bug / old bug, etc is pointless. It's a bug, and
> it's gotta get fixed.
Well, no, this is not the essence of the surrounding noise.

> It isn't clear to me it's even understood
> why things are happening, so even the old / new finger pointing is
> premature given we don't have a definitive cause, just a theory.
I have the theory, and asked for a knob to be flipped that forces the UFS
buffer to be dissolved at the end of the I/O in VOP_GETPAGES(), in a manner
similar to the BMAP pager.  The result of this experiment will be telling, IMO.
I can neither stop the discussion until the experiment is done, nor
conduct the experiment myself.

>
> Warner
>
> >> Final comment: I'd really like to see a sort of "tested
> >> on" for things like this, because it's not clear which
> >> platforms/architectures it was tested on.
>




