Date: Sun, 13 Nov 2016 23:31:08 +0200
From: Konstantin Belousov <kostikbel@gmail.com>
To: Warner Losh <wlosh@bsdimp.com>
Cc: Adrian Chadd <adrian.chadd@gmail.com>, Juli Mallett <jmallett@freebsd.org>, Warner Losh <imp@bsdimp.com>, "freebsd-mips@freebsd.org" <freebsd-mips@freebsd.org>
Subject: Re: svn commit: r307626 - head/sys/ufs/ffs
Message-ID: <20161113213108.GP54029@kib.kiev.ua>
In-Reply-To: <50FE3B7E-8FA4-47DC-BC45-EE75B9FAFC0F@bsdimp.com>
References: <20161113071911.GF54029@kib.kiev.ua> <CANCZdfpC6smeNSPKzpbX8aAnF8CZ%2BSEFQmQ74jqvWUVXrttM%2BQ@mail.gmail.com> <20161113075557.GH54029@kib.kiev.ua> <71C512CD-0FB6-40D8-B46C-30467A245693@bsdimp.com> <CACVs6=_zmjJhMzmyFGJGHK1RAguQ_fZUcd94ZEmVEnTXBiOSdQ@mail.gmail.com> <20161113161548.GK54029@kib.kiev.ua> <CAJ-VmomG6e8WVyyuqAkC20fwZ5wX2hnwSE7T4r%2BTSDF%2BOzLCNQ@mail.gmail.com> <CAJ-Vmo=C-pqY4o-r2RXD-m0OxWeDaLpm_zUroeMpwZvLLuuZdg@mail.gmail.com> <20161113190344.GM54029@kib.kiev.ua> <50FE3B7E-8FA4-47DC-BC45-EE75B9FAFC0F@bsdimp.com>
On Sun, Nov 13, 2016 at 01:55:54PM -0700, Warner Losh wrote:
> 
> > On Nov 13, 2016, at 12:03 PM, Konstantin Belousov <kostikbel@gmail.com> wrote:
> > 
> > On Sun, Nov 13, 2016 at 10:50:19AM -0800, Adrian Chadd wrote:
> >> Ok, so after talking with others, my questions are:
> >> 
> >> * I thought our VM was supposed to not be doing double mapping like
> >> this. warner's comment on irc was:
> >> 
> >> ===
> >> 13:39 < bsdimp> adrian the VM isn't supposed to do it at all.
> >> 13:39 < bsdimp> adrian that is, double map on purpose.
> >> 13:40 < bsdimp> though there's some exceptions to the rule
> >> 13:40 < bsdimp> since kernel mappings go away in userland, and
> >> userland doesn't execute while you're in kernel mode, you can do the
> >> flushing game in busdma to prevent most issues.
> >> 13:41 < bsdimp> which is what we do. Generally, though, our VM doesn't
> >> do it in-kernel.
> >> ===
> >> 
> >> * is this still the case? or are there places in the VM where we are doing this?
> >> * can we introduce a machdep/pmap capability check to see if aliasing
> >> is allowed and if so, turn this feature on?
> > 
> > This never was the case, and never will. VM establishes mappings as
> > requested by the userspace and kernel, and n-times mapping for the same
> > page is always legitimate. It is the pmap duty to handle that.
> 
> User space N way mapping is rare. Can you show an actual example of when this is commonly done? Or being done in init(8)?
> 

User space N-way mapping is very common, e.g. with shared libraries. Writable n-way mappings are used ubiquitously: by X apps in the form of SysV shared memory segments, by database systems (in particular PostgreSQL), by the Qt C++ framework, and so on. If you use vi on some file, nvi mmaps the content. If any other application reads the file while it is open in vi, the same pages are mapped both by userspace and in KVA, with all the consistency problems that implies.
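A minimal userspace sketch of such a writable n-way mapping, assuming a POSIX system (the function name `double_map_roundtrip` is hypothetical, not anything from FreeBSD): one page of a temporary file is mapped twice with `MAP_SHARED`, a byte is stored through the first mapping and loaded back through the second. Seeing the stored value through the other virtual address is the guarantee the pmap has to provide, even on a CPU whose caches can alias.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the same file page at two virtual addresses,
 * store through one mapping and load through the other.
 * Returns the byte observed through the second mapping, or -1 on error.
 */
static int
double_map_roundtrip(char value)
{
	FILE *fp;
	long pgsz;
	char *a, *b;
	int fd, result = -1;

	pgsz = sysconf(_SC_PAGESIZE);
	if (pgsz <= 0 || (fp = tmpfile()) == NULL)
		return (-1);
	fd = fileno(fp);
	/* Give the file one page of backing store. */
	if (ftruncate(fd, (off_t)pgsz) != 0)
		goto out;

	/* Two independent shared mappings of the same physical page. */
	a = mmap(NULL, (size_t)pgsz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	b = mmap(NULL, (size_t)pgsz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (a != MAP_FAILED && b != MAP_FAILED) {
		a[0] = value;			/* write through mapping A */
		result = (unsigned char)b[0];	/* read through mapping B */
	}
	if (a != MAP_FAILED)
		munmap(a, (size_t)pgsz);
	if (b != MAP_FAILED)
		munmap(b, (size_t)pgsz);
out:
	fclose(fp);
	return (result);
}
```

For example, `double_map_roundtrip('K')` is expected to return `'K'`; any other value would be exactly the kind of incoherence discussed in this thread.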
It is rather pointless to argue about the necessity and omnipresence of this feature. Even without the new UFS pager, the MIPS port apparently corrupts user data at random.

> > If you cared to read my previous mail, I already explained that besides
> > the userspace asking for n-mapping, there is at least buffer cache which
> > also maps the same page into KVA, from the times when unified buffer
> > cache/page cache was introduced. Same is true for n-mapping of shared
> > anon pages.
> 
> KVA mapping is different, since that mapping goes away. And generally,
> the KVA mappings aren't active while user land mappings are active.
> Here 'active' means can change the cache. When the kernel is
> executing on these single core 32-bit mips boxes, user space isn't
> creating traffic to the pages. Also, KVA mappings tend to be for pages
> that we don't actually touch in the kernel once the I/O is complete.
> This is true for data pages in files. It may not be true for in-kernel
> meta-data pages, and that might have changed with your commit (I
> haven't looked in enough detail to know).
> 

This is not how 'mappings' work, at least in FreeBSD. KVA mappings are not switched away when the CPU enters usermode, and the lines of VI caches indexed by KVA are not flushed on the switch to usermode. So if the page is double-mapped and the colors of the two mappings differ, it does not matter whether it is a kernel or a user VA that creates the problem.

> But that still begs the question: why is the fault address 0 in
> the original crash report. If we're doing multiple mappings, and
> that's creating a cache coherency problem because there's two
> copies of dirty cache data, how does that end up with us doing a NULL
> dereference? That's the part that I'm having trouble understanding
> and I worry that arguing over the exact parameters of the VM mapping
> issues might be distracting us from a credible theory, with a specific
> sequence of events that gets us here.
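To make the color condition above concrete, here is a small sketch, under an assumed cache geometry, of how a page color can be derived from a virtual address in a virtually indexed cache whose way is larger than a page. The helper names and parameters are hypothetical; for instance, a 32 KiB 4-way data cache would have an 8 KiB way, which gives two colors with 4 KiB pages. Two virtual mappings of the same physical page can hold separate, inconsistent cached copies only when their colors differ.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: color of a virtual address in a VIPT cache.
 * way_size and page_size are assumed inputs, not values read from any
 * particular MIPS core.
 */
static unsigned
page_color(uintptr_t va, unsigned way_size, unsigned page_size)
{
	unsigned ncolors;

	/* If one way fits in a page, the index comes from the page offset only. */
	if (way_size <= page_size)
		return (0);
	ncolors = way_size / page_size;
	return ((unsigned)((va / page_size) % ncolors));
}

/* Two mappings of one physical page alias in the cache iff their colors differ. */
static bool
mappings_alias(uintptr_t va1, uintptr_t va2, unsigned way_size,
    unsigned page_size)
{
	return (page_color(va1, way_size, page_size) !=
	    page_color(va2, way_size, page_size));
}
```

With an 8 KiB way and 4 KiB pages, `page_color(0x1000, 8192, 4096)` is 1 and `page_color(0x2000, 8192, 4096)` is 0, so those two addresses alias, while 0x1000 and 0x3000 share a color and do not.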
> 

I can't answer exactly why 0 appears there, but a plausible explanation is that the kernel faults in the pages through a KVA mapping whose color differs from that of the user mapping. Consider what happens: the bootstrap mapped init's .text, but the pages are not resident. On the first touch, a TLB refill exception is reported, and since the MIPS pmap's software page tables are not filled in, vm_fault() is called, which eventually calls VOP_GETPAGES(). There, the UFS file buffer is read with bread(9). During bread(9), on MIPS, the buffer is mapped into KVA and the actual I/O occurs, which validates the pages. So the pages constituting the buffer are mapped into KVA on one hand, and into UVA on the other, possibly with a different color. bread(9) operated on the KVA, which validated the VIPT cache lines for the KVA color. The cache lines for the UVA color are not filled, and their content is probably zeroes. After the return from vm_fault() -> TLB refill, the unfilled UVA line is executed. Most likely that line is zeroed, and either some function pointer is loaded as zero, or the MIPS CPU executes something which causes a jump to 0.

An additional problem for the MIPS pmap is that the VFS buffer maps pages as non-managed and non-executable, which does not allow the pmap to even try the dcache/icache consistency trick, but this is secondary.

BTW, when I developed the PCID feature for amd64, which is similar to the ASID feature of the MIPS TLB, inconsistent cache invalidations also often resulted in attempts to execute at address 0 or to access address 0, I believe for similar reasons.

> It looks like we use the direct map in kernel for addresses < 256MB.
> KSEG0 covers a lot of sins.

KSEG0 cannot be used for buffer mappings, since buffers require virtually contiguous mappings of several physically non-contiguous pages. A typical modern UFS buffer is 32k, meaning b_data spans 8 pages. Also, KSEG0 access is cached, with all the consequences of the page's physical address and its virtual mapping having different colors.

> 
> NetBSD has code to cope with congruent mappings that cause problems.
> These look to be only on Loongson and MIPS3 processors with mips32 and
> newer not affected by it. MIPS3 has VCE exceptions for Instructions
> and Data and NetBSD has code there to cope. It also maps the pages
> uncached until the multiple mappings go away. The other code it has
> to cope is effectively disabled when not on these processors. It is
> a twisty maze of #ifdefs, though, so maybe I'm missing something.
> It does disable the socket page loaning code as well when there's
> issues.
> 
> So, we're back to the basic question: why does this change cause the
> observed behavior, exactly?

I explained above how the new UFS pager works. The old BMAP pager used transient mappings for the duration of the I/O, established by pmap_qenter() and dissolved by pmap_qremove() before VOP_GETPAGES() returns to vm_fault(). From my reading of the code, pmap_qremove() always flushed the dcache.

> >> Adding a pmap capability and turning it on for say,
> >> i386/amd64/arm64 would allow for this new feature as well as the
> >> previous behaviour on older platforms.
> >> I don't think I have the time to fix mips pmap to support this new
> >> feature, so if you want to turn it on for all features, we should
> >> really fix/test pmap on said platforms first.
> > This is not a new feature, this is the old bug on that platform.
> 
> This sort of argument is pointless: the code does something new, and
> the current code doesn't support it. It's that simple. Trying
> to label new bug / old bug, etc is pointless. It's a bug, and
> it's gotta get fixed.

Well, no, this is not the essence of the surrounding noise.

> It isn't clear to me it's even understood
> why things are happening, so even the old / new finger pointing is
> premature given we don't have a definitive cause, just a theory.

I have the theory, and I asked for a knob to be turned which forces the UFS buffer to be dissolved at the end of the I/O in VOP_GETPAGES(), in a manner similar to the BMAP pager.
The result of this experiment will be telling, IMO. I can neither stop the discussion until the experiment is done, nor conduct the experiment myself.

> 
> Warner
> 
> >> Final comment: I'd really like to see a sort of "tested
> >> on" for things like this, because it's not clear which
> >> platforms/architectures it was tested on.
> 
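The theory in this message can be illustrated with a toy model. This is an assumption-laden sketch, not FreeBSD or MIPS code: a single physical word, and a write-back, non-coherent, virtually indexed cache with one line per color, where color 0 stands for the buffer's KVA and color 1 for the user VA. It assumes, as the message speculates, that the user-colored line already holds stale zeroes. Without a flush, the user access hits the stale line and sees 0; with a full write-back and invalidate before returning (which is what the message says pmap_qremove() did for the old BMAP pager), the user access misses and reads the new data. All names (`toy_fault` and its helpers) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NCOLORS	2	/* hypothetical: two page colors */

/* One cache line per color that can hold the same physical word. */
struct toy_line {
	bool valid;
	bool dirty;
	uint32_t data;
};

struct toy_sys {
	uint32_t mem;				/* the physical word */
	struct toy_line line[TOY_NCOLORS];	/* VIPT, write-back, no coherency */
};

static uint32_t
toy_load(struct toy_sys *s, unsigned color)
{
	struct toy_line *l = &s->line[color];

	if (!l->valid) {	/* miss: fill the line from memory */
		l->data = s->mem;
		l->valid = true;
		l->dirty = false;
	}
	return (l->data);
}

static void
toy_store(struct toy_sys *s, unsigned color, uint32_t v)
{
	struct toy_line *l = &s->line[color];

	l->data = v;		/* write-back: memory is not updated yet */
	l->valid = true;
	l->dirty = true;
}

/* Write back dirty lines, then invalidate every color. */
static void
toy_flush(struct toy_sys *s)
{
	unsigned c;

	for (c = 0; c < TOY_NCOLORS; c++) {
		if (s->line[c].valid && s->line[c].dirty)
			s->mem = s->line[c].data;
		s->line[c].valid = false;
		s->line[c].dirty = false;
	}
}

/* Returns what the user mapping observes after the simulated fault. */
static uint32_t
toy_fault(bool flush_before_return, uint32_t payload)
{
	struct toy_sys s = { 0 };

	s.line[1].valid = true;		/* assumed stale, clean, zeroed UVA line */
	toy_store(&s, 0, payload);	/* buffer contents valid via the KVA color */
	if (flush_before_return)
		toy_flush(&s);		/* like the transient-mapping teardown */
	return (toy_load(&s, 1));	/* user access through the UVA color */
}
```

In this model `toy_fault(false, 0x1234)` returns 0, and `toy_fault(true, 0x1234)` returns 0x1234, which mirrors the knob experiment proposed above.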