Date: Sun, 16 Aug 1998 14:14:05 +0000 (GMT) From: Terry Lambert <tlambert@primenet.com> To: dg@root.com Cc: tlambert@primenet.com, current@FreeBSD.ORG, karl@mcs.net Subject: Re: Better VM patches (was Tentative fix for VM bug) Message-ID: <199808161414.HAA16245@usr08.primenet.com> In-Reply-To: <199808161315.GAA00184@implode.root.com> from "David Greenman" at Aug 16, 98 06:15:30 am
next in thread | previous in thread | raw e-mail | index | archive | help
> >This may be the source of the: > > > > * XXX: sometimes we get pages which aren't wired down or on any queue - > > * we need to put them on the inactive queue also, otherwise we lose > > * track of them. Paul Mackerras (paulus@cs.anu.edu.au) 9-Jan-93. > > > >comment in vm_page.c, and it may be the source of the zero page > >phenomenon. > > That comment simply means what it says and must be taken in the context > of the function where it appears (vm_page_deactivate). The system sometimes > needs to put a page on a queue - any queue, and vm_page_deactivate needs to > be able to handle the case of not-on-any-queue. Really, the comment is > obsolete and dates back to a time when the VM system didn't manage the page > queues properly. Then it should be replaced with a DIAGNOSTIC panic. The issue that's being addressed here is the "XXX", more than anything else. I really think there are three cases in the code where pages can be orphaned; I think the object code only deals with one of these; the other two can be implemented with late-binding reaps without a problem (ie: the code is not technically correct, except in reduction). > >Another problem in the mmap code is that when the vnode_pager object > >is created, it is set to be the number of pages times the page size > >in length; but also in vnode_pager, the setsize on the object sets > >the length to the actual file length. > > > >This discrepancy opens a window in which some changes may be lost. > > My reading of the code shows that the object size is set to the rounded-up > index. vnp_size is set to the unrounded new size, but that's expected and > I don't see any bugs related to doing that; rounding it will likely break the > proper reading of partial pages. Am I missing something? vnp_size is mostly > private to the vnode_pager. The rounded-up object->size is used outside of > the vnode pager. OK, consider the boundry cases: 1) The page is less than the boundary, but is extended to less than the boundary. This is seen as a truncation by setsize. 2) The page is less than the boundary, but is extended to the boundary. This is seen as a NOP by the first compare in the setsize code. 3) The mapped page is written (an implicit extend without a write fault -- this is, I believe, Karl Denniger's case when the code provided failed -- between the mapping and the actual size). There are actually two more failure modes, but they are hellaciously complex to describe. 8-(. > >John also suggested another change for detecting a case where page > >orphans can be created. in vm_page.c, it's possible that an object > >entry in the page insert function will overwrite an existing entry; > >I added a DIAGNOSTIC panic to catch this when it happens. > > You mean that there can be multiple pages at the same offset? It would > be bad if that happend, but I'm skeptical that it actually does. So was I, when John suggested it. John's suggested change would detect an orphaning event. Since I believe that what is happening is actually an alias event (for the bug that is at the top of my list), the DIAGNOSTIC change is a NOP, as far as I'm concerned. But it's a nice thing to have to detect the XXX comment case, which, in theory, should never occur (ie: a page should be on a queue, and this recovery mechanism was a kludge which covers the real problem, which is orphaned pages). > >+ /* > >+ * XXX we are in a race here. If the file is extended > >+ * between the time we get the object and the time > >+ * we set the size, then we lose... > >+ */ > >+ if (!(flags & MAP_ANON) && vp->v_type != VCHR) > >+ object->un_pager.vnp.vnp_size = vat.va_size; > > Hmmm. This is, in fact, the most interesting change, as far as Karl Denniger is concerned. I would like to close the race window here (by adding an "actual_length" parameter to the various "allocate" routines) to see if this resolves Karl's remaining bugs (per the SPLVM bug noted in the comment here)... I suspect that Karl has two bugs; the one I hit, and a sepoerate one, and the remaining window accounts for a tiny fraction of the misbehaviour that he is still experiencing. The real way out of this mes is to reference cont everything and to note 1->2 transitions for page references. This takes a significant amount of code and structure changes to accomplish; it would be nice if the problem were a bit easier to duplicate, but it's not. I have to make changes that narrow down the problem instead of resolve it; A/B tests. 8-(. > >+ /* > >+ * The object is allocated to a page boundary. This is > >+ * incorrect if page is only partially backed by the vp; > >+ * we must fix this up in the caller. > >+ */ > > object->un_pager.vnp.vnp_size = (vm_ooffset_t) size * PAGE_SIZE; > > I think the above change is wrong. It's only a comment. The actual change is: if (!(flags & MAP_ANON) && vp->v_type != VCHR) object->un_pager.vnp.vnp_size = vat.va_size; But me too, since it allows a race window. A correct fix would get rid of the window by passing an additional parameter to those alloc routines that need to care about actual backing length (ie: vnode_pager). It was more of a test case than anything else... if the problem decreased, it points to the race window that was partially closed. If it didn't, then the operation was a boundary condition, which, while nice to address, did nothing to address reported bugs. For non-page-aligned pagers, this would be important; for all others, I expect that round_page() would be used to assure page alignment. This is very tricky; I nearly posted code to page align setsize instead, but that would have been a mistake, since the real issue is the bzero that occurs in the setsize code. The point is to set the size in such a way that it is not misinterpreted as a truncate between the set event and the truncation event handler. This is the remaining race. A very real problem here is that the code is hideously poorly commented, which makes fixes problematic. I haven't investigated what it would take to put asserts of locking state in there; from a naieve examination of the code, there would need to be a parent map pointer in vm_map_t that was used to see if the parent was locked (this may exist; I haven't looked for it). The PITA is that there is an implicit rather than an explicit hierarchical lock relationship. 8-(. At a minimum, it would be nice if each function asserted its assumptions with regard to locks based on its parameters... at least in the DIAGNOSTIC case, so that runtime checks could be enabled. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-current" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199808161414.HAA16245>