Date: Tue, 4 Aug 1998 23:18:25 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: wollman@khavrinen.lcs.mit.edu (Garrett Wollman)
Cc: freebsd-fs@FreeBSD.ORG, core@FreeBSD.ORG
Subject: Re: Exclusive locking for directory lookups?
Message-ID: <199808042318.QAA11288@usr07.primenet.com>
In-Reply-To: <199808041758.NAA03021@khavrinen.lcs.mit.edu> from "Garrett Wollman" at Aug 4, 98 01:58:33 pm

> Does anybody remember why plain-jane directory lookups (i.e., not
> deleting or creating anything) require an exclusive lock on all the
> directory vnodes along the path?  It would seem to be that only shared
> locks should be necessary in those cases...

Because there is no flag from namei() to indicate whether the terminal
path component is going to be modified or not.

This is more a symptom of the way namei() is implemented; it can't
support inheritance of such a flag (or inheritance of a POSIX namespace
escape ("//<escape>/<rest-of-path>")), for the same reason.

The locking is a tail-chase down the tree -- that is, the lock is
one-behind (parent and child) and is not held to the root.

The real question is whether the race this is protecting against
still exists with a unified VM and buffer cache... in which case you
could reexamine the need for it (for this to work, you would need my
patch to have namei() return the EEXIST itself, instead of duplicating
the code every place you wanted a lookup to fail if the target exists,
i.e., the create/rename target cases, etc. -- you need to distinguish
internal vs. external error return for this case).

My gut feeling is that it is still necessary, even though the race it
was intended to protect against was a VM and buffer cache coherency
problem with multiple accesses in the "write entry" case, mostly
because of the late buffer mapping.  I could be wrong here, though.

In any case, that would mean you would need to add a flag to indicate
a terminal component lookup to VOP_LOOKUP.  This is somewhat
problematic, because an underlying FS is permitted to eat as many
components as it wants to, according to the design.

To get around this, the fact that it is the terminal component would
have to be indicated by the nonexistence of a "next component".

To implement this approach would require pre-parsing the path into
components.
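
As a rough illustration of what such pre-parsing could look like, here
is a minimal userland sketch.  It is not code from the FreeBSD tree:
struct pathbuf, pathbuf_init(), pathbuf_scan(), PATHBUF_FIRST() and
PATHBUF_NEXT() are all hypothetical names invented for this example,
and the real struct componentname is not touched.  The sketch assumes
'/' is the only separator, replaces each one with NUL in place, and
records the total length so that the end of the buffer can still be
found afterwards.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pre-parsed path buffer (illustration only). */
struct pathbuf {
	char	*pb_buf;	/* path with every '/' replaced by NUL */
	size_t	 pb_total;	/* length of the original path string */
};

/* Replace each '/' with NUL in place and remember the total length. */
static void
pathbuf_init(struct pathbuf *pb, char *path)
{
	size_t i;

	pb->pb_buf = path;
	pb->pb_total = strlen(path);
	for (i = 0; i < pb->pb_total; i++)
		if (path[i] == '/')
			path[i] = '\0';
}

/*
 * Return the first non-empty component at or after p, or NULL if
 * nothing but separators remains before the end of the buffer.
 */
static char *
pathbuf_scan(const struct pathbuf *pb, char *p)
{
	char *end = pb->pb_buf + pb->pb_total;

	while (p < end && *p == '\0')
		p++;
	return (p < end ? p : NULL);
}

/* Accessors: a NULL from PATHBUF_NEXT() means "cur" was terminal. */
#define	PATHBUF_FIRST(pb)	pathbuf_scan((pb), (pb)->pb_buf)
#define	PATHBUF_NEXT(pb, cur)	pathbuf_scan((pb), (cur) + strlen(cur))
```

With the path "/usr//lib/", PATHBUF_FIRST() yields "usr",
PATHBUF_NEXT() on that yields "lib", and PATHBUF_NEXT() on "lib"
yields NULL, which is the "no next component" signal a lookup routine
could test for.  Empty components from doubled or trailing separators
are skipped; whether that is the right policy for every namespace is
an open assumption of the sketch.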

The easiest way to do this would be to keep a separate "total length"
in the path component buffer, and replace the path separators with NUL,
treating it as a pre-strtok'ed string.

If you go this route, consider providing access macros as well, making
the underlying FS advance cn_nameptr (if it consumes extra components),
and in general making the structure opaque enough that we could support
multiple namespaces (the current VFAT short name binding and assumption
of the ISO 8859-1 character set instead of Unicode is broken).

The underlying VOP_LOOKUP for the "writing an entry" case would use
the accessor macro to ask for the start of the next component; if it
got a NULL back, it would know it was terminal, and that it needed to
lock (handling the EEXIST case in namei() lets you avoid this lock,
if the lookup would succeed).


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message