Date: Thu, 20 Nov 2014 11:25:29 -0500
From: John Baldwin <jhb@freebsd.org>
To: freebsd-hackers@freebsd.org
Cc: Konstantin Belousov <kostikbel@gmail.com>, Mike Gelfand <Mike.Gelfand@logicnow.com>, "hackers@freebsd.org" <hackers@freebsd.org>
Subject: Re: [BUG] Getting path to program binary sometimes fails
Message-ID: <201411201125.30087.jhb@freebsd.org>
In-Reply-To: <B655709E-0D6F-4DE1-A746-9A20B897BEA8@logicnow.com>
References: <91809230-5E81-4A6E-BFD6-BE8815A06BB2@logicnow.com> <20141113170758.GY17068@kib.kiev.ua> <B655709E-0D6F-4DE1-A746-9A20B897BEA8@logicnow.com>
On Friday, November 14, 2014 4:54:18 am Mike Gelfand wrote:
> On Nov 13, 2014, at 8:07 PM, Konstantin Belousov <kostikbel@gmail.com> wrote:
>
> > This is not a defect. The vnode->path translation uses namecache, which
> > could be purged at any time. The behaviour is typical for most unix
> > implementations. Linux and new Solaris have 'rigid' namecache, where
> > name entry lifetime is the same as the vnode lifetime it is attached to.
> > I am not aware of any useful consequences of such design, except
> > vn_fullpath() working more reliable, but at the cost of increased
> > memory usage.
>
> The man page for sysctl(3) states that “Unless explicitly noted below,
> sysctl() returns a consistent snapshot of the data requested” (surely we
> don’t expect half the path being returned; I’m just trying to read
> thoroughly). Later on there are no special notes on {CTL_KERN, KERN_PROC,
> KERN_PROC_PATHNAME}; at least no notes on the unstable behavior being
> observed, and no funny details of internal implementation you describe.
> ERRORS section only describes ENOENT condition as “The name array
> specifies a value that is unknown,” which certainly is not the case here.

Note that sysctl(3) is describing a generic interface that mostly returns
integers. The language is trying to state that when you read the values
you get a consistent snapshot of whatever logical values a node provides
(e.g. for a 64-bit int on a 32-bit system it will try to return a
consistent value rather than one which mixes 32-bit halves from different
values of the associated variable, or things like the kern.cp_times sysctl
(for the cp_times[] array) will return a consistent snapshot of the entire
array of ints). It is not saying that a node is not permitted to say "I
have no valid data at this time." If anything, I think that a node is
obligated to return that instead of partial data (as you somewhat noted).

> Since you’re saying that current behavior is not a defect, maybe
> documentation is wrong (incomplete, misleading) then? I will readily
> accept the “not a defect” explanation, but only if one wouldn’t have to
> ask you every time this oddity is met. If this is the expected error
> condition, what should I do to get the path reliably? Should I retry (and
> how many times)? You’re saying cache is being purged; does it mean that
> when I ask for path then cache is populated again? Does it guarantee then
> that I’ll be able to get the path on next call? Could you guarantee that
> I’ll be able to get the path at all if I fail two or more times? Should I
> rely on ENOENT specifically when retrying?

Is this over NFS? NFS is more aggressive than local filesystems in purging
name cache entries because there are inherent races in NFS with certain
fileservers (ones that don't use sub-second timestamps), so by default
entries always expire after about a minute. You can change that via the
'nametimeo' mount option (takes a count in seconds).

-- 
John Baldwin
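
For readers who hit the same ENOENT, here is a minimal caller-side sketch of treating it as "no valid data at this time" instead of a hard failure. It is not something proposed in the thread: the function names, the `path_source` enum, and the idea of falling back to a caller-supplied hint (for example a path derived from argv[0]) are all assumptions made for illustration. Nothing in the thread says that retrying the sysctl repopulates the name cache, so the sketch does not retry.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical result type: where the returned path came from. */
enum path_source { PATH_FROM_KERNEL, PATH_FROM_FALLBACK, PATH_UNAVAILABLE };

/*
 * Ask the kernel for the current process's binary path.
 * Returns 0 on success, -1 with errno set on failure.
 */
int
query_proc_pathname(char *buf, size_t len)
{
#ifdef __FreeBSD__
    /* A pid of -1 selects the calling process. */
    int mib[4] = { CTL_KERN, KERN_PROC, KERN_PROC_PATHNAME, -1 };

    return (sysctl(mib, 4, buf, &len, NULL, 0));
#else
    /* Stub so the sketch still compiles on other systems. */
    (void)buf;
    (void)len;
    errno = ENOSYS;
    return (-1);
#endif
}

/*
 * Decide what to do after a failed lookup. Only ENOENT is treated as
 * "no valid data right now"; in that case the caller's hint is copied
 * into buf if it fits. Any other error leaves buf empty.
 */
enum path_source
handle_lookup_failure(int err, char *buf, size_t len, const char *fallback)
{
    assert(buf != NULL && len > 0);

    if (err == ENOENT && fallback != NULL && strlen(fallback) < len) {
        memcpy(buf, fallback, strlen(fallback) + 1);
        return (PATH_FROM_FALLBACK);
    }
    buf[0] = '\0';
    return (PATH_UNAVAILABLE);
}

/* Try the kernel first, then fall back to the caller's hint on ENOENT. */
enum path_source
resolve_self_path(char *buf, size_t len, const char *fallback)
{
    assert(buf != NULL && len > 0);

    if (query_proc_pathname(buf, len) == 0)
        return (PATH_FROM_KERNEL);
    return (handle_lookup_failure(errno, buf, len, fallback));
}
```

A caller would typically pass a PATH_MAX-sized buffer, for example `resolve_self_path(buf, sizeof(buf), argv[0])`, and check the return value to learn whether the path came from the kernel. argv[0] may be relative or not a path at all, so a fallback result is only a hint.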