Date: Thu, 19 Dec 1996 14:53:00 -0700 (MST)
From: Terry Lambert <terry@lambert.org>
To: rminnich@Sarnoff.COM (Ron G. Minnich)
Cc: terry@lambert.org, freebsd-hackers@freebsd.org
Subject: Re: rpc.lockd in nfs in freebsd vs. sun nfs locking
Message-ID: <199612192153.OAA12245@phaeton.artisoft.com>
In-Reply-To: <Pine.SUN.3.91.961219151510.18130D-100000@terra> from "Ron G. Minnich" at Dec 19, 96 03:30:02 pm

> > 2) The Sun locking works correctly if you obey order of operation
> > protocols in your client code. If it doesn't work for someone,
> > it's pilot error, not an unclosable hole in the code.
>
> Hmm, there still seem to be problems as late as Solaris 2.5. All I can say
> is friends of mine are still trying to use the locking stuff and still
> having problems, and I can't see any obvious things wrong in what they're
> doing.

Have your friends applied the undocumented, debugger-based kernel patch
that turns off async responses on NFS writes, slowing their writes down by
as much as a factor of 3? If not, then they are not obeying the order of
operation protocols.

Sun followed SVR4 in their NFS server code by defaulting async writes on
to get better benchmarks. There is a Sun release note to this effect, but
it is hidden in a problem report response (ie: "undocumented").

> As late as Solaris 2.4, we were still seeing an occasional 'lock storm',
> where RPC lock traffic would eat the wire on one particular error
> condition that would occur when a client rebooted. This was elicited by
> sendmail locking in /var/spool/mail. We just had a lockup the other day on
> a 2.5 machine; we're not sure why, but the process that was hung was ...
> sendmail. The guy who rebooted the machine didn't get me a core dump,
> though.

Certainly, when rpc.statd notes the server's death and the server comes
back up, all clients will relock everything they had open (that's why NFS
locking is stateful). When a client dies, the client doesn't have lock
state and therefore cannot reestablish locks, so that can't be the cause
of your "lock storm" problems.

> > 3) The FreeBSD implementation of rpc.lockd (for the NFS server)
> > always returns "success" to the NFS client instead of making
> > local fcntl() calls to assert the locks on the local system on
> > the client's behalf.
>
> Yes, BUT: is it locking or not? If so, that's great.
> If not, then it's hard to see how this helps an application writer --
> and that's the real end goal.

No, it's not locking. That was Jordan's class project: to integrate my
patches and maintain a flattened lock graph in the rpc.lockd code (ie: he
intended to un-stub the code and collapse multiple open file references to
a single lockd descriptor).

> > 4) Patches to do this were submitted, but never integrated. They
> > remain available in the -current list archive for anyone who
> > is interested in integrating them.
>
> So: locking does not work.

Works on my box... but then again, so do union FS and Unicode namespaces;
that doesn't do anyone else a lot of good if they won't take the code.

> Anyway, it seems you know what to do in order to fix it, but it's still
> not really there, right? It would be nice to see this go in -- people keep
> asking for it.

Talk to Jordan.

> Anyways, thanks for the note. It sounds like FreeBSD is close but not
> quite there ... maybe the changes will make it in next time.

Yes; it is very close. I once estimated that it would take less than 20
hours of actual work. No one is doing it, and no one is letting the
supporting patches into the kernel code.

Since I don't need the locking myself, I just decided it's a core team
problem, and they can resolve it or not... I'm not going to get an ulcer
over trying to get my code committed.

Regards,
Terry Lambert
terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
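Point 3 contrasts a stubbed rpc.lockd with one that actually asserts locks on the server. That contrast can be sketched in C. This is a minimal sketch, not the patches in the -current archive. It assumes lockd already holds an open descriptor for the exported file, and the names `lockd_set_lock` and `enum lk_result` are hypothetical. The function turns a client's byte-range request into a non-blocking `fcntl(F_SETLK)` call and reports the real outcome instead of unconditional success.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; the real NLM protocol
 * defines its own status values, which are not reproduced here. */
enum lk_result { LK_GRANTED, LK_DENIED, LK_ERROR };

/*
 * Assert a byte-range lock on the server's local file on behalf of an
 * NFS client, instead of unconditionally reporting success.
 *   fd        descriptor lockd holds open on the exported file (assumed)
 *   start     first byte of the requested range
 *   len       length of the range; 0 means "through end of file"
 *   exclusive nonzero for a write lock, zero for a read lock
 */
enum lk_result lockd_set_lock(int fd, off_t start, off_t len, int exclusive)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);      /* clear any platform-specific fields */
    fl.l_type = exclusive ? F_WRLCK : F_RDLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;

    /* F_SETLK never blocks: if another process holds a conflicting
     * lock, the call fails with EACCES or EAGAIN. */
    if (fcntl(fd, F_SETLK, &fl) == 0)
        return LK_GRANTED;
    if (errno == EACCES || errno == EAGAIN)
        return LK_DENIED;
    return LK_ERROR;                /* e.g. EBADF or EINVAL */
}
```

The sketch uses F_SETLK rather than F_SETLKW so that a single-threaded daemon does not stall on a contended range. How blocked requests would be queued and retried is left out.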
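POSIX fcntl() locks belong to a process, so locks that one lockd process asserts for different clients never conflict with each other at the kernel level. Closing any one descriptor for a file also releases all of that process's locks on it. A daemon acting for many clients therefore has to track per-client ownership itself. That is one plausible reading of the "flattened lock graph" and single lockd descriptor mentioned above, though it is not necessarily Jordan's design.

The fragment below is a minimal sketch of that idea: a flat table of locks granted on one file, checked for conflicts between different clients before anything is asserted locally. `struct lk_entry`, `lk_conflicts`, and the integer owner IDs are hypothetical.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical record of one lock granted to one client on one file. */
struct lk_entry {
    int   owner;      /* hypothetical client/owner identifier */
    off_t start;      /* first byte of the locked range */
    off_t len;        /* range length; 0 means "through end of file" */
    int   exclusive;  /* 1 = write lock, 0 = read lock */
};

/* Ranges [s1, s1+l1) and [s2, s2+l2) overlap unless one ends before
 * the other begins; a length of 0 never ends. */
static int ranges_overlap(off_t s1, off_t l1, off_t s2, off_t l2)
{
    int first_ends_before = (l1 != 0) && (s1 + l1 <= s2);
    int second_ends_before = (l2 != 0) && (s2 + l2 <= s1);
    return !first_ends_before && !second_ends_before;
}

/* Return 1 if req conflicts with a lock held by a different owner. */
int lk_conflicts(const struct lk_entry *tab, size_t n,
                 const struct lk_entry *req)
{
    for (size_t i = 0; i < n; i++) {
        if (tab[i].owner == req->owner)
            continue;               /* an owner never conflicts with itself */
        if (!tab[i].exclusive && !req->exclusive)
            continue;               /* two read locks are compatible */
        if (ranges_overlap(tab[i].start, tab[i].len, req->start, req->len))
            return 1;
    }
    return 0;
}
```

Only when `lk_conflicts` returns 0 would such a daemon record the new entry and assert the lock through fcntl() on its single descriptor for the file.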