Date: Fri, 20 Jun 2003 18:56:32 -0400
From: John <jwd@bsdwins.com>
To: Terry Lambert <tlambert2@mindspring.com>
Cc: uitm@blackflag.ru
Subject: Re: open() and ESTALE error
Message-ID: <20030620225632.GA29485@BSDWins.Com>
In-Reply-To: <3EF2CF89.3E5542F5@mindspring.com>
References: <200306200617.h5K6HaM7058935@gw.catspoiler.org> <3EF2CF89.3E5542F5@mindspring.com>

----- Terry Lambert's Original Message -----
> Specifically, see the underline part of:
>
> > > + if (error == ESTALE && stale++ == 0)
>                           ---------------
>
> ...he exits it after retrying it fails, and falls into the
> standard ESTALE return case.
>
> If this gets committed (which I think it shouldn't because I
> can see a genuinely bad handle getting converted to a good one
> in a couple of cases), that line should probably be rewritten
> to be more obvious (e.g. move the "stale++" before the "if"
> statement and adjust the compare to compensate for the difference
> so no one else reads it the way we did).

hi folks,

After looking at his original patch, I suggested modifying it
for clarity to be of the form:

	error = vn_open(&nd, flags, cmode);
	if (error == ESTALE)
		error = vn_open(&nd, flags, cmode); /* single retry */

While I understand a number of you have reservations about this
change, I think it is worth serious consideration. Unless someone is
willing to go into each of the individual fs layers and deal with
ESTALE, this appears to be a relatively straightforward and
easy-to-understand approach.

Most of the main applications I run on clusters have had their
open routines recoded similar to the following (this from ftpd):

	int try = 0;

	while ((fin = fopen(name,"r")) == NULL && errno == ESTALE && try++ < 3) {
		if (logging > 1)
			syslog(LOG_INFO,"fopen(\"%s\"): %m: attempting retry",name);
	}
	if (fin == NULL && logging > 1)
		syslog(LOG_INFO,"get fopen(\"%s\"): %m",name);

This is a real problem when using FreeBSD in high-load / high-throughput
situations where highly sequenced operations are performed on a
common set of data files from multiple machines. An example of this
environment can be seen here:

http://www.freebsd.org/~jwd/images/cluster.jpg

If no one has any patches which can provide a better solution for
handling ESTALE, I would like to see Andrey's patch given a chance.

Of course, if we don't want to do this, then I think it is high time we
documented that open(2) can return ESTALE and provided a library routine
that wraps open() with a retry :-)

-John
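
For illustration only, the library routine suggested above might look
something like the sketch below. The name open_retry_estale() and its
max_retries parameter are hypothetical; no such function exists in libc,
and this is not the patch under discussion. It simply retries open(2) a
bounded number of times while the failure is ESTALE, and assumes the
platform defines ESTALE in <errno.h>.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical helper (not part of any libc): call open(2) and, if it
 * fails with ESTALE, try again up to max_retries more times.
 * Any other error, or success, is returned immediately.
 * The mode argument is only used by open(2) when O_CREAT is set.
 */
int
open_retry_estale(const char *path, int flags, mode_t mode, int max_retries)
{
	int fd;
	int tries = 0;

	do {
		fd = open(path, flags, mode);
	} while (fd == -1 && errno == ESTALE && tries++ < max_retries);

	return (fd);	/* -1 with errno set on failure, as with open(2) */
}
```

A caller would use it in place of open(), e.g.
open_retry_estale(name, O_RDONLY, 0, 3). Whether a fresh lookup can
actually recover from a stale handle depends on the file system and on
what happened to the file on the server, so the retry count here is a
guess rather than a recommendation.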