Date: Wed, 22 Sep 2004 09:56:58 -0400
From: Robert Blayzor <rblayzor@inoc.net>
To: freebsd-stable@freebsd.org
Subject: Problem with fclose() returning error (EBADF)
Message-ID: <415184AA.90604@inoc.net>
I have a multithreaded application running on FreeBSD 4.9, 4.10 and -STABLE that I'm having an issue with. The application writes large numbers of small files over an NFS mount, and at random we're seeing fclose() return a failure code of -1 (EOF) with errno set to EBADF. We have no idea what may be causing the problem. The NFS server appears to be functioning fine, with no errors at all, and it runs perfectly with tons of other clients.

At first we thought that maybe the fd was getting munged somehow, but here is the weird part. If the code is changed to do an fflush() on the stream immediately before we issue the fclose(), fflush() NEVER returns an error and always completes successfully. However, completely randomly, fclose() will still return an error condition and an errno of EBADF. There are hundreds of gigs of space and plenty of inodes available on the NFS server, and writes work fine from all other NFS clients at the time. (This is a six-server mail cluster.)

We've double-checked the compile flags and I've gone through all the libc calls I can think of. I've also linked my own debugging into the libc_r close function, and it's not showing 'any' closes occurring between the fopen() and the fclose() that fails. We've also checked the flags of the FILE *f structure; they are still correct, so it has not been munged by anything.

There are lots of conditions under which EBADF is returned by the kernel, and I suspect one of them is not really a sign of a bad file handle but means something else. I don't know of any way to find out what is really occurring, or whether it is serious or just a faulty return code. Running ktrace on this may be the only option, but the application is SO busy, and the problem so random, that it would be impossible to find if/when it happens; i.e., thousands and thousands of files can be written successfully before we actually see a failed one.

Any help or guidance would be greatly appreciated.
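For reference, the flush-then-close check described above looks roughly like the sketch below. This is a minimal illustration, not the actual application code: the helper name checked_close and its path and saved_errno parameters are made up for this example. It records the descriptor number before closing so that a failure can be logged with both the fd and the errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the real application): flush the stream,
 * then close it, and report which of the two steps failed.
 * Returns 0 on success, -1 if either step failed. On failure,
 * *saved_errno holds the errno of the first step that failed.
 */
static int checked_close(FILE *f, const char *path, int *saved_errno)
{
    int fd, err, rc = 0;

    assert(f != NULL && saved_errno != NULL);

    fd = fileno(f);          /* record the fd while the stream is still valid */
    *saved_errno = 0;

    if (fflush(f) == EOF) {
        err = errno;         /* save errno before any other library call */
        *saved_errno = err;
        fprintf(stderr, "fflush(%s) fd=%d failed: %s\n",
                path, fd, strerror(err));
        rc = -1;
    }

    /* fclose() is always attempted; f must not be used after this call. */
    if (fclose(f) == EOF) {
        err = errno;
        if (*saved_errno == 0)
            *saved_errno = err;
        fprintf(stderr, "fclose(%s) fd=%d failed: %s\n",
                path, fd, strerror(err));
        rc = -1;
    }

    return rc;
}
```

In our case the fflush() branch never fires; only the fclose() branch does, with errno set to EBADF. Logging the fd alongside the errno should at least make it possible to match a failure against trace output if one is ever captured.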
TIA

--
Robert Blayzor, BOFH
INOC, LLC
rblayzor@inoc.net
PGP: http://www.inoc.net/~dev/
Key fingerprint = 1E02 DABE F989 BC03 3DF5 0E93 8D02 9D0B CB1A A7B0

Never underestimate the bandwidth of a station wagon full of tapes. - Jackson