Date: Wed, 25 Jun 2003 10:54:56 +0100
From: Ollie Cook
To: freebsd-questions@freebsd.org, freebsd-isp@freebsd.org
Subject: mixed files streams / nfs client cacheing
Message-ID: <20030625095456.GH33040@mutare.noc.clara.net>

Hi,

We run a NetApp F840 filer with a 350GB volume which is mounted by our
FreeBSD 4.x clients using NFSv3.

Today it seems that two fopen() calls for the same file on that mounted
volume yielded file streams associated with two different files.

Two different processes on the same host, run at approximately the same
time (although not concurrently) seem to have got FILE *streams mixed up.

Each process chdir()s to a different directory on the mounted volume,
readdir()s and then operates on the files using the relative path, rather
than an absolute path to the file.
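For anyone trying to reproduce this, the access pattern described above can be sketched roughly as follows. This is only an illustration under assumptions: the function name scan_directory(), the buffer size and the return convention are hypothetical and are not taken from the real programs, which do considerably more work per file.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical sketch of the access pattern: chdir() into `dir`, readdir()
 * its entries, then stat() and fopen() each regular file by its relative
 * name. Returns the number of files whose first line could be read, or -1
 * if the directory could not be entered or opened.
 */
int scan_directory(const char *dir)
{
    DIR *d;
    struct dirent *de;
    struct stat sb;
    char line[1024];
    int count = 0;

    assert(dir != NULL);

    if (chdir(dir) == -1)
        return -1;
    if ((d = opendir(".")) == NULL)
        return -1;

    while ((de = readdir(d)) != NULL) {
        FILE *f;

        /* Relative name only: lookup depends on the current directory. */
        if (stat(de->d_name, &sb) == -1 || !S_ISREG(sb.st_mode))
            continue;
        if ((f = fopen(de->d_name, "r")) == NULL)
            continue;
        if (fgets(line, sizeof(line), f) != NULL)
            count++;
        fclose(f);
    }
    closedir(d);
    return count;
}
```

The point of the sketch is that every file name is resolved relative to the process's current directory, so identical relative names in two directories are distinguished only by that directory.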

The simplified chain of events was:

Process 5453 (starts at 19:09:46 ends at 19:09:48):

 - File A is stat()'d (this is the only operation the process performs
   on it)

Process 5592 (starts at 19:10:02 ends at 19:11:41):

 - File B is stat()'d
 - File B is fopen()'d
 - Contents are read with fgets() (the contents read at this stage are
   actually the contents of File A, not File B)
 - File B is fclose()'d
 - [other operations are performed]
 - File B is fopen()'d again and the contents parsed for a unique token
   (this time, the contents relate to File B and the correct unique
   token is found. This token was *not* read when the file was
   previously opened)
 - File B is fclose()'d and unlink()'d

File A and File B are in different directories on the volume.

The unique token is used in logs to provide an audit trail. It is logged
when the file is written and when it is unlinked.

The net result of the events described was that process 5592 read the
contents of the wrong file, before unlinking the correct one; effectively
the contents of File B were lost.

Given that the two processes were executed within twenty seconds of one
another, I wondered if some NFS caching either on the server or client
side was causing this behaviour.

The client in question was running 4.6-STABLE as at Jul 17 2002.

Does it seem plausible that sys/nfs may have cached File A's information
and associated stream B with it in error?

I've had a cursory look at CVS commits relating to NFS since July 2002 in
the 4-RELENG tree, but I admit to not being an expert in this area and
didn't spot anything.

I have a case open with NetApp in case this could be attributed to an
error on the filer, such as an inconsistent filesystem, although I've not
yet heard anything back.

I don't think this behaviour will be easily reproducible as the cluster
causes around 3000 NFS operations per second on average each day, and
this sort of behaviour has only been brought to my attention twice in the
last month.

I have implemented a little sanity check after fopen() to check that the
inode of the file named by the path is the same as the inode of the file
associated with the stream before proceeding, but this may not help if
file credentials are being incorrectly cached on the NFS client. Still,
it can't do much harm to do the check.

Pseudocode without error checking:

    FILE *f;
    struct stat ssb, fsb;

    f = fopen(filename, "r");
    stat(filename, &ssb);        /* inode as seen via the path */
    fstat(fileno(f), &fsb);      /* inode as seen via the open stream */
    if (ssb.st_ino != fsb.st_ino) {
        /* report inconsistency error */
    }

If there's further information that I can provide to help make sense of
this turn of events I would be glad to provide it.

Cheers,

Ollie

-- 
Oliver Cook    Systems Administrator, Claranet UK
ollie@uk.clara.net               020 7903 3065
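
For reference, a compilable version of the inode check described above might look something like the following. It is a minimal sketch, not the code actually deployed: the helper name stream_matches_path() and its return convention are hypothetical, and the extra st_dev comparison is an addition, made on the assumption that inode numbers are only unique within one filesystem. As noted above, it cannot detect a problem if both stat() and fstat() are answered from the same incorrect cached data on the client.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not from the original post): returns 1 if the open
 * stream `f` refers to the same file that `path` currently names, 0 if the
 * device/inode pairs differ, and -1 if either stat call fails.
 */
int stream_matches_path(FILE *f, const char *path)
{
    struct stat by_path, by_fd;

    assert(f != NULL && path != NULL);

    if (stat(path, &by_path) == -1)
        return -1;
    if (fstat(fileno(f), &by_fd) == -1)
        return -1;

    /* Compare st_dev as well as st_ino (assumption: see lead-in). */
    return (by_path.st_dev == by_fd.st_dev &&
            by_path.st_ino == by_fd.st_ino) ? 1 : 0;
}
```

A caller would run this immediately after fopen() and refuse to process (and above all refuse to unlink) the file unless the result is 1.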