Date: Thu, 13 Dec 2001 22:58:10 +0100
From: Thomas Zenker
To: Matthew Dillon
Cc: hackers@freebsd.org
Subject: Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD fall on its face in one easy step )
Message-ID: <20011213225810.A5144@peotl.homeip.net>
References: <59687.1008231593@winston.freebsd.org> <200112131058.fBDAwSR66790@apollo.backplane.com> <20011213221204.A2994@peotl.homeip.net> <200112132140.fBDLek699925@apollo.backplane.com>
In-Reply-To: <200112132140.fBDLek699925@apollo.backplane.com>; from dillon@apollo.backplane.com on Thu, Dec 13, 2001 at 01:40:46PM -0800

On Thu, Dec 13, 2001 at 01:40:46PM -0800, Matthew Dillon wrote:
>
> :Matt,
> :
> :What the hell, this seems to be very close to a problem I have been
> :meaning to report for a week:
> :
> :In a data acquisition application I have a writer process writing to
> :a file-backed, shared, mmap()ed ring buffer. There can be several
> :reader processes on this ring buffer. Once I killed the writer to
> :resize the ring buffer and forgot about the readers. The writer
> :truncated the database file without unlinking it first.
> :This led the readers to run forever, or at least so it seemed. After
> :attaching with gdb I saw that they were doing nothing but page
> :faulting, forever....
> :
> :I saw something similar with Netscape going mad.
> :
> :cheers, Thomas
>
>     That's something else.  There's no OS bug there.  When you mmap()
>     a file, only the pages within the file's boundaries are valid.  So
>     if you ftruncate() the file, all the pages after the (new) EOF
>     become invalid and will take a bus fault (SIGBUS) if the process
>     touches them.
>
>     You touched upon the correct solution... remove() the file instead
>     of ftruncate()ing it.  The file's data then remains intact for the
>     processes still referencing it.
>
>     The readers must be catching SIGBUS and retrying (not exiting),
>     causing them to run in a signal loop forever.  This is a case of
>     bad programming.  I've seen it before... there was a popular IRC
>     bot back in my BEST days which constantly got itself into infinite
>     loops because the guy who wrote it installed a signal handler for
>     SIGBUS.
>
>                     -Matt
>                     Matthew Dillon

Well, I know that this was a bug in my software: truncating the file
without unlinking it first :-).  But SIGBUS was not caught in the
readers.  I will try to reproduce it.

Thomas
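Matt's explanation can be illustrated with a minimal C sketch. It assumes a POSIX system. The temporary path `/tmp/ringbuf.XXXXXX` and every function name below (`map_temp_file`, `read_faults`, `on_fault`, `truncate_in_place`, `remove_instead`) are hypothetical and do not come from the thread. `truncate_in_place()` ftruncate()s a two-page shared mapping down to one page, so touching the second page faults. `remove_instead()` remove()s the file, and the mapping still holds the old data. The probe catches SIGSEGV as well as SIGBUS because the signal delivered for a page past EOF has not been the same on every system. It leaves the handler with `siglongjmp()`, since a handler that simply returns restarts the faulting load, which faults again; that is the kind of endless signal loop Matt describes.

```c
#define _POSIX_C_SOURCE 200809L   /* mkstemp, sigsetjmp, ftruncate */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_jmp;

/* Leave via siglongjmp(): a handler that simply returns restarts the
 * faulting load, which faults again, forever. */
static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(fault_jmp, 1);
}

/* Return 1 if reading *p raises SIGBUS or SIGSEGV, 0 otherwise. */
static int read_faults(const volatile char *p)
{
    struct sigaction sa, old_bus, old_segv;
    int faulted;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old_bus);
    sigaction(SIGSEGV, &sa, &old_segv);
    if (sigsetjmp(fault_jmp, 1) == 0) {
        char c = *p;                /* touch the page */
        (void)c;
        faulted = 0;
    } else {
        faulted = 1;                /* reached via siglongjmp() */
    }
    sigaction(SIGBUS, &old_bus, NULL);
    sigaction(SIGSEGV, &old_segv, NULL);
    return faulted;
}

/* Hypothetical helper: create a file from the mkstemp() template in path,
 * size it to len bytes, map it MAP_SHARED and fill it with 'A'. */
static char *map_temp_file(char *path, size_t len, int *fd_out)
{
    char *p = MAP_FAILED;
    int fd;
    assert(len > 0);                /* mmap() rejects a zero length */
    fd = mkstemp(path);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)len) == 0)
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        unlink(path);
        return NULL;
    }
    memset(p, 'A', len);
    *fd_out = fd;
    return p;
}

/* Writer ftruncate()s the mapped file in place: the reader's second page
 * now lies past EOF.  Returns 1 if touching it faulted. */
static int truncate_in_place(void)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    char path[] = "/tmp/ringbuf.XXXXXX";
    int fd, faulted = -1;
    char *map = map_temp_file(path, 2 * pg, &fd);
    if (map == NULL)
        return -1;
    if (ftruncate(fd, (off_t)pg) == 0)  /* new EOF after page one */
        faulted = read_faults(map + pg);
    munmap(map, 2 * pg);
    close(fd);
    unlink(path);
    return faulted;
}

/* Writer remove()s the file instead: the reader's mapping keeps the old
 * data.  Returns 1 if the second page is still readable and intact. */
static int remove_instead(void)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    char path[] = "/tmp/ringbuf.XXXXXX";
    int fd, intact = -1;
    char *map = map_temp_file(path, 2 * pg, &fd);
    if (map == NULL)
        return -1;
    close(fd);                      /* the mapping keeps the file alive */
    if (remove(path) == 0)          /* name gone, data still mapped */
        intact = !read_faults(map + pg) && map[pg] == 'A';
    munmap(map, 2 * pg);
    return intact;
}
```

A test driver would expect `truncate_in_place()` to return 1 (the read past the new EOF faulted) and `remove_instead()` to return 1 (the old data was still readable). Both return -1 if the setup itself fails.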