From: Matthew Dillon
Date: Thu, 13 Dec 2001 13:40:46 -0800 (PST)
To: Thomas Zenker
Cc: hackers@freebsd.org
Subject: Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD fall on its face in one easy step )
Message-Id: <200112132140.fBDLek699925@apollo.backplane.com>
References: <59687.1008231593@winston.freebsd.org> <200112131058.fBDAwSR66790@apollo.backplane.com> <20011213221204.A2994@peotl.homeip.net>

:Matt,
:
:what the hell, this seems to be very close to a problem I have been
:meaning to report for a week:
:
:In a data acquisition system I have a writer process writing to a
:file-backed, shared, mmapped ring buffer. There can be several reader
:processes on this ring buffer. Once I killed the writer to resize the
:ring buffer and forgot about the readers. The writer truncated the
:database file without unlinking it first. This led the readers to run
:forever, or so it seemed at least. After attaching with gdb I saw that
:they were only page faulting, nothing more, forever....
:
:Something similar I saw with netscape going mad.
:
:cheers, Thomas

That's something else. There's no OS bug there. When you mmap() a file,
only those pages that lie within the file's boundaries are valid. So if
you ftruncate() the file, all the pages past the (new) file EOF become
invalid, and the process takes a bus fault (SIGBUS) if it touches them.

You touched upon the correct solution...
remove() the file instead of ftruncate()ing it. The file's data then
remains intact for the processes still referencing it.

Your readers must have a SIGBUS handler installed that returns instead
of exiting. When the handler returns, the faulting access is simply
retried, faults again, and the process runs in a signal loop forever.

This is a case of bad programming. I've seen it before... there was a
popular IRC bot back in my BEST days which constantly got itself into
infinite loops because the guy who wrote it installed a signal handler
for SIGBUS.

-Matt
Matthew Dillon
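The failure mode and the fix can both be reproduced in a few lines. What follows is a minimal sketch in C (not code from this thread), assuming a POSIX system where touching a MAP_SHARED page that lies wholly past EOF raises SIGBUS, as FreeBSD and Linux do. The function names touch_after_truncate() and read_after_remove() and the /tmp path template are hypothetical. The first function shrinks a two-page file under a live mapping and reports whether the read past the new EOF faulted; the second removes the file instead and returns a byte still visible through the mapping.

```c
/* Hypothetical demo; names and paths are illustrative, not from the thread. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>      /* remove() */
#include <stdlib.h>     /* mkstemp() */
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;

/* Escape with siglongjmp(); returning would re-execute the faulting load. */
static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(fault_env, 1);
}

static size_t page_size(void)
{
    long pg = sysconf(_SC_PAGESIZE);
    assert(pg > 0);
    return (size_t)pg;
}

/* Returns 1 if reading past the new EOF raised SIGBUS, 0 if not, -1 on error. */
static int touch_after_truncate(void)
{
    size_t pg = page_size();
    char path[] = "/tmp/mmap_demo_XXXXXX";
    char *p = MAP_FAILED;
    struct sigaction sa, old;
    int result = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)(2 * pg)) != 0)
        goto out;
    p = mmap(NULL, 2 * pg, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto out;
    memset(p, 'A', 2 * pg);            /* both pages are backed by the file */

    /* The "writer" shrinks the file to one page behind the reader's back. */
    if (ftruncate(fd, (off_t)pg) != 0)
        goto out;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    result = 0;
    if (sigsetjmp(fault_env, 1) != 0) {
        result = 1;                    /* the load below raised SIGBUS */
    } else {
        volatile char *vp = p;
        char c = vp[pg];               /* first byte of the page past the new EOF */
        (void)c;
    }
    sigaction(SIGBUS, &old, NULL);     /* restore the previous disposition */

out:
    if (p != MAP_FAILED)
        munmap(p, 2 * pg);
    close(fd);
    remove(path);
    return result;
}

/* Returns a byte read through the mapping after remove(), or -1 on error. */
static int read_after_remove(void)
{
    size_t pg = page_size();
    char path[] = "/tmp/mmap_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)pg) != 0) {
        close(fd);
        remove(path);
        return -1;
    }
    char *p = mmap(NULL, pg, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                         /* the mapping alone keeps the file alive */
    if (p == MAP_FAILED) {
        remove(path);
        return -1;
    }
    memset(p, 'A', pg);

    remove(path);                      /* the "writer" unlinks instead of truncating */

    int c = p[pg - 1];                 /* data survives until munmap() */
    munmap(p, pg);
    return c;
}
```

With these semantics, touch_after_truncate() should return 1 and read_after_remove() should return 'A'. The handler deliberately leaves via siglongjmp() rather than returning: a handler that simply returns lets the faulting load run again and fault again, which is the endless SIGBUS loop described in this thread. POSIX also leaves the behavior undefined after a normal return from a handler for a fault-generated SIGBUS.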