From owner-freebsd-hackers  Thu Dec 13 13:44: 5 2001
Delivered-To: freebsd-hackers@freebsd.org
Received: from apollo.backplane.com (apollo.backplane.com [216.240.41.2])
	by hub.freebsd.org (Postfix) with ESMTP id 0447137B405
	for <hackers@freebsd.org>; Thu, 13 Dec 2001 13:44:00 -0800 (PST)
Received: (from dillon@localhost)
	by apollo.backplane.com (8.11.6/8.9.1) id fBDLek699925;
	Thu, 13 Dec 2001 13:40:46 -0800 (PST)
	(envelope-from dillon)
Date: Thu, 13 Dec 2001 13:40:46 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
Message-Id: <200112132140.fBDLek699925@apollo.backplane.com>
To: Thomas Zenker <thz@tuebingen.netsurf.de>
Cc: hackers@freebsd.org
Subject: Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD fall on its face in one easy step )
References: <59687.1008231593@winston.freebsd.org> <200112131058.fBDAwSR66790@apollo.backplane.com> <20011213221204.A2994@peotl.homeip.net>
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-hackers.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-hackers>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-hackers>
X-Loop: FreeBSD.ORG


:Matt,
:
:what the hell, this seems to be very close to a problem I have wanted
:to report for a week:
:
:In a data acquisition system I have a writer process writing to a
:file-backed, shared, mmapped ring buffer.  There can be several reader
:processes on this ring buffer.  Once, I killed the writer to resize
:the ring buffer and forgot about the readers.  The writer truncated
:the database without unlinking it first.  This led to the readers
:running forever, or so it seemed at least.  After attaching with gdb
:I saw that they were only page faulting, nothing more, forever....
:
:Something similar I saw with netscape going mad.
:
:cheers, Thomas

    That's something else.  There's no OS bug there.  When you mmap()
    a file, only those pages that are within the file's boundaries are
    valid.  So if you ftruncate() the file, all the pages lying entirely
    beyond the (new) file EOF become invalid, and the process gets a
    SIGBUS (bus fault) if it touches them.
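
    To illustrate the behaviour described above, here is a minimal
    sketch, assuming a POSIX system.  The scratch path and the function
    name signal_after_truncate() are made up for the example.  A child
    process shrinks a mapped file to zero and then touches the mapping;
    the parent reports which signal killed it.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns the signal that terminated the child, 0 if the child exited
 * normally, or -1 if the setup failed.
 */
int signal_after_truncate(void)
{
    char path[] = "/tmp/ringbuf-XXXXXX";   /* hypothetical scratch file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    long pagesz = sysconf(_SC_PAGESIZE);
    void *m = MAP_FAILED;
    if (ftruncate(fd, 2 * pagesz) == 0)
        m = mmap(NULL, 2 * pagesz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (m == MAP_FAILED) {
        close(fd);
        unlink(path);
        return -1;
    }
    volatile char *p = m;
    p[0] = 'x';                 /* fine: the page is inside the file */

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: shrink the file, then touch a page past the new EOF. */
        if (ftruncate(fd, 0) != 0)
            _exit(2);
        char c = p[0];          /* expected to raise SIGBUS */
        _exit(c == 'x' ? 0 : 1);
    }

    int status = 0;
    if (pid > 0)
        waitpid(pid, &status, 0);

    munmap(m, 2 * pagesz);
    close(fd);
    unlink(path);

    if (pid < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

    The file is truncated to zero on purpose: when the new EOF falls in
    the middle of a page, the rest of that page stays accessible and
    only the pages wholly beyond it fault.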

    You touched upon the correct solution... remove() the file instead
    of ftruncate()ing it.  The file's data then remains intact for the
    processes still referencing it.
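
    A rough sketch of the remove-and-recreate approach follows, again
    assuming POSIX.  The path, the sizes, and the function name
    old_mapping_survives_unlink() are illustrative only.  The old
    mapping plays the part of a reader that is still attached while the
    writer replaces the file with a larger one.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if the old mapping still holds its data after the file has
 * been removed and recreated, 0 if not, or -1 if the setup failed.
 */
int old_mapping_survives_unlink(void)
{
    char path[] = "/tmp/ringbuf-XXXXXX";   /* hypothetical scratch file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    long pagesz = sysconf(_SC_PAGESIZE);
    void *m = MAP_FAILED;
    if (ftruncate(fd, pagesz) == 0)
        m = mmap(NULL, pagesz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* an existing mapping keeps the file alive */
    if (m == MAP_FAILED) {
        unlink(path);
        return -1;
    }
    char *old = m;
    strcpy(old, "sample");      /* data a reader would be looking at */

    /* "Writer restart": remove the old file, create a bigger new one. */
    if (unlink(path) != 0) {
        munmap(m, pagesz);
        return -1;
    }
    int nfd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (nfd < 0) {
        munmap(m, pagesz);
        return -1;
    }
    int ok = ftruncate(nfd, 4 * pagesz) == 0 && strcmp(old, "sample") == 0;

    close(nfd);
    unlink(path);
    munmap(m, pagesz);
    return ok;
}
```

    Readers attached to the old file keep seeing the old data and do
    not notice the new file by themselves, so some way of telling them
    to reopen it is still needed; that part is not shown here.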

    The readers must be catching SIGBUS and returning from the handler
    (not exiting), so the faulting access is simply retried and faults
    again, causing them to run in a signal loop forever.  This is a
    case of bad programming.  I've seen it before... there was a
    popular IRC bot back in my BEST days which constantly got itself
    into infinite loops because the guy who wrote it installed a signal
    handler for SIGBUS.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>
