From owner-freebsd-hackers  Thu Dec 13 14: 7:22 2001
Delivered-To: freebsd-hackers@freebsd.org
Received: from iscserv7.nepustil.net (NS.Nepustil.NET [193.96.243.22])
	by hub.freebsd.org (Postfix) with ESMTP id D23E437B41C
	for <hackers@freebsd.org>; Thu, 13 Dec 2001 14:07:15 -0800 (PST)
Received: from peotl.homeip.net (tuebpool-130.pm3.nepustil.net [212.71.200.130])
	by iscserv7.nepustil.net (Sendmail) with ESMTP
	id EA58A79E59; Thu, 13 Dec 2001 23:07:03 +0100 (CET)
Received: (from thz@localhost)
	by peotl.homeip.net (8.11.6/8.11.6) id fBDLxvl05437;
	Thu, 13 Dec 2001 22:59:57 +0100 (MET)
	(envelope-from thz)
Date: Thu, 13 Dec 2001 22:58:10 +0100
From: Thomas Zenker <thz@tuebingen.netsurf.de>
To: Matthew Dillon <dillon@apollo.backplane.com>
Cc: hackers@freebsd.org
Subject: Re: Found NFS data corruption bug... (was Re: NFS: How to make FreeBSD fall on its face in one easy step )
Message-ID: <20011213225810.A5144@peotl.homeip.net>
References: <59687.1008231593@winston.freebsd.org> <200112131058.fBDAwSR66790@apollo.backplane.com> <20011213221204.A2994@peotl.homeip.net> <200112132140.fBDLek699925@apollo.backplane.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0.1i
In-Reply-To: <200112132140.fBDLek699925@apollo.backplane.com>; from dillon@apollo.backplane.com on Thu, Dec 13, 2001 at 01:40:46PM -0800
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
List-ID: <freebsd-hackers.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-hackers>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-hackers>
X-Loop: FreeBSD.ORG

On Thu, Dec 13, 2001 at 01:40:46PM -0800, Matthew Dillon wrote:
> 
> :Matt,
> :
> :what the hell, this seems very close to a problem I have been
> :wanting to report for a week:
> :
> :In a data acquisition system I have a writer process writing to a
> :file-backed, shared, mmapped ringbuffer. There can be several reader
> :processes on this ringbuffer. Once, I killed the writer in order to
> :resize the ringbuffer and forgot about the readers. The writer
> :truncated the database without unlinking it first. This led the
> :readers to run forever, or so it seemed at least. After attaching
> :with gdb I saw that they were only page faulting, nothing more,
> :forever...
> :
> :I saw something similar with netscape going mad.
> :
> :cheers, Thomas
> 
>     That's something else.  There's no OS bug there.   When you mmap()
>     a file only those pages that are within the file's boundaries are
>     valid.  So if you ftruncate() the file then all the pages occurring
>     after the (new) file EOF will become invalid and cause a bus fault
>     (SIGBUS) if the process touches them.
> 
>     You touched upon the correct solution... remove() the file instead
>     of ftruncate()ing it.  The file's data then remains intact for the
>     processes still referencing it.
> 
>     The readers must be catching SIGBUS and retrying (not exiting),
>     causing them to run in a signal loop forever.  This is a case of
>     bad programming.  I've seen it before... there was a popular IRC
>     bot back in my BEST days which constantly got itself into infinite
>     loops because the guy who wrote it installed a signal handler for
>     SIGBUS.
> 
> 					-Matt
> 					Matthew Dillon 
> 					<dillon@backplane.com>


Well, I know that this was a bug in my software: truncating the file
without unlinking it first :-). But SIGBUS was not caught in the
readers.  I will try to reproduce it.
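
A minimal sketch of such a reproduction, assuming nothing about the real
ringbuffer code: the function name reader_signal_after_truncate and the
scratch file under /tmp are invented for the example. A "reader" child
inherits a shared two-page mapping, the file is shrunk to one page, and
the child touches the second page with SIGBUS left at its default
action. The parent reports which signal, if any, terminated the child.

```c
#include <signal.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the signal that killed the reader, 0 if it exited normally,
 * or -1 if the setup failed. */
int reader_signal_after_truncate(void)
{
    char path[] = "/tmp/ringbuf.XXXXXX";
    long pagesz = sysconf(_SC_PAGESIZE);
    int status = 0;
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                  /* scratch file; fd keeps it alive */

    /* Two-page file, mapped shared as a reader would map it. */
    if (ftruncate(fd, 2 * pagesz) < 0) { close(fd); return -1; }
    void *map = mmap(NULL, (size_t)(2 * pagesz), PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { close(fd); return -1; }

    /* The "writer" shrinks the file; page 2 is now past EOF. */
    if (ftruncate(fd, pagesz) < 0) {
        munmap(map, (size_t)(2 * pagesz));
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(map, (size_t)(2 * pagesz));
        close(fd);
        return -1;
    }
    if (pid == 0) {
        struct rlimit no_core = { 0, 0 };
        setrlimit(RLIMIT_CORE, &no_core);   /* avoid leaving a core file */
        signal(SIGBUS, SIG_DFL);            /* reader does NOT catch SIGBUS */

        volatile char c = ((volatile char *)map)[pagesz];  /* past new EOF */
        (void)c;
        _exit(0);                  /* only reached if no fault occurred */
    }

    waitpid(pid, &status, 0);
    munmap(map, (size_t)(2 * pagesz));
    close(fd);
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

If the reader really leaves SIGBUS uncaught, this should return SIGBUS,
i.e. the reader is killed rather than left page faulting.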

Thomas

