Date:      Wed, 24 Feb 1999 15:42:21 -0500
From:      Andrew Heybey <ath@niksun.com>
To:        FreeBSD-gnats-submit@freebsd.org
Subject:   kern/10243: read(2) returns garbage
Message-ID:  <199902242042.PAA24006@stiegl.niksun.com>

>Number:         10243
>Category:       kern
>Synopsis:       Under heavy disk and network load read(2) can return garbage.
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Feb 25 10:20:02 PST 1999
>Closed-Date:
>Last-Modified:
>Originator:     Andrew Heybey
>Release:        FreeBSD 3.1-RELEASE i386
>Organization:
Niksun
>Environment:

	3.1-RELEASE GENERIC kernel (+ bpf)
	450MHz P-II, 256MB memory, Asus P2B-LS motherboard.
	Adaptec 7890 SCSI controller, IBM DRVS09V (10000 RPM LVD) disks
	Intel EtherExpress Pro 10/100B ethernet

	Full dmesg output (or any other info) available to anyone who
	wants to look into this.	

>Description:

The bug is that under certain loads, read(2) can return corrupted data
(i.e., data that is not in the file on disk).  The instances I have seen
are relatively small amounts (8-64 bytes) of corrupt data at the end
of a 4k page.  The corrupt data comes from a file previously read or
from another position in the current file.  I have also seen this
problem in 3.0-RELEASE but not in 2.2.8-RELEASE.

Specifically, I can reproduce the bug under the following conditions
(I am sorry that I don't have a smaller and simpler test case):

1) Multiple processes reading a set of large files.  I believe that
   the amount of data must be large enough that the reads come from
   disk, not the cache (if I only read one 50MB file, I do not see
   the bug).  (I have used 1.5GB of data files on a system with 256MB
   of physical memory.)  I also believe that multiple reader
   processes must be running (I see the bug with 4 processes, but
   not with only one).

   The files that I have used are filled with sequential integers.
   This allows my test program to know if it gets bogus data from
   read(2), since it knows what should be there.

*AND*

2) Very high network interrupt rate.  I have tested on a Fast Ethernet
   link receiving about 46000 packets/sec.  I use bpf to get the
   network interrupt rate up that high without having to do any
   protocol processing.  I don't know if the network or bpf code has
   anything to do with the bug or if it is just that the high load
   stimulates some cam/vm/ufs/bpf bug.  I have not been able to
   reproduce the bug without this high load.  Both zero pkts/sec and
   3000 pkts/sec do *not* exhibit the bug (or at least not after
   running for several hours), while at 46000 pkts/sec it usually
   occurs within 10 minutes.
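
The sequential-integer check described in condition 1 can be sketched
roughly as below.  This is not the program from the test suite, only a
minimal illustration of the idea.  The function names
(write_sequential, verify_sequential), the 32-bit integer width, the
host byte order and the starting value of 0 are all assumptions made
for the example.

```c
/*
 * Sketch only: integer width (32-bit), host byte order and starting
 * value (0) are assumptions; the real test suite may differ.
 */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_WORDS 1024	/* 1024 * 4 bytes = one 4k read */

/* Hypothetical helper: fill `path` with the integers 0..nwords-1. */
static int
write_sequential(const char *path, uint32_t nwords)
{
	FILE *fp = fopen(path, "wb");
	uint32_t i;

	if (fp == NULL)
		return (-1);
	for (i = 0; i < nwords; i++) {
		if (fwrite(&i, sizeof(i), 1, fp) != 1) {
			fclose(fp);
			return (-1);
		}
	}
	return (fclose(fp) == 0 ? 0 : -1);
}

/*
 * Hypothetical checker: read `path` with read(2) and compare every
 * word with the value that should be at that position.  Returns the
 * byte offset of the first bad word, -1 if the file is intact, or -2
 * on an I/O error or a read that is not a whole number of words.
 */
static off_t
verify_sequential(const char *path)
{
	uint32_t buf[CHUNK_WORDS];
	uint32_t expected = 0;	/* also the index of the current word */
	ssize_t n;
	size_t i, nwords;
	int fd;

	assert(sizeof(buf) == 4096);
	if ((fd = open(path, O_RDONLY)) < 0)
		return (-2);
	while ((n = read(fd, buf, sizeof(buf))) > 0) {
		if ((size_t)n % sizeof(buf[0]) != 0) {
			close(fd);
			return (-2);
		}
		nwords = (size_t)n / sizeof(buf[0]);
		for (i = 0; i < nwords; i++, expected++) {
			if (buf[i] != expected) {
				close(fd);
				return ((off_t)expected *
				    (off_t)sizeof(buf[0]));
			}
		}
	}
	close(fd);
	return (n < 0 ? -2 : -1);
}
```

Because every word's expected value follows from its position, the
checker needs no reference copy of the file, and the returned offset
shows where in a 4k page the bad data begins.  Several copies of such
a checker would have to run concurrently over a data set larger than
physical memory to match the conditions above.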

>How-To-Repeat:

	I have put a small suite of programs that I use to produce
	this bug at http://www.niksun.com/~ath/fbsd_bug.tgz.  The
	tar file contains a few test programs and complete
	instructions on how I tickle the bug.

	I have reproduced the bug on two different machines, so I
	don't think that the hardware is broken (though the machines
	have substantially the same kind of hardware, so it is
	conceivable that it is a hardware misdesign of some kind).

	I welcome advice on how to track this down.  It smells to me
	like an insufficient-application-of-splfoo bug, but I'm not
	even sure where to start looking.  For example, why would
	network I/O and BPF have any effect on disk reads?

	Even better, I suppose, would be someone to tell me that I'm
	an idiot and my test program is broken.  But it is really a
	very simple program and has run for hours without a problem
	when there is negligible network load.

>Fix:
	
	I wish.

>Release-Note:
>Audit-Trail:
>Unformatted:


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message



