Date:      Sat, 29 May 1999 01:46:44 -0700 (PDT)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        hackers@FreeBSD.ORG
Subject:   Possible race in pipe device driver, esp on multi-cpu machines.
Message-ID:  <199905290846.BAA29926@apollo.backplane.com>

    A friend of mine upgraded one of his machines to a dual-cpu
    box and upgraded the OS to -STABLE, and he noticed that his
    backups were being corrupted.  The corruption appears to occur when
    he transfers huge gzip'd tar files over a 100BaseTX network:

	rsh remote -n "cat remotefile" > localfile
	ssh remote -n "cat remotefile" > localfile
	rcp remote:remotefile localfile
	scp remote:remotefile localfile

    The remotefile in this case is a huge 192MB gzip'd tar file.  Portions
    of the file get corrupted - generally the corruption consists of a small
    sequence of 8 or so bytes at a random offset in the file being repeated
    twice, destroying data that should have been sent instead.  The corrupted
    file winds up being the same size as the original file, but with
    occasional repeating patterns.
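
    To make the failure pattern above concrete, here is a minimal sketch
    of how a known-good copy and a corrupted copy of the same size could
    be compared.  The helper names (diff_runs, is_repeat_of_preceding)
    are hypothetical and not part of any existing tool.  The sketch
    assumes both files fit in memory; for a 192MB file one would read in
    blocks instead.  A differing run can also come out shorter than the
    duplicated sequence if some duplicated bytes happen to match the
    original data.

```python
def diff_runs(good: bytes, bad: bytes):
    """Return (offset, length) for each run of consecutive differing bytes."""
    if len(good) != len(bad):
        raise ValueError("inputs differ in size")
    runs = []
    start = None
    for i, (x, y) in enumerate(zip(good, bad)):
        if x != y:
            if start is None:
                start = i          # a new run of differences begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:          # run extends to the end of the data
        runs.append((start, len(good) - start))
    return runs


def is_repeat_of_preceding(bad: bytes, offset: int, length: int) -> bool:
    """True if bad[offset:offset+length] duplicates the bytes just before it."""
    if offset < length:
        return False
    return bad[offset - length:offset] == bad[offset:offset + length]
```

    Each run reported by diff_runs can then be passed to
    is_repeat_of_preceding to check whether it matches the "sequence
    repeated twice" signature described above.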

    Through experimentation we determined that it is NOT the TCP connection,
    the programs being run, or the filesystem that is introducing the
    corrupted data.

    I believe that the problem is situated in the pipe device driver.  It
    should also be noted that one of my friend's machines is a dual-cpu
    machine.  Both are running FreeBSD-STABLE.  If I replace the "cat" in
    the rsh/ssh with a "dd bs=4k", the corruption goes away.  I think the
    way processes are scheduled on the dual-cpu machine opens a window of
    opportunity that the pipe device does not protect against.

    I seem to recall that the pipe device tries to optimize certain situations
    when a reader is blocked waiting for input.  This code is my primary
    suspect at the moment.

    We are attempting to reproduce the problem with a smaller dataset, but
    if anyone is hot on the pipe code in the kernel and can give it a once-over
    we may be able to find the bug more quickly.
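
    As one possible starting point for a smaller reproduction, the sketch
    below pushes a buffer through an anonymous pipe and returns what the
    reader received, so the two can be checksummed.  The function name
    pipe_roundtrip and its write_size/read_size parameters are
    hypothetical.  It uses a reader thread inside one process for
    brevity, which is only an approximation: the failing case involves
    separate processes scheduled across two CPUs, so a faithful test
    would likely need separate processes as well.  On a kernel without
    the suspected bug the output should match the input exactly.

```python
import hashlib
import os
import threading


def pipe_roundtrip(data: bytes, write_size: int, read_size: int) -> bytes:
    """Send data through an anonymous pipe and return what the reader got."""
    rfd, wfd = os.pipe()
    received = bytearray()

    def reader():
        while True:
            chunk = os.read(rfd, read_size)
            if not chunk:          # EOF: the write end has been closed
                break
            received.extend(chunk)

    t = threading.Thread(target=reader)
    t.start()
    try:
        view = memoryview(data)
        pos = 0
        while pos < len(view):
            # os.write may accept fewer bytes than requested
            pos += os.write(wfd, view[pos:pos + write_size])
    finally:
        os.close(wfd)              # lets the reader see EOF
        t.join()
        os.close(rfd)
    return bytes(received)


def same_digest(a: bytes, b: bytes) -> bool:
    """Compare two buffers by SHA-256 digest."""
    return hashlib.sha256(a).digest() == hashlib.sha256(b).digest()
```

    Varying write_size (for example 4096 versus larger values) is a rough
    way to mimic the difference between "cat" and "dd bs=4k" noted above.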

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message



