Date: Fri, 05 Jan 2001 21:08:28 +0100
From: Martin Birgmeier <Martin.Birgmeier@aon.at>
To: FreeBSD-gnats-submit@FreeBSD.ORG
Subject: kern/24092: Disk data corruption using FreeBSD_4_2_0_RELEASE
Message-ID: <3A5629BC.8147CE64@aon.at>
Resent-Message-ID: <200101052010.f05KA2G65273@freefall.freebsd.org>
>Number: 24092
>Category: kern
>Synopsis: Disk data corruption using FreeBSD_4_2_0_RELEASE
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Fri Jan 05 12:10:01 PST 2001
>Closed-Date:
>Last-Modified:
>Originator: Martin.Birgmeier@aon.at (Martin Birgmeier)
>Release: FreeBSD 4.2-RELEASE i386
>Organization:
MBi at home
>Environment:
ASUS A7V, 256 MB main memory
disks as shown below
atapci0: <VIA 82C686 ATA66 controller> port 0xd800-0xd80f at device 4.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
atapci1: <Promise ATA100 controller> port 0x8000-0x803f,0x8400-0x8403,0x8800-0x8807,0x9000-0x9003,0x9400-0x9407 mem 0xd5000000-0xd501ffff irq 10 at device 17.0 on pci0
ad0: 27199MB <ST328040A> [55262/16/63] at ata0-master UDMA33
ad3: 29188MB <ST330630A> [59303/16/63] at ata1-slave UDMA66
acd0: CDROM <TOSHIBA CD-ROM XM-6602B> at ata0-slave using UDMA33
acd1: CD-RW <PLEXTOR CD-R PX-W1210A> at ata1-master using WDMA2
Mounting root from ufs:/dev/ad0s4a
df output:
Filesystem               1K-blocks     Used    Avail Capacity  Mounted on
/dev/ad0s4a                  79359    40819    32192      56%  /
devfs                           16       16        0     100%  dummy_mount
/dev/ad0s4e                  39647     3138    33338       9%  /var
/dev/ad0s4f                8825163  6923321  1195829      85%  /usr
/dev/ad0s4g               13605124  8565455  3951260      68%  /d/5s4g
/dev/ad3s1a                  79359    53254    19757      73%  /d/6s1a
/dev/ad3s4e               22491456  8855727 11836413      43%  /d/6s4e
mfs:34                      127023       16   116846       0%  /tmp
procfs                           4        4        0     100%  /proc
devfs                           16       16        0     100%  /devs
<above>:/usr/X11R6.local  17650326 15748484  1195829      93%  /usr/X11R6.4
pid147@gandalf:/vol              0        0        0     100%  /vol
pid147@gandalf:/users            0        0        0     100%  /users
pid147@gandalf:/srcs             0        0        0     100%  /srcs
pid147@gandalf:/opt              0        0        0     100%  /opt
pid147@gandalf:/d/auto           0        0        0     100%  /d/auto
- Sources obtained via CTM
- checkout via `cvs -R co -rRELENG_4_2_0_RELEASE src'
- buildworld, installworld
  (In fact, buildworld stopped on a corrupted source file, which was one of
  the earliest hints I got that something was very wrong.)
>Description:
See below. Running cmp -x on the files just copied shows that the
corrupt regions are blocks of size 2 ** n, with n between 6 and 12
inclusive (that is, 64 to 4096 bytes); sometimes there is more than
one such block in the same file.
Fortunately, mostly (but not only!) long files are affected.
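
The block-size observation above can be checked mechanically. The following is a minimal sketch, not part of the original report: it groups the per-byte differences printed by `cmp -l` into runs of consecutive offsets and reports each run's length and whether that length is a power of two. The function name diff_runs is hypothetical. Note the assumption built into it: a corrupt block whose contents happen to match the original in a few bytes will show up as several shorter runs, so the output is only a rough indication.

```sh
#! /bin/sh
# Sketch: summarize the differences between two files as runs of
# consecutive differing bytes. diff_runs is a hypothetical helper name.
diff_runs() {
    # cmp -l prints one line per differing byte: the 1-based byte number
    # followed by the two byte values in octal. Only the byte number is used.
    cmp -l "$1" "$2" | awk '
        function report(   n, p) {
            n = last - start + 1
            # find the smallest power of two that is >= n
            p = 1
            while (p < n) p *= 2
            printf "offset=%d length=%d pow2=%s\n", start - 1, n, (p == n ? "yes" : "no")
        }
        NR == 1        { start = $1; last = $1; next }
        $1 == last + 1 { last = $1; next }
                       { report(); start = $1; last = $1 }
        END            { if (NR > 0) report() }
    '
}
```

It would be called as `diff_runs /d/5s4g/fileX /d/6s4e/file1`; offsets are printed 0-based, and identical files produce no output.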
>How-To-Repeat:
Use the following shell script. The file named by SRC is about 1 GB in size:
% ls -l /d/5s4g/fileX
-rw-r--r-- 1 root wheel 1083285504 Jan 2 14:24 .../fileX
%
----------------------------------------------------------------------
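#! /bin/sh
SRC=/d/5s4g/fileX
DST=/d/6s4e/file
# Make up to eight copies of SRC on the other disk.
for i in 1 2 3 4 5 6 7 8
do
	echo "*** $i ***"
	dd if="$SRC" of="${DST}$i" bs=102400k || break
	# Compare up to three times; stop at the first successful compare.
	for j in 1 2 3
	do
		cmp "$SRC" "${DST}$i" && break
	done
done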
----------------------------------------------------------------------
In about 50% of the cases, the compare does not succeed. (I once had
a case where a compare failed on the first try but later succeeded,
hence the triple comparison.)
This happens most often on large(r) files, which is exactly the reason
why I am using a file of about 1 GB for testing.
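
The case where a compare failed once and then succeeded suggests that the data on disk may sometimes be intact while a read returns bad data. A minimal sketch for separating the two cases, not part of the original report: checksum the same file several times and see whether the result is stable. The function name reread_check is hypothetical. It assumes that rereads actually reach the disk rather than the buffer cache, which is plausible but not guaranteed when the file is much larger than main memory.

```sh
#! /bin/sh
# Sketch: checksum one file several times and report whether every read
# produced the same result. reread_check is a hypothetical helper name.
reread_check() {
    file=$1
    tries=${2:-3}
    first=
    i=0
    while [ "$i" -lt "$tries" ]
    do
        # cksum is the POSIX CRC utility; it prints "checksum size".
        sum=$(cksum < "$file") || return 2
        if [ -z "$first" ]; then
            first=$sum
        elif [ "$sum" != "$first" ]; then
            echo "unstable"
            return 1
        fi
        i=$((i + 1))
    done
    echo "stable"
}
```

A result of "unstable" for a copy, for example from `reread_check /d/6s4e/file1 3`, would point at the read path; a "stable" checksum that still differs from that of the source file would point at the write path or at the data as stored.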
Notes: I tested copying a file of 800 MB under Win98 twice, with no
problems. In addition, I installed Suse Linux 7.0 on ad0s3 and
tested copying a file of about 400 MB four times using the above
shell script (as in `for i in 1 2 3 4'...). No problems there either.
The file sizes are somewhat smaller because I don't have much disk
space devoted to the other environments.
>Fix:
Unknown.
However, I tried the following, without any improvement:
- In /sys/dev/ata/ata-all.c, made ata_umode() always return -1. As a
  result, the disks used WDMA2.
- In /sys/dev/ata/ata-disk.c, ad_attach() (only one of the following
  items at a time):
    . fixed adp->transfersize at DEV_BSIZE
    . disabled write caching
With all this, I am pretty sure that the problem lies not with my
hardware, but within the vm/buffer subsystem and its interaction
with some other service, possibly malloc (the corrupt regions always
seem to be powers of two in length, see above).
--
Martin Birgmeier
Vienna
Austria
>Release-Note:
>Audit-Trail:
>Unformatted:
