Date:      Sun, 21 Jun 2009 15:37:32 -0500 (CDT)
From:      Joe Greco <jgreco@ns.sol.net>
To:        FreeBSD-gnats-submit@FreeBSD.org
Subject:   kern/135898: Severe filesystem corruption - large files or large filesystems
Message-ID:  <200906212037.n5LKbWmt038127@aurora.sol.net>
Resent-Message-ID: <200906212140.n5LLe3j5055603@freefall.freebsd.org>

>Number:         135898
>Category:       kern
>Synopsis:       Severe filesystem corruption - large files or large filesystems
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Jun 21 21:40:03 UTC 2009
>Closed-Date:
>Last-Modified:
>Originator:     Joe Greco
>Release:        FreeBSD 6-7, i386/amd64, etc (incompatibilities)
>Organization:
sol.net Network Services
>Environment:

This is a follow-on to a problem previously described on freebsd-hackers,
which was dismissed as "controller corruption."

We had noticed odd behaviour on a 1.5TB filesystem with a large file when
moved from i386 to amd64.  A ~400GB compressed dump refused to read when
attached to a 7.*R/amd64 box.  It was suggested on hackers that the dodgy
SATA ctlr used to write the data was at fault, despite the fact that the
data read fine on various amd64 and i386 boxes that had a good SATA ctlr,
and it wasn't clear how that could be possible if the data on disk were bad.

Now that the filesystem restore is done, we acquired a second 1.5TB disk
and we've got some more experimental results.

Steps taken:

1) Place a Seagate 1.5TB SATA disk on a 3Ware 9550SX on a 6.1R/amd64 box.  
   Create full disk filesystem.  Mount, etc.

   # dd if=/dev/random of=file bs=1048576 count=500000

   # cat genmds
   #! /bin/sh -

   uname -rp
   (
   echo "count    1 " `dd if=file bs=1048576 count=1 | md5`
   echo "count    4 " `dd if=file bs=1048576 count=4 | md5`
   echo "count   16 " `dd if=file bs=1048576 count=16 | md5`
   echo "count  256 " `dd if=file bs=1048576 count=256 | md5`
   echo "count 1024 " `dd if=file bs=1048576 count=1024 | md5`
   ) 2> /dev/null

   # sh genmds
   6.1-RELEASE amd64
   count    1  a78d2f52290367c76837fb00a16a4e79
   count    4  ba4e51e332a27ff8e5c817b4a95501d5
   count   16  af3cd4e9a10cb5679a50081cdb35d54f
   count  256  56cc987930c0c9e8246c8f91ed9f23bb
   count 1024  4bc8e00d9c20210bd5cc0ccfbd7bb1a3

   Gives us a baseline for comparison.

2) Unmount.  Move to 7.0R/amd64 box.

   # sh genmds
   7.0-RELEASE amd64
   count    1  88f830ae7f572282a2da19ffb3d036e4
   count    4  6b38b2c18f039859b10d8d33ffcc19c9
   count   16  2d40cb233ef4be44a3c33edc79c3aa05
   count  256  4fd629256316643099ddfdfd40afe56c
   count 1024  8ea0fa80158105d5b6f1768de5ceddc6

   More terrifyingly, when repeated, some answers *change*:

   # sh genmds
   7.0-RELEASE amd64
   count    1  d7c43b568d8f72ecbd47d2dc89062704
   count    4  e670a01847d4fe08958754ea434fbf6d
   count   16  c0c962024713c3db3e8d5070f7284413
   count  256  830e32f1b862c7b867ccaf05782ff769
   count 1024  8ea0fa80158105d5b6f1768de5ceddc6

   It's not even consistent.

3) Unmount.  Move to 7.2R/i386 box.

   # sh genmds
   7.2-RELEASE i386
   count    1  a78d2f52290367c76837fb00a16a4e79
   count    4  ba4e51e332a27ff8e5c817b4a95501d5
   count   16  af3cd4e9a10cb5679a50081cdb35d54f
   count  256  56cc987930c0c9e8246c8f91ed9f23bb
   count 1024  4bc8e00d9c20210bd5cc0ccfbd7bb1a3

   Lookin' good.
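
For anyone repeating steps 1) through 3), the two comparisons done by eye
above can be mechanized.  This is only a rough sketch, not part of the
original test: the function names compare_runs and stable_read are made up
here, it assumes each genmds run was saved to a file, and it uses cksum
in place of md5 so that it runs on systems without md5(1).

```sh
#!/bin/sh
# Hypothetical helpers, not part of the original genmds script.

# compare_runs BASELINE CANDIDATE
# Compares two saved genmds outputs, ignoring the first line
# (the "uname -rp" banner), which is expected to differ between boxes.
compare_runs() {
    base=$(tail -n +2 "$1")
    cand=$(tail -n +2 "$2")
    if [ "$base" = "$cand" ]; then
        echo "match"
    else
        echo "MISMATCH"
    fi
}

# stable_read FILE MEGABYTES
# Reads the same prefix of FILE twice and reports whether both reads
# produce the same checksum.  cksum stands in for md5 here (assumption).
stable_read() {
    first=$(dd if="$1" bs=1048576 count="$2" 2>/dev/null | cksum)
    second=$(dd if="$1" bs=1048576 count="$2" 2>/dev/null | cksum)
    if [ "$first" = "$second" ]; then
        echo "stable"
    else
        echo "UNSTABLE"
    fi
}

# Example use (file names are assumptions):
#   sh genmds > run-61r-amd64      # on the 6.1R/amd64 box
#   sh genmds > run-70r-amd64      # on the 7.0R/amd64 box
#   compare_runs run-61r-amd64 run-70r-amd64
#   stable_read file 16
```

Note that stable_read can be fooled by the buffer cache returning the same
(possibly wrong) data twice, so a "stable" result proves less than an
"UNSTABLE" one.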

So.  Experiment further.  Place a Deskstar 250GB SATA on the 6.1R/amd64,
write file.

4) Basically the same as 1), abbreviated for clarity.

   # cat 61r-amd64 ; sh genmds
   6.1-RELEASE amd64
   count    1  65827a57009f618b3638f557246f40d8
   count    4  b3f5e9743173c29211545ff42f6df15d
   count   16  1b6a9d862522bf1091a47f91b874470c
   count  256  64545fa4dc6af95519dfd6644639518f
   count 1024  3ed6942e979a794eaa3acdd2908543f4
   7.0-RELEASE amd64
   count    1  65827a57009f618b3638f557246f40d8
   count    4  b537d1a50c85ee2e49cddb74a0afa9d0
   count   16  5632bd5c03008dd89ea332f19aa8240c
   count  256  980bb1ba249fcff0136867bb8b461a3c
   count 1024  980bb1ba249fcff0136867bb8b461a3c

Well now that's interesting and puzzling.  Turns out that dd is failing
with

dd: file: Input/output error
144+0 records in
144+0 records out
150994944 bytes transferred in 1.816704 secs (83114773 bytes/sec)
count  256  980bb1ba249fcff0136867bb8b461a3c
dd: file: Input/output error
144+0 records in
144+0 records out
150994944 bytes transferred in 1.816477 secs (83125159 bytes/sec)
count 1024  980bb1ba249fcff0136867bb8b461a3c

and I'm seeing

g_vfs_done():da1s1e[READ(offset=-6844985820815177728, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-7367456256827488256, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=7136992927258085376, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6809018773487804416, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-9105783899823685632, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-8531636795393505280, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-1479149208900890624, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6844985820815177728, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-7367456256827488256, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=7136992927258085376, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6809018773487804416, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-9105783899823685632, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-8531636795393505280, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-1479149208900890624, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=110823770104102912, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6844985820815177728, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-7367456256827488256, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=7136992927258085376, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6809018773487804416, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-9105783899823685632, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-8531636795393505280, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-1479149208900890624, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=110823770104102912, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6844985820815177728, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-7367456256827488256, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=7136992927258085376, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6809018773487804416, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-9105783899823685632, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-8531636795393505280, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-1479149208900890624, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=110823770104102912, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6844985820815177728, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-7367456256827488256, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=7136992927258085376, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6809018773487804416, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-9105783899823685632, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-8531636795393505280, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-1479149208900890624, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=110823770104102912, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6844985820815177728, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-7367456256827488256, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=7136992927258085376, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6809018773487804416, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-9105783899823685632, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-8531636795393505280, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-1479149208900890624, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=110823770104102912, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6844985820815177728, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-7367456256827488256, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=7136992927258085376, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-6809018773487804416, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-9105783899823685632, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-8531636795393505280, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=-1479149208900890624, length=16384)]error = 5
g_vfs_done():da1s1e[READ(offset=110823770104102912, length=16384)]error = 5

And... I'm guessing based on what I see that the "I/O error" is simply the
result of a ludicrous offset being requested, but I could be wrong.
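
To make "ludicrous" concrete, here is a small sketch that classifies a few
of the logged offsets against the size of the disk.  The check_offset
function is made up for illustration, and DISK_BYTES is an assumed nominal
250GB (250 * 10^9 bytes), not a measured size of da1s1e, which would be
somewhat smaller.  It relies on the shell's test builtin handling 64-bit
signed integers.

```sh
#!/bin/sh
# Hypothetical helper: classify a byte offset against a device size.
# DISK_BYTES is an assumed nominal capacity for a 250GB disk.
DISK_BYTES=250000000000

check_offset() {
    off=$1
    if [ "$off" -lt 0 ]; then
        # A negative byte offset can never be a valid read position.
        echo "negative"
    elif [ "$off" -ge "$DISK_BYTES" ]; then
        # Non-negative, but past the end of the assumed device size.
        echo "beyond-end"
    else
        echo "ok"
    fi
}

# Offsets copied from the g_vfs_done() messages above.
check_offset -6844985820815177728    # prints "negative"
check_offset 7136992927258085376     # prints "beyond-end"
check_offset 110823770104102912      # prints "beyond-end"
```

None of the logged offsets falls in the "ok" range, which is consistent
with the guess that the reads fail because the requested block addresses
are garbage rather than because the media is bad.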

My best guess is that there is something amiss on FreeBSD 7.*/amd64 relating
to the filesystem code.

It appears to be repeatable.  Would anyone care to try?

... JG
>Description:
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
>Unformatted:


