Date:      Tue, 22 Apr 2008 09:49:22 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Dominic Fandrey <kamikaze@bsdforen.de>
Cc:        freebsd-bugs@freebsd.org, gavin@freebsd.org
Subject:   Re: kern/122961: write operation on msdosfs file system causes panic
Message-ID:  <20080422084732.H63563@delplex.bde.org>
In-Reply-To: <480CC6F4.1000200@bsdforen.de>
References:  <200804211445.m3LEjNh6018941@freefall.freebsd.org> <480CC6F4.1000200@bsdforen.de>

On Mon, 21 Apr 2008, Dominic Fandrey wrote:

> gavin@FreeBSD.org wrote:
>> To submitter: are you able to connect the USB stick to a machine
>> running Windows and run chkdsk, to confirm that the filesystem
>> is not invalid?  (Although we should ideally be resilient to
>> corrupt filesystems, if it still panics after a chkdisk then it's
>> a more serious problem...)
>> 
>
> I have already checked the stick under windows. Chkdisk did not find any 
> problems, but the panic still occurs.
>
> The problem started after I updated RELENG_7 on my machine this weekend. The 
> previous RELENG_7 build was ~2 months old.

This seems to be a bug in usb (umass) or the particular usb drive.
msdosfs now uses the drive's advertised max i/o size (mp->mnt_iosize_max)
to implement vfs clustering, but mnt_iosize_max seems to be broken for
some drives.  This is only a theory because bug reporters never respond
to requests for more info.

Note that there are lots of bugs in the initialization of mp->mnt_iosize_max.
It is always MAXPHYS (128K), but few drives support this.  GEOM bogusly
splits up large i/o's into units that the drive claims to support
(d_maxsize).  d_maxsize is bogusly initialized to the fixed value of
DFLTPHYS (64K) in many drivers including da.  Bad things then happen if
a scsi drive doesn't actually support d_maxsize = 64K.
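
To make the arithmetic concrete, here is a rough sketch of the splitting
described above.  It is only a model written for this message, not GEOM's
code: split_io() is a made-up helper, and the two constants simply restate
the 128K and 64K values given above.

```python
# Illustrative model only; split_io() is a hypothetical helper, not a
# kernel or GEOM function.
MAXPHYS = 128 * 1024    # value of mnt_iosize_max as described above
DFLTPHYS = 64 * 1024    # d_maxsize as many drivers (including da) set it


def split_io(offset, length, maxsize):
    """Split one request into (offset, length) pieces of at most maxsize bytes."""
    if maxsize <= 0:
        raise ValueError("maxsize must be positive")
    pieces = []
    while length > 0:
        n = min(length, maxsize)
        pieces.append((offset, n))
        offset += n
        length -= n
    return pieces


# A full-sized 128K request becomes two 64K pieces.
pieces = split_io(0, MAXPHYS, DFLTPHYS)
# pieces == [(0, 65536), (65536, 65536)]
```

If the drive cannot really handle 64K transfers, every one of those pieces
is still too large for it, which is the failure suspected here.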

To check that this is the bug, mount msdosfs with -o noclusterr,noclusterw
under RELENG_7 or later (the bug also affects RELENG_6, but these mount
options are broken in RELENG_6).  Then write and read some files, using
write() and not mmap().  (Use dd, or cp a file larger than 8M.  cp always
uses mmap() for files smaller than 8M (a good pessimization if the file
is not in the buffer cache), the nocluster* mount options don't affect
mmap() for any file system (another bug), and there is no option to prevent
cp from using mmap().)  Then remount without nocluster* and repeat.  If the
theory is right, the panic should only occur on the repeat.
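
If dd is not convenient, something like the following sketch would also do
for the write/read step.  It is an assumption-laden example, not a tested
reproducer: write_then_verify() is a name invented here, the 16M and 64K
defaults are arbitrary (anything comfortably over 8M should do), and it
relies only on os.write() and os.read(), which call write(2) and read(2)
rather than mmap().

```python
import hashlib
import os


def write_then_verify(path, total=16 * 1024 * 1024, block=64 * 1024):
    """Write `total` bytes with write(2), read them back with read(2),
    and return True if the checksums match.

    Hypothetical helper; sizes are arbitrary defaults chosen for this sketch.
    """
    if block % 8 != 0 or total % block != 0:
        raise ValueError("block must be a multiple of 8 and divide total")

    written = hashlib.sha256()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for i in range(total // block):
            # Each block carries its own index so misplaced data is detectable.
            buf = i.to_bytes(8, "little") * (block // 8)
            view = memoryview(buf)
            while len(view) > 0:          # write(2) may be partial
                n = os.write(fd, view)
                view = view[n:]
            written.update(buf)
        os.fsync(fd)                      # push the data towards the device
    finally:
        os.close(fd)

    read_back = hashlib.sha256()
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, block)
            if not chunk:
                break
            read_back.update(chunk)
    finally:
        os.close(fd)

    return written.hexdigest() == read_back.hexdigest()
```

Call it with a path on the mounted msdosfs file system, once with the
nocluster* options and once without.  Note that the read-back may be served
from the buffer cache, so unmounting and remounting between the write and a
separate read is a more thorough check.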

> # mount
> /dev/ufs/2root on / (ufs, local)
> devfs on /dev (devfs, local)
> /dev/ufs/2tmp on /tmp (ufs, local, soft-updates)
> /dev/ufs/2usr on /usr (ufs, NFS exported, local, soft-updates)
> /dev/ufs/2var on /var (ufs, local, soft-updates)
> pid874@mobileKamikaze:/var/run/automounter.amd.mnt on 
> /var/run/automounter.amd.mnt (nfs)
> /dev/msdosfs/APRIL RYAN on 
> /var/run/automounter.mnt/msdosfs/bb8a40b99a061c33a35f4e7275d1842a (msdosfs, 
> local, noatime, noexec)

The labels obfuscate the device type for all mountpoints very well.

Your backtrace showed a panic in mmap().  mmap() actually uses the
support for vfs clustering (VOP_BMAP()), not vfs clustering itself,
to determine the size of the largest contiguous i/o that is possible.
It's possible that the bug only affects mmap(), but I doubt it.

Bruce


