Date: Tue, 22 Apr 2008 09:49:22 +1000 (EST)
From: Bruce Evans
To: Dominic Fandrey
Cc: freebsd-bugs@freebsd.org, gavin@freebsd.org
Subject: Re: kern/122961: write operation on msdosfs file system causes panic
Message-ID: <20080422084732.H63563@delplex.bde.org>
In-Reply-To: <480CC6F4.1000200@bsdforen.de>
References: <200804211445.m3LEjNh6018941@freefall.freebsd.org>
 <480CC6F4.1000200@bsdforen.de>

On Mon, 21 Apr 2008, Dominic Fandrey wrote:

> gavin@FreeBSD.org wrote:
>> To submitter: are you able to connect the USB stick to a machine
>> running Windows and run chkdsk, to confirm that the filesystem
>> is not invalid? (Although we should ideally be resiliant to
>> corrupt filesystems, if it still panics after a chkdisk then it's
>> a more serious problem...)
>>
>
> I have already checked the stick under windows. Chkdisk did not find any
> problems, but the panic still occurs.
>
> The problem started after I updated RELENG_7 on my machine this weekend. The
> previous RELENG_7 build was ~2 months old.

This seems to be a bug in usb (umass) or the particular usb drive.
msdosfs now uses the drive's advertised max i/o size (mp->mnt_iosize_max)
to implement vfs clustering, but mnt_iosize_max seems to be broken for
some drives. This is only a theory because bug reporters never respond
to requests for more info.

Note that there are lots of bugs in the initialization of
mp->mnt_iosize_max. It is always MAXPHYS (128K), but few drives support
this. GEOM bogusly splits up large i/o's into units that the drive claims
to support (d_maxsize). d_maxsize is bogusly initialized to the fixed
value of DFLTPHYS (64K) in many drivers including da. Bad things then
happen if a scsi drive doesn't actually support d_maxsize = 64K.

To check that this is the bug, mount msdosfs with -o noclusterr,noclusterw
under RELENG_7 or later (the bug also affects RELENG_6, but these mount
options are broken in RELENG_6). Then write and read some files, using
write() and not mmap(). (Use dd, or cp a file larger than 8M. cp always
uses mmap() for files smaller than 8M (a good pessimization if the file
is not in the buffer cache), and the nocluster* mount options don't affect
mmap() for any file system (another bug), and there is no option to
prevent cp using mmap().) Then remount without nocluster* and repeat.
The bug should only affect the repeat.
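If dd is not convenient, the write()/read() part of the test above could
be done with something like the following. This is only a sketch: the
function names write_pattern() and verify_pattern(), the 64K chunk size
and the byte pattern are all made up for illustration and are not taken
from any FreeBSD utility. It uses only open(), write(), read() and
close(), so nothing in it goes through mmap().

```c
/*
 * Sketch only: write a known byte pattern with write() and check it
 * with read().  All names and sizes here are hypothetical.
 */
#include <sys/types.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#define	CHUNK	(64 * 1024)	/* arbitrary user buffer size */

/* Byte expected at file offset off (arbitrary, cheap to recompute). */
static unsigned char
pattern_byte(size_t off)
{
	return ((unsigned char)(off * 31 + 7));
}

/* Write total bytes of the pattern to path.  Returns 0, or -1 on error. */
static int
write_pattern(const char *path, size_t total)
{
	static unsigned char buf[CHUNK];
	size_t done, i, n;
	int fd;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return (-1);
	for (done = 0; done < total; done += n) {
		n = total - done < CHUNK ? total - done : CHUNK;
		for (i = 0; i < n; i++)
			buf[i] = pattern_byte(done + i);
		if (write(fd, buf, n) != (ssize_t)n) {
			close(fd);
			return (-1);
		}
	}
	/* Push the data towards the device before reporting success. */
	if (fsync(fd) != 0) {
		close(fd);
		return (-1);
	}
	return (close(fd) == 0 ? 0 : -1);
}

/* Read path back and compare with the pattern.  Returns 0 on a match. */
static int
verify_pattern(const char *path, size_t total)
{
	static unsigned char buf[CHUNK];
	size_t done, got, i, n;
	ssize_t r;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return (-1);
	for (done = 0; done < total; done += n) {
		n = total - done < CHUNK ? total - done : CHUNK;
		/* read() may return short counts, so loop until n bytes. */
		for (got = 0; got < n; got += (size_t)r) {
			r = read(fd, buf + got, n - got);
			if (r <= 0) {
				close(fd);
				return (-1);
			}
		}
		for (i = 0; i < n; i++) {
			if (buf[i] != pattern_byte(done + i)) {
				close(fd);
				return (-1);
			}
		}
	}
	close(fd);
	return (0);
}
```

Call write_pattern() and then verify_pattern() on a file on the msdosfs
mount, with a total well above 8M, once with the nocluster* options and
once without. A read done straight after the write will probably be
satisfied from the buffer cache, so unmount and remount between the two
calls if the read path is to reach the drive.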
> # mount
> /dev/ufs/2root on / (ufs, local)
> devfs on /dev (devfs, local)
> /dev/ufs/2tmp on /tmp (ufs, local, soft-updates)
> /dev/ufs/2usr on /usr (ufs, NFS exported, local, soft-updates)
> /dev/ufs/2var on /var (ufs, local, soft-updates)
> pid874@mobileKamikaze:/var/run/automounter.amd.mnt on
> /var/run/automounter.amd.mnt (nfs)
> /dev/msdosfs/APRIL RYAN on
> /var/run/automounter.mnt/msdosfs/bb8a40b99a061c33a35f4e7275d1842a (msdosfs,
> local, noatime, noexec)

The labels obfuscate the device type for all mountpoints very well.

Your backtrace showed a panic in mmap(). mmap() actually uses the support
for vfs clustering (VOP_BMAP()), not vfs clustering itself, to determine
the size of the largest contiguous i/o that is possible. It's possible
that the bug only affects mmap(), but I doubt it.

Bruce