Message-Id: <200611281737.kASHbM2V015501@lurza.secnetix.de>
Date: Tue, 28 Nov 2006 18:37:22 +0100 (CET)
From: Oliver Fromme
To: FreeBSD-gnats-submit@FreeBSD.org
Cc: Oliver Fromme
Reply-To: Oliver Fromme
X-Send-Pr-Version: 3.113
List-Id: Bug reports
Subject: kern/105964: Make MSDOSFS_LARGE a mount option

>Number:         105964
>Category:       kern
>Synopsis:       Make MSDOSFS_LARGE a mount option
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Tue Nov 28 17:40:17 GMT 2006
>Closed-Date:
>Last-Modified:
>Originator:     Oliver Fromme
>Release:        n/a
>Organization:
secnetix GmbH & Co. KG
http://www.secnetix.de/bsd
>Environment:
n/a
>Description:
This problem has been discussed on the -stable mailing list, and
Craig Rodrigues asked me to submit a PR for this issue because
he's interested in picking it up.  So here we go.

The FAT file system format doesn't support file ID numbers
(UFS/FFS calls them "inode numbers").  Therefore MSDOSFS has to
create such numbers somehow.  Currently there are two hacks for
that purpose, with different drawbacks:

-1-  (Default)  Use the directory entry offset of the file as
     the file ID number.  Assume that the whole medium is divided
     into blocks the size of a directory entry (32 bytes), and
     use that "block number" as the file ID.  Since file ID
     numbers (a.k.a. inodes) are 32 bits wide, that algorithm
     overflows above 32 * 2^32 bytes = 128 GB.  If you try to
     mount a FAT file system larger than 128 GB, the mount fails
     and the kernel prints "disk too big, sorry".

-2-  (With MSDOSFS_LARGE in the kernel)  Maintain a table that
     dynamically maps 64-bit offsets (computed as above) to
     32-bit ID numbers.
     This works for FAT file systems of any size > 128 GB (the
     code falls back to algorithm 1 for file systems < 128 GB).
     Two drawbacks:

     -A-  If a large number of files is accessed, the table will
          grow very big and consume a lot of kernel memory.  The
          machine may panic when it runs out of kernel memory.

     -B-  Since the mapping is dynamic, file ID numbers may be
          different after the file system is unmounted and
          re-mounted.  That will break NFS exports, because NFS
          assumes that file ID numbers (which are used for NFS
          handles) are constant.

     It should be noted that those drawbacks only apply if the
     file system is > 128 GB.  For smaller file systems the code
     automatically uses the simpler algorithm described first.
     This is controlled by the flag MSDOSFS_LARGEFS (different
     from MSDOSFS_LARGE!).
>How-To-Repeat:
Try to mount FAT file systems of various sizes and encounter the
situations mentioned above.  For testing and experimenting, you
can easily use an md(4) device and newfs_msdos(8) to create a
160 GB FAT disk:

   # truncate -s 160000000000 testfat.img
   # mdconfig -a -t vnode -f testfat.img
   md1
   # fdisk -BI /dev/md1
   ******* Working on device /dev/md1 *******
   # newfs_msdos -s 312496317 -c 128 -h 254 -u 63 /dev/md1s1 orb
   /dev/md1s1: 312458112 sectors in 2441079 FAT32 clusters (65536 bytes/cluster)
   bps=512 spc=128 res=32 nft=2 mid=0xf0 spt=63 hds=254 hid=0 bsec=312496317 bspf=19071 rdcl=2 infs=1 bkbs=2
   # mount -t msdos -o ro /dev/md1s1 /mnt
   mount_msdosfs: /dev/md1s1: Invalid argument
   # dmesg | tail -1
   mountmsdosfs(): disk too big, sorry

(Note:  The newfs_msdos command is not very fast.  It will take
a few seconds.)
>Fix:
Unfortunately, there is no real fix known for the problem.

However, the problem is made worse by the fact that you have to
recompile your kernel and reboot in order to enable the second
hack (kernel option MSDOSFS_LARGE).
That aspect of the problem could be fixed by making it a mount
option instead of a kernel compile option, essentially converting
the #ifdef's to regular if's.

Enabling the MSDOSFS_LARGE code by default has even been
considered.  However, because of the drawbacks (i.e. the
possibility of a panic caused by kernel memory usage, and the
inability to NFS-export the file system), it should only be used
if specifically requested by the user.
>Release-Note:
>Audit-Trail:
>Unformatted: