From owner-freebsd-fs@FreeBSD.ORG Mon Apr 16 21:40:11 2012
Delivered-To: freebsd-fs@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id B47A31065670
	for ; Mon, 16 Apr 2012 21:40:11 +0000 (UTC)
	(envelope-from truckman@FreeBSD.org)
Received: from gw.catspoiler.org (gw.catspoiler.org [75.1.14.242])
	by mx1.freebsd.org (Postfix) with ESMTP id 746278FC20
	for ; Mon, 16 Apr 2012 21:40:11 +0000 (UTC)
Received: from FreeBSD.org (mousie.catspoiler.org [192.168.101.2])
	by gw.catspoiler.org (8.13.3/8.13.3) with ESMTP id q3GLUYMG013359;
	Mon, 16 Apr 2012 14:30:38 -0700 (PDT)
	(envelope-from truckman@FreeBSD.org)
Message-Id: <201204162130.q3GLUYMG013359@gw.catspoiler.org>
Date: Mon, 16 Apr 2012 14:30:33 -0700 (PDT)
From: Don Lewis
To: rysto32@gmail.com
MIME-Version: 1.0
Content-Type: TEXT/plain; charset=us-ascii
Cc: freebsd-fs@FreeBSD.org
Subject: Re: SU: Could an unclean shutdown cause a file with outstanding
	writes to become sparse after fsck?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
X-List-Received-Date: Mon, 16 Apr 2012 21:40:11 -0000

On 16 Apr, Ryan Stone wrote:
> Today I encountered a system running a very old version of FreeBSD
> (6.1-ish) that was stuck in a reboot loop.  I was eventually able to
> discover that the system was running into a long-since fixed bug where
> the system would panic if you tried to execute a sparse file.  From
> what I've been able to get from the owner of this system, it sounds
> like the machine reset during a system upgrade.  I suspect that the
> initial reset was unrelated (a different long-since fixed panic or a
> power loss, maybe), and that some executables that had outstanding
> writes before the reset ended up becoming sparse when fsck was run.
> Is this possible?
> The filesystem was running soft-updates, and I'm
> really not familiar enough with either soft-updates or even the UFS
> on-disk metadata to say whether this is reasonable.

Yes, an unclean shutdown can cause a new file that is being written to
become sparse, especially if you are using tagged commands on SCSI or
NCQ with SATA and write caching is disabled.  I've seen it happen.  Even
if the file is being written sequentially, there is no guarantee that
the drive will actually write the data and report the write completions
in sequential order.  With SU, the block pointers for the file won't get
written until the drive reports that the corresponding data writes are
complete, so if there is an unclean shutdown, some of the block pointers
may still be zero, creating a sparse file.  At least all of the block
pointers that are present will point to valid data.

This particular problem would probably not occur if write caching were
enabled and the unclean shutdown was caused by a system panic, because
all of the data would probably be in the drive's write cache and would
eventually get written even after the crash.

If write caching is enabled, then all bets are off if there is a power
failure, because the unwritten contents of the drive's write cache would
be lost.  Some of the file's block pointers could be pointing to random
garbage, and there could be inconsistencies that fsck can't
automatically fix.  This would require a manual fsck and can cause data
loss.

When install is invoked with the -S flag, it should probably call
fsync() on the destination file after it is done writing and before it
uses rename() to replace the target file.
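
The fsync()-before-rename() ordering suggested above can be sketched
roughly as follows.  This is only an illustration under stated
assumptions, not code taken from install(1): the helper name
safe_replace() and its arguments are hypothetical.  Whether fsync() also
flushes the drive's own write cache depends on the OS, filesystem, and
drive, so this narrows the window rather than closing it in every
configuration.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from install(1)): write len bytes from buf to
 * tmp_path, force them to stable storage with fsync(), and only then
 * rename() the temporary file over target_path.
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 */
static int
safe_replace(const char *tmp_path, const char *target_path,
    const void *buf, size_t len)
{
	const char *p = buf;
	size_t done = 0;
	ssize_t n;
	int fd;

	fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		return (-1);

	/* write() may be short or interrupted, so loop until all is written. */
	while (done < len) {
		n = write(fd, p + done, len - done);
		if (n == -1) {
			if (errno == EINTR)
				continue;
			goto fail;
		}
		done += (size_t)n;
	}

	/* Wait for the file's data to reach the device before renaming. */
	if (fsync(fd) == -1)
		goto fail;
	if (close(fd) == -1) {
		fd = -1;
		goto fail;
	}
	fd = -1;

	/* The target now only ever names a fully written file. */
	if (rename(tmp_path, target_path) == -1)
		goto fail;
	return (0);

fail:
	if (fd != -1)
		close(fd);
	unlink(tmp_path);	/* best-effort cleanup of the temporary file */
	return (-1);
}
```

A caller would use it along the lines of
safe_replace("prog.tmp", "prog", data, datalen).  Without the fsync(),
the rename could reach the disk before all of the data blocks do, which
is the window in which the new file can come back sparse after a crash.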