Date: Sun, 2 Sep 2012 14:52:28 +0200
From: Mateusz Guzik
To: Marcel Moolenaar
Cc: freebsd-fs@freebsd.org, Grzegorz Bernacki
Subject: Re: NANDFS: out of space panic
Message-ID: <20120902125228.GA29075@dft-labs.eu>
In-Reply-To: <96DC4416-6CA5-45B4-B790-068797FAA2C6@xcllnt.net>
References: <96DC4416-6CA5-45B4-B790-068797FAA2C6@xcllnt.net>

On Fri, Aug 31, 2012 at 01:09:40PM +0100, Marcel Moolenaar wrote:
> On Aug 22, 2012, at 4:34 PM, Boris Astardzhiev wrote:
> > Now when I attempt to delete /mnt/file1234 I get a panic:
> > root@smartcpe:/mnt # rm file1234
> > panic: bmap_truncate_mapping: error 28 when truncate at level 1
>

I think this is a step in the right direction and should help with the
originally reported test case:
http://people.freebsd.org/~mjg/patches/nandfs-out-of-space1.diff

> 2. There's a real bug. For me it gives the following panic
> (by virtue of me changing the behaviour of point 1):
>
> nandfs_new_segment: cannot create segment error 1
> create_segment: cannot create next segment
[snip]
> panic: brelse: not dirty
> cpuid = 0
> KDB: enter: panic
>

While the error handling in this case is clearly bogus, I believe the
sole fact that nandfs ran out of free segments is a sign of another,
more important bug.

This mail ended up quite lengthy and has some repetitions, so sorry for
that. I have not touched this filesystem for a couple of months now and
may remember things incorrectly. The ideas presented here are my own
and may or may not be of any value.
Some definitions to help avoid confusion:

segment - fixed-size contiguous area filled with blocks containing user
          and filesystem data; a partition consists of some number of
          segments
sufile  - segment usage file (tells you which segments are free and so on)
datfile - virtual-to-physical block translation map
ifile   - inode file, contains all user file inodes

Free space is reclaimed as follows:
1. the cleaner reads n segments and dirties the blocks that are still
   in use
2. the syncer writes the dirtied blocks into new segments, along with an
   updated sufile, and erases the old segments
3. repeat with the next n segments; when the end of the partition is
   reached, start over from the beginning

In other words, nandfs needs some free space in order to reclaim
anything. Thus, if the user is allowed to use up all available
segments, nandfs is unable to clean up.

The fs should allow the user to write data only up to some point. After
that point is reached it should still be possible to remove at least
some of the data (which results in writes), and after that it should be
possible to reclaim free space (which results in additional writes).

So we need either a safe enough first threshold (i.e. you can reach it,
delete everything from the fs, and it still has room to clean up) or a
safe enough second threshold (you are allowed to delete stuff only up
to some point). In both cases the fs can return ENOSPC, or it can try
to adapt to the situation by suspending write operations and trying to
free up more space than it would under normal conditions.

nandfs currently maintains only one threshold and returns ENOSPC when
it is reached. Only removal operations are then allowed (as noted
earlier, these cause additional writes; the threshold is simply ignored
for them), and unfortunately this can leave the fs without any free
segments. So this is a "first threshold" with an incorrect value. A
rough sketch of the kind of check I have in mind is appended at the end
of this mail.

Some ways in which nandfs could adapt:

Less coding: temporarily increase the number of segments scanned per
iteration, or the frequency of iterations, until an acceptable amount
of free space is reached (or there is no more space to reclaim).

More coding: scan the entire filesystem and free up the top n segments
with stale blocks. Possibly track this information continuously, so
that a full scan is needed only once per mount.

Another thing that could help is reducing the amount of data written
during deletion. I believe that during large file removal intermediate
bmap blocks can be written out, even though a segment later such blocks
become stale.

The datfile and ifile never shrink. So if you happen to "free" all
virtual block numbers from a given datfile block, that block is still
carried around even though it could simply be removed (note that this
does not mean the datfile leaks anything - such blocks are reused as
new vblocks are allocated; the situation is similar for the ifile).

Again, these ideas may be completely bogus or of little value.

> Also: design documentation is missing right now, which
> does mean that there's a pretty steep curve for anyone
> who didn't write the file system to go in and fix any
> bugs.
>

While it's true that there is no documentation describing the current
state of nandfs, many of its ideas originated from nilfs2, so one can
get some understanding by reading the nilfs2 materials.

-- 
Mateusz Guzik
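
For illustration only, here is a minimal sketch of the two-threshold
check discussed above. This is not the actual nandfs code: the names
and reserve values are invented for this mail, and in practice the
reserves would have to be derived from the segment size and from how
much the cleaner writes per pass.

#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Invented reserve values, purely for illustration. */
#define RESV_SEGS_WRITE		8	/* kept free for removals + cleaning */
#define RESV_SEGS_DELETE	4	/* kept free strictly for the cleaner */

struct fs_counters {
	uint64_t free_segs;	/* clean (erased) segments available */
};

/* Can a regular write (create, append, ...) be admitted? */
static bool
can_write(const struct fs_counters *fc)
{
	return (fc->free_segs > RESV_SEGS_WRITE);
}

/*
 * Can a removal be admitted?  Removals still dirty bmap/ifile/datfile
 * blocks and therefore consume segments, so they have to stop before
 * the cleaner's own reserve is eaten.
 */
static bool
can_delete(const struct fs_counters *fc)
{
	return (fc->free_segs > RESV_SEGS_DELETE);
}

/* Returns 0 if the operation may proceed, ENOSPC otherwise. */
static int
admit_operation(const struct fs_counters *fc, bool is_removal)
{
	return ((is_removal ? can_delete(fc) : can_write(fc)) ? 0 : ENOSPC);
}

The point is only that removals get their own, lower threshold, so that
even after writers start seeing ENOSPC the cleaner is still guaranteed
a few clean segments to copy live blocks into.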