Date: Sun, 2 Sep 2012 14:52:28 +0200
From: Mateusz Guzik <mjguzik@gmail.com>
To: Marcel Moolenaar <marcel@xcllnt.net>
Cc: freebsd-fs@freebsd.org, Grzegorz Bernacki <gber@freebsd.org>
Subject: Re: NANDFS: out of space panic
Message-ID: <20120902125228.GA29075@dft-labs.eu>
In-Reply-To: <96DC4416-6CA5-45B4-B790-068797FAA2C6@xcllnt.net>
References: <CAP=KkTz3+7UbfBcW9D_8VHv-Rw7BxNyG5xiVFxG4L-Zq1skwJw@mail.gmail.com>
 <96DC4416-6CA5-45B4-B790-068797FAA2C6@xcllnt.net>
On Fri, Aug 31, 2012 at 01:09:40PM +0100, Marcel Moolenaar wrote:
> On Aug 22, 2012, at 4:34 PM, Boris Astardzhiev <boris.astardzhiev@gmail.com> wrote:
> > Now when I attempt to delete /mnt/file1234 I get a panic:
> > root@smartcpe:/mnt # rm file1234
> > panic: bmap_truncate_mapping: error 28 when truncate at level 1

I think this is a step in the right direction and should help with the
originally reported testcase:
http://people.freebsd.org/~mjg/patches/nandfs-out-of-space1.diff

> 2. There's a real bug. For me it gives the following panic
> (by virtue of me changing the behaviour of point 1):
>
> nandfs_new_segment: cannot create segment error 1
> create_segment: cannot create next segment
[snip]
> panic: brelse: not dirty
> cpuid = 0
> KDB: enter: panic

While the error handling in this case is clearly bogus, I believe the
mere fact that nandfs ran out of free segments is a sign of another,
more important bug.

This mail ended up quite lengthy, with some repetition and possibly bad
English, so sorry for that. I have not touched this filesystem for a
couple of months now and may remember things incorrectly. Also, the
ideas presented here are my own and may or may not be of any value.

Some definitions to help avoid confusion:

segment - a fixed-size contiguous area filled with blocks containing
          user and filesystem data; a partition consists of some
          number of segments
sufile  - the segment usage file (tells you which segments are free
          and so on)
datfile - the virtual-to-physical block translation map
ifile   - the inode file; contains all user file inodes

Free space is reclaimed as follows:
1. the cleaner reads n segments and dirties the blocks that are still
   in use
2. the syncer writes the dirtied blocks into new segments, along with
   an updated sufile, and erases the old segments
3. repeat with the next n segments; when the end of the partition is
   reached, start over from the beginning

In other words, nandfs needs some free space in order to reclaim
anything. Thus, if the user is allowed to consume all available
segments, nandfs becomes unable to clean up.

The fs should allow the user to write data only up to some point.
After that point is reached, it should still be possible to remove at
least some of the data (which itself results in writes). And after
that, it should be possible to reclaim free space (which results in
additional writes).

So we need either a safe enough first threshold (i.e. you can reach
it, delete everything from the fs, and it still has room to clean up)
or a safe enough second threshold (you are allowed to delete stuff
only up to some point). In both cases the fs can return ENOSPC, or it
can try to adapt to the situation by suspending write operations and
trying to free up more space than it would under normal conditions.

nandfs currently maintains only one threshold and returns ENOSPC once
it is reached. Only removal operations are then allowed (as noted
earlier, these cause additional writes; the threshold is simply
ignored in such cases). Unfortunately, this can leave the fs without
any free segments. So what we have is a "first threshold" with an
incorrect value.

Some ways in which nandfs could adapt (the first one, together with
the two-threshold check, is sketched below):

Less coding: temporarily increase the number of segments scanned per
iteration, or the frequency of iterations, until an acceptable level
of free space is reached (or there is nothing left to reclaim).

More coding: scan the entire filesystem and free up the top n segments
by stale-block count. Possibly track this information continuously, so
that a full scan is required only once per mount.
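To make this a bit more concrete, here is a rough sketch of a
two-threshold check combined with the "less coding" adaptation. All
names (fs_state, WRITE_RESERVE, CLEAN_RESERVE, check_space and so on)
are made up for illustration, the reserve sizes are arbitrary, and
none of this is actual nandfs code:

#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct fs_state {
	uint64_t total_segs;	/* segments on the partition */
	uint64_t free_segs;	/* currently erased/unused segments */
};

/* First threshold: refuse ordinary writes, removal is still allowed. */
#define	WRITE_RESERVE(fs)	((fs)->total_segs / 10)
/* Second threshold: even removal-induced writes must leave this many. */
#define	CLEAN_RESERVE(fs)	((fs)->total_segs / 20)

/* Return ENOSPC when an operation would eat into its reserve. */
static int
check_space(struct fs_state *fs, bool is_removal)
{
	uint64_t reserve;

	reserve = is_removal ? CLEAN_RESERVE(fs) : WRITE_RESERVE(fs);
	if (fs->free_segs <= reserve)
		return (ENOSPC);
	return (0);
}

/*
 * "Less coding" adaptation: scan more segments per cleaner pass once
 * ordinary writes have already been refused.
 */
static int
cleaner_segs_per_pass(struct fs_state *fs, int normal)
{
	if (fs->free_segs <= WRITE_RESERVE(fs))
		return (normal * 4);	/* factor picked arbitrarily */
	return (normal);
}

The only point here is that removals get a deeper reserve than
ordinary writes, so the cleaner always has a few segments left to work
with no matter what the user does.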
Another thing that would help is reducing the amount of data written
during deletion. I believe that during large file removal intermediate
bmap blocks can be written out, even though a segment later such
blocks become stale.

Also, the datfile and ifile never shrink. So if you happen to "free"
all virtual block numbers from a given datfile block, that block is
still carried around even though it could simply be removed. (Note
that this does not mean the datfile leaks anything - such blocks are
reused as new vblocks are allocated; the situation with the ifile is
similar.)

Again, these ideas may be completely bogus or of little value.

> Also: design documentation is missing right now, which
> does mean that there's a pretty steep curve for anyone
> who didn't write the file system to go in and fix any
> bugs.

While it is true that there is no documentation describing the current
state of nandfs, some of its ideas originated from nilfs2, so one can
get some understanding by reading their materials.

--
Mateusz Guzik <mjguzik gmail.com>