Date:      Tue, 11 Apr 2006 22:56:11 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Scott Long <scottl@samsco.org>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: How a file is deleted in ufs2?
Message-ID:  <20060411210858.G46778@delplex.bde.org>
In-Reply-To: <443AFB03.6060301@samsco.org>
References:  <1144687418.11014.9.camel@diegows> <443AFB03.6060301@samsco.org>

On Mon, 10 Apr 2006, Scott Long wrote:

> Diego Woitasen wrote:
>> I want to know how a file is deleted in a ufs2 filesystem, specifically
>> what happen with the information in the inode. The information is
>> deleted to or the inode is marked as free but the information (uid, gid,
>> blocks, times, etc) remains there?
>> 
>> I read the chapter 8 of 'Design and implementation of FreeBSD" and "a
>> Fast file system for Unix", but i can't see the answer.
>> 
>> Reading the code is an interesting choice, but is the last resource :)

It should be nearly the first resort.

> Two things happen when a file gets 'deleted'.  First is that the directory 
> entry for the filename gets cleared from the directory, and
> the reference count on the inode is decremented.  However, a 'file' can
> have multiple hard links, or it might still be opened by a program.
> So nothing further might happen until the reference count goes to 0.
> When that happens, the inode is zeroed and the free block bitmaps
> are updated to indicate that the data blocks and any indirect blocks
> have been freed.  Softupdates complicates this by ordering the operations, 
> but that's a whole other discussion.

No, the inode is not zeroed.  When the inode's link count goes to zero, the
file is truncated to size 0.  The truncate routine (ffs_truncate() for
ffs[1-2]) could reasonably preserve the direct and indirect block numbers
in the inode since the file is going away, but the truncate routine
doesn't know that the context is special so it just clears all the
block numbers.  Finally, only the mode in the inode is cleared explicitly.
I think this mode-clearing is somewhat gratuitous and is not done in
Linux's ext2fs.  Perhaps Linux doesn't clear the block numbers either.
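
The sequence described above can be sketched as a small user-space model.
This is not the kernel code: struct toy_dinode and the toy_* functions are
hypothetical names made up for illustration, the struct keeps only a few of
the on-disk fields, and the sketch ignores open files and soft updates.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_NDADDR 12	/* direct block pointers, as in di_db */
#define TOY_NIADDR 3	/* indirect block pointers, as in di_ib */

/* Hypothetical, cut-down stand-in for the on-disk inode. */
struct toy_dinode {
	uint16_t di_mode;
	int16_t  di_nlink;
	uint64_t di_size;
	int32_t  di_mtime;
	int32_t  di_db[TOY_NDADDR];
	int32_t  di_ib[TOY_NIADDR];
	int32_t  di_blocks;
	uint32_t di_uid;
	uint32_t di_gid;
};

/* Models what truncation to length 0 leaves behind in the inode. */
static void
toy_truncate_to_zero(struct toy_dinode *ip)
{
	/* A real filesystem would also return the blocks to the free maps. */
	memset(ip->di_db, 0, sizeof(ip->di_db));
	memset(ip->di_ib, 0, sizeof(ip->di_ib));
	ip->di_size = 0;
	ip->di_blocks = 0;
}

/* Models removing one link to a file that nobody has open. */
static void
toy_unlink(struct toy_dinode *ip)
{
	assert(ip->di_nlink > 0);
	ip->di_nlink--;
	if (ip->di_nlink > 0)
		return;			/* other hard links remain */
	toy_truncate_to_zero(ip);	/* clears size and block numbers */
	ip->di_mode = 0;		/* the only field cleared explicitly */
	/* Times, flags, generation, uid and gid are left untouched. */
}
```

Calling toy_unlink() on an inode with di_nlink == 1 leaves the mode, size,
block count and block pointers zero while di_mtime, di_uid and di_gid keep
their old values; with di_nlink == 2 only the link count changes.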

Here is debugger output for a live and dead ffs1 inode:

Live on-disk inode:

% (gdb) p/x *ip
% $1 = {di_mode = 0x81a4, di_nlink = 0x1, di_u = {oldids = {0x0, 0x0}}, 
%   di_size = 0x27800, di_atime = 0x443b982f, di_atimensec = 0x0, 
%   di_mtime = 0x443b982f, di_mtimensec = 0x0, di_ctime = 0x443b9839, 
%   di_ctimensec = 0x0, di_db = {0x38, 0x40, 0x48, 0x50, 0x58, 0x60, 0x68, 
%     0x70, 0x78, 0x31, 0x0, 0x0}, di_ib = {0x0, 0x0, 0x0}, di_flags = 0x2, 
%   di_blocks = 0x13c, di_gen = 0x1ba2f568, di_uid = 0xf, di_gid = 0x3e8, 
%   di_spare = {0x0, 0x0}}
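
For readers decoding the hex values, here is a minimal sketch that checks
two of the live fields against each other.  It assumes the usual Unix mode
layout (0100000 is a regular file) and that di_blocks counts 512-byte
units; the second assumption is consistent with the dump, since 0x13c * 512
equals di_size exactly.  The TOY_* macros and toy_* functions are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_IFMT   0170000	/* file-type bits of the mode */
#define TOY_IFREG  0100000	/* regular file */
#define TOY_SECTOR 512		/* assumed unit of di_blocks */

/* True if the mode describes a regular file. */
static int
toy_is_regular(uint16_t mode)
{
	return ((mode & TOY_IFMT) == TOY_IFREG);
}

/* Permission bits (including setuid/setgid/sticky) of the mode. */
static unsigned
toy_perms(uint16_t mode)
{
	return (mode & 07777);
}

/* Bytes of disk space held, given a count of 512-byte units. */
static uint64_t
toy_bytes_held(uint32_t di_blocks)
{
	return ((uint64_t)di_blocks * TOY_SECTOR);
}

/* Self-check against the live dump; aborts if the decoding is wrong. */
static void
toy_check_live_dump(void)
{
	assert(toy_is_regular(0x81a4));		/* di_mode: regular file */
	assert(toy_perms(0x81a4) == 0644);	/* rw-r--r-- */
	assert(toy_bytes_held(0x13c) == 0x27800); /* di_blocks vs. di_size */
}
```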

Dead on-disk inode (same inode after final unlink):

% (gdb) p/x *ip
% $3 = {di_mode = 0x0, di_nlink = 0x0, di_u = {oldids = {0x0, 0x0}}, 
%   di_size = 0x0, di_atime = 0x443b982f, di_atimensec = 0x0, 
%   di_mtime = 0x443b982f, di_mtimensec = 0x0, di_ctime = 0x443b9839, 
%   di_ctimensec = 0x0, di_db = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 
%     0x0, 0x0, 0x0}, di_ib = {0x0, 0x0, 0x0}, di_flags = 0x2, 
%   di_blocks = 0x0, di_gen = 0x1ba2f568, di_uid = 0xf, di_gid = 0x3e8, 
%   di_spare = {0x0, 0x0}}

Comparing the fields:

di_mode: gratuitously cleared
di_nlink: had to be cleared for final unlink and to show that this inode is
           unreferenced
di_u: unused before and after
di_size: cleared as a side effect of truncate
di_atime*: preserved
di_mtime*: preserved
di_ctime*: preserved
di_db: cleared as a side effect of truncate
di_ib: remains clear (would be cleared as a side effect of truncate)
di_flags: preserved
di_blocks: cleared as a side effect of truncate
di_gen: preserved
di_uid: preserved
di_gid: preserved
di_spare: remains clear

Another debugging session showed that di_mtime and di_ctime are sometimes
changed as a side effect of the final unlink.  In ext2fs under Linux, there
is an explicit setting of the ctime on the final unlink (giving a deathtime)
but under FreeBSD there is apparently only a change as a side effect of
converting previous marks for update to updates.

So only the least important information is preserved.  At best you can
recover an empty file together with some of its metadata (times, flags,
generation number, uid and gid).

> But to specifically answer your question, when an inode gets freed it
> is also zeroed and any information in it is lost permanently.  It's
> not like MSDOS FAT where just a bit gets set in the directory entry
> and the information remains valid until it is re-allocated and overwritten.

It's actually quite similar, with small implementation details making
a big difference.  With msdosfs, at least under FreeBSD, truncation of
the file results in all the block numbers for the file in the FAT being
cleared.  This is even more fundamental than for ffs[1-2], since the FAT
corresponds to the bitmap of free blocks and so must be cleared.
However, the first block (cluster) number in the directory entry somehow
escapes clearing by the truncate, so the first cluster of a deleted
file can be recovered until it or its directory entry is overwritten.
ffs[1-2] can potentially retain info for _all_ blocks since it has
more indirection -- the inode and indirect blocks provide places to
retain all the old block numbers just like msdosfs's directory entry
provides a place to retain 1.
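
The msdosfs behaviour described above can be modelled in a few lines.
This is only a toy with hypothetical names (struct toy_msdos_dirent, the
toy_* functions and a 16-cluster FAT16-style table), not the msdosfs code.
It assumes the conventional FAT markers: 0 for a free cluster and 0xE5 as
the first name byte of a deleted directory entry.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NCLUST   16		/* clusters in this toy volume */
#define TOY_FAT_FREE 0x0000u	/* FAT slot value: cluster is free */
#define TOY_FAT_EOF  0xFFFFu	/* FAT slot value: last cluster of a chain */
#define TOY_DELETED  0xE5u	/* first name byte of a deleted entry */

/* Hypothetical cut-down directory entry: a name and the first cluster. */
struct toy_msdos_dirent {
	unsigned char name[11];
	uint16_t start_cluster;
};

/* Free a chain by zeroing every FAT slot on it; returns the number freed. */
static int
toy_fat_free_chain(uint16_t *fat, uint16_t start)
{
	uint16_t c = start, next;
	int n = 0;

	while (c >= 2 && c < TOY_NCLUST && fat[c] != TOY_FAT_FREE) {
		next = fat[c];
		fat[c] = TOY_FAT_FREE;	/* the link to the next cluster is lost */
		n++;
		c = next;	/* TOY_FAT_EOF is >= TOY_NCLUST, so the loop ends */
	}
	return (n);
}

/* Delete: free the chain and mark the entry, but keep start_cluster. */
static int
toy_msdos_delete(uint16_t *fat, struct toy_msdos_dirent *de)
{
	assert(de->name[0] != TOY_DELETED);	/* not already deleted */
	de->name[0] = TOY_DELETED;
	return (toy_fat_free_chain(fat, de->start_cluster));
}
```

After toy_msdos_delete() on a file occupying clusters 2 -> 5 -> 6, all
three FAT slots read as free and nothing records that 5 followed 2, but
the directory entry still names cluster 2.  That is the single block
number that survives.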

For directory entries, removal is almost as harmless for ffs[1-2]
as for msdosfs.  Here is a difference of hex dumps of ffs1 directories
for live and dead entries "foo" and "bar":

% --- live	Tue Apr 11 22:09:28 2006
% +++ dead	Tue Apr 11 22:09:37 2006
% @@ -1,5 +1,5 @@
%  00000000  02 00 00 00 0C 00 04 01 2E 00 00 00 02 00 00 00  ................
              ---d_ino--- recln ty nl -name -pad- ---d_ino---
% -00000010  0C 00 04 02 2E 2E 00 00 03 00 00 00 10 00 04 05  ................
% +00000010  0C 00 04 02 2E 2E 00 00 03 00 00 00 E8 01 04 05  ................
              recln ty nl --name-- pd ---d_ino--- recln ty nl
%  00000020  2E 73 6E 61 70 00 00 00 04 00 00 00 0C 00 08 03  .snap...........
              ----name--------- -pad- ...
%  00000030  66 6F 6F 00 05 00 00 00 CC 01 04 03 62 61 72 00  foo.........bar.
%  00000040  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

Removal of "foo" and "bar" has just changed d_reclen for ".snap" so
that the subsequent entries are inside the entry for ".snap" and thus
not seen.  The subsequent entries including their inode numbers are
preserved.  More complicated removals would result in a more complicated
layout.  I think compaction of directory entries rarely if ever occurs.
Recovery of deleted directory entries in msdosfs is certainly simpler
since the record length is constant, but the constant record length
also makes overwrites of directory entries more likely.  Part of the
cleared i_mode field in inodes can be recovered from the d_type field
in directory entries, but it would be better to use an uncleared i_mode
as a consistency check for recovered directory entries.
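
As a rough illustration of how such hidden entries could be found, the
sketch below walks a directory block by d_reclen and then probes the slack
inside each live entry for leftover entries.  It is a user-space toy with
hypothetical names (toy_*), not fsck or kernel code.  It assumes the entry
layout annotated in the dump above (d_ino, d_reclen, d_type, d_namlen,
name padded to 4 bytes), little-endian byte order, a 512-byte directory
block, and zeros after the bytes shown in the dump.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_DIRBLKSIZ 512
#define TOY_HDR 8	/* d_ino(4) + d_reclen(2) + d_type(1) + d_namlen(1) */

/* The "dead" block from the dump; bytes not listed are zero. */
static const uint8_t toy_dead_blk[TOY_DIRBLKSIZ] = {
	0x02,0x00,0x00,0x00,0x0C,0x00,0x04,0x01,0x2E,0x00,0x00,0x00,0x02,0x00,0x00,0x00,
	0x0C,0x00,0x04,0x02,0x2E,0x2E,0x00,0x00,0x03,0x00,0x00,0x00,0xE8,0x01,0x04,0x05,
	0x2E,0x73,0x6E,0x61,0x70,0x00,0x00,0x00,0x04,0x00,0x00,0x00,0x0C,0x00,0x08,0x03,
	0x66,0x6F,0x6F,0x00,0x05,0x00,0x00,0x00,0xCC,0x01,0x04,0x03,0x62,0x61,0x72,0x00,
};

static unsigned rd16(const uint8_t *p) { return (p[0] | (unsigned)p[1] << 8); }
static uint32_t rd32(const uint8_t *p) { return (rd16(p) | (uint32_t)rd16(p + 2) << 16); }

/* Space an entry needs: header plus NUL-terminated name, rounded up to 4. */
static unsigned toy_dirsiz(unsigned namlen) { return (TOY_HDR + ((namlen + 4) & ~3u)); }

/* Does blk[off] look like a directory entry that fits before `end`? */
static int
toy_plausible(const uint8_t *blk, unsigned off, unsigned end)
{
	unsigned reclen, namlen;

	if (off + TOY_HDR > end)
		return (0);
	reclen = rd16(blk + off + 4);
	namlen = blk[off + 7];
	if (rd32(blk + off) == 0 || namlen == 0)
		return (0);
	if (reclen % 4 != 0 || reclen < toy_dirsiz(namlen))
		return (0);
	if (off + toy_dirsiz(namlen) > end)
		return (0);
	return (blk[off + TOY_HDR + namlen] == '\0');
}

/* Append a name plus a space to out; drop it silently if it won't fit. */
static void
toy_append(char *out, size_t outsz, const uint8_t *name, unsigned namlen)
{
	size_t used = strlen(out);

	if (used + namlen + 2 > outsz)
		return;
	memcpy(out + used, name, namlen);
	out[used + namlen] = ' ';
	out[used + namlen + 1] = '\0';
}

/* Collect live names and names found in slack space, space-separated. */
static void
toy_scan_dirblk(const uint8_t *blk, char *live, size_t livesz,
    char *dead, size_t deadsz)
{
	unsigned off = 0, end, slack, reclen, namlen;

	assert(livesz > 0 && deadsz > 0);
	live[0] = dead[0] = '\0';
	while (off + TOY_HDR <= TOY_DIRBLKSIZ) {
		reclen = rd16(blk + off + 4);
		namlen = blk[off + 7];
		if (reclen < TOY_HDR || reclen % 4 != 0 ||
		    off + reclen > TOY_DIRBLKSIZ)
			break;		/* corrupt block: stop */
		end = off + reclen;
		if (rd32(blk + off) != 0)
			toy_append(live, livesz, blk + off + TOY_HDR, namlen);
		/* Slack starts where this entry's own padded name ends. */
		slack = off + toy_dirsiz(namlen);
		while (toy_plausible(blk, slack, end)) {
			namlen = blk[slack + 7];
			toy_append(dead, deadsz, blk + slack + TOY_HDR, namlen);
			slack += toy_dirsiz(namlen);
		}
		off = end;
	}
}
```

Run on toy_dead_blk, the live list comes out as ". .. .snap " and the
slack list as "foo bar ", matching the description above: both removed
entries sit intact inside the enlarged ".snap" record.  The plausibility
test is deliberately crude; a real tool would also want to cross-check the
recovered inode numbers against the inodes themselves.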

> IOW, there is no easy way to undelete a file.

This is currently true, except in the rare case where undelete(2) works.

Bruce