Date: Sat, 4 Jul 2015 15:30:34 +0200 From: Mateusz Guzik <mjguzik@gmail.com> To: Renato Botelho <garga@FreeBSD.org> Cc: FreeBSD Hackers <freebsd-hackers@freebsd.org> Subject: Re: rename() + fsync() implementation Message-ID: <20150704133034.GA3102@dft-labs.eu> In-Reply-To: <2770DA5F-D7F2-46D9-9158-10C86115F8AC@FreeBSD.org> References: <2770DA5F-D7F2-46D9-9158-10C86115F8AC@FreeBSD.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Fri, Jul 03, 2015 at 01:59:43PM -0300, Renato Botelho wrote: > Some time ago we found a bug on pfSense and after investigating we figured out the root cause was passwd / group related tools was not checking if files were safe in disk, and system ended up with a 0 length passwd/group db after a power cycle. There are more context at revision [1]. > > After that, bapt@ suggest to do similar fix for cap_mkdb and services_mkdb, that also can be found at another review [2]. > This definitely needs an explanation what is going on here. When the problem from [1] is encountered, which file appears to be zeroed? If the new one, should not O_SYNC you added in several places take care of that? (btw, it would be nicer if that was fsynced before close instead) If the old one, this still has the window (although miniscule compared to 5 mins) since whatever crash/failure you experienced can happen before you fsync. It may be ok enough in practice, but then the question is whether O_SYNC on the new file was of any significance. Or to state differently, do callers have to fsync/O_SYNC the file they are passing as an argument? Of course it may be either one can appear truncated. Assuming the whole approach makes sense here are some comments about the code itself: > /* > * Make sure file is safe on disk. To improve performance we will call > * fsync() to the directory where file lies > */ > if (rename(tname, dbname) == -1 || > (dbname_dir = dirname(dbname)) == NULL || dirname returns a pointer to an internal buffer, so it is not suitable for use in a library function (think: foo = dirname(...); this(...);) Since it can fail and does not depend on rename, it should have been done earlier. dbname_dir looks like a weird name. Something like dbdirname would be better. > (dbname_dir_fd = open(dbname_dir, O_RDONLY|O_DIRECTORY)) == -1 || dbname_dir_fd is definitely bad. This is not a name, this is a fd. So dbdirfd. > fsync(dbname_dir_fd) != 0) { Why does this do '!= 0' instead of '== -1'? > if (dbname_dir_fd != -1) > close(dbname_dir_fd); > err(1, "Cannot rename `%s' to `%s'", tname, dbname); It could be renamed succeeded, so this msg should be modified to state what really failed. But as a library func it likely should return an error instead, indicating which part failed. > } > > if (dbname_dir_fd != -1) > close(dbname_dir_fd); > At this point dbname_dir_fd is guaranteed to be != -1. > The idea is to implement a “sync rename” function to do all these steps. I thought about it and IMO lib/libutil would be a good place to implement it. But since I’m starting to touch src now, I would like to hear more opinions about this. > > > [1] https://reviews.freebsd.org/D2978 > [2] https://reviews.freebsd.org/D2982 -- Mateusz Guzik <mjguzik gmail.com>
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20150704133034.GA3102>