From owner-freebsd-hackers@freebsd.org Mon Jul 6 13:20:26 2015 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C0F239848 for ; Mon, 6 Jul 2015 13:20:26 +0000 (UTC) (envelope-from garga.bsd@gmail.com) Received: from mail-qk0-x22f.google.com (mail-qk0-x22f.google.com [IPv6:2607:f8b0:400d:c09::22f]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 8CB3C1620 for ; Mon, 6 Jul 2015 13:20:26 +0000 (UTC) (envelope-from garga.bsd@gmail.com) Received: by qkei195 with SMTP id i195so116563420qke.3 for ; Mon, 06 Jul 2015 06:20:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:content-type:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=YHKCOfHTMeVPxdZg5WUh2e1NBeF64wulWRBtDmCWBws=; b=olohwz6mw3VoivPubBuw84Q3TZLCdtEHfC4VG7wqSOaoXDmkvCPrEDOxNgnPnFo4rY TtB5Ot1Cij8gt/HtRWWaSIsUmqTXUnkqvv8ljZdkK+CrrCoyYF3DbGN9PcrHpWCpza7g OODOh2coCFxZXtdFUKMvnXQgjUfGTzMlgqECrWo9/ZRryCE0XnEK0GWRblY7Ax3XdXru TSAam7E5/xEj+2DGZ+Y9xAGXxVyV5mI6zfugbtaRXyddF+lGPQC5RXhohAoi5mjJQhyo UrOIvHclRsJ3m3GcL4Iz6MAUktFv2+vF5BV56NiUAnj/ELSCLdG4+Z6QoRm+EiXoC7jx alfg== X-Received: by 10.140.202.65 with SMTP id x62mr75414828qha.102.1436188825449; Mon, 06 Jul 2015 06:20:25 -0700 (PDT) Received: from mbp.home (179-125-129-24.desktop.com.br. [179.125.129.24]) by mx.google.com with ESMTPSA id z81sm9241605qkg.44.2015.07.06.06.20.23 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 06 Jul 2015 06:20:24 -0700 (PDT) Sender: Renato Botelho Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2102\)) Subject: Re: rename() + fsync() implementation From: Renato Botelho In-Reply-To: <20150704133034.GA3102@dft-labs.eu> Date: Mon, 6 Jul 2015 10:20:21 -0300 Cc: FreeBSD Hackers Content-Transfer-Encoding: quoted-printable Message-Id: <4B5ABFF0-AAE5-4E74-BAD4-E7B301DC3F7D@FreeBSD.org> References: <2770DA5F-D7F2-46D9-9158-10C86115F8AC@FreeBSD.org> <20150704133034.GA3102@dft-labs.eu> To: Mateusz Guzik X-Mailer: Apple Mail (2.2102) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2015 13:20:26 -0000 > On Jul 4, 2015, at 10:30, Mateusz Guzik wrote: >=20 > On Fri, Jul 03, 2015 at 01:59:43PM -0300, Renato Botelho wrote: >> Some time ago we found a bug on pfSense and after investigating we = figured out the root cause was passwd / group related tools was not = checking if files were safe in disk, and system ended up with a 0 length = passwd/group db after a power cycle. There are more context at revision = [1]. >>=20 >> After that, bapt@ suggest to do similar fix for cap_mkdb and = services_mkdb, that also can be found at another review [2]. >>=20 >=20 > This definitely needs an explanation what is going on here. >=20 > When the problem from [1] is encountered, which file appears to be > zeroed? >=20 > If the new one, should not O_SYNC you added in several places take = care > of that? (btw, it would be nicer if that was fsynced before close > instead) According comment #11 of pfSense bug #4523 [3]: ---------- With SU, with or without J, you end up with 0 byte master.passwd, = passwd, group, pwd.db, spwd.db. Or some subset of those. Without SU, you = end up with master.passwd and/or group corrupted, containing parts of = other files in /etc/. It's replicable on stock FreeBSD 9.0 through 11-CURRENT by running the = following:=20 #!/bin/sh /usr/sbin/pw userdel -n 'admin' /usr/sbin/pw groupadd all -g 1998 /usr/sbin/pw groupadd admins -g 1999 /usr/sbin/pw groupmod all -g 1998 -M '' echo = \$6\$O/T6GYkcgYvOBTGm\$KvOh3zhFKiA6HMEPktuImAI8/cetwEFsgj7obXdeTcQvn6mhs50= HgkWt6nfnxNhTIb2w4Je6dqdKtARavxThP1 | /usr/sbin/pw usermod -q -n root -s = /bin/tcsh -H 0 echo = \$6\$O/T6GYkcgYvOBTGm\$KvOh3zhFKiA6HMEPktuImAI8/cetwEFsgj7obXdeTcQvn6mhs50= HgkWt6nfnxNhTIb2w4Je6dqdKtARavxThP1 | /usr/sbin/pw useradd -m -k = /etc/skel -o -q -u 0 -n admin -g wheel -s /bin/sh -d /root -c 'System = Administrator' -H 0 /usr/sbin/pw unlock admin -q /usr/sbin/pw groupmod all -g 1998 -M '0' /usr/sbin/pw groupmod admins -g 1999 -M '0' then power cycling the system. If using SU, you'll end up with 0 byte = files. Without SU, you'll have corrupted files containing parts of some = other file(s) in /etc. ---------- > If the old one, this still has the window (although miniscule compared > to 5 mins) since whatever crash/failure you experienced can happen > before you fsync. It may be ok enough in practice, but then the = question > is whether O_SYNC on the new file was of any significance. Or to state > differently, do callers have to fsync/O_SYNC the file they are passing > as an argument? >=20 > Of course it may be either one can appear truncated. The idea of using O_SYNC on temporary file, and call fsync() on = directory after rename came from Kirk McKusick: ---------- Applications that are properly written take these steps: 1. write new file to a temporary name. 2. fsync newly written file. 3. rename temporary name to file to be updated. 4. For faster results, fsync directory that has updated file to = commit the new file. ---------- I decided to change it on libutil because the functions are specific for = passwd and group operations, otherwise it would need to be changed on = all applications that calls it (usr.bin/chpass, usr.sbin/pw, = usr.sbin/rpc.yppasswdd). > Assuming the whole approach makes sense here are some comments about = the > code itself: >=20 >> /* >> * Make sure file is safe on disk. To improve performance we = will call >> * fsync() to the directory where file lies >> */ >> if (rename(tname, dbname) =3D=3D -1 || >> (dbname_dir =3D dirname(dbname)) =3D=3D NULL || >=20 > dirname returns a pointer to an internal buffer, so it is not suitable > for use in a library function (think: foo =3D dirname(...); = this(...);) What do you think about having dirname_r implemented? Using the same = approach we have today for basename/basename_r. > Since it can fail and does not depend on rename, it should have been > done earlier. >=20 > dbname_dir looks like a weird name. Something like dbdirname would be > better. OK >> (dbname_dir_fd =3D open(dbname_dir, O_RDONLY|O_DIRECTORY)) = =3D=3D -1 || >=20 > dbname_dir_fd is definitely bad. This is not a name, this is a fd. So > dbdirfd. OK >> fsync(dbname_dir_fd) !=3D 0) { >=20 > Why does this do '!=3D 0' instead of '=3D=3D -1=E2=80=99? No special reason, I=E2=80=99m going to change it. >> if (dbname_dir_fd !=3D -1) >> close(dbname_dir_fd); >> err(1, "Cannot rename `%s' to `%s'", tname, dbname); >=20 > It could be renamed succeeded, so this msg should be modified to state > what really failed. But as a library func it likely should return an > error instead, indicating which part failed. Agreed. >> } >>=20 >> if (dbname_dir_fd !=3D -1) >> close(dbname_dir_fd); >>=20 >=20 > At this point dbname_dir_fd is guaranteed to be !=3D -1. True, thanks for pointing this out. >> The idea is to implement a =E2=80=9Csync rename=E2=80=9D function to = do all these steps. I thought about it and IMO lib/libutil would be a = good place to implement it. But since I=E2=80=99m starting to touch src = now, I would like to hear more opinions about this. >>=20 >>=20 >> [1] https://reviews.freebsd.org/D2978 >> [2] https://reviews.freebsd.org/D2982 >=20 > --=20 > Mateusz Guzik [3] https://redmine.pfsense.org/issues/4523#note-11 -- Renato Botelho