Date: Tue, 8 Dec 2015 18:40:07 +0100
From: Fabian Keil
To: Freebsd hackers list
Subject: Re: How to get the deterministic result for FreeBSD tar(1)?
Message-ID: <20151208184007.3080da7b@fabiankeil.de>
In-Reply-To: <5666B828.5000306@rawbw.com>
References: <5666B828.5000306@rawbw.com>

Yuri wrote:

> I have two identical directories (no diffs, all identical mtime
> attributes) compressed by this command:
> find dir -print0 | LC_ALL=C sort -z | tar cf archive.tgz --format=bsdtar
> --no-recursion --null -T -
>
> The results are different: 3 files out of 10,000 have pax attributes set
> that are different:
> - 27 ctime=1449566560.642715
> +27 ctime=1449566903.167521
[...]
> So I have two questions:
> 1. How do I actually achieve the output determinism for tar(1)?

You can use an mtree spec to set fake timestamps etc.

For an example see patch 12 in this set:
https://www.fabiankeil.de/sourcecode/electrobsd/reproducible-build-goo-r291706-29246dc.diff

Patch 5 contains a script to regenerate tar files with normalized
timestamps (and some other attributes) but of course generating
the files twice is a bit silly if it can be avoided.

> 2. Is there an agreement that this is a bug that too long or non-ASCII
> path name triggers the leakage of ctime into a tar file?

My general impression is that large parts of tar's behaviour are
undefined (due to lack of documentation) and it's not obvious to me
that this isn't one of them.

Fabian
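
A minimal sketch of the mtree approach mentioned above, for illustration
only. It is not taken from the linked patch set. The function name
make_spec, the fixed modes (0644 for files, 0755 for directories), the
0:0 ownership and the zero timestamp are all assumptions made for the
example, and it assumes a relative directory name and path names without
whitespace or other characters that mtree would need escaped. Symlinks
are not handled separately.

```shell
#!/bin/sh
# Hypothetical helper: print an mtree spec for the tree rooted at $1,
# with normalized metadata instead of the values found on disk.
make_spec() {
    dir=$1
    echo '#mtree'
    # Sort in the C locale so the entry order does not depend on the
    # order in which the file system happens to return directory entries.
    find "$dir" | LC_ALL=C sort | while IFS= read -r path; do
        if [ -d "$path" ]; then
            # Directories: fixed mode, owner and timestamp (assumed values).
            printf './%s type=dir mode=0755 uid=0 gid=0 time=0.0\n' "$path"
        elif [ -f "$path" ]; then
            # Regular files: metadata is faked, the data is read from the
            # on-disk file named by contents= (relative to the current dir).
            printf './%s type=file mode=0644 uid=0 gid=0 time=0.0 contents=%s\n' \
                "$path" "$path"
        fi
    done
}
```

With bsdtar the spec can then be used as archive input, for example
"make_spec dir > spec.mtree" followed by "tar -cf archive.tar @spec.mtree",
run from the directory that contains dir. Whether this alone gets rid of
the ctime pax attribute Yuri observed is not something the sketch checks;
comparing two generated archives with cmp(1) is the obvious test.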