From: Yuri
To: Freebsd hackers list
Subject: How to get the deterministic result for FreeBSD tar(1)?
Message-ID: <5666B828.5000306@rawbw.com>
Date: Tue, 8 Dec 2015 02:59:52 -0800

I have two identical directories (no diffs, all mtime attributes identical), each archived with this command:

    find dir -print0 | LC_ALL=C sort -z | tar cf archive.tgz --format=bsdtar --no-recursion --null -T -

The resulting archives differ: 3 files out of 10,000 carry pax attributes whose values differ:

    -27 ctime=1449566560.642715
    +27 ctime=1449566903.167521

src/contrib/libarchive/archive_write_set_format_by_name.c suggests that format=bsdtar should force the ARCHIVE_FORMAT_TAR_PAX_RESTRICTED format (no attributes), unless need_extension=1 is set on a per-file basis in archive_write_set_format_pax.c. need_extension=1 is triggered by these conditions:

* too long or non-ASCII path
* too long or non-ASCII link
* too large file
* too long GID or UID
* too long or non-ASCII group name or user name
* ACL entries and extended attributes
* sparse info

In my case the file hierarchy is indeed very deep, and these three files also have the "path" attribute.

I think it is a bug that archive_write_set_format_pax.c writes the ctime attribute whenever one of the above conditions is satisfied, because ctime can't be controlled by the user and will always cause a difference.

So I have two questions:

1. How do I actually achieve deterministic output from tar(1)?
2. Is there agreement that it is a bug that a too long or non-ASCII path name triggers the leakage of ctime into a tar file?

Yuri
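
For anyone who wants to reproduce the comparison, here is a minimal sketch of one way to find which members carry differing pax records. It uses Python's standard tarfile module rather than libarchive, so it is a separate tool and not what tar(1) does internally. The helper names (pax_headers_by_member, diff_pax_headers, make_demo_archive) and the member name dir/file.txt are made up for illustration, and the demo archives are built in memory with an explicit ctime record instead of being produced by bsdtar.

```python
import io
import tarfile


def pax_headers_by_member(archive_bytes):
    """Map each member name to its pax extended-header records."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:") as tf:
        return {member.name: dict(member.pax_headers) for member in tf}


def diff_pax_headers(archive_a, archive_b):
    """Return {member: {key: (value_a, value_b)}} for pax records that differ."""
    headers_a = pax_headers_by_member(archive_a)
    headers_b = pax_headers_by_member(archive_b)
    differences = {}
    # Only members present in both archives are compared.
    for name in sorted(headers_a.keys() & headers_b.keys()):
        a, b = headers_a[name], headers_b[name]
        changed = {
            key: (a.get(key), b.get(key))
            for key in sorted(a.keys() | b.keys())
            if a.get(key) != b.get(key)
        }
        if changed:
            differences[name] = changed
    return differences


def make_demo_archive(ctime):
    """Build a one-file pax archive in memory with a given ctime record (demo data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tf:
        info = tarfile.TarInfo(name="dir/file.txt")  # hypothetical member name
        info.size = 5
        info.mtime = 1449566000
        info.pax_headers = {"ctime": ctime}  # pax record values must be strings
        tf.addfile(info, io.BytesIO(b"hello"))
    return buf.getvalue()


if __name__ == "__main__":
    first = make_demo_archive("1449566560.642715")
    second = make_demo_archive("1449566903.167521")
    for name, changed in diff_pax_headers(first, second).items():
        print(name, changed)
```

To run it against real archives, read both files in binary mode and pass their contents to diff_pax_headers. Members whose only differing key is ctime would match the symptom described above.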