Date:      Wed, 30 Aug 2023 15:59:14 +0100
From:      Doug Rabson <dfr@rabson.org>
To:        Baptiste Daroussin <bapt@freebsd.org>
Cc:        freebsd-pkgbase@freebsd.org
Subject:   Re: Repeatable builds using pkgbase
Message-ID:  <CACA0VUj=nPLpn6MCkxdUxG90bw_06Asj_oS0GSNkshyGVxK%2BgA@mail.gmail.com>
In-Reply-To: <CACA0VUi2rZ757wq_WRkaXnDZGED_%2BtDnWPUa-oWK=vNefEQYsg@mail.gmail.com>
References:  <CACA0VUgd0Az-=vj2qwirY081YEQ%2BVPutWhjU596qj05r6m%2BZyA@mail.gmail.com> <gwuqh5ghnlgvp2yizrlhiljabl65vv5illsusrvizpioihczbb@2h5kd6xmcouf> <CACA0VUi2rZ757wq_WRkaXnDZGED_%2BtDnWPUa-oWK=vNefEQYsg@mail.gmail.com>


On Mon, 21 Aug 2023 at 17:26, Doug Rabson <dfr@rabson.org> wrote:

>
>
> On Mon, 21 Aug 2023 at 17:23, Baptiste Daroussin <bapt@freebsd.org> wrote:
>
>> On Mon, Aug 21, 2023 at 02:33:24PM +0100, Doug Rabson wrote:
>> > While working on build scripts for FreeBSD container images, I wanted to
>> > get to the point where my builds are repeatable, i.e. if I create two
>> > images with the same set of packages installed in the same order, they
>> > should be identical.
>> >
>> > The main stumbling block is timestamps. I can force all the file timestamps
>> > to a fixed value with buildah using the '--timestamp' argument to either
>> > 'buildah commit' or 'buildah build' but even then, the two images have
>> > different hashes. Looking deeper, the difference is in
>> > /var/db/pkg/local.sqlite. If I compare SQL dumps of the databases from each
>> > image, I can see a timestamp embedded in the sqlite file:
>> >
>> > diff dump1 dump2
>> >
>> >
>> > 4c4
>> > < INSERT INTO packages VALUES(1,'base','FreeBSD-zoneinfo','13.2p2','zoneinfo package','zoneinfo package',NULL,NULL,'FreeBSD:13:amd64','re@FreeBSD.org','https://www.FreeBSD.org','/',731014,0,0,1,1692446701,'2$2$c9w95oqai9bwhny1k4pcg8mji77xgk43zjxxb69j1duzq5jao18wak4deer85epmfpc8ngyysyt9wu74pg7sczkqc3ekyawkfgwzi8d',NULL,NULL,0);
>> > ---
>> > > INSERT INTO packages VALUES(1,'base','FreeBSD-zoneinfo','13.2p2','zoneinfo package','zoneinfo package',NULL,NULL,'FreeBSD:13:amd64','re@FreeBSD.org','https://www.FreeBSD.org','/',731014,0,0,1,1692622924,'2$2$c9w95oqai9bwhny1k4pcg8mji77xgk43zjxxb69j1duzq5jao18wak4deer85epmfpc8ngyysyt9wu74pg7sczkqc3ekyawkfgwzi8d',NULL,NULL,0);
>> >
>> >
>> > Looking at the pkg source, I can see that the prepared statement for
>> > inserting into the packages table explicitly uses NOW() for this column.
>> > Would it be reasonable to allow changing this, e.g. by adding a command
>> > line argument to pkg to override the default? I haven't tried this to see
>> > if that makes the two databases identical - if not, I guess I'll just
>> > remove pkg metadata altogether.
>>
>> yes this would be reasonable, if you use an env var, please respect
>> SOURCE_DATE_EPOCH.
>>
> I'll try this out, probably using an env var as you suggest. Hopefully
> there is nothing non-deterministic in sqlite which would stop this from
> being reproducible.
>
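For reference, the override discussed above could follow the usual
SOURCE_DATE_EPOCH convention: use the variable when it holds a valid
non-negative integer, otherwise fall back to the current time. The sketch
below is Python purely for illustration; pkg itself is written in C, and
the helper name install_timestamp is hypothetical, not part of pkg.

```python
import os
import time


def install_timestamp(environ=os.environ):
    """Return the timestamp to record for a package installation.

    Hypothetical helper, not pkg's actual code: honours SOURCE_DATE_EPOCH
    when it is a valid non-negative integer, else uses the current time.
    """
    raw = environ.get("SOURCE_DATE_EPOCH")
    if raw is not None:
        try:
            value = int(raw)
        except ValueError:
            value = -1  # treat unparsable input as "not set"
        if value >= 0:
            return value
    return int(time.time())


if __name__ == "__main__":
    print(install_timestamp())
```

Falling back silently on a malformed value is a choice made for this
sketch; a real tool might prefer to report an error instead.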

Sadly, even if I override the timestamp written to the packages table, the
resulting local.sqlite files from two consecutive runs still differ. The
output of 'sqlite3 local.sqlite .dump' is identical for both files, so
something else in sqlite is making things non-reproducible. I guess I'll
have to fall back to plan B and remove the package metadata from my images.
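
The symptom itself (identical dumps, different bytes) is easy to reproduce
outside pkg. The Python sketch below is only an illustration under stated
assumptions: the two-column packages table, the sample rows, and all helper
names are made up, and it does not diagnose what actually differs in pkg's
local.sqlite. It builds the same logical content through different
transaction histories and then compares the dump hash with the raw file
hash. One known source of such differences is bookkeeping in the sqlite
file header, such as the file change counter; whether that is the cause
here has not been confirmed.

```python
import hashlib
import os
import sqlite3
import tempfile

# Made-up sample rows; the real packages table has many more columns.
ROWS = [(1, "FreeBSD-zoneinfo"), (2, "FreeBSD-runtime")]
SCHEMA = "CREATE TABLE packages (id INTEGER PRIMARY KEY, name TEXT)"


def build_single_txn(path):
    """Create the table and insert all rows in one transaction."""
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("BEGIN")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO packages VALUES (?, ?)", ROWS)
    conn.execute("COMMIT")
    conn.close()


def build_many_txns(path):
    """Reach the same rows via several autocommitted statements."""
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(SCHEMA)
    for row in ROWS:
        conn.execute("INSERT INTO packages VALUES (?, ?)", row)
    # A scratch row that is inserted and deleted again leaves no trace
    # in the dump, but it is part of the file's write history.
    conn.execute("INSERT INTO packages VALUES (99, 'scratch')")
    conn.execute("DELETE FROM packages WHERE id = 99")
    conn.close()


def dump_digest(path):
    """SHA-256 of the SQL dump (the logical content)."""
    conn = sqlite3.connect(path)
    dump = "\n".join(conn.iterdump())
    conn.close()
    return hashlib.sha256(dump.encode("utf-8")).hexdigest()


def file_digest(path):
    """SHA-256 of the raw database file (what an image layer hashes)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def compare_builds():
    """Return (dumps_equal, files_equal) for the two build strategies."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.sqlite")
        b = os.path.join(tmp, "b.sqlite")
        build_single_txn(a)
        build_many_txns(b)
        return (dump_digest(a) == dump_digest(b),
                file_digest(a) == file_digest(b))


if __name__ == "__main__":
    dumps_equal, files_equal = compare_builds()
    print("dumps equal:", dumps_equal)
    print("files equal:", files_equal)
```

Comparing dump hashes rather than file hashes is one way to check that two
images carry the same package metadata even when the database files are not
byte-identical.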

>
>

