Date: Wed, 30 Aug 2023 16:09:30 +0100
From: Doug Rabson <dfr@rabson.org>
To: Baptiste Daroussin <bapt@freebsd.org>
Cc: freebsd-pkgbase@freebsd.org
Subject: Re: Repeatable builds using pkgbase
Message-ID: <CACA0VUhWfmhC9TeMWouMy=pBnW4dgEakND0sn2JgZ7Pvt5_Vig@mail.gmail.com>
In-Reply-To: <CACA0VUj=nPLpn6MCkxdUxG90bw_06Asj_oS0GSNkshyGVxK+gA@mail.gmail.com>
References: <CACA0VUgd0Az-=vj2qwirY081YEQ+VPutWhjU596qj05r6m+ZyA@mail.gmail.com>
 <gwuqh5ghnlgvp2yizrlhiljabl65vv5illsusrvizpioihczbb@2h5kd6xmcouf>
 <CACA0VUi2rZ757wq_WRkaXnDZGED_+tDnWPUa-oWK=vNefEQYsg@mail.gmail.com>
 <CACA0VUj=nPLpn6MCkxdUxG90bw_06Asj_oS0GSNkshyGVxK+gA@mail.gmail.com>
[-- Attachment #1 --]
On Wed, 30 Aug 2023 at 15:59, Doug Rabson <dfr@rabson.org> wrote:

> On Mon, 21 Aug 2023 at 17:26, Doug Rabson <dfr@rabson.org> wrote:
>
>> On Mon, 21 Aug 2023 at 17:23, Baptiste Daroussin <bapt@freebsd.org> wrote:
>>
>>> On Mon, Aug 21, 2023 at 02:33:24PM +0100, Doug Rabson wrote:
>>> > While working on build scripts for FreeBSD container images, I wanted to
>>> > get to the point where my builds are repeatable, i.e. if I create two
>>> > images with the same set of packages installed in the same order, they
>>> > should be identical.
>>> >
>>> > The main stumbling block is timestamps. I can force all the file timestamps
>>> > to a fixed value with buildah using the '--timestamp' argument to either
>>> > 'buildah commit' or 'buildah build' but even then, the two images have
>>> > different hashes. Looking deeper, the difference is in
>>> > /var/db/pkg/local.sqlite. If I compare SQL dumps of the databases from each
>>> > image, I can see a timestamp embedded in the sqlite file:
>>> >
>>> > diff dump1 dump2
>>> >
>>> > 4c4
>>> > < INSERT INTO packages VALUES(1,'base','FreeBSD-zoneinfo','13.2p2','zoneinfo package','zoneinfo package',NULL,NULL,'FreeBSD:13:amd64','re@FreeBSD.org','https://www.FreeBSD.org','/',731014,0,0,1,1692446701,'2$2$c9w95oqai9bwhny1k4pcg8mji77xgk43zjxxb69j1duzq5jao18wak4deer85epmfpc8ngyysyt9wu74pg7sczkqc3ekyawkfgwzi8d',NULL,NULL,0);
>>> > ---
>>> > > INSERT INTO packages VALUES(1,'base','FreeBSD-zoneinfo','13.2p2','zoneinfo package','zoneinfo package',NULL,NULL,'FreeBSD:13:amd64','re@FreeBSD.org','https://www.FreeBSD.org','/',731014,0,0,1,1692622924,'2$2$c9w95oqai9bwhny1k4pcg8mji77xgk43zjxxb69j1duzq5jao18wak4deer85epmfpc8ngyysyt9wu74pg7sczkqc3ekyawkfgwzi8d',NULL,NULL,0);
>>> >
>>> > Looking at the pkg source, I can see that the prepared statement for
>>> > inserting into the packages table explicitly uses NOW() for this column.
>>> > Would it be reasonable to allow changing this, e.g. by adding a command
>>> > line argument to pkg to override the default? I haven't tried this to see
>>> > if that makes the two databases identical - if not, I guess I'll just
>>> > remove pkg metadata altogether.
>>>
>>> yes this would be reasonable, if you use an env var, please respect
>>> SOURCE_DATE_EPOCH.
>>
>> I'll try this out, probably using an env var as you suggest. Hopefully
>> there is nothing non-deterministic in sqlite which would stop this from
>> being reproducible.
>
> Sadly, even if I override the timestamp written to the packages table, the
> resulting local.sqlite files on two consecutive runs are still different.
> If I compare the two using 'sqlite3 local.sqlite .dump', the sql dumps are
> identical so there is something else in sqlite which is making things
> non-reproducible. I guess I'll have to fall back to plan B and remove the
> package metadata from my images.

Weirdly, if I regenerate the local.sqlite file using sqlite3's .dump and
.read commands, the resulting DB file does have a consistent hash so that
might be a plan C.
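
For anyone who wants to try the "plan C" idea outside the sqlite3 shell, here is a minimal sketch using Python's standard sqlite3 module. It is an illustration under assumptions, not something taken from pkg or from the build scripts in this thread: the helper names rebuild_database and sha256_of are made up, Connection.iterdump() stands in for the shell's .dump, and executescript() stands in for .read. Byte-identical output is only expected when both rebuilds use the same SQLite version and settings, and this has not been checked against a real /var/db/pkg/local.sqlite.

```python
import hashlib
import sqlite3
from pathlib import Path


def rebuild_database(src: Path, dst: Path) -> None:
    """Recreate the database at src as a fresh file at dst by replaying its SQL dump.

    Hypothetical helper: roughly what '.dump' followed by '.read' does in the
    sqlite3 shell, so the new file depends only on the logical contents.
    """
    if dst.exists():
        dst.unlink()  # always start from an empty file

    source = sqlite3.connect(src)
    try:
        # iterdump() yields the schema and rows as SQL text, one statement at a time.
        dump = "\n".join(source.iterdump())
    finally:
        source.close()

    target = sqlite3.connect(dst)
    try:
        # Replay the dump into the new, empty database.
        target.executescript(dump)
    finally:
        target.close()


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()
```

The intended use would be to call rebuild_database() on the local.sqlite from each build and compare sha256_of() of the two rebuilt files. If the digests match while the original files differ, the difference lies in how the file was written rather than in the package data itself.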
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CACA0VUhWfmhC9TeMWouMy=pBnW4dgEakND0sn2JgZ7Pvt5_Vig>
