Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 14 May 2015 00:12:07 +0200
From:      Baptiste Daroussin <bapt@freebsd.org>
To:        Scott Bennett <bennett@sdf.org>
Cc:        bsd-lists@bsdforge.com, freebsd-ports@freebsd.org
Subject:   Re: damage to pkg's sqlite data base
Message-ID:  <20150513221207.GE90457@ivaldir.etoilebsd.net>
In-Reply-To: <201505132152.t4DLqbLK000601@sdf.org>
References:  <201505120617.t4C6HkA3019096@sdf.org> <0dcbdcb9fdfdebc53a2563b11494f61b@ultimatedns.net> <201505130512.t4D5Cpqa011080@sdf.org> <5f1749d3e664fbb487aa98f160e6cf92@ultimatedns.net> <201505130720.t4D7KtMd016553@sdf.org> <a65a967965bd406b674ca618265d8b22@ultimatedns.net> <20150513145004.GA90457@ivaldir.etoilebsd.net> <201505132152.t4DLqbLK000601@sdf.org>

next in thread | previous in thread | raw e-mail | index | archive | help

--zjcmjzIkjQU2rmur
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, May 13, 2015 at 04:52:37PM -0500, Scott Bennett wrote:
>      I'm short on time right now, so I'll defer my replies to Andrew Clark
> and Chris H until late tonight after I (hope to) have had a chance to try
> out Andrew's suggested procedure.  I do have sqlite installed, though (see
> far below).
>=20
> Baptiste Daroussin <bapt@freebsd.org> wrote:
> > On Wed, May 13, 2015 at 06:48:12AM -0700, Chris H wrote:
> > > On Wed, 13 May 2015 02:20:55 -0500 Scott Bennett <bennett@sdf.org> wr=
ote
> > >=20
> > > > "Chris H" <bsd-lists@bsdforge.com> wrote:
> > > > > On Wed, 13 May 2015 00:12:51 -0500 Scott Bennett <bennett@sdf.org=
> wrote
> > > > > > "Chris H" <bsd-lists@bsdforge.com> wrote:
> > > > > > > On Tue, 12 May 2015 01:17:46 -0500 Scott Bennett <bennett@sdf=
=2Eorg>
> > > > > > wrote >
> > > > > > > > For nearly two weeks I've been stymied by an apparently dam=
aged
> > > > > > record > > in the sqlite data base used by pkg(8) and pkg-stati=
c(8).=20
> > > > > > Unfortunately, > > it is a record for a port that is depended u=
pon rather
> > > > > > heavily, lang/gcc. > > lang/gcc compiled and linked just fine, =
but any
> > > > > > attempt to install the > > result ends up like this.
> > > > > > > >=20
> > > > > > > > =3D=3D=3D>  Checking if gcc already installed
> > > > > > > > =3D=3D=3D>   Registering installation for gcc-4.8.4_3
> > > > > > > > Installing gcc-4.8.4_3...
> > > > > > > > pkg-static: sqlite error while executing iterator in file
> > > > > > > > pkgdb_iterator.c:931: database disk image is malformed pkg-=
static:
> > > > > > sqlite > > error while executing INSERT OR REPLACE INTO files (=
path,
> > > > > > sha256, > > package_id) VALUES (?1, ?2, ?3) in file pkgdb.c:172=
2:
> > > > > > database disk image > > is malformed *** Error code 70
> > > > > > > >=20
> > > > > > > > Stop.
> > > > > > > > make: stopped in /usr/ports/lang/gcc
> > > > > > > >=20
> > > > > > > >      I'm not familiar with sqlite commands, nor with the st=
ructure of
> > > > > > the > > pkg data base, nor any of several other relevant topics=
=2E  But I
> > > > > > do have=20
> > > >
> > > >      If someone knows sqlite well enough to look at the error messa=
ges above
> > > > and then give me directions to delete the offending records, then p=
erhaps the
> > > > next attempt to install lang/gcc might actually work.  Anyone?
> > > >=20
> > > > > > > > questions:
> > > > > > > >=20
> > > > > > > >     1) how might the damage have happened?
> > > > > > > >=20
> > > > > > > >     2) how might such an event be avoided/prevented in the =
future?
> > > > > > > >=20
> > > > > > > >     3) most urgently, how can I *fix* it?
> > > > > > > You should get a backup made on the nightly cron(8) (periodic=
(8)).
> > > > > > > It'll be located in /var/backups
> > > > > > > Off the top of my head I can't remember which of the 2, is wh=
ich.
> > > > > > > But a simple size comparison should tell the tale. :)
> > > > > >=20
> > > > > >      The problem began much farther back than two days ago, but=
 I should
> > > > > > be able to restore it from two weeks or more in the past and th=
en find
> > > > > > the most recent version that is not corrupted.
> > > > > >=20
> > > > > > > Simply rename your (now) corrupt db, and copy the backup over.
> > > > > >=20
> > > > > >      However, if I do that, then what happens to all the ports =
that have
> > > > > > been updated or added since that version of the data base was b=
acked up?
> > > > > > I have run "portmaster -a" (with some additional options) quite=
 a few
> > > > > > times since the lang/gcc problem first appeared, so an old loca=
l.sqlite
> > > > > > will no longer accurately reflect what is currently installed.
> > > > > > >
> > > > > > > I ran into this problem as well, and this was how I was final=
ly
> > > > > > > able to overcome it.
> > > > > > >
> > > > > > > Hope this helps!
> > > > > > >
> > > > > > > --Chris
> > > > > > >
> > > > > > > >=20
> > > > > > > >     4) I was unable to find any instructions for recreating=
 a pkg
> > > > > > data > >     base if the data base gets damaged/destroyed.  Is =
there a
> > > > > > way to > >     do that that I missed?
> > > > > >=20
> > > > > >      There must be a way to do this, right?  I mean, really, it=
's pretty
> > > > > > fundamental that no new data base be put into production withou=
t a way to
> > > > > > rebuild it.  The FreeBSD developers haven't really broken so an=
cient and
> > > > > > basic a principle, have they?  So what's the trick?  What is th=
e method
> > > > > > to rebuild /var/db/pkg/local.sqlite from scratch based upon the=
 currently
> > > > > > installed ports/packages?
> > > > > I whined about it the first time my DB blew up. It's become
> > > > > corrupted several times since on different boxes/versions. *but*
> > > > > after the first time, I made it a habit of making a copy of it *b=
efore*
> > > > > embarking on an upgrade, or install of any ports. Seems it's the =
only
> > > > > way to save yourself. pkg(8), to the best of my knowledge doesn't
> > > > > perform a backup prior to any of this. So I *do*.
> > > > > Annoying, but it beats attempting to recover from a failed port
> > > > > upgrade/install.
> > > >=20
> > > >      Chris, thanks for your responses, depressing though they be.
> > > >      I have been using portmaster for ages to update my installed p=
orts
> > > > and usually to install them in the first place.  Does your suggesti=
on
> > > > mean abandoning portmaster?
> > > No. I'm an avid user of portmaster(8). In fact I intend to be it's
> > > maintainer. :)
> > > >  If it is necessary to back local.sqlite up
> > > > before each installation, that would seem to imply that portmaster =
would
> > > > only be useful in getting a list of ports to be updated, after whic=
h each
> > > > port would have to be updated individually, so that local.sqlite co=
uld be
> > > > backed up before each one.  I have over 1900 ports installed, so th=
at would
> > > > mean just never updating my ports at all, except by rebuilding all =
of them
> > > > upon upgrading to a new major release of FreeBSD.
> > > I
> > > >      Bapt, oh, Bapt, whatever were you folks thinking?  Did you rea=
lly
> > > > leave us no practical recovery method?
> > > In Bapt's humble defense. It's difficult, to impossible for him to fix
> > > a problem w/o decent clues. So I wouldn't go pointing any fingers, ju=
st
> > > yet. :) In other words; w/o a cause, what should he fix? :)
> > > What recovery would you suggest? Truth is, if you brushed up on sqlit=
e3
> > > you could create a fresh DB, using output from portmaster(8). Trouble=
some,
> > > perhaps. But at least an alternative.
>=20
>      Chris?  You already responded to my question 4) above, which answers
> your question.  Why are you now asking this?  Any data base intended for
> production use MUST be accompanied by a method of recreating it before
> the data base goes into production.  This has been the case as long as
> data bases have been used.  The Wallendas may walk without nets between
> skyscraper roofs, but I assure you that a) not all of them arrive at the
> intended destinations and b) normal mortals have a different perspective
> on that sort of risk.
>      Further, if portmaster produces that output *from the corrupted data
> base*, then it stands to reason that other cases of corruption that diffe=
red
> in particulars from this one might prevent portmaster from providing said
> output.
>=20
> > > >=20
> > Coming very late to the discussion I would like to understand what lead=
 you to a
> > broken database, from what I am aware of the only way to end up with a =
broken db
> > is using pkg over nfs without rpc lock properly running (note that the =
default
> >
> > nfs client service on freebsd does not start rpc lock...)
> >
> > pkg work around this case by detecting the filesystem it is running on =
and
> > changing the locking system it uses when running on network filesystem =
so this
> > problem cannot in theory happen anymore even if one forgot to start rpc=
 lockd.
> >
> > It remains however one case where this can happen: it is one the host i=
s running
> > over nfs and it starts a jail, in that case, the jail does not know it =
is
> > running over nfs so pkg cannot know it needs to switch the locking syst=
em.
> >
> > Are you in one of those situation, if not can you describe a a bit more=
 your
> > system?
>=20
> FreeBSD hellas 10.1-STABLE FreeBSD 10.1-STABLE #54 r282368: Sun May  3 15=
:48:37 CDT 2015     bennett@hellas:/usr/obj/usr/src/sys/hellas  amd64
>      I have only one machine running these days, so there is no NFS.  At
> present, I have no jails set up.
>      Here are data concerning the relevant drive, which is the boot drive.
> There are many other file systems on this machine, but they all involve
> external devices and are unrelated to this issue, except for /buildwork,
> where WRKDIRPREFIX=3D/buildwork/ports and CCACHE_DIR=3D/buildwork/ccache.=
  The
> same sort of information for /buildwork is included below.
>=20
> /dev/mirror/fbsds1a on / (ufs, local)
> devfs on /dev (devfs, local, multilabel)
> /dev/mirror/fbsds1d on /var (ufs, local)
> /dev/mirror/fbsds1e on /usr (ufs, local, soft-updates)
> /dev/mirror/fbsds1f on /usr/local (ufs, local)
> /dev/mirror/fbsds3a on /usr/home (ufs, local)
> /dev/mirror/fbsds3d on /usr/ports (ufs, local, soft-updates)
> /dev/ufs/stripework on /buildwork (ufs, local, soft-updates)
>=20
>  mirror/fbsds1  COMPLETE  ada0s1 (ACTIVE)
> mirror/fbsds3a  COMPLETE  ada0s3a (ACTIVE)
> mirror/fbsds3d  COMPLETE  ada0s3d (ACTIVE)
>                             ufs/stripework     N/A  stripe/5x15g1
> stripe/5x15g1      UP  da0p5
>                        da2p5
>                        da3p5
>                        da5p5
>                        da6p5
>=20
> Note that the mirrors listed above have only one device in each.  At one
> point a few weeks ago, there were two devices in each for a bit under
> eight hours until the brand-new second drive failed.  Unfortunately, it
> turns out that the manufacturer does not provide any support for its
> internal hard drive products, other than replacement under warranty via
> a web page that requires an unsecured web browser to apply for the
> replacement.  Grrr.  Until I get that straightened out and have a second
> drive, all of the mirrors listed above on the boot drive are unreplicated.
> >
> > Now by default pkg creates back up of the db daily to allow recovery, t=
here was
> > an issue before pkg 1.5 which requires a manual intervention and fixed =
after pkg
> > 1.5 so it is easier to recover.
>=20
>      One serious problem for recovery is that I don't know when the
> corruption actually occurred.  All I can tell you is that it must have
> happened no earlier than the previous update to lang/gcc (lang/gcc48).
> The error messages first appeared with the update to gcc-4.8.4_3.
> >
> > sqlite is a pretty solid db I would like to understand how you ended up=
 with a
> > corrupted db if not what I described above.
>=20
>      So would I because I would seriously like it not to happen again.
> Long ago, I used to use portupgrade, but I eventually switched to
> portmaster for three reasons:  speed (ruby is awfully slow), dougb@
> massively reworked portmaster and made it work much more correctly
> than portupgrade, and portupgrade's frustrating habit of always deciding
> that the package data base had to be rebuilt from nothing, which always
> took a long time and failed as often as it succeeded.  Because portmaster
> now has to use a data base instead of direct evidence, it seems that that
> fourth complaint has partly returned to haunt us.
>      sqlite3-3.8.9_1 is what is installed on my system.  I have no idea
> which version is incorporated into pkg or pkg-static.
>=20
pkg incorporates a sqlite3 3.8.9 version but that is not a problem as all is
compatible.

if your date were corrupted during the last backups then the backups will be
empty.

Try
xzcat /var/backups/pkg.sql.xz | head -1
if the output is:
pragma user_version=3D 31;

then via command line:
unxz /var/backup/pkg.sql.xz | sqlite3 /var/db/local.sqlite

If not then
unxz /var/backup/pkg.sql.xz | sqlite3 /var/db/local.sqlite
sqlite3 /var/db/local.sqlite "pragma user_version=3D 31;"

That should just work
pkg info should now show you your packages.

Best regards,
Bapt

--zjcmjzIkjQU2rmur
Content-Type: application/pgp-signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iEYEARECAAYFAlVTzDcACgkQ8kTtMUmk6EwRWACeMBHIBXCxETVekNhZumBvNocu
4vMAnAwy/nOiXms7+BX2kneY0HKCgfRt
=9wI1
-----END PGP SIGNATURE-----

--zjcmjzIkjQU2rmur--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20150513221207.GE90457>