Date: Thu, 14 May 2015 00:12:07 +0200 From: Baptiste Daroussin <bapt@freebsd.org> To: Scott Bennett <bennett@sdf.org> Cc: bsd-lists@bsdforge.com, freebsd-ports@freebsd.org Subject: Re: damage to pkg's sqlite data base Message-ID: <20150513221207.GE90457@ivaldir.etoilebsd.net> In-Reply-To: <201505132152.t4DLqbLK000601@sdf.org> References: <201505120617.t4C6HkA3019096@sdf.org> <0dcbdcb9fdfdebc53a2563b11494f61b@ultimatedns.net> <201505130512.t4D5Cpqa011080@sdf.org> <5f1749d3e664fbb487aa98f160e6cf92@ultimatedns.net> <201505130720.t4D7KtMd016553@sdf.org> <a65a967965bd406b674ca618265d8b22@ultimatedns.net> <20150513145004.GA90457@ivaldir.etoilebsd.net> <201505132152.t4DLqbLK000601@sdf.org>
next in thread | previous in thread | raw e-mail | index | archive | help
--zjcmjzIkjQU2rmur Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, May 13, 2015 at 04:52:37PM -0500, Scott Bennett wrote: > I'm short on time right now, so I'll defer my replies to Andrew Clark > and Chris H until late tonight after I (hope to) have had a chance to try > out Andrew's suggested procedure. I do have sqlite installed, though (see > far below). >=20 > Baptiste Daroussin <bapt@freebsd.org> wrote: > > On Wed, May 13, 2015 at 06:48:12AM -0700, Chris H wrote: > > > On Wed, 13 May 2015 02:20:55 -0500 Scott Bennett <bennett@sdf.org> wr= ote > > >=20 > > > > "Chris H" <bsd-lists@bsdforge.com> wrote: > > > > > On Wed, 13 May 2015 00:12:51 -0500 Scott Bennett <bennett@sdf.org= > wrote > > > > > > "Chris H" <bsd-lists@bsdforge.com> wrote: > > > > > > > On Tue, 12 May 2015 01:17:46 -0500 Scott Bennett <bennett@sdf= =2Eorg> > > > > > > wrote > > > > > > > > > For nearly two weeks I've been stymied by an apparently dam= aged > > > > > > record > > in the sqlite data base used by pkg(8) and pkg-stati= c(8).=20 > > > > > > Unfortunately, > > it is a record for a port that is depended u= pon rather > > > > > > heavily, lang/gcc. > > lang/gcc compiled and linked just fine, = but any > > > > > > attempt to install the > > result ends up like this. > > > > > > > >=20 > > > > > > > > =3D=3D=3D> Checking if gcc already installed > > > > > > > > =3D=3D=3D> Registering installation for gcc-4.8.4_3 > > > > > > > > Installing gcc-4.8.4_3... > > > > > > > > pkg-static: sqlite error while executing iterator in file > > > > > > > > pkgdb_iterator.c:931: database disk image is malformed pkg-= static: > > > > > > sqlite > > error while executing INSERT OR REPLACE INTO files (= path, > > > > > > sha256, > > package_id) VALUES (?1, ?2, ?3) in file pkgdb.c:172= 2: > > > > > > database disk image > > is malformed *** Error code 70 > > > > > > > >=20 > > > > > > > > Stop. > > > > > > > > make: stopped in /usr/ports/lang/gcc > > > > > > > >=20 > > > > > > > > I'm not familiar with sqlite commands, nor with the st= ructure of > > > > > > the > > pkg data base, nor any of several other relevant topics= =2E But I > > > > > > do have=20 > > > > > > > > If someone knows sqlite well enough to look at the error messa= ges above > > > > and then give me directions to delete the offending records, then p= erhaps the > > > > next attempt to install lang/gcc might actually work. Anyone? > > > >=20 > > > > > > > > questions: > > > > > > > >=20 > > > > > > > > 1) how might the damage have happened? > > > > > > > >=20 > > > > > > > > 2) how might such an event be avoided/prevented in the = future? > > > > > > > >=20 > > > > > > > > 3) most urgently, how can I *fix* it? > > > > > > > You should get a backup made on the nightly cron(8) (periodic= (8)). > > > > > > > It'll be located in /var/backups > > > > > > > Off the top of my head I can't remember which of the 2, is wh= ich. > > > > > > > But a simple size comparison should tell the tale. :) > > > > > >=20 > > > > > > The problem began much farther back than two days ago, but= I should > > > > > > be able to restore it from two weeks or more in the past and th= en find > > > > > > the most recent version that is not corrupted. > > > > > >=20 > > > > > > > Simply rename your (now) corrupt db, and copy the backup over. > > > > > >=20 > > > > > > However, if I do that, then what happens to all the ports = that have > > > > > > been updated or added since that version of the data base was b= acked up? > > > > > > I have run "portmaster -a" (with some additional options) quite= a few > > > > > > times since the lang/gcc problem first appeared, so an old loca= l.sqlite > > > > > > will no longer accurately reflect what is currently installed. > > > > > > > > > > > > > > I ran into this problem as well, and this was how I was final= ly > > > > > > > able to overcome it. > > > > > > > > > > > > > > Hope this helps! > > > > > > > > > > > > > > --Chris > > > > > > > > > > > > > > >=20 > > > > > > > > 4) I was unable to find any instructions for recreating= a pkg > > > > > > data > > base if the data base gets damaged/destroyed. Is = there a > > > > > > way to > > do that that I missed? > > > > > >=20 > > > > > > There must be a way to do this, right? I mean, really, it= 's pretty > > > > > > fundamental that no new data base be put into production withou= t a way to > > > > > > rebuild it. The FreeBSD developers haven't really broken so an= cient and > > > > > > basic a principle, have they? So what's the trick? What is th= e method > > > > > > to rebuild /var/db/pkg/local.sqlite from scratch based upon the= currently > > > > > > installed ports/packages? > > > > > I whined about it the first time my DB blew up. It's become > > > > > corrupted several times since on different boxes/versions. *but* > > > > > after the first time, I made it a habit of making a copy of it *b= efore* > > > > > embarking on an upgrade, or install of any ports. Seems it's the = only > > > > > way to save yourself. pkg(8), to the best of my knowledge doesn't > > > > > perform a backup prior to any of this. So I *do*. > > > > > Annoying, but it beats attempting to recover from a failed port > > > > > upgrade/install. > > > >=20 > > > > Chris, thanks for your responses, depressing though they be. > > > > I have been using portmaster for ages to update my installed p= orts > > > > and usually to install them in the first place. Does your suggesti= on > > > > mean abandoning portmaster? > > > No. I'm an avid user of portmaster(8). In fact I intend to be it's > > > maintainer. :) > > > > If it is necessary to back local.sqlite up > > > > before each installation, that would seem to imply that portmaster = would > > > > only be useful in getting a list of ports to be updated, after whic= h each > > > > port would have to be updated individually, so that local.sqlite co= uld be > > > > backed up before each one. I have over 1900 ports installed, so th= at would > > > > mean just never updating my ports at all, except by rebuilding all = of them > > > > upon upgrading to a new major release of FreeBSD. > > > I > > > > Bapt, oh, Bapt, whatever were you folks thinking? Did you rea= lly > > > > leave us no practical recovery method? > > > In Bapt's humble defense. It's difficult, to impossible for him to fix > > > a problem w/o decent clues. So I wouldn't go pointing any fingers, ju= st > > > yet. :) In other words; w/o a cause, what should he fix? :) > > > What recovery would you suggest? Truth is, if you brushed up on sqlit= e3 > > > you could create a fresh DB, using output from portmaster(8). Trouble= some, > > > perhaps. But at least an alternative. >=20 > Chris? You already responded to my question 4) above, which answers > your question. Why are you now asking this? Any data base intended for > production use MUST be accompanied by a method of recreating it before > the data base goes into production. This has been the case as long as > data bases have been used. The Wallendas may walk without nets between > skyscraper roofs, but I assure you that a) not all of them arrive at the > intended destinations and b) normal mortals have a different perspective > on that sort of risk. > Further, if portmaster produces that output *from the corrupted data > base*, then it stands to reason that other cases of corruption that diffe= red > in particulars from this one might prevent portmaster from providing said > output. >=20 > > > >=20 > > Coming very late to the discussion I would like to understand what lead= you to a > > broken database, from what I am aware of the only way to end up with a = broken db > > is using pkg over nfs without rpc lock properly running (note that the = default > > > > nfs client service on freebsd does not start rpc lock...) > > > > pkg work around this case by detecting the filesystem it is running on = and > > changing the locking system it uses when running on network filesystem = so this > > problem cannot in theory happen anymore even if one forgot to start rpc= lockd. > > > > It remains however one case where this can happen: it is one the host i= s running > > over nfs and it starts a jail, in that case, the jail does not know it = is > > running over nfs so pkg cannot know it needs to switch the locking syst= em. > > > > Are you in one of those situation, if not can you describe a a bit more= your > > system? >=20 > FreeBSD hellas 10.1-STABLE FreeBSD 10.1-STABLE #54 r282368: Sun May 3 15= :48:37 CDT 2015 bennett@hellas:/usr/obj/usr/src/sys/hellas amd64 > I have only one machine running these days, so there is no NFS. At > present, I have no jails set up. > Here are data concerning the relevant drive, which is the boot drive. > There are many other file systems on this machine, but they all involve > external devices and are unrelated to this issue, except for /buildwork, > where WRKDIRPREFIX=3D/buildwork/ports and CCACHE_DIR=3D/buildwork/ccache.= The > same sort of information for /buildwork is included below. >=20 > /dev/mirror/fbsds1a on / (ufs, local) > devfs on /dev (devfs, local, multilabel) > /dev/mirror/fbsds1d on /var (ufs, local) > /dev/mirror/fbsds1e on /usr (ufs, local, soft-updates) > /dev/mirror/fbsds1f on /usr/local (ufs, local) > /dev/mirror/fbsds3a on /usr/home (ufs, local) > /dev/mirror/fbsds3d on /usr/ports (ufs, local, soft-updates) > /dev/ufs/stripework on /buildwork (ufs, local, soft-updates) >=20 > mirror/fbsds1 COMPLETE ada0s1 (ACTIVE) > mirror/fbsds3a COMPLETE ada0s3a (ACTIVE) > mirror/fbsds3d COMPLETE ada0s3d (ACTIVE) > ufs/stripework N/A stripe/5x15g1 > stripe/5x15g1 UP da0p5 > da2p5 > da3p5 > da5p5 > da6p5 >=20 > Note that the mirrors listed above have only one device in each. At one > point a few weeks ago, there were two devices in each for a bit under > eight hours until the brand-new second drive failed. Unfortunately, it > turns out that the manufacturer does not provide any support for its > internal hard drive products, other than replacement under warranty via > a web page that requires an unsecured web browser to apply for the > replacement. Grrr. Until I get that straightened out and have a second > drive, all of the mirrors listed above on the boot drive are unreplicated. > > > > Now by default pkg creates back up of the db daily to allow recovery, t= here was > > an issue before pkg 1.5 which requires a manual intervention and fixed = after pkg > > 1.5 so it is easier to recover. >=20 > One serious problem for recovery is that I don't know when the > corruption actually occurred. All I can tell you is that it must have > happened no earlier than the previous update to lang/gcc (lang/gcc48). > The error messages first appeared with the update to gcc-4.8.4_3. > > > > sqlite is a pretty solid db I would like to understand how you ended up= with a > > corrupted db if not what I described above. >=20 > So would I because I would seriously like it not to happen again. > Long ago, I used to use portupgrade, but I eventually switched to > portmaster for three reasons: speed (ruby is awfully slow), dougb@ > massively reworked portmaster and made it work much more correctly > than portupgrade, and portupgrade's frustrating habit of always deciding > that the package data base had to be rebuilt from nothing, which always > took a long time and failed as often as it succeeded. Because portmaster > now has to use a data base instead of direct evidence, it seems that that > fourth complaint has partly returned to haunt us. > sqlite3-3.8.9_1 is what is installed on my system. I have no idea > which version is incorporated into pkg or pkg-static. >=20 pkg incorporates a sqlite3 3.8.9 version but that is not a problem as all is compatible. if your date were corrupted during the last backups then the backups will be empty. Try xzcat /var/backups/pkg.sql.xz | head -1 if the output is: pragma user_version=3D 31; then via command line: unxz /var/backup/pkg.sql.xz | sqlite3 /var/db/local.sqlite If not then unxz /var/backup/pkg.sql.xz | sqlite3 /var/db/local.sqlite sqlite3 /var/db/local.sqlite "pragma user_version=3D 31;" That should just work pkg info should now show you your packages. Best regards, Bapt --zjcmjzIkjQU2rmur Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iEYEARECAAYFAlVTzDcACgkQ8kTtMUmk6EwRWACeMBHIBXCxETVekNhZumBvNocu 4vMAnAwy/nOiXms7+BX2kneY0HKCgfRt =9wI1 -----END PGP SIGNATURE----- --zjcmjzIkjQU2rmur--
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20150513221207.GE90457>