Date: Tue, 22 Feb 2005 11:57:50 -0800
From: Sean Chittenden <sean@gigave.com>
To: Scott Long <scottl@samsco.org>
Cc: amd64@freebsd.org
Subject: Re: Crash dumps not working correctly for amd64?
Message-ID: <20050222195750.GA40969@sean.gigave.com>
In-Reply-To: <421ACBFF.6030606@samsco.org>
References: <20050221211056.GC826@sean.gigave.com> <421ACBFF.6030606@samsco.org>

--gKMricLos+KVdGMg
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

> >Howdy.  I've got myself an interesting situation.  It seems as though
> >amd64 is unable to collect crash dumps via savecore(8).  Has anyone
> >else seen this?  From dmesg(1):
> >
> >Checking for core dump on /dev/da0s1b ...
> >savecore: first and last dump headers disagree on /dev/da0s1b
> >Feb 21 12:45:59 host savecore: first and last dump headers disagree on
> >/dev/da0s1b
> >savecore: unsaved dumps found but not saved
> >
> >???  sys/amd64/amd64/dump_machdep.c and sys/i386/i386/dump_machdep.c
> >are essentially identical.  I'm not familiar enough with these
> >innards, but reviewing savecore(8) didn't point out anything obvious.
> >I'm dumping onto a twa(4) controller.
> >
> >Are there any known workarounds to get this info?  I'm tempted to turn
> >off swap in fstab(5) that way the next time the machine comes up after
> >a crash, it'll still have the dump intact and could poke at it as
> >time permitted.  Other suggestions?  -sc
>
> Can you modify savecore to dump the headers anyways so they can be
> inspected?

Yup... yikes!  This is far from good or correct.  Hrm...  I'm at a
loss as to the reason, however.  It's like the last dump header is
never overwritten in the dump process and is massively stale.  I've
made some changes to savecore(8) (can someone give me approval to
commit these?).
The resulting output is below using the new format/verbose output:

# savecore -vf
bounds number: 9
checking for kernel dump on device /dev/da0s1b
mediasize = 3221225472
sectorsize = 512
magic mismatch on last dump header on /dev/da0s1b
forcing magic on /dev/da0s1b
savecore: first and last dump headers disagree on /dev/da0s1b
savecore: reboot after panic: vrele: negative ref cnt
Checking for available free space
Dump header from device /dev/da0s1b
  Architecture: amd64
  Architecture Version: 16777216
  Dump Length: 2146631680B (2047 MB)
  Blocksize: 512
  Dumptime: Thu Dec 16 03:06:24 2004
  Hostname: nfs.example.com
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 5.3-STABLE #1: Wed Dec  8 22:20:38 PST 2004
    root@nfs.example.com:/usr/obj/usr/src/sys/NFS
  Panic String: vrele: negative ref cnt
  Dump Parity: 1999448632
  Bounds: 9
  Dump Status: bad
savecore: writing core to vmcore.9
2146631680

If you run it w/ two -v's, you get the first and last header info:

# savecore -vvf
bounds number: 10
checking for kernel dump on device /dev/da0s1b
mediasize = 3221225472
sectorsize = 512
magic mismatch on last dump header on /dev/da0s1b
forcing magic on /dev/da0s1b
First dump headers:
Dump header from device /dev/da0s1b
  Architecture: amd64
  Architecture Version: 16777216
  Dump Length: 2146631680B (2047 MB)
  Blocksize: 512
  Dumptime: Mon Feb 21 19:12:48 2005
  Hostname: www.example.com
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 5.3-STABLE #0: Wed Feb 16 21:42:19 PST 2005
    root@www.example.com:/usr/obj/usr/src/sys/WWW
  Panic String: page fault
  Dump Parity: 1475841892
  Bounds: 10
  Dump Status: unknown

Last dump headers:
Dump header from device /dev/da0s1b
  Architecture: amd64
  Architecture Version: 16777216
  Dump Length: 2146631680B (2047 MB)
  Blocksize: 512
  Dumptime: Thu Dec 16 03:06:24 2004
  Hostname: nfs.example.com
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 5.3-STABLE #1: Wed Dec  8 22:20:38 PST 2004
    root@nfs.example.com:/usr/obj/usr/src/sys/NFS
  Panic String: vrele: negative ref cnt
  Dump Parity: 1999448632
  Bounds: 10
  Dump Status: unknown

savecore: first and last dump headers disagree on /dev/da0s1b
savecore: reboot after panic: vrele: negative ref cnt
Checking for available free space
Dump header from device /dev/da0s1b
  Architecture: amd64
  Architecture Version: 16777216
  Dump Length: 2146631680B (2047 MB)
  Blocksize: 512
  Dumptime: Thu Dec 16 03:06:24 2004
  Hostname: nfs.example.com
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 5.3-STABLE #1: Wed Dec  8 22:20:38 PST 2004
    root@nfs.example.com:/usr/obj/usr/src/sys/NFS
  Panic String: vrele: negative ref cnt
  Dump Parity: 1999448632
  Bounds: 10
  Dump Status: bad
savecore: writing core to vmcore.10
2146631680

Why there are different dump header values, I'm not sure.  But, with
the -f option, you can get a core regardless of the state of the dump
and its header values.  That doesn't mean you get a good dump, but you
at least get something.  It looks like my core dumps are hosed or
swapon(8) has clobbered some data on the image such that kgdb(1) can't
extract anything useful.

kgdb: core file: vmcore.0
kgdb: kernel image: /boot/kernel/kernel
kgdb: cannot read KPML4phys

Ah well, savecore_flags="-vvf" should do the trick next time this box
dumps.  -sc

-- 
Sean Chittenden

--gKMricLos+KVdGMg
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="savecore.patch"

Index: savecore.8
===================================================================
RCS file: /home/ncvs/src/sbin/savecore/savecore.8,v
retrieving revision 1.22
diff -u -r1.22 savecore.8
--- savecore.8	18 Jan 2005 10:09:37 -0000	1.22
+++ savecore.8	22 Feb 2005 19:24:27 -0000
@@ -71,11 +71,13 @@
 .Nm
 will ignore it.
 .It Fl f
-Force a dump to be taken even if the dump was cleared.
+Force a dump to be taken even if either the dump was cleared or if the
+dump header information is inconsistent.
 .It Fl k
 Do not clear the dump after saving it.
 .It Fl v
 Print out some additional debugging information.
+Specify twice for more information.
 .It Fl z
 Compress the core dump and kernel (see
 .Xr gzip 1 ) .
Index: savecore.c
===================================================================
RCS file: /home/ncvs/src/sbin/savecore/savecore.c,v
retrieving revision 1.71
diff -u -r1.71 savecore.c
--- savecore.c	10 Feb 2005 09:19:33 -0000	1.71
+++ savecore.c	22 Feb 2005 19:41:53 -0000
@@ -88,6 +88,10 @@
 /* The size of the buffer used for I/O. */
 #define	BUFFERSIZE	(1024*1024)
 
+#define	STATUS_BAD	0
+#define	STATUS_GOOD	1
+#define	STATUS_UNKNOWN	2
+
 static int checkfor, compress, clear, force, keep, verbose;	/* flags */
 static int nfound, nsaved, nerr;			/* statistics */
 
@@ -95,25 +99,39 @@
 
 static void
 printheader(FILE *f, const struct kerneldumpheader *h, const char *device,
-    int bounds)
+    int bounds, const int status)
 {
 	uint64_t dumplen;
 	time_t t;
+	const char *stat_str;
 
-	fprintf(f, "Good dump found on device %s\n", device);
+	fprintf(f, "Dump header from device %s\n", device);
 	fprintf(f, "  Architecture: %s\n", h->architecture);
-	fprintf(f, "  Architecture version: %d\n",
-	    dtoh32(h->architectureversion));
+	fprintf(f, "  Architecture Version: %u\n", h->architectureversion);
 	dumplen = dtoh64(h->dumplength);
-	fprintf(f, "  Dump length: %lldB (%lld MB)\n", (long long)dumplen,
+	fprintf(f, "  Dump Length: %lldB (%lld MB)\n", (long long)dumplen,
 	    (long long)(dumplen >> 20));
 	fprintf(f, "  Blocksize: %d\n", dtoh32(h->blocksize));
 	t = dtoh64(h->dumptime);
 	fprintf(f, "  Dumptime: %s", ctime(&t));
 	fprintf(f, "  Hostname: %s\n", h->hostname);
-	fprintf(f, "  Versionstring: %s", h->versionstring);
-	fprintf(f, "  Panicstring: %s\n", h->panicstring);
+	fprintf(f, "  Magic: %s\n", h->magic);
+	fprintf(f, "  Version String: %s", h->versionstring);
+	fprintf(f, "  Panic String: %s\n", h->panicstring);
+	fprintf(f, "  Dump Parity: %u\n", h->parity);
 	fprintf(f, "  Bounds: %d\n", bounds);
+
+	switch(status) {
+	case STATUS_BAD:
+		stat_str = "bad";
+		break;
+	case STATUS_GOOD:
+		stat_str = "good";
+		break;
+	default:
+		stat_str = "unknown";
+	}
+	fprintf(f, "  Dump Status: %s\n", stat_str);
 	fflush(f);
 }
 
@@ -214,12 +232,14 @@
 	FILE *info, *fp;
 	int fd, fdinfo, error, wl;
 	int nr, nw, hs, he = 0;
-	int bounds;
+	int bounds, status;
 	u_int sectorsize;
 	mode_t oumask;
 
+	bounds = getbounds();
 	dmpcnt = 0;
 	mediasize = 0;
+	status = STATUS_UNKNOWN;
 
 	if (buf == NULL) {
 		buf = malloc(BUFFERSIZE);
@@ -266,6 +286,7 @@
 			printf("magic mismatch on last dump header on %s\n",
 			    device);
 
+		status = STATUS_BAD;
 		if (force == 0)
 			goto closefd;
 
@@ -284,7 +305,10 @@
 		syslog(LOG_ERR,
 		    "unknown version (%d) in last dump header on %s",
 		    dtoh32(kdhl.version), device);
-		goto closefd;
+
+		status = STATUS_BAD;
+		if (force == 0)
+			goto closefd;
 	}
 
 	nfound++;
@@ -295,7 +319,9 @@
 		syslog(LOG_ERR,
 		    "parity error on last dump header on %s", device);
 		nerr++;
-		goto closefd;
+		status = STATUS_BAD;
+		if (force == 0)
+			goto closefd;
 	}
 	dumpsize = dtoh64(kdhl.dumplength);
 	firsthd = lasthd - dumpsize - sizeof kdhf;
@@ -308,11 +334,25 @@
 		nerr++;
 		goto closefd;
 	}
+
+	if (verbose >= 2) {
+		printf("First dump headers:\n");
+		printheader(stdout, &kdhf, device, bounds, -1);
+
+		printf("\nLast dump headers:\n");
+		printheader(stdout, &kdhl, device, bounds, -1);
+		printf("\n");
+	}
+
 	if (memcmp(&kdhl, &kdhf, sizeof kdhl)) {
 		syslog(LOG_ERR,
 		    "first and last dump headers disagree on %s", device);
 		nerr++;
-		goto closefd;
+		status = STATUS_BAD;
+		if (force == 0)
+			goto closefd;
+	} else {
+		status = STATUS_GOOD;
 	}
 
 	if (checkfor) {
@@ -333,12 +373,10 @@
 		goto closefd;
 	}
 
-	bounds = getbounds();
-
 	sprintf(buf, "info.%d", bounds);
 
 	/*
-	 * Create or overwrite any existing files.
+	 * Create or overwrite any existing dump header files.
 	 */
 	fdinfo = open(buf, O_WRONLY | O_CREAT | O_TRUNC, 0600);
 	if (fdinfo < 0) {
@@ -365,9 +403,9 @@
 	info = fdopen(fdinfo, "w");
 
 	if (verbose)
-		printheader(stdout, &kdhl, device, bounds);
+		printheader(stdout, &kdhl, device, bounds, status);
 
-	printheader(info, &kdhl, device, bounds);
+	printheader(info, &kdhl, device, bounds, status);
 	fclose(info);
 
 	syslog(LOG_NOTICE, "writing %score to %s",
@@ -492,6 +530,9 @@
 	struct fstab *fsp;
 	char *savedir;
 
+	checkfor = compress = clear = force = keep = verbose = 0;
+	nfound = nsaved = nerr = 0;
+
 	openlog("savecore", LOG_PERROR, LOG_DAEMON);
 
 	savedir = strdup(".");
@@ -511,7 +552,7 @@
 			keep = 1;
 			break;
 		case 'v':
-			verbose = 1;
+			verbose++;
 			break;
 		case 'f':
 			force = 1;

--gKMricLos+KVdGMg--
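
The control flow the patch adds around the header comparison can be read
in isolation with a small sketch.  This is only an illustration under
stated assumptions, not the savecore(8) source: struct hdr is a
hypothetical stand-in for struct kerneldumpheader (the real layout
differs), and status_str() and check_headers() are made-up helper names.
Only the three STATUS_* values, the memcmp() comparison, and the rule
that -f keeps going after a mismatch are taken from the patch.

```c
#include <assert.h>
#include <string.h>

/* Same three states the patch defines. */
#define	STATUS_BAD	0
#define	STATUS_GOOD	1
#define	STATUS_UNKNOWN	2

/* Hypothetical stand-in for struct kerneldumpheader; not the real layout. */
struct hdr {
	char		magic[20];
	char		panicstring[64];
	unsigned int	parity;
};

/* Map a status code to the label printed as "Dump Status". */
static const char *
status_str(int status)
{
	switch (status) {
	case STATUS_BAD:
		return ("bad");
	case STATUS_GOOD:
		return ("good");
	default:
		return ("unknown");
	}
}

/*
 * Compare the header stored before the dump with the one stored after it.
 * Sets *status and returns 1 if the dump should be saved, 0 if it should
 * be skipped.  A mismatch is always recorded as STATUS_BAD; a non-zero
 * force only changes whether the caller continues.
 */
static int
check_headers(const struct hdr *first, const struct hdr *last, int force,
    int *status)
{
	assert(first != NULL && last != NULL && status != NULL);

	if (memcmp(first, last, sizeof(*first)) != 0) {
		*status = STATUS_BAD;
		return (force != 0);
	}
	*status = STATUS_GOOD;
	return (1);
}
```

In this sketch a caller would zero both headers with memset() before
filling them, so the byte-wise memcmp() is not thrown off by
uninitialized bytes, then call check_headers(&first, &last, force,
&status) and print status_str(status).  With mismatched headers and
force set, that yields "bad" while still letting the save proceed, which
matches the "Dump Status: bad" lines in the output above.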