Date: Fri, 29 May 2009 12:44:39 -0500 (CDT)
From: Larry Rosenman
To: Kip Macy
Cc: freebsd-current@freebsd.org
Subject: Re: ZFS Crash

On Thu, 28 May 2009, Larry Rosenman wrote:

> On Thu, 28 May 2009, Kip Macy wrote:
>
>> On Tue, May 26, 2009 at 5:04 AM, Larry Rosenman wrote:
>>> On Mon, 25 May 2009, Larry Rosenman wrote:
>>>
>>>> On Mon, 25 May 2009, Larry Rosenman wrote:
>>>>
>>>>> after looking at the code, never mind the "don't call doadump", so we'll
>>>>> get the textdump.
>>>>>
>>>>> Thanks rwatson for the textdump stuff!
>>>>>
>>>> Here is current stats before we crash. Does any of this look totally
>>>> out of line?
>>>>
>>> It crashed again, but did *NOT* make it into ddb enough to do the
>>> textdump.
>>>
>>> It was hung with the backtrace (looks like the same, but I couldn't
>>> scroll the screen back).
>>>
>>> Ideas?
>>>
>>> I'm really concerned that there is a problem.
>>>
>>>
>>>
>>
>>
>> - Type of disks?
> 6 SATA Seagate 400GB (5) / 500 GB (1).
>
>
> ATA channel 0:
>     Master: acd0 ATA/ATAPI revision 7
>     Slave:  no device present
> ATA channel 2:
>     Master: ad4 SATA revision 2.x
>     Slave:  no device present
> ATA channel 3:
>     Master: ad6 SATA revision 2.x
>     Slave:  no device present
> ATA channel 4:
>     Master: ad8 SATA revision 2.x
>     Slave:  no device present
> ATA channel 5:
>     Master: ad10 SATA revision 2.x
>     Slave:  no device present
> ATA channel 6:
>     Master: ad12 SATA revision 2.x
>     Slave:  no device present
> ATA channel 7:
>     Master: ad14 SATA revision 2.x
>     Slave:  no device present
>>
>>
>> - Size of zpools?
> All 6.
>
>   pool: vault
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         vault       ONLINE       0     0     0
>         raidz1      ONLINE       0     0     0
>         ad6         ONLINE       0     0     0
>         ad8         ONLINE       0     0     0
>         ad10        ONLINE       0     0     0
>         ad12        ONLINE       0     0     0
>         ad14        ONLINE       0     0     0
>         ad4s1f      ONLINE       0     0     0
>         ad4s1e      ONLINE       0     0     0
>         ad4s1d      ONLINE       0     0     0
>
> errors: 10 data errors, use '-v' for a list
>
>
>   pool: vault
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         vault       ONLINE       0     0     0
>         raidz1      ONLINE       0     0     0
>         ad6         ONLINE       0     0     0
>         ad8         ONLINE       0     0     0
>         ad10        ONLINE       0     0     0
>         ad12        ONLINE       0     0     0
>         ad14        ONLINE       0     0     0
>         ad4s1f      ONLINE       0     0     0
>         ad4s1e      ONLINE       0     0     0
>         ad4s1d      ONLINE       0     0     0
>
> errors: Permanent errors have been detected in the following files:
>
>         /usr/local/sbin/p4d
>         /var/db/bacula/borg-dir.conmsg
>         vault/usr/obj:<0x16c3a>
>         vault/usr/obj:<0x169bb>
>         /usr/obj/usr/src/lib/libc/random.o
>
>>
>>
>> - Compression enabled?
> Yes.
>
>

Ok, it just crashed. Unfortunately, I'm at work and the box is at home.

I did have my script running every minute of that entire boot.

What I saw was a full backup running, and then we started paging, and
then the backup jobs got pager errors, and were killed.

I'm not sure what else went on, so I restarted the bacula daemons that
got killed, and was in the bacula console when it died.

I'll see if I can get a cell-phone camera shot of the console.

I'll also tar up the vmstat outputs and put them on my web server.

What other forensics should I get?

Bear in mind the system is probably locked up with no dump taken :(

-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 512-248-2683             E-Mail: ler@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893