Date: Fri, 29 May 2009 11:29:15 -0700
From: Kip Macy
To: Larry Rosenman
Cc: freebsd-current@freebsd.org
Subject: Re: ZFS Crash

I'm fairly certain I know what the problem is. The (de)compress
functions allocate their own memory completely independently of the
ARC limits. The allocations are blocking, so the system will try to
page in an attempt to provide the requested memory.

Cheers,
Kip

On Fri, May 29, 2009 at 10:44 AM, Larry Rosenman wrote:
> On Thu, 28 May 2009, Larry Rosenman wrote:
>
>> On Thu, 28 May 2009, Kip Macy wrote:
>>
>>> On Tue, May 26, 2009 at 5:04 AM, Larry Rosenman wrote:
>>>>
>>>> On Mon, 25 May 2009, Larry Rosenman wrote:
>>>>
>>>>> On Mon, 25 May 2009, Larry Rosenman wrote:
>>>>>
>>>>>> after looking at the code, never mind the "don't call doadump", so
>>>>>> we'll get the textdump.
>>>>>>
>>>>>> Thanks rwatson for the textdump stuff!
>>>>>>
>>>>> Here is current stats before we crash.  Does any of this look totally
>>>>> out of line?
>>>>>
>>>> It crashed again, but did *NOT* make it into ddb enough to do the
>>>> textdump.
>>>>
>>>> It was hung with the backtrace (looks like the same, but I couldn't
>>>> scroll the screen back).
>>>>
>>>> Ideas?
>>>>
>>>> I'm really concerned that there is a problem.
>>>>
>>>
>>> - Type of disks?
>>
>> 6 SATA Seagate 400GB (5) / 500 GB (1).
>>
>>
>> ATA channel 0:
>>   Master: acd0 ATA/ATAPI revision 7
>>   Slave:       no device present
>> ATA channel 2:
>>   Master:  ad4 SATA revision 2.x
>>   Slave:       no device present
>> ATA channel 3:
>>   Master:  ad6 SATA revision 2.x
>>   Slave:       no device present
>> ATA channel 4:
>>   Master:  ad8 SATA revision 2.x
>>   Slave:       no device present
>> ATA channel 5:
>>   Master: ad10 SATA revision 2.x
>>   Slave:       no device present
>> ATA channel 6:
>>   Master: ad12 SATA revision 2.x
>>   Slave:       no device present
>> ATA channel 7:
>>   Master: ad14 SATA revision 2.x
>>   Slave:       no device present
>>>
>>>
>>> - Size of zpools?
>>
>> All 6.
>>
>>  pool: vault
>> state: ONLINE
>> status: One or more devices has experienced an error resulting in data
>>        corruption.  Applications may be affected.
>> action: Restore the file in question if possible.  Otherwise restore the
>>        entire pool from backup.
>>  see: http://www.sun.com/msg/ZFS-8000-8A
>> scrub: none requested
>> config:
>>
>>        NAME        STATE     READ WRITE CKSUM
>>        vault       ONLINE       0     0     0
>>          raidz1    ONLINE       0     0     0
>>            ad6     ONLINE       0     0     0
>>            ad8     ONLINE       0     0     0
>>            ad10    ONLINE       0     0     0
>>            ad12    ONLINE       0     0     0
>>            ad14    ONLINE       0     0     0
>>          ad4s1f    ONLINE       0     0     0
>>          ad4s1e    ONLINE       0     0     0
>>          ad4s1d    ONLINE       0     0     0
>>
>> errors: 10 data errors, use '-v' for a list
>>
>>
>>  pool: vault
>> state: ONLINE
>> status: One or more devices has experienced an error resulting in data
>>        corruption.  Applications may be affected.
>> action: Restore the file in question if possible.  Otherwise restore the
>>        entire pool from backup.
>>  see: http://www.sun.com/msg/ZFS-8000-8A
>> scrub: none requested
>> config:
>>
>>        NAME        STATE     READ WRITE CKSUM
>>        vault       ONLINE       0     0     0
>>          raidz1    ONLINE       0     0     0
>>            ad6     ONLINE       0     0     0
>>            ad8     ONLINE       0     0     0
>>            ad10    ONLINE       0     0     0
>>            ad12    ONLINE       0     0     0
>>            ad14    ONLINE       0     0     0
>>          ad4s1f    ONLINE       0     0     0
>>          ad4s1e    ONLINE       0     0     0
>>          ad4s1d    ONLINE       0     0     0
>>
>> errors: Permanent errors have been detected in the following files:
>>
>>       /usr/local/sbin/p4d
>>       /var/db/bacula/borg-dir.conmsg
>>       vault/usr/obj:<0x16c3a>
>>       vault/usr/obj:<0x169bb>
>>       /usr/obj/usr/src/lib/libc/random.o
>>
>>>
>>>
>>> - Compression enabled?
>>
>> Yes.
>>
>>
>>
>
> Ok, it just crashed.  Unfortunately, I'm at work and the box is at home.
>
> I did have my script running every minute of that entire boot.
>
> What I saw was a full backup running, and then we started paging, and then
> the backup jobs got pager errors, and were killed.
>
> I'm not sure what else went on, so I restarted the bacula daemons that
> got killed, and was in the bacula console when it died.
>
> I'll see if I can get a cell-phone camera shot of the console.
>
> I'll also tar up the vmstat outputs and put them on my web server.
>
> What other forensics should I get?
> Bear in mind the system is probably
> locked up with no dump taken :(
>
>
> --
> Larry Rosenman                     http://www.lerctr.org/~ler
> Phone: +1 512-248-2683                 E-Mail: ler@lerctr.org
> US Mail: 430 Valona Loop, Round Rock, TX 78681-3893
>

-- 
When bad men combine, the good must associate; else they will fall one
by one, an unpitied sacrifice in a contemptible struggle.

    Edmund Burke
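
Illustration of the diagnosis at the top of this message: a toy model,
not ZFS code. It only sketches why memory allocated outside a cache's
accounting can push total usage well past the cache limit. The class
name, function name, and every number are made up for the example; the
real ARC and the (de)compress paths are far more involved.

```python
class BoundedCache:
    """Hypothetical cache that enforces a byte limit on what it tracks."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0
        self.blocks = []  # sizes of cached blocks, oldest first

    def insert(self, nbytes):
        if nbytes > self.limit:
            raise ValueError("block larger than cache limit")
        # Evict the oldest blocks until the new one fits under the limit.
        while self.blocks and self.used + nbytes > self.limit:
            self.used -= self.blocks.pop(0)
        self.blocks.append(nbytes)
        self.used += nbytes


def memory_in_use(cache, block_size, scratch_size, in_flight):
    """Return (tracked, untracked, total) bytes for `in_flight` I/Os.

    Each I/O caches one block (charged to the cache) and also holds one
    scratch buffer that is NOT charged to the cache. This is the assumed
    analogue of a (de)compress routine allocating its own memory.
    """
    for _ in range(in_flight):
        cache.insert(block_size)
    untracked = scratch_size * in_flight
    return cache.used, untracked, cache.used + untracked


if __name__ == "__main__":
    cache = BoundedCache(limit=1000)
    tracked, untracked, total = memory_in_use(cache, 100, 128, 50)
    print(tracked, untracked, total)  # 1000 6400 7400
```

In this model the cache itself never exceeds its limit, yet the total
is several times larger, because the scratch buffers are invisible to
the limit. If such allocations also block until memory is available,
the only way for the system to satisfy them is to page.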