From: Boris Kochergin
Date: Fri, 07 Aug 2009 09:45:38 -0400
To: Pawel Jakub Dawidek
Cc: freebsd-fs@freebsd.org
Subject: Re: ZFS RAID-Z panic on vdev failure + subsequent panics and hangs

Pawel Jakub Dawidek wrote:
> On Fri, Aug 07, 2009 at 09:37:38AM +0200, Pawel Jakub Dawidek wrote:
>
>> On Wed, Aug 05, 2009 at 09:33:06AM -0400, Boris Kochergin wrote:
>>
>>> Fatal trap 12: page fault while in kernel mode
>>> fault virtual address = 0xffffffffffffffe9
>>> fault code            = supervisor read data, page not present
>>> instruction pointer   = 0x20:0xffffffff8103a9e7
>>> stack pointer         = 0x28:0xffffff8077f26430
>>> frame pointer         = 0x28:0xffffff8077f26500
>>> code segment          = base 0x0, limit 0xfffff, type 0x1b
>>>                       = DPL 0, pres 1, long 1, def32 0, gran 1
>>> processor eflags      = interrupt enabled, resume, IOPL = 0
>>> current process       = 972 (cp)
>>>
>> [...]
>>
>>> /usr/src/sys/amd64/amd64/trap.c:494
>>> #11 0xffffffff80854d73 in calltrap () at
>>> /usr/src/sys/amd64/amd64/exception.S:224
>>> #12 0xffffffff8103a9e7 in arc_evict (state=Variable "state" is not
>>> available.
>>> ) at
>>> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1489
>>>
>> Could you tell me what do you have at this line in your source? I don't
>> think you use HEAD... What exact FreeBSD version are you using?
>>
>
> You already gave version number in your first mail, sorry about that.
> 8.0-BETA2 should be very close to HEAD (or it actually was HEAD), so I
> guess this is the code we are looking at:
>
> 1488:         /* "lookahead" for better eviction candidate */
> 1489:         if (recycle && ab->b_size != bytes &&
> 1490:             ab_prev && ab_prev->b_size == bytes)
> 1491:                 continue;
>
> If 'ab' is corrupted it should panic earlier, so it seems ab_prev is
> corrupted, can you see what it points to in gdb?
>
>

Yeah, that's what the code looks like. For convenience, I've put the
source tree the system was built using up at:

http://acm.poly.edu/~spawk/src/

Maybe my kgdb chops aren't up to par, but I can't seem to see what
ab_prev points to:

(kgdb) up
#12 0xffffffff8103a9e7 in arc_evict (state=Variable "state" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1489
1489            if (recycle && ab->b_size != bytes &&
Current language: auto; currently c
(kgdb) list
1484                LBOLT - ab->b_arc_access < arc_min_prefetch_lifespan)) {
1485                    skipped++;
1486                    continue;
1487            }
1488            /* "lookahead" for better eviction candidate */
1489            if (recycle && ab->b_size != bytes &&
1490                ab_prev && ab_prev->b_size == bytes)
1491                    continue;
1492            hash_lock = HDR_LOCK(ab);
1493            have_lock = MUTEX_HELD(hash_lock);
(kgdb) print ab
$13 = (arc_buf_hdr_t *) 0xffffff0003ebc410
(kgdb) print ab->b_size
$14 = 1
(kgdb) print bytes
$15 = 16384
(kgdb) print ab_prev
No symbol "ab_prev" in current context.

-Boris
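
For reference, a minimal sketch of how the lookahead test at arc.c:1489-1490 evaluates, to show why the printed values are consistent with the suspicion about ab_prev. It is not the kernel code: struct hdr_sketch is a hypothetical stand-in for arc_buf_hdr_t with only the one field the test reads, the uint64_t types are an assumption, and both function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for arc_buf_hdr_t: only the field the test reads. */
struct hdr_sketch {
	uint64_t b_size;	/* type is an assumption for this sketch */
};

/*
 * True when evaluation of the lookahead test would reach the read of
 * ab_prev->b_size.  C's && evaluates left to right and stops at the
 * first false operand, so the read happens only if every operand in
 * front of it is true.
 */
bool
lookahead_reads_prev(bool recycle, const struct hdr_sketch *ab,
    const struct hdr_sketch *ab_prev, uint64_t bytes)
{
	assert(ab != NULL);
	return (recycle && ab->b_size != bytes && ab_prev != NULL);
}

/* The lookahead test itself, in the shape quoted from lines 1489-1490. */
bool
lookahead_skips(bool recycle, const struct hdr_sketch *ab,
    const struct hdr_sketch *ab_prev, uint64_t bytes)
{
	assert(ab != NULL);
	return (recycle && ab->b_size != bytes &&
	    ab_prev && ab_prev->b_size == bytes);
}
```

With the values from the kgdb session (ab->b_size is 1, bytes is 16384), the second operand is true, so lookahead_reads_prev() returns true whenever recycle is set and ab_prev is non-NULL. In that case ab_prev->b_size is the next thing read, and a non-NULL but invalid ab_prev would fault there. The session does not show the value of recycle, so this only narrows the possibilities.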